Monitoring

Monitoring assists with system performance management, regulatory compliance, and billing analyses.

Stackdriver

Stackdriver is one of those buy-ups that the big G has begun. Along with Quiklabs, they have grabbed a very useful tool. Folk who are waaay past the need for GCP introduction notes may be using Stackdriver to monitor their AWS entities and their GCP entities.

Stackdriver is a tool for:
  • monitoring
  • logging
  • debugging

Stackdriver coordinates collection of performance metrics, event logs, and event data from multiple sources including GCP resources.

Example Predefined Metrics

Service Metric Monitored
VM CPU % utilised
BigQuery Time for executions
Functions Execution count

Using Stackdriver

To monitor a VM, Stackdriver must have an agent installed on that VM. If you have a Linux machine, then using curl from bash you can call and install the agent via the SSH connection to that VM:

curl -sSO https://dl.google.com/cloudagents/add-monitoring-agent-repo.sh
sudo bash add-monitoring-agent-repo.sh && \
sudo apt-get update

You will have to select a version number and install the latest as per the GCP’s directions.

The agent needs a Stackdriver workspace to send data to. Stackdriver can be setup via the GCP console. Each project can have a space or projects may be added to another project’s space.

Stackdriver can be set up to email reports on a cycle or to send alterts. These setting are held in a policy. The policy configuration can use labels given to VMs, or general properties such as zone, region, project ID. If no labels were applied to VMs, instance ID can be applied to pick up individual VMs.

The Stackdriver report is highly customisable:

  • Aligning, grouping data, e.g. data every 20s can be reported as av. per min
  • Reducing, consolidating data into max, min, mean, standard deviation

Alerts can be set for these reports, e.g. CPU usage >85% for 2 minutes. Alert channels include:

  • email
  • Slack / HipChat
  • GCP console
  • PagerDuty
  • Campfire

Custom Metrics

If none of the predefined variables measures what you need, then create custom metrics.

There are 2 APIs that GCP recommends:
  • Stackdriver’s own
  • OpenCensus

A custom metric must be programmed to call the monitoring API in a language such as Python, etc.

Configuring Log Sinks

The default data-retention period in Stackdriver is 30 days. To assist with long-term storage or advanced analytics, Cloud Storage (with lifecycle management) and BigQuery are useful tools.

Stackdriver> Exports form assists with exporting data to a “sink”. You need to configure the sink:

  • name
  • service (BigQuery, Storage/Bucket, Pub/Sub, custom)
  • destination

When setting up a GCP service, the form with allow export to an existing location or create a new service item.

Cloud Trace

Cloud Trace is a service that collects latency data from a GCP app. This assists with performance monitoring, e.g. to identify bottle-necks.

Traces are only generated when applications are programmed to call Cloud Trace. As with Stackdriver, the reports are customisable.

Cloud Debugger

Cloud Debug assists developers to inspect the state of a running instance. It can take snapshots of the state of an app in App Engine and can also be enabled on CE and GKE.

Using Debugger

The GCP console form provides a drop-down of deployments such as an App Engine app. From here the code can be examined line by line, and any line of interest can be clicked to create a snapshot of the instance when that line of code executes. Alternatively a log point may be placed at that line.