Google Cloud Operations suite updates the cloud provider's Stackdriver tool with new features and upgrades, while retaining the core functions of monitoring and log analysis for cloud instances.
Google acquired Stackdriver in 2014. Then, Stackdriver became Google Cloud Operations suite in 2020. It provides two services -- Cloud Monitoring and Cloud Logging -- and their corresponding agents.
The Google Cloud Operations suite provides an in-depth view of system metrics and application logs. Cloud Monitoring gleans system-level metrics such as CPU, memory and disk space, while Cloud Logging captures log data from applications such as the web server Nginx, displayed within the console.
Learn what the two services of Cloud Operations accomplish, then understand how they work by following along with the tutorial. It covers setup as well as how to make queries and create dashboards.
The Cloud Monitoring service in Google Cloud Operations collects system metrics and log data points, aggregates them and visualizes the information on a dashboard. It also has alert features, so cloud admins can respond to issues rapidly.
Monitoring system health is a critical part of maintaining service-level agreements for cloud deployments.
Capturing application logs in a centralized location eases log management and provides a view of the application's condition. On a single VM, log monitoring is a trivial task, easily accomplished by connecting to the machine and inspecting application or system logs. However, in a distributed environment, or a deployment of numerous VMs working together, log monitoring becomes daunting. Cloud Operations suite uses an agent on each VM to aggregate all of these logs into a single location.
Cloud Logging enables a cloud admin to track down systems where an issue is occurring, even across numerous VMs. Log data can be filtered by using a query language. Admins can fine-tune filters to separate noise from actual issues that require action.
Cloud Operations tutorial
In the following example, the infrastructure consists of an Nginx web server running on a VM. Cloud Operations relies on two agents to build a picture of the system and the application running on it.
- The Cloud Monitoring agent provides a view of the memory on the VM. The agent is based on collectd and interacts with the Google API to report memory. Google's documentation for the Monitoring agent describes how to install it.
- The Cloud Logging agent picks up information on the status of an application from application and system logs. The log agent is based on fluentd. Google's documentation for the Logging agent describes how to install it.
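As a rough sketch, installing both agents on a Debian VM might look like the following. It assumes the repository-setup scripts that Google publishes for the legacy Monitoring and Logging agents; check the current documentation before relying on the URLs or the --also-install flag. The run helper is hypothetical and only prints each command unless RUN=1 is set, so the sketch is safe to try on a workstation first.

```shell
# Hypothetical helper: print each command, and execute it only when RUN=1
# is set (for example, on the target VM).
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

# Download and run Google's repo-setup script for each agent.
# Assumption: the scripts are still served from dl.google.com/cloudagents.
for agent in monitoring logging; do
  script="add-${agent}-agent-repo.sh"
  run curl -sSO "https://dl.google.com/cloudagents/${script}"
  run sudo bash "${script}" --also-install
done
```

Run it once as-is to review the commands, then again with RUN=1 on the VM to perform the installation.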
With both agents installed, move on to the Nginx web server. This tutorial uses Nginx on a Debian 9 Linux VM. To install it, execute the following command:
sudo apt install nginx -y
With the setup complete, the agents begin sending data to the Google Cloud Console. Now, use the following tips and examples to gain usable insights from all the data available.
Depending on how much data is sent to this management console, it can be a challenge to filter through the noise. Google's Monitoring Query Language allows an operator to filter noise and track issues as data is reported within the console. Instead of looking through a stack of data searching for a specific metric, you can pull all of the errors or relevant data points to get a glimpse of the infrastructure's status.
The query below will display the amount of memory being used by a VM. This example uses the fetch selection operation to specify that we're retrieving the information, then one of the metrics available from Google through Cloud Monitoring:
fetch gce_instance
| metric 'compute.googleapis.com/instance/memory/balloon/ram_used'
| group_by 1m, [value_ram_used_mean: mean(value.ram_used)]
| every 1m
Some metrics, such as disk space and CPU, are available in the console without requiring the agent. However, the Cloud Monitoring agent is required to retrieve the status of memory.
Google Cloud offers predefined as well as custom dashboards for data collected by Cloud Operations suite. Users can build a custom dashboard for a query so that the information is readily available.
Dashboards provide a focused view of key data points within the cloud deployment. However, it can take time to create a dashboard. To set one up, you must create the various widgets -- visualizations of information collected by the agents -- and verify that the data displayed is accurate. Automation can ease the burden of creating and updating dashboards.
Google Cloud lets users interact with the dashboard API in two ways: submit POST requests to its HTTP endpoint, or use the gcloud command-line interface, which is the primary CLI tool for Google Cloud. Both options enable a Google Cloud user to create dashboards programmatically.
To create a dashboard using the API, execute the following command:
gcloud monitoring dashboards create --config-from-file=your-dash.json
The file your-dash.json defines what widgets will be created within your dashboard, formatted with JSON.
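For illustration, a minimal your-dash.json could be written as shown below. This is a sketch, not a definitive layout: the display name and widget title are made-up values, and the field names (gridLayout, xyChart, timeSeriesQueryLanguage) follow the Cloud Monitoring dashboard schema as best understood here, so verify them against the API reference. The single chart reuses the memory query from earlier in this tutorial.

```shell
# Write a one-widget dashboard definition to your-dash.json.
# The quoted 'EOF' keeps the shell from expanding anything inside the JSON.
cat > your-dash.json <<'EOF'
{
  "displayName": "VM memory (example)",
  "gridLayout": {
    "columns": "1",
    "widgets": [
      {
        "title": "RAM used (1m mean)",
        "xyChart": {
          "dataSets": [
            {
              "plotType": "LINE",
              "timeSeriesQuery": {
                "timeSeriesQueryLanguage": "fetch gce_instance | metric 'compute.googleapis.com/instance/memory/balloon/ram_used' | group_by 1m, [value_ram_used_mean: mean(value.ram_used)] | every 1m"
              }
            }
          ]
        }
      }
    ]
  }
}
EOF
```

Keeping the definition in a file makes it easy to store in version control and to recreate the dashboard in another project.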
To get familiar with Cloud Logging, set up monitoring for the Nginx access and error logs.
Add the configuration below into /etc/google-fluentd/google-fluentd.conf on the VM. This setup ensures that access.log and error.log entries go to the Google Cloud Logging console.
<source>
  @type tail
  format apache2
  path /var/log/nginx/access.log,/var/log/nginx/error.log
  pos_file /var/lib/google-fluentd/pos/Nginx-access-log.pos
  read_from_head true
  tag Nginx-access-error
</source>
After adding the configuration entry, restart the google-fluentd agent. After approximately two to five minutes, the Cloud Logging agent will begin to send data to the Google Cloud Console.
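To get a feel for what the apache2 format pulls out of each access log entry, the sketch below splits a sample line locally with awk. The sample line is made up, and the whitespace-based split is only an approximation of the agent's parser, which also captures fields such as the referer and user agent.

```shell
# A sample access-log line in the combined format (made-up values).
sample='203.0.113.7 - - [10/Oct/2021:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 612 "-" "curl/7.64.0"'

# Split on whitespace: $1 = client host, $6 = method (with a leading quote),
# $7 = request path, $9 = status code, $10 = response size in bytes.
parsed=$(printf '%s\n' "$sample" | awk '{
  gsub(/"/, "", $6)
  print "host=" $1, "method=" $6, "path=" $7, "code=" $9, "size=" $10
}')
echo "$parsed"
```

Fields like the status code are what make it practical to filter for errors once the entries arrive in Cloud Logging.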
You can query the data from Logs Explorer, Cloud Logging's user interface for analysis. Use this query to display any log data from the Nginx access and error logs: