Monitoring operational metrics is essential to ensure the reliability, availability and performance of cloud services.
Amazon Aurora is a database engine in AWS' Relational Database Service (RDS) that is compatible with the open source databases MySQL and PostgreSQL and optimized for cloud-based deployments. AWS claims that Aurora delivers performance up to five times faster than other RDS engines. Given its monitoring, performance and availability features, the RDS Aurora engine is recommended for MySQL and PostgreSQL deployments.
AWS users provision and group Aurora databases within a cluster. A cluster consists of a primary node and optional read replicas, as well as an optional standby of the primary node in another availability zone. Aurora also supports multiregion clusters, in which AWS replicates live data to one or more additional regions.
To run an optimal Aurora deployment, IT organizations should identify key metrics for capacity and operations. Through native AWS tools such as Amazon CloudWatch and DevOps Guru, application owners can configure the database setup correctly from the start and then track health metrics in production.
Monitor with CloudWatch
Effective monitoring is a critical cloud operational requirement to ensure appropriate availability, performance, cost and security. The Amazon CloudWatch monitoring service offers a range of built-in metrics for RDS Aurora deployments. As of early 2022, it had 56 metrics at the database level. The available metrics in Aurora relate to:
- disk storage;
- disk read/write operations;
- network throughput;
- data replication lags;
- operation-specific latency and throughput metrics for commit, delete, data definition language, data manipulation language, select, update and insert operations; and
- aggregated read/write latency and throughput.
Application owners must configure CloudWatch alarms and dashboards with the relevant metrics.
The most essential CloudWatch metrics to monitor in an RDS Aurora deployment are:
- CPUUtilization;
- FreeableMemory;
- DatabaseConnections;
- ReadIOPS; and
- WriteIOPS.
CPUUtilization, FreeableMemory and DatabaseConnections are key metrics to ensure the right RDS database instance family and size is deployed.
The maximum number of database connections an instance can handle is directly related to its memory capacity. Thus, one instance type might support 1,000 connections while another supports 2,000, for example, and the limit can also vary by database engine. Reaching the database connection limit is one of the most common complications users face, so monitor this metric closely.
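As a sketch of the alarm configuration described above, the following builds the parameter set for CloudWatch's PutMetricAlarm API to alert when DatabaseConnections approaches a limit. The instance identifier, connection limit and SNS topic ARN are assumptions for illustration, not values from the article.

```python
# Assumed per-instance connection limit; look up the actual value for
# the deployed instance class and engine.
CONNECTION_LIMIT = 1000
ALARM_THRESHOLD = int(CONNECTION_LIMIT * 0.8)  # alert at 80% of the limit


def build_connections_alarm(db_instance_id: str, sns_topic_arn: str) -> dict:
    """Build parameters for CloudWatch's PutMetricAlarm API."""
    return {
        "AlarmName": f"{db_instance_id}-database-connections",
        "Namespace": "AWS/RDS",
        "MetricName": "DatabaseConnections",
        "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": db_instance_id}],
        "Statistic": "Average",
        "Period": 300,            # evaluate over 5-minute windows
        "EvaluationPeriods": 3,   # require three consecutive breaches
        "Threshold": ALARM_THRESHOLD,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


# Hypothetical instance ID and SNS topic ARN.
params = build_connections_alarm(
    "aurora-primary-1", "arn:aws:sns:us-east-1:123456789012:db-alerts"
)
# To create the alarm (requires AWS credentials):
# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**params)
```

The same pattern applies to CPUUtilization or FreeableMemory; only the metric name, threshold and comparison operator change.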
If read replicas are configured in the Aurora cluster, monitor metrics for both the primary node and all replicas. In the case of Aurora Serverless, metrics help configure the right number of Aurora Capacity Units, instead of a database instance class and size.
ReadIOPS and WriteIOPS are essential metrics to indicate if the right type of storage is attached to a database instance. The default General Purpose SSD storage type provides a number of IOPS proportional to the allocated storage size, meaning that the higher the storage, the higher the allocated IOPS. RDS also offers the option to deploy Provisioned IOPS SSD storage, which comes with a baseline of guaranteed IOPS. Either way, application owners must ensure the deployed database can handle the read and write IOPS an application needs to perform optimally. This is where CloudWatch's ReadIOPS and WriteIOPS metrics come into play.
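To make the proportional relationship concrete, here is a small calculation of baseline IOPS for General Purpose SSD (gp2) storage, which AWS documents as scaling at roughly 3 IOPS per GiB with a 100 IOPS floor and a 16,000 IOPS cap. The sample sizes are illustrative.

```python
def gp2_baseline_iops(allocated_gib: int) -> int:
    """Baseline IOPS for gp2 General Purpose SSD storage:
    ~3 IOPS per allocated GiB, floored at 100 and capped at 16,000."""
    return max(100, min(allocated_gib * 3, 16_000))


print(gp2_baseline_iops(100))    # 300 IOPS
print(gp2_baseline_iops(20))     # small volume hits the 100 IOPS floor
print(gp2_baseline_iops(6_000))  # large volume hits the 16,000 IOPS cap
```

Comparing these baselines against observed ReadIOPS and WriteIOPS in CloudWatch indicates whether General Purpose storage suffices or Provisioned IOPS is warranted.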
A useful RDS feature available for Aurora is Performance Insights, a tool that users can access through the AWS console, the RDS API or the AWS command-line interface (CLI). It provides visibility into metrics related to database load, such as active sessions, active SQL executions and top SQL executions. Application owners can see in detail which queries and database clients contribute the most to database load. With this information, they can optimize applications and prevent future performance and availability issues.
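As a sketch of querying database load programmatically, the following builds a request for Performance Insights' GetResourceMetrics API, asking for average load over the last hour grouped by SQL statement. The resource identifier is a hypothetical placeholder; the real value is the instance's DbiResourceId.

```python
from datetime import datetime, timedelta, timezone


def build_load_query(resource_id: str) -> dict:
    """Parameters for Performance Insights' GetResourceMetrics API:
    average database load over the last hour, grouped by SQL statement
    to surface the top queries."""
    end = datetime.now(timezone.utc)
    return {
        "ServiceType": "RDS",
        "Identifier": resource_id,  # the instance's DbiResourceId
        "StartTime": end - timedelta(hours=1),
        "EndTime": end,
        "PeriodInSeconds": 60,
        "MetricQueries": [
            {"Metric": "db.load.avg", "GroupBy": {"Group": "db.sql"}},
        ],
    }


params = build_load_query("db-EXAMPLERESOURCEID")  # hypothetical ID
# To execute (requires AWS credentials and Performance Insights enabled):
# import boto3
# response = boto3.client("pi").get_resource_metrics(**params)
```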
Troubleshoot with CloudWatch Logs
RDS Aurora can also publish logs to CloudWatch Logs, which collects and stores logs from all AWS services. CloudWatch Logs is essential for troubleshooting and optimizing a database and the applications that connect to it in AWS. Aurora MySQL publishes general, slow, audit and error logs; Aurora PostgreSQL publishes query and error logs. Combined, these logs allow application owners to keep track of which clients connect to a database, specific SQL statements, query executions that exceed a configurable response time threshold, and various errors.
Once logs are available in CloudWatch Logs, analyze them using CloudWatch Logs Insights, which supports a query language to filter and aggregate log records. Users can visualize results from CloudWatch Logs Insights and place them in CloudWatch dashboards for easier troubleshooting.
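As an example of the query language in action, the following builds a CloudWatch Logs Insights StartQuery request that pulls recent slow-query entries from an Aurora MySQL log group. The cluster name is hypothetical; when log export is enabled, Aurora MySQL publishes slow query logs to a log group of the form /aws/rds/cluster/&lt;cluster-name&gt;/slowquery.

```python
import time

# Hypothetical cluster name for illustration.
LOG_GROUP = "/aws/rds/cluster/aurora-demo/slowquery"

# Logs Insights query: latest 20 slow-query records.
QUERY = """
fields @timestamp, @message
| filter @message like /Query_time/
| sort @timestamp desc
| limit 20
""".strip()

params = {
    "logGroupName": LOG_GROUP,
    "startTime": int(time.time()) - 3600,  # last hour, epoch seconds
    "endTime": int(time.time()),
    "queryString": QUERY,
}
# To run (requires AWS credentials):
# import boto3
# logs = boto3.client("logs")
# query_id = logs.start_query(**params)["queryId"]
# results = logs.get_query_results(queryId=query_id)
```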
One key consideration regarding logs is cost. CloudWatch Logs charges a data ingestion fee of $0.50 per gigabyte, with tiered pricing at higher volumes, and for high-volume applications this can quickly add up to hundreds or even thousands of dollars. One strategy to reduce cost is to enable only the log types the application actually needs in the RDS configuration. If the log ingestion cost is still high, another option is to activate logs only for limited periods of time, such as during troubleshooting activities.
DevOps Guru and third-party tools
As of 2021, the Amazon DevOps Guru service supports monitoring RDS databases. It relies on Performance Insights' database load tracking and surfaces issues through machine learning-based anomaly detection. Application owners can visualize these findings in the AWS Management Console and configure notifications through Amazon Simple Notification Service topics or the Amazon EventBridge serverless event bus.
Third-party metrics aggregation and analysis tools, such as Sumo Logic and Datadog, can help monitor RDS Aurora databases, but the native monitoring tools available in AWS are more than sufficient; tools such as Sumo Logic and Datadog largely display metrics that already exist in CloudWatch.