Custom Amazon CloudWatch metrics: When default isn't enough
Transform your AWS monitoring beyond basic CPU and network stats. Discover how CloudWatch custom metrics unlock application-specific insights that default EC2 monitoring cannot provide.
As a cloud administrator, you know that default EC2 metrics can only tell you so much. While CPU utilization and network traffic provide valuable insights into infrastructure, they leave critical gaps in understanding application performance, user behavior and business-specific operational requirements. When you need more than out-of-the-box visibility, Amazon CloudWatch custom metrics can give you application-specific data points that matter most to your organization.
See how custom metrics can transform your cloud infrastructure monitoring strategy.
What is Amazon CloudWatch?
CloudWatch is a monitoring service with a wide range of features that enable critical operational tasks and provide visibility into AWS resources and applications. Its main feature areas include:
- Logs.
- Alarms.
- Dashboards.
- Visualization.
- Custom actions.
- Uptime monitoring.
- Metric management.
- Automated incident troubleshooting.
- Application performance management.
All of these features contribute to application reliability, performance, security, cost optimization and efficient operations.
EC2 natively publishes CloudWatch metrics related to compute infrastructure utilization. Given that EC2 is also one of the most important services in AWS, managing metrics between EC2 and CloudWatch is a critical area to understand when launching and maintaining applications in AWS.
By default, EC2 publishes infrastructure usage metrics to CloudWatch, such as CPUUtilization, DiskReadOps, NetworkIn and StatusCheckFailed. These metrics enable application owners to monitor compute infrastructure usage, assess system health, automate notifications and apply optimizations. They can be visualized in the CloudWatch console or in custom CloudWatch dashboards. CloudWatch alarms can be configured based on metric thresholds, statistics and time periods. Metric data can also be retrieved using the AWS SDK for custom automation processes.
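As a rough illustration of retrieving metric data through the SDK, the sketch below builds a GetMetricStatistics request for the CPUUtilization metric of one instance. It assumes the Python SDK (boto3) and takes the CloudWatch client as an argument. The function names, the five-minute period and the one-hour window are illustrative choices, not recommendations from the article.

```python
from datetime import datetime, timedelta, timezone


def average_cpu_request(instance_id, hours=1, period_seconds=300, now=None):
    """Build keyword arguments for CloudWatch GetMetricStatistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",  # namespace used by EC2 default metrics
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period_seconds,  # aggregation window in seconds
        "Statistics": ["Average"],
    }


def fetch_average_cpu(cloudwatch, instance_id):
    """Return (timestamp, average) pairs sorted by time."""
    response = cloudwatch.get_metric_statistics(**average_cpu_request(instance_id))
    points = sorted(response["Datapoints"], key=lambda p: p["Timestamp"])
    return [(p["Timestamp"], p["Average"]) for p in points]
```

With credentials configured, this could be called as fetch_average_cpu(boto3.client("cloudwatch"), "i-0123456789abcdef0"), where the instance ID is a placeholder.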
Why use CloudWatch custom metrics?
Custom metrics enable monitoring of specific application behavior, configurations and requirements. Although the metrics published by EC2 are very helpful, there are many situations where more application-specific metrics are required to properly monitor a deployment, such as:
- Knowing the usage count for specific requests in a web application, such as /login and /landing-page.
- Keeping track of completed processes like customer onboarding/offboarding and specific product checkouts.
- Monitoring specific error codes returned by an application.
The range of use cases for custom metrics is wide and depends on specific application implementations and requirements. Developers can publish these metrics using the AWS CLI or SDK and visualize them using the same mechanisms as standard metrics published by AWS services.
Features and parameters
Custom metrics support high-resolution data, which allows metrics to be aggregated at a finer granularity than standard resolution. Standard resolution supports a minimum period of 1 minute, while high resolution supports periods of 1, 5, 10 or 30 seconds, which is useful for scenarios that require closer monitoring. That said, standard resolution is sufficient in most use cases.
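In the PutMetricData API, resolution is selected per data point through the StorageResolution field, where 1 marks a high-resolution data point and 60 marks standard resolution. The helper below is a minimal sketch of that choice. The function name and the example metric names are hypothetical.

```python
def metric_datum(name, value, unit="Count", high_resolution=False):
    """Build one PutMetricData entry with an explicit storage resolution."""
    return {
        "MetricName": name,
        "Value": value,
        "Unit": unit,
        # 1 = high resolution (sub-minute periods), 60 = standard resolution
        "StorageResolution": 1 if high_resolution else 60,
    }


# Hypothetical metrics: a latency value that benefits from sub-minute
# detail, and a counter for which one-minute granularity is enough.
latency = metric_datum("CheckoutLatency", 120, unit="Milliseconds", high_resolution=True)
logins = metric_datum("LoginCount", 1)
```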
Custom metrics are published using the PutMetricData API, which can be called from the application's source code deployed in EC2 instances or any other compute infrastructure. The main parameters for this API are:
Namespace. Namespace is a way to group metrics according to a naming pattern that is relevant to a particular deployment. For example, all AWS services publish metrics under a namespace that follows the pattern AWS/<service>. In custom metrics, namespaces can follow useful patterns such as <application>/<component>/<deployment-stage>, which helps to group metrics in a relevant way for monitoring and troubleshooting purposes.
MetricData. MetricData contains the metric name to publish and data associated with measurable parameters, such as data count, timestamp, unit, metric values and statistics. Dimensions are an important part of MetricData because they provide additional context for each published metric. The combination of metric namespaces and dimensions is essential for grouping the published custom metrics data.
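To make the two parameters concrete, here is a minimal sketch that publishes per-route request counts with boto3's put_metric_data. The namespace pattern follows the <application>/<component>/<deployment-stage> example above. The application name "shop", the metric name "RequestCount" and the dimension name "Route" are hypothetical, and the client is passed in so the payload can be inspected without AWS credentials.

```python
def build_namespace(application, component, stage):
    """Compose a namespace as <application>/<component>/<deployment-stage>."""
    return f"{application}/{component}/{stage}"


def request_count_datum(route, count):
    """One MetricData entry: a request count tagged with its route."""
    return {
        "MetricName": "RequestCount",  # hypothetical metric name
        "Dimensions": [{"Name": "Route", "Value": route}],
        "Value": float(count),
        "Unit": "Count",
    }


def publish_request_counts(cloudwatch, namespace, counts_by_route):
    """Send one PutMetricData call containing a datum per route."""
    metric_data = [
        request_count_datum(route, count)
        for route, count in sorted(counts_by_route.items())
    ]
    cloudwatch.put_metric_data(Namespace=namespace, MetricData=metric_data)
    return metric_data
```

A call might look like publish_request_counts(boto3.client("cloudwatch"), build_namespace("shop", "web", "prod"), {"/login": 3, "/landing-page": 7}). Each distinct combination of namespace, metric name and dimension values becomes its own metric in CloudWatch.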
Each PutMetricData API call allows a maximum of 1,000 metric records and a request size of up to 1 MB. For applications with very high usage volume, it is important to assess how frequently the API will need to be called. Where usage volume exceeds these limits, applications can aggregate data before publishing it or process metric data asynchronously during periods of extremely high usage.
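One simple way to stay within the record limit is to split a large list of metric records into batches before sending. The sketch below assumes the 1,000-record limit stated above and a boto3-style client passed in by the caller. It does not check the 1 MB request size, which a production version would also need to consider.

```python
MAX_RECORDS_PER_CALL = 1000  # per-call record limit stated in the article


def chunked(metric_data, size=MAX_RECORDS_PER_CALL):
    """Yield consecutive slices of metric_data with at most `size` items."""
    for start in range(0, len(metric_data), size):
        yield metric_data[start:start + size]


def publish_in_batches(cloudwatch, namespace, metric_data):
    """Publish all records, one PutMetricData call per batch.

    Returns the number of API calls made, which is useful when
    estimating request volume.
    """
    calls = 0
    for batch in chunked(metric_data):
        cloudwatch.put_metric_data(Namespace=namespace, MetricData=batch)
        calls += 1
    return calls
```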
CloudWatch agent
A very useful method for publishing metrics is the CloudWatch agent, which can be installed on EC2 instances, on-premises servers and containers. The CloudWatch agent is open source and available on GitHub.
The CloudWatch agent requires a configuration file that specifies the metrics and application logs to export to CloudWatch. The agent can publish key EC2 instance metrics, such as memory utilization and disk space, that are not available in CloudWatch by default. Although the CloudWatch agent can extract custom values from relevant sources, such as application logs, in many cases it is more practical to publish application-specific metrics as custom metrics through the CloudWatch API.
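As a rough sketch of what such a configuration might contain, the snippet below generates a small JSON document that asks the agent for memory and disk usage. The key names (metrics_collected, mem_used_percent, used_percent and so on) reflect the agent's configuration schema as best recalled here and should be checked against the agent documentation. The namespace value is hypothetical.

```python
import json


def agent_config(namespace="CustomApp/Host", interval_seconds=60):
    """Return a minimal CloudWatch agent configuration as a dict."""
    return {
        "agent": {"metrics_collection_interval": interval_seconds},
        "metrics": {
            "namespace": namespace,  # hypothetical custom namespace
            "metrics_collected": {
                # Memory utilization, not published by EC2 by default
                "mem": {"measurement": ["mem_used_percent"]},
                # Disk space used on the root volume
                "disk": {"measurement": ["used_percent"], "resources": ["/"]},
            },
        },
    }


# Print the JSON that would be saved as the agent's configuration file.
print(json.dumps(agent_config(), indent=2))
```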
How much will custom metrics cost?
A metric is considered active while data is being sent to CloudWatch for it, and metric charges are prorated on an hourly basis. If an application stops sending data for a particular metric, that metric no longer counts toward the metric count fee.
To plan AWS costs effectively, IT teams must assess the number of metrics that will be published and the expected API call volume. For example, at a rate of $0.30 per metric per month, an application with a combination of dimensions that results in 10,000 metrics would incur $3,000 in monthly metric count fees. At $0.01 per 1,000 requests, an application that sends 10 PutMetricData requests per second would generate approximately 26 million requests per month and incur nearly $260 in API call fees. When publishing high-resolution metrics, calculate the expected API call volume carefully, since it will likely be higher than for standard-resolution metrics.
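The arithmetic behind these figures can be reproduced with a short estimate. The rates used below, $0.30 per metric per month and $0.01 per 1,000 requests, are the ones implied by the numbers above. They are assumptions for illustration, since actual pricing varies by region and volume tier, and a 30-day month is assumed.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def monthly_metric_fee(metric_count, price_per_metric=0.30):
    """Metric count fee, assuming every metric is active all month."""
    return metric_count * price_per_metric


def monthly_requests(requests_per_second):
    """Total PutMetricData requests at a steady request rate."""
    return requests_per_second * SECONDS_PER_MONTH


def monthly_api_fee(requests_per_second, price_per_thousand=0.01):
    """API request fee, billed per 1,000 requests."""
    return monthly_requests(requests_per_second) / 1000 * price_per_thousand


print(round(monthly_metric_fee(10_000), 2))  # metric count fee in dollars
print(monthly_requests(10))                  # requests per month
print(round(monthly_api_fee(10), 2))         # API fee in dollars
```

Under these assumptions, 10 requests per second works out to 25,920,000 requests per month, which matches the approximate figure of 26 million.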
Ernesto Marquez is owner and project director at Concurrency Labs, where he helps startups launch and grow their applications on AWS. He enjoys building serverless architectures, building data analytics solutions, implementing automation and helping customers cut their AWS costs.