AWS monitoring best practices extend beyond CloudWatch
AWS users can choose from a range of native tools to monitor their environments -- but, ultimately, a combination of these services will produce the best results.
While Amazon CloudWatch is the central AWS monitoring tool, there are several other services AWS users should include in their monitoring strategy to provide data for real-time tracking and analysis.
The question then becomes how to properly apply and connect these various services to create a comprehensive logging and monitoring system that:
- collects and logs the right data;
- monitors AWS telemetry in ways that highlight abnormal, insecure or unreliable behavior; and
- triggers both manual and automated corrective actions.
Let's examine AWS monitoring best practices, briefly review core AWS monitoring tools and discuss how to combine these services for effective resource management.
AWS monitoring best practices: Log, monitor, act
Monitoring is the process of collecting and evaluating data -- which means monitoring tools require data sources. Users might mistake several Amazon cloud management tools for monitoring tools when they're actually more useful for log generation and collection. Monitoring nevertheless depends on logging, including telemetry from other Amazon services and records of user activity.
On a Linux server, the syslog daemon automatically collects many system parameters. IT teams can customize the daemon via a configuration file to collect additional parameters, and applications running on the server can log their own internal parameters through syslogd. Admins can configure services such as EC2 or Amazon Relational Database Service to log data in a similar way; however, the AWS environment itself can also generate records via several management services.
The primary Amazon logging and auditing services that are useful for an AWS monitoring strategy include:
- CloudTrail: records API calls, including access to the AWS Management Console, command-line interface and other Amazon cloud services. Example uses for CloudTrail include policy changes on S3 storage, state changes on EC2 and additions or alterations to Identity and Access Management users and groups.
- CloudWatch Events: records changes to AWS resources according to preconfigured rules. The service records events continually or at set times, similar to cron, and can trigger notifications or remedial actions. CloudWatch Events can currently record events from almost 20 services, including EC2, Lambda functions and AWS Batch jobs.
- AWS Config: tracks a service's state, such as any configuration changes, rather than activity. CloudWatch Events can monitor AWS Config.
- AWS Trusted Advisor: provides best practices for AWS service deployments, configurations, security and fault tolerance. Users can pair Trusted Advisor and AWS Config with CloudWatch Events to get a log of any violations related to compliance or established best practices, as new services are deployed or reconfigured.
- VPC Flow Logs: records information about network traffic over AWS VPCs, such as the basic IP parameters -- source, destination and protocol. VPC Flow Logs also records additional data, including the AWS account associated with the traffic flow, the number of bytes and packets transferred, and the start and stop times. This information can help debug networking problems such as capacity bottlenecks.
- Amazon Inspector: checks the network security of EC2 instances and logs various metrics to CloudWatch for further analysis.
- Amazon GuardDuty: performs more advanced security checks for malicious threats and compromised services or accounts.
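As an illustration of how one of these log sources can be turned into something queryable, here is a minimal Python sketch that parses VPC Flow Logs records and totals accepted bytes per destination. It assumes the records use the default space-separated flow log format; the function names and the sample record are illustrative, not taken from any specific deployment.

```python
# Field order assumed here: the default VPC Flow Logs record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

NUMERIC_FIELDS = ("srcport", "dstport", "protocol", "packets", "bytes", "start", "end")


def parse_flow_record(line):
    """Split one flow log line into a dict keyed by field name."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    for key in NUMERIC_FIELDS:
        # Records with no captured data carry "-" instead of a number.
        record[key] = None if record[key] == "-" else int(record[key])
    return record


def bytes_by_destination(lines):
    """Sum bytes of accepted traffic per destination address."""
    totals = {}
    for line in lines:
        rec = parse_flow_record(line)
        if rec["action"] == "ACCEPT" and rec["bytes"] is not None:
            totals[rec["dstaddr"]] = totals.get(rec["dstaddr"], 0) + rec["bytes"]
    return totals


if __name__ == "__main__":
    sample = [
        "2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
        "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK",
    ]
    print(bytes_by_destination(sample))
```

A summary like this is one way to spot the capacity bottlenecks mentioned above: a destination whose byte total dwarfs the others is a natural place to start looking.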
Aggregate data to take action
To apply AWS monitoring best practices, aggregate raw logs into actionable intelligence. Admins can aggregate the various sources of AWS configuration, security and network data listed above in two ways: send the data to CloudWatch, or write the data captured by CloudWatch Events to an S3 bucket, where it can trigger processes such as a Lambda function or an Amazon Simple Notification Service (SNS) notification. CloudWatch aggregates data from all sources into a single user interface, accepts metrics at resolutions as fine as one second and retains metric data for up to 15 months for historical trending, although it keeps the finest-grained data points for a much shorter period.
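The idea of routing an event to a process can be sketched in a few lines of Python. The example below is a deliberately simplified, local imitation of rule matching: each pattern field lists the accepted values, and nested objects are matched recursively. It only covers exact-value matching, the handler `notify` is a hypothetical stand-in for a Lambda function or SNS notification, and the event shown follows the general shape of an EC2 state-change event.

```python
# A rule pattern: every listed field must match one of the accepted values.
STOPPED_INSTANCE_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped", "terminated"]},
}


def matches(pattern, event):
    """Return True if the event satisfies every field in the pattern."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the corresponding event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif isinstance(actual, list):
            # List-valued event field: any overlap counts as a match.
            if not any(item in expected for item in actual):
                return False
        elif actual not in expected:
            return False
    return True


def route(event, rules):
    """Run the handler of every rule whose pattern matches the event."""
    return [handler(event) for pattern, handler in rules if matches(pattern, event)]


def notify(event):
    """Hypothetical action; a real setup might publish to SNS here."""
    detail = event["detail"]
    return f"notify: {detail['instance-id']} is {detail['state']}"


if __name__ == "__main__":
    event = {
        "source": "aws.ec2",
        "detail-type": "EC2 Instance State-change Notification",
        "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},
    }
    print(route(event, [(STOPPED_INSTANCE_PATTERN, notify)]))
```

In a real account the matching is done by the service itself; the sketch is only meant to show why a narrow pattern keeps downstream actions from firing on every event.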
CloudWatch is the core AWS service for event monitoring, analysis and automated response, and AWS regularly updates it. For example, AWS recently added anomaly detection to CloudWatch, a feature that analyzes historical data to automatically find predictable patterns for a selected metric. It then uses the discovered normal values as trigger points to warn of measures outside expected limits.
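To make the idea of an expected band concrete, here is a rough Python sketch that derives a band from historical values and flags points outside it. This is not CloudWatch's algorithm, which relies on trained models that account for patterns such as seasonality; the sketch assumes a plain mean and standard deviation, and the function names and sample values are invented for illustration.

```python
from statistics import mean, stdev


def expected_band(history, width=2.0):
    """Return (low, high) bounds: mean +/- width * sample standard deviation."""
    center = mean(history)
    spread = stdev(history)
    return center - width * spread, center + width * spread


def is_anomalous(value, history, width=2.0):
    """Flag a value that falls outside the band learned from history."""
    low, high = expected_band(history, width)
    return value < low or value > high


if __name__ == "__main__":
    # Hypothetical CPU utilization samples, in percent.
    cpu_history = [50, 52, 48, 51, 49, 50, 53, 47]
    for sample in (51, 90):
        print(sample, is_anomalous(sample, cpu_history))
```

Widening `width` trades sensitivity for fewer false alarms, which is the same kind of tuning decision an admin faces when setting the band on a real alarm.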
The goal of an AWS monitoring strategy isn't just to provide pretty pictures or red-green dashboard indicators, but to trigger actions -- especially in response to unusual, unauthorized or non-compliant behavior. While CloudWatch can perform some of these actions with features like anomaly detection, Amazon SNS is a key part of this strategy as well. Users can trigger actions via CloudWatch alarms sent to Amazon SNS. Various endpoints can then subscribe to different types of notifications, using them to trigger events such as an email or SMS message, mobile app push notifications or a Lambda function.
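As a sketch of the last hop in that chain, the following Python function is shaped like a Lambda handler subscribed to an SNS topic that receives CloudWatch alarm notifications. It assumes the alarm arrives as a JSON string in the SNS message body with `AlarmName` and `NewStateValue` fields; the returned `remediate:` strings are placeholders for whatever corrective action a team would actually run.

```python
import json


def handler(event, context=None):
    """Collect a placeholder action for every alarm that entered ALARM state."""
    actions = []
    for record in event.get("Records", []):
        # SNS delivers the alarm as a JSON-encoded string in the message body.
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") == "ALARM":
            # Placeholder: a real handler might restart a service or open a ticket.
            actions.append(f"remediate:{message['AlarmName']}")
    return actions


if __name__ == "__main__":
    # Hypothetical alarm payload for local testing.
    alarm = {"AlarmName": "high-cpu", "NewStateValue": "ALARM"}
    sample_event = {"Records": [{"Sns": {"Message": json.dumps(alarm)}}]}
    print(handler(sample_event))
```

Filtering on the new state matters because the same topic can also carry notifications for alarms returning to normal, which usually should not trigger remediation.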
When it comes to monitoring your AWS environment, a cursory look at the provider's services might seem to reveal redundancies. That, however, is not the case. When appropriately used and combined, each tool supports AWS monitoring best practices.