How do I apply ECS auto scaling to my containerized apps?

Break down ECS auto scaling options for container-based applications on AWS. Learn how to automatically scale at the service and cluster level to match user demand.

Ernesto Marquez, Concurrency Labs

Published: 07 May 2020

Developers have several configuration choices to make if they want to automatically scale workloads in Amazon Elastic Container Service.

Amazon Elastic Container Service (ECS) manages the deployment and orchestration of Docker containers, including the underlying compute infrastructure. ECS groups Docker containers into defined tasks, which are then grouped as services and deployed in clusters. ECS supports two launch types for container clusters: EC2 instances and Fargate serverless compute.

With Fargate, developers don't have to provision or manage compute infrastructure, because AWS supplies the capacity each task needs. You still decide how many tasks a service runs, but there are no instances to scale. However, many IT teams opt for the EC2 launch type for more control over their environments. In that case, they must take additional steps to add the elasticity needed to match changes in user demand.

For EC2-based clusters, there are two levels of auto scaling to consider:

Service-level, to manage how many tasks -- or groupings of running Docker containers -- to launch in your service; and
Cluster-level, to manage the number of EC2 instances provisioned in the cluster.

Service-level auto scaling

An ECS cluster can host multiple services, each with its own measurable CPU and memory consumption. You can configure ECS Service Auto Scaling to launch additional tasks when a metric exceeds a value you set, and to remove tasks when demand falls -- for example, adding tasks when the service's average CPU utilization rises above 60%.
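ECS Service Auto Scaling is configured through Application Auto Scaling: you register the service's desired task count as a scalable target, then attach a policy to it. The sketch below builds the two requests for a target tracking policy on average CPU, using the 60% figure from the example above. The cluster name, service name, task bounds and cooldowns are hypothetical; the dictionary keys follow the Application Auto Scaling API as exposed by boto3, which is imported only inside the function that sends the requests.

```python
"""Sketch: target tracking Service Auto Scaling for one ECS service.

Cluster name, service name and numeric values are hypothetical. Keys follow
the Application Auto Scaling API as exposed by boto3.
"""
from typing import Dict, Tuple


def service_scaling_requests(cluster: str, service: str, min_tasks: int,
                             max_tasks: int, cpu_target: float) -> Tuple[Dict, Dict]:
    # Application Auto Scaling identifies an ECS service by this resource ID.
    resource_id = f"service/{cluster}/{service}"
    scalable_target = {
        "ServiceNamespace": "ecs",
        "ResourceId": resource_id,
        "ScalableDimension": "ecs:service:DesiredCount",  # the service's task count
        "MinCapacity": min_tasks,
        "MaxCapacity": max_tasks,
    }
    policy = {
        "PolicyName": f"{service}-cpu-target",
        "ServiceNamespace": "ecs",
        "ResourceId": resource_id,
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Target tracking adds or removes tasks to keep average CPU near this %.
            "TargetValue": cpu_target,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
            },
            "ScaleOutCooldown": 60,   # seconds (hypothetical value)
            "ScaleInCooldown": 120,   # seconds (hypothetical value)
        },
    }
    return scalable_target, policy


def apply_service_scaling(scalable_target: Dict, policy: Dict) -> None:
    # Needs boto3 and AWS credentials; imported here so the request
    # builder above runs without either.
    import boto3

    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**scalable_target)
    client.put_scaling_policy(**policy)


target, policy = service_scaling_requests("demo-cluster", "web", 2, 10, 60.0)
print(target["ResourceId"])  # service/demo-cluster/web
```

The minimum and maximum task counts bound what the policy can do, so a traffic spike cannot launch an unlimited number of tasks.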

This helps keep your service healthy, with an appropriate number of containers running. You can also register the service's tasks with an Application Load Balancer, which distributes traffic across them and registers or deregisters tasks as they are added to or removed from the service, so users see a smooth experience.
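Load balancer registration is declared when the service is created. The sketch below builds a CreateService request whose loadBalancers entry points at an ALB target group, so ECS handles registration as the task count changes. All names and the target group ARN placeholder are hypothetical; the keys follow the ECS CreateService API as exposed by boto3.

```python
"""Sketch: an ECS service whose tasks register with an ALB target group.

All names and the target group ARN placeholder are hypothetical. Keys follow
the ECS CreateService API as exposed by boto3.
"""
from typing import Dict


def service_with_alb_request(cluster: str, service: str, task_definition: str,
                             target_group_arn: str, container_name: str,
                             container_port: int, desired_count: int) -> Dict:
    return {
        "cluster": cluster,
        "serviceName": service,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
        # ECS registers this container port of each task with the target group
        # and deregisters it when the task stops, including on scale-in.
        "loadBalancers": [
            {
                "targetGroupArn": target_group_arn,
                "containerName": container_name,
                "containerPort": container_port,
            }
        ],
        # Ignore failing ALB health checks for this many seconds after a task starts.
        "healthCheckGracePeriodSeconds": 60,
    }


request = service_with_alb_request(
    cluster="demo-cluster",
    service="web",
    task_definition="web-task:1",
    target_group_arn="<target-group-arn>",  # hypothetical placeholder
    container_name="web",
    container_port=80,
    desired_count=2,
)
# With boto3 and credentials: boto3.client("ecs").create_service(**request)
```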

Cluster-level auto scaling

For EC2-based clusters, the traditional way to set the number of instances relies on generic cluster-level metrics, such as CPU utilization and CPU reservation percentages. To handle this, you create an EC2 Auto Scaling Group that adds or removes EC2 instances based on those metrics.
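One way to wire an EC2 Auto Scaling Group to a cluster-level metric is a target tracking policy on the AWS/ECS CPUReservation metric for that cluster. The sketch below builds such a request. The group name, cluster name and 75% target are hypothetical; the keys follow the EC2 Auto Scaling PutScalingPolicy API as exposed by boto3.

```python
"""Sketch: scale an EC2 Auto Scaling group on the cluster's CPUReservation.

Group name, cluster name and the 75% target are hypothetical. Keys follow
the EC2 Auto Scaling PutScalingPolicy API as exposed by boto3.
"""
from typing import Dict


def reservation_policy_request(asg_name: str, cluster: str,
                               target_percent: float) -> Dict:
    return {
        "AutoScalingGroupName": asg_name,
        "PolicyName": f"{cluster}-cpu-reservation",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingConfiguration": {
            # CPUReservation: CPU units reserved by tasks as a percentage of
            # CPU units registered by the cluster's container instances.
            "CustomizedMetricSpecification": {
                "MetricName": "CPUReservation",
                "Namespace": "AWS/ECS",
                "Dimensions": [{"Name": "ClusterName", "Value": cluster}],
                "Statistic": "Average",
            },
            "TargetValue": target_percent,
        },
    }


policy = reservation_policy_request("demo-cluster-asg", "demo-cluster", 75.0)
# With boto3 and credentials: boto3.client("autoscaling").put_scaling_policy(**policy)
```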

However, EC2 Auto Scaling policies built on these generic metrics don't account for every scenario that requires more capacity. For example, your cluster might not scale quickly or efficiently if it hosts multiple workloads or if a workload needs to scale out rapidly. To deal with this challenge, use ECS Cluster Auto Scaling, which lets ECS manage the number of instances in an EC2 Auto Scaling Group based on the capacity your tasks actually need.
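A small worked example shows one way a cluster-wide metric can miss a capacity shortage: a task must fit on a single instance, but CPU reservation is averaged over the whole cluster. The instance sizes, reservations and task size below are hypothetical, CPU is in ECS CPU units (1 vCPU = 1024 units), and memory and other placement constraints are ignored.

```python
"""Sketch: a cluster-wide metric can look healthy while a task cannot be placed.

Instance sizes and task size are hypothetical. CPU is in ECS CPU units
(1 vCPU = 1024 units). Memory and other placement constraints are ignored.
"""
from typing import List


def cpu_reservation_percent(registered: List[int], reserved: List[int]) -> float:
    # Cluster-wide: total reserved units as a percentage of total registered units.
    return 100.0 * sum(reserved) / sum(registered)


def largest_free_cpu(registered: List[int], reserved: List[int]) -> int:
    # A task must fit on one instance, so the largest free slot is what matters.
    return max(total - used for total, used in zip(registered, reserved))


registered = [2048, 2048]  # two 2-vCPU container instances
reserved = [1536, 1024]    # CPU units already reserved by running tasks
task_cpu = 1536            # CPU units requested by a pending task

print(cpu_reservation_percent(registered, reserved))       # 62.5
print(largest_free_cpu(registered, reserved))              # 1024
print(task_cpu <= largest_free_cpu(registered, reserved))  # False
```

With a hypothetical 75% reservation threshold, the cluster reports 62.5% and triggers no scale-out, yet the pending task fits on neither instance.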

To use ECS Cluster Auto Scaling, you create a capacity provider for the cluster. Each capacity provider is associated with an EC2 Auto Scaling Group and, with managed scaling enabled, lets ECS control that group's compute capacity for your deployments.
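Setting this up takes two calls: create the capacity provider with managed scaling turned on, then attach it to the cluster, optionally as the default strategy. The sketch below builds both requests. The provider name, cluster name, Auto Scaling Group ARN placeholder and step sizes are hypothetical; the keys follow the ECS CreateCapacityProvider and PutClusterCapacityProviders APIs as exposed by boto3.

```python
"""Sketch: a capacity provider with managed scaling, attached to a cluster.

Names, the ASG ARN placeholder and step sizes are hypothetical. Keys follow
the ECS CreateCapacityProvider and PutClusterCapacityProviders APIs (boto3).
"""
from typing import Dict, Tuple


def capacity_provider_requests(cluster: str, provider_name: str, asg_arn: str,
                               target_capacity: int) -> Tuple[Dict, Dict]:
    create = {
        "name": provider_name,
        "autoScalingGroupProvider": {
            "autoScalingGroupArn": asg_arn,
            "managedScaling": {
                "status": "ENABLED",                # let ECS scale the group
                "targetCapacity": target_capacity,  # desired utilization, in %
                "minimumScalingStepSize": 1,        # instances per step (hypothetical)
                "maximumScalingStepSize": 10,
            },
            # ENABLED would also require instance scale-in protection on the ASG.
            "managedTerminationProtection": "DISABLED",
        },
    }
    attach = {
        "cluster": cluster,
        "capacityProviders": [provider_name],
        # Services that specify no launch type or strategy use this default.
        "defaultCapacityProviderStrategy": [
            {"capacityProvider": provider_name, "weight": 1, "base": 0}
        ],
    }
    return create, attach


create, attach = capacity_provider_requests(
    "demo-cluster", "ecs-ec2-provider", "<asg-arn>", 80  # hypothetical values
)
# With boto3 and credentials:
#   ecs = boto3.client("ecs")
#   ecs.create_capacity_provider(**create)
#   ecs.put_cluster_capacity_providers(**attach)
```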

You set a target capacity percentage on the capacity provider -- for example, 80% -- and ECS adjusts the instance count to keep the cluster's utilization close to that target. The remaining headroom ensures tasks can be launched when needed.
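AWS documents the metric behind managed scaling as CapacityProviderReservation = M / N × 100, where M is the number of instances needed to run all tasks (including ones waiting for capacity) and N is the number of instances running, and ECS scales the group to keep that value near the target capacity. The sketch below is a simplification under those assumptions: it ignores scaling step sizes, instance warm-up and the special cases when N is zero, and computes the smallest instance count that brings the metric to or below the target.

```python
"""Sketch: CapacityProviderReservation and the instance count it implies.

Simplified: ignores step sizes, warm-up and the N = 0 special cases.
"""


def capacity_provider_reservation(needed: int, running: int) -> float:
    # CapacityProviderReservation = M / N * 100.
    if running <= 0:
        raise ValueError("this sketch only handles clusters with running instances")
    return 100.0 * needed / running


def instances_for_target(needed: int, target_capacity: int) -> int:
    # Smallest N with M / N * 100 <= target, i.e. ceil(M * 100 / target),
    # computed with integer arithmetic to avoid float rounding.
    return -(-needed * 100 // target_capacity)


print(capacity_provider_reservation(9, 10))  # 90.0 -> above an 80% target
print(instances_for_target(9, 80))           # 12
print(instances_for_target(9, 100))          # 9
```

A target below 100% keeps spare instances running so new tasks can start without waiting for EC2 capacity; a target of 100% trades that headroom for lower cost.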
