Getty Images

How to set up autoscaling in Azure Kubernetes Service

Explore scaling options in AKS, such as the horizontal pod and the cluster autoscaler. Then, follow a step-by-step tutorial on how to manually and automatically scale resources.

Whether it's for a week, a month or a quarter, organizations will go through busy and slow times. Because of the jump from more users to fewer users, engineers need a way to manage infrastructure and applications to handle both types of traffic. This is where autoscaling comes into play.

Typically, when engineers think about autoscaling, they think about scaling up infrastructure or pods. However, that's not always the case. If it's a slow time for an organization, users can scale back infrastructure and pods to reduce cost and overhead. Autoscaling can be vertical, in which you add more resources, such as RAM or CPU, or horizontal.

Horizontal scaling, the more popular option, is when you add or remove pods or servers. For example, if you have two worker nodes in an Azure Kubernetes Service (AKS) cluster, and the two worker nodes are at 75% capacity, you can add a third worker node. This allows pods to be scheduled on the new worker node instead of the two worker nodes that are under heavy load. From a pod perspective, you can add more pods in a ReplicaSet to handle the traffic that may be coming to the containerized application.

In AKS, there are various scaling options, such as the horizontal pod autoscaler and the cluster autoscaler. In the following steps, learn how to manually scale workloads to build a basic understanding of the scaling process, then review the automated way.

Diagram of a Kubernetes cluster.
A representation of a Kubernetes pod encapsulating a container and how the pods are gathered into nodes.

Cluster vs. horizontal pod autoscaler

Regardless of the workload, AKS will either create more worker nodes or remove worker nodes. The two primary ways that autoscaling monitors resource demands is with the horizontal pod autoscaler and the cluster autoscaler.

Horizontal pod autoscaler. The horizontal pod autoscaler is a default service on AKS that monitors a pod's resource needs. It checks the Metrics API every 15 seconds for any changes. If the containerized app is under heavy load, the number of pods automatically goes up. If the pods are no longer needed, it scales down.

Cluster autoscaler. The cluster autoscaler watches for pods that can't be scheduled via the Kubernetes scheduler because of resource -- such as CPU and memory -- constraints. If this occurs, the autoscaler automatically scales the cluster. It checks the Metrics API server every 10 seconds for any changes. It is usually used together with the horizontal pod autoscaler

How much AKS autoscaling you can do depends on your Azure account. With Azure, and every other cloud, an account limits how many VMs you can run. Confirm that you're able to create the needed VMs in Azure before you begin.

Scale manually

Before automating a workload, you must first learn how to manually configure it.

Step 1. Sign into Azure and go to the AKS service.

AKS service homepage within Azure.

Step 2. Click the drop-down to create a new AKS cluster.

Step 3. Fill in the basic information for subscription, resource group and more.

Basics page under Create Kubernetes cluster.

Step 4. Scroll down to the bottom of the Basics page.

Step 5. Underneath the primary node pool, set whether you want autoscaling or not, along with the worker node range.

Set autoscale within primary node pool.

Step 6. The completed node pool will show on your node pool dashboard.

Completed node pool will be visible in the node pools section.

Scale automatically

See how to turn on autoscaling in Terraform for AKS.

Step 1. Specify the Azure Resource Manager provider.

provider "azurerm" {
 features {}

Step 2. Specify the AKS resource.

resource "azurerm_kubernetes_cluster" "k8s" {
 name        =
 location      = var.location
 resource_group_name = var.resource_group_name
 dns_prefix     = "${}-dns01"
 network_profile {
 network_plugin = "azure"
 network_policy = "azure"

Step 3. Specify the identity.

identity {
  type = "SystemAssigned"

Step 4. Within the node pool, turn on auto-scaling -- it can be true or false.

default_node_pool {
  name    = "default"
  node_count = var.node_count
  enable_auto_scaling = true
  vm_size  = "Standard_A2_v2"

Dig Deeper on Cloud provider platforms and tools

Data Center