Scalability is the cornerstone of public cloud. But just as it's important to scale up resources when needed, it's important to scale back resources for unnecessary or underused workloads. This reduces public cloud costs, speeds up patches and updates and enhances security.
Manual instance management, however, is virtually impossible in dynamic cloud environments. Instead, IT teams should use cloud auto scaling. Here are some tips to get started.
Identify unnecessary workloads and resources
In a production environment, it's likely that a cloud workload or application will need to keep running at some level. The question is not whether the workload runs, but how much capacity it needs at any given moment; cloud auto scaling services increase or decrease compute resources as workload demands change.
Public cloud providers like Google Cloud Platform, Microsoft Azure and Amazon Web Services (AWS) all provide some form of monitoring, scaling and load balancing services. When combined, and after admins configure scaling policies, these services can scale cloud workloads with a high degree of autonomy.
However, organizations often overlook lesser-used workloads, such as soon-to-be-retired production apps or temporary apps like test and dev instances. As a result, these workloads remain in the cloud, driving up costs, long after they've provided value.
Removing an unneeded workload takes more than automation; careful attention to policy is crucial. For example, a test instance could be associated with some form of lifecycle management service that puts an expiration date on instances and alerts owners when expirations approach. Storage can be handled in a similar way: the object lifecycle management capabilities in Amazon Simple Storage Service (S3) allow organizations to delete storage objects, or move them to a lower-cost storage class, once they reach a set age.
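As a rough illustration of such a policy, the sketch below builds an S3 lifecycle configuration as a plain Python dictionary. The helper name `build_lifecycle_rule`, the `test-artifacts/` prefix and the 30-day and 90-day values are assumptions made for this example; only the dictionary layout follows the S3 lifecycle rule format.

```python
def build_lifecycle_rule(prefix, transition_days, expire_days):
    """Return a lifecycle configuration dict in the shape S3 expects.

    Objects under `prefix` move to a cheaper storage class after
    `transition_days` and are deleted after `expire_days`.
    """
    if transition_days >= expire_days:
        raise ValueError("objects must transition before they expire")
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.strip('/')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": transition_days, "StorageClass": "STANDARD_IA"}
                ],
                "Expiration": {"Days": expire_days},
            }
        ]
    }


# Hypothetical values: test artifacts go cold after 30 days, vanish after 90.
config = build_lifecycle_rule("test-artifacts/", 30, 90)
```

With the boto3 library, a dictionary like this would typically be passed as the `LifecycleConfiguration` argument of the S3 client's `put_bucket_lifecycle_configuration` call. The sketch stops short of that call so it runs without AWS credentials.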
Cloud tagging can also help admins identify questionable resources. Tags attach metadata to cloud resources, such as workload or application names, owners, departments, cost centers and more. During billing review cycles, missing or outdated tags expose unneeded or forgotten cloud resources.
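A tag audit of this kind can be sketched in a few lines. The example below works on an in-memory inventory rather than a real provider API; the required tag names, the `expires-on` tag and the resource IDs are all hypothetical and would come from an organization's own tagging policy.

```python
from datetime import date

# Assumed tagging policy: every resource must carry these tags.
REQUIRED_TAGS = {"owner", "cost-center", "application"}


def audit_tags(resources, today):
    """Return (resource id, reason) pairs for resources that need review."""
    findings = []
    for res in resources:
        tags = res.get("tags", {})
        missing = REQUIRED_TAGS - set(tags)
        if missing:
            findings.append(
                (res["id"], "missing tags: " + ", ".join(sorted(missing)))
            )
        # "expires-on" is a made-up tag holding an ISO date (YYYY-MM-DD).
        expires = tags.get("expires-on")
        if expires and date.fromisoformat(expires) < today:
            findings.append((res["id"], f"expired on {expires}"))
    return findings


inventory = [
    {"id": "i-prod-web", "tags": {"owner": "web-team",
                                  "cost-center": "cc-100",
                                  "application": "storefront"}},
    {"id": "i-test-42", "tags": {"owner": "qa",
                                 "cost-center": "cc-200",
                                 "application": "storefront",
                                 "expires-on": "2024-01-31"}},
    {"id": "vol-orphan", "tags": {}},
]

findings = audit_tags(inventory, today=date(2024, 3, 1))
for resource_id, reason in findings:
    print(resource_id, "->", reason)
```

Here the fully tagged production instance passes, the test instance is flagged because its expiration date has passed, and the untagged volume is flagged because nobody can be identified as its owner.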
Use cloud auto scaling alongside other services
Automatic scaling is a crucial service for many public cloud deployments, but it is not the only service. Organizations typically use scaling with some form of monitoring, as well as load balancing.
Cloud auto scaling services are responsible for adding or removing resources from a group. For example, AWS users typically create an Auto Scaling group and allow the Auto Scaling feature to add resources, such as Amazon Elastic Compute Cloud (EC2) instances, to the group when utilization is high and remove them when usage is low. Microsoft Azure controls scaling through Virtual Machine Scale Sets, and Google Cloud Platform provides automatic scaling for managed instance groups in Compute Engine.
But cloud auto scaling isn't magic, and typically requires the use of a cloud provider's monitoring service. This allows admins to select the metrics and thresholds that dictate scaling activity. For example, Amazon CloudWatch can watch the average CPU utilization of an EC2 Auto Scaling group and trigger scaling policies that add or remove EC2 instances when utilization crosses defined thresholds.
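The threshold logic can be illustrated with a small provider-neutral simulation. This is only a sketch of the idea, not how any cloud provider implements it; the function name, the 70% and 30% thresholds and the group size limits are assumptions chosen for the example.

```python
def desired_capacity(current, avg_cpu, *, scale_out_at=70.0, scale_in_at=30.0,
                     min_size=1, max_size=10):
    """Return the new instance count after one evaluation of the policy."""
    if avg_cpu > scale_out_at:
        target = current + 1      # busy: add one instance
    elif avg_cpu < scale_in_at:
        target = current - 1      # idle: remove one instance
    else:
        target = current          # within the band: leave the group alone
    # Never go below the group's minimum or above its maximum.
    return max(min_size, min(max_size, target))


capacity = 2
history = []
# Average CPU readings (percent), one per monitoring period.
for cpu in [45.0, 82.0, 91.0, 60.0, 22.0, 18.0, 15.0]:
    capacity = desired_capacity(capacity, cpu, min_size=2, max_size=4)
    history.append(capacity)

print(history)  # [2, 3, 4, 4, 3, 2, 2]
```

The gap between the two thresholds matters: if the scale-in and scale-out values sit too close together, the group can flip between sizes on every reading.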
Connection draining with cloud-based scaling services
When AWS Auto Scaling decides to shut down unneeded instances, it doesn't necessarily mean those instances aren't doing any work; they may simply be underutilized. If Auto Scaling closes network connections and terminates instances before those instances have finished servicing requests, those requests may be disrupted.
The idea behind connection draining is to build in a grace period for any instance being shut down. Rather than break a network connection and discard an instance immediately, the load balancer stops sending new requests to the instance and allows it time to complete current requests. AWS has included connection draining with its Elastic Load Balancing service, and users can set a timeout from one second to 60 minutes, depending on the workload.
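The behavior can be modeled with a toy simulation. The `Instance` class and `drain` function below are hypothetical names for this sketch, and the rule that an instance finishes one request per second is a simplification; real draining is handled by the load balancer, not by application code like this.

```python
class Instance:
    def __init__(self, name, in_flight):
        self.name = name
        self.in_flight = in_flight   # requests currently being served
        self.draining = False

    def accept(self):
        """Take a new request unless the instance is draining."""
        if self.draining:
            return False
        self.in_flight += 1
        return True

    def tick(self):
        """Simulate one second: finish one in-flight request, if any."""
        if self.in_flight > 0:
            self.in_flight -= 1


def drain(instance, timeout_seconds):
    """Stop new traffic, then wait for in-flight work or the timeout.

    Returns (seconds waited, requests still open at termination).
    """
    instance.draining = True
    waited = 0
    while instance.in_flight > 0 and waited < timeout_seconds:
        instance.tick()
        waited += 1
    return waited, instance.in_flight


clean = drain(Instance("a", in_flight=3), timeout_seconds=10)
cut_short = drain(Instance("b", in_flight=5), timeout_seconds=2)
print(clean)      # (3, 0): all requests finished before the timeout
print(cut_short)  # (2, 3): timeout reached with three requests still open
```

The second case shows why the timeout should reflect the workload: a value shorter than the longest typical request still cuts some requests off.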
For effective auto scaling, IT teams also need to direct network traffic, which is the role of load balancing. For example, traffic must be redirected to additional instances as scaling increases the number of compute instances. Traffic must also be consolidated onto fewer instances as scaling reduces the number of compute instances available.
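A minimal sketch of that redistribution, assuming simple round-robin balancing (one of several strategies a load balancer might use), looks like this. The instance names and request counts are made up for the example.

```python
from collections import Counter
from itertools import cycle


def distribute(request_count, instances):
    """Assign requests to instances in round-robin order and count them."""
    if not instances:
        raise ValueError("at least one instance is required")
    targets = cycle(instances)
    return Counter(next(targets) for _ in range(request_count))


# The same 12 requests before and after the group scales from 2 to 4.
before = distribute(12, ["i-1", "i-2"])
after_scale_out = distribute(12, ["i-1", "i-2", "i-3", "i-4"])

print(dict(before))           # {'i-1': 6, 'i-2': 6}
print(dict(after_scale_out))  # {'i-1': 3, 'i-2': 3, 'i-3': 3, 'i-4': 3}
```

The per-instance load halves when the group doubles, but only because the balancer knows about the new instances; scaling without updating the balancer's target list would leave the new capacity idle.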
Consider third-party tools for workload scaling
In terms of third-party tools, Botmetric can scan an AWS infrastructure to audit security, performance, backups and cost analytics, and help with tasks, such as starting or stopping EC2 machines.
RightScale Cloud Management enables AWS users to deploy more resilient architectures, scale and operate in an automated fashion and manage workloads across accounts and regions. In addition, CloudCheckr for Continuous Monitoring enables organizations to identify their cloud resources, location, users and history, while enforcing standard policies.
However, each of these tools offers a different feature set and may not be suited for every use. Organizations that need to employ third-party scaling support should perform extensive testing and proof-of-concept exercises before selection.
And because each cloud provider already delivers the basic services needed for workload scaling, these outside tools are best used to supplement the monitoring and decision-making processes that a cloud provider's internal tools can't handle.