Cloud bursting never took off the way some expected it to, but many organizations still aspire to use this hybrid cloud construct. For those undaunted by the potential challenges, several key considerations come into play.
Most enterprises use public and private cloud infrastructure to host their applications. One survey found 85% of IT leaders consider a hybrid cloud to be the ideal operating environment, and half say that it meets all of their needs.
However, the term "hybrid cloud" encompasses a wide variety of scenarios, from simple, passive disaster recovery (DR) environments to complex, redundant active-active applications. The latter -- simultaneously active and load-balanced public and private cloud environments -- was once considered the ideal cloud operating model since it enables system architects to exploit the benefits and minimize the drawbacks of both.
Cloud bursting challenges
"Cloud bursting" became the meteorological metaphor to describe this best-of-both-worlds scenario. In this model, private cloud infrastructure handles baseline resource demands and hosts sensitive, legacy databases and back-end systems; public cloud infrastructure addresses seasonal load spikes, temporary bursts and scale-out web front-end systems.
Grandiose visions of cloud bursting even extended to cost optimization. In a concept known as "multi-cloud arbitrage," an enterprise uses multiple IaaS providers, along with real-time cost analysis software, and directs bursts to the cheapest vendor for a particular workload at a particular time.
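The arbitrage idea reduces to a simple selection problem. A minimal sketch, assuming a hypothetical real-time price feed and capacity data (provider names and prices below are illustrative, not real quotes):

```python
# Hypothetical multi-cloud arbitrage: pick the cheapest provider that
# can absorb a burst, given a price feed and free capacity per provider.
# All names and numbers are illustrative.

def cheapest_provider(prices, required_vcpus, capacity):
    """Return the lowest-cost provider with enough free vCPUs, or None."""
    candidates = {p: cost for p, cost in prices.items()
                  if capacity.get(p, 0) >= required_vcpus}
    if not candidates:
        return None  # no provider can absorb the burst right now
    return min(candidates, key=candidates.get)

spot_prices = {"aws": 0.034, "azure": 0.031, "gcp": 0.029}  # $/vCPU-hour
available = {"aws": 512, "azure": 128, "gcp": 64}           # free vCPUs

# gcp is cheapest but lacks capacity for 96 vCPUs, so azure wins
print(cheapest_provider(spot_prices, 96, available))  # azure
```

In practice the hard part is not the selection logic but keeping price and capacity data current and moving workloads fast enough to exploit it.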
When overcoming cloud bursting challenges is worth the effort
Cloud bursting requires a substantial amount of design expertise and implementation planning. Whether it's worth the effort depends on the unique characteristics of each workload and an application's business value.
Bursting is ideally suited for revenue-generating applications that experience extreme variability in capacity demand. In this scenario, on-premises systems meet baseline resource requirements and low-cost, cloud-based spot instances support those systems when demand spikes.
Unfortunately, as is often the case, an elegant theory ran head-on into the messy reality of multi-cloud infrastructure and application design. As a result, few companies have been able to deploy cloud bursting architectures, even though many more wish they could. Indeed, skeptics contend that cloud bursting is largely a myth. Consulting firm Architecting IT recently pointed out four significant challenges that typically impede adoption:
- Networking, namely building low-latency, high-bandwidth, redundant connections between public and private clouds and automatically routing incoming connections to the optimal location.
- Security, as it pertains to consistent policies and controls, for both users and systems, across multiple environments.
- Data consistency, or the problem of synchronizing data stores on multiple sites, particularly during periods of high transaction load.
- Data protection, the related problem of keeping backups consistent when they are fed from multiple sources.
I would add a fifth challenge: automatically deploying and dynamically scaling resources. This relates to the ability of a public cloud -- primarily compute or container instances, but also storage I/O -- to handle transient burst demand.
These are solvable but complicated problems, which has led many organizations to conclude that cloud bursting isn't worth the trouble, at least not until they have a new generation of distributed application models designed to work across multiple clouds. For those that remain undeterred, here's how to address each cloud bursting challenge.
Networking and security
Networking and security are the most fundamental cloud bursting challenges since they must be solved for any hybrid environment, whether it uses cloud bursting or not. Fortunately, this is where IT teams will find the widest variety of technologies and services, including those from the major cloud providers, telecommunications companies, ISPs and colocation operators. There are several secure ways to link private and public clouds for improved hybrid cloud connectivity:
- Virtual private clouds that use standard VPN protocols -- usually IPsec -- and a virtual router to link an on-premises network to one or more private subnets in the public cloud. An enterprise can extend its corporate network into a cloud subnet via a VPN and maintain full control over traffic flows by incorporating virtual services such as routers, NAT gateways and internet gateways.
- Private circuits using a cloud provider's service like AWS Direct Connect, Azure ExpressRoute or equivalent products from Google, Oracle and IBM. These offerings provide a dedicated, low-latency link between a customer's private and public cloud networks. Due to limitations in endpoint locations, service is typically terminated in a colocation center, where it can be connected to a customer's private racks rather than a corporate data center.
- Private circuits via a service from a telco, ISP or colocation provider. These are similar to the services offered by cloud providers, but they exploit the extensive interconnections with major carriers to offer far more terminating locations. Services like AT&T NetBond or Equinix Cloud Exchange also simplify the design of multi-cloud interconnects since they peer with all the major IaaS and SaaS providers.
Both types of private circuit services enable enterprises to have complete control over network routing, traffic management and security policy. However, cloud provider services like Direct Connect are the best option for overcoming cloud bursting challenges. These services rely on fiber circuits that terminate in a cloud provider's data center, so they offer the highest performance and lowest latency.
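Once the circuits exist, incoming connections still have to be steered to the best location. A toy sketch of the health- and latency-based routing that global load balancers and DNS-based traffic managers perform (endpoint names and probe figures are hypothetical):

```python
# Illustrative latency-based routing between a private data center and
# a public cloud region: pick the healthy endpoint with the lowest
# measured latency. Endpoint names and probe numbers are hypothetical.

def pick_endpoint(probes):
    """Choose the healthy endpoint with the lowest probed latency."""
    healthy = {name: p["latency_ms"] for name, p in probes.items()
               if p["healthy"]}
    if not healthy:
        raise RuntimeError("no healthy endpoints available")
    return min(healthy, key=healthy.get)

probes = {
    "onprem-dc1":   {"healthy": True, "latency_ms": 4.2},
    "aws-us-east1": {"healthy": True, "latency_ms": 11.8},
}
print(pick_endpoint(probes))  # onprem-dc1
```

Real traffic managers add weighting, session affinity and gradual failover on top of this basic health-plus-latency decision.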
Data consistency and protection
Similar to a cloud DR implementation, cloud bursting requires the duplication of application components from on-premises infrastructure to public cloud IaaS. An organization must manually build the necessary infrastructure resources and network topology, unless it has already fully implemented a cloud-agnostic infrastructure-as-code system, such as Terraform, Chef or Ansible, that can instantiate resources and configurations programmatically. Once those are in place, cloud providers' native tools, like AWS CloudFormation or Google Cloud Deployment Manager, can automate scaling or duplicate resources in other cloud regions.
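The declarative, idempotent model behind such tools can be illustrated in a few lines: compare desired state with actual state and compute the actions needed to reconcile them. This is a conceptual sketch, not any tool's real API, and the resource names are hypothetical:

```python
# Minimal illustration of the declarative reconciliation loop behind
# infrastructure-as-code tools: diff desired state against actual state.
# Resource names and attributes are hypothetical.

def plan(desired, actual):
    """Return which resources to create, update or delete."""
    return {
        "create": [r for r in desired if r not in actual],
        "update": [r for r in desired
                   if r in actual and desired[r] != actual[r]],
        "delete": [r for r in actual if r not in desired],
    }

desired = {"web-subnet": {"cidr": "10.0.1.0/24"},
           "burst-pool": {"instances": 8}}
actual = {"web-subnet": {"cidr": "10.0.1.0/24"},
          "old-vm": {"instances": 1}}
print(plan(desired, actual))
# {'create': ['burst-pool'], 'update': [], 'delete': ['old-vm']}
```

Because the same desired-state definition can be applied against any provider's actual state, this model is what makes duplicating an on-premises topology in a public cloud tractable.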
Data synchronization can then be addressed in a couple of ways, depending on your app architecture:
- Leave databases on premises and ensure reliable, high-speed hybrid cloud connectivity through the aforementioned private circuit. This strategy works for applications that aren't latency-sensitive and don't continually access a database, since the front-end and business-logic application tiers are the primary bottlenecks when load increases.
- Replicate databases across locations, especially for applications with heavy transactional database loads. Replication inherently raises data consistency problems, but these issues have been solved by database systems like Oracle and Microsoft SQL Server that support multisite replication. Applications that require real-time consistency must use a more sophisticated method such as Oracle RAC, GoldenGate, or various third-party or open source products like Qlik Replicate or SymmetricDS.
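One simple strategy such replication systems use to reconcile the same row updated at two sites is last-write-wins. The sketch below is purely illustrative; products like GoldenGate or SymmetricDS offer far richer, configurable conflict-resolution policies:

```python
# Toy last-write-wins merge for multisite replication: when two sites
# update the same row, keep the version with the later commit timestamp.
# Row contents and timestamps are illustrative.

def merge_rows(site_a, site_b):
    """Keep the row version with the later commit timestamp."""
    return site_a if site_a["ts"] >= site_b["ts"] else site_b

onprem = {"id": 42, "balance": 100, "ts": 1718000000.0}
cloud = {"id": 42, "balance": 120, "ts": 1718000005.0}
print(merge_rows(onprem, cloud))  # cloud copy wins: later timestamp
```

Last-write-wins is lossy by design, which is exactly why applications that cannot tolerate losing a concurrent update need synchronous approaches like Oracle RAC instead.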
Dynamic resource allocation
Once the network plumbing and application infrastructure are in place, the operational part of cloud bursting entails dynamically directing and balancing workloads between on-premises and cloud environments.
The primary tools for this are load balancers, application delivery controllers and virtual network interfaces. Modern load balancers have ample configuration options for specifying load weighting factors, resource/CPU limits and fallback provisions to accommodate any bursting scenario. Cloud load balancing services such as AWS Elastic Load Balancing, Google Cloud Load Balancing or Azure Load Balancer automatically scale capacity to handle increased traffic.
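The weighting-plus-fallback behavior those configuration options express boils down to overflow routing: keep traffic on premises up to a capacity threshold, then spill the excess to the cloud pool. A toy sketch with illustrative thresholds and pool names:

```python
# Toy overflow-routing sketch: requests stay on premises up to a
# capacity threshold, and the excess "bursts" to the cloud pool.
# Thresholds and pool names are illustrative.

def route(request_rate, onprem_capacity):
    """Split a request rate between on-prem and cloud pools."""
    onprem = min(request_rate, onprem_capacity)
    cloud = max(0.0, request_rate - onprem_capacity)
    return {"onprem": onprem, "cloud": cloud}

print(route(800, 1000))   # all requests handled on premises
print(route(1500, 1000))  # 1000 on premises, 500 burst to the cloud
```

Production load balancers layer health checks, connection draining and gradual weight shifts on top of this basic split.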
Another challenge with cloud bursting is dynamic capacity management. Containerized applications have an inherent advantage here since Kubernetes can automatically scale clusters and pods, which are groups of containers on a host. All the major cloud Kubernetes services include cluster autoscaling.
Capacity management is trickier with VM-hosted applications, though the major cloud platforms do support autoscaling of instance groups through services such as EC2 Auto Scaling and Azure Virtual Machine Scale Sets. These tools respond to policies based on metrics such as CPU utilization and network traffic collected by their corresponding monitoring services.
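Conceptually, such a policy maps a monitored metric to an instance-count adjustment. The step thresholds and sizes below are illustrative, not any provider's defaults:

```python
# Sketch of a step-scaling decision of the kind VM autoscaling policies
# express: change the instance count when a monitored metric crosses
# thresholds. Thresholds and step sizes are illustrative.

def scaling_adjustment(cpu_pct):
    """Return the instance-count change for an average CPU reading."""
    if cpu_pct >= 85:
        return 2    # scale out aggressively under heavy load
    if cpu_pct >= 70:
        return 1    # scale out gently as load climbs
    if cpu_pct <= 25:
        return -1   # scale in when the burst subsides
    return 0        # within the target band: do nothing

print(scaling_adjustment(92))  # 2
print(scaling_adjustment(40))  # 0
```

Cooldown periods between adjustments, which real autoscaling services enforce, keep a policy like this from oscillating during a burst.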