Organizations of all types and sizes rely on cloud services. With this increased use comes a dark side: a critical dependence on cloud applications and services that may impair business functions if the cloud fails.
As more organizations turn toward the cloud, learn about the top provider outages, and discover strategies that will help prevent disruption from downtime.
Examples of cloud provider outages
Notable outages from four of the top cloud providers include the following:
- AWS outages. Three outages at AWS in November and December 2021 led to sustained unavailability of many well-known sites and services, including Slack and Epic Games. One outage lasted more than five hours. Amazon stated that automated systems caused "unexpected problems" that led to system downtime.
- Google outages. In February 2021, Google Assistant for home devices, including smart security technology and thermostats, stopped working due to a "limited experiment" that was rolled out to a select group of users. In November 2021, Google Cloud Platform sustained a two-hour outage due to a network configuration error, leading to downtime on sites such as Home Depot, Snapchat, Spotify and Etsy.
- Meta outages. Facebook, Instagram, Messenger and WhatsApp were down for roughly six hours in October 2021. Facebook said routing configuration changes were to blame, and many felt that larger-scale changes to the Border Gateway Protocol configuration for Facebook led to the series of failures.
- Microsoft outages. Azure experienced a six-hour outage in October 2021 that took down VM workload services and more. The outage was attributed to a failure condition experienced by VM queries for an artifact. Microsoft 365 has experienced a number of outages in the past several years, too, including an Exchange Online outage in April 2021 that affected email delivery and an almost complete outage of all Microsoft 365 services, including Exchange, SharePoint, Teams and OneDrive, in September 2020.
Why are these cloud provider outages significant?
All these cloud outages raise the same question: Have we become overly reliant on cloud provider infrastructure? If so, should we consider cloud service providers as critical infrastructure?
Government bodies have shown no indication that a change in designation will happen anytime soon, but the topic remains hotly debated as more organizations shift traditionally in-house applications, services and infrastructure to third-party cloud environments.
Marking cloud providers as critical infrastructure is likely unwarranted if we're just talking about losing access to email, collaboration services or file shares for a relatively short period of time. However, the largest providers are now hosting IoT platforms, payment processing for global financial organizations, and healthcare patient data processing and application integration.
Take, for example, Azure Health Data Services, used by large organizations such as Humana, SAS and others to process patient and healthcare research data. Likewise, AWS is increasingly targeting the energy sector with products that include oil exploration and drilling models and petroleum production monitoring. The automotive industry can now take advantage of Google Cloud's Connected Car Telemetry Platform to collect and coordinate data from self-driving vehicles and those with telemetry reporting for speed, location, camera footage and more.
How to prevent disruption from downtime
The processing power of the cloud will continue to attract new technology models and use cases. Critical infrastructure industries will inevitably determine that the risk of relying on third-party cloud services is lower than the risk of building and maintaining in-house workloads and applications.
For now, organizations of all types should double down on disaster recovery (DR) and business continuity planning. Some strategic considerations include the following:
- Instead of creating a replica cloud infrastructure within the same provider's environment, consider a fallback infrastructure in a second provider's cloud. This model, however, increases complexity and cost.
- Invest in backup products or DR-as-a-service providers that can replicate and store cloud workload and application data externally to the primary cloud services in use.
- Push SaaS vendors to offer more flexible and accessible backup options through API integration, where possible.
- Perform a thorough business impact analysis for all major cloud applications -- particularly SaaS because it's difficult to replicate -- to align organizational risk tolerance with cloud usage.
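As a rough illustration of the first consideration above, the sketch below picks the active deployment from a priority-ordered list that spans two providers. It is a minimal sketch under stated assumptions, not a production failover design: the `Deployment` type, the `pick_active` function, the provider names and the example URLs are all hypothetical, and the health check is passed in as a function so the logic can be exercised without any network access.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Deployment:
    """One copy of a workload. Provider names and URLs are hypothetical."""
    provider: str
    endpoint: str


def pick_active(
    deployments: Sequence[Deployment],
    is_healthy: Callable[[Deployment], bool],
) -> Optional[Deployment]:
    """Return the first healthy deployment in priority order, or None."""
    for deployment in deployments:
        try:
            if is_healthy(deployment):
                return deployment
        except Exception:
            # A probe that raises is treated as an unhealthy deployment.
            continue
    return None


# Priority order: primary first, fallback in a different provider's cloud second.
DEPLOYMENTS = [
    Deployment("provider-a", "https://app.provider-a.example"),
    Deployment("provider-b", "https://app.provider-b.example"),
]

# Simulate an outage at the primary provider (assumed probe, no real network call).
down = {"provider-a"}
active = pick_active(DEPLOYMENTS, lambda d: d.provider not in down)
print(active.provider if active else "no healthy deployment")
```

In practice, `is_healthy` would wrap whatever probe the organization trusts, such as an HTTP check against the application itself. Selecting a fallback is only one piece: traffic still has to be redirected and data replicated to the second provider, which is where the added complexity and cost come from.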