Many IT administrators went into 2020 calling for better disaster preparedness to mitigate damage from the unexpected. Little did they know a global pandemic and economic shutdown would bring those concerns into even sharper focus.
COVID-19 has forced many organizations to reevaluate their disaster recovery (DR) strategy. At the same time, businesses accelerated their adoption of public cloud resources as employees shifted to remote work. The convergence of these two trends has likely led more enterprises to evaluate the disaster recovery options on AWS, Microsoft Azure, Google Cloud Platform (GCP) and other public clouds.
A primer on disaster recovery on AWS, Azure and GCP
Compared to traditional approaches, public cloud IaaS simplifies and accelerates the DR process and cuts costs by replacing standby hardware with on-demand cloud resources. If the primary hosting site fails, disaster recovery as a service (DRaaS) systems automatically launch applications by initiating the necessary cloud infrastructure, installing application images, and attaching databases and volumes.
Later, when a primary site comes back online, the DRaaS software synchronizes data from the remote location, triggers a restart of applications at the primary site and decommissions virtual infrastructure at the secondary cloud site.
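To make the failover and failback cycle described above concrete, here is a minimal Python sketch. It is a toy model that only records the order of steps; the class and method names (DrOrchestrator, fail_over, fail_back) are assumptions made for illustration and do not correspond to any vendor's DRaaS API.

```python
from dataclasses import dataclass, field


@dataclass
class DrOrchestrator:
    """Toy model of the failover/failback cycle; it only records steps."""

    active_site: str = "primary"
    log: list = field(default_factory=list)

    def fail_over(self) -> None:
        # Primary site is down: build the application at the cloud site.
        if self.active_site != "primary":
            raise RuntimeError("already running at the secondary site")
        self.log += [
            "provision cloud infrastructure",
            "install application images",
            "attach databases and volumes",
        ]
        self.active_site = "secondary"

    def fail_back(self) -> None:
        # Primary site is back: return the workload and release cloud resources.
        if self.active_site != "secondary":
            raise RuntimeError("nothing to fail back from")
        self.log += [
            "synchronize data from the secondary site",
            "restart applications at the primary site",
            "decommission secondary cloud infrastructure",
        ]
        self.active_site = "primary"


orchestrator = DrOrchestrator()
orchestrator.fail_over()
orchestrator.fail_back()
for step in orchestrator.log:
    print(step)
```

A real service would replace each logged string with calls to the cloud provider's provisioning and replication APIs, but the ordering constraint stays the same: nothing is decommissioned at the secondary site until data is synchronized and applications are running again at the primary site.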
Most of the major hyperscale cloud providers were late to the DRaaS market, having initially focused on providing cloud infrastructure and APIs to third-party disaster recovery specialists. AWS and Microsoft have since released competitive products. Google, however, still offers only documentation on building DIY, cloud-based DR rather than a full-fledged DRaaS offering.
Because the foundation of DRaaS is cloud infrastructure, many organizations will look there first for a cloud-based DR option. Here's what they'll find.
AWS filled a hole in its portfolio when it acquired CloudEndure in 2019. CloudEndure's products now make up AWS' cloud-based disaster recovery and workload migration offerings. CloudEndure DR agents can replicate workloads from on-premises or other IaaS environments to AWS or between AWS regions. The service provides:
- Continuous data replication;
- A low-cost workload staging area using only enough AWS compute and storage resources to support data replication;
- Automated machine conversion from the native format to a supported AWS instance and image;
- Support for popular enterprise software, operating systems and cloud environments, including Azure, GCP, IBM Cloud, Oracle Cloud, OpenStack and VMware;
- Point-in-time recovery, i.e., the ability to recover to either the most current application state or an earlier point in time; and
- Non-disruptive DR testing.
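Point-in-time recovery, mentioned in the list above, comes down to choosing which stored recovery point to restore. The following Python sketch shows one plausible selection rule: take the newest point at or before a requested time, or the latest point if no time is given. The function name pick_recovery_point and the sample timestamps are hypothetical; this is not how CloudEndure exposes the feature.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Optional, Sequence


def pick_recovery_point(
    points: Sequence[datetime], target: Optional[datetime] = None
) -> datetime:
    """Return the newest recovery point at or before `target` (default: latest)."""
    if not points:
        raise ValueError("no recovery points available")
    ordered = sorted(points)
    if target is None:
        return ordered[-1]
    # bisect_right counts the points that are <= target.
    idx = bisect_right(ordered, target)
    if idx == 0:
        raise ValueError("no recovery point at or before the requested time")
    return ordered[idx - 1]


# Hypothetical hourly recovery points.
snapshots = [
    datetime(2020, 6, 1, 9, 0),
    datetime(2020, 6, 1, 10, 0),
    datetime(2020, 6, 1, 11, 0),
]
latest = pick_recovery_point(snapshots)
before_incident = pick_recovery_point(snapshots, datetime(2020, 6, 1, 10, 30))
print(latest, before_incident)
```

Restoring to a point before an incident is what makes this feature useful against data corruption or ransomware, where the most current state is the one you do not want.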
Administrators can also use VMware Site Recovery Manager with vSphere Replication to make AWS a DR target for on-premises workloads. The Site Recovery Manager DR process is the same as it would be with a secondary private data center, since VMware Cloud on AWS runs the native VMware software stack.
Similar to Amazon, Microsoft used an acquisition to bolster its DRaaS product. However, it had the foresight to do so back in 2014 when it folded business continuity technology from InMage into Azure Site Recovery.
Administrators can replicate applications from on-premises systems running on Azure Stack or another virtualization platform. Site Recovery can also replicate between Azure cloud regions or even from AWS Windows instances to Azure. Like CloudEndure, it supports on-demand creation of VM instances during recovery incidents, non-disruptive DR testing, and customized targets for recovery point objectives and recovery time objectives.
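A recovery point objective is, in effect, a limit on how stale the newest replica of a machine is allowed to be. As a rough illustration, the Python sketch below flags machines whose last replication is older than a chosen target. The function rpo_violations, the machine names and the 15-minute target are all assumptions for the example, not Site Recovery settings or API calls.

```python
from datetime import datetime, timedelta


def rpo_violations(last_replicated: dict, now: datetime, rpo: timedelta) -> list:
    """Return names of machines whose newest replica is older than the RPO."""
    return sorted(name for name, ts in last_replicated.items() if now - ts > rpo)


# Hypothetical replication timestamps for two machines.
now = datetime(2020, 6, 1, 12, 0)
replicas = {
    "web-01": datetime(2020, 6, 1, 11, 58),
    "db-01": datetime(2020, 6, 1, 11, 40),
}
late = rpo_violations(replicas, now, rpo=timedelta(minutes=15))
print(late)
```

A recovery time objective can be checked the same way, by comparing the measured duration of a DR test against the target.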
Site Recovery also supports customized recovery plans and runbooks that can use Azure Automation or PowerShell scripts for complex scenarios. These scripts can run as part of a Site Recovery plan, which defines the sequence in which machines fail over and restart to accommodate applications with many dependencies.
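The idea of ordering machine restarts around dependencies can be illustrated with a topological sort. The Python sketch below uses the standard library's graphlib module (Python 3.9 or later); the machine names and dependency map are hypothetical, and this is a general technique for illustration, not a description of how Site Recovery implements its plans.

```python
from graphlib import TopologicalSorter

# Hypothetical tiers: each machine maps to the machines that must be up first.
depends_on = {
    "db-01": set(),
    "cache-01": set(),
    "app-01": {"db-01", "cache-01"},
    "web-01": {"app-01"},
}

# static_order() yields every machine after all of its prerequisites.
boot_order = list(TopologicalSorter(depends_on).static_order())
print(boot_order)
```

The database and cache have no dependency on each other, so either may come first. The sort only guarantees that both precede the application server, which in turn precedes the web tier.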
Google Cloud does not offer a packaged DRaaS. Instead, it provides documentation on cloud-based disaster recovery planning and how to use GCP services as a DR platform. Aside from its IaaS products, Google's documentation focuses on various automation tools to eliminate manual DR processes when replicating and failing over from an external environment. GCP services useful to a DIY DR process include:
- Cloud Monitoring and Cloud Status Dashboard for application monitoring, metrics and events;
- Cloud Deployment Manager to automatically create GCP environments from predefined templates; and
- Third-party infrastructure templating and configuration management software with GCP support such as Ansible, Chef and Terraform.
The GCP partner directory also lists several companies that provide DRaaS or DR managed services on top of GCP infrastructure.
Beware rebranded data protection products
Gartner's DRaaS definition identifies the primary features of a managed DR service, namely:
- Server and application image collection and replication to cloud infrastructure;
- Data replication to cloud storage and database services;
- Creation and management of automated DR runbooks;
- Provisioning of the cloud resources, namely IaaS servers, storage and networks, needed to "rehydrate" an application;
- Automated server recovery using the previously collected resources;
- Automated failback after an incident is resolved; and
- Compliance with predefined SLAs covering application recovery time, performance and other KPIs.
Clearly, a lot goes into a cloud-based disaster recovery service beyond data backup and cloud infrastructure. And, as Gartner points out, some vendors miscategorize and market tools as DRaaS even though they provide only some of the required components, such as data replication or IT infrastructure orchestration software.
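One simple way to vet a product against the list above is to treat the features as a checklist and see what is missing. The Python sketch below does that with a set difference. The feature keys are shorthand paraphrases of the list, and the sample product is hypothetical.

```python
# Feature names paraphrase the list above; the product data is hypothetical.
REQUIRED_FEATURES = {
    "image_replication",
    "data_replication",
    "automated_runbooks",
    "resource_provisioning",
    "automated_recovery",
    "automated_failback",
    "sla_compliance",
}


def missing_features(offered: set) -> list:
    """Return the required DRaaS features a product does not cover, sorted."""
    return sorted(REQUIRED_FEATURES - offered)


# A rebranded backup tool that only replicates images and data.
backup_tool = {"data_replication", "image_replication"}
gaps = missing_features(backup_tool)
print(f"{len(gaps)} of {len(REQUIRED_FEATURES)} features missing: {gaps}")
```

A product that leaves gaps on this checklist may still be useful, but it is a component of a DR strategy rather than a complete DRaaS offering.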
The complexity of complete cloud-based disaster recovery tools explains why the market remains fragmented.
Other CSPs and selection considerations
IBM Cloud doesn't offer a complete DRaaS product, but it partners with Veeam and Zerto to provide workload backup, migration and replication between on-premises VM environments and IBM infrastructure. IBM Global Services does offer two business continuity products: DRaaS, with customer-managed setup and recovery, and Resiliency Orchestration Services, which is managed by IBM.
Oracle Cloud Infrastructure (OCI) doesn't offer a DRaaS product either. However, it does provide documentation for using Data Guard, its high availability, DR and data protection software for Oracle databases, with OCI.
If your organization already uses AWS, Azure or GCP as its primary public cloud, those platforms should cover your cloud-based disaster recovery needs. If you have complex systems that span multiple clouds or on-premises systems, you might want to look beyond the hyperscale providers and consider one of the many third-party options.