Craft a business continuity plan to maximize AWS uptime

Strong business continuity plans start with an evaluation of needs. Native and third-party disaster recovery tools fit into a BC plan to protect cloud data and workloads.

Disruptions are inevitable and could happen at any time. Issues such as fire, flood or change of ownership could hamper business operations. Like lines of business, IT teams must follow a strict business continuity plan to improve an organization's ability to remain up and running when a disruption occurs.

Enterprise IT must consistently analyze, test and update its business continuity (BC) plan to mitigate risks and continuously deliver products and services to its end customers. This article covers the basic concepts for developing a business continuity plan, and part two will explore business continuity features and capabilities to ensure AWS uptime.

Constructing a BC plan

An enterprise business continuity plan should identify resources, such as infrastructure protection, information, legal counsel, personnel, financial allocation and data protection, to support business continuity. IT teams must also plan, measure and arrange a process to ensure the continuous delivery of critical services and products in terms of facility, data and assets.

When creating a business continuity plan (BCP), identify business-critical operations and services. To identify the critical areas, enterprise IT must clearly understand how performance degradation of just a single workload could affect the business.

Generally, a business continuity plan must address the following:

Business impact analysis: Identify the key ways a disruption will affect business operations. IT teams could generate a business impact analysis report by conducting workshops with key business personnel to learn how each business function and process works.

Prepare questionnaires and interview personnel to validate this information. Information gleaned from these reports should match the service-level agreement (SLA) an enterprise has with a public cloud provider like AWS. The SLA should outline required uptime per service and specify availability and support terms. Enterprise IT should also create SLAs for internal and external customers and end users.

Strategy development: Create a list of possible threats that could impede business operations. Assessing risks helps identify vulnerabilities and develop possible fixes. When creating a strategy, plot a framework and recovery plans, organize recovery teams and define recovery procedures.

Recovery point objective (RPO) and recovery time objective (RTO): This step involves setting up a detailed response and recovery plan to ensure a business remains up and running. The plan should list the business operations and resources, such as data facilities, that must be in place to continue business as usual. Based on SLA requirements, take the recovery plan one level lower and include technical thresholds: the RPO is the point in time to which data must be recoverable, which bounds how much data the business can afford to lose, and the RTO is the maximum time a system can take to resume after a fault.
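As a minimal sketch of how these two thresholds can be checked, the Python below uses the common definitions of RPO (the maximum tolerable window of lost data) and RTO (the maximum tolerable time until service resumes). The class name, function names and the one-hour and four-hour targets are hypothetical and chosen only for illustration; real targets should come from the SLA.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    """Recovery thresholds for one workload (values are illustrative)."""
    rpo: timedelta  # maximum tolerable window of lost data
    rto: timedelta  # maximum tolerable time until service resumes


def meets_rpo(last_backup: datetime, failure: datetime,
              objective: RecoveryObjective) -> bool:
    """True if the data lost since the last good backup fits within the RPO."""
    return failure - last_backup <= objective.rpo


def meets_rto(failure: datetime, restored: datetime,
              objective: RecoveryObjective) -> bool:
    """True if the service resumed within the RTO."""
    return restored - failure <= objective.rto


# Hypothetical workload: lose at most 1 hour of data, resume within 4 hours.
objective = RecoveryObjective(rpo=timedelta(hours=1), rto=timedelta(hours=4))
failure = datetime(2024, 1, 1, 12, 0)

# Last backup finished 30 minutes before the failure.
rpo_ok = meets_rpo(datetime(2024, 1, 1, 11, 30), failure, objective)
# Service came back 5 hours after the failure.
rto_ok = meets_rto(failure, datetime(2024, 1, 1, 17, 0), objective)
print(rpo_ok, rto_ok)
```

In this made-up incident the backup is recent enough to satisfy the RPO, but the five-hour restore misses the four-hour RTO, which is the kind of gap a recovery plan should surface before a real event.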

Testing a business continuity plan

After an enterprise creates its BCP, testing the plan is critical. Testing allows IT teams to verify how effective the backup and recovery process will be, prepare a to-do list for a real crisis and identify areas that must be optimized. BCP testing should cover three points:

  1. Study the business continuity plan thoroughly and conduct a quarterly review to determine that the plan is up to date. Meet with key personnel within the organization to review the plan and determine any areas that need improvement; provide personnel with training so they are familiar with their responsibilities in a given emergency scenario.
  2. Develop a simulation of a realistic scenario to check the plan's readiness. In this process, teams should go through each step of the plan according to the SLA and response time.
  3. Evaluate the test results to determine weak points of the BCP. Based on the test result, it's easy to identify areas to update. This is a continuous process -- every updated scenario of the plan goes through a testing phase until the plan fulfills all requirements.

When running on AWS, it's essential to have automated backup monitoring as well as recovery tests to make the most of AWS uptime. These tests should reflect RTO and RPO and include data completeness and service consistency. Using tags and AWS CloudFormation can help automate business continuity operations and align them with specific SLA and recovery goals. An important component of an AWS test plan is drawing the line between the automated and manual actions that are necessary to restore AWS uptime after a failure.
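One way to picture how tags can tie resources to recovery goals is the Python sketch below. It makes no AWS calls: resources are plain dictionaries standing in for an inventory that would, in practice, be gathered from tagged AWS resources. The tag key bc-tier, the tier names, the thresholds and the resource IDs are all hypothetical assumptions, not AWS conventions.

```python
# Hypothetical recovery targets per tier, in minutes.
TIER_TARGETS = {
    "critical": {"rpo_minutes": 15, "rto_minutes": 60},
    "standard": {"rpo_minutes": 240, "rto_minutes": 480},
}


def evaluate_recovery_test(resource: dict) -> list:
    """Return findings for one resource; an empty list means the test passed."""
    tier = resource.get("tags", {}).get("bc-tier")
    if tier not in TIER_TARGETS:
        # Untagged resources cannot be matched to a recovery goal.
        return [f"{resource['id']}: missing or unknown bc-tier tag"]

    targets = TIER_TARGETS[tier]
    findings = []
    if resource["backup_age_minutes"] > targets["rpo_minutes"]:
        findings.append(f"{resource['id']}: latest backup is older than the RPO")
    if resource["restore_minutes"] > targets["rto_minutes"]:
        findings.append(f"{resource['id']}: restore took longer than the RTO")
    return findings


# Made-up results from one recovery test run.
resources = [
    {"id": "orders-db", "tags": {"bc-tier": "critical"},
     "backup_age_minutes": 10, "restore_minutes": 90},
    {"id": "reports-batch", "tags": {"bc-tier": "standard"},
     "backup_age_minutes": 120, "restore_minutes": 200},
    {"id": "legacy-app", "tags": {},
     "backup_age_minutes": 30, "restore_minutes": 30},
]

findings = [f for r in resources for f in evaluate_recovery_test(r)]
for finding in findings:
    print(finding)
```

With this sample data, the check flags the slow restore of orders-db and the untagged legacy-app, while reports-batch passes. Treating a missing tag as a finding rather than silently skipping the resource is a deliberate choice here, since an untagged workload has no recovery goal to test against.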

Finding gaps in a business continuity plan on AWS

Constructing and testing a BCP doesn't guarantee success; a BCP can fail for several reasons.

A BCP must clearly outline roles and responsibilities of everyone within the organization. Because the plan requires several different approaches, it's necessary to integrate each part to identify interdependencies. Defining a checklist of roles and responsibilities can help identify a key communicator who will take charge and implement DR plans to restore AWS uptime when an event occurs.

A communication plan is also a key component of a successful BCP. Even a good BCP will fail if it doesn't clearly outline internal and external communication paths. Identify the necessary levels of communication in the event of a disaster and include the various line-of-business managers and stakeholders.

Several resources play key roles in running and optimizing business operations within an enterprise. Therefore, it's important to maintain a priority list of resources. Ensure the plan includes business-critical resources and functions. Closely monitor activity to ensure the enterprise's business continuity plan remains up to date -- and keep stakeholders informed of any changes.
