disaster recovery (DR)
What is disaster recovery?
Disaster recovery (DR) is an organization's ability to respond to and recover from an event that negatively affects business operations. The goal of DR methods is to enable the organization to regain use of critical systems and IT infrastructure as soon as possible after a disaster occurs. To prepare for this, organizations often perform an in-depth analysis of their systems and create a formal document to follow in times of crisis. This document is known as a disaster recovery plan.
Read on to learn more about why DR is important, how it works, and the difference between disaster recovery and business continuity. You'll also discover what to include in a disaster recovery plan and the major types of DR, as well as leading DR services and vendors.
What is a disaster?
The practice of DR revolves around events that are serious in nature. These events are often thought of in terms of natural disasters, but they can also be caused by system or technical failures or by humans carrying out an intentional attack. They are significant enough to disrupt or completely stop critical business operations for a period of time. Types of disaster include:
- Cyber attacks such as malware, DDoS and ransomware attacks
- Power outages
- Equipment failure
- Epidemics or pandemics, such as COVID-19
- Terrorist attacks or threats
- Industrial accidents
Why is disaster recovery important?
Disasters can inflict many types of damage with varying levels of severity, depending on the scenario. A brief network outage could result in frustrated customers and some loss of business to an e-commerce system. A hurricane or tornado could destroy an entire manufacturing facility, data center or office.
The monetary costs can be significant. The Uptime Institute's Annual Outage Analysis 2021 report estimated that 40% of outages or service interruptions in businesses cost between $100,000 and $1 million, while about 17% cost more than $1 million. A data breach can be more expensive; the average cost in 2020 was $3.86 million, according to the 2020 Cost of a Data Breach Report by IBM and the Ponemon Institute.
Additionally, many businesses are required to create and follow plans for disaster recovery, business continuity and data protection in order to meet compliance regulations. This is particularly important for organizations operating in financial, healthcare, manufacturing and government sectors. Failure to have DR procedures in place can result in legal or regulatory penalties, so understanding how to comply with resiliency standards is important.
Preparing for every potential disaster may seem extreme, but the COVID-19 crisis illustrated that even scenarios that seem farfetched can come to pass. Businesses that had emergency measures in place to support remote work had a clear advantage when stay-at-home orders were enacted.
Thinking about disasters before they happen and creating a plan for how to respond can provide many benefits. It raises awareness about potential disruptions and helps an organization to prioritize its mission-critical functions. It also provides a forum for discussing these topics and making careful decisions about how to best respond in a low-pressure setting.
What is the difference between disaster recovery and business continuity?
On a practical level, DR and business continuity are often combined into a single corporate initiative and even abbreviated together as BCDR, but they are not the same thing. While the two disciplines have similar goals relating to an organization's resilience, they differ greatly in scope.
Business continuity (BC) is a proactive discipline intended to minimize risk and help ensure the business can continue to deliver its products and services no matter the circumstances. It focuses especially on how employees will continue to work and how the business will continue operations while a disaster is occurring. BC is also closely related to business resilience, crisis management and risk management, but each of these has different goals and parameters.
DR is a subset of business continuity that focuses on the IT systems that enable business functions. It addresses the specific steps an organization must take to resume technology operations following an event. DR is also a reactive process by nature. While planning for it must be done in advance, DR activity is not kicked off until a disaster actually occurs.
Elements of a disaster recovery strategy
Before an organization can determine its DR strategies, it must first analyze existing assets and priorities. Two different analyses typically factor into DR decision-making:
Risk analysis
Risk analysis or risk assessment is an evaluation of all the potential risks the business could face, as well as their outcomes. Risks can vary greatly depending on the industry the organization is in and its geographic location. The assessment should identify potential hazards, determine who or what these hazards would harm, and use the findings to create procedures that take these risks into account.
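One common way to prioritize the hazards an assessment identifies is a likelihood-times-impact risk matrix. The Python sketch below illustrates that approach only; the 1-5 scales, the `Risk` class and the sample hazards and scores are hypothetical values chosen for the example, not figures from any standard.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (almost certain)
    impact: int      # hypothetical scale: 1 (negligible) to 5 (severe)

    def __post_init__(self) -> None:
        for value in (self.likelihood, self.impact):
            if not 1 <= value <= 5:
                raise ValueError("likelihood and impact must be between 1 and 5")

    @property
    def score(self) -> int:
        # Risk-matrix score: likelihood multiplied by impact
        return self.likelihood * self.impact


def rank_risks(risks: list[Risk]) -> list[Risk]:
    """Order risks from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    register = [
        Risk("Power outage", likelihood=4, impact=3),
        Risk("Ransomware attack", likelihood=3, impact=5),
        Risk("Hurricane", likelihood=1, impact=5),
    ]
    for risk in rank_risks(register):
        print(f"{risk.name}: {risk.score}")
```

Running it lists the ransomware attack first (15), then the power outage (12) and the hurricane (5). In practice, the scales and any thresholds for action would come from the organization's own risk criteria.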
Business impact analysis
Business impact analysis (BIA) evaluates the effects of the risks identified above on business operations. A BIA can help predict and quantify costs, both financial and non-financial. It also examines the impact of different disasters on an organization's safety, finances, marketing, business reputation, legal compliance and quality assurance.
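The financial side of a BIA often starts with simple arithmetic: how much a process loses for each hour it is unavailable. The sketch below is a hypothetical, deliberately simplified model (lost revenue plus idle staff cost per hour, plus a one-off recovery cost); the function name and all figures are assumptions, and non-financial impacts such as reputation or compliance are not captured.

```python
def estimated_outage_cost(hours_down: float,
                          revenue_per_hour: float,
                          idle_staff_cost_per_hour: float,
                          fixed_recovery_cost: float = 0.0) -> float:
    """Rough financial impact of one outage of a single business process (hypothetical model)."""
    if hours_down < 0:
        raise ValueError("hours_down must be non-negative")
    # Losses that accrue for every hour the process is unavailable
    hourly_loss = revenue_per_hour + idle_staff_cost_per_hour
    # One-off costs (e.g., overtime, emergency equipment) are added once
    return hours_down * hourly_loss + fixed_recovery_cost


if __name__ == "__main__":
    # Hypothetical figures: $12,000/h revenue, $3,000/h idle staff, $20,000 one-off costs
    for hours in (1, 4, 24):
        cost = estimated_outage_cost(hours, 12_000, 3_000, 20_000)
        print(f"{hours:>2} h outage: ${cost:,.0f}")
```

Comparing costs at several outage lengths in this way is one input into deciding how much downtime the business can tolerate.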
Understanding the difference between risk analysis and BIA and conducting the assessments can also help an organization define its goals when it comes to data protection and the need for backup. Organizations generally quantify these using measurements called recovery point objective (RPO) and recovery time objective (RTO).
Recovery point objective
RPO is the maximum age of files that an organization must recover from backup storage for normal operations to resume after a disaster. The RPO determines the minimum frequency of backups. For example, if an organization has an RPO of four hours, the system must back up at least every four hours.
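The four-hour example can be checked mechanically. The Python sketch below (function names are hypothetical) treats a backup schedule as meeting the RPO when the interval between backups is no longer than the RPO, and computes how much data would be lost if the system failed at a given moment after the last good backup.

```python
from datetime import datetime, timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """A schedule satisfies the RPO if backups run at least as often as the RPO allows."""
    return backup_interval <= rpo


def worst_case_data_loss(last_backup: datetime, failure_time: datetime) -> timedelta:
    """Changes made after the last good backup are lost when restoring from it."""
    if failure_time < last_backup:
        raise ValueError("failure_time must not precede last_backup")
    return failure_time - last_backup


if __name__ == "__main__":
    rpo = timedelta(hours=4)
    print(meets_rpo(timedelta(hours=4), rpo))  # True: backups every 4 hours
    print(meets_rpo(timedelta(hours=6), rpo))  # False: up to 6 hours of data at risk
    loss = worst_case_data_loss(datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 11, 30))
    print(loss, loss <= rpo)  # 3:30:00 True
```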
Recovery time objective
RTO refers to the amount of time an organization estimates its systems can be down without causing significant or irreparable damage to the business. In some cases, applications can be down for several days without severe consequences. In others, seconds can do substantial harm to the business.
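One way to sanity-check an RTO is to add up the expected duration of each recovery step and compare the total with the target. The sketch below assumes the steps run one after another; the step names, durations and the four-hour RTO are hypothetical, and steps that can run in parallel would shorten the real total.

```python
from datetime import timedelta


def estimated_recovery_time(steps: dict[str, timedelta]) -> timedelta:
    """Total duration of recovery steps, assuming they run one after another."""
    return sum(steps.values(), timedelta())


def within_rto(steps: dict[str, timedelta], rto: timedelta) -> bool:
    """True if the sequential recovery estimate fits inside the RTO."""
    return estimated_recovery_time(steps) <= rto


if __name__ == "__main__":
    # Hypothetical runbook for a single application
    runbook = {
        "Declare disaster and assemble DR team": timedelta(minutes=30),
        "Restore database from backup": timedelta(hours=2),
        "Redeploy application servers": timedelta(hours=1),
        "Validate service and redirect users": timedelta(minutes=45),
    }
    rto = timedelta(hours=4)
    print(estimated_recovery_time(runbook), within_rto(runbook, rto))  # 4:15:00 False
```

Here the estimate overshoots the hypothetical RTO by 15 minutes, a gap that would call for either faster recovery methods or a revised objective.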
RPO and RTO are both important elements in disaster recovery, but the metrics have different uses. RPOs are acted on before a disruptive event takes place to ensure data will be backed up, while RTOs come into play after an event occurs.
What's in a disaster recovery plan?
Once an organization has thoroughly reviewed its risk factors, recovery goals and technology environment, it can write a DR plan. The DR plan is the formal document that specifies these elements and outlines how the organization will respond when disruption or disaster occurs. The plan details recovery goals including RTO and RPO as well as the steps the organization will take to minimize the effects of the disaster.
The components of a DR plan should include:
- A DR policy statement, plan overview and main goals of the plan.
- Key personnel and DR team contact information.
- A step-by-step description of disaster response actions immediately following an incident.
- A diagram of the entire network and recovery site.
- Directions for how to reach the recovery site.
- A list of software and systems that staff will use in the recovery.
- Sample templates for a variety of technology recoveries, including technical documentation from vendors.
- A communication plan that includes internal and external contacts, as well as boilerplate for dealing with the media.
- A summary of insurance coverage.
- Proposed actions for dealing with financial and legal issues.
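Some teams keep the plan's required components as a checklist that can be verified before each review. The sketch below is a hypothetical illustration: the section keys simply mirror the components listed above, and the `DRPlan` class is not part of any particular tool.

```python
from dataclasses import dataclass, field

# Hypothetical keys mirroring the plan components listed above
REQUIRED_SECTIONS = (
    "policy_statement",          # policy statement, overview and goals
    "contacts",                  # key personnel and DR team contacts
    "response_steps",            # immediate step-by-step response actions
    "network_diagram",           # network and recovery site diagram
    "recovery_site_directions",  # how to reach the recovery site
    "recovery_software",         # software and systems used in recovery
    "recovery_templates",        # templates and vendor documentation
    "communication_plan",        # internal/external contacts, media boilerplate
    "insurance_summary",         # summary of insurance coverage
    "financial_legal_actions",   # proposed financial and legal actions
)


@dataclass
class DRPlan:
    sections: dict[str, str] = field(default_factory=dict)

    def missing_sections(self) -> list[str]:
        """Return required sections that are absent or left blank."""
        return [name for name in REQUIRED_SECTIONS
                if not self.sections.get(name, "").strip()]


if __name__ == "__main__":
    draft = DRPlan({"policy_statement": "Restore order processing within 4 hours.",
                    "contacts": "DR lead: on-call rota"})
    print(draft.missing_sections())
```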
An organization should consider its DR plan a living document. Regular disaster recovery testing should be scheduled to ensure the plan is accurate and will work when a recovery is required. The plan should also be evaluated against consistent criteria whenever there are changes in the business or IT systems that could affect DR.
How disaster recovery works
DR initiatives are more attainable for businesses of all sizes today due to widespread cloud adoption and availability of virtualization technologies that make backup and replication easier. However, much of the terminology and best practices developed for DR were based on enterprise efforts to recreate large-scale physical data centers. This involved plans to transfer, or fail over, workloads from a primary data center to a secondary location or DR site in order to restore data and operations.
Disaster recovery sites
An organization uses a DR site to recover and restore its data, technology infrastructure and operations when its primary data center is unavailable. DR sites can be internal, external or cloud-based.
An internal DR site is set up and maintained by the organization itself. Organizations with large information requirements and aggressive RTOs are more likely to use an internal DR site, which is typically a second data center. When building an internal site, the business must consider hardware configuration, supporting equipment, power maintenance, heating and cooling of the site, layout design, location and staff.
An external disaster recovery site is owned and operated by a third-party provider. External sites can be hot, warm or cold.
- Hot site: A fully functional data center with hardware and software, personnel and customer data, which is typically staffed around the clock and operationally ready in the event of a disaster.
- Warm site: An equipped data center that doesn't have customer data; an organization can install additional equipment and introduce customer data following a disaster.
- Cold site: Has infrastructure to support IT systems and data, but no technology until an organization activates DR plans and installs equipment; they are sometimes used to supplement hot and warm sites during a long-term disaster.
A cloud recovery site is another option. An organization should consider site proximity, internal and external resources, operational risks, service-level agreements and cost when contracting with cloud providers to host its DR assets or outsourcing additional services.
Disaster recovery tiers
In addition to choosing the most appropriate DR site, it may be helpful for organizations to consult the tiers of disaster recovery identified by the Share Technical Steering Committee and IBM in the 1980s. The tiers feature a variety of recovery options organizations can use as a blueprint to help determine the best DR approach depending on their business needs.
Another type of DR tiering involves assigning levels of importance to different types of data and applications and treating each tier differently based on the tolerance for data loss. This approach recognizes that some mission-critical functions may not be able to tolerate any data loss or downtime, while others can be offline for longer or have smaller sets of data restored.
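The sketch below shows one way such tiering could be expressed: each application records the data loss (RPO) and downtime (RTO) it can tolerate and is placed in the least demanding tier whose recovery capabilities still meet both. The tier names and thresholds are hypothetical, are not the SHARE/IBM tiers mentioned above, and the code assumes less demanding tiers are cheaper to run.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical tiers, most demanding first: (name, RPO the tier achieves, RTO the tier achieves)
TIERS = [
    ("Tier 1 (mission-critical)", timedelta(0), timedelta(minutes=15)),
    ("Tier 2 (business-critical)", timedelta(hours=4), timedelta(hours=8)),
    ("Tier 3 (non-critical)", timedelta(hours=24), timedelta(hours=72)),
]


@dataclass
class App:
    name: str
    rpo: timedelta  # maximum data loss the application can tolerate
    rto: timedelta  # maximum downtime the application can tolerate


def assign_tier(app: App) -> str:
    """Pick the least demanding tier whose capabilities still meet both objectives."""
    for name, tier_rpo, tier_rto in reversed(TIERS):
        if tier_rpo <= app.rpo and tier_rto <= app.rto:
            return name
    raise ValueError(f"No tier meets the objectives for {app.name}")


if __name__ == "__main__":
    for app in (App("payments", timedelta(0), timedelta(hours=1)),
                App("crm", timedelta(hours=4), timedelta(hours=24)),
                App("archive", timedelta(days=2), timedelta(days=7))):
        print(app.name, "->", assign_tier(app))
```

An application whose objectives are tighter than any tier can deliver raises an error, which flags it for a dedicated solution rather than silently under-protecting it.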
Types of disaster recovery
In addition to choosing a DR site and considering DR tiers, IT and business leaders must evaluate the best way to put their DR plan into action. This will depend on the IT environment and the technology the business chooses to support its DR strategy.
Types of DR vary based on the IT infrastructure and assets that need protection as well as the method of backup and recovery the organization decides to use. Depending on the size and scope of the organization, it may have separate DR plans and implementation teams specific to departments such as data centers or networking. Major types of DR include:
Data center disaster recovery
Organizations that house their own data centers must have a DR strategy that considers all the IT infrastructure within the data center as well as the physical facility. Backup to a failover site at a secondary data center or a colocation facility is often a large part of the plan (see "Disaster recovery sites" above). IT and business leaders should also document and make alternative arrangements for a wide range of facilities-related components including power systems, heating and cooling, fire safety and physical security.
Network disaster recovery
Network connectivity is essential for internal and external communication, data sharing and application access during a disaster. A network DR strategy must provide a plan for restoring network services, especially in terms of access to backup sites and data.
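Part of validating a network DR strategy is confirming that services at the backup site are actually reachable. The Python sketch below uses the standard library's `socket.create_connection` to test TCP connectivity; the function names are hypothetical, and a successful connection shows only that a port is open, not that the application behind it is healthy.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    The timeout applies to the connection attempt; DNS lookup time is not bounded by it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures
        return False


def check_endpoints(endpoints: list[tuple[str, int]]) -> dict[tuple[str, int], bool]:
    """Probe each (host, port) pair and report which ones are reachable."""
    return {(host, port): is_reachable(host, port) for host, port in endpoints}
```

For example, `check_endpoints([("dr-vpn.example.com", 443), ("dr-db.example.com", 5432)])` would map each placeholder endpoint to `True` or `False`.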
Virtualized disaster recovery
Virtualization enables DR by allowing organizations to replicate workloads in an alternate location or the cloud. The benefits of virtual DR include flexibility, ease of implementation, efficiency and speed. Virtualized workloads have a small IT footprint, replication can be done frequently, and failover can be initiated quickly. Several data protection vendors offer virtual backup and DR as a product.
Cloud disaster recovery
The widespread acceptance of cloud services allows organizations that traditionally used an alternate physical location for DR to host their DR environment in the cloud instead. Cloud DR goes beyond simple backup to the cloud. It requires an IT team to set up automatic failover of workloads to a public cloud platform in the event of a disruption.
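Automatic failover usually hinges on a health check. The sketch below is a hypothetical, simplified controller: it switches to a secondary (for example, cloud) site only after a configurable number of consecutive failed checks, so a single transient error does not trigger a failover. `check_primary` stands in for whatever probe a real deployment would use, and the code only tracks which site should be active; actually redirecting traffic (through DNS or a load balancer, for instance) is outside its scope.

```python
from typing import Callable


class FailoverController:
    """Tracks the active site and fails over after repeated health-check failures."""

    def __init__(self, check_primary: Callable[[], bool], failure_threshold: int = 3):
        if failure_threshold < 1:
            raise ValueError("failure_threshold must be at least 1")
        self.check_primary = check_primary  # hypothetical probe: True means healthy
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0
        self.active_site = "primary"

    def poll(self) -> str:
        """Run one health check (while on the primary) and return the active site."""
        if self.active_site == "primary":
            if self.check_primary():
                self.consecutive_failures = 0  # a healthy check resets the count
            else:
                self.consecutive_failures += 1
                if self.consecutive_failures >= self.failure_threshold:
                    self.active_site = "secondary"  # fail over
        # Once on the secondary, stay there; failback is a manual decision here
        return self.active_site


if __name__ == "__main__":
    # Simulated probe results: one blip, recovery, then a sustained outage
    results = iter([True, False, False, True, False, False, False])
    controller = FailoverController(lambda: next(results), failure_threshold=3)
    print([controller.poll() for _ in range(7)])  # six 'primary' entries, then 'secondary'
```

Failing back to the primary site is deliberately left manual in this sketch, since teams often want to verify the primary before returning workloads to it.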
Disaster recovery as a service (DRaaS)
DRaaS is the commercially available version of cloud DR. In DRaaS, a third party provides replication and hosting of an organization's physical and virtual servers. The provider assumes responsibility for implementing the DR plan when a crisis arises, based on a service-level agreement.
Disaster recovery services and vendors
Disaster recovery vendors can take many forms, as DR is more than just an IT issue. DR vendors include those selling backup and recovery software as well as those offering hosted or managed services. Because DR is also an element of organizational risk management, some vendors couple disaster recovery with other aspects of security planning, such as incident response and emergency planning. Options include:
- Backup and data protection platforms
- DRaaS providers
- Add-on services from data center and colocation providers
- Infrastructure as a service providers
Choosing the best option for an organization will ultimately depend on its top-level business continuity plans and data protection goals, and on which option best meets those needs within its budget.
Some of the major disaster recovery software and DRaaS providers include, but are not limited to:
- Dell EMC
Emergency communication vendors are also a key part of the recovery process, and include Everbridge Crisis Management, Cisco, Rave Alert, AlertMedia and BlackBerry AtHoc.
While some organizations may find it a challenge to invest in comprehensive disaster recovery planning, none can afford to ignore the concept when planning for long-term growth and sustainability. Additionally, if the worst were to happen, organizations that have prioritized DR will experience less downtime and be able to resume normal operations faster.