Creating a disaster recovery plan involves more than just making a backup that can be restored following a data loss event. If a disaster recovery plan is to be successful, then it must be carefully constructed to account for any risks to the organization and provide solutions that will allow the organization to operate normally in times of crisis. Here are 12 key steps in preparing for such disasters.
1. Take inventory of hardware and software
The first step in creating any disaster recovery plan is to take a comprehensive hardware and software inventory. It's impossible to adequately plan for disaster response unless you have identified the resources that need to be protected. Such an inventory should also list any assets that might be useful in either preventing or recovering from a disaster.
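As an illustration only, an inventory like this can be kept as structured records rather than free-form notes, which makes it easy to list what needs protecting and what could help in a recovery. The sketch below is a minimal example under assumed conventions: the `Asset` fields and every asset name are hypothetical and would need to reflect your own environment.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """One inventory entry. All field names here are illustrative assumptions."""
    name: str
    kind: str                   # "hardware" or "software"
    location: str
    critical: bool              # True if the plan must protect this asset
    recovery_aid: bool = False  # True if useful for preventing or recovering from a disaster


# Hypothetical sample entries, not taken from any real environment.
inventory = [
    Asset("db-server-01", "hardware", "primary data center", critical=True),
    Asset("payroll-app", "software", "db-server-01", critical=True),
    Asset("backup-appliance", "hardware", "primary data center",
          critical=False, recovery_aid=True),
    Asset("test-workstation", "hardware", "office", critical=False),
]


def assets_to_protect(assets):
    """Return the names of assets the plan must protect."""
    return [a.name for a in assets if a.critical]


def recovery_aids(assets):
    """Return the names of assets that could help prevent or recover from a disaster."""
    return [a.name for a in assets if a.recovery_aid]


print(assets_to_protect(inventory))
print(recovery_aids(inventory))
```

In practice the same records could come from an asset management tool or a spreadsheet export; the point is only that each entry states whether it needs protection and whether it can aid recovery.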
2. Determine equipment needs
Once you've created a hardware and software inventory, the next step is to determine your equipment needs. For example, you might find that you need a higher-capacity backup target, or you might need to install additional servers to provide redundancy for various workloads. Your goal during this phase of the planning process should be to identify which disaster mitigation and recovery strategies currently exist and where you could overcome deficiencies by adding equipment to your data center.
3. Set up RTO and RPO
Another key step in preparing a disaster recovery plan is to define a recovery time objective (RTO) and a recovery point objective (RPO). The RTO defines the maximum amount of time that recovery from a disaster should take. The RPO defines the maximum amount of data, measured in time, that the organization can afford to lose. The RPO therefore determines how frequently data must be backed up, because any data created since the most recent backup could be lost in a disaster.
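To make the link between backup frequency and these objectives concrete, here is a small sketch. It assumes the simplest possible model: the worst-case data loss equals the interval between backups, and the recovery time is a single estimate you supply. The function names and the four-hour and eight-hour targets are hypothetical values chosen for illustration.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """If disaster strikes just before the next backup runs, everything
    written since the previous backup is lost (simplifying assumption)."""
    return backup_interval


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """True if the backup schedule keeps potential data loss within the RPO."""
    return worst_case_data_loss(backup_interval) <= rpo


def meets_rto(estimated_recovery_time: timedelta, rto: timedelta) -> bool:
    """True if the estimated recovery time fits within the RTO."""
    return estimated_recovery_time <= rto


# Hypothetical objectives for a single workload.
rpo = timedelta(hours=4)
rto = timedelta(hours=8)

print(meets_rpo(timedelta(hours=24), rpo))  # nightly backups: False
print(meets_rpo(timedelta(hours=1), rpo))   # hourly backups: True
print(meets_rto(timedelta(hours=6), rto))   # six-hour restore estimate: True
```

Under these assumed targets, a nightly backup cannot satisfy a four-hour RPO, so either the backup frequency or the objective would have to change.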
Some organizations establish a formal service level agreement (SLA) guaranteeing that the established RPO and RTO will be met. Unless an organization requires a formal SLA, you should carefully consider whether or not creating an SLA is in your best interest. On one hand, an SLA could potentially work against you. IT pros have been fired for not meeting an SLA during a recovery process. On the other hand, an SLA guarantees that you'll have a certain amount of time to work through the recovery process and can prevent upper management from claiming you didn't restore everything quickly enough.
4. Conduct risk assessment and business impact analysis
As you work toward creating a disaster recovery plan, it's important to conduct a risk assessment and business impact analysis. The goal is to identify risks to your organization's ability to do business, and then quantify those risks based on how likely they are to occur and how severely they would affect your organization. Natural disasters, for example, are likely to occur at some point and could be devastating to the organization. As such, natural disasters are something that you should definitely plan for.
Fire is another type of disaster that you should plan for. If this disaster occurs, it could severely harm your organization, so it's important to include the potential for fire among your disaster planning efforts. Of course, natural disasters and fire are just two examples of adverse situations that can impact an organization. A good disaster recovery plan identifies potential risks beyond just fire or natural disaster.
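One common way to quantify risks by likelihood and severity is a simple scoring matrix. The sketch below assumes a 1-to-5 scale for each factor and multiplies them to get a score; the scale, the `Risk` class, and every number shown are hypothetical placeholders, not measurements or recommendations.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    severity: int    # assumed scale: 1 (minor) to 5 (devastating)

    @property
    def score(self) -> int:
        """Likelihood multiplied by severity."""
        return self.likelihood * self.severity


def rank_risks(risks):
    """Return risks ordered from highest score to lowest."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Placeholder ratings for illustration only.
risks = [
    Risk("fire", likelihood=2, severity=5),
    Risk("natural disaster", likelihood=3, severity=5),
    Risk("storage array failure", likelihood=4, severity=3),
    Risk("power outage", likelihood=4, severity=2),
]

for risk in rank_risks(risks):
    print(f"{risk.name}: {risk.score}")
```

A ranking like this is only a starting point for discussion. A low-likelihood risk with devastating severity, such as fire in this example, may still deserve planning even if it does not top the list.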
5. Identify roles and responsibilities for the team
A good disaster recovery plan can only succeed if an organization's disaster recovery team is prepared to execute that plan. Determine who will be a part of the disaster recovery team, and what each person's responsibilities will be. This must be done in a way that ensures that each disaster recovery team member knows exactly what is expected of them in times of crisis.
6. Outline and detail prevention and mitigation measures
The best disaster recovery plan is the one that prevents a disaster from happening in the first place. Organizations should look for ways to prevent disasters from impacting their business. For example, automated fire suppression systems might mean the difference between a small fire and a large fire that destroys the entire data center. Similarly, redundant storage arrays might prevent certain types of data loss events.
7. Choose disaster recovery sites
A big part of disaster recovery planning is to form a business continuity plan. A business continuity plan defines how the organization will function after a disaster. One of the primary aspects of any business continuity plan is to define a disaster recovery site. If an organization's primary data center is impacted by a disaster, then key workloads must be able to fail over to an alternate location where they can continue to run. This might be a secondary data center, or it could be a location in the public cloud. In any case, organizations must decide what they'll use as their recovery site.
8. Outline and detail response procedures
A good disaster recovery plan should include steps used to recover from a disaster. It's one thing for a disaster recovery plan to indicate that a certain workload should be moved to a public cloud if it becomes impossible to run it in its normal location. It's an entirely different thing for the disaster recovery plan to include the actual steps that a technician will need to perform to move the workload.
There should be no ambiguity in times of disaster. Those who are responsible for recovering from the disaster should ideally have a list of very specific steps that they can take to complete the required task. Otherwise, the stress involved in trying to recover from a disaster is likely to cause technicians to forget to perform a necessary step in the process.
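As a rough sketch of what an unambiguous list of steps could look like in practice, a procedure can be recorded as an ordered checklist that always tells the technician which step comes next. The `Runbook` and `Step` classes and the sample steps below are hypothetical; a real procedure would contain the exact commands and checks for your environment.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    description: str
    done: bool = False


@dataclass
class Runbook:
    """An ordered recovery procedure (illustrative structure only)."""
    title: str
    steps: list = field(default_factory=list)

    def next_step(self):
        """Return (step number, step) for the first unfinished step, or None."""
        for number, step in enumerate(self.steps, start=1):
            if not step.done:
                return number, step
        return None

    def complete_next(self) -> int:
        """Mark the first unfinished step as done and return its number."""
        pending = self.next_step()
        if pending is None:
            raise RuntimeError("all steps are already complete")
        number, step = pending
        step.done = True
        return number


# Hypothetical procedure for moving one workload to a recovery site.
runbook = Runbook("Fail over a workload to the recovery site", [
    Step("Confirm that the primary site is unavailable"),
    Step("Restore the latest backup at the recovery site"),
    Step("Redirect users to the recovery site"),
    Step("Verify that the workload is functioning"),
])

number, step = runbook.next_step()
print(f"Step {number}: {step.description}")
```

Because steps can only be completed in order, a technician working under stress cannot silently skip one.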
9. Create a crisis communication plan
An often overlooked step in a disaster recovery plan is to create a crisis communication plan. Having stakeholders constantly pestering the IT staff about the progress it's making in recovering from a disaster only adds to the stress of the situation and does nothing to expedite the recovery process. That being the case, someone from IT should be designated as the point person who will relay key information to stakeholders throughout the organization.
10. Plan for failback
If a disaster impacts an organization's primary site, the location at which workloads normally run, and those workloads must be moved to an alternate location, there should be a plan for bringing those workloads back to their original location once the disaster is over. Although some organizations have used disasters as an excuse to permanently migrate workloads to the cloud, disaster-induced migrations should be the rare exception, not the rule. Presumably, workloads are running in their current location for a reason, and they should ideally be brought back to that location once it's safe to do so.
11. Prepare what to say to the press
When a large organization suffers a major outage, it often draws attention from the outside world. That being the case, it's important to have a plan for dealing with the press. Statements to the press must be carefully constructed, because they can affect the organization's stock price and future revenue and, depending on the type of disaster, can even expose the organization to civil or regulatory penalties.
12. Test the plan regularly and update it to ensure effectiveness
Finally, make sure that the disaster recovery plan is regularly tested. Even a perfect disaster recovery plan won't be perfect forever. Organizations change over time. They adopt new workloads and retire old workloads. Changing technology and business needs will eventually render a disaster recovery plan obsolete. Continual testing is the only way of making sure that an organization's disaster recovery plan keeps pace with the changes that are occurring throughout the organization.