Technology disaster recovery is an essential activity for organizations of any size and business type. Protection of technology assets from unplanned disruptions ensures that the organization will continue to function.
Disaster recovery plans give IT departments the policies and procedures to quickly respond to incidents, launch remedial actions and recover critical systems and data so the business can return to normal operations. To build a comprehensive and actionable DR plan, organizations and their disaster recovery teams must take many different aspects into account.
One major aspect of technology disaster recovery is asking the right questions. This may include general questions on DR planning, questions about service providers and questions for senior management. Below are 10 disaster recovery questions for 2023 and thereafter. They include technology considerations, threat awareness and preparation, and determining if the plan effectively addresses specific event scenarios.
1. Does senior management understand the DR program and its value to the business?
While no C-level executive wants to be faced with a loss of IT resources, senior leaders in all major business units must be aware of the DR program, what it is supposed to do and how it can help ensure the continued operation of the business. Senior management will probably be regular users of technology and various IT resources, yet they might not understand what is needed to keep those resources running.
The DR team -- and, just as importantly, C-level executives -- must be able to reinforce the importance of the disaster recovery program to other senior leaders. As such, it is incumbent on DR team leadership not only to sell DR to the IT department but also to communicate its value to all other departments.
2. Is the DR team prioritizing the most important IT assets for recovery?
Before a plan can be developed, DR team members must perform a series of other activities. These include a business impact analysis (BIA) and a risk analysis.
When conducting a BIA, the organization gathers process-level information about the various business functions performed by the company. A thorough BIA must identify the following:
- Systems, data and databases, network services and other IT resources the organization uses to support business processes.
- Internal and external dependencies each business unit has with regard to technology.
- Key recovery performance metrics, such as recovery time objectives and recovery point objectives.
- Financial, operational and reputational impacts from a loss of IT resources.
- People resources needed to manage recovery.
- The priority of recovery based on the identified criticality of each IT resource.
- Primary and alternate vendor resources.
The risk analysis, also referred to as a risk assessment, takes information gathered in the BIA and examines risks, threats and vulnerabilities the company faces that could disrupt IT operations. Risk assessments are especially important for mission-critical functions and those with the shortest recovery time frames. A regularly updated inventory of IT assets can also be beneficial for DR planners.
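As a rough illustration of how BIA output can drive recovery priority, the following Python sketch sorts assets by criticality and then by recovery time objective. The `Asset` fields, the 1-to-3 criticality scale and the sample values are assumptions made for this example, not part of any standard.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    criticality: int   # hypothetical scale: 1 = mission-critical, 3 = low
    rto_hours: float   # recovery time objective
    rpo_hours: float   # recovery point objective


def recovery_order(assets):
    """Most critical assets first; ties broken by the shortest RTO."""
    return sorted(assets, key=lambda a: (a.criticality, a.rto_hours))


# Illustrative inventory; names and numbers are invented for the example.
assets = [
    Asset("intranet wiki", 3, 72, 24),
    Asset("order database", 1, 4, 0.25),
    Asset("email", 2, 8, 1),
    Asset("payment gateway", 1, 1, 0.25),
]

for position, asset in enumerate(recovery_order(assets), start=1):
    print(position, asset.name, f"RTO {asset.rto_hours}h")
```

In practice, the ranking criteria would come from the BIA itself and might also weigh dependencies between systems, which this sketch ignores.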
3. What is the backup strategy for systems, data and other critical resources?
Assuming the company has a data management program, a backup and recovery plan should be part of those processes. DR teams, in collaboration with other IT personnel, should identify the most appropriate approach for backups. They should determine what must be backed up, the frequency of backups, where data backups will be located, the network bandwidth needed for recovery, and how data will be retrieved and then recovered for use in production.
Access to cloud-based backup and recovery services makes the entire process easier, but DR teams must still consider issues such as speed of retrieval and recovery when adopting cloud processes. Review and test primary and backup power systems regularly to ensure they are working properly. Keep up to date on patching, especially for systems involved with security and other protective resources. An overall backup and recovery strategy, with associated policies and procedures, is an essential resource to establish before developing a DR plan.
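To make the bandwidth and retrieval-speed question concrete, the sketch below estimates how long a restore would take over a given link and compares it with a recovery time objective. The function names and the 70% link efficiency default are assumptions for illustration; real restores also depend on storage throughput, provider limits and restore tooling.

```python
def restore_hours(data_gb: float, bandwidth_mbps: float, efficiency: float = 0.7) -> float:
    """Estimate hours needed to transfer data_gb over a bandwidth_mbps link.

    efficiency is an assumed fraction of nominal bandwidth actually achieved.
    """
    if bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("bandwidth must be positive and efficiency in (0, 1]")
    megabits = data_gb * 8 * 1000          # decimal units: 1 GB = 8,000 megabits
    seconds = megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


def meets_rto(data_gb: float, bandwidth_mbps: float, rto_hours: float,
              efficiency: float = 0.7) -> bool:
    """True if the estimated restore time fits within the RTO."""
    return restore_hours(data_gb, bandwidth_mbps, efficiency) <= rto_hours


# Example: 2 TB of backup data over a 1 Gbps link, against a 4-hour RTO.
estimate = restore_hours(2000, 1000)
print(f"Estimated restore time: {estimate:.1f} hours")
print("Fits 4-hour RTO:", meets_rto(2000, 1000, 4))
```

If the estimate exceeds the RTO, the options are typically more bandwidth, less data to restore, or a faster recovery source.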
4. Has the organization identified and validated disaster scenarios?
During the risk analysis, DR teams identify risks, threats and vulnerabilities and assess their effect on the organization. From this work, DR teams can develop specific scenarios to help formulate procedures to follow if such events were to occur. For example, high-likelihood risks might include severe weather, such as snow, ice, tornados and flooding. By contrast, if the company’s location is not in an earthquake zone, that scenario can be moved to a lower position on the list.
Scenarios where some advance warning is possible, such as weather events, provide a window of time to prepare for possible technology disruptions. The key in scenario planning is to identify situations that present the most significant negative consequences to the company. These are high on the list and should be a primary focus of DR procedure development.
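One common way to rank scenarios is a simple likelihood-times-impact score. The Python sketch below assumes a 1-to-5 scale for both factors; the scale and the sample ratings are illustrative only and would come from the organization's own risk analysis.

```python
# Each scenario maps to (likelihood, impact), both on an assumed 1-5 scale.
scenarios = {
    "earthquake": (1, 5),
    "flood": (2, 4),
    "severe winter storm": (4, 3),
    "tornado": (2, 5),
}


def risk_score(rating):
    """Score a scenario as likelihood multiplied by impact."""
    likelihood, impact = rating
    return likelihood * impact


def rank_scenarios(scenarios):
    """Return scenario names ordered from highest to lowest risk score."""
    return sorted(scenarios, key=lambda name: risk_score(scenarios[name]), reverse=True)


for name in rank_scenarios(scenarios):
    print(f"{name}: score {risk_score(scenarios[name])}")
```

With these sample ratings, the low-likelihood earthquake scenario falls to the bottom of the list even though its impact rating is high.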
5. Who is responsible for executing DR plans?
While the answer is, logically, the DR team, other employees can and should participate. These members might include senior managers in mission-critical departments or employees responsible for important business functions.
Each of these designated employees needs to identify one or more alternates who can take over if the primary employee is unavailable. Each participant must be trained on the DR plan, especially on the systems and resources needed to function. One or more senior-level employees must be designated as the person(s) authorized to declare a disaster and launch DR activities. These same people must be authorized to launch business continuity plans in case the event escalates beyond IT to one that threatens the whole business.
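The staffing rules above lend themselves to a simple automated check of the DR roster. The following Python sketch flags roles with no alternate and a roster with no one authorized to declare a disaster. The `Role` structure and its field names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    title: str
    primary: str
    alternates: list = field(default_factory=list)
    can_declare_disaster: bool = False


def roster_gaps(roles):
    """Return a list of human-readable problems found in the DR roster."""
    gaps = [f"{r.title}: no alternate named" for r in roles if not r.alternates]
    if not any(r.can_declare_disaster for r in roles):
        gaps.append("no one is authorized to declare a disaster")
    return gaps


# Illustrative roster; titles and names are invented for the example.
roster = [
    Role("DR coordinator", "A. Rivera", ["B. Chen"], can_declare_disaster=True),
    Role("Database recovery lead", "C. Okafor"),
    Role("Network recovery lead", "D. Patel", ["E. Novak"]),
]

for gap in roster_gaps(roster):
    print("Gap:", gap)
```

A check like this could run whenever the roster changes, so staffing gaps surface before an incident rather than during one.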
6. How do vendors and other third parties fit into the DR plan?
Considering that most IT departments work with multiple vendors for just about everything in their data centers, disaster recovery teams should also have questions prepared for service providers. Depending on how the IT infrastructure is architected, vendors can play different roles in a disaster. For example, if cloud vendors handle data storage, backups, and retrieval and recovery, the DR team must understand vendor responsibilities as well as their own during a disaster.
Often these responsibilities are spelled out in a service-level agreement. If IT teams want greater control of recovery and restoration activities, then they must not delegate those activities outside the company. In the case of data backup and recovery, an example of this is a hybrid configuration where critical systems and data are backed up locally on NAS or RAID devices as well as to a cloud service. The local NAS/RAID data becomes the primary recovery resource since the recovery time is likely to be faster than from a cloud.
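The hybrid approach can be expressed as a simple selection rule: use the fastest recovery source that is currently available, and fall back to the next one otherwise. The Python sketch below assumes each source carries an availability flag and an estimated restore time; both fields and the sample figures are hypothetical.

```python
def pick_recovery_source(sources):
    """Return the available source with the lowest estimated restore time.

    Each source is a dict with 'name', 'available' and 'est_restore_hours'.
    Returns None if no source is available.
    """
    usable = [s for s in sources if s["available"]]
    return min(usable, key=lambda s: s["est_restore_hours"], default=None)


# Illustrative sources: a local NAS copy and a cloud backup service.
sources = [
    {"name": "local NAS", "available": True, "est_restore_hours": 1.5},
    {"name": "cloud backup", "available": True, "est_restore_hours": 6.0},
]

chosen = pick_recovery_source(sources)
print("Recover from:", chosen["name"] if chosen else "no source available")
```

If the local device is lost in the same event that took down production, the rule falls through to the cloud copy, which is the reason for keeping both.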
7. How will staff manage IT operations if the data center is inaccessible?
The COVID-19 pandemic demonstrated the importance and strategic value of remote access to IT resources. Advances in network connectivity, such as VPNs and greater bandwidth, have greatly simplified the issues of remote work.
The same applies to remote data center management. If, for example, a cloud service is backing up IT resources -- especially critical data and applications or VMs -- it should be easy to remotely log into the cloud-deployed systems. Users should be able to remotely manage production systems and email as usual, assuming no network outages have occurred.
8. How will the plan meet compliance standards, regulations and other requirements?
Depending on the type of business, it could be necessary to document that all IT data protection and operational recovery activities comply with applicable standards and regulations. These can be at the local, state, federal and international levels. Not only is knowledge of the applicable standards essential, but failure to comply can result in litigation, fines and other penalties.
Compliance is also important from an audit perspective, since auditors seek evidence that the organization meets applicable standards. Regular reviews and assessments of DR plans must include an examination of how well the plans comply with standards and regulations.
9. How can the DR team ensure the plan achieves its goals when executed?
All too often, disaster recovery plans are created and perhaps tested once, but the DR team might not be certain the plan will run successfully when the time comes. Regular testing is perhaps the most important part of disaster recovery planning. An untested plan has a greater chance of causing problems than one that is frequently tested. DR teams must always ask, “Can this plan be executed, and will it provide the required results when activated?”
Scenario testing is another viable way to examine the efficacy of DR plans. Base the test on the occurrence of a specific event, such as a fire, flood or severe weather. Run through the plan’s steps to determine how well it addresses the scenario.
Some organizations take an all-hazards approach to testing. While this might result in a more robust and adaptable plan, it also can incur extra costs, such as additional backup resources and cloud recovery capabilities. The best strategy is to test at least twice a year, and quarterly if possible; test critical system recovery more frequently; and update DR plans after tests have been completed.
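A testing cadence is easier to keep when overdue tests are flagged automatically. The Python sketch below assumes two tiers, with critical systems tested every 90 days and everything else every 182 days. The tier names, intervals and sample dates are assumptions for illustration.

```python
from datetime import date, timedelta

# Assumed maximum days between tests for each tier.
TEST_INTERVAL_DAYS = {"critical": 90, "standard": 182}


def overdue_tests(last_tested, today):
    """Return sorted names of plans whose last test is older than the tier allows.

    last_tested maps a plan name to a (tier, date_of_last_test) tuple.
    """
    overdue = []
    for name, (tier, tested_on) in last_tested.items():
        if today - tested_on > timedelta(days=TEST_INTERVAL_DAYS[tier]):
            overdue.append(name)
    return sorted(overdue)


# Illustrative test log; plan names and dates are invented for the example.
test_log = {
    "payroll recovery": ("critical", date(2024, 3, 1)),
    "wiki recovery": ("standard", date(2024, 2, 1)),
    "email recovery": ("standard", date(2023, 12, 1)),
}

print(overdue_tests(test_log, today=date(2024, 7, 1)))
```

The output of a check like this can feed the DR team's regular review, alongside a reminder to update the plan after each completed test.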
10. Are funds available to pay for emergency IT recovery activities?
With all the various initiatives that can be taken in disaster recovery, almost everything comes back to the costs to achieve the desired outcomes. Technology investments have been discussed here, and DR teams must also consider people costs, vendor costs, costs for external help, testing costs and even emergency funds to procure equipment in a critical situation. When preparing a budget for DR activities, it can be useful to secure approval of emergency funds by senior management in advance.