Editor's note: This article was expanded and updated in November 2017.
Most organizations depend on voice and data communications carried over LANs and WANs. With such a critical and strategic investment in networking, how do you protect it from unplanned disruptions due to carrier problems or equipment malfunctions? And how do you know your network infrastructure is secure and protected from unauthorized access, viruses or attacks by hackers? Critical network infrastructures and associated assets must be protected with disaster recovery plans.
In this article and its associated network disaster recovery plan template, we'll examine the issues that should be addressed when preparing and deploying a network disaster recovery plan for voice and data communications.
Network disaster recovery planning is not always a priority
When it comes to network infrastructures, disaster recovery planning often isn't a high priority. Network security, by contrast, usually is, because a porous perimeter spells doom for most organizations. Preventing unauthorized access by hackers and other criminals and halting the introduction of viruses and denial-of-service attacks are the kinds of issues that get management attention.
On the voice side, growing acceptance of voice over IP (VoIP) technology has increased the importance of robust network security initiatives. Because VoIP is simply another application using existing network resources, it has vulnerabilities that must be addressed in the same way as those of other network-based systems.
Older PBX systems typically used separate network facilities and did not overlap with data networks. However, as it became more cost-effective to carry voice and data traffic together over digital T-1 lines, the risks to voice communications systems increased.
Why networking requires its own specific DR plan
Network disaster recovery planning is critical for enterprise LANs, no matter how large the organization or potential disaster. And you require a network-specific DR plan for the same reasons network security has become such a high priority. With voice, data, internet access and other network services often sharing the same network resources, it is essential to not only protect network access lines and the interface devices -- routers and switches -- that support these services, but to be able to get those network resources back up and running in the event of an interruption.
A resilient network environment is also indispensable for providing business continuity. It enables you to access services, local or remote physical or virtual servers, and storage, irrespective of location: a central data center, colocation facility, managed service provider or the cloud. Business continuity is particularly crucial for branch or remote offices that require remote access to information in the public cloud or at company headquarters.
Getting started with network disaster recovery planning
Before you get started creating your network disaster recovery plan, read these important caveats:
- Take the disaster recovery planning process seriously. If you want to protect your network infrastructures and related assets from unplanned events that could disrupt network operations, you need a plan. It doesn't have to be hundreds of pages long; a one-page plan with the right information can be more valuable than a voluminous document that nobody can use.
- Use business continuity standards as a starting point. Almost two dozen BC/DR standards are available worldwide.
- Keep it simple. Your plan will need to reflect how your voice, data, internet and wireless networks are configured, but it should carry no more structure and complexity than that configuration demands.
- Limit content to actual disaster response actions. Assuming you are creating a plan to respond to specific network-related incidents, include only the information needed for the response and subsequent recovery.
- Keep the plan up to date and test often. Once the plan is complete, exercise the plan at least twice annually -- more often if your network configuration changes frequently -- to ensure that documented procedures make sense in the sequence indicated.
- Be flexible. A single disaster recovery template may not be applicable to all networks, especially if your organization has many corporate locations served by the network and multiple data centers. You may want to consider more complex templates, specialized network DR software or consultants experienced in network disaster recovery.
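The advice to exercise the plan at least twice a year, and again when the network configuration changes, can be turned into a simple reminder check. The following is a minimal sketch, not a prescribed tool: the function name `needs_exercise`, the 182-day interval used to approximate "twice annually" and the idea of tracking a last configuration change date are all assumptions made for illustration.

```python
from datetime import date
from typing import Optional

# Assumption: "at least twice annually" is approximated as a maximum
# gap of 182 days between plan exercises.
MAX_DAYS_BETWEEN_EXERCISES = 182


def needs_exercise(
    last_exercise: date,
    today: date,
    last_config_change: Optional[date] = None,
) -> bool:
    """Return True if the network DR plan should be exercised again.

    The plan is due when the last exercise is older than the allowed gap,
    or when the network configuration changed after the last exercise
    (hypothetical rule based on "more often if your configuration changes").
    """
    overdue = (today - last_exercise).days > MAX_DAYS_BETWEEN_EXERCISES
    changed_since = (
        last_config_change is not None and last_config_change > last_exercise
    )
    return overdue or changed_since
```

A check like this could run from whatever scheduling or ticketing system your team already uses; the thresholds should come from your own BC/DR policy rather than from this sketch.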
Network disaster recovery plan components
Next, we'll examine the structure and content of the network disaster recovery plan template, indicating key issues to address and activities to perform.
- Initial data. Once you have identified primary and backup networking staff to contact in a network disruption, position their contact data at the front of the plan so you won't have to waste valuable seconds paging through a lengthy document.
- Revision management. Have a page that reflects your change management process.
- Purpose and scope. Provide details about these attributes, as well as assumptions, team descriptions, a list of terms and other background information.
- Emergency instructions on how to activate the plan. Provide data on circumstances under which the plan will be activated, including outage time frames, who declares a disaster, who should be contacted and response procedures to be used.
- Policy information. If the IT department has a BC/DR policy, be sure to include relevant policy information; this is also a good place to reference the use of standards documents.
- Details of the plan. If possible, provide step-by-step procedures, as these are easier to follow than broad general statements, such as "reconfigure network channels to alternate location," which may require significant detail to complete properly. In addition, describe how often the plan is to be reviewed and updated, and by whom.
- Checklists and flow diagrams. Assuming a network disruption has occurred, identify steps to address it; these can be in the form of checklists and flow diagrams that provide a high-level view of response and recovery.
- Information gathering. Information needs to be gathered before officially declaring a network disruption; this includes network performance data and firsthand reports from IT staff, employees and, if needed, first responders. Convene meetings as soon as possible with key IT network emergency team members to evaluate the facts before proceeding to a declaration.
- Declaring a disaster. Once the initial facts about the network disruption are obtained, the plan should list actions to take when it becomes necessary to declare a network disaster.
- Recovering from a disaster. Once the situation has been brought under control, subsequent parts of the plan should provide instructions on recovering and restoring network operations, restoring network connectivity devices, and related activities.
- Appendixes. Detailed appendixes should be provided at the end of the template; these should include lists and contact details on all IT and non-IT emergency teams, primary and alternate network vendors, alternate network configuration data, and other relevant information. It is very important to keep this information up to date.
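The activation, information-gathering and declaration components above describe a small decision process: confirm the facts, compare the outage against an agreed time frame, then escalate to whoever is authorized to declare a disaster. As a rough sketch of how those criteria might be written down unambiguously, the example below uses hypothetical names (`ActivationCriteria`, `DisruptionReport`, `recommend_action`) and a made-up 60-minute threshold; your plan would supply its own values and roles.

```python
from dataclasses import dataclass


@dataclass
class ActivationCriteria:
    """Activation rules taken from the plan (values here are illustrative)."""
    max_outage_minutes: int  # outage time frame that triggers activation
    declared_by: str         # role authorized to declare a disaster


@dataclass
class DisruptionReport:
    """Facts gathered before a declaration is considered."""
    service: str
    minutes_down: int
    confirmed_by_team: bool  # has the network emergency team evaluated the facts?


def recommend_action(report: DisruptionReport, criteria: ActivationCriteria) -> str:
    """Suggest the next step; the actual declaration remains a human decision."""
    if not report.confirmed_by_team:
        return "Gather more information before declaring."
    if report.minutes_down >= criteria.max_outage_minutes:
        return f"Escalate to {criteria.declared_by} to declare a network disaster."
    return "Continue monitoring; activation threshold not reached."


# Hypothetical example values.
criteria = ActivationCriteria(max_outage_minutes=60, declared_by="IT director")
report = DisruptionReport(service="WAN link", minutes_down=75, confirmed_by_team=True)
action = recommend_action(report, criteria)
```

Writing the criteria down this explicitly, even if only in a table rather than code, makes it harder for the team to disagree during an incident about whether the plan should be activated.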
Why a recovery site is necessary
As with all things DR related, a recovery site is an important component of any network DR plan; it is where you can recover and restore your IT infrastructure and operations when your primary data center becomes unavailable. There are two types of recovery sites: external and internal.
An internal site normally consists of a second data center owned and operated by the company itself, which it can depend on to recover and resume operations should disaster befall its primary data center. Organizations with aggressive recovery time objectives and large information requirements tend to go with an internal DR recovery site.
An external site is owned and operated by an outside provider. There are three types of external recovery sites: hot, warm and cold. Hot sites are fully staffed, functional and ready to go in the event of a disaster. Warm sites have some or all of the necessary hardware, software, network services and personnel, but not the data. A cold site, which can be used to complement hot and warm sites, is an option for when a disaster lasts for an extended period. It has the infrastructure to support data and IT systems, but doesn't include the specific technologies necessary to run business operations until an organization installs that equipment and activates its DR plan.
Common mistakes of network disaster recovery planning
Redundancy and diversity are the fundamental components when planning a resilient and survivable data and communication network. With that in mind, businesses often make a few common internal and external mistakes when preparing a network DR plan.
Externally, companies often don't take the time to closely examine the network infrastructure of their primary and alternate carriers outside of their building when assessing the redundancy and resilience of voice and data networks. Where does service enter, and is there just a single entry point? Is service delivered using overhead wires or underground? If the former, where are the poles located, and are they exposed to vehicle traffic? If the latter, what type of conduit does the carrier use to carry service to the building: Is the cable simply buried or is it safely tucked away in a hardened conduit? What level of transport diversity to and from the building does the local telephone company provide? If a path is blocked, will voice and data service continue over another route?
Internally, make sure you design your network infrastructure with diversity and redundancy in mind. That way, there will be no single points of failure that can take the network down. Are your network connectivity devices configured in a redundant arrangement, and are there extra switches, routers and other network devices available? Do you use more than one internet service provider (ISP) or have an alternate ISP ready to go should service from your primary ISP go down? What level of diversity -- or physically separate connections -- does your primary ISP provide to deliver service?
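One way to check a design for single points of failure is to model the network as a graph and look for devices whose loss would split it, which graph theory calls articulation points. The sketch below is an illustration under stated assumptions, not a replacement for a proper network assessment: the device names and topology are invented, the function name `single_points_of_failure` is hypothetical, and parallel links between the same two devices are collapsed into one, so link-level redundancy is not modelled.

```python
from collections import defaultdict


def single_points_of_failure(links):
    """Return devices whose failure would disconnect the topology.

    `links` is an iterable of (device_a, device_b) pairs. The search is the
    standard depth-first articulation-point algorithm; recursion is fine for
    the small topologies this sketch is meant for.
    """
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)

    discovered = {}  # device -> order in which it was first visited
    low = {}         # device -> earliest visited device reachable from its subtree
    critical = set()
    counter = 0

    def visit(node, parent):
        nonlocal counter
        discovered[node] = low[node] = counter
        counter += 1
        children = 0
        for neighbor in graph[node]:
            if neighbor == parent:
                continue
            if neighbor in discovered:
                # Back edge: an alternate path to an earlier device.
                low[node] = min(low[node], discovered[neighbor])
            else:
                children += 1
                visit(neighbor, node)
                low[node] = min(low[node], low[neighbor])
                # The neighbor's subtree cannot reach above `node` without it.
                if parent is not None and low[neighbor] >= discovered[node]:
                    critical.add(node)
        # A search root is critical only if it joins two or more subtrees.
        if parent is None and children > 1:
            critical.add(node)

    for node in list(graph):
        if node not in discovered:
            visit(node, None)
    return critical


# Hypothetical topology: two routers to the internet, one core switch.
links = [
    ("internet", "router-1"),
    ("internet", "router-2"),
    ("router-1", "core-switch"),
    ("router-2", "core-switch"),
    ("core-switch", "access-switch"),
]
spof = single_points_of_failure(links)
```

In this invented topology the two routers back each other up, but everything behind `core-switch` depends on that one device, which is the kind of finding the questions above are meant to surface.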
How to audit/maintain/update a plan
Auditing and testing a network DR plan helps to make sure it stays up to date and continues to meet the needs of your organization. It ensures the plan you have in place addresses your network technology issues, people and processes, and has the pertinent controls to work as expected when confronted with an actual emergency.
The results of an audit can reveal areas of the plan that are incomplete, lack proper procedures or suitable documentation, are untested, or aren't up to date. As with an overarching DR audit, one that focuses on a networking DR plan should address:
- DR policies and mission statement
- Continual updating of written DR plan
- Designated hot, cold and/or warm site
- Data, systems and network recovery
- Regular backup of data and systems processes
- Drilling and testing of disaster procedures
- Backups of data and systems stored off site
- Designated DR committee and chairperson
- Listing of all emergency telephone numbers
- Effective communication procedures
- Up-to-date and authenticated system and operational documentation
- Emergency procedures
- Redundancy for essential personnel
- Software, hardware and networking vendor lists
- In-place automated and manual procedures
- External service-level agreements and contracts
Developing a network disaster recovery plan should be relatively easy. Plan complexity may increase, however, if your network combines multiple technologies in an intricate topology. The keys to success include defining step-by-step response and recovery procedures, validating these activities through tests and keeping the plan up to date.