No single series of steps can deal with every network failure, because failures differ. Perhaps no traffic is moving at all, or maybe the problem is confined to just one part of the network. Perhaps access to a critical application is blocked, but does that application run on a local server or come from a SaaS provider?
Recovery steps vary based on the nature of the failure. That's why it's important to plan for each type of problem. Essentially, recovery planning depends on understanding how each type of failure could occur: Why would the entire network become inoperable?
Possible reasons for network failure
Power issues. The most obvious problem is that power to the switches and routers has failed. By planning in advance for this possibility, you can add battery backup and run a secondary link to the power supplier so your network doesn't go down when someone drives into a power pole.
Updates and configurations. Another potential reason for network failure is issues with software updates or network configuration. Were switches updated overnight with new software, or did a network-wide configuration change roll out? No update should be done without testing. But, in any case, the network must retain the previous software version and network configuration, and teams should be ready to quickly restore them.
Teams should also log all changes to the network. Each change must include an explanation of what was changed, why it was changed and who made the change. Organizations should carefully control access to admin passwords so only qualified personnel can introduce changes.
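The logging and restore practices described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a real change-management tool: the `ChangeRecord` and `ChangeLog` names, their fields and the device name in the usage note are all hypothetical, and a real deployment would persist the log and push configurations to actual devices.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """One logged network change: what changed, why, and who made it."""
    device: str
    what: str
    why: str
    who: str
    previous_config: str  # retained so the change can be undone
    new_config: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self):
        # Reject entries that omit the explanation or the author.
        for name in ("what", "why", "who"):
            if not getattr(self, name).strip():
                raise ValueError(f"change record is missing '{name}'")


class ChangeLog:
    """Append-only log that also tracks each device's current configuration."""

    def __init__(self):
        self.records = []
        self.current = {}  # device name -> configuration text (hypothetical)

    def apply(self, device, new_config, what, why, who):
        record = ChangeRecord(device, what, why, who,
                              previous_config=self.current.get(device, ""),
                              new_config=new_config)
        self.records.append(record)
        self.current[device] = new_config
        return record

    def roll_back(self, device, who, why):
        """Restore the configuration that preceded the device's latest change."""
        for record in reversed(self.records):
            if record.device == device:
                # The rollback is itself logged as a change.
                return self.apply(device, record.previous_config,
                                  what="rollback of: " + record.what,
                                  why=why, who=who)
        raise LookupError(f"no logged changes for {device}")
```

For example, after `log.apply("core-sw1", "vlan 20", what="move uplink", why="ticket", who="bob")`, a call to `log.roll_back("core-sw1", who="carol", why="outage")` restores whatever configuration was in place before that change. The sketch undoes only the most recent change per device.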
Hardware problems. A hardware device failure is always a possibility when only part of the network is down. Network teams should design the network with redundancy so alternate paths exist around any failing device, although performance on those paths may degrade to the extent that the network is unusable. Network monitors should make it quickly apparent which device has failed.
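One way to check a design for the redundancy described above is to model the network as a graph and ask whether two endpoints can still reach each other when a given device is removed. The following is a rough sketch using a breadth-first search; the `has_alternate_path` function and the device names in the usage note are hypothetical, and the check says nothing about whether the surviving path has enough capacity.

```python
from collections import deque


def has_alternate_path(links, source, destination, failed_device):
    """Return True if source can still reach destination without failed_device.

    links is an iterable of (device_a, device_b) pairs, treated as
    bidirectional connections.
    """
    if failed_device in (source, destination):
        return False

    # Build an adjacency map from the list of links.
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Breadth-first search that never steps onto the failed device.
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == destination:
            return True
        for nxt in neighbors.get(node, ()):
            if nxt != failed_device and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False
```

With links such as `[("access", "dist1"), ("access", "dist2"), ("dist1", "core"), ("dist2", "core")]`, losing `dist1` still leaves a path from `access` to `core` through `dist2`. Running the check for every device in turn would flag single points of failure.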
Server troubles. Failure of a single application may be due to an issue with the server on which it runs or to a network failure on the path to the server. Teams should plan in advance to ensure enough extra server capacity is available to move the application elsewhere. If the network is the issue, it may be due to a hardware failure, a software update or a configuration change.
SaaS provider issues. Failure of a SaaS provider can be more difficult to handle. While your business may depend on the provider, its operation is out of your hands. Again, advance planning is vital. Before signing up, insist on examining the provider's backup plans and on choosing service-level agreements that provide adequate guarantees.
Investigate disaster recovery-as-a-service providers. They may offer a way to deal with a SaaS failure. Make sure you can gain access to an updated copy of your data and to the software required to access it.
Finally, don't panic when a problem occurs. When users report the network is down, it could actually be down, or the problem could be confined to one application. Determine the extent of the problem, take out the appropriate plan and then follow it.
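The "determine the extent of the problem" step can be expressed as a small decision function. The sketch below assumes reachability checks have already been collected elsewhere (for example, by a monitoring tool) and passed in as simple true/false results; the `Scope` and `classify_outage` names and the probe labels are hypothetical.

```python
from enum import Enum


class Scope(Enum):
    NONE = "no failure detected"
    APPLICATION = "one or more applications"
    PARTIAL_NETWORK = "part of the network"
    ENTIRE_NETWORK = "entire network"


def classify_outage(network_probes, app_probes):
    """Classify how far a reported outage extends.

    network_probes maps a network device or segment name to True if it
    responded; app_probes does the same for applications.
    """
    if network_probes and not any(network_probes.values()):
        # Nothing on the network answered at all.
        return Scope.ENTIRE_NETWORK
    if not all(network_probes.values()):
        # Some devices or segments answered, others did not.
        return Scope.PARTIAL_NETWORK
    if not all(app_probes.values()):
        # The network looks healthy, so the problem is at the application.
        return Scope.APPLICATION
    return Scope.NONE
```

Each result would then map to the matching recovery plan. For instance, `classify_outage({"gateway": True, "floor2-switch": True}, {"crm": False})` returns `Scope.APPLICATION`, which points toward the server or SaaS provider plan and away from the network-wide one.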