Top 5 IT disaster scenarios DR teams must test

While most organizations are prepared to face small-scale interruptions, they cannot overlook a larger, more complex crisis just because it seems less likely to occur.

Typical interruptions IT teams prepare for are common events such as disk failures or power outages. However, there are several more IT disaster scenarios that businesses must address to be fully protected.

Many IT disasters stem from recovery plans that considered nothing beyond hardware failure or the accidental or malicious loss of data. Unfortunately, threats and scenarios are always evolving, so disaster recovery plans must do the same.

Many forms of disaster can affect the availability of IT services, and some are more relevant to individual organizations than others. It is prudent to assess which risks are most likely to threaten a company's infrastructure and services. A risk assessment matrix is one tool that can help determine both the likelihood of a disaster occurring and its severity.
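As a rough illustration of how a risk assessment matrix ranks scenarios, the sketch below multiplies a likelihood rating by a severity rating. The 1-to-5 scales, the band thresholds and the scores assigned to each scenario are assumptions made up for this example, not figures from any real assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    severity: int    # assumed scale: 1 (negligible) to 5 (catastrophic)

    @property
    def score(self) -> int:
        # Classic matrix cell value: likelihood multiplied by severity.
        return self.likelihood * self.severity


def rating(score: int) -> str:
    """Map a score to a color-style band; the thresholds are illustrative."""
    if score >= 15:
        return "high"
    if score >= 8:
        return "medium"
    return "low"


def rank(risks):
    """Return risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical ratings for the five scenarios covered in this article.
risks = [
    Risk("Ransomware attack", 4, 5),
    Risk("Failed backups", 3, 5),
    Risk("Network interruption", 4, 2),
    Risk("Natural disaster", 1, 5),
    Risk("Hardware failure", 3, 3),
]

for r in rank(risks):
    print(f"{r.name:22} {r.score:>2}  {rating(r.score)}")
```

Each organization would substitute its own ratings; the value of the exercise is the ordering it produces, not the absolute numbers.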

Below are five possible IT disaster scenarios that DR teams must prepare for and tips on how they can do that, regardless of business size and type, location, and infrastructure.

Failed backups

Failed backups are among the most frequent IT disasters. Businesses can replace hardware and software, but if the data and all backups are gone, bringing the data back might be impossible or incredibly expensive.

Some organizations might not realize that their offices lie in flood plains or earthquake-prone areas until it is too late, and backups kept in the same building can be lost along with the primary data. Mitigation against such issues takes a degree of forward planning.

Sys admins must periodically test their ability to restore from backups to ensure backups are working correctly and the restore process does not have some unseen fatal flaw. At the same time, there should always be multiple generations of backups, with some of those backup sets off site.
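One way to automate part of a restore test is to compare checksums of the restored files against a known-good reference copy. The sketch below is a minimal example under that assumption; the function names `sha256_of` and `verify_restore` are hypothetical, and a real test would more likely compare against a checksum manifest captured at backup time, since live data changes after the backup is taken.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(reference_dir, restored_dir):
    """Return (relative_path, problem) pairs for files that failed to restore.

    Assumes reference_dir holds a static copy of what the backup should
    contain. An empty list means every reference file was restored intact.
    """
    reference_dir, restored_dir = Path(reference_dir), Path(restored_dir)
    problems = []
    for ref in sorted(reference_dir.rglob("*")):
        if not ref.is_file():
            continue
        rel = ref.relative_to(reference_dir)
        restored = restored_dir / rel
        if not restored.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif sha256_of(ref) != sha256_of(restored):
            problems.append((rel.as_posix(), "checksum mismatch"))
    return problems
```

A scheduled job could restore the latest backup set to a scratch directory, call `verify_restore`, and raise an alert whenever the returned list is not empty.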

Natural disasters

Natural disasters can take many forms, including fires, floods and earthquakes. While the type of disaster might vary by region, just about all of them can damage hardware and cause data loss. Many can render the worksite inaccessible for long periods of time.

The ability to fail over to the cloud keeps core services working: not every application will be available, but those essential to running the business will be. Building in infrastructure to make remote work a viable option is another way to prepare for a variety of natural disasters.

Being able to fail over to the cloud and work off site takes forethought, planning and effort, but pays massive dividends should a disaster occur. Repairing and replacing buildings and hardware can take more time than people estimate, and a business that is unable to function during recovery is at risk of serious financial losses.

[Figure: Example of a color-coded risk assessment matrix. DR teams can use a risk assessment matrix to determine the likelihood and severity of different IT disaster scenarios.]

Ransomware attacks

Ransomware is not only one of the most damaging disasters that can happen to a business, but perhaps the most likely as well. It takes only one person with sufficient privileges clicking the wrong link to cause chaos.

Defending against ransomware is neither trivial nor cheap. Much modern ransomware is designed to delay activation until it has compromised several generations of backups.

There are many ways to reduce the risk of a ransomware attack, but no single preventive tool. Keeping application and OS patches up to date, scanning email for questionable attachments, restricting access to external media and providing good user education will help.

Network interruptions

Unfortunately, this IT disaster scenario happens often. For example, heavy machinery can sever cables, rendering the network inaccessible. Network interruptions are an increasingly urgent concern as more IT systems become SaaS-based, because reaching and using a SaaS system depends entirely on network connectivity.

Fortunately, the fix for this has become readily available and inexpensive in recent years. A secondary line is one option for small businesses, and many routers offer a 4G or 5G connection as a backup. While not ideal, a backup link makes a network interruption less of a disaster and more of an inconvenience. Backup connectivity does have a cost, but it might be worth it when the alternative is an office full of staff who cannot work.
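Routers with a cellular backup normally handle the switchover themselves, but the decision logic is simple enough to sketch. The example below is an illustration only: `tcp_probe` and `FailoverMonitor` are hypothetical names, the threshold of three consecutive results is an assumption, and the sketch glosses over the fact that a real probe must be sent over the primary link specifically.

```python
import socket


def tcp_probe(host: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class FailoverMonitor:
    """Switch links only after several consecutive probe results agree.

    Requiring a streak avoids flapping between links on a single lost probe.
    """

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.active = "primary"
        self._streak = 0  # consecutive results that contradict the active link

    def record(self, primary_ok: bool) -> str:
        """Feed in one probe result for the primary link; return the link to use."""
        contradicts = (self.active == "primary") != primary_ok
        self._streak = self._streak + 1 if contradicts else 0
        if self._streak >= self.threshold:
            self.active = "primary" if primary_ok else "backup"
            self._streak = 0
        return self.active
```

In use, a loop would call something like `monitor.record(tcp_probe("example.com"))` every few seconds, where `example.com` stands in for whichever endpoint the business depends on, and act whenever the returned link changes.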

Hardware failure

Hardware failure can take many forms, including the loss of a single disk taking down a whole system that does not use RAID, faulty network switches and power supply failures.

Most hardware-based IT disaster scenarios can be mitigated with relative ease, but at the cost of added complexity and expense. One example is a database server. Such a server can be turned into a database cluster with highly available storage and networking, but doing so would easily double the cost of a single nonredundant server. Administrators would also need training to manage such an environment.

Hardware failure can affect the cloud as well. However, the hardware is usually abstracted away from the customer, and the provider keeps several copies of the data from which to rebuild and continue.

Stuart Burns is a virtualization expert at a Fortune 500 company. He specializes in VMware and system integration with additional expertise in disaster recovery and systems management. Burns received vExpert status in 2015.

