Understand the costs of a disaster recovery failure
A disaster recovery plan may seem like an unnecessary expense to some businesses, but a lack of preparation will cost you.
Some organizations question how much to invest in a disaster recovery plan. They're reluctant to allocate significant budget and staff to systems they'll need in the face of a disaster that may, or may not, occur.
But when a major system goes offline, and recovery and failover processes must kick in for business to resume, the value of a DR plan is indisputable.
Without an established DR plan and tool set, any outage lasts longer than it needs to. This is why the cost of disaster recovery failure, or not preparing a DR plan, is potentially much greater than the costs associated with establishing the plan itself.
The integrity of a DR strategy is critical. Unfortunately, a lot of companies do not consistently plan, test and update DR processes to incorporate changing requirements, which leaves many simply trying their luck.
Prepare for different types of failures
Broadly speaking, system failures that require DR fall into two categories: infrastructure failure and localized failure. Infrastructure failures affect data centers and storage systems, while localized failures tend to affect applications and their dependencies.
Application failover is generally straightforward, as the number of resources an organization requires is small. These may include managed resources or available spare capacity, either in the cloud or at an alternative DR site. Failing over data center infrastructure in its entirety, however, is expensive, even for the largest companies.
The risk of losing an entire infrastructure is low, but it does happen in the event of a fire, natural disaster or other physically destructive event. Ransomware attacks and other security incidents can also compromise entire systems.
Assess the costs of DR failure
In many lines of business, system failure can have a massive effect. If an organization provides on-demand services, it may have to meet legal requirements and strict service-level agreement clauses. If it fails to meet these requirements because of extended periods of downtime, the repercussions can be costly.
It's important for businesses to quantify the cost of a system failure. This includes the cost of recovery, the repercussions of downtime and how often the failure is likely to occur each year. Organizations can perform risk assessments to identify the most likely threats and the consequences and costs associated with each. The results of an assessment provide insight to prepare for potential risks. For example, if an organization's data center is in an area that deals with frequent power outages, a risk assessment will help shape a disaster recovery plan for that specific scenario.
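One rough way to combine those three inputs is to multiply the cost of a single incident by its expected yearly frequency. The sketch below assumes that simple model; the class name, field names and all dollar figures are hypothetical and only illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class FailureScenario:
    name: str
    recovery_cost: float           # one-time cost to restore service
    downtime_hours: float          # expected outage length per incident
    downtime_cost_per_hour: float  # lost revenue, SLA penalties and so on
    incidents_per_year: float      # estimated yearly frequency

    def cost_per_incident(self) -> float:
        return self.recovery_cost + self.downtime_hours * self.downtime_cost_per_hour

    def annualized_cost(self) -> float:
        return self.cost_per_incident() * self.incidents_per_year


def rank_by_annualized_cost(scenarios):
    """Order scenarios from most to least expensive per year."""
    return sorted(scenarios, key=lambda s: s.annualized_cost(), reverse=True)


# Hypothetical figures, for illustration only.
scenarios = [
    FailureScenario("power outage", 2_000, 4, 5_000, 3),
    FailureScenario("ransomware", 150_000, 72, 5_000, 0.1),
]

for s in rank_by_annualized_cost(scenarios):
    print(f"{s.name}: {s.annualized_cost():,.0f} per year")
```

With these made-up numbers, the frequent but cheap power outage costs more per year than the rare but expensive ransomware attack, which is the kind of comparison a risk assessment is meant to surface.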
For organizations concerned about spending money on crises that may not even happen, risk assessments ensure they budget an appropriate amount to address the most likely scenarios. Organizations must also pay attention to less likely or unpredictable events, because a failure to plan for a natural disaster or a ransomware attack leaves them vulnerable to additional costs and legal repercussions.
What to prioritize
It can be difficult to define services that are critical to maintain during a crisis versus those that the organization can rebuild or restore after the disaster has passed. Focus first on services that are essential for business functionality and work backward. This will ensure all required resources are available for critical systems.
Ideally, all organizations would be able to comfortably protect themselves from all risks without breaking the bank. However, that is not the case. While the costs of disaster recovery failure are powerful motivators to prepare for all possible worst-case scenarios, that is not financially feasible for every business.
Not every part of an organization will need a disaster recovery plan. DR involves licenses, bandwidth, storage and system admin time. The less an organization has to protect with a DR strategy, the more it will save. The equation of what to protect is difficult to balance, but essentially boils down to the following categories:
- Must-have infrastructure. This is infrastructure the organization cannot function without.
- Nice-to-have infrastructure. This is infrastructure that is protected only if the DR budget allows it.
- Everything else. This category can be left to recover last without risk of damaging the organization.
An enterprise network diagram maps all relationships and dependencies within an IT infrastructure, serving as a blueprint of system requirements that can guide organizations through these tasks.
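Working backward from essential services can be sketched as a walk over the dependency map: start with the must-have services and collect everything they rely on, directly or indirectly. The example below is a minimal illustration; the service names and the `depends_on` map are hypothetical stand-ins for what a real network diagram would record.

```python
def required_for_recovery(must_have, depends_on):
    """Return every component the must-have services need, directly or indirectly."""
    required = set()
    stack = list(must_have)
    while stack:
        component = stack.pop()
        if component in required:
            continue  # already visited; also guards against circular dependencies
        required.add(component)
        stack.extend(depends_on.get(component, []))
    return required


# Hypothetical dependency map: service -> components it relies on.
depends_on = {
    "order-portal": ["app-server", "orders-db"],
    "app-server": ["auth-service"],
    "orders-db": ["san-storage"],
    "reporting": ["orders-db", "bi-warehouse"],
}

protected = required_for_recovery(["order-portal"], depends_on)
print(sorted(protected))
```

Here the reporting service and its warehouse fall outside the must-have set, so they would land in one of the lower-priority categories.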
Which DR option is the most cost-effective?
Failover to cloud is increasingly popular because it doesn't require organizations to own physical DR assets that often sit idle. While organizations do incur a cost when they fail over to the cloud, these costs are small in comparison to owning and maintaining an entire physical infrastructure. Cloud is the go-to DR option for most smaller companies, not only because of cost and simplicity, but also because of the support they can get from the cloud provider.
For bigger environments, a warm physical site is a consideration. Warm sites are incredibly expensive, with costs that include power, lighting, HVAC, server hardware, network infrastructure, staffing and security. On the other hand, all those resources belong to the organization to use exclusively.
With the cloud, this is not always the case. Organizations can acquire cloud resources in two models: on-demand consumption or designating and paying for resources upfront. If a geographically local issue occurs and every business wants to fail over to the cloud, there may not be enough resources available to satisfy demand. Users can pay a premium to reserve capacity for DR should they need it, but the terms will vary by provider.
Lastly, one of the major cloud DR costs that catches companies off guard relates to storage. A key metric here is the rate of change for an organization's data. The rate of change indicates both bandwidth and storage requirements. Most cloud providers bill based on the cost per machine and disk, as well as bandwidth usage. While cloud provider costs are typically clear and predictable, these costs can add up quickly, depending on the provider, so always be aware of the rate of change.
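To see how the rate of change drives both numbers, a back-of-the-envelope estimate can help. The sketch below assumes decimal units, no compression or deduplication, and a retention scheme of one full copy plus daily changes; the function names, the 3% daily change rate and the other figures are hypothetical.

```python
def replication_bandwidth_mbps(daily_change_gb, replication_window_hours):
    """Average bandwidth needed to ship one day's changed data within the window."""
    megabits = daily_change_gb * 8 * 1000  # 1 GB = 8,000 megabits in decimal units
    seconds = replication_window_hours * 3600
    return megabits / seconds


def retained_storage_gb(protected_gb, daily_change_gb, retention_days):
    """One full copy plus the daily changes kept for the retention period."""
    return protected_gb + daily_change_gb * retention_days


# Hypothetical environment: 10 TB protected, 3% of it changing each day.
protected_gb = 10_000
daily_change_gb = protected_gb * 0.03

print(f"{replication_bandwidth_mbps(daily_change_gb, 8):.1f} Mbps")
print(f"{retained_storage_gb(protected_gb, daily_change_gb, 30):,.0f} GB")
```

Doubling the rate of change in this model doubles the bandwidth requirement and adds a proportional amount of retained storage, which is why the metric has such a direct effect on the bill.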