

Build and maintain digital resilience for a stronger DR program

A digital resilience program builds on existing preventive and restorative activities by identifying the ways an organization's infrastructure can become more resilient.

As more organizations engage in digital transformation activities, they must ensure those initiatives can survive potential disruptions. Ensuring that survival is the goal of digital resilience.

It is not enough to simply create disaster recovery plans that recover damaged systems. Technology infrastructures must also be able to adapt after an event and update how they operate, so they are better prepared for future events.

To achieve digital resilience, organizations must carefully examine all aspects of a technology infrastructure. This includes data centers, systems, networks, security, power and HVAC systems, supply chains and personnel. Any situations that may negatively affect the organization's ability to recover and restore its technology infrastructure must be identified and remediated.

Achieving digital resilience is a complex process that begins with senior management understanding its importance and supporting its implementation. Once a plan is in place, regular testing and review of plans and procedures ensure that the organization remains resilient over time.

What does digital resilience look like in an organization?

Starting at the data center, every aspect of a facility must be examined for potential failure points. Key data center elements to examine include the following.

Facility security: Building access controls, closed-circuit TV systems
Power systems: Primary commercial power, backup power systems
Power system protection: Surge suppressors, lightning arrestors, grounding
HVAC systems: Temperature and humidity management
Water detection systems: Under-raised-floor water detection alarms
Lighting system: Primary lighting, emergency lighting
Hardware: Servers, networking devices, equipment racks, furniture
Software: Applications, OSes, utilities
Cybersecurity: Cybersecurity software, ransomware prevention software
Network perimeter security: Firewalls, intrusion detection and prevention systems
Staffing: Operations staff, engineering, programming, maintenance, administration
Policies and procedures: Security policies, operational procedures
Storage systems: Primary on-site storage, off-site storage, cloud storage
Redundant components: Devices and software that are ready to be used if production assets are disabled or destroyed

Organizations must also take inventory of smoke detection devices and fire detection and suppression systems, as well as the existence and accessibility of windows and doors, especially emergency exit doors.

Examine each of the above elements to ensure it remains available during an event that threatens to damage the data center. Organizations must extend the same level of care throughout the business: the devices employees use, applications, how data is backed up, how device access security is managed, and how systems and applications are securely stored.
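One way to make this examination repeatable is to keep the element inventory in a structured form and query it for failure points. The following is a minimal sketch, not a prescribed tool: the `Element` schema, its `redundant_units` and `last_tested` fields, and the sample entries are all hypothetical. It flags elements with no redundant component (single points of failure) and elements whose failover has never been tested.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Element:
    """One inventoried data center element (hypothetical schema)."""
    name: str
    category: str
    redundant_units: int = 0            # spares ready if the production asset fails
    last_tested: Optional[str] = None   # ISO date of last failover test, if any


def single_points_of_failure(inventory: List[Element]) -> List[str]:
    """Names of elements with no redundant unit to fall back on."""
    return [e.name for e in inventory if e.redundant_units < 1]


def needs_testing(inventory: List[Element]) -> List[str]:
    """Names of elements whose failover has never been tested."""
    return [e.name for e in inventory if e.last_tested is None]


# Hypothetical sample entries, not real data.
inventory = [
    Element("Primary commercial power", "Power systems",
            redundant_units=1, last_tested="2024-01-15"),
    Element("Core firewall", "Network perimeter security"),
    Element("Under-floor water sensors", "Water detection systems",
            redundant_units=2),
]

print(single_points_of_failure(inventory))  # ['Core firewall']
print(needs_testing(inventory))  # ['Core firewall', 'Under-floor water sensors']
```

The same queries can be rerun after each review, so gaps found in one assessment are visible in the next.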

Remote working is one of the key tools in digital resilience, as it reduces the likelihood that a disruptive event halts the business. This, of course, hinges on the technology needed for remote work: remote access software licenses, sufficient network bandwidth and remote devices configured with cybersecurity and access control software to protect them from potential breaches.
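The remote work prerequisites above can be checked with simple arithmetic before a disruption forces the question. This is a hedged sketch under stated assumptions: the `RemoteAccessCapacity` fields, the per-user bandwidth figure of 4 Mbps, and the user counts are hypothetical placeholders that an organization would replace with its own measurements.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RemoteAccessCapacity:
    """Remote work prerequisites (hypothetical fields and figures)."""
    licenses: int          # remote access software seats
    bandwidth_mbps: float  # usable capacity at the remote access gateway
    per_user_mbps: float   # assumed average demand per remote user
    secured_devices: int   # remote devices with security and access control software


def readiness_gaps(cap: RemoteAccessCapacity, remote_users: int) -> List[str]:
    """List shortfalls that would block `remote_users` from working remotely."""
    gaps = []
    if cap.licenses < remote_users:
        gaps.append(f"licenses: short by {remote_users - cap.licenses}")
    needed = remote_users * cap.per_user_mbps  # aggregate bandwidth demand
    if cap.bandwidth_mbps < needed:
        gaps.append(f"bandwidth: need {needed:.0f} Mbps, have {cap.bandwidth_mbps:.0f} Mbps")
    if cap.secured_devices < remote_users:
        gaps.append(f"devices: {remote_users - cap.secured_devices} lack security software")
    return gaps


# Hypothetical figures for illustration only.
cap = RemoteAccessCapacity(licenses=150, bandwidth_mbps=500,
                           per_user_mbps=4, secured_devices=180)
for gap in readiness_gaps(cap, remote_users=200):
    print(gap)
```

With these sample numbers, moving 200 staff to remote work would expose a shortfall in licenses, bandwidth and secured devices; at 100 users, no gaps are reported.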

Initiatives to achieve digital resilience

Figure 1 depicts a sequence of activities with actions to be taken along a path to digital resilience. It starts with business as usual and activities to perform to prepare the technology infrastructure for potential disruptions.

When a disruptive event occurs, organizations launch steps to respond to and recover digital resources. Once the event has ended and IT teams have assessed the consequences, the organization identifies ways to adapt its existing technology resources and emergency planning to better recover in future events.

Figure 1. Achieve digital resilience

Assuming the organization can recover and resume business operations, a new normal in digital operations may evolve, based on the results of post-event assessments. The objective of this activity is to identify and implement improved methods for responding to disruptions.

The following activities are recommended as part of a program to increase digital resilience. They can be broken up into four main stages:

  1. Pre-event prevention. These are the ongoing activities an organization performs to reduce the likelihood of a disruption, including technology monitoring, risk assessments, resilience plan development and regular system updates. Pre-event prevention also involves testing existing strategies for vulnerabilities and establishing disaster recovery policies and procedures.
  2. Incident response. These procedures identify how the organization initially responds to a digital disruption. They can include damage assessment for natural disasters or capturing and quarantining suspicious data if the disruption is a cyber attack.
  3. Recovery and resumption of operations. These steps complete the restoration process and return systems to normal production status. At this stage, organizations enact procedures to recover systems, networks and security. Typically embedded in disaster recovery plans, these steps define how the organization recovers and restores its digital infrastructure.
  4. Adapt strategy based on event outcomes. These are actions to take post-event that identify ways the organization can adapt its operations and increase its ability to recover from an event. This stage can include the promotion of resilience initiatives and training by the organization to employees and IT teams, as well as scheduling reviews of the digital infrastructure throughout the year.
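The four stages can be read as a cycle: prevention continues until a disruption occurs, and the adapt stage feeds improvements back into prevention, forming the "new normal" described above. The following minimal sketch models that reading as a simple state machine; the `Stage` names mirror the list, but the transition rules are an interpretation of the article, not a formal standard.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    PRE_EVENT_PREVENTION = 1
    INCIDENT_RESPONSE = 2
    RECOVERY_AND_RESUMPTION = 3
    ADAPT_STRATEGY = 4


# Each stage hands off to the next; adapting loops back to prevention.
NEXT_STAGE = {
    Stage.PRE_EVENT_PREVENTION: Stage.INCIDENT_RESPONSE,
    Stage.INCIDENT_RESPONSE: Stage.RECOVERY_AND_RESUMPTION,
    Stage.RECOVERY_AND_RESUMPTION: Stage.ADAPT_STRATEGY,
    Stage.ADAPT_STRATEGY: Stage.PRE_EVENT_PREVENTION,
}


def advance(stage: Stage, disruption_detected: bool = False) -> Stage:
    """Return the next stage; prevention persists until a disruption occurs."""
    if stage is Stage.PRE_EVENT_PREVENTION and not disruption_detected:
        return stage
    return NEXT_STAGE[stage]


def event_cycle() -> List[Stage]:
    """Walk one disruption from prevention back to (improved) prevention."""
    stage = Stage.PRE_EVENT_PREVENTION
    path = [stage]
    stage = advance(stage, disruption_detected=True)
    while stage is not Stage.PRE_EVENT_PREVENTION:
        path.append(stage)
        stage = advance(stage)
    path.append(stage)
    return path


print([s.name for s in event_cycle()])
```

Modeling the loop explicitly makes it clear that the adapt stage is not an endpoint: its output becomes the starting point for the next round of prevention.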

Common roadblocks to digital resilience

A digital resilience program is likely to be costly, owing to the systems development, programming, hardware and software acquisitions, testing and other activities involved.

Investments in a digital resilience program may include third-party support, such as consultants and experienced vendors with knowledge of digital resilience activities. Other costs can include specialized software, additional hardware and increased use of cloud services for critical systems and data.

Along with additional expenses, a lack of senior management support or employee participation can hinder digital resilience initiatives. Thorough assessments and employee training in the pre-event stage can bolster efforts to establish a culture of digital resilience.

Paul Kirvan is an independent consultant, IT auditor and technical writer, editor and educator. He has more than 25 years of experience in business continuity, disaster recovery, security, enterprise risk management, telecom and IT auditing.
