
How to operationalize a strong cyber-resilience plan

Attackers are becoming increasingly powerful, and regulators are demanding proof of cyber-resilience. In this threat landscape, cyber-resilience plans need to change, too.

Cyber-resilience refers to an organization's ability to anticipate, withstand, respond to and recover from cyberincidents while maintaining or restoring critical business operations.

The definition is easy enough to understand. Making sure it actually happens is the tricky part.

Resilience used to focus on backup systems and other recovery technologies. What's needed today is a broader review of potential vulnerabilities, critical assets, recovery capabilities, software development protocols, regulatory requirements and system design. In addition, organizations need to understand what happens to the business when key third-party services are suddenly unavailable.

In short, cyber-resilience plans are due for an update.

Why organizations need to rethink their cyber-resilience strategy

First and foremost, threat actors have become more efficient. They know which systems to target and how to execute those attacks quickly, especially when assisted by AI. Plus, attackers often compromise traditional backup and recovery systems.

"Most enterprise backup solutions today focus on protecting personal information and customer data, and that's really driven by regulatory pressures," said Mark Orsi, CEO of the Global Resilience Foundation. "We need to expand that definition of critical data to include applications, network configurations, system images and infrastructure configurations to survive crises, and all of that has to be stored in distributed and immutable backups."

This requires a shift from perimeter defense to operational continuity that focuses on limiting impact, restoring critical services and learning from the event.

"Three forces drove this: Attack surfaces scaled exponentially with cloud and AI adoption, nation-state threat sophistication outpaced prevention-focused defenses, and regulators began demanding demonstrable recovery capabilities, not just control counts," said Madelein van der Hout, an analyst at Forrester. "Modern resilience now requires maintaining customer trust and operational continuity during incidents, not just before them."

Some organizations also face geopolitical threats. "Nation-state attacks on critical infrastructure increased more than 40% in 2025, and campaigns like Salt Typhoon compromised over 600 organizations across 80 countries, in some cases going undetected for years," van der Hout said, adding that assets in war zones, including data centers, are at risk of both cyber and kinetic attacks.

How to build a strong cyber-resilience plan

From a security standpoint, a strong cyber-resilience plan starts with an understanding of the organization's threat landscape. What are the most likely types of attacks? Who are the likely threat actors? What are the attackers' likely goals?

"Understanding the particular cyberthreats that impact your industry, your sector and your geography is foundational to make sure that you're prioritizing the right mitigations," said Sharon Chand, U.S. cyber defense and resilience lead at Deloitte.

Business operations awareness is also critical. Everyone involved in response and recovery must understand in advance what's needed to keep the business running at a minimally viable level.

While the basics of cyber-resilience still revolve around patch management, network segmentation, endpoint protection, backup and recovery, and threat detection, modern cyber-resilience plans should also consider the following key elements.

Minimum viability

Cyber-resilience is impossible without knowing which systems and data must remain available to keep a business operating at a minimal level. This knowledge determines what to prioritize and which mitigations are needed.
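
As a minimal sketch of how that knowledge might be captured, the Python below maps business capabilities to the systems they need and derives a restore order in which dependencies come first. The capability names, system names and dependency map are hypothetical placeholders, not taken from any real environment.

```python
from graphlib import TopologicalSorter

# Hypothetical map: each system lists the systems it depends on.
DEPENDS_ON = {
    "order-portal": {"payments-api", "customer-db"},
    "payments-api": {"identity-service"},
    "customer-db": set(),
    "identity-service": set(),
    "marketing-site": {"cms"},
    "cms": set(),
}

# Hypothetical map: business capability -> systems it needs directly.
CAPABILITY_SYSTEMS = {
    "take customer orders": {"order-portal"},
    "publish marketing content": {"marketing-site"},
}


def minimum_viable_systems(capabilities):
    """Return every system the capabilities need, directly or indirectly."""
    needed = set()
    stack = [s for c in capabilities for s in CAPABILITY_SYSTEMS[c]]
    while stack:
        system = stack.pop()
        if system not in needed:
            needed.add(system)
            stack.extend(DEPENDS_ON.get(system, ()))
    return needed


def restore_order(capabilities):
    """List the needed systems so that dependencies are restored first."""
    needed = minimum_viable_systems(capabilities)
    graph = {s: DEPENDS_ON.get(s, set()) & needed for s in needed}
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(restore_order(["take customer orders"]))
```

Restoring only "take customer orders" leaves the marketing site and its CMS out of the first recovery wave, which is the kind of prioritization a minimum-viability analysis is meant to produce.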

"Restoration of a cloud environment or an application is far less important than restoration of a capability to service a customer. Understanding what that is from top to bottom has to be part of that resilience strategy," Chand said.

Automation

Humans can no longer respond to attacks quickly enough, so automating wherever possible is an important part of cyber-resilience. Take, for example, business impact assessments of potential threats, which should be updated annually, and testing, which must be conducted regularly -- two often manual processes.

"There's an opportunity to embrace agentic AI in automating a lot of what needs to happen for a company to be cyber-resilient," Chand said. "It can be very complicated for an organization to run an integrated test across a business service because that's taking down elements of infrastructure, it's failing over applications into redundant environments."

AI testing

Many AI platforms now offer security-specific services to locate and remediate vulnerabilities in code and networks. Put a plan in place for AI to look for those vulnerabilities before a threat actor can find them, said Rich Mogull, chief analyst at the Cloud Security Alliance. "Everything I write [for a CSA project] goes through a security review using the LLM. It's very effective at finding things." He also recommended setting up a schedule for AI testing.

Detection and response speed

Orsi anticipates threat actors building more automation into their attacks by using AI and chaining together attacks. "We're seeing the time for an exploit to become an attack down from days to hours to eventually minutes," he said. "We'll have to really be good about how we address those things at speed and scale."
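
Speed of this kind is commonly tracked as mean time to detect and mean time to respond. The sketch below computes both from a handful of incident records. The records are invented for illustration, and measuring response time from detection to resolution is an assumption, since organizations define the metric differently.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records: when each incident began, was detected and was resolved.
INCIDENTS = [
    {"start": "2025-03-01T08:00", "detected": "2025-03-01T10:00", "resolved": "2025-03-01T16:00"},
    {"start": "2025-04-12T22:30", "detected": "2025-04-13T02:30", "resolved": "2025-04-13T04:30"},
]


def _hours(earlier, later):
    """Hours elapsed between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(later) - datetime.fromisoformat(earlier)
    return delta.total_seconds() / 3600


def mean_time_to_detect(incidents):
    """Average hours from the start of an incident to its detection."""
    return mean(_hours(i["start"], i["detected"]) for i in incidents)


def mean_time_to_respond(incidents):
    """Average hours from detection to resolution (an assumed definition)."""
    return mean(_hours(i["detected"], i["resolved"]) for i in incidents)


if __name__ == "__main__":
    print(mean_time_to_detect(INCIDENTS), mean_time_to_respond(INCIDENTS))
```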

"Mean time to detect and respond has replaced prevention rates as the primary metric," van der Hout said, noting that investment is shifting to continuous security testing and exposure management.

"The window [to patch and remediate] is closing to effectively zero," Mogull added. "True resiliency now is going to require more preventive security boundaries within our organizations because we're going to lose the ability to patch before the attackers execute their attacks."

Recovery architecture

Security experts consistently emphasize the need for immutable backups, which cannot be modified, deleted or overwritten. It is also important to isolate backup systems in recovery vaults that are separate from production networks. The goal is to recover data to its last known good state.
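
Choosing the last known good state can be reduced to a simple rule: take the newest snapshot that predates the compromise and has passed verification. The sketch below illustrates that rule. The Snapshot fields and the verified_clean flag are assumptions made for the example, not features of any particular backup product.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: str
    taken_at: datetime
    verified_clean: bool  # hypothetical flag, e.g. set after integrity and malware checks


def last_known_good(snapshots: Iterable[Snapshot], compromised_at: datetime) -> Optional[Snapshot]:
    """Return the newest verified snapshot taken before the compromise, if any."""
    candidates = [
        s for s in snapshots
        if s.taken_at < compromised_at and s.verified_clean
    ]
    return max(candidates, key=lambda s: s.taken_at, default=None)


if __name__ == "__main__":
    snapshots = [
        Snapshot("snap-01", datetime(2025, 5, 1), True),
        Snapshot("snap-02", datetime(2025, 5, 8), True),
        Snapshot("snap-03", datetime(2025, 5, 15), False),  # failed verification
        Snapshot("snap-04", datetime(2025, 5, 22), True),   # taken after the compromise
    ]
    print(last_known_good(snapshots, compromised_at=datetime(2025, 5, 20)))
```

In this example the function skips the unverified snapshot and the one taken after the compromise, leaving snap-02 as the recovery point.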

"The restoration timelines and the fundamental technology that needs to be restored have really shifted," Chand said. "Now enterprises are looking at things like immutable storage for their data to be able to recover to a particular point in time that was known to be good. They're looking at clean-room technologies, where they can restore the business process infrastructure application data layers into a clean room and verify that it's free from bad actors." She also stressed the need to know where critical data resides and the security and criticality requirements of that data.

Simulated scenarios

Security teams should run simulations and tabletop exercises to test their recovery systems and processes against their most likely threats. These simulations need to include internal stakeholders, such as legal, finance, business operations and communications team members.

Agentic AI can help run these simulations. Because an integrated test across a business service means taking down infrastructure and failing over applications into redundant environments, Chand suggested simulating those steps instead. "What if we can simulate that effectively within a digital twin environment? What if we can simulate that effectively using agentic AI to do more of a continuous testing approach rather than a periodic or annual approach?" she said.

Orsi suggested including a few key vendors and suppliers in the testing exercises and using industry-specific critical infrastructure exercises, such as those provided by information sharing and analysis centers (ISACs). Another approach is to have a neutral third party run exercises that involve both the organization and its external providers. This collective resilience helps organizations better understand the impact of systemic security events.

Software development and testing practices

Security and DevOps teams often have competing goals. Developers are under pressure to push out code quickly, while security teams want time to test for vulnerabilities. Because attackers can and will use AI to find vulnerabilities in production code, defenders and developers need to cooperate on software security.

Third-party risk management

Every organization shares risk and recovery responsibilities with external providers.

"Third-party risk management has moved to the board level, with contractual requirements now covering breach notification service-level agreements and quantum security migration plans," van der Hout said, adding that partners should provide software bill of materials validation against disclosed vulnerabilities and third-party compromise containment procedures.

Mogull recommended that organizations have a clear understanding of their internet-based third-party dependencies. A connection to Salesforce, for example, could be a failure point that an incident response plan must account for. "If you lose that connection, can you fail gracefully?" Mogull said.
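
One common way to fail gracefully is to serve the last good response, clearly marked as degraded, when the external service cannot be reached. The sketch below shows that pattern in Python. GracefulDependency, its parameters and the status labels are hypothetical names chosen for the example, and the fetch function stands in for whatever client call an organization actually uses.

```python
import time


class GracefulDependency:
    """Wrap a call to an external service and fall back to the last good result."""

    def __init__(self, fetch, max_stale_seconds=3600.0, clock=time.monotonic):
        self._fetch = fetch                  # callable that contacts the third party
        self._max_stale = max_stale_seconds  # how old a cached value may be
        self._clock = clock
        self._cached = None
        self._cached_at = None

    def get(self):
        """Return (value, status); status is 'live', 'degraded' or 'unavailable'."""
        try:
            value = self._fetch()
        except Exception:
            # The dependency failed: serve the cached value if it is fresh enough.
            if self._cached_at is not None and self._clock() - self._cached_at <= self._max_stale:
                return self._cached, "degraded"
            return None, "unavailable"
        self._cached, self._cached_at = value, self._clock()
        return value, "live"


if __name__ == "__main__":
    def flaky_fetch():
        raise ConnectionError("third-party service unreachable")

    dependency = GracefulDependency(flaky_fetch)
    print(dependency.get())
```

Returning an explicit status lets downstream code decide what is acceptable in degraded mode, such as showing read-only data while blocking updates.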

Device resiliency

An organization might have thousands of IoT devices connected to its networks -- some of which are likely vulnerable to attack.

"The core concerns are long device lifecycles that prevent patching, IT/OT convergence that exposes previously air-gapped networks, and resource constraints that prevent endpoint agent deployment, forcing security to operate at the network level," van der Hout said.

A cyber-resilient IoT strategy should cover the following, van der Hout said:

  • Continuous automated asset discovery with real-time communication baseline monitoring.
  • Risk-based microsegmentation with virtual patching for devices that can't be updated.
  • Zero-trust principles with least-privilege access.
  • Validation testing through breach and attack simulation specifically targeting IoT infrastructure.
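
The first item above, monitoring against a communication baseline, can be sketched in a few lines: record which destinations each device normally talks to, then flag anything outside that set. The device names, addresses and baseline below are invented for illustration, and a real deployment would build the baseline from observed network traffic.

```python
# Hypothetical baseline: device -> set of (destination, port) pairs seen in normal operation.
BASELINE = {
    "hvac-controller-01": {("10.0.5.10", 502)},
    "badge-reader-07": {("10.0.8.20", 443)},
}


def find_deviations(observed_flows, baseline=BASELINE):
    """Return flows from unknown devices or to destinations outside the baseline."""
    deviations = []
    for device, destination, port in observed_flows:
        if device not in baseline:
            deviations.append((device, destination, port, "unknown device"))
        elif (destination, port) not in baseline[device]:
            deviations.append((device, destination, port, "new destination"))
    return deviations


if __name__ == "__main__":
    flows = [
        ("hvac-controller-01", "10.0.5.10", 502),    # matches the baseline
        ("hvac-controller-01", "203.0.113.9", 443),  # unexpected external destination
        ("camera-99", "10.0.8.20", 443),             # device missing from the inventory
    ]
    for deviation in find_deviations(flows):
        print(deviation)
```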

"The ultimate measure isn't just device security, it's operational recoverability -- the ability to detect compromised devices, contain them, replace or rebuild them, and keep the wider environment running," she said.

Quantum-safe transition

While quantum computing is still in an experimental phase, it will be here soon enough, so organizations should be preparing for its arrival. Plus, harvest-now, decrypt-later attacks are already happening, van der Hout noted.

To prepare for post-quantum cryptography, she recommended cryptographic algorithm discovery, replacement procedures and verification of vendors' quantum readiness. Teams should also consult NIST's post-quantum cryptography recommendations for guidance on completing cryptographic inventory and migration roadmaps.
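
A cryptographic inventory can start as a simple classification pass. The sketch below sorts a hypothetical list of systems by the algorithm each one uses. The vulnerable set covers widely used public-key algorithms that rely on factoring or discrete logarithms, and the post-quantum set lists the algorithms NIST standardized in FIPS 203, 204 and 205. The system names are made up, and anything the sketch does not recognize, including symmetric ciphers, is simply routed to manual review.

```python
from collections import defaultdict

# Public-key algorithms based on factoring or discrete logarithms,
# which Shor's algorithm would break on a sufficiently large quantum computer.
QUANTUM_VULNERABLE = {"RSA", "DSA", "DH", "ECDH", "ECDSA"}

# Algorithms standardized by NIST in FIPS 203, 204 and 205.
POST_QUANTUM = {"ML-KEM", "ML-DSA", "SLH-DSA"}


def classify(algorithm):
    """Label an algorithm name as 'migrate', 'post-quantum' or 'review'."""
    name = algorithm.strip().upper()
    if name in QUANTUM_VULNERABLE:
        return "migrate"
    if name in POST_QUANTUM:
        return "post-quantum"
    return "review"  # unrecognized here; needs a human decision


def summarize(inventory):
    """Group system names by the classification of the algorithm they use."""
    groups = defaultdict(list)
    for system, algorithm in inventory:
        groups[classify(algorithm)].append(system)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical inventory of (system, algorithm) pairs.
    inventory = [
        ("vpn-gateway", "RSA"),
        ("code-signing", "ECDSA"),
        ("pilot-kms", "ML-KEM"),
        ("legacy-app", "3DES"),
    ]
    print(summarize(inventory))
```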

Decision trees and communication protocols

Organizations must appoint points of contact and establish communication protocols before an event occurs. They also need to determine who has the authority to make which decisions during a security incident. Response playbooks should therefore address executive communication, legal and regulatory notification, crisis messaging and operating under degraded conditions.

"In a fast-moving incident, the first challenge is often not just technical recovery, but understanding what's happening, gaining visibility into it, understanding who's impacted, what mitigations are safe, who needs to be in the room," Orsi said. "One way to do this is to build contact lists and escalation paths. If you're doing that during the incident, you're already behind."

Creating those paths, he said, requires secure communication channels, named contacts at critical vendors, pre-approved decision frameworks and playbooks for moving into an impaired but acceptable mode of service for customers at different priority levels.
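
A pre-approved decision framework can be as simple as a lookup table that names, in order, who may make each call. The sketch below shows one possible shape for such a table. The decisions and roles are hypothetical examples, not a recommended assignment of authority.

```python
# Hypothetical pre-approved framework: decision -> roles with authority, in escalation order.
DECISION_AUTHORITY = {
    "isolate production network": ["CISO", "Head of Infrastructure"],
    "notify regulator": ["General Counsel", "Chief Compliance Officer"],
    "switch to degraded service mode": ["COO", "Head of Customer Operations"],
}


def who_decides(decision, unreachable=frozenset()):
    """Return the first reachable role authorized for the decision, or None."""
    for role in DECISION_AUTHORITY.get(decision, []):
        if role not in unreachable:
            return role
    return None  # nobody pre-approved is reachable; escalate outside the table


if __name__ == "__main__":
    print(who_decides("isolate production network", unreachable={"CISO"}))
```

Keeping the table under version control and reviewing it during tabletop exercises is one way to make sure the named roles are still current when an incident begins.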

Staff stress

Finally, don't forget the human factor. A crisis can exhaust and demoralize teams. To avoid burnout, Mogull suggested bringing in contract help, especially if the teams were over-extended before the incident.

Michael Nadeau is an award-winning journalist and editor who covers IT and energy tech. He has held senior positions at CSO Online, BYTE magazine, SAP Experts/SAP Insider and 80 Micro. Nadeau also writes the PowerTown blog on Substack for stakeholders in local renewable energy initiatives. Follow him on Bluesky at @mnadeau.bsky.social.
