
Guest Post

Will agentic AI transform enterprise disaster recovery?

Guest contributor Catalin Voicu argues that the hesitation around agentic AI in disaster recovery isn't about capability; it's about trust and accountability.

Disaster recovery has always been one of the last areas of IT to modernize, and honestly, there's a reason for that. When disaster strikes and customers are hammering your inbox, no IT leader wants to gamble on whether an AI agent correctly understood which critical resources to prioritize or which load balancer to restore to get production back to a healthy state.

I've spent 15 years working with cloud backup and disaster recovery (DR) teams, helping them evolve their procedures, and I can vouch for the skepticism within these cloud ops teams. They have good reason to hesitate before jumping on the agentic AI train.

But right now, I'm watching that window of hesitation start to close.

Two decades of cloud, one labor model

We've had cloud infrastructure for nearly 20 years. Enterprises have graduated from tape to object storage, built multi-AZ replication, embraced snapshot-based retention, and moved petabytes off-prem. The technology stack transformed quickly, and new services appeared at breakneck pace. But the operational model has barely changed.

For example, I can guarantee that someone on your backup and disaster recovery team is still manually tagging cloud resources. If budget is a concern (and it usually is), a team member is still writing one-off, poorly documented scripts to push cold data to cheaper storage tiers. And somewhere, right now, a very tired engineer has been woken at 2 a.m. by a hyperscaler health-degradation alert and is logging in to a backup console to press the right buttons, validate connections and confirm data integrity before the rest of their colleagues wake up.

The technology evolved, but the labor model is still stuck in the early aughts, when cloud computing was in its pioneering days. That's the gap agentic AI is built to close.

What agentic AI actually brings to the table

When I talk about agentic AI in disaster recovery, I'm not talking about a smarter dashboard or AI-generated DR runbooks (although those will certainly come later). I'm talking about systems that act autonomously, without a human in the loop. Initially, they will focus on the five areas that consume more DR team hours than anything else:

  1. Automated resource tagging. Today, manually classifying resources by criticality or owning department takes up many hours. Agentic systems can identify and classify resources automatically.
  2. Continuous security validation. Instead of scheduled compliance audits that catch problems after the fact, AI agents can monitor your environment in real time — flagging misconfigurations before they become recovery risks.
  3. Self-executing recovery. This is the big one. Recovery workflows can be triggered and completed automatically, without someone needing to manually validate each step under pressure at 3 a.m.
  4. Adaptive backup strategies. Backup frequency and geographic replication can adjust dynamically based on actual usage patterns and risk signals, rather than static schedules set by someone two years ago.
  5. Autonomous cost optimization. Aging backups can be continuously tiered to cheaper storage without manual oversight — balancing cost against retention compliance on an ongoing basis.
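To make the cost-optimization idea concrete, here is a minimal sketch of the kind of rule an agent might apply continuously: pick a storage tier from a backup's age, but never touch a backup before its retention period has run out. The tier names, the 30- and 90-day thresholds and the `Backup` and `plan_action` names are all hypothetical assumptions for illustration, not any vendor's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical age thresholds, checked from oldest to youngest.
# Real values would depend on provider pricing and restore-time objectives.
TIER_THRESHOLDS = [
    (timedelta(days=90), "archive"),
    (timedelta(days=30), "cold"),
    (timedelta(days=0), "standard"),
]


@dataclass
class Backup:
    backup_id: str
    created_at: datetime
    retention: timedelta  # minimum time the backup must be kept for compliance


def plan_action(backup: Backup, now: datetime) -> str:
    """Return the target tier for a backup, or 'expired' once retention has lapsed.

    Expiry is only reported, not acted on, so deletion can stay a
    separately governed decision.
    """
    age = now - backup.created_at
    if age >= backup.retention:
        return "expired"
    for min_age, tier in TIER_THRESHOLDS:
        if age >= min_age:
            return tier
    # A timestamp in the future (clock skew) falls back to the hot tier.
    return "standard"


if __name__ == "__main__":
    now = datetime(2025, 6, 1, tzinfo=timezone.utc)
    snap = Backup("snap-1", now - timedelta(days=45), retention=timedelta(days=365))
    print(plan_action(snap, now))  # 45 days old, retention not yet met
```

Reporting expiry instead of deleting is a deliberate choice in this sketch: it keeps the autonomous part limited to reversible tier moves.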

None of these are science fiction. We're already seeing agentic capabilities deployed in adjacent IT domains. The technology is mature enough. So why isn't DR moving?

The real barrier is trust, not technology

According to Deloitte's 2025 survey of AI leaders, the top barriers to agentic AI adoption were legacy system integration and risk and compliance concerns — with nearly 60% of respondents citing both. In disaster recovery, those concerns are amplified by one uncomfortable truth: A failed agentic action in DR doesn't mean a delayed deployment. It can mean permanent data loss, compliance violations and customers who never come back.

That stakes profile changes everything. It's why so few DR teams have seriously budgeted for AI at all. They've watched other departments chase genAI ROI that hasn't materialized. They've seen the headlines. They're not going to be the team that automated their way into a catastrophic outage.

What will actually move the needle isn't a capabilities demo. It's a proof of concept with clearly defined guardrails: autonomous actions constrained within policy boundaries, audit trails that satisfy compliance teams, and recovery outcomes that are correct, not just fast. Leaders need to see it work before they sign off on removing the human from the loop.
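One way to picture "autonomous actions constrained within policy boundaries" plus an audit trail is a gate that every agent action must pass through. The sketch below is a minimal illustration under assumed names: the `Policy` and `Guardrail` classes, the action strings and the five-resource limit are hypothetical, not a real product interface. Anything outside the policy is escalated to a human, and every decision is logged either way.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Policy:
    # Hypothetical policy: actions the agent may take without human approval,
    # and a cap on how many resources a single action may touch.
    allowed_actions: frozenset[str]
    max_resources_per_action: int


@dataclass
class Guardrail:
    policy: Policy
    audit_log: list[str] = field(default_factory=list)

    def authorize(self, action: str, resources: list[str]) -> bool:
        """Allow the action only if it fits the policy; record the decision either way."""
        allowed = (
            action in self.policy.allowed_actions
            and len(resources) <= self.policy.max_resources_per_action
        )
        # Append-only JSON records, so compliance teams can review every decision.
        self.audit_log.append(json.dumps({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "resources": resources,
            "decision": "allowed" if allowed else "escalated_to_human",
        }))
        return allowed


if __name__ == "__main__":
    gate = Guardrail(Policy(frozenset({"restore_snapshot", "retag_resource"}), 5))
    gate.authorize("restore_snapshot", ["db-primary"])  # within policy
    gate.authorize("delete_backup", ["db-primary"])     # not on the allow list
    for record in gate.audit_log:
        print(record)
```

The design point is that the agent never decides its own boundaries: the policy is written and owned by people, and the log shows exactly where the agent stayed inside it.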

The accountability problem nobody is talking about

Even beyond technology and compliance, there's a cultural shift that I think is harder than anything else. When an agentic system reconfigures your recovery architecture at 2 a.m. and something goes wrong, who owns that outcome?

The answer isn't obvious, and most organizations aren't ready for it. Engineers move from operators to validators. Architects become policy stewards. The accountability structure changes fundamentally — and that organizational rewiring takes longer than any software deployment.

This is why DR will lag behind other sectors. Not because the AI can't do it, but because trust and accountability have to be redesigned alongside the automation.

The resilience gap is coming

It may take a few more years than other parts of IT, but the enterprises that figure out how to govern autonomous disaster recovery — not just automate it — will have a meaningful, durable resilience advantage. Those that don't will still have someone manually tagging resources when the next major outage hits.

The gap is real. The technology is ready. The question now is whether the organizations are.

Catalin Voicu is the Cloud Solutions Engineer at N2W.
