

Business continuity failures: 5 real-world examples to study

Business continuity failures are costly and can significantly harm a company's reputation. These five high-profile examples demonstrate what can go wrong when a plan fails.

Stuart Burns

Published: 16 Jun 2025

The best business continuity planning happens before an incident takes place, but IT teams can learn from the mistakes of others to bolster their own planning.

No one likes publicizing their mistakes, and organizations that experience a business continuity crisis are no different. Because each business continuity failure presents a learning opportunity for other businesses, it's unfortunate that real-life examples can be hard to track down -- that is, unless the organization has a high-enough profile for the issue to make the news.

A news article won't show IT teams a company's business continuity plan or how that plan is meant to keep critical business functions running during a serious disruption or disaster. Even so, failure stories offer insight into which parts of a company's plan were likely missing or not followed correctly.

Below are five examples of major business continuity failures, how they happened and what IT teams can do to prevent the same thing from happening at their organizations.

CrowdStrike security update crashes millions of Windows systems

On July 19, 2024, security vendor CrowdStrike pushed out a faulty update to its Falcon sensor software that caused one of the largest IT outages in history. An estimated 8.5 million Windows devices were affected, causing major disruption to airlines, healthcare systems, financial services and media outlets.

Experts estimated that the outage would cost affected Fortune 500 companies $5.4 billion. In response, CrowdStrike changed how it tests and deploys updates, including rolling them out in stages, to prevent similar disruptions in the future.

What we can learn

A major takeaway from the CrowdStrike outage was organizations' overreliance on technology and automation. Automated updates keep IT systems current, but they typically reach production without the organization testing them first. Manual workarounds did exist, such as booting each affected machine into Safe Mode and deleting the faulty file, but many organizations now lack enough onsite personnel to apply such a fix machine by machine.

IT automation is beneficial in many ways, but the CrowdStrike outage emphasizes the importance of having a human in the loop for critical processes.
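One way to keep a human in the loop, for updates whose timing an organization controls, is a staged (ring-based) rollout: deploy to a small canary group first, check host health, and require a person to approve each wider ring. The Python sketch below illustrates the idea under stated assumptions; `Ring`, `staged_rollout` and the `deploy`, `is_healthy` and `approve` callbacks are hypothetical stand-ins for whatever deployment and monitoring tooling an organization actually uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Ring:
    name: str
    hosts: list[str]
    needs_approval: bool  # hypothetical policy flag: require human sign-off first


def staged_rollout(
    update_id: str,
    rings: list[Ring],
    deploy: Callable[[str, str], None],
    is_healthy: Callable[[str], bool],
    approve: Callable[[str, str], bool],
) -> list[str]:
    """Deploy ring by ring and return the names of the rings that completed.

    The rollout stops before any ring a human has not approved, and stops
    as soon as any host in the current ring fails its health check.
    """
    completed: list[str] = []
    for ring in rings:
        if ring.needs_approval and not approve(update_id, ring.name):
            print(f"{update_id}: paused before '{ring.name}' awaiting approval")
            break
        for host in ring.hosts:
            deploy(update_id, host)
        failed = [h for h in ring.hosts if not is_healthy(h)]
        if failed:
            print(f"{update_id}: halted in '{ring.name}', unhealthy hosts: {failed}")
            break
        completed.append(ring.name)
    return completed


# Simulated fleet: this (hypothetical) update crashes every host it touches.
crashed: set[str] = set()


def deploy(update_id: str, host: str) -> None:
    crashed.add(host)


def is_healthy(host: str) -> bool:
    return host not in crashed


def approve(update_id: str, ring_name: str) -> bool:
    return True  # stand-in for a real change-approval step


rings = [
    Ring("canary", ["test-01"], needs_approval=False),
    Ring("pilot", ["ws-01", "ws-02"], needs_approval=True),
    Ring("broad", [f"ws-{i:02d}" for i in range(3, 50)], needs_approval=True),
]
result = staged_rollout("update-001", rings, deploy, is_healthy, approve)
print(result, sorted(crashed))  # [] ['test-01']
```

With a bad update, only the single canary host is affected before the rollout halts, instead of the whole fleet receiving it at once.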

Figure: Business continuity planning lifecycle. Organizations must regularly review and update business continuity plans to ensure their effectiveness.

FAA system failure causes U.S. ground stop

On Jan. 11, 2023, an hourslong outage of the Federal Aviation Administration's (FAA) Notice to Air Missions (NOTAM) system forced a nationwide ground stop that delayed or canceled thousands of flights across the U.S. NOTAMs alert pilots to hazards such as runway closures, and pilots must review them before takeoff.

The NOTAM system also runs on aging legacy infrastructure.

Although the FAA attributed the outage to a deleted file, the outage could have been significantly shorter if the legacy infrastructure had offered the high availability of more modern systems. Replacing a longstanding, internationally used system such as NOTAM is a tall order, but organizations that resist replacing existing systems can still learn from this failure: outdated systems that cannot meet current standards or recovery time targets make business continuity even harder than it already is.

What we can learn

IT teams in organizations that cannot replace outdated legacy systems, for whatever reason, should prioritize business continuity practices such as testing recovery procedures without interrupting operations, adding high availability wherever the legacy platform allows it and regularly verifying backup integrity. They can also point to high-profile incidents, such as the FAA system outage, as evidence that new systems are needed.
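Verifying backup integrity can be as simple as recording a checksum for every file when a backup is taken and rechecking those checksums on a schedule. The following Python sketch does this with SHA-256 from the standard library; the function names, the JSON manifest format and the file names are illustrative assumptions, not features of any particular backup product.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large backups don't need to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def write_manifest(backup_dir: Path, manifest: Path) -> None:
    """Record a checksum for every file at backup time (store the manifest elsewhere)."""
    digests = {
        p.relative_to(backup_dir).as_posix(): sha256_of(p)
        for p in sorted(backup_dir.rglob("*"))
        if p.is_file()
    }
    manifest.write_text(json.dumps(digests, indent=2))


def verify_manifest(backup_dir: Path, manifest: Path) -> list[str]:
    """Return the relative paths that are missing or whose contents changed."""
    expected = json.loads(manifest.read_text())
    problems = []
    for rel, digest in expected.items():
        p = backup_dir / rel
        if not p.is_file() or sha256_of(p) != digest:
            problems.append(rel)
    return problems


with tempfile.TemporaryDirectory() as tmp:
    backup = Path(tmp) / "backup"
    backup.mkdir()
    (backup / "records.db").write_bytes(b"example records")
    (backup / "config.ini").write_text("[db]\n")
    manifest = Path(tmp) / "manifest.json"  # kept outside the backup directory
    write_manifest(backup, manifest)
    clean = verify_manifest(backup, manifest)
    (backup / "records.db").unlink()  # simulate a file being deleted
    damaged = verify_manifest(backup, manifest)

print(clean, damaged)  # [] ['records.db']
```

A checksum pass catches deleted or corrupted files, but it does not prove that a backup can actually be restored; periodic test restores remain the stronger check.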

Microsoft Azure/Office outage halts users internationally

Also in January 2023, Microsoft had a major outage that affected users across the globe, but especially in Europe.

The outage left many business and personal users unable to access email and files or manage Azure infrastructure. Microsoft eventually traced the root cause to a faulty change it made to the routers in its wide area network.

What we can learn

Unfortunately, no one-size-fits-all fix for cloud outages exists. Larger businesses can mitigate outages by spreading workloads across multiple availability zones. Each region contains several zones, and each zone consists of one or more physically separate data centers with independent power, cooling and networking, so the loss of a single zone does not take down the environment.

Smaller companies might find it more useful to rely on built-in disaster recovery tools, such as Azure Site Recovery, to fail over completely to another region and get back up and running quickly. This requires some advance planning but avoids the complexity and cost of a redundant multizone setup.

Larger organizations with higher availability requirements can instead use availability zones to survive the loss of a data center, running redundant instances in each zone and rerouting traffic away from a zone that fails.
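In practice, a managed load balancer usually handles this rerouting, but the underlying logic is simple: send traffic only to zones that pass their health checks. The Python sketch below assumes hypothetical zone names and health-check results fed in by some monitoring system; `ZoneBalancer` is an illustrative name, not a cloud provider API.

```python
from itertools import count


class ZoneBalancer:
    """Round-robin requests across availability zones that pass health checks."""

    def __init__(self, zones: list[str]) -> None:
        self.health = {zone: True for zone in zones}  # dicts keep insertion order
        self._counter = count()

    def mark(self, zone: str, healthy: bool) -> None:
        """Record the latest health-check result for a zone."""
        self.health[zone] = healthy

    def route(self) -> str:
        """Pick the next healthy zone; raise if every zone is down."""
        live = [zone for zone, ok in self.health.items() if ok]
        if not live:
            # At this point only a cross-region disaster recovery plan helps.
            raise RuntimeError("no healthy zones available")
        return live[next(self._counter) % len(live)]


lb = ZoneBalancer(["zone-1", "zone-2", "zone-3"])
before = [lb.route() for _ in range(3)]
lb.mark("zone-2", False)  # zone-2 fails its health check
after = [lb.route() for _ in range(4)]
print(before)  # ['zone-1', 'zone-2', 'zone-3']
print(after)   # ['zone-3', 'zone-1', 'zone-3', 'zone-1']
```

Zone redundancy protects against a failed data center, not against a provider-wide problem such as a faulty global network change; the `RuntimeError` branch marks where a cross-region failover plan has to take over.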

Fire damages OVHcloud's data center -- and reputation

Not even the biggest companies with vast resources can prevent every fire, flood or natural disaster. When one strikes, business continuity is a matter of being prepared. Unfortunately, OVHcloud was not.

In March 2021, a fire broke out at one of the cloud provider's data centers in Strasbourg, France, and the fire suppression measures were not up to the job. Many clients woke up to find their rented servers offline. To make matters worse, one of the backup arrays was completely destroyed in the fire, taking with it critical backups that the provider could have used to recover customer data.

The crisis did more than disrupt immediate business functions. OVHcloud's reputation suffered, and the company became the subject of a $10 million class action lawsuit from more than 140 of its clients.

What we can learn

The OVHcloud business continuity failure illustrates the importance of the 3-2-1 rule of data backup: keep at least three copies of data, on at least two different types of storage media, with at least one copy offsite. Spreading copies across different hardware and locations is the most reliable way to keep data safe in a fire or natural disaster. If the data center is destroyed, a backup still exists elsewhere that the client can restore to get services running again.
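The 3-2-1 rule is easy to check mechanically against a backup inventory. In this Python sketch, `BackupCopy` and `check_3_2_1` are hypothetical names, the site labels are made up, and the list of copies includes the production copy, since the rule counts it toward the three.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    media: str     # storage type, e.g. "disk", "tape", "object-storage"
    location: str  # site where this copy physically lives


def check_3_2_1(copies: list[BackupCopy], primary_site: str) -> list[str]:
    """Return the 3-2-1 rules a set of data copies violates (empty list = compliant)."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} of 3 required copies")
    if len({c.media for c in copies}) < 2:
        problems.append("only one media type (need 2)")
    if all(c.location == primary_site for c in copies):
        problems.append("no offsite copy (need 1)")
    return problems


# Every copy sits in the same building, so a single fire can destroy them all.
same_site = [
    BackupCopy("disk", "site-a"),  # production data
    BackupCopy("disk", "site-a"),  # local backup array
    BackupCopy("tape", "site-a"),
]
print(check_3_2_1(same_site, "site-a"))  # ['no offsite copy (need 1)']

spread = [
    BackupCopy("disk", "site-a"),
    BackupCopy("disk", "site-a"),
    BackupCopy("object-storage", "site-b"),
]
print(check_3_2_1(spread, "site-a"))  # []
```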

Ransomware compromises NHS Foundation Trust

The National Health Service (NHS) is one of the largest employers in the U.K. Downtime at the NHS costs significant money and endangers patients, which makes the Aug. 4, 2022, ransomware attack that disrupted NHS services a prime example of a disastrous business continuity failure.

The attack, which targeted Advanced, a major software provider for the NHS, took several months to remediate fully. In the initial stages, front-line staff had to revert to pen and paper and make do with whatever noncomputerized records they had. Part of the delay in restoring services came from the attack's impact on legacy systems.

However, there was a bigger problem with this failure: hidden shadow IT systems installed by employees with little to no professional IT oversight.

What we can learn

Legacy IT systems frequently cost more to maintain, yet they are more likely to be neglected when it comes to patches and updates. It is easier said than done, but replacing legacy systems is one way to avoid these issues.

Organizations must also have strict policies governing the acquisition and management of IT systems and software. Every purchase should be tightly managed and require approval from IT staff, who are often aware of issues that less technically savvy managers might miss.
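Enforcing such a policy also means finding the systems that slipped through. A simple starting point is to compare a software inventory against the approved catalog. The Python sketch below assumes the inventory has already been collected, for example by an endpoint management tool, as a mapping from host name to installed software; the host names, software names and `find_unapproved` function are hypothetical.

```python
def find_unapproved(
    inventory: dict[str, set[str]], approved: set[str]
) -> dict[str, list[str]]:
    """Map each host to the installed software that is not in the approved catalog.

    Names are compared case-insensitively; `approved` is expected in lowercase.
    """
    findings: dict[str, list[str]] = {}
    for host, installed in inventory.items():
        unknown = sorted(name for name in installed if name.lower() not in approved)
        if unknown:
            findings[host] = unknown
    return findings


approved = {"office-suite", "ehr-client", "antivirus"}
inventory = {
    "ward-pc-01": {"ehr-client", "antivirus"},
    "ward-pc-02": {"EHR-Client", "patient-tracker-macro"},
    "admin-pc-07": {"office-suite", "free-file-sync", "antivirus"},
}
findings = find_unapproved(inventory, approved)
print(findings)
# {'ward-pc-02': ['patient-tracker-macro'], 'admin-pc-07': ['free-file-sync']}
```

An installed-software check only covers managed machines; shadow IT in the form of unsanctioned cloud services or personal devices needs other discovery methods.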

Stuart Burns is an enterprise Linux administrator at a leading company that specializes in catastrophe and disaster modeling.
