Data center power outage causes and how to prevent them

Mitigating data center power outages is crucial for business survival. Learn the strategies today's organizations use to safeguard their critical infrastructure from disruptions.

Today's sophisticated data centers handle mission-critical operations and processes, and shutting them down, even briefly, is not feasible. IT and disaster recovery (DR) teams must be prepared to mitigate data center outages.

Power disruptions or failures might not cause a complete blackout, but they can still harm data center operations, either by forcing a partial or complete shutdown or by leaving equipment running below standard. Even a brief lag in critical systems can degrade performance enough to violate service-level agreements (SLAs) and erode customer trust.

Despite all the precautions organizations can take to provide uninterrupted power to data centers, situations can occur that threaten their continued operations. Emergency power strategies are a vital part of DR planning. Data centers are seriously at risk without emergency power systems and strategies to protect their power supplies.

While no power system is infallible, organizations can deploy safeguards to reduce the likelihood of an unplanned disruption. The goal is to minimize the potential for component failure and restore normal operations as quickly as possible. This article discusses common causes of data center power outages and offers tips on mitigating them.

Common causes of data center power outages

There are several common causes of data center power outages, each with its own destructive effects. IT and DR personnel should be familiar with these disruptions and understand how they might affect existing infrastructure.

Weather-related events and natural disasters

Severe storms, earthquakes, tsunamis, hurricanes, tornadoes, flooding, mudslides or lightning strikes can damage power lines and critical utility infrastructure, which can affect the delivery of power to a broad geographic area. Extreme temperatures can overload cooling systems, potentially leading to shutdowns.

Utility company disruptions

The U.S. power grid comprises many interconnected power systems. Data centers can lose power during regional grid failures or brownouts, which high demand or equipment failure can trigger. Aging grid infrastructure also increases the risk of outages.

Equipment malfunction

Failure of primary or backup systems can lead to prolonged outages for utility companies and end users alike. Faulty hardware or software in power management systems can also cause outages.

Human error

Utility employees carry significant responsibility for keeping power flowing, and inadequate training can lead to mistakes during maintenance or system upgrades. Even experienced utility technicians occasionally make mistakes.

Cybersecurity incidents

Cyberattacks are a growing threat to the nation's critical power infrastructure. Attackers can use targeted ransomware or compromise power monitoring software to disrupt power generation and delivery.

Strategies to prevent future outages

Protecting data centers from unplanned power outages requires a well-designed program of maintenance, testing, documentation, monitoring and analysis of power performance data. The following is a list of key strategies for establishing a robust, secure and survivable power environment:

  • Electric power companies are major partners in data center operations. Close cooperation with power providers and regular reviews of power quality keep organizations informed about the status of their resources.
  • Power quality can vary greatly by provider, so it is essential to invest in equipment that removes or minimizes power anomalies such as voltage or frequency fluctuations, sags, spikes, surges, brownouts or blackouts. This includes power conditioners, line filters, surge suppressors, lightning arresters and many other devices.
  • Obtaining primary commercial power from two different power grids and routing it to the data center over diverse paths, if possible, reduces the chance that a single utility failure cuts off the facility. However, the cost to engineer and build such a diverse power infrastructure can be prohibitive.
  • In a medium to large data center, emergency power systems typically include a centralized uninterruptible power supply (UPS) system that provides continuous power when commercial power is lost, bridging the gap until standby generators start and take over the load. Engine-driven generators can run indefinitely as long as their fuel tanks are refueled.
  • Establish primary and alternate sources of fuel for emergency generators and, if possible, arrange for expedited fuel delivery, even if it costs extra.
  • Configure emergency power systems to deliver emergency power for anticipated computer loads, the data center's HVAC system, telecom closets, emergency lights and other loads as needed.
  • Size the emergency power system for the anticipated peak loads, not just average demand.
  • If modular UPS equipment is used, the backup power array can be expanded via additional UPS modules and batteries.
  • To make sure emergency power systems will work when needed, perform regular tests, especially with a medium to full electrical load.
  • In addition to regular testing, a maintenance program is essential. It should include scheduled tests of primary and backup power systems, regular inspections, and adherence to manufacturer recommendations for maintenance and support.
  • Benchmarking is another strategy for power protection. It means establishing a tracking mechanism that documents the results of every test. Comparing these results over time can reveal trends that signal problems before they cause an outage.
  • Consider installing emergency power systems equipped with load banks that can draw 100% of the generator's rated capacity. This enables full-load testing without affecting data center operations.
  • Develop emergency procedures for responding to power problems while minimizing the effects on critical data center systems. Such procedures should list step-by-step actions to take for a given type of emergency.
  • Be sure to have access to trained maintenance personnel to facilitate a power system recovery. If on-site employees are not familiar with power system operation, obtain guidance and documentation from equipment manufacturers or work with a contractor who specializes in power systems.
  • Make sure power system documentation is up to date and that the documents are available in electronic and hard-copy versions.
  • Locate primary and backup power systems in secure areas to prevent unauthorized access.
  • If possible, commission power systems prior to placing them in service. Commissioning examines and tests all power system components end-to-end across the data center to make sure all components work together properly.
  • Invest in AI technology to enhance monitoring, problem detection and response, and compliance with regulatory standards.
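The value of the dual-grid, diverse-path strategy in the list above can be estimated with the standard availability formula for redundant components. This is a sketch under one loud assumption: the two feeds fail independently, which diverse grids and routing are meant to approximate and which a shared event such as a regional storm can break. The 99.9% figures below are illustrative, not measured values.

```latex
A_{\text{combined}} = 1 - (1 - A_1)(1 - A_2)

% Illustrative: each feed available 99.9% of the time
A_1 = A_2 = 0.999 \;\Rightarrow\; A_{\text{combined}} = 1 - (0.001)(0.001) = 0.999999

% Expected annual unavailability (8,760 hours per year)
\text{single feed: } 0.001 \times 8760\,\text{h} = 8.76\,\text{h}
\qquad
\text{both feeds: } 10^{-6} \times 8760\,\text{h} \approx 0.00876\,\text{h} \approx 32\,\text{s}
```

Correlated failures shrink this gain quickly, which is why the physical diversity of the two paths matters as much as having two feeds.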
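The sizing and UPS bullets in the list above can be illustrated with a back-of-the-envelope calculation. This is a minimal sketch, not an engineering method: the load values, the 0.9 power factor, the 25% headroom, the inverter efficiency, the 30-second generator start time and the helper names `required_kva` and `ups_bridge_minutes` are all hypothetical. It also assumes the UPS carries only the IT load while generators start.

```python
from dataclasses import dataclass


@dataclass
class Load:
    name: str
    kw: float  # expected peak real power draw, in kilowatts


def required_kva(loads, power_factor=0.9, headroom=0.25):
    """Apparent power (kVA) the emergency system should be rated for.

    power_factor and headroom are illustrative defaults, not standards.
    """
    total_kw = sum(load.kw for load in loads)
    return total_kw * (1 + headroom) / power_factor


def ups_bridge_minutes(battery_kwh, load_kw, inverter_efficiency=0.94):
    """Linear estimate of UPS runtime. Real batteries deliver less at high
    discharge rates, so design work should use manufacturer runtime curves."""
    return battery_kwh * inverter_efficiency / load_kw * 60


# Hypothetical site loads covered by emergency power.
loads = [
    Load("IT equipment", 400.0),
    Load("HVAC", 180.0),
    Load("Telecom closets", 15.0),
    Load("Emergency lighting", 5.0),
]
GENERATOR_START_SECONDS = 30  # hypothetical start-and-transfer time

print(f"Rate emergency power for at least {required_kva(loads):.0f} kVA")

# Assume the UPS carries only the IT load until the generators take over.
bridge = ups_bridge_minutes(battery_kwh=100.0, load_kw=400.0)
print(f"UPS bridges about {bridge:.1f} minutes")
print("Covers generator start:", bridge * 60 > GENERATOR_START_SECONDS)
```

With these made-up numbers, 600 kW of load becomes roughly 833 kVA of required capacity, and the UPS rides through about 14 minutes, far longer than the assumed generator start time.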

The role of AI in preventing outages

AI can support many of the strategies in this article. Many of today's power management systems include AI features that handle the following functions:

  • Predictive maintenance. AI can analyze system performance data using algorithms that can predict potential failures in power equipment.
  • Energy optimization. AI tools can use power consumption patterns to optimize energy usage and system efficiency.
  • Identifying and responding to potential faults. AI-based fault detection can identify anomalies in real time and launch a response automatically.
  • Real-time load management. When they detect a power issue, AI tools can automatically shift workloads across computing devices to keep mission-critical operations running.
  • Support for data center disaster recovery. Data center power system administrators can use AI-driven simulations and scenario planning to prepare for power outages.
  • Automated remote monitoring. AI can monitor power activities remotely and support monitoring of multiple data centers.

The real cost of data center power outages

Loss of data center power can damage businesses of all sizes, in any industry. The consequences of a disruption can include failure to deliver products and services on time, loss of customers, loss of revenue and reputational damage.

For example, in 2024, a lightning arrester failure on a high-voltage transmission line caused 60 data centers in northern Virginia to switch to backup generators simultaneously, a near miss that could have triggered wider blackouts.

According to Uptime Institute, which provides guidance on protecting data centers from outages and improving uptime and availability, 70% of outages cost more than $100,000, and some cost millions in lost customer revenue and reputational damage.

Uptime Institute's 2024 report noted that approximately 55% of organizations reported at least one data center outage in the past three years. The report also said failures in power and cooling systems accounted for 71% of these outages, with human error being a significant contributing factor.

Paul Kirvan, FBCI, CISA, is an independent consultant and technical writer with more than 35 years of experience in business continuity, disaster recovery, resilience, cybersecurity, GRC, telecom and technical writing.
