
How to manage storage resiliency

Resilient storage has never been more important to an organization than it is now. Ensure storage systems can respond to issues with a strong game plan.

No matter where it resides, data is susceptible to cyberattacks, natural disasters, hardware failures and a variety of other threats. The more resilient the storage systems, the more adeptly they can handle and recover from such threats.

Storage resiliency aims to keep services running during an incident and to recover more quickly from its impact, with minimal damage to the data or the organization's reputation. Achieving it can be challenging, but it is critical for organizations to have the right strategy and services in place.

Why is resilience important in storage?

Organizations rely on their data more than ever to deliver services, carry out operations and make informed business decisions. The rising threat of malware and ransomware continues to put that data at risk, and security and IT professionals are under increasing pressure to protect it while ensuring its integrity and availability in the event of an attack.

Today's organizations need primary storage systems resilient enough to withstand cyberattacks and other threats. Unfortunately, many organizations continue to treat cybersecurity as a concern separate from storage resiliency, leaving a significant gap in the protection of their data.

Vendors offer a variety of definitions of what resilience means, often focusing on availability above all other considerations. Resilience is about much more than availability, however. The resilient system must be able to detect adverse events, respond to those events appropriately and recover from them as quickly as possible.
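The detect-and-respond part of that definition can be illustrated with a minimal Python sketch. It assumes a monitoring tool reports hourly counts of modified files per volume, since a sudden spike far above the usual baseline can indicate ransomware encrypting files. The function names, the metric and the threshold of three standard deviations are hypothetical choices for illustration, not any vendor's detection method.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], current: float, k: float = 3.0) -> bool:
    """Detect: flag `current` if it sits more than k standard deviations above the baseline."""
    if len(history) < 2:
        return False  # stdev() needs at least two samples to form a baseline
    return current > mean(history) + k * stdev(history)


def handle_interval(history: list[float], current: float) -> str:
    """Return the (hypothetical) action for one monitoring interval."""
    if is_anomalous(history, current):
        # Respond: a real system might isolate the volume here and protect the
        # most recent known-good snapshot so that recovery can start from it.
        return "respond"
    history.append(current)  # only normal intervals extend the baseline
    return "normal"


# Hourly counts of modified files on one volume (illustrative numbers).
baseline = [100, 110, 95, 105, 102]
print(handle_interval(baseline, 108))   # normal
print(handle_interval(baseline, 5000))  # respond
```

Anomalous intervals are deliberately kept out of the baseline so that an ongoing attack cannot gradually raise the threshold it is measured against.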

A strong resilience strategy should incorporate an organization's entire inventory of storage systems, whether they're used for primary or secondary storage. With such a strategy in place, an organization can realize important benefits:

  • Prevent or minimize data loss, while maintaining data integrity.
  • Recover faster from an event with minimal downtime and disruptions to service.
  • Increase fault tolerance, availability and reliability.
  • Achieve security, privacy and compliance goals more effectively.
  • Protect an organization's reputation and revenue flow.
  • Maintain business continuity (BC) and quality of service.

One of the biggest motivators for ensuring resilient storage is the rise in cyberattacks and their growing sophistication. According to a recent report from Splunk, 87% of respondents claimed their organizations had been a target of ransomware in 2023, while 52% of the organizations suffered some type of data breach.

Cyberattacks are not the only risks, however. Storage systems also face threats from natural disasters and power outages. Data loss can result from hardware or software failures, system malfunctions, insider threats or simple human error. A storage resiliency strategy must protect data against this full range of threats.


What strong storage resiliency looks like

Some organizations consider their backup and recovery systems to represent a reasonable resilience strategy. However, backup and recovery are only part of the larger storage resiliency effort, which requires a number of systems and processes to ensure security, privacy, availability, reliability, integrity and BC.

A resilience strategy is an ongoing effort an organization needs to tailor to its specific storage and data requirements. As part of this effort, security and IT personnel should consider the following measures:

  • Disaster recovery plan. A comprehensive DR plan defines how to recover if the storage and data are hit with a cyberattack or other disruption. The plan should consider the organization's recovery time objectives (RTOs) and recovery point objectives (RPOs).
  • Resilience preparation. IT should inventory the organization's current data assets and determine which are the most important and sensitive. Assess data and storage infrastructures for potential vulnerabilities and risks. Develop contingency plans for getting back up and running if an event occurs that IT cannot resolve. Evaluate and choose software or services that support storage resiliency.
  • Backup and recovery. Like the production data, backup and archived data must be fully protected and adhere to applicable regulations. The 3-2-1 backup rule calls for three copies of the data on two different types of media, with at least one copy stored off-site. The recovery process should be fast and efficient enough to meet the stated RTO and RPO goals.
  • Redundancy and replication. Without copies, a single event -- even one as basic as a drive failure -- could result in data loss. IT teams often use multiple methods to create redundant data, such as replication, RAID storage and regular backups. At least one of those copies should be physically separate from the primary storage. Redundancy also applies to the hardware that houses the data, such as server, network and data center components.
  • Security. Security defenses include firewalls, antimalware, intrusion detection systems and access controls that adhere to the principle of least privilege. Cybersecurity should feature end-to-end data encryption, as well as ongoing monitoring and system analytics that look for anomalies and potential threats.
  • Maintenance. Administrators should keep the software and firmware up to date, especially when it comes to security patches. The storage systems and their supporting hardware should be in optimal working order, with a close eye on potential failures. Prioritize security -- not only when it comes to protecting the current data, but also as it applies to deleting data and decommissioning storage devices.
  • Testing and assessment. Conduct ongoing risk assessments, verify backups, and conduct security and storage audits. IT teams should continually test DR plans to ensure that they're up to date, work as expected, and continue to meet their defined RTO and RPO goals.
  • Education and training. Storage administrators and users must understand the potential risks to data and what's at stake if data is compromised or destroyed. They should know how to mitigate those risks and what to do if an incident occurs. Key personnel should stay current on emerging threats and how to minimize the risks from these threats.
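The 3-2-1 rule and the RPO goal lend themselves to a simple automated check during testing and assessment. The following Python sketch assumes a hypothetical inventory record, `DataCopy`, for each copy of a data set, and counts the production copy toward the three copies, which is one common reading of the rule. It approximates worst-case data loss as the shortest backup interval among copies that survive an event and ignores how long backup jobs take to run.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass
class DataCopy:
    """One copy of a data set (a hypothetical inventory record)."""
    name: str
    media: str                                   # e.g. "disk", "tape", "object"
    offsite: bool                                # physically separate from the primary site?
    backup_interval: Optional[timedelta] = None  # None for the live production copy


def meets_3_2_1(copies: list[DataCopy]) -> bool:
    """At least 3 copies, on at least 2 media types, with at least 1 stored off-site."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


def meets_rpo(copies: list[DataCopy], rpo: timedelta, site_loss: bool = False) -> bool:
    """Approximate check that the freshest surviving backup is no older than the RPO.

    Worst-case data loss is taken as the shortest backup interval among copies
    that survive the event. If the whole primary site is lost, only off-site
    copies count.
    """
    intervals = [
        c.backup_interval
        for c in copies
        if c.backup_interval is not None and (c.offsite or not site_loss)
    ]
    return bool(intervals) and min(intervals) <= rpo


plan = [
    DataCopy("production volume", "disk", offsite=False),
    DataCopy("local snapshot", "disk", offsite=False, backup_interval=timedelta(hours=1)),
    DataCopy("cloud backup", "object", offsite=True, backup_interval=timedelta(hours=24)),
]
print(meets_3_2_1(plan))                                    # True
print(meets_rpo(plan, timedelta(hours=4)))                  # True: the hourly snapshot covers it
print(meets_rpo(plan, timedelta(hours=4), site_loss=True))  # False: only the daily off-site copy survives
```

The `site_loss` flag shows why the off-site requirement matters: a local snapshot can satisfy a four-hour RPO after a drive failure, but not after the loss of the entire site.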

Although a storage resiliency strategy is critical to BC, it comes with challenges, starting with cost and complexity.

Implementing a resiliency strategy requires an investment in software, hardware and skilled personnel. Organizations must balance their resilience needs against their available budgets, which often means compromising one or the other. Ideally, an organization includes all its storage systems, whether on premises or in the cloud, but this is not always possible, leaving some data at higher risk.

Security and IT teams must also contend with multiple environments, complex integrations, and large and varied data sets, adding to the overall complexities and costs. They must remain diligent in complying with applicable regulations, which can be a complex and time-consuming undertaking.

Chart of IT resilience

A brief summary of what vendors offer

Despite the challenges that come with implementing a comprehensive resiliency strategy, awareness about storage and data resilience has been growing across the industry. Many vendors now build resiliency features into their products and services, as in the following examples -- listed in alphabetical order:

  • AWS Backup is a fully managed, policy-based backup service. AWS also offers services such as Application Recovery Controller for managing application recoveries, Elastic Disaster Recovery for point-in-time recovery and Resilience Hub for assessing and tracking an application's resilience posture.
  • Commvault's Cyber Resilience service includes features such as backup and recovery, autonomous recovery, threat scanning within backup data, cyber deception and threat detection, and data access governance.
  • Druva's Data Resiliency Cloud service offers air-gapped, immutable backups in object-based storage, with automatic vulnerability scans, patches and upgrades. The service provides point-in-time data recovery, automated incident response, a centralized security and governance dashboard, and an AI-driven posture and observability dashboard.
  • IBM Spectrum Virtualize, a software-defined storage product, includes the Safeguarded Copy feature, which creates immutable copies of the data to help protect against ransomware and other threats.
  • Infinidat's InfiniSafe offers cyber resilience for both primary and secondary storage. InfiniSafe includes four types of protection: logical air-gapping, immutable snapshots, near-instantaneous recovery and a fenced forensic environment for spinning up immutable copies of primary or secondary data.
  • Microsoft SharePoint includes resilience protections for both its metadata and user data, which is hosted on Azure Blob Storage. For the metadata, SharePoint uses replication and a proprietary automation technology for failover. For the user data, SharePoint writes every file simultaneously to both primary and secondary data regions.
  • Nutanix offers snapshots, storage replication, self-healing, block awareness and degraded node detection. Nutanix systems also contain redundant physical components, including power supplies, along with a network fabric that can sustain individual link, node or block failures.
  • Pure Storage Evergreen includes the multilayered Assured Data Resilience, built into the platform's architecture. Pure Storage also offers the Zero Data Loss Guarantee across its Evergreen portfolio, as well as Pure Protect DR as a service.
  • ShardSecure, a data security and privacy protection platform, offers a number of features for ensuring storage resiliency. It includes high availability, self-healing data, data integrity checks and multiple Microshard containers to provide redundancy and real-time data distribution across customer-owned storage repositories.

Robert Sheldon is a technical consultant and freelance technology writer. He has written numerous books, articles and training materials related to Windows, databases, business intelligence and other areas of technology.
