High-performance computing requires storage systems with the necessary performance and capacity to ensure reliable operations, but these systems come with a hefty price tag.
The total cost of ownership (TCO) of high-performance computing (HPC) storage goes beyond the initial price per gigabyte, so IT teams must consider other variables. Here are eight factors that contribute to HPC storage costs in addition to the price of the system itself.
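One way to keep these factors visible is to total them explicitly instead of comparing purchase prices alone. The Python sketch below is a hypothetical model, not a standard TCO formula. It assumes that costs can be split into one-time items and recurring annual items over an assumed five-year service life. Every category name and dollar figure in it is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class TcoEstimate:
    """Hypothetical TCO model: purchase price + one-time costs + recurring costs over a service life."""

    purchase_price: float
    # One-time items, e.g. data center preparation or data migration (placeholder categories).
    one_time_costs: dict[str, float] = field(default_factory=dict)
    # Recurring yearly items, e.g. support contracts, energy, staff time (placeholder categories).
    annual_costs: dict[str, float] = field(default_factory=dict)
    years: int = 5  # assumed service life

    def total(self) -> float:
        """Total cost over the service life."""
        return (
            self.purchase_price
            + sum(self.one_time_costs.values())
            + self.years * sum(self.annual_costs.values())
        )


# Placeholder figures for illustration only.
estimate = TcoEstimate(
    purchase_price=1_000_000,
    one_time_costs={"data_center_prep": 50_000, "migration": 30_000},
    annual_costs={"support": 120_000, "energy": 40_000, "staff": 150_000},
)
print(f"5-year TCO: ${estimate.total():,.0f}")  # 5-year TCO: $2,630,000
```

With these placeholder numbers, the recurring costs over five years exceed the purchase price. That is why each of the following factors deserves its own line in the estimate.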
1. Evolving business requirements
Business requirements change over time, as do the workloads that support them. HPC storage must accommodate these changes in a timely fashion, whether they call for reconfigurations, upgrades or capacity additions. For example, a storage system should be able to scale with minimal downtime, because delays can lead to lost revenue and decreased productivity. Even if disruptions are minimal, IT must still purchase and deploy the additional hardware, which adds to the overall TCO.
In some cases, an organization might overprovision its storage system to accommodate future business requirements and workload fluctuations. This, however, can translate to unnecessary expenditures and higher long-term maintenance costs, both of which increase TCO. HPC systems rarely achieve 100% utilization (it's often closer to 80%), so overprovisioned storage leaves even more paid-for capacity idle and further raises long-term costs. At the same time, be careful not to overutilize HPC storage systems, because running them too close to capacity can hurt application performance and productivity.
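Utilization changes what storage effectively costs. The sketch below divides a system's total cost by the capacity actually in use. The $2 million cost and 1,000 TB of provisioned capacity are hypothetical figures.

```python
def cost_per_used_tb(total_cost: float, provisioned_tb: float, utilization: float) -> float:
    """Effective cost per terabyte actually in use, given average utilization in (0, 1]."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return total_cost / (provisioned_tb * utilization)


# Hypothetical system: $2M total cost, 1,000 TB provisioned.
for u in (1.0, 0.8, 0.5):
    print(f"{u:.0%} utilized: ${cost_per_used_tb(2_000_000, 1_000, u):,.0f} per used TB")
```

At 80% utilization, each used terabyte costs 25% more than at full utilization. Overprovisioning down to 50% utilization doubles the effective cost.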
2. Reliability and availability
An HPC storage system should support continuous operations with minimal service disruption. The system should be able to handle drive and node failures, as well as other unplanned interruptions, while maintaining availability and performance. There are two ways to meet these requirements: invest in the right storage infrastructure upfront or spend more time and money later on to keep the system running. Both options affect TCO, but the latter often results in higher HPC storage costs.
Another consideration is the storage configuration. For example, certain RAID levels maintain parity across drives to increase reliability, but this requires additional raw capacity. A highly available storage system typically implements redundant components, such as multipath I/O or dual controllers. The IT team might also hold onto spare parts, such as extra power supplies. In addition, staff must regularly update or replace components. All of these factors add to the TCO.
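The parity overhead is easy to quantify. The sketch below assumes RAID 6 (dual parity) arranged in groups of eight data drives plus two parity drives. That group width is an assumption, and real layouts vary. The sketch also ignores hot spares, file system overhead and formatting loss. It estimates the raw capacity and drive count needed for a target usable capacity.

```python
import math


def raw_capacity_tb(usable_tb: float, data_drives: int, parity_drives: int) -> float:
    """Theoretical raw capacity for a parity layout of data_drives + parity_drives per group."""
    return usable_tb * (data_drives + parity_drives) / data_drives


def drives_needed(usable_tb: float, drive_tb: float, data_drives: int, parity_drives: int) -> int:
    """Drive count when capacity is bought in whole RAID groups (ignores spares and overhead)."""
    groups = math.ceil(usable_tb / (drive_tb * data_drives))
    return groups * (data_drives + parity_drives)


# Hypothetical target: 1,000 TB usable on 20 TB drives in 8+2 RAID 6 groups.
print(f"Raw capacity: {raw_capacity_tb(1_000, 8, 2):,.0f} TB")  # Raw capacity: 1,250 TB
print(f"Drives: {drives_needed(1_000, 20, 8, 2)}")              # Drives: 70
```

Because drives are bought in whole RAID groups, the count rounds up. In this example, 70 drives supply 1,400 TB of raw capacity, even though 1,250 TB is the theoretical minimum.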
3. System and data protection
For most organizations, protecting storage systems and their data is a top priority, but these protections come at a cost.
Security features built into storage hardware, software and firmware can add to the TCO, as do the everyday operations required to maintain a secure infrastructure. For example, it takes time and resources to manage private keys and access controls, monitor storage and network systems, ensure compliance with applicable regulations, and conduct regular security and compliance audits.
A disaster recovery strategy also adds to the TCO. Backups, snapshots and failover operations typically require additional equipment, software or services, as well as personnel. IT teams must also physically protect storage systems, which can require additional fire alarms, detection devices or other monitoring tools.
4. Supporting software and services
Storage vendors typically offer optional support and services contracts, which can substantially increase a system's TCO. The exact amount will depend on the vendor and the level of service. Organizations can choose minimal coverage at a lower upfront cost, but this could increase their risk of extended downtime, leading to higher HPC storage costs down the road. Some third-party companies offer storage maintenance services, which can be cheaper than the vendor's plans, but IT teams must still factor these costs into the TCO.
The TCO should also reflect any licensing fees for software that supports the storage system. This includes software that directly facilitates storage operations, as well as third-party software -- such as a specialized file system or software-defined storage -- that's used in conjunction with the HPC storage system. In addition, TCO calculations should account for any other systems or services needed to manage and monitor the storage infrastructure.
5. Storage network infrastructure
A storage system must be able to communicate with other HPC components and with systems outside the HPC environment. This requires a reliable, high-speed network infrastructure that can sustain operations.
Whether the network fabric is Ethernet, Fibre Channel or InfiniBand, IT will need to deploy and maintain components such as cables, switches, adapters or load balancers. Network costs are trickier to attribute to storage TCO because other HPC components share the network. Even so, the storage TCO should include at least a portion of those networking costs.
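One way to assign part of a shared fabric to storage is to allocate its cost in proportion to usage. The sketch below uses switch port counts as the allocation key. That choice is an assumption, and bandwidth or traffic volume would work as alternatives. The fabric cost and port counts are hypothetical.

```python
def allocate_shared_cost(shared_cost: float, usage: dict[str, float], component: str) -> float:
    """Allocate a shared cost to one component in proportion to its share of total usage."""
    total = sum(usage.values())
    if total <= 0:
        raise ValueError("total usage must be positive")
    return shared_cost * usage[component] / total


# Hypothetical fabric: $400,000 total, with switch ports split among three consumers.
ports = {"compute": 96, "storage": 24, "management": 8}
print(allocate_shared_cost(400_000, ports, "storage"))  # 75000.0
```

Whichever allocation key is used, applying the same one to every product under comparison keeps the estimates consistent.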
Like the storage system itself, the network infrastructure might have its own service contracts or software licensing fees. In addition, the network will likely include redundant components, such as switches or adapters, to avoid any single point of failure. IT might maintain spare parts or take other steps to limit downtime and maintain performance. Component refresh cycles also add to network costs.
6. Operating environment
An HPC storage system requires data center space to operate. Although today's denser HPC systems can shrink that footprint, IT teams should still include space costs in TCO estimates, along with related data center maintenance and repair expenses.
Another big expenditure is the energy needed to power and cool the storage system. Technologies such as flash storage can reduce power consumption, but they still add to overall energy costs.
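A rough annual energy figure can be derived from the system's power draw. The sketch below uses power usage effectiveness (PUE), defined as total facility energy divided by IT equipment energy, as a simple way to fold cooling and other facility overhead into the estimate. The 20 kW load, PUE of 1.5 and $0.12 per kWh are hypothetical inputs, and the model assumes a constant load all year.

```python
HOURS_PER_YEAR = 24 * 365  # ignores leap years


def annual_energy_cost(it_load_kw: float, pue: float, price_per_kwh: float) -> float:
    """Yearly cost to power and cool a system drawing a constant IT load.

    Multiplying the IT energy by PUE approximates the added cooling and facility overhead.
    """
    return it_load_kw * HOURS_PER_YEAR * pue * price_per_kwh


# Hypothetical storage system: 20 kW average draw, PUE of 1.5, $0.12 per kWh.
print(f"${annual_energy_cost(20, 1.5, 0.12):,.0f} per year")  # $31,536 per year
```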
The TCO should account for any required changes to the data center to prepare for the storage system. For example, denser rack storage might require updated power supplies, enhanced cooling or reinforced flooring. The data center might also need more cabling or fire protection systems, as well as additional redundancy, such as an extra generator or uninterruptible power supply.
7. Staffing requirements
Personnel is another major expense. This includes the time IT staff spend procuring, setting up, configuring, integrating and testing the system. It also includes ongoing management, which can be complex and time-consuming. IT teams must keep systems running at peak performance with minimal disruptions and downtime, and that effort can add significantly to the TCO.
Transitioning to a new storage system, including migrating data from legacy systems to the HPC storage system, also requires time and resources. In some cases, IT teams might need to recruit qualified personnel to deploy and maintain the system, or they might need to train existing staff. Which option applies depends on the storage system itself and the staff's qualifications. Either way, some investment will likely be necessary.
8. Unscheduled downtime
To reduce expenses, an organization might be tempted to purchase a less reliable storage system, or limit investments in staff training and hiring. This, however, can result in longer and more frequent downtime -- and a potential loss in revenue.
In a May 2020 study published by Hyperion Research, about half of those surveyed reported that their HPC storage systems failed once a month or more. Those respondents said downtime ranged from less than a day to more than a week, and that a single day of downtime could cost anywhere from under $100,000 to over $1 million.
HPC service disruptions have a severe impact on organizations that rely on the technology for continued productivity and innovation. When the systems go down, their work often comes to a halt, leading to long-term financial consequences. Although these HPC storage costs can be difficult to calculate, always include them in TCO estimates, especially when comparing products.
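Downtime costs are uncertain, but an expected-value estimate makes them comparable across products. The sketch below multiplies failure frequency by average outage length and cost per day, then adds the result to the purchase price over an assumed five-year life. All inputs are hypothetical. They were chosen only to be consistent with the ranges the Hyperion Research respondents reported. The model also treats downtime cost as linear in outage length, which is a simplification.

```python
def expected_annual_downtime_cost(failures_per_year: float, days_per_failure: float,
                                  cost_per_day: float) -> float:
    """Expected yearly cost of unscheduled downtime (simple average-based model)."""
    return failures_per_year * days_per_failure * cost_per_day


def lifetime_cost(price: float, failures_per_year: float, days_per_failure: float,
                  cost_per_day: float, years: int = 5) -> float:
    """Purchase price plus expected downtime cost over an assumed service life."""
    return price + years * expected_annual_downtime_cost(
        failures_per_year, days_per_failure, cost_per_day
    )


# Hypothetical comparison: a cheaper system that fails monthly vs. a pricier one that fails twice a year.
budget = lifetime_cost(1_500_000, failures_per_year=12, days_per_failure=0.5, cost_per_day=100_000)
robust = lifetime_cost(2_000_000, failures_per_year=2, days_per_failure=0.5, cost_per_day=100_000)
print(f"Budget system: ${budget:,.0f}")  # Budget system: $4,500,000
print(f"Robust system: ${robust:,.0f}")  # Robust system: $2,500,000
```

With these assumptions, the system with the lower purchase price ends up costing more once expected downtime is included.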