A look at HCI hardware maintenance

Resource forecasting, capacity planning and upgrade compatibility are essential components of hyper-converged infrastructure maintenance. Learn why admins must implement them.

Alastair Cooke

Published: 30 Oct 2019

Hyper-converged infrastructure simplifies day-to-day operational tasks, but it does not eliminate the need for general hardware upkeep.

Hyper-converged infrastructure (HCI) hardware will eventually fail or run low on capacity; all hardware has a finite lifespan. To keep your HCI operational and delivering business value, you must keep the hardware platform healthy with regular component maintenance.

If you have a small HCI deployment, you may not see a single failure in the three- to five-year lifespan of your servers. However, the more servers you manage, the higher the probability of a failure. If you are running hundreds of HCI hardware nodes, a component may fail every couple of months, although modern servers are designed to be failure tolerant.

Systems usually have redundant fans and power supplies so a single component failure doesn't cause an outage. That said, your HCI maintenance plan should cover replacement hardware, whether you keep spares on premises or rely on the vendor's support for all service.
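To see how fleet size changes the odds of a failure, consider the following minimal Python sketch. It assumes that nodes fail independently and uses a hypothetical 2% annual failure rate per node; neither assumption comes from a vendor, so substitute figures from your own fleet.

```python
# Hypothetical figure for illustration only; use failure data from your own fleet.
ASSUMED_ANNUAL_FAILURE_RATE = 0.02


def prob_at_least_one_failure(nodes: int, years: float,
                              annual_failure_rate: float = ASSUMED_ANNUAL_FAILURE_RATE) -> float:
    """Chance that at least one node fails within `years`.

    Assumes every node fails independently at the same annual rate.
    """
    survive_one_node = (1.0 - annual_failure_rate) ** years
    return 1.0 - survive_one_node ** nodes


def expected_failures_per_year(nodes: int,
                               annual_failure_rate: float = ASSUMED_ANNUAL_FAILURE_RATE) -> float:
    """Average number of node failures per year across the fleet."""
    return nodes * annual_failure_rate


if __name__ == "__main__":
    for fleet in (4, 40, 400):
        p = prob_at_least_one_failure(fleet, years=5)
        print(f"{fleet:>3} nodes: {p:.0%} chance of a failure in 5 years, "
              f"about {expected_failures_per_year(fleet):.1f} failures per year")
```

Under these assumptions a small cluster may well run its whole life without a failure, while a large fleet should expect several each year and plan its spares accordingly.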

HCI hardware fills up eventually

HCI requires ongoing capacity management. Resource demand grows over time, and each cluster resource is a finite pool when it comes out of the box.

Capacity monitoring should be a core part of your HCI hardware management plan -- preferably with forecasting -- to predict when you need more resources. When you create your budget forecasts, include time for the financial approval, ordering, fulfilment and hardware deployment.
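As a rough illustration of forecasting with procurement lead time, the Python sketch below fits a straight line to past usage samples, estimates how many days remain before a utilization threshold is reached, and works back to the latest date on which to start the purchase. The sample data, the 80% threshold and the lead times for each stage are all hypothetical, and real growth is rarely perfectly linear.

```python
import math
from datetime import date, timedelta

# Hypothetical lead times in days; replace with your organization's real figures.
ASSUMED_LEAD_TIME_DAYS = {"approval": 30, "ordering": 14, "fulfilment": 21, "deployment": 7}


def daily_growth(samples):
    """Least-squares slope (units per day) from (day_index, used_capacity) samples."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in samples)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    return cov / var_x


def days_until_threshold(used_now, capacity, threshold, growth_per_day):
    """Days until usage reaches threshold * capacity, or None if usage is not growing."""
    limit = capacity * threshold
    if used_now >= limit:
        return 0.0
    if growth_per_day <= 0:
        return None
    return (limit - used_now) / growth_per_day


def latest_order_date(today, days_left, lead_time_days):
    """Last day to start procurement so hardware is deployed before the threshold."""
    return today + timedelta(days=math.floor(days_left) - lead_time_days)


if __name__ == "__main__":
    usage_tb = [(0, 40.0), (30, 43.0), (60, 46.0), (90, 49.0)]  # hypothetical samples
    growth = daily_growth(usage_tb)
    left = days_until_threshold(used_now=49.0, capacity=80.0, threshold=0.8,
                                growth_per_day=growth)
    lead = sum(ASSUMED_LEAD_TIME_DAYS.values())
    print("Start procurement by:", latest_order_date(date(2024, 1, 1), left, lead))
```

If the date returned is already in the past, the cluster is on course to hit the threshold before new hardware can be in service.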

It's poor operations -- and stressful -- to run out of capacity while extra hardware is still on a delivery truck. Be mindful of resource balance, because HCI nodes are purchased as a combination of compute and storage. This makes it trickier to expand compute on its own than it is with a regular, hot-swappable server. To track resource availability, you can use HCI management software to send regular reports, or alerts when a resource reaches a set threshold.
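HCI management software normally provides threshold alerts itself, but the logic is simple enough to sketch in Python. The resource names and threshold values below are hypothetical and do not come from any particular product.

```python
# Hypothetical alert thresholds, expressed as a fraction of total capacity.
ASSUMED_THRESHOLDS = {"cpu": 0.75, "ram": 0.80, "storage": 0.70}


def resources_over_threshold(used, total, thresholds=ASSUMED_THRESHOLDS):
    """Return {resource: utilization} for every resource at or above its threshold."""
    alerts = {}
    for name, limit in thresholds.items():
        utilization = used[name] / total[name]
        if utilization >= limit:
            alerts[name] = utilization
    return alerts


if __name__ == "__main__":
    used = {"cpu": 120, "ram": 900, "storage": 60}     # cores, GB, TB (hypothetical)
    total = {"cpu": 192, "ram": 1024, "storage": 100}
    for name, utilization in resources_over_threshold(used, total).items():
        print(f"ALERT: {name} at {utilization:.0%} of cluster capacity")
```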

If your workload has an uneven distribution of compute and storage consumption, then you could be paying for resources you do not use, making your HCI less cost-effective.
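One way to estimate how much capacity an unbalanced workload would leave idle is sketched below in Python. It assumes, as a simplification, that demand for every resource grows in the same proportion, and it reports what share of each resource would still be unused when the busiest one is full. All figures are hypothetical.

```python
def stranded_at_full(used, total):
    """Find the bottleneck resource and the unused share of the others when it fills.

    Assumes demand for every resource grows in proportion to current usage,
    and that every resource has some usage today.
    """
    utilization = {name: used[name] / total[name] for name in total}
    bottleneck = max(utilization, key=utilization.get)
    scale = 1.0 / utilization[bottleneck]  # growth factor until the bottleneck is full
    stranded = {name: 1.0 - utilization[name] * scale for name in total}
    return bottleneck, stranded


if __name__ == "__main__":
    used = {"compute": 80, "storage": 30}     # hypothetical units
    total = {"compute": 100, "storage": 100}
    bottleneck, stranded = stranded_at_full(used, total)
    print(f"{bottleneck} fills first")
    for name, share in stranded.items():
        print(f"{name}: {share:.1%} of capacity left unused")
```

A large unused share for one resource suggests that expanding with like-for-like nodes would add more of something you already have in surplus.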

Consider whether adding compute-only or storage-only nodes is the more cost-effective way to expand your HCI hardware setup. Also remember that maintenance activities can take resources away from the HCI cluster; you may need to shut down a node to replace parts such as fans or hard drives.
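Before you take a node offline for maintenance, it is worth checking that the remaining nodes can carry the current load. The Python sketch below shows that check for a single resource such as RAM; the node sizes and usage figures are hypothetical.

```python
def fits_with_node_down(node_capacities, used, node_down):
    """True if current usage fits on the cluster with one specific node shut down."""
    remaining = sum(cap for i, cap in enumerate(node_capacities) if i != node_down)
    return used <= remaining


def safe_for_any_single_node(node_capacities, used):
    """True if usage still fits even when the largest node is the one that is down."""
    return used <= sum(node_capacities) - max(node_capacities)


if __name__ == "__main__":
    ram_gb = [256, 256, 256, 256]  # hypothetical four-node cluster
    print(fits_with_node_down(ram_gb, used=700, node_down=0))
    print(safe_for_any_single_node(ram_gb, used=800))
```

Running the same check against the largest node gives the headroom you need to tolerate any single node being out of service, planned or not.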

Cluster expansion considerations

When the time comes to expand your HCI cluster, consider the effects new hardware has on resource availability. If you continue to expand with similar HCI hardware nodes from the same vendor, you will likely not affect overall performance.

If you expand with nodes that have significantly different storage and processing resources, there may be an imbalance in performance across your infrastructure. For example, a cluster with four older, medium-sized 256GB HCI nodes may be expanded with two newer and much more powerful 768GB nodes.

If your cluster expands from 1TB of RAM to 2.5TB of RAM and one of the new nodes fails, the cluster loses 30% of its RAM; if one of the older nodes fails, you lose only 10%. This potential imbalance might also affect CPU or storage capacity and lead to maintenance or compatibility issues on the newer nodes.
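The arithmetic behind this example is easy to reproduce. The short Python sketch below uses the node sizes from the example (four 256GB nodes plus two 768GB nodes) to calculate the share of cluster RAM lost when a single node fails; the function name is only illustrative.

```python
def ram_share_per_node(node_ram_gb):
    """Fraction of total cluster RAM that each node contributes."""
    total = sum(node_ram_gb)
    return [ram / total for ram in node_ram_gb]


if __name__ == "__main__":
    original = [256] * 4                # 1,024GB, or 1TB
    expanded = original + [768] * 2     # 2,560GB, or 2.5TB

    print(f"Before expansion, any node: {ram_share_per_node(original)[0]:.1%}")
    shares = ram_share_per_node(expanded)
    print(f"After expansion, older node: {shares[0]:.1%}")
    print(f"After expansion, newer node: {shares[-1]:.1%}")
    # Before expansion, any node: 25.0%
    # After expansion, older node: 10.0%
    # After expansion, newer node: 30.0%
```

Note that before the expansion, losing any one of the four original nodes costs a quarter of the cluster's RAM, so the mixed cluster is not necessarily more fragile; the impact of a failure simply depends on which node fails.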

The next step after rolling cluster expansion is rolling component replacement. When your HCI nodes reach the end of their lives, you can deploy new nodes into the cluster and then retire the older nodes.

Figuring out if an asset is at the end of its life is a business decision. End of life can be when the asset's value depreciates to zero, when you decide to remove the risk of failure from old hardware or when new hardware improvements make older hardware expensive to run.
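Two of these criteria lend themselves to a simple calculation. The Python sketch below assumes straight-line depreciation and compares annual running costs; the depreciation method, cost figures and function names are all hypothetical, and the appetite for failure risk remains a judgment call that code can't make for you.

```python
def book_value(purchase_price, useful_life_years, age_years):
    """Remaining value under straight-line depreciation (an assumed method)."""
    remaining_fraction = max(0.0, 1.0 - age_years / useful_life_years)
    return purchase_price * remaining_fraction


def end_of_life_reasons(purchase_price, useful_life_years, age_years,
                        annual_run_cost, replacement_annual_cost):
    """List which of the measurable end-of-life criteria a node currently meets."""
    reasons = []
    if book_value(purchase_price, useful_life_years, age_years) == 0:
        reasons.append("fully depreciated")
    if annual_run_cost > replacement_annual_cost:
        reasons.append("more expensive to run than a replacement")
    return reasons


if __name__ == "__main__":
    # Hypothetical node: bought for 20,000, written off over five years, now six years old.
    print(end_of_life_reasons(20_000, 5, 6,
                              annual_run_cost=4_500, replacement_annual_cost=3_000))
```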
