Managing and protecting all enterprise data

How to address HCI's resource provisioning challenges

Provisioning in hyper-convergence is often imbalanced. Integrating more powerful, higher-capacity nodes with HCI software or taking a more hybrid approach can help.

Hyper-converged infrastructure promises to make IT resource provisioning, including storage, less complicated thanks to the way it scales. When IT needs to deliver more processing power or storage capacity to applications or needs to support more workloads, managers add more servers, which become nodes in the cluster. Each node includes a prerequisite amount of computing power and storage capacity, and the environment automatically scales as demands warrant.

The challenge is that hyper-converged infrastructure sacrifices efficiency for simpler resource provisioning. Most data centers don't require additional computing power, storage performance and storage capacity at the same pace. In almost every HCI environment, one or two of these resources is out of balance, meaning more resources are available than needed. Compute-dense environments end up with idle storage capacity, and demanding storage capacity environments wind up with idle CPUs.

The industry has identified these efficiency challenges of first-generation HCI offerings. Today, IT professionals use two main methods to make their next-generation HCI environments more efficient at provisioning compute, network and storage.

Option 1: Fewer, more powerful nodes

The first option integrates more powerful, higher-capacity nodes with HCI software that can take advantage of them. Where first-generation HCI preferred high node-count clusters for performance and availability reasons, the goal of HCI environments that use more powerful nodes is to keep the overall node count to a minimum. While these more powerful nodes cost more per node, your overall expense is less because there are fewer of them, which also means less networking infrastructure and simpler management. In most cases, a cluster with six of these more powerful nodes can outperform a 16-node cluster using less capable, commodity hardware.

These more powerful nodes scale out like first-generation HCI and also scale up to ensure each node achieves full utilization before another node is added for efficient resource provisioning. IT managers can purchase nodes in minimal configurations and add computing power and storage capacity as needed. The internal scaling of these nodes means the storage component of the HCI software must also manage resources and data protection differently, since each node may have a different configuration.

This node design works best when the nodes are configured with all-NVMe flash and high core-count CPUs so that each node can support dozens of virtual machines and, potentially, deliver millions of IOPS. Thanks to the high baseline performance, the HCI cluster can support a wide variety of workloads simultaneously and enable organizations to virtualize workloads that were previously considered bare-metal only.

The HCI software also needs to take full advantage of all-NVMe flash environments by serving data to virtual machines (VMs) running on each node directly from the node. With NVMe, any I/O that requires the network to access data may eliminate much of the performance advantage of NVMe. Even data protection strategies must be optimized. HCI vendors need to reconsider the typical erasure coding technique found in HCI clusters. Data protection should take advantage of the simpler replication technique, a more advanced version of erasure coding or the HCI software should offload it from the hyper-converged cluster.

hyper-converged scaling options

The more powerful nodes strategy provides greater resource provisioning simplicity than first-generation HCI because there are fewer nodes to manage and more workloads can run within the environment. It even has the potential to be the single environment for the entire data center, unifying all provisioning tasks under the same interface.

This strategy for provisioning compute and storage also improves efficiency by adding a scale-up capability to HCI's foundational scale-out technique. IT professionals can add additional compute and capacity to each node before having to expand by adding additional nodes. Not only is this approach more efficient, but it can smooth out the purchasing curve.

The challenge with hyper-converged infrastructure is it sacrifices efficiency for simpler resource provisioning.

From a cost-efficiency standpoint, the more powerful node approach not only lowers total hardware acquisition cost by requiring far fewer servers, but it also increases licensing efficiency as most HCI vendors license their software by the number of nodes and cores. While the more powerful node approach has access to a high number of cores, thanks to its use of NVMe, it can do more with fewer cores. Finally, because of the reduction in nodes, networking infrastructure requirements are also lowered. Consequently, an organization needs to purchase fewer switches and network interface cards, further reducing costs.

Option 2: Hybrid hyper-converged infrastructure

One reason HCI is considered simpler than other data center infrastructure approaches is its consolidation of compute, storage and networking into a single tier. An advantage of consolidation is that HCI software can ensure data associated with a virtual machine is stored directly on the node on which that VM is running. This direct access to data eliminates the network from affecting storage I/O performance, but it does limit how much data each node can store before the HCI vendor has to invoke a tiering strategy. Tiering or caching is problematic in HCI environments because it has to occur across a cluster of nodes, increasing network traffic for all I/O and jeopardizing storage performance. Also in jeopardy is the ability to mix workloads, since a capacity-demanding workload may force the HCI tiering or caching software to offload the data needed by a storage-intensive workload.

An alternative to first-generation hyper-convergence is to take a more hybrid approach to storage resource provisioning, using shared storage in conjunction with the HCI. The challenge is how to use shared storage without reintroducing complexity into the process. In a hybrid model, all nodes use the shared storage pool to offload older data, so the internal non-pooled storage in each node is dedicated exclusively to the VMs on that node. If that non-pooled tier is NVMe flash drives, then storage I/O for these VMs will be very high.

hyper-converged benefits

Shared storage pools enable the commodity nodes to be small in size, with the primary focus of their configuration on delivering computing power to the VMs they support. In this architecture, the HCI software stores all new or modified data in the node where the I/O occurs and then replicates it to the shared storage area. All data protection, like RAID or erasure coding, is performed on the shared storage area, alleviating the compute overhead the node needs to allocate to data protection, enabling computing power to focus even more on VM performance.

Nodes only need to access the shared storage area when a VM is requesting older, non-cached data, so the performance capabilities of the shared storage area are modest, which further drives down costs. The shared storage area is typically available as either all-flash for environments where the potential performance drop of a cache miss might affect VMs, or it's available as HDDs for situations where the impact of a cache miss is negligible.

The hybrid approach makes HCI resource provisioning much more efficient than first-generation HCI. IT staff only add nodes to the cluster when they need to respond to a demand for more computing power, typically due to new applications. When IT has to respond to a request for more capacity, provisioning more storage only necessitates expanding the single shared storage area. The result should be nodes that run at very high levels of CPU utilization without creating an excess of storage capacity.

Since the majority of HCI expansions are in response to adding more capacity rather than computing power, most data centers find hybrid HCI requires a smaller number of nodes.

Provisioning HCI networking

Hyper-converged infrastructure claims to converge three data center elements: compute, storage and networking. HCI vendors often leave networking out of the conversation. It is an essential component, however, and lack of network provisioning efficiency is problematic.

For example, when a server is added to a cluster, the networking and joining of that server to the HCI cluster so it can become a node is not automatic. Additionally, the server-to-server communication on the HCI network is critical; setting priorities for specific workloads so they can maintain service levels is a must. The ability to provision network resources at the VM level, in much the same way as provisioning a CPU, is essential.

IT needs to add software-defined networking offerings to their HCI to ensure they can efficiently provision the networking element of the environment. Consequently, several software-defined networking (SDN) vendors are certified to work with hypervisors that are part of some HCIs. Other HCI and hypervisor vendors bundle SDN into the core of the solution.

Deciding between powerful nodes and the hybrid model

When comparing the two options, both dramatically increase compute and storage provisioning efficiency when compared to first-generation HCI. The more powerful nodes concept works with existing hypervisor and HCI software, but does require a custom, purpose-built node. Alternatively, the hybrid model works with more traditional off-the-shelf server hardware, but requires some customization of the hypervisor software. Essentially, hybrid vendors replace the storage software component of their offering with a more intelligent one that offloads inactive data and data protection from the node.

Both resource provisioning approaches aim to reduce overall node count and deliver improved storage I/O performance, which should lead to lower overall infrastructure costs. Each one should also be capable of supporting a great mixture of workloads. A hybrid model brings efficiency to a higher node count cluster, while more powerful nodes eliminate much of the need for cluster expansion. IT needs to decide if the higher per-node performance approach is a better match or if they have enough workloads that the greater node count of hybrid HCI is a better fit.

Article 4 of 5

Dig Deeper on Converged infrastructure management

Get More Storage

Access to all of our back issues View All
Cloud Computing
and ESG