What are some common VMware High Availability configuration errors?

What are the most common mistakes people make when configuring VMware High Availability?

First and foremost, assuming you can implement VMware High Availability (HA) without training, or at least without reading the manual, is an obvious mistake. Beyond that, some of the most common errors VMware administrators make include:

1. Not using identical host hardware in the VMware vSphere HA cluster. Mixing host hardware can, and often does, lead to an imbalanced cluster. By default, VMware HA plans for the worst case: the failure of the largest, most powerful host in the cluster. To cover that failure, the remaining hosts must reserve more of their resources, and those reserved resources are unavailable for running workloads.

2. Allowing cluster host inconsistencies that prevent a virtual machine (VM) from being started on any cluster host. Administrators often neglect to mount datastores on every cluster host, which makes it impossible to boot VMs on hosts that cannot see those datastores. Another common inconsistency is an incorrectly configured Distributed Resource Scheduler (DRS) with flawed VM-to-host affinity rules.

3. Making a vSphere cluster that is too small. With vSphere HA, every host in the cluster has to reserve a portion of its resources to handle a host (node) failure. A 12-node cluster has to reserve 1/12th of the entire cluster's resources to handle the failure of a single node. If HA must protect against two nodes failing simultaneously, 2/12ths, or 1/6th, of the cluster's resources must be reserved. The smaller the cluster, the larger the share of resources each tolerated failure consumes, which hampers the cluster's ability to tolerate node failures. Larger clusters are far more tolerant of failures.

4. Not protecting the vCenter Server instance. This is one of those Homer Simpson "D'oh!" moments. It seems obvious, and yet it's a common mistake.

5. Not enabling the PortFast option on the network switch ports. Without it, the restarted VMs can take so long to regain network connectivity that users get the impression the failed node's VMs never came up.
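
Two of the errors above lend themselves to a quick sanity check before you build the cluster. The Python sketch below is illustrative only and does not call any VMware API. It assumes identical hosts, so each host contributes an equal share of cluster resources, and it assumes you have already collected which datastores each host mounts. The function names, host names, and datastore names are hypothetical.

```python
from __future__ import annotations

from fractions import Fraction


def reserved_fraction(hosts: int, failures_to_tolerate: int = 1) -> Fraction:
    """Share of cluster resources held back to absorb host failures.

    Simplifying assumption: all hosts are identical, so each one
    accounts for 1/hosts of the cluster's total resources.
    """
    if hosts < 2:
        raise ValueError("an HA cluster needs at least two hosts")
    if not 0 < failures_to_tolerate < hosts:
        raise ValueError("failures_to_tolerate must be between 1 and hosts - 1")
    return Fraction(failures_to_tolerate, hosts)


def hosts_missing_datastores(mounts: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per host, the datastores it lacks compared with the rest of the cluster.

    `mounts` maps a host name to the set of datastore names mounted on it.
    Hosts that see every datastore are left out of the result.
    """
    all_datastores = set().union(*mounts.values())
    gaps = {}
    for host, mounted in mounts.items():
        missing = all_datastores - mounted
        if missing:
            gaps[host] = missing
    return gaps


if __name__ == "__main__":
    # Cluster size versus reserved capacity (error 3).
    print("3 hosts, one failure:", reserved_fraction(3))
    print("12 hosts, one failure:", reserved_fraction(12))
    print("12 hosts, two failures:", reserved_fraction(12, 2))

    # Datastore visibility across hosts (error 2); names are made up.
    inventory = {
        "esx01": {"ds-prod", "ds-test"},
        "esx02": {"ds-prod", "ds-test"},
        "esx03": {"ds-prod"},
    }
    print("Hosts with missing datastores:", hosts_missing_datastores(inventory))
```

Under these assumptions, a three-host cluster gives up a third of its resources to tolerate one failure, while a 12-host cluster gives up a twelfth, which is the arithmetic behind the advice to avoid very small clusters. With mixed hardware the equal-share assumption no longer holds, and the reservation is driven by the largest host instead.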
