How does vSphere high availability affect VM restart order?
If a VM fails, vSphere HA can automatically detect the failure and restart the VM. Before it can do so, however, the cluster must meet several resource and host criteria.
VMs can fail for a number of reasons, but with the proper configuration, vSphere High Availability can automatically detect the failure and attempt a restart, as long as the cluster meets five criteria.
Restarting a failed VM in a high availability cluster is often more complicated than it seems, because the VM's original host is likely no longer available. The cluster's master host must find another host that is both available and capable of running the affected VM, and it must evaluate several parameters before it restarts that VM.
The first step in the vSphere High Availability process is for the cluster's master host to determine whether the VM's files are accessible. If the master can't find the necessary files, it can't restart the VM. In practice, this means the VM's files must reside on shared storage that the surviving hosts in the cluster can reach, rather than on the failed host's local disks.
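The file accessibility check can be modeled as a simple question: can at least one surviving host reach the datastore that holds the VM's files? The Python sketch below is a hypothetical model, not the vSphere API; the Host and VM classes, the datastore names and the files_accessible function are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Host:
    name: str
    # Datastores this surviving host can reach (hypothetical model)
    datastores: Set[str] = field(default_factory=set)

@dataclass
class VM:
    name: str
    # Datastore holding the VM's configuration and disk files (hypothetical)
    datastore: str

def files_accessible(vm: VM, surviving_hosts: List[Host]) -> bool:
    """True if at least one surviving host can reach the VM's datastore."""
    return any(vm.datastore in h.datastores for h in surviving_hosts)

hosts = [Host("esx02", {"shared-ds1"}), Host("esx03", {"local-ds3"})]
print(files_accessible(VM("web01", "shared-ds1"), hosts))  # True: shared datastore is reachable
print(files_accessible(VM("db01", "local-ds1"), hosts))    # False: files were on the failed host's local disk
```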
Next, the master host must determine whether other suitable hosts are available and whether the VM can actually run on them. The replacement host must also be a different physical system from the hosts running the cluster's other VM nodes. That way, you avoid running duplicate VM nodes on the same physical host, which would defeat the purpose of using vSphere High Availability. If no compatible host is available, the VM can't be restarted.
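The host selection step amounts to filtering the cluster's hosts. The sketch below assumes, purely for illustration, that "compatible" means the host is up and provides every network the VM needs, and that peer_hosts lists the hosts already running the VM's sibling nodes. All names here are hypothetical; real vSphere compatibility checks involve more factors.

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Host:
    name: str
    up: bool                                           # False for the failed host
    networks: Set[str] = field(default_factory=set)    # networks the host provides

@dataclass
class VM:
    name: str
    networks: Set[str]                                 # networks the VM needs
    peer_hosts: Set[str] = field(default_factory=set)  # hosts already running the VM's peer nodes

def candidate_hosts(vm: VM, hosts: List[Host]) -> List[str]:
    """Names of hosts that are up, can run the VM, and hold none of its peer nodes."""
    return [h.name for h in hosts
            if h.up
            and vm.networks <= h.networks      # every required network exists on the host
            and h.name not in vm.peer_hosts]   # avoid duplicate nodes on one physical host

hosts = [
    Host("esx01", up=False, networks={"prod"}),  # the failed host
    Host("esx02", up=True, networks={"prod"}),   # already runs a peer node
    Host("esx03", up=True, networks={"prod"}),
    Host("esx04", up=True, networks={"dev"}),    # lacks the required network
]
vm = VM("sql-node1", networks={"prod"}, peer_hosts={"esx02"})
print(candidate_hosts(vm, hosts))  # ['esx03']
```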
After the master host finds compatible hosts, it considers any resource reservations. Reservations can cover processor, memory, virtual network interface and virtual flash resources. Before a VM can start, a potential host must have enough unreserved capacity of each type to satisfy the VM's requirements; if any one of them falls short, the VM can't restart on that host.
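The reservation check is an all-or-nothing comparison: every reserved resource must fit into the host's unreserved capacity. This Python sketch uses a hypothetical Resources record; the field names and units are assumptions chosen for illustration.

```python
from dataclasses import dataclass, fields

@dataclass
class Resources:
    cpu_mhz: int     # hypothetical units throughout
    memory_mb: int
    vnic_mbps: int
    vflash_gb: int

def fits(reservation: Resources, unreserved: Resources) -> bool:
    """True only if every reserved resource fits in the host's unreserved capacity."""
    return all(getattr(reservation, f.name) <= getattr(unreserved, f.name)
               for f in fields(Resources))

vm_reservation = Resources(cpu_mhz=2000, memory_mb=8192, vnic_mbps=100, vflash_gb=10)
host_a = Resources(cpu_mhz=6000, memory_mb=16384, vnic_mbps=1000, vflash_gb=50)
host_b = Resources(cpu_mhz=6000, memory_mb=4096, vnic_mbps=1000, vflash_gb=50)
print(fits(vm_reservation, host_a))  # True
print(fits(vm_reservation, host_b))  # False: not enough unreserved memory
```

A single shortfall, such as memory on host_b, is enough to rule a host out, even when every other resource has plenty of headroom.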
Next, the master host checks for any prevailing host limits. For example, the VM won't restart on a host if doing so would exceed that host's maximum number of supported vCPUs or VMs. If placement would violate a host limit, the master attempts to select an alternate host.
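The host limit check, including the fallback to an alternate host, can be sketched as a loop over candidates. The HostLoad record and its max_vms and max_vcpus values are hypothetical; actual maximums depend on the vSphere release and hardware.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class HostLoad:
    name: str
    running_vms: int
    running_vcpus: int
    max_vms: int     # hypothetical limits; real maximums vary by vSphere release
    max_vcpus: int

def within_limits(host: HostLoad, vm_vcpus: int) -> bool:
    """Would adding one more VM with vm_vcpus vCPUs stay within the host's limits?"""
    return (host.running_vms + 1 <= host.max_vms
            and host.running_vcpus + vm_vcpus <= host.max_vcpus)

def pick_host(candidates: List[HostLoad], vm_vcpus: int) -> Optional[str]:
    """Return the first candidate within its limits, moving on to alternates as needed."""
    for host in candidates:
        if within_limits(host, vm_vcpus):
            return host.name
    return None  # no candidate can take the VM without violating a limit

candidates = [
    HostLoad("esx02", running_vms=10, running_vcpus=40, max_vms=10, max_vcpus=64),  # VM limit reached
    HostLoad("esx03", running_vms=4, running_vcpus=60, max_vms=10, max_vcpus=64),   # would exceed vCPU limit
    HostLoad("esx04", running_vms=4, running_vcpus=20, max_vms=10, max_vcpus=64),
]
print(pick_host(candidates, vm_vcpus=8))  # esx04
```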
Finally, the master must obey VM affinity and anti-affinity rules. Affinity rules restrict a VM to a certain subset of available hosts, while anti-affinity rules prevent a VM from starting on certain hosts, even if those hosts are available and meet every other criterion. If no available host satisfies these rules, the VM can't be restarted.
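Affinity and anti-affinity rules act as a final filter on the remaining candidates. In this hypothetical Python sketch, affinity is the set of hosts the VM may run on (None meaning no restriction) and anti_affinity is the set it must avoid; these parameter names are illustrative, not vSphere rule objects.

```python
from typing import AbstractSet, List, Optional

def apply_rules(candidates: List[str],
                affinity: Optional[AbstractSet[str]] = None,
                anti_affinity: AbstractSet[str] = frozenset()) -> List[str]:
    """Filter candidate hosts by hypothetical VM-host rules.

    affinity:      if given, the VM may only run on these hosts.
    anti_affinity: the VM must not start on these hosts.
    """
    allowed = []
    for host in candidates:
        if affinity is not None and host not in affinity:
            continue  # outside the permitted subset
        if host in anti_affinity:
            continue  # explicitly excluded, even if otherwise suitable
        allowed.append(host)
    return allowed

candidates = ["esx02", "esx03", "esx04"]
print(apply_rules(candidates, affinity={"esx03", "esx04"}, anti_affinity={"esx04"}))  # ['esx03']
print(apply_rules(candidates, affinity={"esx05"}))  # []: no host satisfies the rules
```

An empty result corresponds to the case where the VM can't be restarted.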
If conditions prevent a VM from restarting, the master logs an event noting that vSphere High Availability can't restart the VM. It will try to restart the VM again if cluster conditions change.
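The log-and-retry behavior can be sketched as a loop over successive cluster states. Here each state is, by assumption, the list of hosts that currently pass every placement check; the function name and logger are hypothetical and only illustrate the pattern, not how vSphere HA schedules its retries.

```python
import logging
from typing import List, Optional, Sequence

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("ha_sketch")

def restart_when_possible(vm: str, cluster_states: Sequence[List[str]]) -> Optional[str]:
    """Restart the VM on the first eligible host, logging an event whenever none qualifies.

    cluster_states is a hypothetical sequence of snapshots over time; each snapshot
    lists the hosts that pass every placement check at that moment.
    """
    for eligible_hosts in cluster_states:
        if eligible_hosts:
            log.info("Restarting %s on %s", vm, eligible_hosts[0])
            return eligible_hosts[0]
        log.warning("vSphere HA cannot restart %s; retrying when conditions change", vm)
    return None

# No host qualifies at first; one becomes eligible later, e.g. after a failed host recovers
print(restart_when_possible("web01", [[], [], ["esx04"]]))  # esx04, after two warnings
```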
Related Q&A from Stephen J. Bigelow