VMware vSphere High Availability is a utility that restarts failed VMs on alternative host servers to reduce downtime for critical applications. With vSphere HA, you can pool physical servers on the same network into a high availability cluster. HA can then detect host and VM failures and restart the affected VMs on other healthy hosts in the cluster.
HA automates VM failover and helps organizations improve availability. Using the Fault Domain Manager agent, it monitors ESXi host availability through network heartbeats exchanged between hosts, and it monitors individual VMs through a "heartbeat" -- a signal VMware Tools generates inside the guest. You can define affinity and anti-affinity rules that HA respects during failover to ensure VMs remain where they belong.
VSphere High Availability explained: Manage VMs
VSphere HA offers a variety of VM management features, including disk and network monitoring, heartbeat customization, automatic VM restarts, automatic failover and affinity rules.
You can use HA's VM Monitoring features to check a VM's I/O and determine whether it still shows disk and network activity. You can also check compute time and customize a VM's heartbeat settings to help detect potential failures and restart nonresponsive VMs faster.
You can also configure HA to automatically restart a VM if it detects a failure. To do so, the master host must evaluate VM file accessibility, suitable host system availability, potential host resources, host limits and VM affinity or anti-affinity rules.
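The checks listed above can be pictured as a series of filters over the hosts in the cluster. The following Python is a minimal sketch of that idea, not VMware's actual placement logic: the `Host` and `VM` classes, their fields and the `restart_candidates` function are hypothetical names invented for illustration, and memory stands in for all host resources.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    # Hypothetical model of a cluster host; not a VMware API type.
    name: str
    available: bool            # host is up and reachable in the cluster
    free_memory_mb: int        # memory left for additional VMs
    vm_count: int              # VMs currently running on the host
    max_vms: int               # per-host VM limit
    datastores: set = field(default_factory=set)


@dataclass
class VM:
    # Hypothetical model of a protected VM.
    name: str
    memory_mb: int
    datastore: str             # where the VM's files live
    anti_affinity_hosts: set = field(default_factory=set)  # hosts to avoid


def restart_candidates(vm, hosts):
    """Return the names of hosts that pass every check, in input order."""
    candidates = []
    for host in hosts:
        if not host.available:                   # suitable host availability
            continue
        if vm.datastore not in host.datastores:  # VM file accessibility
            continue
        if host.free_memory_mb < vm.memory_mb:   # host resources
            continue
        if host.vm_count >= host.max_vms:        # host limits
            continue
        if host.name in vm.anti_affinity_hosts:  # anti-affinity rules
            continue
        candidates.append(host.name)
    return candidates


hosts = [
    Host("esx01", True, 8192, 10, 50, {"shared-ds"}),
    Host("esx02", False, 16384, 2, 50, {"shared-ds"}),  # host is down
    Host("esx03", True, 2048, 5, 50, {"shared-ds"}),    # too little memory
    Host("esx04", True, 8192, 5, 50, {"local-ds"}),     # cannot see VM files
]
print(restart_candidates(VM("db01", 4096, "shared-ds"), hosts))
```

In this example, only esx01 passes all five checks for the db01 VM, so it is the only restart candidate.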
Best practices for vSphere HA clusters
Following a handful of best practices will ensure your HA clusters run as intended.
You'll need at least two ESXi hosts, managed by vCenter Server, to form an HA cluster. You can run as many as 64 hosts on a single cluster, and you can manage several clusters within your data center.
Use at least two network interface cards (NICs) to build redundancy into your network, and configure your hosts so that HA doesn't use NICs that share subnets with NICs used for other purposes. Use different subnets or virtual LANs for separation, and configure a redundant network isolation address.
You can connect NICs to separate physical switches to improve the management network's reliability. This approach also improves cluster resilience.
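These network recommendations lend themselves to a simple pre-deployment check. The Python below is a rough sketch under stated assumptions: the inventory is a hand-built dictionary rather than data pulled from vCenter Server, and `check_cluster` and its warning messages are hypothetical. It flags hosts with fewer than two management NICs, hosts whose NICs all connect to the same physical switch, and clusters that exceed the 64-host limit mentioned above.

```python
MAX_HOSTS_PER_CLUSTER = 64  # cluster size limit cited in this article


def check_cluster(hosts):
    """Return a list of warnings for a cluster inventory.

    hosts maps a host name to a list of (nic_name, switch_name) tuples
    describing that host's management network uplinks. The structure is
    an assumption made for this sketch.
    """
    warnings = []
    if len(hosts) > MAX_HOSTS_PER_CLUSTER:
        warnings.append(
            f"cluster has {len(hosts)} hosts; limit is {MAX_HOSTS_PER_CLUSTER}"
        )
    for name, nics in sorted(hosts.items()):
        if len(nics) < 2:
            warnings.append(f"{name}: fewer than two management NICs")
        elif len({switch for _, switch in nics}) < 2:
            warnings.append(f"{name}: all management NICs share one physical switch")
    return warnings


inventory = {
    "esx01": [("vmnic0", "switch-a"), ("vmnic1", "switch-b")],  # redundant
    "esx02": [("vmnic0", "switch-a")],                          # single NIC
    "esx03": [("vmnic0", "switch-a"), ("vmnic1", "switch-a")],  # one switch
}
for warning in check_cluster(inventory):
    print(warning)
```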
Use Proactive HA whenever possible to detect degraded hardware conditions on hosts before they affect workloads. Proactive HA works with the Distributed Resource Scheduler to evacuate VMs from a host before a problem occurs.
HA application monitoring
HA automatically restarts unresponsive VMs, but to detect problems with the applications running inside them, you must use HA's application monitoring feature.
HA monitors VM responsiveness with the heartbeat mechanism, a signal produced by VMware Tools inside the guest and received by HA's VM Monitoring feature. If the heartbeat is absent for a certain amount of time, vSphere decides the VM is nonresponsive and restarts it.
HA application monitoring works almost the same way as VM Monitoring. Configure your applications with custom heartbeats and select a level of sensitivity. High sensitivity checks for heartbeats absent for more than 30 seconds, medium sensitivity checks for heartbeats absent longer than a minute, and low sensitivity takes action once a heartbeat has been absent for two minutes. You can also configure custom monitoring periods.
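The sensitivity levels boil down to a timeout on the most recent heartbeat. The following Python is a minimal sketch of that logic, not HA's implementation: the `is_unresponsive` function and its parameters are hypothetical, and timestamps are plain seconds. The interval values are the ones given above.

```python
# Seconds a heartbeat may be absent before action is taken,
# per the sensitivity levels described in this article.
FAILURE_INTERVAL_SECONDS = {"high": 30, "medium": 60, "low": 120}


def is_unresponsive(last_heartbeat, now, sensitivity="medium", custom_interval=None):
    """Return True if the heartbeat has been absent longer than allowed.

    last_heartbeat and now are timestamps in seconds. custom_interval,
    when given, overrides the preset (a custom monitoring period).
    """
    if custom_interval is not None:
        interval = custom_interval
    else:
        interval = FAILURE_INTERVAL_SECONDS[sensitivity]
    return (now - last_heartbeat) > interval


# A heartbeat last seen 45 seconds ago trips high sensitivity only.
for level in ("high", "medium", "low"):
    print(level, is_unresponsive(last_heartbeat=0, now=45, sensitivity=level))
```

Higher sensitivity reacts faster to a hung VM or application but is more likely to act on a brief stall, which is the trade-off behind offering several levels and a custom period.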
HA and storage
In most production deployments, vSphere High Availability requires shared storage to protect VMs. Storing a VM on a host's local storage makes it vulnerable to hardware failure: if that host becomes unresponsive or goes offline, both the local storage and the VM become inaccessible. If you store your VMs in a shared location, HA can restart a failed or unresponsive VM on a remaining host in the cluster, where it can continue running smoothly.
For secure shared storage options, consider VMware vSAN. VSAN provides software-defined storage services for hyper-converged infrastructures built on VMware technology, and it works with the vSphere hypervisor to deliver a virtual storage area network across an infrastructure. It supports many vSphere availability features -- not only HA, but also fault tolerance, Distributed Resource Scheduler and VM replication.
For small or test lab deployments, you can use vSphere High Availability without shared storage and rely on local storage instead. However, avoid doing this in large production environments.
Hyper-V vs. High Availability
VSphere HA and Microsoft Hyper-V serve the same purpose but operate in different ways. Both services require the creation of a cluster. VSphere HA requires a cluster of ESXi servers administered by vCenter Server, and Hyper-V uses a feature called Windows Server Failover Clustering that doesn't require a management server. Both services track status through heartbeats.
Hyper-V uses a quorum model to determine what happens during a failure situation. This is meant to ensure that the majority of nodes remain accessible and prevent an isolated node from trying to host other running workloads.
On the other hand, vSphere High Availability relies on a master host that monitors all other hosts in the cluster. If a host stops sending heartbeats to the master host, the master determines whether that host has simply lost network communication or has truly failed, a process that requires checking to see if the host in question is still communicating with its data store. In the event of a failure, the master host restarts the affected VMs on a new host. You can also set restart priorities for VMs so that the most important ones come online first.
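The master host's decision described above can be sketched in a few lines of Python. This is an illustrative model, not VMware code: `classify_host`, `restart_order`, the state labels and the three priority names are all hypothetical. It combines the two signals (network heartbeat and data store communication) into a host state, then orders VMs for restart by priority.

```python
# Hypothetical priority labels; lower rank restarts first.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


def classify_host(network_heartbeat, datastore_heartbeat):
    """Classify a host from the two signals the master host can observe."""
    if network_heartbeat:
        return "healthy"
    if datastore_heartbeat:
        # No network heartbeat, but the host still reaches its data store:
        # it has lost communication rather than failed outright.
        return "lost_communication"
    return "failed"


def restart_order(vms):
    """Order (vm_name, priority) pairs so high-priority VMs restart first.

    sorted() is stable, so VMs with equal priority keep their input order.
    """
    ranked = sorted(vms, key=lambda vm: PRIORITY_RANK[vm[1]])
    return [name for name, _ in ranked]


state = classify_host(network_heartbeat=False, datastore_heartbeat=False)
if state == "failed":
    vms_on_host = [("web01", "low"), ("db01", "high"), ("app01", "medium")]
    print(restart_order(vms_on_host))
```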