VMware vSphere's High Availability service can automatically restart VMs that have stopped responding, but you must use vSphere HA application monitoring to get the full details.
The principal mechanism for checking VM responsiveness is a heartbeat, a signal that VMware Tools generates inside the guest and the vSphere VM Monitoring service receives. If the heartbeat is absent for longer than a prescribed time, vSphere deems the VM non-responsive.
This usually indicates that there's a fault in the guest OS or that VMware Tools isn't functioning, which is often because it's not getting compute time. In any case, vSphere High Availability (HA) can trigger a restart of the afflicted VM.
On its own, without vSphere HA application monitoring or additional VM monitoring results, a VM heartbeat isn't a perfect indication of a VM's condition or functionality. There are cases where the VM heartbeat might stop, but the VM and its application continue to function normally. If this happens, vSphere HA might restart the VM unnecessarily.
To improve VM monitoring and prevent unnecessary VM restarts, the VM Monitoring service in vSphere HA can also check the VM's I/O to determine disk or network activity -- a fundamental indication of application activity.
VM Monitoring checks for I/O activity for the previous two minutes, in addition to the regular heartbeat. If the VM heartbeat is missing, but there is recent I/O activity, the VM workload might still be working, so vSphere won't restart the VM. If the VM heartbeat is missing and there's no recent I/O activity -- within two minutes by default -- then the cluster's master node can restart the afflicted VM.
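The restart decision described above can be sketched in a few lines of Python. This is only an illustration of the logic, not VMware's implementation: the `VmObservation` type, the `should_restart` function and the 30-second heartbeat timeout are hypothetical names and values chosen for the example, while the 120-second I/O window reflects the two-minute default mentioned above.

```python
from dataclasses import dataclass


@dataclass
class VmObservation:
    """Hypothetical snapshot of what a monitor knows about one VM."""
    seconds_since_heartbeat: float  # time since the last VMware Tools heartbeat
    seconds_since_io: float         # time since the last disk or network I/O


def should_restart(obs: VmObservation,
                   heartbeat_timeout: float = 30.0,  # assumed value for illustration
                   io_window: float = 120.0) -> bool:
    """Return True only if the heartbeat is missing AND there is no recent I/O."""
    heartbeat_missing = obs.seconds_since_heartbeat > heartbeat_timeout
    io_idle = obs.seconds_since_io > io_window
    return heartbeat_missing and io_idle


if __name__ == "__main__":
    # Heartbeat lost, but the VM wrote to disk 10 seconds ago: leave it alone.
    print(should_restart(VmObservation(45, 10)))
    # Heartbeat lost and no I/O for five minutes: candidate for restart.
    print(should_restart(VmObservation(45, 300)))
```

The point of combining both conditions is that a missing heartbeat by itself never triggers a restart in this sketch; recent I/O acts as a veto.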
Beyond VM heartbeats and I/O activity, VMware also supports vSphere HA application monitoring, which enables you to configure customized heartbeats for select applications. This requires applications that support vSphere HA application monitoring or an SDK that you can integrate with the application.
vSphere HA application monitoring works almost exactly like the VM Monitoring service. Once you enable it and the application is producing a custom heartbeat, vSphere HA restarts the VM if the application's heartbeats stop for a specified period of time.
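To make the application heartbeat idea concrete, here is a minimal Python simulation of the pattern: the application enables monitoring, marks itself active while it is healthy, and the monitor flags the VM once the heartbeats stop for too long. The `AppHeartbeatMonitor` class and its method names are hypothetical and do not call the actual VMware SDK; a real application would send its heartbeats through the SDK or a supporting product.

```python
import time
from typing import Callable, Optional


class AppHeartbeatMonitor:
    """Hypothetical stand-in for application heartbeat tracking."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the logic can be tested
        self.enabled = False
        self.last_beat: Optional[float] = None

    def enable(self) -> None:
        """Start monitoring; the first heartbeat is counted from now."""
        self.enabled = True
        self.last_beat = self.clock()

    def mark_active(self) -> None:
        """Called by the application while it considers itself healthy."""
        if self.enabled:
            self.last_beat = self.clock()

    def needs_restart(self) -> bool:
        """True if monitoring is on and heartbeats stopped for too long."""
        if not self.enabled or self.last_beat is None:
            return False
        return self.clock() - self.last_beat > self.timeout_s


if __name__ == "__main__":
    monitor = AppHeartbeatMonitor(timeout_s=30)
    monitor.enable()
    monitor.mark_active()           # the application reports that it is alive
    print(monitor.needs_restart())  # checked immediately after a heartbeat
```

In this sketch, an application that never enables monitoring is never flagged, which mirrors the requirement that the application must actually produce the custom heartbeat before its absence means anything.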
You can also select sensitivity with vSphere HA application monitoring. High sensitivity looks for heartbeats that are absent for over 30 seconds, medium sensitivity checks for heartbeats that are absent for over one minute, and low sensitivity checks for heartbeats that are absent for over two minutes.
You can also configure custom monitoring periods. Shorter windows can detect troubled VMs faster, which can lead to earlier VM restarts, though this increases the possibility of false positives.
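The trade-off between the sensitivity levels and custom periods can be illustrated with a short Python sketch. The preset values come from the intervals listed above; the function names `failure_interval` and `flagged_gaps`, and the sample heartbeat gaps, are hypothetical and exist only for this example.

```python
from typing import Iterable, List, Optional

# Seconds without a heartbeat before the VM is considered failed,
# per the high, medium and low sensitivity levels described above.
FAILURE_INTERVAL_S = {"high": 30, "medium": 60, "low": 120}


def failure_interval(sensitivity: str = "medium",
                     custom_seconds: Optional[float] = None) -> float:
    """Return the monitoring window; a custom value overrides the preset."""
    if custom_seconds is not None:
        if custom_seconds <= 0:
            raise ValueError("custom_seconds must be positive")
        return custom_seconds
    return FAILURE_INTERVAL_S[sensitivity.lower()]


def flagged_gaps(gaps_s: Iterable[float], interval_s: float) -> List[float]:
    """Return the heartbeat gaps that would exceed the given window."""
    return [gap for gap in gaps_s if gap > interval_s]


if __name__ == "__main__":
    # Hypothetical heartbeat gaps, in seconds, observed for one VM.
    gaps = [10, 45, 90, 150]
    for level in ("high", "medium", "low"):
        print(level, flagged_gaps(gaps, failure_interval(level)))
```

With these sample gaps, the high setting flags three of them, medium flags two and low flags only one, which shows how a shorter window reacts sooner but also treats brief stalls as failures.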