VMware DRS and VMotion: Improve workload balance, prevent problems
Learn how VMware DRS and VMotion work individually and together, and get configuration tips for optimal virtual machine performance in an environment that takes advantage of both technologies.
VMware Distributed Resource Scheduler (DRS) allocates computing resources in a virtual environment so that they are put to better use. Combine DRS with VMotion, and you can create a well-run virtual infrastructure or, alternatively, a virtual traffic jam. In this tip, I'll explain how DRS and VMotion work individually and together, and share some configuration tricks that keep the two from clashing and slowing down virtual machine performance.
VMware Distributed Resource Scheduler
Simply put, DRS ensures that your resource requirements are enforced. You start with a number of VMware host systems, shared storage, a common network presence and the resource pools that you define. From there, DRS balances the workload across the resources you present to the cluster. It is an essential component of any successful ESX implementation.
With VMware ESX 3.x and VirtualCenter 2.x, you can configure a DRS cluster so that VirtualCenter manages access to resources fully automatically, partially automatically, or only when an administrator approves each action manually. The automated options are particularly useful when you place an ESX server into maintenance mode. Maintenance mode is a good state in which to perform tasks such as scanning for new storage area network (SAN) disks, reconfiguring the host operating system's networking or shutting down the server for maintenance. Since virtual machines can't run on a host that is in maintenance mode, they need to be relocated to other host servers. Commonly, administrators configure the ESX cluster to fully automate the rules for the DRS settings. This allows VirtualCenter to take action based on workload statistics, available resources and available host servers. A fully automated DRS cluster is illustrated below in figure one.
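To make the evacuation step concrete, here is a minimal Python sketch of planning where virtual machines could go when a host enters maintenance mode. It is not DRS's actual placement logic, which weighs workload statistics that this sketch ignores. The `Host` class, the `plan_evacuation` function, the host and virtual machine names and the memory figures are all hypothetical, and memory is the only resource considered.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    memory_mb: int                           # memory available to virtual machines
    vms: dict = field(default_factory=dict)  # VM name -> memory demand in MB

    def free_mb(self) -> int:
        return self.memory_mb - sum(self.vms.values())


def plan_evacuation(source, targets):
    """Return a list of (vm_name, target_host_name) moves that empty `source`.

    Greedy assumption: place the largest VM first, each time on the target
    with the most free memory. Raises RuntimeError if a VM fits nowhere.
    """
    if not targets:
        raise RuntimeError("no target hosts available")
    # Work on a copy of the free-memory figures so the hosts are not modified.
    free = {host.name: host.free_mb() for host in targets}
    moves = []
    by_size = sorted(source.vms.items(), key=lambda item: item[1], reverse=True)
    for vm_name, demand in by_size:
        best = max(free, key=free.get)
        if free[best] < demand:
            raise RuntimeError(f"no host can take {vm_name} ({demand} MB)")
        free[best] -= demand
        moves.append((vm_name, best))
    return moves


# Hypothetical three-host cluster; esx1 is about to enter maintenance mode.
esx1 = Host("esx1", 16384, {"web01": 2048, "db01": 8192, "app01": 4096})
esx2 = Host("esx2", 16384, {"mail01": 8192})
esx3 = Host("esx3", 16384, {"file01": 2048})

plan = plan_evacuation(esx1, [esx2, esx3])
for vm_name, host_name in plan:
    print(f"migrate {vm_name} -> {host_name}")
```

The useful part of an exercise like this is the failure case: if the remaining hosts cannot absorb the load, you want to know before the host goes into maintenance mode, not during.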
An important point to keep in mind is that DRS works in conjunction with any resource pools defined in the VirtualCenter configuration. Poor resource pool configuration (such as using unlimited options) can cause DRS to make unnecessary performance adjustments. If you truly need unlimited resources within a resource pool, the best practice is to isolate it. Isolation requires a separate ESX cluster with a limited number of ESX hosts that share a single resource pool, in which the virtual machines that require unlimited resources are allowed to operate. Placing resource pools with unlimited settings alongside resource pools with limited settings in the same cluster could cause DRS to make unnecessary performance adjustments. DRS can compensate for this scenario, but it may do so by bypassing the resource provisioning and planning you previously established.
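One way to audit for this is to list your resource pools and flag any cluster that mixes unlimited and limited pools. The Python sketch below assumes you have already exported that inventory by some means; the `ResourcePool` class, its field names and the sample pools are hypothetical, and `None` is used here to stand for an unlimited setting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourcePool:
    name: str
    cluster: str
    cpu_limit_mhz: Optional[int]  # None stands for "unlimited"
    mem_limit_mb: Optional[int]   # None stands for "unlimited"

    def is_unlimited(self) -> bool:
        # Treat a pool as unlimited if either resource has no limit.
        return self.cpu_limit_mhz is None or self.mem_limit_mb is None


def clusters_mixing_limits(pools):
    """Return sorted names of clusters holding both unlimited and limited pools."""
    kinds_by_cluster = {}  # cluster name -> set of is_unlimited() results
    for pool in pools:
        kinds_by_cluster.setdefault(pool.cluster, set()).add(pool.is_unlimited())
    return sorted(
        cluster
        for cluster, kinds in kinds_by_cluster.items()
        if kinds == {True, False}
    )


# Hypothetical inventory: cluster-a mixes the two kinds, cluster-c is isolated.
pools = [
    ResourcePool("prod", "cluster-a", 8000, 16384),
    ResourcePool("burst", "cluster-a", None, None),
    ResourcePool("dev", "cluster-b", 4000, 8192),
    ResourcePool("unlimited-only", "cluster-c", None, None),
]

mixed = clusters_mixing_limits(pools)
print(mixed)
```

A cluster such as the hypothetical cluster-c, which holds only the unlimited pool, matches the isolation approach described above and is not flagged.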
How VMotion works with DRS
The basic concept of VMotion is that ESX moves a running virtual machine to another ESX host, with the move being transparent to the virtual machine. VMotion requires a dedicated network interface of 1 gigabit per second (Gbps) or greater, shared storage and a virtual machine that can be moved. Not all virtual machines can be moved. Certain situations, such as an optical drive bound to an image file, prevent a virtual machine from migrating. With VMotion enabled, an active virtual machine can be moved automatically or manually from one ESX host to another. An automatic move happens as described earlier, when a DRS cluster is configured for full automation: when a host in the cluster enters maintenance mode, VMotion moves its virtual machines to another ESX host. Should the DRS cluster be configured for fully manual operation, an administrator approves each migration within the Virtual Infrastructure Client, and VMotion then proceeds with the moves.
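The blockers mentioned above lend themselves to a simple pre-flight check. The following Python sketch covers only the two conditions named in this section (a connected CD-ROM bound to an image file or host device, and disks that are not on shared storage). The `VmConfig` class, its fields and the backing labels are hypothetical and do not correspond to any VMware API.

```python
from dataclasses import dataclass


@dataclass
class VmConfig:
    name: str
    cdrom_backing: str = "client"   # hypothetical labels: "client", "iso" or "host"
    cdrom_connected: bool = False
    on_shared_storage: bool = True


def vmotion_blockers(vm):
    """Return a list of human-readable reasons this VM may fail to migrate."""
    reasons = []
    if vm.cdrom_connected and vm.cdrom_backing in ("iso", "host"):
        reasons.append(f"CD-ROM connected to {vm.cdrom_backing} media")
    if not vm.on_shared_storage:
        reasons.append("disks are not on shared storage")
    return reasons


# Hypothetical virtual machines for illustration.
vms = [
    VmConfig("web01"),
    VmConfig("db01", cdrom_backing="iso", cdrom_connected=True),
    VmConfig("test01", on_shared_storage=False),
]

report = {vm.name: vmotion_blockers(vm) for vm in vms}
for name, reasons in report.items():
    print(name, "OK" if not reasons else "; ".join(reasons))
```

An empty list does not guarantee a successful migration; it only means none of the conditions checked here were found.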
VMware ESX 3.5 introduces the highly anticipated Storage VMotion. Should your shared storage need to be brought offline for maintenance, Storage VMotion can migrate an active virtual machine to another storage location. This migration will take longer, as the geometry of the virtual machine's storage is copied to the new storage location. Because this is not a storage solution, the traffic is managed through the VMotion network interface.
Points to consider
One might assume that the combined use of DRS and VMotion covers all bases. Well, not entirely. There are a few considerations that you need to be aware of so that you know what DRS and VMotion can and cannot do for you.
VMotion does not guarantee an absolute zero gap in connectivity during a migration. In my experience, the drop in connectivity as measured by ping is usually limited to one lost ping from a client or a minuscule increase in ping time on the virtual machine itself. In most situations, clients will not notice the change and will simply reconnect over the network during a VMotion migration. There is also a slight increase in memory usage, and on larger virtual machines this may trigger a RAM usage warning that usually clears on its own.
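If you want to put numbers on that gap for your own environment, you can record ping results during a test migration and summarize them afterwards. The Python sketch below assumes you have already collected the round-trip times into a list, with `None` marking a lost ping; the `summarize_pings` function and the sample values are hypothetical and are not measurements from a real migration.

```python
def summarize_pings(samples):
    """Summarize a list of round-trip times in ms, where None is a lost ping."""
    lost = sum(1 for rtt in samples if rtt is None)
    longest_gap = 0   # longest run of consecutive lost pings
    run = 0
    for rtt in samples:
        run = run + 1 if rtt is None else 0
        longest_gap = max(longest_gap, run)
    replies = [rtt for rtt in samples if rtt is not None]
    return {
        "lost": lost,
        "longest_gap": longest_gap,
        "max_rtt_ms": max(replies) if replies else None,
    }


# Illustrative values only: one lost ping and one slow reply.
samples = [0.4, 0.5, None, 1.9, 0.4]
summary = summarize_pings(samples)
print(summary)
```

The longest run of consecutive losses is the figure to watch, since applications with short timeouts care about the length of the gap rather than the total count.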
Some virtual machines may fail to migrate, whether through an automatic VMotion task or when invoked manually. This is generally caused by obsolete virtual machines, CD-ROM binding or other reasons that may not be intuitive. In one migration failure I experienced recently, the Virtual Infrastructure Client did not provide any information other than that the operation timed out. The VirtualCenter server had no information related to the migration task in the local logs. In the VPX_EVENT table of the database (for VirtualCenter 2.5), I found the following entry:
Unfortunately, the database entry does not provide much information. The resolution: address the configuration of the virtual machine. If you cannot reboot or shut off the virtual machine to work on it offline, my solution has been to run the VMware Converter tool to copy the virtual machine to another instance and correct the configuration there, in an offline or staging state. Once the issue is addressed, move over the relevant changed data, and use this new virtual machine going forward.
Identifying your risks is the most important pre-implementation task you can do with DRS and VMotion. So what can you do to identify your risks? Here are a few easy tasks:
- Schedule VMotion for all systems to keep them moving across hosts.
- Regularly put ESX hosts in and then exit maintenance mode.
- Do not leave mounted CD-ROM media on virtual machines (datastore/ISO file or host device options).
- Keep virtual machines up to date with VMware tools and virtual machine versioning.
- Monitor the VPX_EVENT table in your VirtualCenter database for rows where EVENT_TYPE is vim.event.VmFailedMigrateEvent.
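The last item in the list can be automated with a scheduled query. The Python sketch below shows the shape of such a check, using an in-memory SQLite database as a stand-in so that it runs anywhere; against a real VirtualCenter database you would connect with that database's own driver instead. Only the VPX_EVENT table name and the EVENT_TYPE value come from this article. The EVENT_ID, CREATE_TIME and VM_NAME columns are assumptions about the schema, and the sample rows are invented for illustration.

```python
import sqlite3

FAILED_MIGRATION = "vim.event.VmFailedMigrateEvent"


def failed_migrations(conn):
    """Return (event id, time, VM name) rows for failed migration events."""
    cursor = conn.execute(
        "SELECT EVENT_ID, CREATE_TIME, VM_NAME FROM VPX_EVENT "
        "WHERE EVENT_TYPE = ? ORDER BY CREATE_TIME",
        (FAILED_MIGRATION,),
    )
    return cursor.fetchall()


# Stand-in database with an assumed, simplified VPX_EVENT layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE VPX_EVENT ("
    "EVENT_ID INTEGER PRIMARY KEY, EVENT_TYPE TEXT, "
    "CREATE_TIME TEXT, VM_NAME TEXT)"
)
conn.executemany(
    "INSERT INTO VPX_EVENT VALUES (?, ?, ?, ?)",
    [
        (1, "vim.event.VmPoweredOnEvent", "2008-05-01 09:00:00", "web01"),
        (2, FAILED_MIGRATION, "2008-05-01 09:05:00", "db01"),
        (3, "vim.event.VmMigratedEvent", "2008-05-01 09:06:00", "web01"),
    ],
)

rows = failed_migrations(conn)
for event_id, created, vm_name in rows:
    print(f"{created}: migration failed for {vm_name} (event {event_id})")
```

Verify the column names against your own VirtualCenter schema before relying on a query like this, and run it read-only so that monitoring cannot alter the database.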
All in all, DRS and VMotion are solid technologies. Anomalies can happen, and the risks should be identified and put into your regular monitoring for visibility.
Rick Vanover is an MCSA-certified system administrator for Belron US in Columbus, Ohio. Rick has been working with information technology for over 10 years and with virtualization technologies for over seven years.