Microsoft's change from a per-CPU server OS licensing model to a per-core approach has caused many organizations to look for ways to reduce Windows Server licensing costs. Finding and eliminating zombie VMs is one way to accomplish this goal.
Advances in virtualization technology make it quick and easy to scale out workloads, but this ability has a downside: It can lead to server sprawl. Many application owners see virtual servers as an almost-unlimited resource, so they create VMs quickly and forget them just as fast. Left unchecked, this can leave numerous zombie servers wandering through a virtual infrastructure.
Numerous virtual servers lead to real costs
Even the smallest physical server requires an organization to spend a minimum of $6,155 for a Windows Server Datacenter edition license, based on Microsoft's 16-core minimum per server. For a four-socket server with a total of 96 cores, the cost jumps to more than $36,000. That's before Software Assurance gets tacked on.
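The arithmetic behind those figures can be sketched in a few lines of Python. This is a rough estimate, not a pricing tool: it takes the $6,155 figure above as the price of 16 core licenses, and it assumes Microsoft's eight-core minimum per processor and two-core license packs, neither of which is spelled out in this article. The function name is hypothetical.

```python
import math

DATACENTER_16_CORE_PRICE = 6155   # USD, the list price quoted in this article
MIN_CORES_PER_SERVER = 16         # per-server minimum cited in this article
MIN_CORES_PER_PROCESSOR = 8       # assumption: per-processor licensing minimum


def datacenter_license_cost(sockets, cores_per_socket):
    """Estimate the Datacenter license cost for one physical host."""
    # Each processor is licensed for at least the per-processor minimum.
    per_socket = max(cores_per_socket, MIN_CORES_PER_PROCESSOR)
    # The host as a whole is licensed for at least the per-server minimum.
    licensed_cores = max(sockets * per_socket, MIN_CORES_PER_SERVER)
    # Assumption: licenses come in two-core packs, so round up to an even count.
    licensed_cores = math.ceil(licensed_cores / 2) * 2
    return licensed_cores * DATACENTER_16_CORE_PRICE / 16


print(datacenter_license_cost(1, 4))    # small single-socket host
print(datacenter_license_cost(4, 24))   # four sockets, 96 cores in total
```

Under these assumptions, the small host is billed as if it had 16 cores, and the 96-core host costs six times as much.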
The sticker shock of the new pricing model hits companies considering a migration, and it prompts administrators to take a closer look at the workloads in their inventory to find ways to cut Windows Server licensing costs.
A zombie server that isn't doing much doesn't present a huge drain on resources, because hypervisors excel at allocating resources to the VMs that need them. However, these zombie servers increase the VM density per host and drive the need for additional hosts. Every additional host must be licensed for all of its physical cores, which leads to higher Windows Server licensing costs.
Finding the VM is just the first step
The best way to deal with a zombie virtual server is to remove it. It's not a complex process to dispatch these VMs. Finding them is where the difficulties lie, because it's possible to inadvertently remove a VM that's still necessary.
As part of the investigative work, IT admins need to identify the suspected zombie VM, then verify that it is no longer needed. This two-step process is critical to avoid removing or stopping production workloads.
Admins can detect zombie workloads by name or by performance, and name is the easier of the two. Temporary servers are often spun up without much advance planning and, consequently, do not match the company's server-naming scheme. The hard part is determining if the workload has a valid purpose. IT admins should not disconnect or remove something based on its label alone.
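As a minimal sketch of the name-based pass, the Python below flags VM names that do not fit a naming scheme. The scheme itself (a three-letter site code, a role and a two-digit index, such as "nyc-sql-01") is purely hypothetical, as is the function name. The list of VM names would come from whatever inventory export the virtualization platform provides.

```python
import re

# Hypothetical naming scheme: site code, role, two-digit index ("nyc-sql-01").
NAMING_SCHEME = re.compile(r"[a-z]{3}-(web|sql|app|dc)-\d{2}")


def flag_nonconforming(vm_names):
    """Return the VM names that do not match the naming scheme."""
    return [
        name for name in vm_names
        if not NAMING_SCHEME.fullmatch(name.lower())
    ]


inventory = ["nyc-sql-01", "test2", "Bobs-VM", "LON-WEB-12"]
print(flag_nonconforming(inventory))
```

A flagged name is only a candidate for investigation. It says nothing about whether the workload is still in use.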
This is where the monitoring aspect comes into play. If the admin finds a server that appears to be dormant, with no CPU activity or network I/O over the course of several days, then it might be a zombie VM. Instead of deleting it, disconnect it from the network. This is an ideal way to see if anyone notices without causing much harm.
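The dormancy check can be sketched as follows, assuming the monitoring tool can export periodic CPU and network samples for each VM. The `Sample` type, the thresholds and the minimum sample count are illustrative assumptions. An idle Windows VM is rarely at exactly zero, so the cutoffs would need tuning against a real environment.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring sample for a VM (hypothetical export format)."""
    cpu_percent: float
    net_kbps: float


def looks_dormant(samples, min_samples=72, cpu_max=1.0, net_max=1.0):
    """Return True only if every sample in a long enough window is near idle.

    With hourly samples, min_samples=72 covers three days (an assumption).
    """
    if len(samples) < min_samples:
        # Not enough history to judge, so do not flag the VM.
        return False
    return all(
        s.cpu_percent <= cpu_max and s.net_kbps <= net_max
        for s in samples
    )


quiet = [Sample(cpu_percent=0.5, net_kbps=0.2)] * 72
print(looks_dormant(quiet))
```

Requiring every sample to stay under the thresholds is deliberately conservative: a single burst of activity, such as a monthly batch job, keeps the VM off the candidate list.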
If someone asks about the VM, then IT can reconnect the server easily. When admins find a VM that doesn't use the proper naming scheme, they should find out what the server does and who owns it, so the owner follows best practices the next time.
If no one asks about the server after about a week, then shut it down. Keep it in this shutdown state for several days. If there are no complaints, remove it from the virtual inventory and place it in cold or low-tier storage for about a month before deleting it. This procedure offers a smart and safe way to consolidate workloads on the virtual hosts.
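The staged timeline above can be expressed as a small lookup that reports which stage a VM should be in, given the date it was disconnected. The durations are assumptions that pin down the article's "about a week," "several days" and "about a month" as 7, 3 and 30 days. The function name is hypothetical, and the sketch assumes nobody has asked for the VM back.

```python
from datetime import date

# Assumed stage lengths in days, taken loosely from the procedure above.
STAGES = [
    ("disconnected", 7),   # off the network, still running
    ("shut down", 3),      # powered off, still in inventory
    ("archived", 30),      # removed from inventory, kept in cold storage
]


def decommission_stage(disconnected_on, today):
    """Return the stage a VM should be in, assuming no one has claimed it."""
    elapsed = (today - disconnected_on).days
    boundary = 0
    for name, days in STAGES:
        boundary += days
        if elapsed < boundary:
            return name
    return "eligible for deletion"


print(decommission_stage(date(2024, 1, 1), date(2024, 1, 9)))
```

Keeping the durations in one table makes it easy to lengthen a stage for workloads that run only monthly or quarterly.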
Removing zombies has other benefits
The verification process is the risky part when it comes to zombie workloads. Pulling a VM out of production can trigger a disruption, which is always a concern. But the payoff can be huge when the organization can cut back on Windows Server licensing costs.
In addition to the financial savings, removing zombie servers also saves IT admins from unnecessary maintenance. Admins responsible for patching servers have probably been updating these zombie workloads, along with all of the other VMs, to avoid exposure to an east-west attack. This time and effort is a drain on the IT staff's resources.