Troubleshooting VMware snapshots

Virtual machine (VM) snapshots are a great tool for identifying problems with VMs. An expert explains how to troubleshoot common problems that arise with snapshots in VMware ESX Server.

Virtualization administrators can use snapshots in vSphere to travel back in time and figure out what went wrong with their virtual machines (VMs). In part one of this series, I discussed how to use VMware snapshots. In part two, I explained how to delete snapshots without wasting disk space. But what do you do when your snapshots start acting funny? In this tip, we'll troubleshoot potential problems that may come up when using snapshots in vSphere.

Locating VMs that have snapshots
Finding out which VMs have snapshots can be challenging. In VMware Infrastructure 3, there wasn't a centralized, built-in way to accomplish this task in the vSphere Client or vCenter Server. You had to rely on scripts and command-line utilities, which made locating snapshots tedious. Enhancements in vSphere make locating snapshots much easier. Here are a few of the methods that you can use.

Method 1: Find command
Use the find command in the ESX service console or ESXi Tech Support Mode:

  1. Log in to the console.
  2. Change to your /vmfs/volumes/ directory.
  3. Type find . -iname "*-delta.vmdk" -mtime +7 -ls to find snapshot files that were last modified more than seven days ago, or simply find . -iname "*-delta.vmdk" to find all snapshot files.
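
When the results need to feed a report rather than a terminal, the same search can be scripted. The following is a minimal Python sketch, not part of the original procedure: the function name find_stale_deltas and its parameters are illustrative assumptions, it presumes a Python interpreter that can read the datastore tree, and its age check ("at least N days old") only approximates the semantics of find -mtime +N.

```python
import fnmatch
import os
import time


def find_stale_deltas(root, min_age_days=7, now=None):
    """Return sorted (path, size_bytes) pairs for *-delta.vmdk files under
    root whose last modification is at least min_age_days old.

    Hypothetical helper for illustration; pass min_age_days=0 to list
    every delta file regardless of age.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds per day
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Lower-case the name to mimic find's case-insensitive -iname.
            if not fnmatch.fnmatchcase(name.lower(), "*-delta.vmdk"):
                continue
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            if info.st_mtime <= cutoff:
                matches.append((path, info.st_size))
    return sorted(matches)
```

Calling find_stale_deltas("/vmfs/volumes") would walk every datastore mounted under that directory; pointing it at a single datastore keeps the scan shorter.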

Method 2: Use the Storage View in vCenter Server
The Storage View, part of a new Storage Monitoring and Reporting plug-in that comes with vCenter Server, shows information related to storage in vSphere. When you select an object in the left pane of the vSphere Client, you can select the Storage View tab in the right pane and view storage information related to that object. One of the columns that you can view is Snapshot Space -- which is the total size of all snapshot-related files, including the -delta.vmdk, .vmsd and .vmsn files.

By selecting an object, such as Cluster or Datacenter, and sorting the Snapshot Space field, you can view the size of any VM snapshot that exists under that object. VMs that have never had a snapshot will show 0 bytes. Once a VM has had a snapshot taken, it will continue to show a very small size (around 40 bytes), which comes from the residual text left in the .vmsd file.
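
To make the Snapshot Space figure concrete, here is a rough Python sketch that adds up the snapshot-related files named above for each VM folder on a datastore and sorts the result, largest first. It is an approximation under stated assumptions, not how vCenter Server computes the column: the suffix list contains only the three file types this article mentions, the function names are hypothetical, and it assumes one folder per VM directly under the datastore directory.

```python
import os

# File suffixes this article names as snapshot-related.
# Assumption: the real Storage View column may count other files as well.
SNAPSHOT_SUFFIXES = ("-delta.vmdk", ".vmsd", ".vmsn")


def snapshot_space(vm_dir):
    """Total bytes used by snapshot-related files directly inside vm_dir."""
    total = 0
    for entry in os.scandir(vm_dir):
        if entry.is_file() and entry.name.lower().endswith(SNAPSHOT_SUFFIXES):
            total += entry.stat().st_size
    return total


def snapshot_space_report(datastore_dir):
    """Return (vm_folder_name, bytes) pairs sorted by size, largest first."""
    rows = [
        (entry.name, snapshot_space(entry.path))
        for entry in os.scandir(datastore_dir)
        if entry.is_dir()
    ]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

A folder that reports only a few dozen bytes most likely holds nothing more than a leftover .vmsd file, which matches the behavior described above.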

Method 3: Use alarms in vCenter Server
You can configure a vCenter Server alarm to trigger when a VM snapshot size reaches a predetermined gigabyte threshold. You can also set alarms at any virtualization level -- from a single VM to the top vCenter Server level. These alarms will keep you informed of snapshot growth, so you can take action, if needed.

Method 4: Use a PowerShell script
The Get-Snapshot cmdlet, part of vSphere PowerCLI, can query VM snapshot information. You can use it in scripts to produce reports on VMs that have active snapshots. There are several free PowerShell scripts that you can download and run periodically, such as SnapReminder, yadr -- A vdisk reporter and Snapshot List. You can also set the scripts to run automatically.

Dealing with snapshots that don't delete properly
Occasionally, a snapshot will not delete properly, leaving an active snapshot for a VM. This situation can happen when using backup applications or deleting snapshots through Snapshot Manager. In most cases, the snapshot will not appear in the Snapshot Manager. The only indication that a snapshot may still exist is the presence of delta files in the VM's directory.

If you have a snapshot running that is not in Snapshot Manager, you can attempt to delete it in one of two ways. First, create a new snapshot using the vSphere Client and, once it has been created, delete all snapshots from Snapshot Manager. Alternatively, use the ESX service console or vSphere CLI: switch to the VM's home directory, create a new snapshot with the vmware-cmd createsnapshot operation, wait for the snapshot to be created, and then run the vmware-cmd removesnapshots operation. When it completes, check whether the delta files have been deleted. If they have, the cleanup succeeded.

If the delta files weren't deleted, check the VMX file for the VM and locate the lines starting with scsi. If the VM is configured with only one virtual disk, it is usually scsi0:0. (If .present is false, it is a non-existent drive that you can ignore.) The .fileName should be using the original disk file that was created with the VM and it's usually the same name as your VM. If this is the case, then your VM is not using the snapshot files. If it has a -00000# in the filename, it is currently using a snapshot file.

To be clear, a VM with no snapshots displays the following:

  scsi0:0.present = "true"
  scsi0:0.fileName = "myvmname.vmdk"

And a VM with snapshots displays the following:

  scsi0:0.present = "true"
  scsi0:0.fileName = "myvmname-000001.vmdk"
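
The VMX check described above can also be automated. The sketch below is a minimal, hedged Python example that reads the text of a VMX file and reports, for each SCSI disk marked present, whether its fileName carries the -00000# style suffix. The function name disks_on_snapshots and the six-digit pattern are assumptions made for illustration; a base disk whose own name happens to end in six digits would be misreported.

```python
import re

# Matches disk lines such as: scsi0:0.fileName = "myvmname-000001.vmdk"
# Controller lines such as scsi0.present have no ":N" part and are ignored.
_SCSI_LINE = re.compile(
    r'^\s*(scsi\d+:\d+)\.(present|fileName)\s*=\s*"([^"]*)"\s*$',
    re.IGNORECASE,
)
# Assumption: snapshot disks end in a dash and six digits before .vmdk.
_SNAPSHOT_NAME = re.compile(r"-\d{6}\.vmdk$", re.IGNORECASE)


def disks_on_snapshots(vmx_text):
    """Map each present SCSI disk to True if it appears to use a snapshot file."""
    disks = {}
    for line in vmx_text.splitlines():
        match = _SCSI_LINE.match(line)
        if match:
            device, key, value = match.groups()
            disks.setdefault(device, {})[key.lower()] = value

    result = {}
    for device, settings in disks.items():
        # Skip drives marked not present and entries without a disk file.
        if settings.get("present", "").lower() != "true":
            continue
        if "filename" not in settings:
            continue
        result[device] = bool(_SNAPSHOT_NAME.search(settings["filename"]))
    return result
```

Passing it the contents of a VMX file, for example disks_on_snapshots(open("myvmname.vmx").read()), returns a dictionary keyed by device, such as scsi0:0.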

If the above operation failed, your other options are to either clone the VM or clone the VM's disk file. To clone the VM, you can either use the clone function in vCenter Server or the standalone vCenter Converter application. When it's completed, shut down and delete the old VM.

Another method is to shut down the VM and log in to the ESX service console or ESXi Tech Support Mode. Then, switch to the VM's directory and clone the VM's disk file, using vmkfstools and specifying the snapshot file as the source disk (e.g., "vmkfstools -i myvmname-000001.vmdk myvmnamenew.vmdk").

Next, go into the settings for the VM. Remove (don't delete) the hard disk. Then, add a new hard disk and browse to the newly created disk file. Power on the VM and verify everything is working before you delete the old disk and delta files.

Changing snapshot file locations
By default, snapshots are written to the home directory of each virtual machine. You may want to change this location so that snapshots do not take up space on the volume where your VM resides. It is possible to specify a new working directory for snapshots individually on each VM. Both snapshots and .vswp files are written to this directory when you choose this method.

Be warned: If the VM is on shared storage and you specify local storage as a location, you will not be able to use features that move VMs between hosts, such as vMotion, High Availability and Distributed Resource Scheduler. To change the working directory, follow these steps:

  1. Power off your VM and log in to the ESX service console or ESXi Tech Support Mode.
  2. Edit the VMX file of your VM with the nano (ESX only) or vi (ESX/ESXi) editor.
  3. Add a new line, using the following syntax: workingDir="/vmfs/volumes/SnapVolume/Snapshots/"
  4. If you want your .vswp file to stay in the VM's directory, add the following line to the VMX file: sched.swap.dir = "/vmfs/volumes/VM-Volume1/MyVM/". This step is optional. Furthermore, do not worry about updating the existing "sched.swap.derivedName" parameter, because it is generated by the VM and written to the configuration file each time the VM powers on.
  5. Power on your VM. Your .vmsn and snapshot (-delta.vmdk) files will now be located in the new working directory, along with the .vswp file unless you kept it in the VM's directory in step 4.
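
The steps above use a text editor, but the same change can be expressed as a small text transformation. The following Python sketch is an illustration under assumptions rather than a supported tool: set_vmx_option is a hypothetical helper that works on the contents of a VMX file as a string, and it presumes the VM is powered off and that you keep a backup copy of the original file.

```python
import re


def set_vmx_option(vmx_text, key, value):
    """Return vmx_text with key = "value" set exactly once.

    An existing entry for key is replaced in place, later duplicates are
    dropped, and the entry is appended if it was missing.
    """
    new_line = f'{key} = "{value}"'
    pattern = re.compile(rf"^\s*{re.escape(key)}\s*=.*$", re.IGNORECASE)
    out = []
    replaced = False
    for line in vmx_text.splitlines():
        if pattern.match(line):
            if not replaced:
                out.append(new_line)
                replaced = True
            continue  # drop duplicate entries for the same key
        out.append(line)
    if not replaced:
        out.append(new_line)
    return "\n".join(out) + "\n"
```

For example, set_vmx_option(text, "workingDir", "/vmfs/volumes/SnapVolume/Snapshots/") yields the line from step 3, and a second call with "sched.swap.dir" covers the optional step 4.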

Using vMotion and Storage vMotion with snapshots
Using vMotion to migrate a VM to a different host is supported, and all existing snapshots are retained. If you try to vMotion a VM with running snapshots from one host to another, however, you will receive the following warning: "Reverting to snapshot would generate error (warnings) on the destination host." In other words, the migration wizard cannot verify the compatibility of the virtual machine state in the snapshot with the destination host.

Because the compatibility cannot be verified, a failure could occur if the VM configuration in the snapshot uses devices or virtual disks that are not accessible on the destination host. A failure can also occur if the snapshot contains an active VM state that was running on hardware incompatible with the destination host CPU.

Using Storage vMotion to move a VM to another disk location is not supported while the VM has snapshots. To use it, you must first delete all the snapshots on the VM. Alternatively, you can power the VM off and perform a cold migration to another disk location.

Using Fault Tolerance with snapshots
VM snapshots are not supported on VMs that use Fault Tolerance (FT). As a result, backing up FT-enabled VMs can be tricky, because many backup applications rely on VM snapshots.

Look at alternative backup methods, such as traditional OS backup agents that run inside the VM, cloning VMs and then backing up the clones, temporarily disabling FT when running backups on the VM, or using storage snapshots to back up the VM's datastore.

Fault Tolerance can be controlled via PowerShell scripts, so you can run pre-backup scripts to temporarily disable FT. That way, a backup application can take a snapshot. Then, a post-backup script can re-enable FT.

