12 common virtual machine backup mistakes

Despite an administrator's best efforts, virtual machine backups can fail. Determine the cause of the failure and modify the VM backup strategy to prevent future mistakes.

Virtual machine backup mistakes can occur at any level. It's up to backup administrators to spot and rectify these errors to reduce the risk of data loss.

Virtual machine backups enable organizations to protect VMs with the same reliability and security as traditional physical server backup. However, there are several ways that VM backup can go awry.

Some roadblocks are relatively simple, such as bottlenecks and a lack of resources. More complex issues, such as guest OS difficulties and virtual disk corruption, can complicate data protection efforts significantly.

Below are 12 common virtual machine backup mistakes that administrators must watch out for. Catching these errors and quickly remedying them is key to keeping VM data safe.

1. Performing guest OS backups

Backing up through the guest OS is probably the most common VM backup mistake. It's best to perform backups at the VM host level rather than installing a backup agent onto a VM's operating system.

Guest OS backups are best avoided whenever possible because they are inefficient and difficult to manage at scale. In addition, if several virtual machines run guest OS backups simultaneously, they can collectively cause significant performance bottlenecks.

Host-level backups also spare administrators from having to manage each VM backup individually. New VMs are created all the time, and it's easy to forget to include a new VM in backups. Backing up at the host level avoids this problem altogether, because newly created virtual machines are backed up automatically.
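
Even with host-level backups, it can be worth periodically verifying that every VM on a host is actually covered, for example when certain VMs are deliberately excluded from a job. The sketch below is a minimal example for a Hyper-V host; it assumes a hypothetical protected_vms.txt export from the backup application and shells out to the standard Get-VM PowerShell cmdlet to list the host's VMs.

```python
# Minimal sketch: flag VMs on a Hyper-V host that are missing from a backup job.
# Assumes a hypothetical protected_vms.txt export (one VM name per line) from
# the backup application; Get-VM is part of the Hyper-V PowerShell module.
import subprocess
from pathlib import Path

def host_vm_names() -> set[str]:
    # Ask PowerShell for the names of all VMs registered on this host.
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command", "(Get-VM).Name"],
        capture_output=True, text=True, check=True,
    )
    return {line.strip() for line in result.stdout.splitlines() if line.strip()}

def protected_vm_names(path: str = "protected_vms.txt") -> set[str]:
    return {line.strip() for line in Path(path).read_text().splitlines() if line.strip()}

for name in sorted(host_vm_names() - protected_vm_names()):
    print(f"WARNING: {name} is not covered by the backup job")
```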

2. Backing up virtual hard disk files directly

Users should never try to back up virtual hard disk files directly from the physical storage device, bypassing the virtualization layer. While there are ways to back up a virtual hard disk outside the virtualization layer, doing so bypasses the various safeguards that are built into the operating system. A simple mistake can corrupt the entire virtual hard disk, especially if snapshots or checkpoints are present.
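
If there is ever a legitimate reason to copy a virtual hard disk file outside the virtualization layer, it is worth confirming first that the disk is not part of a checkpoint chain. The following sketch assumes a Hyper-V host where the Get-VHD PowerShell cmdlet is available; it simply checks whether a disk has a parent, which would indicate a differencing disk created by a checkpoint.

```python
# Sketch: warn before copying a virtual hard disk that belongs to a checkpoint chain.
# Assumes a Hyper-V host; Get-VHD (Hyper-V PowerShell module) reports a ParentPath
# for differencing disks created by checkpoints.
import subprocess
import sys

def parent_path(vhd_path: str) -> str:
    result = subprocess.run(
        ["powershell", "-NoProfile", "-Command",
         f"(Get-VHD -Path '{vhd_path}').ParentPath"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

disk = sys.argv[1]
if parent_path(disk):
    print(f"{disk} is a differencing disk -- copying it alone will not produce a usable backup")
else:
    print(f"{disk} has no parent disk, but a host-level backup remains the safer option")
```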

3. Treating VM snapshots as a backup alternative

VM snapshots -- or checkpoints, as Microsoft Hyper-V calls them -- preserve the state of a VM from the point in time when the snapshot was taken. In addition, users can create multiple snapshots to provide more than one restore point to choose from. While this can be useful in certain situations, snapshots should never be used as a primary method for backing up VMs.

A virtual machine backup contains a full copy of the VM's virtual hard disk. Conversely, a snapshot does not copy a virtual machine's contents. That is why snapshots are not true backups. If a storage problem causes a virtual machine to be lost, the snapshots will likely also be destroyed. Even if the snapshots remain, they are useless without the original virtual hard disk. Snapshots should be treated as a convenient feature rather than a backup alternative.

Snapshots also tend to diminish read performance, especially if multiple snapshots exist for a VM. Each hypervisor vendor has its own way of doing things, but generally speaking, the act of creating a snapshot causes a new virtual hard disk to be created. The original virtual hard disk is treated as read-only.

This means that when a read operation occurs, the hypervisor must read the snapshot virtual disk first and then perform a second read against the original virtual hard disk if the snapshot virtual hard disk does not contain the requested data. Creating multiple snapshots can result in several virtual hard disks having to be read every time a read operation occurs.
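
The read path can be modeled in a few lines of code. The simplified sketch below treats each snapshot as a set of changed blocks layered on top of the base disk; a read starts at the newest snapshot and falls through toward the original virtual hard disk until it finds the requested block.

```python
# Simplified model of how reads traverse a snapshot (differencing disk) chain.
# Each layer holds only the blocks written after that layer was created, so a
# read falls through to older layers until the block is found.
base_disk = {0: "A", 1: "B", 2: "C"}      # original virtual hard disk (read-only)
snapshot1 = {1: "B-modified"}             # blocks changed after snapshot 1 was taken
snapshot2 = {2: "C-modified"}             # blocks changed after snapshot 2 was taken

chain = [snapshot2, snapshot1, base_disk]  # newest layer first

def read_block(block: int) -> str:
    for layer in chain:
        if block in layer:
            return layer[block]
    raise KeyError(f"block {block} is not allocated")

print(read_block(0))  # falls all the way through to the base disk -> "A"
print(read_block(2))  # satisfied by the newest snapshot layer -> "C-modified"
```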

4. Not keeping backup software up to date

Backup applications are just like any other application in that they can contain bugs or security vulnerabilities. They must be kept up to date through patching. The unique thing about backup applications, however, is that a bug can jeopardize entire backups.

As an example, at one time there was an issue with VMware Data Recovery that caused its catalogs to become corrupt. A catalog is essentially an index of the data that has been backed up and is used by most backup applications. The catalog corruption issue was fixed with a patch, but some admins who failed to update their software in a timely manner found themselves having to rebuild their backup catalogs from scratch.

5. Not assigning the right permissions

Some backup applications require each protected host server to have a service account that can facilitate the backup process. These types of backup applications can be prone to backup errors related to insufficient permissions. For example, a backup might fail if the account policy forces a password change, but the backup application itself is not made aware of the password change. When this occurs, the backup usually fails before any data can be processed, and the logs reflect a security error or a read failure.

As important as it is for a backup application to have the necessary permissions, it is also important to avoid assigning excessive permissions to a backup account. If a backup application backs data up to a backup vault, for example, then it is best to remove the permissions required to delete, encrypt or modify the vault or the data within it. That way, if the backup account were to become compromised, a cybercriminal would be unable to use that account to destroy existing backups.

6. Using unsupported OS versions

Unsupported guest operating systems are another potential cause of virtual machine backup failures. For example, a backup application that fully supports backing up VMs that run Windows Server 2022 might view Windows Server 2025 as an unsupported operating system unless the backup software is updated to make it aware of the new OS version.

The problems caused by a lack of OS support can be avoided by verifying backup support before upgrading virtual machines to a new operating system. Even if a backup application does not recognize the operating system running on a VM, however, it might still be able to create an image backup of that VM.
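
One way to avoid surprises is to compare the guest OS version each VM reports against the versions the backup vendor documents as supported before scheduling an upgrade. The sketch below is purely illustrative: the supported-version list and the VM inventory are placeholders that would come from the vendor's documentation and the hypervisor's management tooling.

```python
# Illustrative sketch: flag guest operating systems the backup software does not yet
# support. The supported list and inventory are placeholders; real values would come
# from the backup vendor's documentation and the hypervisor's inventory.
SUPPORTED_GUEST_OS = {
    "Windows Server 2019",
    "Windows Server 2022",
}

vm_inventory = {
    "file-server-01": "Windows Server 2022",
    "dc-02": "Windows Server 2025",  # hypothetical upgrade candidate
}

for vm, guest_os in vm_inventory.items():
    if guest_os not in SUPPORTED_GUEST_OS:
        print(f"{vm}: {guest_os} is not on the supported list -- verify backup support before upgrading")
```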

7. Overloading the host server

Another virtual machine backup mistake is overstressing the host server. If a VM resides on a disk that is already I/O bound, then the disk might not be able to deliver sufficient performance to keep the backup from timing out. The fix to this problem is to correct the storage bottleneck.

While backing up at the virtualization layer reduces resource usage on VMs when backups occur, resource usage will still be high on the hosts and storage devices when backups are running.

Resource starvation problems often come down to backup scheduling. Hosts typically share the same data stores in virtual environments, and bottlenecks caused by too many simultaneous VM backups on a single data store will affect all hosts that have VMs running on it. Likewise, if too many VMs on the same host are being backed up at the same time, it will create bottlenecks for all the VMs on that host.
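
One simple way to reason about the scheduling problem is to cap the number of simultaneous backup jobs per data store. The sketch below groups VMs by data store and processes each group with a small worker pool; run_backup is a stand-in for whatever job trigger the backup application actually exposes.

```python
# Sketch: limit simultaneous VM backups per data store to avoid I/O bottlenecks.
# run_backup() is a placeholder for the backup application's own API or CLI call.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import time

vms = [
    ("web-01", "datastore-a"), ("web-02", "datastore-a"),
    ("sql-01", "datastore-a"), ("app-01", "datastore-b"),
]

MAX_CONCURRENT_PER_DATASTORE = 2

def run_backup(vm_name: str) -> None:
    print(f"backing up {vm_name}")  # placeholder for the real backup job trigger
    time.sleep(1)

by_datastore: dict[str, list[str]] = defaultdict(list)
for vm, datastore in vms:
    by_datastore[datastore].append(vm)

for datastore, vm_names in by_datastore.items():
    # Each data store gets its own small worker pool, so no more than
    # MAX_CONCURRENT_PER_DATASTORE backups hit the same storage at once.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_PER_DATASTORE) as pool:
        list(pool.map(run_backup, vm_names))
```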

A better option than large scheduled backup jobs is continuous data protection (CDP). CDP backups will initially be large and resource-intensive. However, once the initial backup is complete, all future backups will generally be small, because CDP captures changes constantly -- every few seconds to every few minutes -- rather than in a monolithic scheduled backup.

8. Virtual hard disk corruption

Just as a physical hard disk can become corrupt, so too can a virtual hard disk. If corruption exists within a virtual hard disk, then a backup application might have trouble backing up the corresponding VM.

Typically when this occurs, the backup application logs will contain either read errors or data integrity errors. These errors can be clues that corruption might exist within a virtual hard disk.
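
A quick way to surface those clues is to scan the backup application's logs for the relevant error strings. In the sketch below, the log path and error phrases are hypothetical placeholders; substitute the actual log location and the wording the backup product uses.

```python
# Hypothetical sketch: scan a backup log for errors that often point to virtual
# hard disk corruption. The log path and error phrases are placeholders.
from pathlib import Path

LOG_FILE = Path("backup_job.log")                          # placeholder path
ERROR_MARKERS = ("read error", "data integrity", "crc")    # placeholder phrases

for line_number, line in enumerate(LOG_FILE.read_text(errors="ignore").splitlines(), start=1):
    if any(marker in line.lower() for marker in ERROR_MARKERS):
        print(f"line {line_number}: {line.strip()}")
```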

9. Not quiescing properly

Backups of VMs that are running Windows Server as a guest OS generally rely on the Volume Shadow Copy Service (VSS). This service performs a quiesce operation that enables applications running on the VM to be backed up in an application-consistent -- as opposed to a crash-consistent -- manner.

The Volume Shadow Copy Service uses a collection of VSS writers to facilitate the backup of various applications and OS components, such as Active Directory. If any of the VSS writers required by the backup process were to fail, then the entire backup could fail as a result.

If an administrator suspects that a VSS failure might be to blame for a VM backup failure, they should check the state of the VSS writers within the virtual machine. The vssadmin list writers command, within the guest OS, displays the state of each VSS writer.
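
The check is easy to script as part of routine troubleshooting. The sketch below runs vssadmin list writers inside the guest OS and flags any writer that is not in a stable state; vssadmin itself is a built-in Windows tool, but the simple text parsing here is an assumption about its usual output format.

```python
# Sketch: flag VSS writers that are not in a stable state (run inside the guest OS).
# vssadmin is a built-in Windows tool; the parsing assumes its usual text output,
# where "Writer name: '...'" is followed a few lines later by "State: [n] <state>".
import subprocess

output = subprocess.run(
    ["vssadmin", "list", "writers"],
    capture_output=True, text=True, check=True,
).stdout

writer = None
for raw_line in output.splitlines():
    line = raw_line.strip()
    if line.startswith("Writer name:"):
        writer = line.split(":", 1)[1].strip().strip("'")
    elif line.startswith("State:") and writer:
        state = line.split("]", 1)[-1].strip()
        if state.lower() != "stable":
            print(f"{writer}: {state}")
```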

10. Using buggy applications

Virtual machine backups can fail because an application that is running on a VM is buggy. For example, Microsoft once released Cumulative Update 3 for Exchange Server 2013. Among other things, the update contained a fix for a bug that randomly caused Exchange Server backups to fail.

If you are experiencing inconsistent problems with backing up a VM, check to see if there are any known bugs with applications running on the VM.

11. Security software configuration issues

Occasionally, security software might keep a backup from completing properly. For example, there have been plenty of documented instances of antimalware software interfering with certain backup applications. Similarly, some backup applications might require exceptions to be added to firewalls.
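
Backup vendors typically document the executables, ports or processes that need antimalware exclusions or firewall exceptions. As a minimal sketch, the example below calls the built-in netsh advfirewall command on Windows to allow a backup agent; the executable path and rule name are hypothetical placeholders, so check the vendor's documentation for the real values.

```python
# Hypothetical sketch: add a Windows Firewall exception for a backup agent.
# netsh advfirewall is a built-in Windows command; the agent path and rule
# name below are placeholders -- use the programs or ports the backup vendor
# actually documents.
import subprocess

AGENT_EXE = r"C:\Program Files\ExampleBackup\agent.exe"  # placeholder path

subprocess.run(
    [
        "netsh", "advfirewall", "firewall", "add", "rule",
        "name=BackupAgentInbound",
        "dir=in", "action=allow",
        f"program={AGENT_EXE}",
        "enable=yes",
    ],
    check=True,
)
```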

12. Starving backup servers of resources

Backup servers are basically like pumps: Data is read from a source, goes into the backup server and then is sent from the backup server to the target device. The volume that a backup server can handle is determined by the resources assigned to it, and the more resources that are available, the faster it can pump data.

Backing up VMs can heavily tax primary and backup storage resources, as well as the network, but there is more to backups than just moving data from Point A to Point B. Backup servers handle advanced functions, including deduplication, compression and determining which disk blocks need to be backed up. For a backup server to achieve maximum throughput, it must have sufficient resources to avoid creating a bottleneck in any one resource area.

Backup administrators should monitor the resource usage of the backup server. In practice, it's better for a backup server to have too many resources than too few. Ensuring that a backup server has the resources it needs can enable data to move at maximum speed. This will decrease the time required to back up data.
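
Basic resource monitoring on the backup server can reveal which component becomes the bottleneck during a backup window. The sketch below uses the third-party psutil library, assuming it is installed, to sample CPU, memory, disk and network activity at a fixed interval.

```python
# Sketch: sample backup-server resource usage during a backup window to spot
# bottlenecks. Requires the third-party psutil package (pip install psutil).
import psutil

SAMPLE_SECONDS = 5

while True:
    cpu = psutil.cpu_percent(interval=SAMPLE_SECONDS)  # blocks for the sample window
    mem = psutil.virtual_memory().percent
    disk = psutil.disk_io_counters()
    net = psutil.net_io_counters()
    print(
        f"cpu {cpu:5.1f}%  mem {mem:5.1f}%  "
        f"disk r/w {disk.read_bytes}/{disk.write_bytes} bytes  "
        f"net rx/tx {net.bytes_recv}/{net.bytes_sent} bytes"
    )
```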

Brien Posey is a 22-time Microsoft MVP and a commercial astronaut candidate. In his more than 30 years in IT, he has served as a lead network engineer for the U.S. Department of Defense and a network administrator for some of the largest insurance companies in America.
