Backing up virtual servers is normally a relatively simple process, and nearly every backup application available today fully supports it. However, one often-overlooked aspect of virtual data center backups is clustering.
Clustering is used to make virtual machines fault-tolerant. If a host server fails, the virtual machine can fail over to another node in the cluster. If the virtual machine were not clustered, the failure of the host server would force it offline.
Unless a backup application is fully cluster-aware, the backup and restoration process might not go as expected. This article explains some important things to take into account when backing up virtual machines in a Windows failover cluster.
What can go wrong?
You may wonder what would happen if you backed up a clustered virtual server with a backup application that is not cluster-aware. Every backup application is different, but the backup itself will usually succeed. The problem is that if you ever have to perform a full restoration of the virtual machine, it will not be fault-tolerant after the restoration completes. In other words, the virtual machine will be fully functional, but if the underlying host server were to fail, the virtual machine would not fail over to another cluster node, even though it is running on a clustered host.
The method that you would use to make a virtual machine fault-tolerant once again depends on your virtualization platform. In a Microsoft environment, you would open the Failover Cluster Manager and manually designate the virtual machine as a clustered resource.
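After a restore by a tool that is not cluster-aware, the first job is spotting which virtual machines are running on a cluster node but are no longer clustered. The Python sketch below shows that comparison under stated assumptions: the VM and role names are hypothetical, and it assumes each clustered role is named after its virtual machine (the usual default). In practice the two lists might come from the Hyper-V `Get-VM` cmdlet and the FailoverClusters `Get-ClusterGroup` cmdlet; the script only prints `Add-ClusterVirtualMachineRole` commands for review rather than running them, so it is an alternative to clicking through Failover Cluster Manager, not an official Microsoft procedure.

```python
def find_unclustered_vms(vms_on_nodes, clustered_roles):
    """Return VM names running on cluster nodes that are not clustered roles.

    Assumes each clustered role is named after its VM (hypothetical inputs).
    """
    return sorted(set(vms_on_nodes) - set(clustered_roles))


def reclustering_commands(vm_names):
    """Build one PowerShell command per VM to re-add it as a clustered role.

    The commands are only generated as strings here; nothing is executed.
    """
    return [f'Add-ClusterVirtualMachineRole -VMName "{name}"' for name in vm_names]


if __name__ == "__main__":
    # Hypothetical inventory: VM02 was restored by a non-cluster-aware tool.
    running_vms = ["VM01", "VM02", "VM03"]
    clustered = ["VM01", "VM03"]
    for command in reclustering_commands(find_unclustered_vms(running_vms, clustered)):
        print(command)
```

Printing the commands instead of executing them leaves a human in the loop, which seems prudent for a change to cluster configuration made right after a disaster.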
Important clustering considerations
Obviously, it is important to use cluster-aware backup software, but software that directly supports clustering is not enough on its own. Backups of clustered virtual machines must also meet certain criteria set by Microsoft.
The first of these criteria is that the cluster must be running and must have quorum, which means that a majority of the cluster nodes are running and can communicate with one another. Under normal circumstances, this requirement is not an issue, but it does have implications in failure situations. If multiple cluster nodes fail and the cluster loses quorum, you will not be able to make a cluster-level backup of your virtual machines until you correct the problem and return the cluster to a healthy state. Even in that situation, however, it might still be possible to make offline backups of your virtual machines.
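The majority rule above is simple arithmetic, and a short sketch makes the failure threshold concrete. This Python example assumes a plain node-majority model with one vote per node; it deliberately ignores disk or file share witnesses and the dynamic quorum behavior of newer Windows Server releases, so treat it as an illustration of the rule as stated here rather than a model of how a real cluster counts votes. The function names are hypothetical.

```python
def has_quorum(total_votes: int, votes_online: int) -> bool:
    """Return True when a strict majority of votes is online (node-majority rule)."""
    if total_votes < 1:
        raise ValueError("a cluster needs at least one vote")
    if not 0 <= votes_online <= total_votes:
        raise ValueError("votes_online must be between 0 and total_votes")
    # Strict majority: more than half of all votes must be present.
    return 2 * votes_online > total_votes


def cluster_backup_allowed(total_nodes: int, nodes_up: int) -> bool:
    """A cluster-level VM backup is only possible while the cluster has quorum."""
    return has_quorum(total_nodes, nodes_up)


if __name__ == "__main__":
    # Hypothetical five-node cluster losing nodes one at a time.
    for up in range(5, 1, -1):
        print(f"{up} of 5 nodes up -> cluster-level backup possible: "
              f"{cluster_backup_allowed(5, up)}")
```

Under this simplified model, a five-node cluster keeps quorum with three surviving nodes and loses it at two, while a four-node cluster needs three nodes, which is one reason witnesses are commonly added to even-sized clusters.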
Another very important issue to consider when planning your backups is cluster storage. Prior to the release of Windows Server 2012, failover clusters for virtual machines depended on a cluster shared volume that was accessible to all of the nodes in the cluster. In Windows Server 2012, the requirement for a cluster shared volume goes away, but Microsoft still recommends using a cluster shared volume as a best practice.
Assuming that your failover cluster does use a cluster shared volume, the storage architecture has a direct impact on the backup process. Although all of the nodes in the cluster can access the shared storage device, only one cluster node at a time, the active node, controls the cluster shared volume. This is reflected in the Disk Management console. On the active node, Disk Management shows the cluster shared volume as Online. On every other node, the volume is listed as Reserved, as shown in Figure A.
The volume's status matters for a simple reason: your backup software has to be able to communicate with the volume on which the virtual machines are stored. If you run a backup against an individual cluster node, only the disks that are online on that node at the time of the backup can be included in the backup.
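The rule that a node-level backup only captures disks that are online on that node can be sketched as a small filter. In this Python example, the node names, disk names, and the per-node status map are hypothetical stand-ins for what you would read from Disk Management on each node; it assumes exactly one node reports a given shared volume as Online, matching the Online/Reserved behavior described above. It answers two questions: which node to point the backup at, and what a backup of any given node would silently skip.

```python
def owner_node(disk_views, volume):
    """Return the node on which `volume` is Online, i.e. the node to back up from.

    `disk_views` maps node name -> {disk name: status} (hypothetical data).
    """
    owners = [node for node, disks in disk_views.items()
              if disks.get(volume) == "Online"]
    if len(owners) != 1:
        raise RuntimeError(f"expected exactly one owner for {volume!r}, found {owners}")
    return owners[0]


def backup_scope(disks):
    """Split one node's disks into those a backup can include and those it skips."""
    included = sorted(name for name, status in disks.items() if status == "Online")
    skipped = sorted(name for name, status in disks.items() if status != "Online")
    return included, skipped


if __name__ == "__main__":
    # Hypothetical two-node cluster: NODE1 currently owns the shared volume.
    views = {
        "NODE1": {"C:": "Online", "Cluster Disk 1": "Online"},
        "NODE2": {"C:": "Online", "Cluster Disk 1": "Reserved"},
    }
    print("Back up from:", owner_node(views, "Cluster Disk 1"))
    print("NODE2 backup (included, skipped):", backup_scope(views["NODE2"]))
```

A backup job aimed at NODE2 in this example would still succeed, but it would contain only the local system disk, which is exactly the kind of quiet gap that is easy to miss until a restore is needed.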
Restoring a cluster node
Any virtualization-aware backup application (other than Windows Server Backup) should allow you to restore individual virtual machines. When building a disaster recovery plan, however, you should also account for the cluster nodes themselves.
For the most part, restoring a cluster node is straightforward. The most important thing to know is that each node in the cluster maintains a copy of the cluster configuration data. A normal, bare-metal restoration of a failover cluster node brings the node back to a functional state. The newly restored node then contacts the other nodes in the cluster and downloads the cluster configuration from one of them, which brings it up to date.
If the cluster configuration data becomes corrupted, you will have to perform an authoritative restoration. An authoritative restoration starts out like a normal restoration: you begin by performing a bare-metal restoration of a cluster node. Unlike a normal restoration, however, an authoritative restoration does not retrieve the cluster configuration data from another node. Instead, it treats the configuration data on the restored node as the most current copy and propagates it to the other nodes, so that every node in the cluster uses the same cluster configuration data.
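The difference between the two restore types comes down to the direction in which configuration data flows. The Python sketch below models each node's copy of the cluster configuration as a small dictionary with a hypothetical `version` field and role list; it is a conceptual illustration, not how Windows actually stores or replicates the cluster database. A normal restore pulls a copy from a peer (choosing the peer with the highest `version` is an assumption, since healthy peers should already agree), while an authoritative restore pushes the restored node's copy to every node.

```python
from copy import deepcopy


def normal_restore(configs, restored):
    """Non-authoritative: the restored node downloads config from a peer."""
    peers = [node for node in configs if node != restored]
    if not peers:
        raise ValueError("a normal restore needs at least one other node")
    # Hypothetical tie-breaker: take the copy with the highest version number.
    source = max(peers, key=lambda node: configs[node]["version"])
    result = deepcopy(configs)
    result[restored] = deepcopy(configs[source])
    return result


def authoritative_restore(configs, restored):
    """Authoritative: the restored node's config overwrites every node's copy."""
    return {node: deepcopy(configs[restored]) for node in configs}


if __name__ == "__main__":
    # Hypothetical cluster; NODE3 was restored from an older backup.
    cluster = {
        "NODE1": {"version": 7, "roles": ["VM01", "VM02"]},
        "NODE2": {"version": 7, "roles": ["VM01", "VM02"]},
        "NODE3": {"version": 5, "roles": ["VM01"]},
    }
    print("Normal:", normal_restore(cluster, "NODE3"))
    print("Authoritative:", authoritative_restore(cluster, "NODE3"))
```

In the authoritative case every node ends up with NODE3's older copy, which is the point when the newer copies are the corrupted ones, and also why an authoritative restore should be a deliberate decision rather than a default.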
Backing up and restoring virtual machines within a Windows failover cluster is fairly straightforward. Even so, you must make sure that your backup software is cluster-aware and that backups run against the active cluster node. Also, make sure that your backup strategy includes backing up the individual cluster nodes and the cluster configuration data.
About the author:
Brien M. Posey, MCSE, has received Microsoft's MVP award for Exchange Server, Windows Server and Internet Information Server. Brien has served as CIO for a nationwide chain of hospitals and has been responsible for the Department of Information Management at Fort Knox. You can visit Brien's personal website at www.brienposey.com.