IT professionals increasingly rely on snapshots to protect data in their virtual environments. Within seconds, a snapshot provides a frozen, secondary instance of data. This instance can be backed up, replicated or even used as the baseline to start another virtual machine (VM).
However, there are two challenges when relying on snapshots for data protection. First, a snapshot is an instance, not a full copy. Second, snapshots can be implemented in the VM, the hypervisor, the backup software or the storage array, and deciding which of these should trigger and manage the snapshot can be confusing. In this article, you will find out how to overcome the inherent weakness in snapshots and how to select the right snapshot method for your data center.
What is a snapshot?
Snapshots take advantage of the way data is organized on a storage device to create a point-in-time instance of the original data set. Most file systems and storage systems have a two-tier organizational method for data. The first tier is metadata. The metadata tier is a small catalog that points to the second tier, the actual location of the data on disk.
Instead of copying all the physical data in tier 2, snapshots just copy the metadata in tier 1. That copy is made almost instantly and takes very little additional storage capacity. Then, the blocks that are part of the snapshot are set to read-only. Going forward, the snapshot manager maintains two copies of metadata: an active copy that production applications continue to update, and a static copy used by other applications such as backup and replication. The number of metadata copies grows with the number of active snapshots.
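The two-tier idea can be illustrated with a minimal Python sketch. It is a toy model, not any vendor's implementation: the `Volume` class and its method names are hypothetical, and new data is always written to a fresh location as a simplifying assumption so that blocks referenced by a snapshot are never overwritten.

```python
class Volume:
    """Toy two-tier volume: a metadata map (tier 1) pointing at data blocks (tier 2)."""

    def __init__(self):
        self.blocks = {}         # tier 2: physical location -> data
        self.active = {}         # tier 1: logical block number -> physical location
        self.snapshots = {}      # snapshot name -> frozen copy of the metadata
        self._next_location = 0

    def write(self, logical_block, data):
        # Assumption: every write lands at a fresh physical location, so any
        # block still referenced by a snapshot is effectively read-only.
        location = self._next_location
        self._next_location += 1
        self.blocks[location] = data
        self.active[logical_block] = location

    def snapshot(self, name):
        # Only the small metadata map is copied; no data block is duplicated.
        self.snapshots[name] = dict(self.active)

    def read(self, logical_block, snapshot=None):
        metadata = self.active if snapshot is None else self.snapshots[snapshot]
        return self.blocks[metadata[logical_block]]


vol = Volume()
vol.write(0, "alpha")
vol.write(1, "beta")
vol.snapshot("monday")   # copies two metadata entries, zero data blocks
vol.write(0, "ALPHA")    # production keeps changing after the snapshot

current_value = vol.read(0)                     # active view of block 0
frozen_value = vol.read(0, snapshot="monday")   # point-in-time view of block 0
```

After the overwrite, the active metadata and the snapshot metadata point at different physical blocks for logical block 0, while still sharing the unchanged block 1.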
An important differentiator for snapshot technologies is how they handle the modification of a block of data by production applications or users. Snapshots will typically use one of two methods to manage changes while maintaining snapshot integrity. The first option, commonly called copy-on-write, copies the old data to a new location before it is overwritten and updates the snapshot metadata to allow access to the old block. The second option, commonly called redirect-on-write, writes the modified block to a new location and updates the active metadata copy. Of course, each product has its own nuances, but in general, they fall into one of these two types.
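As a rough illustration of the difference between the two methods, the following Python sketch models both on a toy volume with a single snapshot. All class and attribute names are hypothetical, and details that real products must handle (multiple snapshots, space reclamation, crash consistency) are deliberately left out.

```python
class _SnapshotVolume:
    """Shared plumbing for the two toy volumes below (illustrative only)."""

    def __init__(self, data):
        self.blocks = dict(enumerate(data))             # physical location -> data
        self.active = {i: i for i in range(len(data))}  # logical -> physical
        self.snapshot = None                            # frozen metadata copy
        self.block_writes = 0                           # physical writes performed
        self._next_location = len(data)

    def take_snapshot(self):
        self.snapshot = dict(self.active)

    def read(self, logical, from_snapshot=False):
        metadata = self.snapshot if from_snapshot else self.active
        return self.blocks[metadata[logical]]

    def _store(self, location, data):
        self.blocks[location] = data
        self.block_writes += 1

    def _new_location(self):
        location = self._next_location
        self._next_location += 1
        return location


class CopyOnWriteVolume(_SnapshotVolume):
    def write(self, logical, data):
        location = self.active[logical]
        if self.snapshot is not None and self.snapshot[logical] == location:
            # First change since the snapshot: move the old data aside and
            # repoint the snapshot metadata at the preserved copy.
            preserved = self._new_location()
            self._store(preserved, self.blocks[location])
            self.snapshot[logical] = preserved
        self._store(location, data)  # then overwrite the block in place


class RedirectOnWriteVolume(_SnapshotVolume):
    def write(self, logical, data):
        # The old block stays where it is; only the active metadata moves.
        location = self._new_location()
        self._store(location, data)
        self.active[logical] = location


cow = CopyOnWriteVolume(["a", "b"])
row = RedirectOnWriteVolume(["a", "b"])
for volume in (cow, row):
    volume.take_snapshot()
    volume.write(0, "A")
```

Both volumes return the new value from the active view and the old value from the snapshot view. In this model the copy-on-write volume performs two physical writes for the first change to a snapshotted block, while the redirect-on-write volume performs one.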
In either case, the snapshot data set is entirely dependent on the primary copy being accessible, and the space consumed grows as the number of snapshots and the length of time they are retained increase.
Snapshot + replication
Because snapshots are entirely dependent on the source data set, if that source data set is lost due to a storage infrastructure or site failure, the snapshot copies are also lost. On its own, this vulnerability limits snapshots to recovering from data corruption or accidental file deletion.
However, since snapshots track block-level changes, the technology can also be used to replicate data efficiently. Snapshot-based replication copies only the blocks that have changed since the previous snapshot. After the initial replication of data, these small block transfers are ideal for updating a WAN-connected system located in another data center. And in this scenario, snapshots reside on a secondary system, so they are no longer dependent on the primary system for data.
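The changed-block idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions: snapshot metadata is a plain dictionary from logical block to physical location, a changed block always gets a new physical location, and the function names `changed_blocks` and `replicate` are hypothetical.

```python
def changed_blocks(previous, current):
    """Logical blocks whose pointer differs between two snapshot metadata maps."""
    return {block for block, location in current.items()
            if previous.get(block) != location}


def replicate(blocks, previous, current, remote):
    """Send only the changed blocks to the remote copy; return how many were sent."""
    delta = changed_blocks(previous, current)
    for block in delta:
        remote[block] = blocks[current[block]]
    # Blocks that existed in the previous snapshot but not in this one.
    for block in previous.keys() - current.keys():
        remote.pop(block, None)
    return len(delta)


# Physical blocks on the primary system (location -> data).
blocks = {0: "a", 1: "b", 2: "c", 3: "B"}
first_snapshot = {0: 0, 1: 1, 2: 2}    # logical block -> physical location
second_snapshot = {0: 0, 1: 3, 2: 2}   # logical block 1 was rewritten

remote = {}
initial_transfer = replicate(blocks, {}, first_snapshot, remote)
incremental_transfer = replicate(blocks, first_snapshot, second_snapshot, remote)
```

The initial pass sends every block, while the second pass sends only the one block whose pointer changed, which is why the ongoing transfers stay small enough for a WAN link.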
This independence makes replication essential to making snapshots useful for a broader set of data protection use cases. Another option is to copy snapshot data to a secondary system in the primary data center in addition to an off-site system. Then, if the primary storage system fails, the secondary on-site system has all the protected data for fast restores, while the off-site system has all the data for disaster recovery. Depending on the type of snapshot manager selected, that secondary system could be less expensive, potentially driving down cost.
So, what's a snapshot manager?
A snapshot manager is the software that triggers the snapshot and manages the multiple copies of metadata, keeping them up to date as the active data set changes. The snapshot manager is often part of something else, like an application, a file system, hypervisor, software-defined storage platform or physical storage array. Each of these implementations has unique advantages, and many data centers will choose to use a mixture of products in order to meet their data protection and recovery goals.
Some applications have the ability to create and manage snapshot and replication jobs for the data they create. Also, third-party snapshot and replication utilities are often built for specific applications. While limited in scope, these products have the advantage of application awareness. They can gracefully put an application into a quiesced state, while monitoring specific processes to confirm that the database is still up and running. If one of these processes stops responding, the product can trigger an automatic recovery. Another advantage of the application-aware snapshot method is that these products can direct replicated data to almost any secondary storage device, potentially lowering overall storage costs. The shortcoming is that these products are limited to the applications they support, meaning that the data center may require a separate snapshot process for each application.
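The quiesce step described above can be sketched as follows. This is only an illustration of the pattern: `DemoDatabase`, `quiesced` and `application_aware_snapshot` are hypothetical names, and copying a list stands in for triggering a real snapshot.

```python
from contextlib import contextmanager


class DemoDatabase:
    """Hypothetical stand-in for an application that exposes a quiesce API."""

    def __init__(self):
        self.pending_writes = ["row-1", "row-2"]  # buffered, not yet on disk
        self.on_disk = []
        self.accepting_writes = True

    def quiesce(self):
        # Stop accepting new writes and flush buffers so on-disk data is consistent.
        self.accepting_writes = False
        self.on_disk.extend(self.pending_writes)
        self.pending_writes.clear()

    def resume(self):
        self.accepting_writes = True


@contextmanager
def quiesced(app):
    app.quiesce()
    try:
        yield app
    finally:
        # Resume the application even if the snapshot step fails.
        app.resume()


def application_aware_snapshot(app):
    with quiesced(app):
        # Placeholder for the real snapshot trigger: capture the consistent state.
        return list(app.on_disk)


db = DemoDatabase()
snapshot = application_aware_snapshot(db)
```

The point of the pattern is ordering: buffered data is flushed before the snapshot is taken, and the application is released immediately afterward, so the quiesced window stays as short as the snapshot itself.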
File system snapshots
Increasingly, snapshot capabilities are built into file systems. This snapshot method is similar to application snapshots, but it operates on the whole file system instead of just one application. This is important, because file system APIs can be used to trigger quiesced snapshots of applications. These snapshots work across applications, but each one is limited to a single operating system and virtual machine. This means that each operating system in the environment will require its own snapshot technology. Also, most file system snapshots cannot be managed centrally. Each server's snapshot schedule has to be individually managed and monitored. For a large data center, this could lead to hundreds of individual snapshot jobs to track.
Hypervisor snapshots
In a virtual environment, snapshots can be triggered at the hypervisor layer, simplifying snapshot management. Instead of performing and monitoring snapshots per virtual machine and application, control is consolidated to the hypervisor. For example, VMware snapshots can be managed from within vCenter. Like application and file system snapshots, the snapshot and replication target can be a secondary storage system from any manufacturer, because snapshots are implemented at the hypervisor level.
All three of the above technologies (application, file system and hypervisor snapshot methods) will typically exhibit performance problems as the number and age of snapshots increase. Enter the storage-based snapshot method.
Storage infrastructure snapshots
The most commonly used snapshot method is via the storage infrastructure, usually performed by the storage hardware. There are several advantages to using hardware-based snapshots. First, snapshots are triggered per volume or system, so there are fewer snapshot jobs to manage. Second, in most cases, hundreds of snapshots can be maintained without significantly impacting performance, thanks to dedicated storage processors handling the various metadata tables.
The downside is that hardware-based snapshots are limited in their choice of replication target. In many cases, the two storage systems must be from the same hardware vendor, although vendors increasingly allow the use of a lower-cost system from within their portfolio as the secondary target. Another downside is that if the data center has multiple storage systems, each storage system will have its own snapshot manager that needs to be monitored separately.
Software-defined storage (SDS), assuming it supports snapshots and replication, solves these two issues by providing a common engine across multiple systems. As a result, management is consolidated to a single interface.
Backup application snapshots
Backup applications serve two use cases when it comes to snapshots. In the first use case, the software performs and manages the snapshot itself. In the second, the software triggers the snapshot on another device and then manages that snapshot. In the first case, the backup application essentially replaces the snapshot capabilities of all the other methods listed above. In the second case, the software manages, orchestrates and organizes snapshot data.
The second use case is the more interesting of the two. It allows the use of the best snapshot technology from various hardware vendors, and it adds the ability to efficiently search for data within the snapshot. The challenge is that storage system support is currently limited, but as that support grows, this could become a compelling option.
Selecting the right snapshot for your virtual environment
Many data centers will need to use multiple snapshot methods. For example, application-awareness can be valuable. In that case, it may be worthwhile to manage snapshots for mission-critical apps separately, using an application-specific snapshot method.
Also, most large data centers will not be able to use file-system or hypervisor-based snapshots exclusively due to performance concerns. Large organizations typically use native storage system or backup software snapshot capabilities. However, the snapshot capabilities of file systems and hypervisors are still necessary, because the storage and backup snapshots can use them as a framework to accurately capture snapshot data.
When selecting a snapshot method, start with the approach that provides the broadest coverage possible. If issues arise, add a second snapshot method that addresses those specific needs.