Snapshots have their pros and cons. They can be a cost-effective way to protect data and replicate instances of a database quickly. But they can also be the source of unexpected cloud costs. That risk makes effective snapshot management strategies particularly important.
Before you can use this valuable tool efficiently, it's important to understand how snapshots are implemented. The first time a database snapshot is created, all the data on the source disk is copied to the snapshot. The next time a snapshot of that same disk is made, only the changes made to the disk since the first copy are stored.
This is ideal because you pay only for the data stored with each snapshot, rather than for redundant data already present in an earlier snapshot. When restoring, the full snapshot and each incremental snapshot that followed it are combined to re-create the state of the disk at the time the last snapshot was made.
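The incremental scheme can be illustrated with a small Python sketch. It is a toy model, not how any cloud provider actually stores snapshots: the disk is assumed to be a dictionary of block numbers, and the names `take_snapshot` and `restore` are hypothetical.

```python
from typing import Dict, List

Disk = Dict[int, str]  # block number -> block contents (toy model of a disk)


def take_snapshot(disk: Disk, previous_state: Disk) -> Disk:
    """Return only the blocks that differ from the state captured so far."""
    return {block: data for block, data in disk.items()
            if previous_state.get(block) != data}


def restore(chain: List[Disk]) -> Disk:
    """Replay the full snapshot and every incremental snapshot, oldest first."""
    state: Disk = {}
    for snapshot in chain:
        state.update(snapshot)
    return state


disk = {0: "a", 1: "b", 2: "c"}
chain = [take_snapshot(disk, {})]                  # first snapshot: full copy
disk[1] = "B"                                      # one block changes
chain.append(take_snapshot(disk, restore(chain)))  # second snapshot: changes only
stored_blocks = [len(snapshot) for snapshot in chain]
```

In this model the first snapshot holds all three blocks and the second holds only the one that changed, yet replaying both reproduces the current disk.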
This efficient approach to copying data can lead to some unexpected consequences, including the following:
- not saving much space when deleting snapshots;
- not tailoring snapshot lifecycle policy management to each environment; and
- incurring egress charges when copying snapshots across regions.
Some snapshot management strategies are more helpful than others in avoiding these pitfalls.
Delete unnecessary snapshot data
For many organizations, a higher-than-expected storage bill could trigger an impulse to delete unnecessary data. For example, instead of the development team keeping the five latest snapshots for their development servers, the retention policy could be changed to store only three snapshots.
Many people might assume that change would save two snapshots' worth of data, or 40% of the total snapshot size. However, the savings will end up being much less than that because of the way snapshots use incremental updates.
If the oldest snapshot is a full copy of a disk and it's referenced by the snapshot that is created next, then when the oldest snapshot is deleted, the data it contained that is still needed will be moved to the next snapshot. This may be ideal from a data protection perspective, as no data is lost, but it doesn't do much if the goal is to cut costs.
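A rough calculation shows how far the real savings can fall short of 40%. The sketch below uses the same toy block model and entirely hypothetical sizes: a 100-block full snapshot followed by four incremental snapshots of five blocks each. The function names are illustrative only.

```python
from typing import Dict, List

Snapshot = Dict[int, str]  # block number -> block contents


def delete_oldest(chain: List[Snapshot]) -> List[Snapshot]:
    """Drop the oldest snapshot, folding its blocks into its successor."""
    oldest, successor, *rest = chain
    merged = {**oldest, **successor}  # the successor's newer blocks win
    return [merged, *rest]


def stored_blocks(chain: List[Snapshot]) -> int:
    """Total number of blocks billed across the whole chain."""
    return sum(len(snapshot) for snapshot in chain)


# Hypothetical chain: one full snapshot plus four small incrementals.
full = {block: "v0" for block in range(100)}
incrementals = [{block: f"v{n}" for block in range(n * 5, n * 5 + 5)}
                for n in range(1, 5)]
chain = [full, *incrementals]

# Change retention from five snapshots to three by deleting the two oldest.
after = delete_oldest(delete_oldest(chain))
saved_fraction = (stored_blocks(chain) - stored_blocks(after)) / stored_blocks(chain)
```

Under these assumptions the chain shrinks from 120 stored blocks to 110, a saving of roughly 8% rather than 40%, because the full copy survives inside the oldest remaining snapshot.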
Tailor snapshot policies to the environment
One way to save on snapshot storage costs is to customize snapshot management policies to a particular environment, such as development, test, user acceptance and production. Production data needs robust protection, which often means frequent snapshots. Development environments are a different story.
Often, developers will work from a common code base for applications that are regularly rebuilt and redeployed. This is part of the increasingly popular continuous integration/continuous delivery practice, and it means development environments have only a limited need for snapshots.
Developer test environments and user acceptance testing environments are similar in that each is typically used for testing the latest changes to an application. If data is lost on a server in either test environment, it can be rebuilt much more easily than data lost in the production environment. These environments, therefore, have less need for snapshots than production does.
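One way to express environment-specific policies is a small lookup table that a cleanup job consults. The following is a minimal sketch: the intervals and retention counts are placeholder values chosen for illustration, not recommendations, and `SnapshotPolicy` and `snapshots_to_delete` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class SnapshotPolicy:
    interval_hours: int  # how often a snapshot is taken
    keep: int            # how many of the newest snapshots to retain


# Placeholder values only; tune them to each environment's recovery needs.
POLICIES: Dict[str, SnapshotPolicy] = {
    "development": SnapshotPolicy(interval_hours=168, keep=1),
    "test": SnapshotPolicy(interval_hours=168, keep=2),
    "user-acceptance": SnapshotPolicy(interval_hours=24, keep=3),
    "production": SnapshotPolicy(interval_hours=6, keep=14),
}


def snapshots_to_delete(environment: str, created: List[datetime]) -> List[datetime]:
    """Return creation times of snapshots that fall outside the retention count."""
    keep = POLICIES[environment].keep
    newest_first = sorted(created, reverse=True)
    return newest_first[keep:]


start = datetime(2024, 1, 1)
daily = [start + timedelta(days=n) for n in range(5)]  # five daily snapshots
dev_expired = snapshots_to_delete("development", daily)
prod_expired = snapshots_to_delete("production", daily)
```

With these placeholder policies, the same five snapshots yield four deletions in development and none in production.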
The database dependency factor
Another snapshot management factor to consider is the dependency of a snapshot on the database used to create it.
Database snapshots must be restored to a version of the database that is compatible with the version used to make the snapshot. This is generally not a problem when using snapshots for short-term data recovery, such as quickly restoring data that was accidentally deleted.
If snapshots are stored for long periods of time, however, the chances that the source database will be upgraded increase. This makes snapshots less appealing for long-term archival storage.
To get around this problem, make backups that are independent of the source database by exporting a snapshot and storing it in Amazon S3. When exported, data is saved in the Apache Parquet format, a widely used columnar format created as part of the Hadoop ecosystem. Any application that supports Parquet can import these files.
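For Amazon RDS, the export can be started with the `start_export_task` call in boto3. The sketch below is one possible shape, not a complete implementation: every identifier, ARN and bucket name is a placeholder, and it assumes an IAM role with write access to the bucket and a KMS key for encrypting the export already exist.

```python
from typing import Dict


def build_export_request(task_id: str, snapshot_arn: str, bucket: str,
                         role_arn: str, kms_key_id: str,
                         prefix: str = "rds-exports/") -> Dict[str, str]:
    """Assemble the arguments for the RDS start_export_task call."""
    return {
        "ExportTaskIdentifier": task_id,
        "SourceArn": snapshot_arn,   # ARN of the snapshot to export
        "S3BucketName": bucket,      # destination bucket for the Parquet files
        "S3Prefix": prefix,
        "IamRoleArn": role_arn,      # role that RDS assumes to write to S3
        "KmsKeyId": kms_key_id,      # key used to encrypt the exported data
    }


def start_export(request: Dict[str, str]) -> dict:
    """Submit the export; requires boto3 and valid AWS credentials."""
    import boto3  # imported here so the builder above works without the SDK
    return boto3.client("rds").start_export_task(**request)


# Placeholder values for illustration only.
request = build_export_request(
    task_id="archive-2024-01",
    snapshot_arn="arn:aws:rds:us-east-1:123456789012:snapshot:example-snapshot",
    bucket="example-archive-bucket",
    role_arn="arn:aws:iam::123456789012:role/example-export-role",
    kms_key_id="arn:aws:kms:us-east-1:123456789012:key/example-key-id",
)
```

Calling `start_export(request)` would submit the task. Keeping the bucket in the same region as the snapshot is a sensible default.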
To save even more, store the exported data in an S3 Infrequent Access storage class.
Take care when copying snapshots across regions, as doing so will incur egress charges. Instead, generate and store snapshots within a single region to minimize network charges.
Database services, such as Amazon Relational Database Service (RDS), provide database snapshots. These are particularly useful for production backups, but they are not the best option for long-term backups. Knowing your different environments well will go a long way toward implementing snapshot management policies best suited to your organization's requirements and avoiding pitfalls, such as excessive egress charges.