Using cloud storage as a backup destination is increasingly popular, and it can give you effectively unlimited off-site backup capacity. But before you leap in and back up to the cloud to solve your immediate capacity problem, it is worth asking a few questions.
We've used Amazon Web Services (AWS) as the example when discussing pricing and other issues, as it is the largest and best-understood cloud provider. Many other cloud backup services are also built on top of AWS.
What will it cost?
In theory, cloud storage capacity for backups and archives is close to free. In reality, it is nowhere near as cheap as we might like.
AWS Glacier -- the cheapest option -- currently lists at $0.004 per GB per month, so every 100 TB of archive data you store will cost approximately $5,000 per year. Bear in mind that archives required for compliance reasons accumulate quickly. If you store a new 100 TB archive on Glacier every month for three years, you will spend more than $250,000 just for storage. Keep it up, and in the sixth year alone you will spend more than $300,000 to store your roughly 7 petabytes of archives.
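The arithmetic above is easy to reproduce. The following Python sketch assumes the $0.004 per GB-month list price quoted above, decimal units (1 TB = 1,000 GB), one new 100 TB archive each month and no deletions; the function names are illustrative and not part of any AWS tool.

```python
# Sketch: cumulative storage cost when a fixed-size archive is added every month.
# ASSUMPTIONS: the $0.004 per GB-month list price quoted in the text, decimal
# units (1 TB = 1,000 GB), and no archive is ever deleted.

PRICE_PER_GB_MONTH = 0.004  # USD, Glacier list price quoted in the text
GB_PER_TB = 1_000


def monthly_bill(stored_tb: float) -> float:
    """Storage charge for one month with `stored_tb` terabytes stored."""
    return stored_tb * GB_PER_TB * PRICE_PER_GB_MONTH


def yearly_costs(archive_tb_per_month: float, years: int) -> list[float]:
    """Total storage spend for each year, adding one archive per month."""
    costs = []
    stored_tb = 0.0
    for _year in range(years):
        year_total = 0.0
        for _month in range(12):
            stored_tb += archive_tb_per_month  # this month's new archive lands
            year_total += monthly_bill(stored_tb)
        costs.append(year_total)
    return costs


if __name__ == "__main__":
    costs = yearly_costs(100, 6)
    print(f"One 100 TB archive for a year: ${monthly_bill(100) * 12:,.0f}")
    print(f"First three years: ${sum(costs[:3]):,.0f}")
    print(f"Sixth year alone: ${costs[5]:,.0f}")
```

Under those assumptions, a single 100 TB archive costs $4,800 a year, the first three years come to about $266,400, and the sixth year alone costs about $319,200, when roughly 7.2 PB is stored.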
There will also be costs to transfer the data. Inbound data is usually free, but outbound data will cost you. On AWS, the transfer into Glacier is free, but when you restore one of your 100 TB archives back to your data center, the download will cost around another $10,000.
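Outbound transfer is often billed in volume tiers, with a lower per-GB rate as the amount grows. This sketch shows how such a tiered bill is computed; the tier sizes and rates are hypothetical placeholders, not AWS's current price list, and it ignores any separate per-GB retrieval fee an archive tier may charge.

```python
# HYPOTHETICAL tiers: (tier size in GB, USD per GB). None = all remaining data.
# Placeholders for illustration only, not a real provider's price list.
# The last tier must be open-ended (None) so that no data goes unbilled.
HYPOTHETICAL_TIERS = [
    (10_000, 0.09),
    (40_000, 0.08),
    (None, 0.07),
]


def egress_cost(gb: float, tiers=HYPOTHETICAL_TIERS) -> float:
    """Bill each slice of the download at the rate of the tier it falls into."""
    remaining = gb
    total = 0.0
    for size, rate in tiers:
        if remaining <= 0:
            break
        chunk = remaining if size is None else min(remaining, size)
        total += chunk * rate
        remaining -= chunk
    return total


if __name__ == "__main__":
    restore_gb = 100 * 1_000  # one 100 TB archive, decimal units
    print(f"Egress for a 100 TB restore: ${egress_cost(restore_gb):,.0f}")
```

Swap in your provider's published tiers, and add any retrieval fees, before relying on the result.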
How long will it take?
There are two parts to the "how long" question: how long the transfers will take, and how long you must wait before a restore starts. The good news is that most cloud backup services can accept data at tens of gigabytes per second, so your throughput will be limited by your own network connection. Analyze your current backups to determine the volume of data you need to transfer, and then calculate the network bandwidth required.
More significant is the latency from the time you request a restore until the data starts to flow. Glacier is a cheap place to store data, but a restore can take as long as 12 hours to start. After that, you still have to wait for the data transfer itself, which, over the same connection, will take about as long as the original backup transfer. If you have a recovery time objective (RTO) of six hours, for example, Glacier might not be the best backup target.
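A rough sizing calculation for the network connection might look like this. It assumes decimal units and a hypothetical 80% link efficiency to allow for protocol overhead and other traffic; measure the real figure on your own link.

```python
# Sketch: transfer-time and link-sizing arithmetic for backups.
# ASSUMPTIONS: decimal units (1 TB = 10**12 bytes) and a hypothetical 80% link
# efficiency to cover protocol overhead and competing traffic.

BITS_PER_BYTE = 8
LINK_EFFICIENCY = 0.8  # hypothetical; measure your own link


def transfer_hours(data_tb: float, link_gbps: float) -> float:
    """Hours to move `data_tb` terabytes over a link of `link_gbps` nominal Gbps."""
    bits = data_tb * 1e12 * BITS_PER_BYTE
    usable_bits_per_s = link_gbps * 1e9 * LINK_EFFICIENCY
    return bits / usable_bits_per_s / 3600


def required_gbps(data_tb: float, window_hours: float) -> float:
    """Nominal link speed needed to move `data_tb` terabytes within the window."""
    bits = data_tb * 1e12 * BITS_PER_BYTE
    return bits / (window_hours * 3600) / LINK_EFFICIENCY / 1e9


if __name__ == "__main__":
    print(f"10 TB in 8 h needs ~{required_gbps(10, 8):.1f} Gbps")
    print(f"100 TB over 10 Gbps takes ~{transfer_hours(100, 10):.1f} h")
```

For example, `required_gbps(10, 8)` suggests a link of roughly 3.5 Gbps to move a 10 TB nightly backup within an eight-hour window.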
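To check a storage tier against your RTO, add the time until the data starts to flow to the transfer time. In this sketch the retrieval latency and usable bandwidth are inputs you supply from your provider's documentation and your own measurements; nothing here queries a real service.

```python
# Sketch: compare estimated restore time against a recovery time objective (RTO).
# Inputs are supplied by you: retrieval latency from the provider's documentation
# for the storage tier, usable bandwidth from your own measurements.


def restore_hours(
    retrieval_latency_h: float, data_tb: float, usable_gbps: float
) -> float:
    """Time from restore request until the last byte arrives, in hours."""
    transfer_h = data_tb * 1e12 * 8 / (usable_gbps * 1e9) / 3600  # decimal TB
    return retrieval_latency_h + transfer_h


def meets_rto(
    rto_h: float, retrieval_latency_h: float, data_tb: float, usable_gbps: float
) -> bool:
    """True if the whole restore fits inside the RTO."""
    return restore_hours(retrieval_latency_h, data_tb, usable_gbps) <= rto_h


if __name__ == "__main__":
    # 2 TB over 10 Gbps usable: 12 h start delay vs a hypothetical 1 h delay
    print(meets_rto(6, 12, 2, 10))
    print(meets_rto(6, 1, 2, 10))
```

With a 12-hour start time, a six-hour RTO is missed no matter how fast the link is, while a hypothetical one-hour start time restores the same 2 TB in under an hour and a half.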
Is it safe?
If you are using a smaller cloud provider, review its physical and network security.
Data security is your next area of concern. Backup data is usually sent over an encrypted tunnel, most often a Transport Layer Security (TLS) connection, the successor to Secure Sockets Layer (SSL).
The data is typically also encrypted at rest, but who holds the encryption keys? Anyone who gains access to both the encrypted data and the encryption keys can decrypt your backups. Can you manage your own keys? If the provider manages the keys, how are they stored? Is the key store as highly available as the object store where the encrypted data resides? The last thing you want is to be unable to restore because the data center hosting the key store is down.
Is it good for DR?
Do you plan to back up to the cloud for disaster recovery (DR) purposes? A DR event implies that your data center is not available and that you need to restore whole systems. Does the backup allow you to restore into virtual machines (VMs) in the cloud or onto new servers in a different data center? What do you need in place before the restore can start: bare-metal servers or an installed OS?
You should also work through what happens if you restore into VMs in the cloud and later want to move production back to your own data center.
What about my compliance?
Often, an archive exists for regulatory compliance and must provide a guaranteed-unmodified copy of your systems as they were at some point in the past. If you plan to use cloud backup as an archive store, check whether you can make specific backup points read-only. Also check whether the service offers an automated lifecycle for these compliance archives. Having archive points deleted automatically once they pass their required retention period helps keep storage bills under control and limits your e-discovery liability.
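The deletion rule behind such a lifecycle is simple to express. This sketch splits archive points into those whose retention period has elapsed and those that must stay read-only. The seven-year (2,555-day) retention period and the archive records are hypothetical examples; in practice you would rely on the provider's own retention-lock and lifecycle features rather than a script like this.

```python
# Sketch: deciding which compliance archive points a lifecycle job may delete.
# HYPOTHETICAL: the retention period and the archive records below.
from dataclasses import dataclass
from datetime import date, timedelta

RETENTION_DAYS = 7 * 365  # hypothetical seven-year retention requirement


@dataclass(frozen=True)  # frozen: a record cannot be altered once created
class ArchivePoint:
    name: str
    created: date


def expires_on(point: ArchivePoint, retention_days: int) -> date:
    """First day on which the archive point may be deleted."""
    return point.created + timedelta(days=retention_days)


def partition(points, retention_days: int, today: date):
    """Split points into (deletable, locked) based on their retention period."""
    deletable, locked = [], []
    for point in points:
        if today >= expires_on(point, retention_days):
            deletable.append(point)  # retention met: lifecycle job may delete
        else:
            locked.append(point)  # still under retention: keep read-only
    return deletable, locked


if __name__ == "__main__":
    archives = [
        ArchivePoint("archive-2015-01", date(2015, 1, 1)),
        ArchivePoint("archive-2020-06", date(2020, 6, 30)),
        ArchivePoint("archive-2024-03", date(2024, 3, 15)),
    ]
    deletable, locked = partition(archives, RETENTION_DAYS, date(2025, 1, 1))
    print("delete:", [p.name for p in deletable])
    print("keep read-only:", [p.name for p in locked])
```

Run on 1 January 2025, only the 2015 archive point is eligible for deletion; the 2020 and 2024 points remain locked.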