Five tips for creating a backup service-level agreement

Learn how to write a backup service-level agreement that will keep the peace between IT and information users.

A data backup service-level agreement (SLA) is often a verbal understanding between the IT staff and the corporate community as a whole. But it's dangerous not to have backup SLAs as written documents with management buy-in. Without the commitment in writing, there's a grey area, and in my experience when there's a grey area, IT often loses. Really then, the first step is to actually have a backup SLA and to make sure it's written and signed off on by executive management.

Set specific timeframes for the recovery of files based on the age of those files.

Make sure that everyone understands that while you can do many things, miracles can't be guaranteed. If you are using disk-to-disk (D2D) backup or have a centralized storage system such as a storage area network (SAN) or network-attached storage (NAS), files backed up within the last few days are easier to access and sit on a faster medium, so they restore faster. The older a dataset is at the time of the restore request, the more difficult it will be for you to recover those files.

If what you can commit to has ramifications for regulatory or corporate governance concerns, you need to work those times into your backup SLA. And if what you can commit to doesn't meet the organization's demands, you need to respond with a design that will: adding disk, improving the infrastructure or changing a process to reach objectives that are mutually acceptable.
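One way to make age-based timeframes concrete is to write them down as a small lookup table. The sketch below is only an illustration: the tier boundaries, the committed restore times and the names `SLA_TIERS` and `restore_target` are all assumptions, not figures from any particular SLA.

```python
from datetime import timedelta

# Hypothetical tiers: (maximum file age, committed restore time).
# Replace these with the values your organization actually agrees to.
SLA_TIERS = [
    (timedelta(days=7), timedelta(hours=4)),     # recent data, still on disk
    (timedelta(days=90), timedelta(hours=24)),   # older data, slower medium
    (timedelta(days=365), timedelta(days=3)),    # archived data
]


def restore_target(file_age):
    """Return the committed restore time for a file of the given age.

    Returns None when the file is older than every tier, meaning the
    restore is best effort and carries no committed timeframe.
    """
    for max_age, target in SLA_TIERS:
        if file_age <= max_age:
            return target
    return None


if __name__ == "__main__":
    for days in (2, 30, 400):
        print(days, "days old ->", restore_target(timedelta(days=days)))
```

Returning `None` for the oldest data keeps the "miracles can't be guaranteed" message explicit rather than implying a commitment that doesn't exist.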

While everyone believes that their file is the most important one at that point in time, it may not be.

What's more important, the CFO's financial projections spreadsheet or recovering the email server? Having a written set of your organization's priorities will keep you out of trouble.

Prioritization is not only based on the person or system needing recovery; it also ties back to the first tip on recovery timeframes. It may make more sense to handle the data recoveries that will take you five minutes to find and recover before the ones that will take a few hours. Get the easy jobs tackled first.

You can also make a priority based on the quality of user-provided information. A request such as "I need that file that I accidentally deleted, but I don't remember when it was or the name of it" probably should come after "I need 2005's version of payroll.xls that's in the finance directory."
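The three ordering ideas in this tip (written business priority, quality of the request details, and effort) can be combined into a single sort key. This is a minimal sketch under assumptions: the `RestoreRequest` fields, the numeric priority scale and the order in which the criteria are applied are illustrative choices, and your written priority list may rank them differently.

```python
from dataclasses import dataclass


@dataclass
class RestoreRequest:
    description: str
    business_priority: int    # 1 = highest, taken from the written priority list
    estimated_minutes: int    # rough effort to find and recover the data
    has_name_and_date: bool   # did the user supply a file name and a date?


def order_requests(requests):
    """Sort by business priority, then well-specified requests, then quick jobs."""
    return sorted(
        requests,
        key=lambda r: (r.business_priority, not r.has_name_and_date, r.estimated_minutes),
    )


if __name__ == "__main__":
    queue = [
        RestoreRequest("deleted file, name and date unknown", 2, 120, False),
        RestoreRequest("2005 payroll.xls in finance directory", 2, 5, True),
        RestoreRequest("email server", 1, 180, True),
    ]
    for request in order_requests(queue):
        print(request.description)
```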

Spell out restore times.

Most people don't understand that restores almost always take longer than backups -- and in many cases, significantly longer. Recovering a corrupted 100 GB Exchange store, for example, is going to take time even if you have the latest copy and that copy is on disk. Plus, there's going to be additional time to replay transaction logs to bring the email environment back up to date. Again, this goes back to setting expectations and setting priorities, but restore times need to be spelled out explicitly in a backup service-level agreement.
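A back-of-the-envelope estimate can help set these expectations. The sketch below assumes a simple model (data copy time plus a fixed allowance for log replay); the function name `estimate_restore_minutes`, the 50 MB/s throughput and the 30-minute replay allowance are hypothetical numbers chosen for illustration, not measurements.

```python
def estimate_restore_minutes(size_gb, throughput_mb_per_s, log_replay_minutes=0.0):
    """Rough restore estimate: time to copy the data plus time to replay logs.

    size_gb             -- amount of data to restore, in GB (1 GB = 1024 MB here)
    throughput_mb_per_s -- sustained restore throughput you have actually observed
    log_replay_minutes  -- extra allowance for replaying transaction logs
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    copy_seconds = (size_gb * 1024) / throughput_mb_per_s
    return copy_seconds / 60 + log_replay_minutes


if __name__ == "__main__":
    # Assumed figures: 100 GB store, 50 MB/s restore rate, 30 minutes of log replay.
    minutes = estimate_restore_minutes(100, 50, log_replay_minutes=30)
    print(f"Estimated restore time: about {minutes:.0f} minutes")
```

Use throughput measured during real test restores rather than backup speeds, since the two often differ.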

Make certain your users understand that you may not keep all their data forever.

In fact, with the current legislation surrounding electronic data, you may be asked to delete data long before the user would like, but in accordance with corporate and legal guidelines.

You should also ensure that there's an understanding that while you may attempt to keep data for a long period of time, its recoverability gradually decays as it ages. Unless the organization invests in specific technology or services to maintain certain types of data, assuming such solutions exist, your ability to recover specific information is likely to decline the older it gets.

Clarify the implications of backup windows.

It's important that users understand they contribute to a successful backup by logging out of the system if possible, or at least by keeping as few documents open as they can, especially at the start of the backup so snapshots can be made cleanly.

You should also explain what performance impact the backup is going to have on applications that the user is using during the backup window, most notably email, but other enterprise applications as well. If the degraded performance is an issue, requests should be made for technology to minimize this impact.

Explain that backups don't always work.

As you know, backups are not always successful. As a result, data is either exposed for an extra usage window until the next backup can be performed, or backup jobs need to be re-run during the usage window to make sure that data is captured. The SLA shouldn't guarantee the latest copy of data. If a guarantee of the latest copy of data is required, steps beyond backup to tape should be explored. Capabilities such as backup to disk, continuous data protection (CDP) and snapshots all provide significantly better reliability than straight backup to tape.

A backup SLA is a critical document in helping maintain the peace between IT and the users of that information. It also can help management understand the ramifications of not investing in certain types of data protection technology and may be a useful tool in communicating that gap.

A successful SLA is one that's realistic in what can be accomplished by the current data protection plan. It's neither overly conservative nor too optimistic. It must be agreed on by management and then well-communicated to the information users of the organization.

About the author: George Crump, founder of Storage Switzerland, is an independent storage analyst with over 25 years of experience in the storage industry.
