When you contract for technology disaster recovery services, a service-level agreement is one of the most important items on your checklist.
A service-level agreement (SLA) is essentially a contract between your organization and the proposed service provider. It specifies the products and services to be provided, the performance levels the vendor commits to and the performance the customer expects. It might also specify penalties or remedies for failure to achieve the agreed-upon SLA metrics. SLAs are essential tools to ensure that the products and services you obtain are acceptable.
A companion to this article is a free disaster recovery service-level agreement template you can use for DR products and services. There are many different formats and styles for a disaster recovery SLA, from a very simple document to tables with detailed performance expectations for a broad range of activities.
The service-level agreement template included with this article is an example of an SLA with a cloud DR service provider. In your SLA, be sure to specify financial penalties and remedies if performance or response time is unacceptable. If a vendor doesn't accept SLAs, seek another vendor.
Types of service-level agreements
You can tailor the service-level agreement template included with this article to support a wide variety of IT situations, although the template is centered on disaster recovery.
IT product and service providers, cloud computing providers and network service providers execute SLAs with customers to establish expectations and penalties for nonperformance.
They might also use SLAs to demonstrate their capabilities and commitment to service versus competing vendors.
The three main types of service-level agreements are as follows:
- Service-based SLA. This is for a service -- often, a managed service -- and it establishes performance parameters for all customers using that service.
- Customer-based SLA. This is based solely on an agreement between the vendor and the individual customer and covers all services being provided to that customer.
- Multi-level SLA. This SLA focuses on corporate activities and covers all users in the customer organization. It's used to avoid duplicate or conflicting agreements across the organization.
SLA goals and objectives
When procuring a managed service to support DR requirements such as data backup and recovery, customers want assurances that the service will be available and functional when needed. The main SLA objectives include performance metrics, downtime metrics, and recovery and repair metrics. These metrics establish what constitutes acceptable performance, the mean time between failures and the mean time to repair. Other relevant metrics agreed to by the vendor and customer can also be included.
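As a rough illustration of how the downtime and repair metrics relate, the sketch below averages a set of outage records into a time between failures, a time to repair and an availability figure (the averages are commonly abbreviated MTBF and MTTR). The outage durations, the 30-day window and the function name are assumptions made for this example; they do not come from the template.

```python
def summarize_outages(outage_hours, window_hours):
    """Return (mtbf, mttr, availability) for one measurement window.

    outage_hours: duration of each outage in hours (hypothetical data).
    window_hours: total length of the measurement window in hours.
    """
    if not outage_hours:
        # No failures: the averages are undefined and availability is 100%.
        return None, None, 1.0
    downtime = sum(outage_hours)
    uptime = window_hours - downtime
    failures = len(outage_hours)
    mtbf = uptime / failures        # average operating time between failures
    mttr = downtime / failures      # average time to restore service
    availability = uptime / window_hours
    return mtbf, mttr, availability


# Assumed example: three outages in a 30-day (720-hour) month.
mtbf, mttr, availability = summarize_outages([1.5, 0.5, 4.0], 720)
print(f"MTBF {mtbf:.1f} h, MTTR {mttr:.1f} h, availability {availability:.3%}")
```

Whatever formulas the parties choose, the SLA should state the measurement window and what counts as an outage, because both change the resulting numbers.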
An additional -- and often just as important -- objective is to establish penalties and remedies for unacceptable or marginal performance against the agreed-upon metrics.
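One common way to express such remedies is a tiered schedule of service credits. The sketch below shows the idea; the tier thresholds, credit percentages and function name are all hypothetical, and real values would come from the negotiated agreement.

```python
# Hypothetical credit schedule: (minimum availability, credit as a share of the fee).
# Ordered from best to worst performance; real tiers come from the negotiated SLA.
CREDIT_TIERS = [
    (0.999, 0.00),   # target met: no credit owed
    (0.995, 0.10),   # marginal performance
    (0.990, 0.25),
]
WORST_CASE_CREDIT = 0.50  # applies when availability falls below every tier


def service_credit(availability, monthly_fee):
    """Return the credit owed to the customer for one billing period."""
    for floor, share in CREDIT_TIERS:
        if availability >= floor:
            return monthly_fee * share
    return monthly_fee * WORST_CASE_CREDIT


print(service_credit(0.9992, 10_000))  # target met
print(service_credit(0.9930, 10_000))  # marginal performance
print(service_credit(0.9500, 10_000))  # unacceptable performance
```

Keeping the schedule in one table makes it easy for both parties to see which measured result triggers which remedy.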
Service-level agreement components
So far, we have outlined the fundamental reasons for executing an SLA and the goals to be achieved by the SLA. The following table provides a checklist of relevant SLA components for a DR application.
For a disaster recovery SLA to be successful, the parties must agree on what is provided, the metrics to be satisfied, the method of monitoring and reporting service delivery, and remedies for failure to satisfy SLA requirements.
Services applicable to service-level agreements
Here are some examples of services that are candidates for internal metrics included in a disaster recovery SLA:
- Fulfillment of contracted recovery time objectives following a disruption;
- Fulfillment of contracted recovery point objectives following a disruption;
- Completion of one risk assessment for each business unit per year;
- Completion of one tabletop exercise for the main DR plan;
- Completion of failover/failback tests on mission-critical applications as identified in the business impact analysis; and
- Review and update of business impact analysis data annually.
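The first two items in the list, fulfillment of recovery time and recovery point objectives, can be checked from three timestamps per event. The following is a minimal sketch under the usual definitions (RTO runs from the disruption to restored service, RPO from the last usable backup to the disruption). The function name, timestamps and objective values are assumptions for illustration.

```python
from datetime import datetime, timedelta


def check_recovery_objectives(disruption, last_good_backup, service_restored, rto, rpo):
    """Compare one recovery event against contracted RTO and RPO."""
    recovery_time = service_restored - disruption      # how long service was down
    data_loss_window = disruption - last_good_backup   # how much data could be lost
    return {
        "recovery_time": recovery_time,
        "rto_met": recovery_time <= rto,
        "data_loss_window": data_loss_window,
        "rpo_met": data_loss_window <= rpo,
    }


# Assumed event with a 4-hour RTO and a 1-hour RPO.
result = check_recovery_objectives(
    disruption=datetime(2024, 3, 5, 9, 0),
    last_good_backup=datetime(2024, 3, 5, 8, 15),
    service_restored=datetime(2024, 3, 5, 14, 30),
    rto=timedelta(hours=4),
    rpo=timedelta(hours=1),
)
print(result["rto_met"], result["rpo_met"])
```

In this assumed event, recovery took five and a half hours against a four-hour objective, so the RTO is missed, while the 45-minute gap since the last backup stays within the RPO.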
The following are examples of service-level agreements for externally provided services:
- Speed of backup of mission-critical data files by a cloud backup service provider;
- Work area recovery centers, specifically how quickly the customer can access its agreed-upon workspace upon a disaster declaration;
- Recovery of internet connectivity following disruption of local access facilities;
- Time required to fail over mission-critical applications from primary to backup servers; and
- Time required to fail back recovered systems via a cloud-based service.
Disaster recovery metrics and SLAs
To evaluate performance for disaster recovery service-level agreements, benchmarks such as tier 1 and tier 2 metrics must exist. High-level DR metrics are considered tier 1.
Tier 2 metrics can be more detailed than tier 1 and can be found in technology DR plans. They are often based on DR professional standards such as the National Institute of Standards and Technology SP 800-34, Contingency Planning Guide for Information Technology (IT) Systems.
As you'll see in our disaster recovery service-level agreement template, key components of SLA development include the identification of performance metrics, agreement to them by all parties, a process for monitoring service delivery against the metrics, plus a process for evaluating performance and resolving SLA violations.
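The monitoring step can be pictured as comparing each reported measurement with its agreed target. The sketch below is one possible shape for that comparison, not the template's method; the metric names, targets and units are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    target: float
    unit: str
    higher_is_better: bool

    def is_met(self, measured):
        """Return True when the measured value satisfies the target."""
        if self.higher_is_better:
            return measured >= self.target
        return measured <= self.target


def find_violations(metrics, measurements):
    """Return names of metrics that missed their target or were not reported."""
    violations = []
    for metric in metrics:
        measured = measurements.get(metric.name)
        if measured is None or not metric.is_met(measured):
            violations.append(metric.name)
    return violations


# Hypothetical agreed metrics and one reporting period's measurements.
metrics = [
    Metric("availability", 99.9, "%", higher_is_better=True),
    Metric("failover_time", 30, "minutes", higher_is_better=False),
    Metric("backup_throughput", 200, "MB/s", higher_is_better=True),
]
measurements = {"availability": 99.95, "failover_time": 42}
print(find_violations(metrics, measurements))
```

Treating a missing measurement as a violation is a design choice made here so that gaps in reporting surface during review; the parties could equally agree to handle them separately.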
Reviewing the service-level agreement
As with any kind of legal document, your organization's legal department should review and approve the service-level agreement before it's signed. Depending on how the SLA is structured, it can protect your organization, the service provider or both.
When reviewing a disaster recovery SLA, make sure customer requirements and service provider requirements are covered. As a customer, you'll likely want an SLA to ensure that your service provider delivers products and services according to a set of agreed-upon expectations. Downtime and uptime requirements are common concerns for customers, so they should be included in the agreement.
The service provider might require a customer to take steps to protect any intellectual property made available to them. Service providers might also designate circumstances in which they aren't liable for meeting performance requirements, such as fires or natural disasters that damage the provider's equipment or cause a disruption.
Don't be surprised if most of your vendors have their own service-level agreement. If a vendor seems reluctant to accept your request for an SLA, it's a strong clue that their performance might not fulfill your expectations. The best strategy is to have your own SLAs in place, review the vendor's disaster recovery SLA, decide how to proceed and have your legal staff review everything before signing.