A cloud SLA (cloud service-level agreement) is an agreement between a cloud service provider and a customer that ensures a minimum level of service is maintained. It guarantees levels of reliability, availability and responsiveness to systems and applications; specifies who governs when there is a service interruption; and describes penalties if service levels are not met.
A cloud infrastructure can span geographies, networks and systems that are both physical and virtual. While the exact metrics of a cloud SLA can vary by service provider, the areas covered are uniform:
- volume and quality of work (including precision and accuracy); and
- responsiveness.
The SLA document aims to establish a mutual understanding of the services, prioritized areas, responsibilities, guarantees and warranties provided by the service provider. It clearly outlines metrics and responsibilities among the parties involved in cloud configurations, such as the specific amount of response time to report or address system failures.
The importance of a cloud SLA
Service-level agreements are fundamental as more organizations rely on external providers for their critical systems, applications and data. A cloud SLA ensures cloud providers meet certain enterprise-level requirements and provide customers with a clearly defined set of deliverables. It also describes financial penalties, such as credits for service time, if the provider fails to live up to the guaranteed terms.
A cloud SLA's role is essentially the same as any contract -- it is a blueprint that governs the relationship between a customer and provider. These agreed-upon rules create a trusted foundation upon which a customer commits to use a cloud provider's services. They also reflect the provider's commitments to its quality of service (QoS) and underlying infrastructure.
What to look for in a cloud SLA
The cloud SLA should outline the responsibilities of each party, the acceptable performance parameters, a description of the applications and services covered under the agreement, procedures for monitoring service levels, and a schedule for the remediation of outages. SLAs commonly use technical definitions to quantify the level of service, such as mean time between failures (MTBF) or mean time to repair (MTTR), each of which is given a target or minimum value for service-level performance.
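As an illustration of how MTBF and MTTR relate to an availability figure, the sketch below uses the common steady-state approximation, availability = MTBF / (MTBF + MTTR). The function name and the sample figures are hypothetical, and a given provider's SLA may define availability differently.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return the fraction of time a service is up.

    Uses the steady-state approximation MTBF / (MTBF + MTTR).
    Both inputs must be expressed in the same unit (hours here).
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures: one failure every 999 hours, one hour to repair.
availability = steady_state_availability(999.0, 1.0)
print(f"{availability:.3%}")
```

With these sample inputs the result corresponds to "three nines" of availability. Halving the repair time raises availability without any change to how often failures occur.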
The defined levels of service should be specific and measurable, so that they can be benchmarked and, if stipulated by the agreement, trigger rewards or penalties accordingly.
A typical compute and cloud SLA articulates precise levels of service, as well as the recourse or compensation the user is entitled to should the provider fail to deliver the service as described. Another key area is service availability, which specifies the maximum amount of time a read request can take, how many retries are allowed and other factors.
The cloud SLA should also define compensation for users if the specifications aren't met. A cloud service provider usually offers a tiered service credit plan that gives users credits based on the discrepancy between SLA specifications and the actual service levels delivered.
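A tiered service credit plan can be thought of as a lookup from measured uptime to a credit percentage. The sketch below is illustrative only: the tier thresholds and credit amounts are invented for the example and do not reflect any particular provider's terms.

```python
# Hypothetical tiers, highest threshold first:
# (minimum measured uptime in percent, credit as percent of the monthly bill).
CREDIT_TIERS = [
    (99.9, 0),   # SLA met, no credit
    (99.0, 10),
    (95.0, 25),
]
FLOOR_CREDIT = 100  # assumed credit when uptime falls below every tier


def service_credit_percent(measured_uptime_percent: float) -> int:
    """Return the credit owed for a billing period under the tiers above."""
    for min_uptime, credit in CREDIT_TIERS:
        if measured_uptime_percent >= min_uptime:
            return credit
    return FLOOR_CREDIT


for uptime in (99.95, 99.5, 97.0, 90.0):
    print(uptime, service_credit_percent(uptime))
```

Real agreements typically also state how a credit must be claimed and whether it is capped, details this sketch leaves out.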
Selecting and monitoring cloud SLA metrics
Most cloud providers publicly provide details of the service levels that users can expect, and these will likely be the same for all users. However, an enterprise selecting a cloud service may be able to negotiate a more customized deal. For example, the cloud SLA for a cloud storage service might include unique specifications for retention policies, the number of copies to retain and storage locations.
Cloud service-level agreements may be more detailed to cover governance, security specifications, compliance, and performance and uptime statistics. They should address security and encryption practices for data protection and data privacy, disaster recovery expectations, data location, as well as data access and portability.
Verifying cloud service levels
Customers can monitor service metrics such as uptime, performance, security, etc., through a cloud provider's native tooling or a portal. Another option is to use a third-party tool to track the performance baselines of cloud services, including how resources are allocated (e.g., memory in a virtual machine, or VM) and security.
It is important that the cloud SLA uses clear language to define terms. Such language governs, for example, inaccessibility of a service and who is responsible -- slow or intermittent loading may be attributed to latency in the public internet, which is outside the cloud provider's control. Providers also typically specify and exempt any downtime due to scheduled maintenance periods, which are usually, but not always, regularly scheduled and recurring.
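To see how a maintenance exemption changes the measured figure, the following sketch subtracts any outage time that overlaps a scheduled maintenance window before computing uptime. The function names and sample intervals are hypothetical. It assumes times are given in minutes from the start of the billing period and that windows of the same kind do not overlap each other. Some SLAs also remove maintenance time from the measurement period itself, which this sketch does not do.

```python
def overlap_minutes(a_start, a_end, b_start, b_end):
    """Length of the overlap between two intervals, or 0 if they are disjoint."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def countable_downtime(outages, maintenance_windows):
    """Sum outage minutes, excluding any portion inside a maintenance window."""
    total = 0
    for o_start, o_end in outages:
        exempt = sum(
            overlap_minutes(o_start, o_end, m_start, m_end)
            for m_start, m_end in maintenance_windows
        )
        total += (o_end - o_start) - exempt
    return total


def sla_uptime_percent(period_minutes, outages, maintenance_windows):
    """Uptime percentage after applying the maintenance exemption."""
    downtime = countable_downtime(outages, maintenance_windows)
    return 100 * (period_minutes - downtime) / period_minutes


# Hypothetical 30-day period: two outages, one maintenance window.
period = 30 * 24 * 60
outages = [(100, 160), (5000, 5030)]
maintenance = [(120, 180)]
print(countable_downtime(outages, maintenance))
print(round(sla_uptime_percent(period, outages, maintenance), 3))
```

In this example, 40 of the first outage's 60 minutes fall inside the maintenance window, so only 20 of those minutes count against the provider.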
Negotiating a cloud SLA
Most general cloud services are straightforward and universal with little variance, such as infrastructure as a service (IaaS) options. There may be more room to negotiate terms in specific custom areas such as data retention criteria, or in pricing and compensation/penalty. Negotiating power typically scales with the size of the customer, but even smaller customers may have room to secure more favorable terms. Be prepared to negotiate for any customized services or applications delivered through the cloud.
When entering any cloud SLA negotiation, it's important to protect the business by clarifying uptimes. A good SLA protects both the customer and supplier from missed expectations. For example, 99.9% uptime ("three nines") is a common stipulation that translates to roughly nine hours of outage per year; 99.999% ("five nines") means roughly five minutes of annual downtime. Some mission-critical data may require higher levels of availability, such as fractions of a second of annual downtime. Consider multiple regions or zones to help minimize the impact of a major outage.
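The arithmetic behind these "nines" is simple to reproduce. The sketch below converts an uptime percentage into the downtime it permits over a period. The function name is illustrative, and the calculation assumes a 365-day year measured continuously, whereas many SLAs measure per month.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years are ignored


def allowed_downtime_minutes(uptime_percent: float,
                             period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime, in minutes, that an uptime target permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * 60 * (1 - uptime_percent / 100)


for target in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% uptime allows about {minutes:.1f} minutes per year")
```

Three nines works out to about 525.6 minutes (8.76 hours) per year, and five nines to a little over five minutes. Passing `period_hours=30 * 24` gives the equivalent allowance for a 30-day month.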
Be aware that some areas of cloud SLA negotiations amount to unnecessary insurance. Few use cases require the highest uptime guarantees, which require extra engineering work and costs, and may be better served with private on-premises infrastructure.
Pay attention to where data resides with a given cloud provider. Many compliance regulations, such as HIPAA (Health Insurance Portability and Accountability Act), require data to be kept in specific regions with certain privacy guidelines. The cloud customer owns and is responsible for this data, so be sure these requirements are built into the SLA and are validated by auditing and reporting.
Finally, the cloud SLA should include an exit strategy that outlines the expectations of the provider to ensure a smooth transition.
Scaling a cloud SLA
Most SLAs are negotiated to meet the customer's current needs, but many businesses change dramatically in size over time. A solid cloud service-level agreement outlines intervals where the contract is reviewed and potentially adjusted to meet an organization's changing needs.
Some vendors build in notification workflows that trigger when a cloud service-level agreement is close to being breached, so new negotiations can be initiated based on the changes in scale. This can cover uptime availability levels or usage that exceeds criteria and might warrant an upgrade to a new service tier.
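A notification workflow of this kind can be approximated by tracking how much of the period's downtime allowance has been consumed. The sketch below is a simplified illustration, not any vendor's implementation: the function name, the status labels and the 80% warning threshold are all assumptions.

```python
def breach_status(target_uptime_percent: float,
                  period_minutes: float,
                  downtime_so_far_minutes: float,
                  warn_fraction: float = 0.8) -> str:
    """Classify the current period as 'ok', 'warning' or 'breached'.

    The downtime budget is the outage time the uptime target permits
    over the period. warn_fraction is an assumed alert threshold.
    """
    budget = period_minutes * (1 - target_uptime_percent / 100)
    if downtime_so_far_minutes > budget:
        return "breached"
    if budget > 0 and downtime_so_far_minutes >= warn_fraction * budget:
        return "warning"
    return "ok"


# Hypothetical 30-day period with a 99.9% target (about 43.2 minutes allowed).
period = 30 * 24 * 60
for downtime in (10, 35, 50):
    print(downtime, breach_status(99.9, period, downtime))
```

A "warning" result is the point at which a review or renegotiation might be triggered, before the agreement is actually breached.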
Cloud SLA examples
Below are links to cloud SLAs from the major public cloud platforms. Many individual cloud services require separate SLAs -- each of these vendors lists dozens of such SLAs.