What you need to know about SharePoint SLAs
Define the terms in your governance document and commit only to services you can deliver
If your organization has drafted a SharePoint governance document, then it’s only a matter of time before the subject of service-level agreements, known as SLAs, comes up. Simply put, an SLA is a written agreement that specifies the requirements for server or application uptime and the penalties for not meeting those requirements. Your governance document outlines all of the rules and guidelines about how SharePoint should be used in your organization, so it’s the perfect place to include an SLA.
By far the biggest mistake that administrators make in setting up SLAs is coming up with some arbitrary availability number. That’s just taking the easy way out.
For some reason, it seems to have become popular for organizations to claim that they can deliver five nines of availability—that’s availability 99.999% of the time. Although it’s really easy to jump on the bandwagon and include this number in your SLA, it’s unrealistic for most small and medium-sized organizations to be able to deliver this type of availability. If you do the math, five nines of availability translates into roughly five minutes of downtime a year.
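To make that math concrete, here is a minimal Python sketch that converts an availability percentage into a yearly downtime budget. The function name and the list of targets are illustrative only, and the calculation assumes a 365-day year with no allowance for scheduled maintenance.

```python
# Minutes in a non-leap year: 365 days * 24 hours * 60 minutes = 525,600.
MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


# Compare a few commonly quoted targets (illustrative values).
for target in (99.0, 99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

At 99.999%, the budget works out to a little over five minutes for the whole year, which is less time than a single server reboot often takes.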
Although it’s easy to give in to the pressure of committing to high availability, doing it isn’t always smart. Annual performance reviews for IT staff members often take into account whether a person has met his or her SLA obligations. In other words, if you aren’t able to meet the metrics that you have agreed to, then you may not get your next raise.
SLAs as a bargaining tool
Instead, use the SLA as a negotiating tool. If management wants your SharePoint deployment to achieve five nines of availability, then it must be prepared to give you the hardware, software and training that are required to make that happen. Occasionally you might even be able to get higher-ups to throw in incentives for meeting availability goals.
So let’s pretend that you get your senior managers to agree to give you everything you need to achieve their availability goals for your SharePoint servers. Even in that situation, it’s not a good idea to add just one sentence to the governance document stating your goals. You need to take some additional steps to protect yourself and the rest of the IT staff in case you fail to meet those goals.
For starters, if you are creating an SLA from scratch, the best idea is to ease into it. Think about it for a moment: if you end up getting a bunch of new servers and other hardware for the sole purpose of making SharePoint available at all times, then you’ve got a lot of configuration work ahead of you. And what do you think the odds are of getting everything to work perfectly right off the bat?
A more reasonable approach is to give yourself six months from the time of the initial deployment to when the SLA is actually enforced. That way, you have time to fine-tune your new servers.
It’s also a good idea to use this time to test various failover scenarios and to get the rest of the IT staff trained on the failover procedures. Also, take the time to evaluate the deployment’s availability on at least a monthly basis as you work through the fine-tuning process.
Measuring server availability
Another issue to address in the SLA section of your SharePoint governance document is specifically how the server’s availability will be measured. Many years ago I found myself in a situation in which one of the servers that I was in charge of went down unexpectedly. My boss was already in a bad mood that day, and although the outage didn’t last very long, he began ranting about the server going down so often.
In actuality, the server really didn’t fail very often. However, the burden of proof was on me. My only defense was the logs from the server.
To keep a situation like this from happening in your organization, it’s a good idea to document ahead of time what means will be used to track the server’s availability over time. While you’re at it, specifically define what counts as downtime. For instance, many organizations do not even allow servers to be taken down for scheduled maintenance. Others are a bit more lenient as long as notice is given ahead of time for scheduled maintenance.
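As a rough illustration of why the definition matters, the Python sketch below computes availability for a reporting period from a list of recorded outages, with a switch for whether announced maintenance counts as downtime. Everything here is hypothetical: the Outage record, the scheduled flag and the sample dates are invented for the example, and the code assumes outages do not overlap. In practice the data would come from whatever monitoring source your governance document names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    start: datetime
    end: datetime
    scheduled: bool  # hypothetical flag: True if notice was given ahead of time


def availability_percent(outages, period_start, period_end, count_scheduled=False):
    """Return the percentage of the period the server counted as 'up'.

    count_scheduled=True models a strict SLA in which even announced
    maintenance is downtime; False models the more lenient policy.
    Assumes the outages do not overlap one another.
    """
    period = period_end - period_start
    downtime = timedelta()
    for outage in outages:
        if outage.scheduled and not count_scheduled:
            continue  # announced maintenance is excused under the lenient policy
        # Clip each outage to the reporting period before adding it up.
        start = max(outage.start, period_start)
        end = min(outage.end, period_end)
        if end > start:
            downtime += end - start
    return 100 * (1 - downtime / period)


# Invented sample data: one 45-minute failure and one 2-hour maintenance window.
outages = [
    Outage(datetime(2024, 6, 3, 14, 0), datetime(2024, 6, 3, 14, 45), scheduled=False),
    Outage(datetime(2024, 6, 15, 2, 0), datetime(2024, 6, 15, 4, 0), scheduled=True),
]
month_start = datetime(2024, 6, 1)
month_end = datetime(2024, 7, 1)

lenient = availability_percent(outages, month_start, month_end)
strict = availability_percent(outages, month_start, month_end, count_scheduled=True)
print(f"Scheduled maintenance excused: {lenient:.3f}%")
print(f"Scheduled maintenance counted: {strict:.3f}%")
```

The same outage log produces two different availability figures depending on the policy, which is exactly the kind of ambiguity the governance document should settle before anyone starts arguing about the numbers.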
Clear definition of downtime
Regardless of how your organization feels about downtime, it is important to have a clear definition of what does and does not constitute a breach of the SLA written into your governance policy.
One more thing to include in your SLA is a policy that allows for exceptions under unusual circumstances. For example, suppose that Microsoft released the next version of SharePoint and, for whatever reason, the upgrade process required the existing SharePoint deployment to be taken off-line for about a day.
You don’t want to end up getting fired for violating the SLA just because you are performing an upgrade. That’s why you need to build some flexibility into your SLA that allows management to sign off on “approved downtime” under certain circumstances.
SLAs and governance policies go hand in hand. Although it would initially seem that creating an SLA would be one of the simplest parts of creating your governance document, you actually need to put some work into it for your own protection. If there is any chance at all that you could be penalized for failing to meet the SLA requirements, then your governance document needs to spell out those requirements, how they are measured and what exceptions apply.
About the Author
Brien M. Posey has received Microsoft’s Most Valuable Professional award six times for his work with Windows Server, IIS, file systems/storage and Exchange Server. He has served as CIO for a nationwide chain of hospitals and healthcare facilities and was once a network administrator for Fort Knox.