Enterprise reliance on connectivity to keep operations running requires consistent and precise network performance monitoring. IT organizations need service quality guarantees from their providers that they are meeting or exceeding performance standards. Subpar service quality can result in diminished productivity, insufficient customer support and slower times to market.
Service providers compensate enterprise clients with service credits when network performance falls below standards set in their service-level agreements (SLAs). But this is a tradeoff most organizations would prefer not to make.
Network availability is arguably the most cited metric of an SLA, but network reliability is equally critical to assess performance. While the two measures are often used interchangeably, they are distinctly different. Both are essential to accurately assess network service quality.
What is network reliability?
Network reliability is the measure of the length of time infrastructure operates without disruption. Reliability is assessed using a couple of different equations. The first is mean time between failures (MTBF), which is the average network operating time between outages.
To arrive at that figure, network administrators divide the total operating time by the number of network failures. So, if over the course of 100 hours there were three network outages adding up to 4 hours of downtime, that would equate to 96 hours of operating time, and MTBF would be 96 divided by 3, or 32 hours. The MTBF calculation can be seen here:
MTBF = Total operating time ÷ Number of network failures
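As a minimal sketch, the MTBF formula can be written as a small Python function. The function and variable names are illustrative assumptions; the figures are the ones from the example above.

```python
def mtbf(operating_hours: float, failures: int) -> float:
    """Mean time between failures: operating time divided by failure count."""
    if failures <= 0:
        raise ValueError("MTBF is undefined without at least one failure")
    return operating_hours / failures


# Figures from the example: 100 hours observed, 4 hours of downtime, 3 outages.
observed_hours = 100
downtime_hours = 4
operating_hours = observed_hours - downtime_hours  # 96 hours

print(mtbf(operating_hours, 3))  # 32.0
```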
The second way to calculate network reliability is to look at the failure rate, which gives network administrators the average number of failures per hour of operating time; it is the inverse of MTBF. To arrive at that figure, IT staff divide the total number of failures by the operating time. In this case, that would be 3 divided by the operating time of 96 hours, resulting in a failure rate of 0.03125 per hour, or a little more than 3%. Administrators then deduct that failure rate from 100% to measure network reliability, which, in this case, is 96.875%. The failure rate and network reliability calculation can be seen here:
Failure rate = Total number of failures ÷ Total operating time
Network reliability = 100% - Failure rate
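The two formulas above can be sketched in Python as follows. The function names are illustrative assumptions, and the sketch follows the article's convention of treating the hourly failure rate as a percentage; the result would differ if operating time were measured in another unit.

```python
def failure_rate(failures: int, operating_hours: float) -> float:
    """Failures per hour of operating time (the inverse of MTBF)."""
    if operating_hours <= 0:
        raise ValueError("operating_hours must be positive")
    return failures / operating_hours


def reliability_percent(failures: int, operating_hours: float) -> float:
    """Reliability as defined above: 100% minus the failure rate."""
    return 100.0 * (1.0 - failure_rate(failures, operating_hours))


# Three outages over 96 hours of operating time.
print(failure_rate(3, 96))         # 0.03125
print(reliability_percent(3, 96))  # 96.875
```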
Organizations should also look at how efficiently and responsively their IT teams handle an outage by evaluating mean time to repair (MTTR). IT teams can calculate MTTR by adding up the total time spent on repairs over a specific time range and then dividing that total by the number of repairs.
MTTR = Total repair time ÷ Total number of repairs
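A short Python sketch of the MTTR calculation is shown here. The repair durations are hypothetical values chosen for illustration; they are not taken from a real incident log.

```python
def mttr(repair_hours: list[float]) -> float:
    """Mean time to repair: total repair time divided by number of repairs."""
    if not repair_hours:
        raise ValueError("MTTR is undefined without at least one repair")
    return sum(repair_hours) / len(repair_hours)


# Hypothetical repair durations, in hours, for three outages.
repairs = [1.5, 2.0, 0.5]
print(round(mttr(repairs), 2))  # 1.33
```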
What is network availability?
Network availability is the percentage of time the infrastructure is operational during a given time period. In other words, it is uptime divided by total service time, where total service time is uptime plus downtime. The network availability calculation can be seen here:
Network availability = Network uptime ÷ (Uptime + Downtime)
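The availability formula translates directly into code. This is a minimal sketch with assumed function and parameter names; the sample figures (96 hours up, 4 hours down) are reused from the earlier reliability example.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of total service time during which the network was up."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total service time must be positive")
    return uptime_hours / total


# 96 hours of uptime and 4 hours of downtime in a 100-hour window.
print(f"{availability(96, 4):.2%}")  # 96.00%
```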
Network availability provides a good snapshot of infrastructure accessibility by quantifying the percentage of time the network is operational. However, in most cases, network availability offers only a limited perspective into actual operational performance.
A network can be highly available but not particularly reliable. If, for example, a network experiences one hour of downtime for every 100 hours of service time, that's a 99% availability rate. That may look good on paper, but over the course of a year, that would mean the network was out of service for more than three days. A network that achieves 99.9% availability is down for nearly nine hours annually.
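To check the arithmetic behind these annual figures, the sketch below converts an availability percentage into hours of downtime per year. It assumes a 365-day year (8,760 hours); the function name is an illustrative choice.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, assuming a non-leap year


def annual_downtime_hours(availability_percent: float) -> float:
    """Hours of downtime per year implied by an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (100.0 - availability_percent) / 100.0


for target in (99.0, 99.9):
    hours = annual_downtime_hours(target)
    print(f"{target}% available: {hours:.2f} hours down ({hours / 24:.2f} days)")
```

At 99% the result is 87.6 hours, or about 3.65 days; at 99.9% it is 8.76 hours.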
Network reliability, on the other hand, spotlights how well the infrastructure runs to support functional processes. A network with a lengthy MTBF or a low failure rate is likely to complete transactions and processes on a consistent basis.
Measuring network availability is only one part of the performance equation. IT also needs to track dependability to confirm the infrastructure is providing optimal service levels to support business processes.
Network reliability + availability = service quality
To accurately assess infrastructure performance, network administrators need to look at network reliability and availability together. IT managers can track reliability and availability of individual equipment, such as routers and servers. But a better measure of real operational performance is connection uptime, calculated as total connection uptime divided by total time in service.
Network managers can drill down and isolate availability and reliability metrics for different segments and paths on the network to uncover configuration inefficiencies and better plan for redundancies between data centers or other enterprise resources. They can also use this information to identify resources that need upgrades.
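One way to drill down by segment is to group outage records by path and compute availability and MTBF for each. The sketch below assumes a simple list of (segment, downtime hours) records and a shared observation window; the segment names and figures are hypothetical.

```python
from collections import defaultdict


def segment_metrics(outages, window_hours):
    """Per-segment availability and MTBF from (segment, downtime_hours) records.

    Segments with no recorded outage do not appear in the result.
    """
    downtime = defaultdict(float)
    failures = defaultdict(int)
    for segment, hours_down in outages:
        downtime[segment] += hours_down
        failures[segment] += 1

    metrics = {}
    for segment, hours_down in downtime.items():
        uptime = window_hours - hours_down
        metrics[segment] = {
            "availability": uptime / window_hours,
            "mtbf_hours": uptime / failures[segment],
        }
    return metrics


# Hypothetical outage log for a 720-hour (30-day) window.
outage_log = [("dc1-dc2", 0.5), ("dc1-dc2", 1.5), ("branch-wan", 6.0)]
for name, m in segment_metrics(outage_log, 720).items():
    print(f"{name}: {m['availability']:.3%} available, MTBF {m['mtbf_hours']:.0f} h")
```

In this made-up data, the branch-wan path fails less often (higher MTBF) but is less available than dc1-dc2, which shows why the two metrics are worth reading side by side.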
Two other techniques can also be used to help managers understand real-world operational conditions. The first, reactive monitoring, measures availability and reliability of a production network on an ongoing basis.
The second, proactive monitoring, employs synthetic traffic that is sent across the network. Its transmission is measured by performance tools that can be used for troubleshooting and determining optimal performance.
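As a rough illustration of synthetic traffic, the sketch below opens a series of TCP connections to a target and records how many succeed and how long they take. It is an assumption-laden stand-in for a real performance tool: the function names, the choice of a TCP connect as the probe, and the default timeout and interval are all illustrative.

```python
import socket
import time


def tcp_probe(host, port, timeout=2.0):
    """Return the TCP connect time in seconds, or None if the attempt fails."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:  # covers refused connections, timeouts and DNS errors
        return None


def run_probes(host, port, count=5, interval=1.0):
    """Send `count` probes and summarize success ratio and average latency."""
    if count < 1:
        raise ValueError("count must be at least 1")
    results = []
    for i in range(count):
        results.append(tcp_probe(host, port))
        if i < count - 1:
            time.sleep(interval)
    successes = [r for r in results if r is not None]
    return {
        "sent": count,
        "succeeded": len(successes),
        "success_ratio": len(successes) / count,
        "avg_latency_s": sum(successes) / len(successes) if successes else None,
    }
```

A call such as `run_probes("192.0.2.10", 443)` (a documentation-range address used here as a placeholder) returns a summary dictionary that could be logged at regular intervals and compared with measurements from production traffic.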
Test traffic is also generated to diagnose configuration errors and equipment issues. The data derived from proactive monitoring can be used in other areas as well. For example, prior to deploying a new application, IT can test it on the network to identify any potential issues so code changes or other adjustments can be made in advance of the rollout.
Finally, proactive monitoring can be used to validate reactive data. This information can be helpful to support SLA metrics and identify where changes should be made to better meet operational goals. IT can also use this data to plan failover measures.