MTTR (mean time to repair)
MTTR (mean time to repair) is the average time required to fix a failed component or device and return it to production status.
Mean time to repair includes the time it takes to find out about the failure, diagnose the problem and repair it. MTTR is a basic measure of how maintainable an organization's equipment is and, ultimately, is a reflection of how efficiently an organization can fix a problem.
Data storage professionals can use MTTR as a performance metric to evaluate how efficiently they are using their data storage resources. Once the mean time to repair is known, it can be used to modify and improve an organization's processes to reduce that figure and decrease the amount of lost productivity.
If the mean time to repair for a device is already low, the device can be repaired quickly and efficiently.
If the MTTR is high, that information can prompt improvements within the organization. For example, administrators could build in resiliency to prevent future failures and add quicker feedback mechanisms that notify people about issues.
The metric is also an important consideration when negotiating service-level agreements, and it can help engineers decide when to schedule system maintenance.
How MTTR, MTBF and MTTF compare
MTTR can be calculated by dividing the total time required for maintenance -- downtime -- by the total number of repairs within a specific time frame.
For example, if four breakdowns caused a total of 120 minutes of downtime, the MTTR is 120 / 4 = 30 minutes, so the organization can expect a breakdown to take approximately 30 minutes to repair.
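The calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `mean_time_to_repair` and its parameter names are hypothetical, and the figures are the 120 minutes and four breakdowns from the example.

```python
def mean_time_to_repair(total_downtime_minutes, repair_count):
    """Return MTTR: total downtime divided by the number of repairs."""
    if repair_count <= 0:
        # MTTR is undefined when no repairs happened in the time frame.
        raise ValueError("repair_count must be greater than zero")
    return total_downtime_minutes / repair_count


# Figures from the example: 120 minutes of downtime across four breakdowns.
print(mean_time_to_repair(120, 4))  # prints 30.0
```

Both inputs must cover the same time frame; mixing downtime from one month with a repair count from another would give a meaningless figure.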
While MTTR calculates the time it takes to make a repair following a failure, MTBF -- mean time between failures -- refers to the average time between one failure and the next. MTBF can be calculated by dividing the total uptime by the total number of breakdowns.
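As a minimal sketch in Python, the MTBF calculation might look like the following. The function name `mean_time_between_failures` and the sample figures (1,000 hours of uptime and four breakdowns) are hypothetical and chosen only to illustrate the division.

```python
def mean_time_between_failures(total_uptime_hours, breakdown_count):
    """Return MTBF: total uptime divided by the number of breakdowns."""
    if breakdown_count <= 0:
        # With no breakdowns in the time frame, MTBF cannot be computed.
        raise ValueError("breakdown_count must be greater than zero")
    return total_uptime_hours / breakdown_count


# Hypothetical figures: 1,000 hours of uptime interrupted by four breakdowns.
print(mean_time_between_failures(1000, 4))  # prints 250.0
```

Note that the numerator is uptime only; time spent on repairs belongs to the MTTR calculation, not this one.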
The goal of any organization should be to decrease the MTTR and increase the MTBF of a system. Generally, mean time to repair indicates how efficient an organization's corrective processes are, and mean time between failures indicates the reliability of a system.
MTTR and MTBF apply to products that can be repaired; they do not apply when something needs to be replaced entirely. MTTF -- mean time to failure -- is the average time a product that cannot be repaired is expected to operate before it fails. It's important for organizations to be aware of the difference between these three concepts, so they don't waste time focusing on how long it takes to repair a system when the best option could be to replace it with a new one.
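One common way to estimate MTTF is to average the observed operating lifetimes of units that ran until they failed. The Python sketch below assumes that approach; the function name `mean_time_to_failure` and the four sample lifetimes are hypothetical, not measurements from a real product.

```python
def mean_time_to_failure(lifetimes_hours):
    """Return MTTF as the average operating lifetime of failed units."""
    if not lifetimes_hours:
        raise ValueError("at least one observed lifetime is required")
    return sum(lifetimes_hours) / len(lifetimes_hours)


# Hypothetical lifetimes, in hours, of four units that were replaced on failure.
print(mean_time_to_failure([4200, 5100, 4800, 5500]))  # prints 4900.0
```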
How to improve MTTR
Modern monitoring technologies are an organization's best chance to reduce its mean time to repair. Monitoring, done on site or remotely through tablets or smartphones, can provide 24/7 insight into system performance. Using this up-to-date information, an organization can establish its MTTR and MTBF, allowing engineers to run preventative maintenance and to plan for repairs before a failure occurs.
MTTR, MTBF and MTTF can also be tracked by software that issues reports detailing failure rates and repair cycles for individual products.
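A rough idea of what such tracking involves can be sketched in Python. The example assumes each incident is recorded as a (failure time, repair completed time) pair for one device over a fixed observation window; the function name `summarize_incidents`, the record layout and the sample dates are all hypothetical and do not reflect any particular product's report format.

```python
from datetime import datetime, timedelta


def summarize_incidents(incidents, window_start, window_end):
    """Return (MTTR, MTBF) as timedeltas for one device over a window.

    incidents: list of (failed_at, repaired_at) datetime pairs (assumed layout).
    """
    if not incidents:
        raise ValueError("at least one incident is required")
    # Total downtime is the sum of every failure-to-repair interval.
    downtime = sum((repaired - failed for failed, repaired in incidents),
                   timedelta())
    # Uptime is whatever remains of the observation window.
    uptime = (window_end - window_start) - downtime
    count = len(incidents)
    return downtime / count, uptime / count


# Hypothetical incident log for a 30-day window.
incidents = [
    (datetime(2024, 1, 3, 9, 0), datetime(2024, 1, 3, 9, 20)),
    (datetime(2024, 1, 10, 14, 0), datetime(2024, 1, 10, 14, 40)),
    (datetime(2024, 1, 20, 2, 0), datetime(2024, 1, 20, 2, 45)),
    (datetime(2024, 1, 27, 18, 0), datetime(2024, 1, 27, 18, 15)),
]
mttr, mtbf = summarize_incidents(
    incidents, datetime(2024, 1, 1), datetime(2024, 1, 31)
)
print(mttr)  # prints 0:30:00
print(mtbf)  # prints 7 days, 11:30:00
```

The sample log totals 120 minutes of downtime across four incidents, so the MTTR matches the 30-minute figure used earlier; the MTBF follows from dividing the remaining uptime by the same incident count.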