Definition

mean time to detect (MTTD)

Mean time to detect or discover (MTTD) is a measure of how long a problem exists in an IT deployment before the appropriate parties become aware of it. MTTD is a key performance indicator (KPI) for IT incident management. A shorter MTTD means that users suffer from IT disruptions for less time. MTTD may also be referred to as mean time to identify (MTTI).

Problem detection can come from people -- such as end users reporting a software outage -- or from systems monitoring and management tools. Generally, IT organizations should strive to detect an issue before an end user does, to minimize the disruption it causes, but this is not always possible. The onset of an issue should be recorded by affected IT equipment and the software programs that run on it. For example, a security intrusion could be tracked to a password entered on the breached system at a specific time. The MTTD KPI can indicate if IT monitoring technologies collect sufficient data and cover the probable sources of incidents.

How to calculate MTTD

The formula for MTTD is the sum of all incident detection times for a given technician, team or time period, divided by the number of incidents. This MTTD can then be compared with that of a previous time period, another incident response team and so on to gauge performance.
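The formula can be sketched in a few lines of Python. This is a minimal illustration, not a standard implementation: the function name `mean_time_to_detect` and the sample values in the usage line are assumptions made for the example.

```python
from typing import Sequence


def mean_time_to_detect(detection_minutes: Sequence[float]) -> float:
    """Return the mean of per-incident detection times, in minutes.

    Each value is the time between the start of an incident and the
    moment the responsible party became aware of it.
    """
    if not detection_minutes:
        raise ValueError("at least one incident is required")
    return sum(detection_minutes) / len(detection_minutes)


# Hypothetical detection times for three incidents, in minutes.
print(mean_time_to_detect([30, 60, 90]))  # 60.0
```

The same function can be applied to any grouping the organization cares about, such as one technician's incidents or one month's incidents.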

For example, the 24/7 IT operations support team for internal applications at a national bank tracks its MTTD monthly. In August, the team experienced eight incidents, and determined the start and discovery time of each from system logs, an intrusion detection system and help desk tickets filed by users (see table 1).

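| Start time | Detection time | Elapsed (min) |
|------------|----------------|---------------|
| 2:35 a.m.  | 3:42 a.m.      | 67            |
| 4:13 p.m.  | 8:30 p.m.      | 257           |
| 1:10 p.m.  | 1:55 p.m.      | 45            |
| 1:43 p.m.  | 2:25 p.m.      | 42            |
| 8:05 a.m.  | 11:16 a.m.     | 191           |
| 3:15 p.m.  | 3:30 p.m.      | 15            |
| 9:28 a.m.  | 4:14 p.m.      | 406           |
| 10:09 p.m. | 12:32 a.m.     | 143           |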

Table 1

The mean time to detect is calculated as:

(67 + 257 + 45 + 42 + 191 + 15 + 406 + 143)/8

MTTD = 145.75 minutes
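The calculation can be reproduced from the clock times with a short Python sketch. The helper names `to_minutes` and `elapsed_minutes` are illustrative, and the code assumes that every incident is detected within 24 hours of its start. The last detection time is entered as 12:32 a.m. on the following day, which is what the 143-minute elapsed value implies.

```python
def to_minutes(clock: str) -> int:
    """Convert a time such as '2:35 a.m.' to minutes after midnight."""
    time_part, meridiem = clock.split()
    hours, minutes = (int(part) for part in time_part.split(":"))
    hours %= 12  # 12 a.m. becomes 0; 12 p.m. becomes 0, then 12 after the shift below
    if meridiem.lower().startswith("p"):
        hours += 12
    return hours * 60 + minutes


def elapsed_minutes(start: str, detected: str) -> int:
    """Minutes from start to detection, wrapping past midnight if needed."""
    return (to_minutes(detected) - to_minutes(start)) % (24 * 60)


# (start time, detection time) for the eight August incidents.
incidents = [
    ("2:35 a.m.", "3:42 a.m."),
    ("4:13 p.m.", "8:30 p.m."),
    ("1:10 p.m.", "1:55 p.m."),
    ("1:43 p.m.", "2:25 p.m."),
    ("8:05 a.m.", "11:16 a.m."),
    ("3:15 p.m.", "3:30 p.m."),
    ("9:28 a.m.", "4:14 p.m."),
    ("10:09 p.m.", "12:32 a.m."),  # detected after midnight
]

elapsed = [elapsed_minutes(start, detected) for start, detected in incidents]
mttd = sum(elapsed) / len(elapsed)
print(elapsed)  # [67, 257, 45, 42, 191, 15, 406, 143]
print(mttd)     # 145.75
```

In practice the start and detection values would come from timestamps with dates, which removes the need for the midnight wraparound assumption.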

Some organizations might choose to remove outliers from the calculation (see table 2). In this case, 406 minutes is the highest time to detect (TTD), and 15 minutes is the lowest. Without these two outliers, the MTTD is 745/6, or about 124.17 minutes.

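| Start time | Detection time | Elapsed (min)           |
|------------|----------------|-------------------------|
| 2:35 a.m.  | 3:42 a.m.      | 67                      |
| 4:13 p.m.  | 8:30 p.m.      | 257                     |
| 1:10 p.m.  | 1:55 p.m.      | 45                      |
| 1:43 p.m.  | 2:25 p.m.      | 42                      |
| 8:05 a.m.  | 11:16 a.m.     | 191                     |
| 3:15 p.m.  | 3:30 p.m.      | 15 (lowest, excluded)   |
| 9:28 a.m.  | 4:14 p.m.      | 406 (highest, excluded) |
| 10:09 p.m. | 12:32 a.m.     | 143                     |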

Table 2
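Dropping the highest and lowest values before averaging can be sketched as follows. The function name `trimmed_mttd` is an assumption for this example, and removing exactly one value from each end is only one possible outlier policy; an organization might instead use a percentile cutoff or a median.

```python
def trimmed_mttd(detection_minutes):
    """Mean detection time after dropping one highest and one lowest value."""
    if len(detection_minutes) < 3:
        raise ValueError("need at least three incidents to trim both ends")
    kept = sorted(detection_minutes)[1:-1]  # discard the minimum and the maximum
    return sum(kept) / len(kept)


# Elapsed minutes for the eight August incidents.
elapsed = [67, 257, 45, 42, 191, 15, 406, 143]
print(round(trimmed_mttd(elapsed), 2))  # 124.17
```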

Organizations can also tier incidents by severity (see table 3). For example, a decreasing MTTD for security problems matters more than a decreasing MTTD for minor performance issues. In the example, the MTTD for the most severe problems is 42.33 minutes, significantly lower than the overall MTTD.

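| Start time | Detection time | Elapsed (min) | Severity |
|------------|----------------|---------------|----------|
| 2:35 a.m.  | 3:42 a.m.      | 67            | High     |
| 4:13 p.m.  | 8:30 p.m.      | 257           | Low      |
| 1:10 p.m.  | 1:55 p.m.      | 45            | High     |
| 1:43 p.m.  | 2:25 p.m.      | 42            | Medium   |
| 8:05 a.m.  | 11:16 a.m.     | 191           | Medium   |
| 3:15 p.m.  | 3:30 p.m.      | 15            | High     |
| 9:28 a.m.  | 4:14 p.m.      | 406           | Low      |
| 10:09 p.m. | 12:32 a.m.     | 143           | Low      |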

Table 3
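A per-severity MTTD can be computed by grouping the elapsed times before averaging. The sketch below uses the values from table 3; the function name `mttd_by_severity` is an assumption for illustration.

```python
from collections import defaultdict

# (elapsed minutes, severity) for the eight August incidents.
incidents = [
    (67, "High"), (257, "Low"), (45, "High"), (42, "Medium"),
    (191, "Medium"), (15, "High"), (406, "Low"), (143, "Low"),
]


def mttd_by_severity(rows):
    """Return a dict mapping each severity tier to its mean detection time."""
    buckets = defaultdict(list)
    for minutes, severity in rows:
        buckets[severity].append(minutes)
    return {tier: sum(values) / len(values) for tier, values in buckets.items()}


per_tier = mttd_by_severity(incidents)
print(round(per_tier["High"], 2))  # 42.33
```

With only two or three incidents per tier, as here, a single slow detection shifts the tier's mean considerably, so per-tier figures are more informative over longer periods.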

If the MTTD for August is lower than the MTTD for June and July, the IT team might observe a trend of faster problem discovery. However, each organization sets its own bar for what counts as a significant change, and any assessment of improvement in incident response must incorporate other metrics.

Related IT incident management metrics

MTTD is one of several metrics used to gauge the efficiency and efficacy of IT incident response. Others include:

  • Mean time to repair or restore (MTTR), which is how long it takes to fix the problem once it is detected;
  • Mean time between failures (MTBF), which is how long the IT deployment goes without a performance degradation or outage;
  • First-time resolution rate, which shows how effectively the team troubleshoots a problem; and
  • Percentage of uptime or downtime over a given time period, such as 99.9% uptime per year.
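An availability percentage such as 99.9% per year translates into a downtime budget. The following sketch shows the arithmetic; the function name `downtime_minutes_per_year` and the 365-day year are assumptions for the example.

```python
def downtime_minutes_per_year(availability_percent: float, days_per_year: int = 365) -> float:
    """Minutes of downtime per year implied by an availability percentage."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    minutes_per_year = days_per_year * 24 * 60
    return (100 - availability_percent) / 100 * minutes_per_year


# 99.9% availability leaves roughly 525.6 minutes (under 9 hours) per year.
print(round(downtime_minutes_per_year(99.9), 1))
```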

Figure: The formula for MTTD

MTTD is used by IT organizations to gauge the effectiveness of an individual or team's monitoring and management systems and the communication routes from users, either internal or external customers, to the troubleshooting parties. It can be a way to test the difference made by a new tool or approach.

Metrics such as first-time resolution rate and MTTR show the troubleshooting skills and IT management capabilities of a response team. Combined, MTTD and MTTR show the overall timeline of incident response.

Figure: MTTD and MTTR cover the full timeline of a failure or incident.

MTTD does not reflect the security threat level to the deployment, nor its resiliency. For example, an organization might track the number of incidents in a given time period to determine how exposed its IT deployment is to attack or failure, regardless of how quickly these incidents are discovered and resolved.

This was last updated in October 2018
