10 DevSecOps metrics that actually measure success
Knowing which metrics to monitor is a good place to start when measuring success. Here are the ones to collect, and what to do once you have them.
Metrics are a staple of the modern enterprise. The measurements that metrics provide tell business and technology leaders a great deal about the organization's services and operations.
As software development takes center stage in many organizations, leaders are relying on metrics to drive software projects, gauge software teams, streamline development processes and improve security for software and infrastructure. But metrics aren't created equal -- leaders must consider what metrics matter for their DevOps and DevSecOps success.
The role of DevSecOps metrics
From a broad perspective, metrics are measurements that relate to something's performance, behaviors or properties. Most metrics involve time, rate or volume.
Common metrics measure the time it takes to do something, or the number of times something is accomplished in a given period. When considered and implemented properly, metrics provide decision-makers with an objective understanding of what's happening -- or might happen later -- and help them make informed decisions, such as:
- whether a process or service is stable and repeatable, or unpredictable or erratic;
- whether a process or service performs successfully, or experiences errors or disruptions;
- whether goals are being met and if the process or service achieves the intended business results;
- how processes, services and products compare; and
- how and when to manage change.
Metrics have become a focal point of software development and help refine software quality as well as the processes used to create software products. Modern tools across agile development toolchains -- including DevOps and DevSecOps -- can produce significant amounts of data about the creation and operation of a software product.
Data-driven metrics help workload owners and stakeholders derive the best outcomes for the business by addressing key questions:
- Is the software secure, and does it operate properly and reliably?
- Is the software or the infrastructure under attack?
- Does the software deliver expected value or achieve business goals?
- Does the infrastructure support the software properly in terms of performance or reliability?
- Can the software or infrastructure be supported and expanded adequately?
- How do costs and risks factor into new development?
IT organizations use metrics to report on the number of software defects and the average time needed to address those flaws -- including discovered vulnerabilities that might need patching. The number and type of issues might point to software quality concerns rooted in team performance or development guidelines, while time-to-fix metrics illustrate the efficiency or effectiveness of the underlying process.
But which metrics are best suited for DevSecOps environments?
10 key metrics for DevSecOps
A business can draw from myriad possible metrics, but there is no single, uniform set of metrics for every business. The business drives metrics, not the other way around, so business and IT leaders must decide which metrics are meaningful to the organization, and how to implement and use them.
While a business can use any metrics that are relevant to its operations and goals, a suite of 10 common metrics suited for DevSecOps can include:
- Application change time.
This metric reports the time between a code commit and deployment in production. It's an indication of the development pipeline velocity that includes the time used to build, test and release an update. Shorter times can suggest more efficient development pipelines, but always consider one metric with another, such as failure or rework rates, to better understand the DevSecOps process.
- Application deployment frequency.
This is the number of deployments to production in a time period. This metric should not be used alone and is best interpreted in conjunction with other metrics. For example, a low deployment frequency might be acceptable in a proven and mature product, while a high deployment frequency is common in new or less mature product lifecycles. In addition, a low deployment frequency in the face of high issue volume or long patch times could suggest problems with the team or workflow, which demand closer examination.
- Availability.
The availability metric measures the uptime or downtime of an application over a given time period. This metric can be reported as time values or percentages. Availability is an important metric because it relates to application service-level agreements that the business must support.
- Change failure rate.
This metric represents the number or percentage of failed production deployments that result in an aborted deployment or restoration to the previous working version. A high failure rate -- usually more than a few percent -- could indicate a problem with team skills, unclear operational goals, the deployment process, or the understanding and management of the existing deployment infrastructure.
- Change volume.
This is the number of new features or functions deployed in a given time. This metric is a general indicator of development velocity. More changes over time can indicate a strong development effort, but must be viewed in context. A high change volume with a low failure rate and low issue volume suggests a high tempo of successful development. A high change volume with a high failure rate or high issue volume might indicate the development team is struggling.
- Issue resolution time.
This is the average time needed to resolve a reported issue. Basically, this metric signals how long it takes to identify and fix a reported software defect or configuration problem. The scope of this metric varies for every business. For example, the time could run from the initial help ticket creation to the patch deployment. Similarly, the issue might be related to the deployment environment, such as the time needed to find and fix a server security configuration.
- Issue volume.
As it sounds, issue volume describes the number of issues customers report in a given time period, such as a help desk ticket creation rate. It's common to see spikes in issue volumes when software is updated or patched, but a sustained high issue volume might indicate customer dissatisfaction or broader development problems that the team is struggling to address.
- Mean time to recovery (MTTR).
Generally, this metric is the time span between a failed deployment and subsequent full restoration of production operations. Short MTTR metrics can indicate a capable DevSecOps team with strong control of the deployment environment, while long MTTR figures suggest problems with deployment preparation, workflows and operational knowledge. Long MTTRs can be detrimental to the business and often elicit strong responses from business leaders.
- Time to patch.
This is the time between identifying a vulnerability in the application and successful production deployment of a patch. While this metric is similar to issue resolution time, it's a more granular indication of how quickly DevSecOps developers and teams can fix a security flaw once it has been found.
- Time to value.
This is the time between a feature or function request and the realization of business value, such as software capabilities, competitiveness and revenue. This is the most nebulous metric and must be tailored to specific business goals. But every business seeks a short time to value.
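Several of these metrics reduce to simple arithmetic over deployment records. The following is a minimal sketch, not a definitive implementation: the `Deployment` record layout, its field names and the sample history are assumptions made for illustration, and real pipeline tools will expose this data in their own formats. Each function assumes it is given at least one deployment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record layout)."""
    commit_time: datetime                     # when the change was committed
    deploy_time: datetime                     # when it reached production
    failed: bool = False                      # aborted or rolled back
    restored_time: Optional[datetime] = None  # production fully restored


def mean_change_time(deployments):
    """Application change time: average span from commit to deployment."""
    deltas = [d.deploy_time - d.commit_time for d in deployments]
    return sum(deltas, timedelta()) / len(deltas)


def deployment_frequency(deployments, period_days):
    """Deployments per day over the reporting period."""
    return len(deployments) / period_days


def change_failure_rate(deployments):
    """Fraction of production deployments that failed."""
    return sum(d.failed for d in deployments) / len(deployments)


def mean_time_to_recovery(deployments):
    """MTTR: average span from a failed deployment to full restoration."""
    deltas = [d.restored_time - d.deploy_time
              for d in deployments if d.failed and d.restored_time]
    if not deltas:
        return None  # no recorded recoveries in this period
    return sum(deltas, timedelta()) / len(deltas)


# Illustrative data only: four deployments in one week, one of them failed.
day = datetime(2024, 1, 1)
history = [
    Deployment(day, day + timedelta(hours=4)),
    Deployment(day + timedelta(days=1), day + timedelta(days=1, hours=8)),
    Deployment(day + timedelta(days=2), day + timedelta(days=2, hours=6),
               failed=True,
               restored_time=day + timedelta(days=2, hours=7)),
    Deployment(day + timedelta(days=3), day + timedelta(days=3, hours=6)),
]

print("Mean change time:", mean_change_time(history))
print("Deployments per day:", deployment_frequency(history, period_days=7))
print("Change failure rate:", change_failure_rate(history))
print("MTTR:", mean_time_to_recovery(history))
```

With this sample history, the mean change time is six hours, the change failure rate is 0.25 and the MTTR is one hour. As the descriptions above stress, none of these figures should be read alone.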
Tools to collect and analyze metrics
Where do DevSecOps metrics come from? The answer can be confusing, because metrics can originate from many different tools used at points across the entire development pipeline. For example, suitable tools can include:
- source control, build and release tools, such as Git, Azure DevOps, Octopus Deploy and Jenkins;
- configuration management tools for DevOps, such as Ansible, Puppet and Chef;
- test automation tools, such as Selenium, Worksoft and Kobiton; and
- deployment and monitoring tools, such as Nagios, Splunk and SolarWinds AppOptics.
Comprehensive metrics can be derived from tools already in use. However, the current DevSecOps toolset might not provide native support for all metrics of interest. Organizations must review existing tools, evaluate the available metrics each tool supports natively, and determine whether and how each tool can be configured with custom metrics suited for the specific business. In some cases, additional customization or tooling might be required to support unusual or business-specific metrics like time to value.
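Where a tool lacks native support for a metric, a small script over exported records is one way to fill the gap. The sketch below is illustrative only: the ticket dictionaries, their `opened`, `resolved` and `kind` keys, and the outage list are hypothetical stand-ins for whatever a help desk or monitoring tool actually exports.

```python
from datetime import datetime, timedelta


def availability_percent(period, outages):
    """Uptime as a percentage of the reporting period."""
    downtime = sum(outages, timedelta())
    return 100.0 * (period - downtime) / period


def mean_resolution_time(tickets):
    """Average span from ticket creation to resolution (closed tickets only)."""
    deltas = [t["resolved"] - t["opened"] for t in tickets if t.get("resolved")]
    if not deltas:
        return None  # nothing has been resolved yet
    return sum(deltas, timedelta()) / len(deltas)


def mean_time_to_patch(tickets):
    """Same calculation, restricted to tickets tagged as vulnerabilities."""
    vulns = [t for t in tickets if t.get("kind") == "vulnerability"]
    return mean_resolution_time(vulns)


def issue_volume(tickets, start, end):
    """Number of tickets opened in the half-open window [start, end)."""
    return sum(start <= t["opened"] < end for t in tickets)


# Illustrative data only (hypothetical export format).
t0 = datetime(2024, 3, 1)
tickets = [
    {"opened": t0, "resolved": t0 + timedelta(hours=10), "kind": "defect"},
    {"opened": t0 + timedelta(days=2),
     "resolved": t0 + timedelta(days=2, hours=30), "kind": "vulnerability"},
    {"opened": t0 + timedelta(days=5), "resolved": None, "kind": "defect"},
]
outages = [timedelta(minutes=43, seconds=12)]

print("Availability %:", availability_percent(timedelta(days=30), outages))
print("Mean resolution time:", mean_resolution_time(tickets))
print("Mean time to patch:", mean_time_to_patch(tickets))
print("Issues this week:", issue_volume(tickets, t0, t0 + timedelta(days=7)))
```

Where each clock starts and stops, such as ticket creation versus first triage, is a business decision rather than something the script can settle.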
Using metrics for DevSecOps
Metrics are not perfect. They are just numbers, and those numbers mean nothing without proper human interpretation and understanding. Although metrics provide data points, they offer no guidance or insight on how to proceed; they illustrate the what, but not the why. The power and risk of metrics lie in how the business collects, interprets and uses those numbers.
Interpretation and understanding are important when metrics are used to drive change, such as new tool adoption or staffing changes. For example, consider change failure rate. What failure rate is acceptable? More importantly, what is the cause behind an unacceptable failure rate? It might be a staff skillset or workload problem, a toolset problem or a process or workflow flaw.
Metrics must be matched with a detailed understanding of the people, tools and processes in use throughout the development and deployment environment. Suppose that a metric such as change volume decreases suddenly. The metric itself does not identify the underlying cause. But intelligent assessment might reveal that a tool along the DevSecOps toolchain was updated or replaced, and users are learning to integrate and use the new platform -- meaning the issue might be temporary and resolve itself soon. The successful use of DevSecOps metrics depends on people, not tools.
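A script can flag a sudden change in a metric, but it cannot explain it. As a rough sketch of that division of labor, the hypothetical `sudden_drop` helper below compares the latest value of a series with the average of the preceding values. The window size, the 50% threshold and the sample weekly figures are arbitrary assumptions, not recommended settings.

```python
from statistics import mean


def sudden_drop(series, window=4, threshold=0.5):
    """Return True if the latest value falls below `threshold` times
    the mean of the `window` values that precede it."""
    if len(series) < window + 1:
        return False  # not enough history to form a baseline
    baseline = mean(series[-window - 1:-1])
    return baseline > 0 and series[-1] < threshold * baseline


# Illustrative weekly change volumes; the final week drops sharply.
weekly_changes = [12, 14, 11, 13, 5]

if sudden_drop(weekly_changes):
    print("Change volume dropped sharply; investigate the cause.")
```

The flag only marks where to look. Whether the drop reflects a toolchain change, a staffing issue or a process flaw still takes human assessment.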