How to handle root cause analysis of software defects

Root cause analysis plays a significant role in helping software teams fix defects in applications. Here's how to employ it to get the most out of the RCA process.

Stephen J. Bigelow, Senior Technology Editor

Published: 07 Apr 2025

Enterprise software involves a complex interplay of instructions, data, associated services and dependencies. When software defects occur -- as they inevitably do -- developers must identify and understand the underlying reasons for those glitches.

Root cause analysis (RCA) of software defects is an approach developers use to better understand why a fault occurred and to take steps to drive improvements. A key element in application performance monitoring (APM) programs and broader observability initiatives, the RCA process is akin to how a medical team wants to diagnose and cure a patient's illness rather than simply treat the symptoms.

In a broad sense, root cause analysis is a process that identifies underlying causes -- the whys -- of defects or failure events. Once the root cause is clear, a team can remediate the problem at its source. When software professionals perform the process properly, the development team can use RCA results to improve product design, testing and overall quality. Let's take a closer look at analyzing the root cause of software defects and how doing it effectively benefits organizations.

What are software defects?

From a software development perspective, a defect isn't just an application error message or a system crash caused by a coding mistake. A defect is any deviation between an expected and an actual result, such as when software runs without errors but doesn't do what the user expects.

Similarly, a defect represents a departure from the expectations outlined in a software requirements specification. Defects also occur in live software when preproduction tests fail to detect functional or performance problems.

Here are six examples of software defects:

Errors, oversights or gaps in the original software requirements. These defects can occur when a requirement is omitted or forgotten, phrased poorly, not properly understood by stakeholders or misunderstood by developers.
Errors in the design or architecture of the software. These problems occur when software designers create an inefficient software algorithm or process or when that algorithm or process doesn't yield the required precision in its results.
Errors in the coding or implementation. These defects include traditional bugs caused by everything from missing brackets to ungraceful error handling.
Errors in the test planning or test activities. These defects stem from inadequately tested features and functions.
Errors or oversights in the deployment. An example of these defects is when a DevOps team provisions inadequate VM resources.
Errors in the process or policies a team uses to govern the development cycle. These defects crop up when, for example, a team obtains sign-offs or approvals without adequate design, coding or testing review.

Once root cause analysis discovers the underlying issue, the team can take proactive steps to remediate the defect and prevent it from recurring. If the defect resulted from design errors, for example, developers can review the design and requirements documents and make the required corrections. If a testing mistake caused the defect, they can update the test cases and metrics.

Troubleshooting vs. root cause analysis

RCA and troubleshooting are different processes. Troubleshooting and general problem-solving methodologies solve specific problems. For example, if monitoring of an application's health and performance reveals that a software instance crashed and is unresponsive, the development team might resolve the problem by restarting the software instance or rebooting the server.

Root cause analysis of software defects, however, might reveal that the software becomes unresponsive because of a certain error condition. Perhaps the application can't access data, and it isn't designed to handle such errors gracefully. In response, the team can release a software patch that addresses the error handling and will likely prevent the problem from recurring.
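As a minimal sketch of what such a patch might look like, the Python function below handles a failed data read instead of letting the instance crash. The function name `load_orders`, the logger name and the JSON file format are assumptions made for illustration; they don't come from any particular application.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("orders")  # hypothetical logger name


def load_orders(path: Path) -> list:
    """Return the orders stored at `path`, or an empty list if the data is unavailable."""
    try:
        with path.open(encoding="utf-8") as handle:
            return json.load(handle)
    except (OSError, json.JSONDecodeError) as exc:
        # Log the failure so it is available as evidence for later analysis,
        # then fall back to an empty result instead of crashing the instance.
        logger.error("Could not read %s: %s", path, exc)
        return []
```

Whether an empty result is the right fallback depends on the application. The point is that the error condition is handled and logged rather than left to take down the instance.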

Benefits of root cause analysis of software defects

Root cause analysis saves an organization money by helping it find and address software problems earlier in the SDLC. A business that uses RCA to nip issues in the bud can create higher-quality software faster and more cost-effectively. Root cause analysis that prevents problems from cropping up in live software also promotes customer satisfaction and protects a company's reputation.

The following are some common advantages of effective root cause analysis in software development:

Lower software defect rates.
Improved software quality by eliminating the same defects and repetitive mistakes.
Reduced development costs.
Shortened development cycles by reducing troubleshooting fixes and remediations.
Improved user and customer satisfaction.
Increased developer productivity because RCA enables a team to focus its effort on new features and improvements rather than fixes.
Identification of problems elsewhere in the development and production environments.

How to perform root cause analysis

A team can perform RCA in a wide variety of ways, but an organized, logical and objective approach is usually considered most appropriate and effective. The analysis typically examines application performance metrics, log data, distributed tracing results, help desk and trouble ticket details, and other evidence from an incident. As an RCA team scrutinizes this information, its members can begin to understand a defect's underlying causes and formulate strategies and recommendations to address them.

For the purposes of this discussion, consider an RCA team to be any group that gathers to discuss or determine root causes in search of corrective actions. Once in place, it should take the following steps.

1. Prepare to meet

RCA meetings can be held as needed -- perhaps in the wake of an unexpected, critical fault -- or as regularly scheduled occurrences within the software development team. The RCA team leader usually gathers details and data about each fault, including logs, traces, screenshots, reporting and other resources.

RCA team members can include representatives involved in each stage of the software's lifecycle, such as requirements, design, implementation, testing and operations, as well as anybody else involved in development. The team can also include individuals who worked to fix the initial problem, particularly if it recurs or a related one arises. Each RCA team member reviews the details and comes to the meeting prepared to discuss the issue from the perspective of their own lifecycle stage.

Blameless reporting and recommendations

Root cause analysis of software defects has value only if a team receives and implements RCA results objectively. The biggest challenges with RCA initiatives involve the human concepts of blame perception and responsibility assignment. In other words, no one wants it on the record that a defect was their fault. Unfortunately, when an analysis points fingers, the resentment and morale loss that follow can undermine the benefits of root cause analysis and, in turn, lead to resistance from development team members, IT managers and business leaders.

It's crucial that all RCA efforts include blameless objectivity. Reporting and recommendations should always be framed as actionable steps that don't solely place the blame on an individual or team. When reporting and recommendations are blameless, a team is more likely to receive and implement changes without resentment or resistance.

2. Define the problem

With details available, the RCA team can meet to collectively assess the defect and its effect on the software. This phase of the discussion focuses on what happened by answering a variety of common analytical questions, including the following:

What is the problem?
What events or triggers led to the issue?
What systems or services did the issue affect?
How long did the issue last?
What effects did the issue have?
Who, if anyone, was involved?
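One way to keep those answers consistent from one incident to the next is to capture them in a small record. The sketch below is an assumption about how a team might do that in Python. The class name `ProblemStatement`, its field names and the sample incident are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class ProblemStatement:
    summary: str                  # What is the problem?
    triggers: list[str]           # What events or triggers led to the issue?
    affected_systems: list[str]   # What systems or services did the issue affect?
    started_at: datetime          # Together with resolved_at: how long did it last?
    resolved_at: datetime
    effects: list[str]            # What effects did the issue have?
    roles_involved: list[str] = field(default_factory=list)  # Who, if anyone, was involved?

    @property
    def duration(self) -> timedelta:
        """How long the issue lasted."""
        return self.resolved_at - self.started_at


# Hypothetical incident used only to show the record in use.
statement = ProblemStatement(
    summary="Application instance became unresponsive",
    triggers=["Data store was unreachable"],
    affected_systems=["order service"],
    started_at=datetime(2025, 4, 7, 9, 0),
    resolved_at=datetime(2025, 4, 7, 9, 45),
    effects=["Users could not place orders"],
)
print(statement.duration)  # prints 0:45:00
```

Recording roles rather than names in `roles_involved` is one way to keep the record in line with blameless reporting.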

3. Identify the underlying causes

After RCA team members review the evidence and clearly define the problem, they can consider the possible root cause or causes. They focus on why the defect happened and brainstorm, with help from APM software or other tools, to identify what caused it. The RCA team leader typically moderates this part of the meeting and ensures that all members can contribute ideas.

4. Select corrective actions

Once the RCA team identifies the likely root cause of a defect, it can decide on the most appropriate corrective action to address the underlying issues. Corrective actions can vary dramatically depending on the RCA finding, such as updating requirements, enforcing coding styles and standards, making specific changes or fixes to the software, adding test cases or making changes to the deployment environment.

The team should decide whether fixes already made at the software level will be added to the codebase and whether those changes require retesting. It must also make sure that a fix doesn't affect any other features or functionality.

5. Select preventive actions

Software defects cost money to find and fix. By understanding the underlying cause of a problem, an RCA team can recommend ways to prevent similar problems in the same application or in other ones. The final part of an RCA process should produce explicit guidance on prevention to drive ongoing development improvements. These suggestions are known as root cause preventive actions and can involve a wide range of recommendations, such as better documentation, more team training or skill set enhancements, process changes or IT infrastructure improvements.

Techniques for analyzing the root cause of software defects

Software teams can draw from numerous methods to conduct root cause analysis tasks, including the following techniques:

Fishbone diagrams. Also called an Ishikawa diagram or a cause-and-effect diagram, a fishbone diagram sorts possible root causes into categories that branch off from the original issue.
Five Whys. The Five Whys technique is a brainstorming exercise that asks a series of questions about why a problem occurred to help teams identify its root cause.
Scatter plots. A scatter plot is a two-dimensional diagram that places data points along an x-axis and a y-axis to show relationships that could point to a defect's root cause.
Failure mode and effects analysis. FMEA examines potential points of failure and analyzes their possible consequences, which can aid in the RCA process.
Pareto charts. A Pareto chart uses both bars and lines to map the frequency of a problem's common root causes, showing teams the most probable ones.
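To illustrate the data behind a Pareto chart, the short Python sketch below counts how often each root cause appears and adds a running cumulative percentage. The function name `pareto_table` and the ticket labels are hypothetical sample data, not results from a real project.

```python
from collections import Counter


def pareto_table(causes):
    """Return (cause, count, cumulative_percent) rows, most frequent cause first."""
    counts = Counter(causes)
    total = sum(counts.values())
    rows = []
    running = 0
    for cause, count in counts.most_common():
        running += count
        rows.append((cause, count, round(100 * running / total, 1)))
    return rows


# Hypothetical root-cause labels taken from ten closed defect tickets.
tickets = [
    "requirements", "coding", "coding", "testing", "coding",
    "deployment", "requirements", "coding", "coding", "requirements",
]

for cause, count, cumulative in pareto_table(tickets):
    print(f"{cause:<14}{count:>3}{cumulative:>8}%")
```

In this sample, coding accounts for 50% of the tickets, and coding and requirements together account for 80%. The counts become the bars of the chart, and the cumulative percentages become the line.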

Fishbone diagrams and Five Whys are the most popular RCA techniques. Here are more details about how they work.

Fishbone diagram

A fishbone diagram is designed to help RCA teams visualize the potential root causes of a software problem for analysis. It resembles the skeleton of a fish -- thus the name. In practice, the underlying problem or issue is written at the "head" of the fish. The diagram's "bones" are the categories of possible causes. Analysts then identify the primary causes in each category; if necessary, secondary and tertiary causes can be added.
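A fishbone diagram is normally drawn, but its structure can be sketched as a simple mapping from categories to causes. The Python example below is an illustration only. The function name `render_fishbone`, the problem statement and the sample causes are hypothetical, and the category names are borrowed from the defect types listed earlier in this article.

```python
def render_fishbone(problem, categories):
    """Return a text outline: the problem at the head, one branch per category."""
    lines = [f"Problem: {problem}"]
    for category, causes in categories.items():
        lines.append(f"  {category}")
        for cause in causes:
            lines.append(f"    - {cause}")
    return "\n".join(lines)


# Hypothetical categories ("bones") and primary causes for one defect.
causes = {
    "Requirements": ["Timeout limit never specified"],
    "Coding": ["No retry on failed data access", "Errors not handled gracefully"],
    "Testing": ["No test for an unreachable data store"],
    "Deployment": ["VM resources provisioned too low"],
}

print(render_fishbone("Checkout requests time out", causes))
```

Secondary and tertiary causes could be represented by nesting further lists under each primary cause.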

Five Whys

Asking why enables developers, IT operations professionals and others to drill down into successive layers of a software problem. The answer to each question becomes the basis for the next question. The process is similar to a child asking successive why questions -- each time the adult answers, the child uses that answer to pose another question. In root cause analysis, though, the goal is to explain why an issue happened by thinking through the situation in a logical way.

Five Whys analysis can be subjective since it doesn't use data or statistics, so the approach isn't suited for complex cases. Despite the name, reaching a root cause might take more or fewer than five questions, but five is often a starting point. Consider the following simple example with just four whys.

Problem: The log file from a software application is missing.

Why is the log file missing?
- The log file isn't present in the logical unit number or folder where it was anticipated.
Why is the log file not present?
- It wasn't enabled in the software application.
Why was the log feature not enabled?
- The software application wasn't configured properly.
Why was the software not configured properly?
- The development team inadequately documented the application or failed to complete a process to set up and use the software.

The ultimate answer might be to enable the log and provide better documentation and user training.
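The chain above can also be recorded as data so that each answer visibly feeds the next question. The Python sketch below is one possible representation. The names `Why` and `root_cause` are hypothetical, and the chain simply restates the log file example.

```python
from dataclasses import dataclass


@dataclass
class Why:
    question: str
    answer: str


def root_cause(chain):
    """Return the answer to the last why, which the technique treats as the root cause."""
    if not chain:
        raise ValueError("a Five Whys chain needs at least one question")
    return chain[-1].answer


# The log file example from this article, restated as a chain of four whys.
chain = [
    Why("Why is the log file missing?",
        "The log file isn't present in the expected location."),
    Why("Why is the log file not present?",
        "It wasn't enabled in the software application."),
    Why("Why was the log feature not enabled?",
        "The software application wasn't configured properly."),
    Why("Why was the software not configured properly?",
        "The development team inadequately documented the application."),
]

for depth, step in enumerate(chain, start=1):
    print(f"{depth}. {step.question}\n   - {step.answer}")
print("Root cause:", root_cause(chain))
```

The code only records the reasoning. Deciding when the chain has reached a true root cause remains a judgment call for the RCA team.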

Editor's note: Informa TechTarget editors updated this article in April 2025 for timeliness and to add new information.

Stephen J. Bigelow, senior technology editor at Informa TechTarget, has more than 30 years of technical writing experience in the PC and technology industry.
