How to manage technical debt in IT organizations

"Debt" is a scary word even without "technical" before it. We can't help you with dollars and cents, but we can share what IT technical debt is and how to create a debt reduction plan.

By Kurt Marko, MarkoInsights

Published: 15 Sep 2020

The IT industry uses a lot of metaphors: cloud to describe a shared computing utility; fabric to denote a network of interconnected nodes; even mouse for a rolling pointing device with a cord -- the metaphorical tail. And technical debt for a daunting accumulation of unresolved issues.

Used in IT, technical debt describes a situation in which the organization has neglected or under-invested in software, infrastructure, training or documentation. As a result, the team has accrued systemic deficiencies that it must rectify eventually.

What is IT technical debt?

The concept of technical debt originated in the software development world, typically in reference to applications where developers skimped on overall system architecture and design and jumped right into coding. This is akin to building a house without a complete set of blueprints: It trades critical planning work, which shows no obvious exterior signs of progress, for the appearance of action.

Technical debt extends to IT infrastructure as well. In this case, the overall IT infrastructure architecture has been neglected to the point of obsolescence, with little to no investment in new hardware, software updates or security patches. The infrastructure works, but it can easily cause problems and could probably work much better.

Figure 1. Technical debt management overtakes important work over time.

Often called legacy infrastructure, these systems fall behind the state of modern technology; suffer mounting component failures with age; and eventually become bug-ridden security risks running outdated OSes and application software.

Technical debt accrues interest, similar to financial debt. The longer the debt is allowed to accumulate, the more costly it becomes to rectify. Each slapdash fix and ignored patch digs a deeper hole -- and makes climbing out increasingly difficult. Furthermore, a production environment burdened with technical debt requires IT admins to dedicate more and more time to operational troubleshooting, as the example in Figure 1 demonstrates. That troubleshooting steals time away from innovation and work on new services. More than 80% of the respondents to a 2018 Accenture survey of federal IT leaders agreed that technical debt limits their organization's ability to innovate and dramatically increases its costs.

Figure 2. Across multiple categories of technical debt, 80% of respondents to the Accenture survey cited the debt as an active problem in their organization.

Examples of infrastructure technical debt

Technical debt can occur in every corner of IT infrastructure. Common examples include the following:

Obsolete servers near their end-of-service lifecycle. These systems have an increasing frequency of hardware failures and struggle to run modern infrastructure stacks, such as the latest virtualization or container software.
Gigabit top-of-rack network switches. These boxes are generations behind the latest 25 to 100 Gb Ethernet technology.
A storage array on an old Fibre Channel storage area network that has become a performance bottleneck and lacks the flexibility to run modern applications. If the SAN cannot support object interfaces, Ethernet LANs, or non-volatile memory express (NVMe) drives, it is a source of technical debt.

Accrued IT infrastructure technical debt is unique to every organization, but there is a sure way to spot it: Catalog discontinuity points, Accenture's term for the limitations and decrepitude of mission-critical IT systems. If a system makes an IT service or application sluggish and causes chronic failures and performance stagnation, it generates technical debt. Such systems are impossible to adapt to changing business needs and application requirements. In the aforementioned survey, two-thirds of respondents reported hitting such discontinuity points multiple times over the prior decade. These profoundly disruptive incidents signify that a system's accumulated technical debt is no longer sustainable: It is past time to make fundamental changes.

Figure 3. In a 2018 Accenture report, two-thirds of respondents reported discontinuity points across 10 years.

Measure technical debt

We can extend the financial debt metaphor to break the cost of technical debt into several contributing elements.

Principal. The cost to maintain outdated legacy systems and fix recurring system problems and hardware failures.
Interest. Costs that build up as IT staff spends an increasing amount of time on operational maintenance and repair of obsolete equipment. Perhaps the time to discover, diagnose and fix system problems on a legacy deployment took 10 hours of work a month last year, but has crept up to 16 hours of work this year.
Liabilities. Indirect costs to business operations as deteriorating legacy systems fail with increasing frequency. Think application outages and interruptions in sales or processes.
Opportunity costs. An IT organization's investment in operational maintenance and break-fix troubleshooting deducts time and money from strategic initiatives and service development.
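
As a rough illustration, the four categories above can be added up per system. The sketch below is only one possible way to do that: the 10 and 16 hours per month come from the interest example above, while the $80 hourly rate, the other dollar amounts and the names `DebtCosts` and `interest_cost` are hypothetical placeholders, not figures from any survey.

```python
from dataclasses import dataclass


@dataclass
class DebtCosts:
    """Annual cost of one legacy system, split into the four categories."""
    principal: float         # maintaining the system and fixing recurring failures
    interest: float          # extra staff time compared with the baseline year
    liabilities: float       # indirect business cost of outages and interruptions
    opportunity_cost: float  # value of strategic work displaced by break-fix effort

    def total(self) -> float:
        return (self.principal + self.interest
                + self.liabilities + self.opportunity_cost)


def interest_cost(baseline_hours_per_month: float,
                  current_hours_per_month: float,
                  hourly_rate: float) -> float:
    """Annual cost of the hours spent beyond the baseline year's level."""
    extra_hours = max(0.0, current_hours_per_month - baseline_hours_per_month)
    return extra_hours * 12 * hourly_rate


# 10 -> 16 hours/month is the example from the text; everything else is assumed.
interest = interest_cost(10, 16, hourly_rate=80.0)
costs = DebtCosts(principal=12_000, interest=interest,
                  liabilities=5_000, opportunity_cost=8_000)
print(f"Interest: ${interest:,.0f} per year")
print(f"Total: ${costs.total():,.0f} per year")
```

With these placeholder inputs, the six extra hours a month alone account for $5,760 a year, which shows how quickly the interest component can rival the other categories.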

There isn't a universally applicable cost model for technical debt. But IT organizations should use these categories to identify and quantify the cost of deficient infrastructure within their organization.

With a definition in hand, the next stage is to perform a complete inventory of IT infrastructure and assess each component's useful lifespan and fitness for new applications. Once again, this judgment varies from one IT organization to another. Some might only need systems capable of running a virtualization stack for legacy applications, while others, such as those developing deep learning software, need servers with the latest processors, GPUs and NVMe drives.

After a technical debt assessment, IT leaders might be in for sticker shock. In 2016, the federal government's CIO, Tony Scott, asked three of his top IT suppliers to estimate the cost to replace equipment that would reach end of life and lose its support services within the next three years. The total came to $7.5 billion.

To combat technical debt, develop a program of continual IT modernization, Scott said. Replace equipment before it is utterly obsolete. This tactic can increase capacity and lower costs dramatically.

Create a debt reduction plan

Technical debt management involves a series of activities:

Assess existing IT infrastructure.
Estimate support and maintenance costs.
Report on the state of technical debt.
Create a strategy and update schedule to address technical debt.
Maintain a standard for all IT infrastructure to prevent new problems.

Assess infrastructure regularly to track the useful life of an organization's system inventory and flag equipment that has become significantly deficient. A combination of IT professionals and finance team members should handle the auditing process.
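
A minimal sketch of such a recurring check might look like the following. It assumes the inventory records an in-service date and an expected lifespan for each asset; the `Asset` fields, the one-year threshold and the sample inventory are all hypothetical and would need to match how an organization actually tracks its equipment.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Asset:
    name: str
    kind: str                  # e.g. "server", "switch", "storage"
    in_service: date
    expected_life_years: float


def remaining_life_years(asset: Asset, today: date) -> float:
    """Expected lifespan minus current age; negative means past end of life."""
    age_years = (today - asset.in_service).days / 365.25
    return asset.expected_life_years - age_years


def flag_for_review(assets, today, threshold_years=1.0):
    """Return assets at or below the threshold, least remaining life first."""
    flagged = [a for a in assets
               if remaining_life_years(a, today) <= threshold_years]
    return sorted(flagged, key=lambda a: remaining_life_years(a, today))


# Hypothetical inventory used only to exercise the functions.
inventory = [
    Asset("db-01", "server", date(2014, 3, 1), 5),
    Asset("tor-sw-07", "switch", date(2016, 6, 15), 5),
    Asset("web-12", "server", date(2019, 9, 1), 5),
]
audit_date = date(2020, 9, 15)
for asset in flag_for_review(inventory, audit_date):
    print(asset.name, round(remaining_life_years(asset, audit_date), 1))
```

Sorting by remaining life puts equipment that is already past its expected lifespan at the top of the audit list.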

Estimate failure rates, repair and maintenance costs and the administrative time spent per server or piece of network equipment, as adjusted for the type of system and its age. Incorporate data from application portfolio management systems to more accurately identify performance gaps and the costs to remediate via equipment upgrades or replacements.
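
One simple way to structure such an estimate is shown below. This is a sketch under stated assumptions: the base rates per system type, the linear 15% per-year age adjustment and the hourly rate are invented for illustration, and a real model would be calibrated against an organization's own failure and ticket history.

```python
# Assumed base figures per system type when new:
# (failures per year, repair cost per failure, admin hours per year)
BASE_RATES = {
    "server": (0.5, 400.0, 20.0),
    "switch": (0.2, 250.0, 8.0),
}
AGE_FACTOR_PER_YEAR = 0.15  # assumed: effort grows 15% of base per year of age
HOURLY_RATE = 80.0          # assumed loaded cost of one admin hour


def annual_support_cost(kind: str, age_years: float) -> float:
    """Estimate yearly repair plus admin cost for one system, adjusted for age."""
    failures, repair_cost, admin_hours = BASE_RATES[kind]
    age_multiplier = 1 + AGE_FACTOR_PER_YEAR * age_years
    expected_repairs = failures * age_multiplier * repair_cost
    expected_admin = admin_hours * age_multiplier * HOURLY_RATE
    return expected_repairs + expected_admin


for kind, age in [("server", 0), ("server", 4), ("switch", 6)]:
    print(f"{kind} aged {age} years: ${annual_support_cost(kind, age):,.0f} per year")
```

Keeping the rates in one table per system type makes it straightforward to replace the placeholders with measured values as audit data accumulates.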

Establish regular reports to IT and business executives that summarize technical debt audits and the implications of technically deficient systems in terms of costs, ability to support new applications and degraded user experience.

Create a strategic infrastructure plan -- and a schedule for updates -- to support preferred modes of application delivery. Your plan could include internal data center operation, use of cloud infrastructure (IaaS) and cloud applications (SaaS), and other approaches such as multisite container clusters.

Establish equipment standards for on-premises servers, switches and storage systems, and ensure all systems meet them. Doing so simplifies hardware support, maintenance and vendor management.
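
Checking systems against a written standard is easy to automate once the standard is expressed as data. The following sketch assumes a hypothetical minimum specification per equipment type; the attribute names and thresholds are illustrative only, with the 25 Gbps switch figure echoing the Ethernet example earlier in the article.

```python
# Hypothetical minimum standard per equipment type.
MINIMUM_STANDARD = {
    "server": {"cpu_cores": 16, "ram_gb": 128},
    "switch": {"port_speed_gbps": 25},
}


def standard_gaps(kind: str, specs: dict) -> dict:
    """Return {attribute: (actual, required)} for each attribute below standard.

    An attribute missing from specs is treated as 0, so it counts as a gap.
    """
    required = MINIMUM_STANDARD.get(kind, {})
    return {
        attr: (specs.get(attr, 0), minimum)
        for attr, minimum in required.items()
        if specs.get(attr, 0) < minimum
    }


print(standard_gaps("switch", {"port_speed_gbps": 1}))
print(standard_gaps("server", {"cpu_cores": 32, "ram_gb": 64}))
print(standard_gaps("server", {"cpu_cores": 32, "ram_gb": 256}))
```

An empty result means the system meets the standard; anything else lists exactly which attributes fall short and by how much.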

Figure 4. Evaluate these recommended techniques from Federal IT organizations to monitor and manage IT technical debt. The top three are system tracking, debt reporting to the C-suite and assigning management to IT leadership.

Cloud infrastructure can play a critical role in technical debt reduction. Cloud services efficiently address infrastructure debt in three ways:

Eliminate major new capital expenditures.
Reduce operating costs.
Tie infrastructure spending to workload capacity and performance requirements.
