How to manage technical debt in IT organizations
"Debt" is a scary word even without "technical" before it. We can't help you with dollars and cents, but we can share what IT technical debt is and how to create a debt reduction plan.
The IT industry uses a lot of metaphors: cloud to describe a shared computing utility; fabric to denote a network of interconnected nodes; even mouse for a rolling pointing device with a cord -- the metaphorical tail. And technical debt for a daunting accumulation of unresolved issues.
Used in IT, technical debt describes a situation in which the organization has neglected or under-invested in software, infrastructure, training or documentation. As a result, the team has accrued systemic deficiencies that it must rectify eventually.
What is IT technical debt?
The concept of technical debt originated in the software development world, typically in reference to applications where developers skimped on overall system architecture and design and instead jumped right into coding. It is akin to building a house without a complete set of blueprints: It sacrifices critical work that shows no obvious exterior signs of progress for the appearance of action.
Technical debt extends to IT infrastructure as well. Here, the overall infrastructure architecture has been neglected to the point of obsolescence, with little to no investment in new hardware, software updates or security patches. The infrastructure works, but it can easily cause problems and could probably work much better.
Often called legacy infrastructure, these systems fall behind the state of modern technology; suffer mounting component failures with age; and eventually become bug-ridden security risks running outdated OSes and application software.
Technical debt accrues interest, similar to financial debt. The longer the debt is allowed to accumulate, the more costly it becomes to rectify. Each slapdash fix and ignored patch digs a deeper hole -- and makes climbing out increasingly difficult. Furthermore, a production environment burdened with technical debt requires that IT admins dedicate more and more time to operational troubleshooting, as the example in Figure 1 demonstrates. It steals time away from innovation and work on new services. More than 80% of the respondents to a 2018 Accenture survey of federal IT leaders agreed that technical debt limits their organization's ability to innovate, and increases its costs dramatically.
Examples of infrastructure technical debt
Technical debt can occur in every corner of IT infrastructure. Common examples include the following:
- Obsolete servers near their end-of-service lifecycle. These systems have an increasing frequency of hardware failures and struggle to run modern infrastructure stacks, such as the latest virtualization or container software.
- Gigabit top-of-rack network switches. These boxes are generations behind the latest 25 to 100 Gb Ethernet technology.
- A storage array on an old Fibre Channel storage area network that has become a performance bottleneck and lacks the flexibility to run modern applications. If the SAN cannot support object interfaces, Ethernet LANs, or non-volatile memory express (NVMe) drives, it is a source of technical debt.
Accrued IT infrastructure technical debt is unique to every organization, but there is a sure way to spot it: Catalog discontinuity points, Accenture's term for the limitations and decrepitude of mission-critical IT systems. If a system makes an IT service or application sluggish and causes chronic failures and performance stagnation, it generates technical debt. Such systems are impossible to adapt to changing business needs and application requirements. In the aforementioned survey, two-thirds of respondents reported experiencing such discontinuity points multiple times over the prior decade. These profoundly disruptive incidents signify that a system's accumulated technical debt is no longer sustainable: It is past time to make fundamental changes.
Measure technical debt
We can extend the financial debt metaphor to break down the elements that contribute to the cost of technical debt.
- Principal. The cost to maintain outdated legacy systems and fix recurring system problems and hardware failures.
- Interest. Costs that build up as IT staff spends an increasing amount of time on operational maintenance and repair of obsolete equipment. Perhaps the time to discover, diagnose and fix system problems on a legacy deployment took 10 hours of work a month last year, but has crept up to 16 hours of work this year.
- Liabilities. Indirect costs to business operations as legacy systems fail with increasing frequency. Think application outages and interruptions in sales or processes.
- Opportunity costs. IT's investment in operational maintenance and break-fix troubleshooting deducts time and money from strategic initiatives and service development.
There isn't a universally applicable cost model for technical debt. But IT organizations should use these categories to identify and quantify the cost of deficient infrastructure within their organization.
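As a rough illustration of how these four categories could be tallied for one system, here is a minimal Python sketch. The class and function names, the $75 hourly rate and the placeholder dollar figures are all assumptions made for the example, not part of any standard cost model; only the 10-to-16-hour increase comes from the interest example above.

```python
from dataclasses import dataclass


@dataclass
class DebtEstimate:
    """Annual technical debt cost for one system, split by category."""
    principal: float         # maintenance and recurring repair of the legacy system
    interest: float          # extra staff time compared with the previous year
    liabilities: float       # indirect business cost of outages and interruptions
    opportunity_cost: float  # strategic work displaced by break-fix effort

    def total(self) -> float:
        return self.principal + self.interest + self.liabilities + self.opportunity_cost


def interest_cost(hours_before: float, hours_now: float, hourly_rate: float) -> float:
    """Annual cost of the growth in monthly maintenance hours."""
    extra_hours_per_month = max(hours_now - hours_before, 0.0)
    return extra_hours_per_month * 12 * hourly_rate


# Monthly hours rose from 10 to 16; the hourly rate is a hypothetical $75.
estimate = DebtEstimate(
    principal=20_000.0,         # placeholder figure
    interest=interest_cost(10, 16, 75.0),
    liabilities=8_000.0,        # placeholder figure
    opportunity_cost=12_000.0,  # placeholder figure
)
print(f"Interest: ${estimate.interest:,.0f}")
print(f"Total:    ${estimate.total():,.0f}")
```

Under these assumptions, the six extra hours a month add up to $5,400 a year in interest alone. Each organization would substitute its own rates and decide how to avoid counting the same hours under both interest and opportunity costs.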
With these definitions in hand, the next step is to perform a complete inventory of IT infrastructure and assess each component's useful lifespan and fitness for new applications. Once again, this judgment varies from one IT organization to another. Some might only need systems capable of running a virtualization stack for legacy applications, while others, such as those developing deep learning software, need servers with the latest processors, GPUs and NVMe drives.
After a technical debt assessment, IT leaders might be in for sticker shock. In 2016, the federal government's CIO, Tony Scott, asked three of his top IT suppliers to estimate the cost to replace equipment that would reach end of life and lose its support services within the next three years. The total came to $7.5 billion.
To combat technical debt, develop a program of continual IT modernization, Scott said. Replace equipment before it is utterly obsolete. This tactic can increase capacity and lower costs dramatically.
Create a debt reduction plan
Technical debt management involves a series of activities:
- Assess existing IT infrastructure.
- Estimate support and maintenance costs.
- Report on the state of technical debt.
- Create a strategy and update schedule to address technical debt.
- Maintain a standard for all IT infrastructure to prevent new problems.
Assess infrastructure regularly to track the useful life of an organization's system inventory and flag equipment that has become significantly deficient. A combination of IT professionals and finance team members should handle the auditing process.
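A recurring audit like this could be as simple as a script run against the asset inventory. The sketch below is one possible shape, assuming each asset record carries a vendor end-of-support date; the `Asset` fields, the `flag_assets` helper and the sample inventory are hypothetical. The default three-year horizon mirrors the window Scott used in his 2016 request.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Asset:
    name: str
    kind: str             # e.g. "server", "switch", "storage"
    end_of_support: date  # date the vendor stops supporting the asset


def flag_assets(inventory, today, horizon_days=3 * 365):
    """Split the inventory into assets already out of support and
    assets that lose support within the horizon."""
    expired, expiring = [], []
    for asset in inventory:
        days_left = (asset.end_of_support - today).days
        if days_left < 0:
            expired.append(asset)
        elif days_left <= horizon_days:
            expiring.append(asset)
    return expired, expiring


# Hypothetical inventory for illustration only.
inventory = [
    Asset("rack1-srv01", "server", date(2019, 1, 1)),
    Asset("rack1-tor", "switch", date(2021, 6, 1)),
    Asset("san-array-a", "storage", date(2030, 1, 1)),
]
expired, expiring = flag_assets(inventory, today=date(2020, 1, 1))
print("Out of support:", [a.name for a in expired])
print("Expiring soon: ", [a.name for a in expiring])
```

End-of-support dates capture only one dimension of deficiency. Fitness for new applications still calls for the judgment of the IT and finance staff doing the audit.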
Estimate failure rates, repair and maintenance costs and the administrative time spent per server or piece of network equipment, as adjusted for the type of system and its age. Incorporate data from application portfolio management systems to more accurately identify performance gaps and the costs to remediate via equipment upgrades or replacements.
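One way to adjust such estimates for system type and age is to start from a base failure rate per type and scale it by age. The following sketch assumes failures grow by a fixed percentage per year of service; that growth model, the base rates and every dollar figure are hypothetical and should be replaced with rates observed in the organization's own ticketing and repair records.

```python
# Hypothetical base failure rates (failures per year for a new unit).
BASE_FAILURES_PER_YEAR = {"server": 0.5, "switch": 0.2, "storage": 0.4}

# Hypothetical assumption: failures increase 25% with each year of age.
AGE_GROWTH = 1.25


def expected_failures(kind: str, age_years: int) -> float:
    """Expected failures per year for one unit of the given type and age."""
    return BASE_FAILURES_PER_YEAR[kind] * AGE_GROWTH ** age_years


def annual_support_cost(kind: str, age_years: int, repair_cost: float,
                        admin_hours_per_failure: float, hourly_rate: float) -> float:
    """Expected yearly repair spending plus administrator time for one unit."""
    cost_per_failure = repair_cost + admin_hours_per_failure * hourly_rate
    return expected_failures(kind, age_years) * cost_per_failure


# Compare a new server with a six-year-old one under the same assumptions.
for age in (0, 6):
    cost = annual_support_cost("server", age, repair_cost=1_000.0,
                               admin_hours_per_failure=4.0, hourly_rate=75.0)
    print(f"server, {age} years old: ${cost:,.0f} per year")
```

Summing this figure across the inventory gives a first approximation of the support cost that an upgrade or replacement would need to beat.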
Establish regular reports to IT and business executives that summarize technical debt audits and the implications of technically deficient systems in terms of costs, ability to support new applications and degraded user experience.
Create a strategic infrastructure plan -- and a schedule for updates -- to support preferred modes of application delivery. Your plan could include internal data center operation, cloud infrastructure (IaaS), cloud applications (SaaS) and other approaches, such as multisite container clusters.
Establish equipment standards for on-premises servers, switches and storage systems, and ensure all systems meet them. Doing so simplifies hardware support, maintenance and vendor management.
Cloud infrastructure can play a critical role in technical debt reduction. Cloud services efficiently address infrastructure debt in three ways:
- Eliminate major new capital expenditures.
- Reduce operating costs.
- Tie infrastructure spending to workload capacity and performance requirements.
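To see how tying spending to workload differs from an upfront purchase, consider the following minimal sketch. It compares a capital replacement that also carries yearly operating costs with a pay-per-use bill that follows monthly demand. The function names, prices and workload figures are hypothetical, and the outcome depends entirely on the numbers an organization plugs in.

```python
def on_prem_cost(capex: float, annual_opex: float, years: int) -> float:
    """Upfront hardware purchase plus operating costs over the period."""
    return capex + annual_opex * years


def cloud_cost(monthly_instance_hours, price_per_hour: float) -> float:
    """Pay-per-use spending that tracks the workload month by month."""
    return sum(hours * price_per_hour for hours in monthly_instance_hours)


# Hypothetical workload: steady demand with a three-month seasonal peak,
# repeated for three years.
one_year = [2_000] * 9 + [6_000] * 3
workload = one_year * 3

print(f"On premises: ${on_prem_cost(60_000.0, 15_000.0, 3):,.0f}")
print(f"Cloud:       ${cloud_cost(workload, 0.50):,.0f}")
```

A comparison like this omits migration effort, data transfer fees and staffing changes, so treat it as a starting point for the cost discussion rather than a verdict.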