6 IT operations management best practices you need to know
ITOM staff are vital to IT platform performance, but organizations must limit manual intervention and rely on automation, and on AI used with caution, to keep everything running smoothly.
An IT operations management (ITOM) team is an essential part of an organization. Its primary goal is to keep IT platforms running at peak performance and availability. If everything runs as it should, the organization shouldn't even notice the work these teams perform. When things go wrong, however, ITOM teams must have processes in place to rectify the problem quickly.
To perform at their best, IT platforms must be optimized, resilient, highly available and kept current with maintenance and updates. Performing all these critical tasks manually is increasingly costly, wasteful and potentially dangerous. Humans can easily identify and manage obvious problems, but they still make errors, and a single mistake can have catastrophic repercussions for business operations.
Thankfully, there are strategies teams can employ to reduce problems with their IT platforms or to remediate issues rapidly should they occur. Here are six best practices every ITOM team should follow.
1. Alignment with business priorities
IT isn't a separate business department; its purpose is to ensure the organization can respond to the changing market and carry out its business effectively and profitably. Therefore, ITOM activity must be driven by the organization's needs.
A common problem is that ITOM staff often want to pursue glitzy new technologies or more technically interesting work. The manager must rein in these desires and ensure that the team follows practices aligned with the organization's priorities, no matter how mundane those practices might seem.
It's also the ITOM team's responsibility to resist development teams' attempts to introduce technology for technology's sake. To do so, ITOM staff must be involved with upstream groups and their decision-making, so they can spot unnecessary technology before it reaches the platform.
2. Judicious use of AI
With all the buzz around AI, it can be mistaken for the ultimate answer to everything. The technology is still maturing and can suffer from what are known as AI hallucinations, where it presents incorrect information or advice as fact.
In ITOM, much of what is needed is simple automation rather than AI: Many issues can be fixed by repeating actions that have worked before. Using automation to ensure that the same steps are taken every time makes outcomes consistent and predictable.
However, AI can now be used to preempt many issues. When AI constantly monitors the platform and identifies potential problems, it can help resolve them, either by triggering automatic remediation for a known issue or by alerting administrators to a possible new issue for them to investigate. If the investigation shows that the problem could have been corrected through automation, that fix can be automated so no human input is needed the next time it occurs.
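The split between known and new issues can be sketched in a few lines of Python. This is a minimal illustration, not any product's API: it assumes a monitoring layer (AI-driven or rule-based) has already labeled each event with a signature string, and the `Event`, `Dispatcher` and `restart_service` names are hypothetical. The detection and classification step, which is where AI would sit, is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    host: str
    signature: str  # hypothetical label the monitoring layer attaches to a detected problem


@dataclass
class Dispatcher:
    # Maps a known problem signature to an automated remediation action.
    runbooks: Dict[str, Callable[[Event], str]] = field(default_factory=dict)
    # Events with no known fix, queued for administrators to investigate.
    alerts: List[Event] = field(default_factory=list)

    def register(self, signature: str, action: Callable[[Event], str]) -> None:
        """Promote a fix to automation once investigation shows it is repeatable."""
        self.runbooks[signature] = action

    def handle(self, event: Event) -> str:
        action = self.runbooks.get(event.signature)
        if action is not None:
            return action(event)  # known issue: remediate without human input
        self.alerts.append(event)  # possible new issue: alert a human
        return "escalated"


def restart_service(event: Event) -> str:
    # Placeholder: a real action would call your orchestration tooling here.
    return f"restarted service on {event.host}"


dispatcher = Dispatcher()
dispatcher.register("service_down", restart_service)
print(dispatcher.handle(Event("web01", "service_down")))  # restarted service on web01
print(dispatcher.handle(Event("db01", "disk_full")))      # escalated
```

Once administrators work out a repeatable fix for the escalated `disk_full` event, a call to `register` turns it into an automated runbook, which mirrors the idea of removing human input for the next instance.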
3. Automated system upgrades and patches
Automation is the silver bullet for any ITOM team. There should be no need for ITOM teams to manually upgrade or patch a platform. There's a plethora of well-proven tools available to automate these tasks.
Automated workflows are not only less error-prone but also a faster and cheaper way to keep a platform optimized and secure. As vendors integrate AI more deeply into these tools, ITOM teams should be aware of any AI features in the tools they use and whether those features have proper safeguards against AI hallucinations.
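One common safeguard pattern for automated patching is a staged rollout: patch a small group first, check health, and only then continue. The Python sketch below assumes you supply your own `apply_patch` and `healthy` functions (both hypothetical here) and adds one simple guard relevant to AI features: a patch ID that isn't on a human-approved list, such as one suggested by an AI assistant but never vetted, is rejected outright. It halts on failure but does not implement rollback.

```python
from typing import Callable, Dict, List, Set


def staged_rollout(
    waves: List[List[str]],
    patch_id: str,
    approved: Set[str],
    apply_patch: Callable[[str, str], None],
    healthy: Callable[[str], bool],
) -> Dict[str, object]:
    """Apply a patch wave by wave, stopping at the first wave with an unhealthy host."""
    # Safeguard: only apply patches a human has approved.
    if patch_id not in approved:
        return {"status": "rejected", "patched": []}

    patched: List[str] = []
    for wave in waves:
        for host in wave:
            apply_patch(host, patch_id)
            patched.append(host)
        # Health-check the whole wave before moving on to the next one.
        failed = [host for host in wave if not healthy(host)]
        if failed:
            return {"status": "halted", "patched": patched, "failed": failed}
    return {"status": "complete", "patched": patched}
```

For example, `waves=[["canary1"], ["web1", "web2"]]` patches a single canary host first, so a bad patch stops after one machine rather than the whole fleet.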
4. IT help desk operation
There will always be a need for a technical help desk in an organization, but its operations should still be as automated as possible. For example, when users contact the help desk about common issues, such as a forgotten password, an automated response should detail the steps to remediate the problem. This avoids the time and cost of human intervention.
Although automated issue resolution is the ideal, it can often lead to user fatigue, as users struggle to find the right self-service fix. If a user can't resolve a problem within a few mouse clicks, refer them to a qualified help desk agent who can apply both their own skills and AI tools to solve the problem.
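The routing rule described above, automated steps for common issues and a human once self-service stops working, can be sketched as follows. This is a hypothetical Python illustration: the `SELF_SERVICE` knowledge base, the `Ticket` type and the threshold of three attempts (a stand-in for "a few mouse clicks") are all assumptions, not features of a particular help desk product.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical knowledge base: common issue category -> self-service steps.
SELF_SERVICE: Dict[str, List[str]] = {
    "forgotten_password": [
        "Open the self-service password reset page",
        "Confirm your identity with your registered second factor",
        "Set a new password",
    ],
}

# Assumed stand-in for "a few mouse clicks": self-service attempts before escalation.
MAX_SELF_SERVICE_ATTEMPTS = 3


@dataclass
class Ticket:
    user: str
    category: str
    attempts: int = 0


def route(ticket: Ticket) -> Tuple[str, List[str]]:
    """Return ("self_service", steps) or ("agent", []) for a help desk query."""
    steps = SELF_SERVICE.get(ticket.category)
    if steps is None or ticket.attempts >= MAX_SELF_SERVICE_ATTEMPTS:
        # Unknown issue, or the user has already tried self-service enough times.
        return ("agent", [])
    ticket.attempts += 1
    return ("self_service", steps)
```

Capping the number of self-service attempts is one way to address user fatigue: users get the cheap automated path first, but they aren't left cycling through it indefinitely.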
5. Continuous monitoring and remediation
The move to a more hybrid workforce has expanded the scope of the IT platform and introduced more attack surfaces for malicious actors. This requires rigorous monitoring across the whole environment, not just the enclosed and controlled enterprise network, so that technical issues on the platform are caught as early as possible and potential malicious activity is identified.
Security information and event management (SIEM) tools, along with other tools that identify threats such as DDoS and infiltration attacks, help counter this danger. Teams can also perform deep packet inspection of incoming traffic, including email and web data, to block phishing attacks and URL-based payloads. Again, AI can help make these activities preemptive rather than reactive.
With more employees working remotely and away from organizational safeguards, businesses should ensure that they implement a zero-trust approach to security.
6. A codified and practiced emergency response system
Despite all the processes in place, ITOM staff must accept that things will go wrong. The major incidents that hit Jaguar Land Rover, Marks & Spencer and many airports in 2025 exemplify this. In such cases, what happens after the incident separates good ITOM teams from bad ones. Without sufficient plans, recovery can be a drawn-out and potentially catastrophic process that an organization might not survive.
ITOM teams must have complete plans and processes ready to kick in when issues occur. These plans must be tested regularly and updated to reflect changes in the technology landscape. Running such tests in the live environment would meet pushback from other teams in the business, so ITOM should be able to isolate part of the overall platform as an air-gapped environment where scenarios such as complete power loss, malware intrusion or malicious actor attacks can be simulated. The team's response can then be thoroughly tested.
After each scenario runs, teams must evaluate their plans, identify any shortcomings and update their procedures to remedy any issues that arose during the simulation. As AI capabilities continue to improve, ITOM teams should also look to AI to reduce the need for reactive responses and, where such responses are required, to minimize the time it takes to restore business capability.
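The post-drill evaluation can be made more systematic by recording each scenario's outcome and flagging shortcomings automatically. The Python sketch below is a hypothetical example: it assumes the team has agreed a recovery-time target with the business for each scenario and notes whether the documented plan worked as written. The `DrillResult` fields are illustrative choices, not a standard format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DrillResult:
    scenario: str            # e.g. "complete power loss" in the isolated environment
    target_minutes: float    # assumed recovery-time target agreed with the business
    actual_minutes: float    # time the team took to restore service during the drill
    runbook_followed: bool   # did the documented plan work as written?


def shortcomings(results: List[DrillResult]) -> List[str]:
    """List findings that should feed back into updated procedures."""
    findings: List[str] = []
    for r in results:
        if r.actual_minutes > r.target_minutes:
            findings.append(
                f"{r.scenario}: recovery took {r.actual_minutes:g} min "
                f"against a {r.target_minutes:g} min target"
            )
        if not r.runbook_followed:
            findings.append(
                f"{r.scenario}: team departed from the documented plan; update the runbook"
            )
    return findings
```

Keeping these records across drills also shows whether changes to the plans, including any AI-assisted remediation, actually shorten recovery time.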
Clive Longbottom is an independent commentator on the impact of technology on organizations. He was a co-founder and service director at Quocirca.