Move from reactive to predictive cloud management with AI
Discover how AI transforms cloud management from reactive firefighting to predictive optimization. Learn executive strategies for AI-human partnerships.
AI is evolving from a basic automation tool into an intelligent partner that fundamentally transforms cloud management. Improved algorithms for assessing events are shifting the balance from reactive to predictive cloud management.
Dominick Profico, CTO at Bridgenext, a digital consultancy, said that AI has been in cloud management for a long time. "Earlier approaches focused on pattern recognition through custom-trained models, using inference to drive predictions and risk management on top of your observability platforms, and rules-based approaches to resolving issues," he said.
Generative AI capabilities are opening new possibilities for cloud teams and helping to redefine the role of the cloud administrator working alongside AI.
Enterprises are starting to consider a new balance between human expertise and AI capabilities. This includes developing a strategy to embrace AI as a partner that reflects the strengths and limitations of current and emerging AI tools. A new breed of agentic tools is emerging that will further bolster AI capabilities, supporting more autonomous management, but will require new guardrails to ensure responsible use.
Khalid Khan, PhD, Partner and Americas Lead in the Digital & Analytics practice of Kearney, a global strategy and management consulting firm, stated that the public cloud has not delivered on its cost and resiliency promises, and on-premises alternatives have only become more expensive. That's why more companies are moving toward a hybrid converged cloud.
"AI in cloud management can make it smarter, more adaptive and cost-efficient," said Khan.
AI operations (AIOps) help to correlate multiple alerts to pinpoint and fix root causes.
Cloud management evolves from simple automation to intelligence.
AI helps to reshape governance and risk management.
Early warning capabilities shift cloud management from reactive to predictive.
Cloud management copilots guide event detection and remediation.
Generative AI capabilities help summarize information from diverse sources and past experiences.
From reactive to predictive
Early cloud automation was reactive. It helped set basic rules, such as adding more servers after CPU usage hit a certain threshold. Blair Sammons, director of solutions Cloud and AI at Promevo, a cloud solutions provider, said, "One of the most significant milestones in cloud management has been the transition from simple automation to predictive, AI-driven intelligence."
Innovations in machine learning then produced models that could predict future demand based on historical data, seasonality and other business factors. This enabled a more proactive approach, in which teams predict resource needs ahead of time and adjust capacity to achieve cost savings and improved performance.
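The difference between the two approaches can be illustrated with a minimal Python sketch. It is not taken from any vendor's product: the function names, the 80% CPU threshold, the 20% headroom and the seasonal-average forecast are all assumptions chosen for illustration, and a production system would use a trained forecasting model rather than a simple average.

```python
import math
from statistics import mean


def reactive_servers(current_servers: int, cpu_percent: float,
                     threshold: float = 80.0) -> int:
    """Reactive rule: add one server only after CPU has crossed the threshold."""
    return current_servers + 1 if cpu_percent > threshold else current_servers


def seasonal_forecast(history: list[float], season_length: int) -> float:
    """Forecast the next value as the mean of past values at the same point
    in the season (a hypothetical stand-in for a trained demand model)."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    next_phase = len(history) % season_length
    same_phase = history[next_phase::season_length]
    return mean(same_phase)


def predictive_servers(history: list[float], season_length: int,
                       capacity_per_server: float,
                       headroom: float = 0.2) -> int:
    """Predictive rule: size the fleet for forecast demand plus headroom,
    before the load arrives."""
    forecast = seasonal_forecast(history, season_length)
    return max(1, math.ceil(forecast * (1 + headroom) / capacity_per_server))
```

The reactive function can only respond once users are already feeling the load, while the predictive function sizes capacity from the demand pattern in advance.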
Proactive cost management
Khan is starting to see the emergence of AI-powered cost and performance management tools. These are helping to shift cloud cost management from manual and reactive to proactive and autonomous cost-tracking and orchestration.
For example, AI can analyze historical usage, latency, compliance needs and cost curves to recommend where workloads should run. As these machine learning models learn from human feedback, they can improve at shifting workloads when costs spike, adapting to changing compliance rules or predicting outages. An essential element lies in monitoring for policy drift, misconfigurations or compliance violations across providers.
"These policies, effectively the guardrails for the intelligent agents, first require human expertise and judgement to set the priorities, exceptions and policies," said Khan.
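As a rough sketch of how human-set guardrails and automated recommendations might fit together, the Python example below filters candidate locations by policy first and only then ranks them by cost and latency. Everything in it is hypothetical: the `Site` and `Policy` types, the field names and the scoring weights are illustrative assumptions, not a description of any tool mentioned in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Site:
    """A candidate location for a workload (all values illustrative)."""
    name: str
    hourly_cost: float
    latency_ms: float
    certifications: frozenset


@dataclass(frozen=True)
class Policy:
    """Guardrails and priorities set by people, not by the model."""
    required_certifications: frozenset
    max_latency_ms: float
    cost_weight: float = 1.0
    latency_weight: float = 0.01


def recommend_site(sites: list, policy: Policy) -> Optional[Site]:
    """Return the lowest-scoring site that satisfies every guardrail,
    or None when no site is compliant."""
    eligible = [
        s for s in sites
        if policy.required_certifications <= s.certifications
        and s.latency_ms <= policy.max_latency_ms
    ]
    if not eligible:
        return None  # escalate to a human rather than break policy
    return min(
        eligible,
        key=lambda s: policy.cost_weight * s.hourly_cost
        + policy.latency_weight * s.latency_ms,
    )
```

Treating compliance as a hard filter rather than one more weighted term means a cheaper site can never outbid a policy violation.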
AI agents and copilot integration improve operational efficiency
Randy Armknecht, a managing director and global cloud practice leader at business advisory firm Protiviti, has been working with AI agents that assist with managing or securing infrastructure, reducing costs and supporting change management planning. This can bolster individuals and teams that have more expertise in one cloud platform among many in a hybrid scenario. The productivity gains are noticeable when AI assistants can provide detailed guidance, complete with citations to official documentation, based on their environment and use-case context across any cloud.
Reducing the friction for talented teams to collect, organize and analyze information enables the team to reach decisions quickly and with confidence. The relatively recent ability of models to properly cite their statements has gone a long way toward earning the trust of engineering teams. "Trust in the data provided drives confidence in data-driven decision making," said Armknecht.
Balancing human and AI expertise
The right balance between human expertise and AI in cloud security requires a partnership. "AI is the tireless digital watchtower, but the human analyst provides the wisdom and final judgment," said Sammons.
Here, the AI can analyze vast amounts of data to find patterns across diverse signals from infrastructure logs and metrics. Humans are required to investigate these AI-generated syntheses and summaries while considering the broader context. The combination can produce better decisions and actions than either AI or humans would reach alone.
Tim Beerman, the CTO at Ensono, a managed services provider and IT advisor, recommends several best practices for balancing human expertise with AI in cloud management to ensure reliability, ethics and effectiveness. The first is to identify the best roles for AI and humans based on current AI capabilities, current processes and team expertise. Humans tend to be better at interpreting nuanced anomalies and at strategic planning, while AI excels at routine tasks.
It's also important to look for more explainable AI models that are easy for humans to understand. When an AI tool makes a bad call, it should be easy for a human to override the decision and provide feedback to improve the AI. It's also important to consider best practices for upskilling teams on AI capabilities so that they can fine-tune them for their own workflows.
AI reshapes GRC
Armknecht stated that one of the most significant shifts has been the integration of AI into governance, risk management, and compliance (GRC) across cloud platforms. This enables smarter oversight. However, it is essential to keep humans in the loop for decisions that matter.
"AI should augment, not replace, human judgment," Armknecht said. His team has found success by embedding AI into workflows where it can surface insights. For example, AI can use assessor guidance and regulatory language to draft remediation actions based on the context of a specific finding. Final decisions are left to experienced practitioners. This ensures accountability while accelerating response times.
Redefining the cloud admin
AIOps has been a seminal milestone in AI-powered cloud management, thanks to its ability to apply machine learning for anomaly detection. In the past, operations teams would have to sift through mountains of logs and metrics. This often only started after an issue had affected users. "The sheer volume of data in modern cloud environments has made that manual approach impossible," said Sammons.
AIOps has enhanced the ability to detect subtle patterns and anomalies that are invisible to traditional manual analysis and can indicate potential problems long before they affect users. It also gave rise to the concept of a self-healing cloud that repairs operations by automatically restarting services and rerouting traffic.
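To make the idea of automated anomaly detection concrete, here is a minimal Python sketch that flags metric samples deviating sharply from a rolling baseline. It is a deliberately simple statistical stand-in, not how any particular AIOps product works: the function name, window size and z-score threshold are assumptions for illustration, and real platforms typically apply trained models across many correlated signals.

```python
from statistics import mean, stdev


def find_anomalies(values: list, window: int = 5,
                   z_threshold: float = 3.0) -> list:
    """Return the indexes of samples that sit more than z_threshold
    standard deviations from the mean of the preceding window."""
    anomalies = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        if sigma == 0:
            # Perfectly flat baseline: any change at all is unusual.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > z_threshold:
            anomalies.append(i)
    return anomalies
```

In a self-healing setup, an index returned here would trigger a known remediation such as a service restart, with novel cases routed to a human operator.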
Sammons has found that the most effective practice is to view AI as a powerful assistant to the operations team. AI helps automate data analysis and responses to known issues. Human experts remain essential for interpreting AI-generated findings, particularly in complex and novel situations. Their responses are then used to train more capable AI systems that generate more effective analysis and responses.
The future of cloud management in an AI-enhanced world
Experts predict that the future of cloud management will be more agentic and autonomous. The agentic aspects will make it easier to gather information and take action across a range of IT service management, security and cost analysis functions in a more cohesive way. The autonomous aspects will support more self-driving capabilities across increasingly complex tasks as the technology matures.
Innovations in agentic infrastructure are already enabling the interconnection of multiple information sources, including log data and alerts, user experience data, GRC tools and enterprise apps, as well as security and IT management tools, with generative AI systems.
Beerman, for instance, sees a growing role for AI agents in analyzing code and configuration settings for complex cloud infrastructure. One example is vulnerability detection, such as when Google's Big Sleep agent discovered a critical SQLite vulnerability in 2025, which was at risk of real-time exploitation.
"This marks a transition from reactive to predictive security, leveraging AI to analyze code and systems at scale faster than traditional methods," said Beerman. He expects AI agents to play a growing role in reducing detection times and potentially preventing breaches that could cost enterprises millions in downtime and recovery.
Sammons believes we are moving towards agentic cloud management where AI agents can not only predict needs, but also autonomously optimize configurations and identify security vulnerabilities. Improvements in AI autonomy will give humans more time to focus on high-level goals and guardrails, while autonomous agents focus on the complex, real-time analysis and actions required to meet those goals.
"AI agents will not only be able to detect threats but also take immediate action to contain and neutralize them," he said.
George Lawton is a journalist based in London. Over the last 30 years, he has written more than 3,000 stories about computers, communications, knowledge management, business, health and other areas that interest him.