AI observability: Why old monitoring fails in the GenAI era

Working with AI agents requires IT organizations to take a new approach to observability -- one that offers a better understanding of how agents work and when they drift.

One of the enterprise tech industry's most influential voices on all things AI called on IT pros to prepare to work with AI agents in the very near future, starting with changing how they think about observability.

The rapid shift of generative AI (GenAI) models from proof-of-concept into enterprise production is exposing critical flaws in traditional observability practices. According to independent analyst and Field CTO founder Andy Thurai, in a recent interview on the IT Ops Query podcast, the inherent differences between legacy IT systems and probabilistic AI models mean that new observability paradigms are urgently needed -- a need currently being met "poorly" across the industry.

Thurai explained that conventional IT systems are deterministic, which enables monitoring tools to reliably predict outcomes. AI models, however, are probabilistic, making the task of measuring response quality, identifying model drift and managing inference cost a far more complex challenge.

"Observability didn't exactly aid in the development of AI, but the development of AI requires a new kind of observability," Thurai said, noting that the "black box phenomenon" of these sophisticated models remains a central hurdle.

The liability of model drift

Andy Thurai, founder of Field CTO and independent analyst

One of the most immediate and costly challenges for GenAI in production is model drift. Traditional machine learning (ML) models, built on fixed datasets, tend to decay slowly. But GenAI, which interacts with rapidly changing real-world data, can quickly become obsolete or inaccurate.

Thurai stressed that when a model is responsible for business decisions, accuracy is paramount. He used a pricing example to illustrate the financial risk: "If you are, let's say, putting a model out, and if the model is making business decisions for you, [and] you're thinking the company is saving money … there is a possibility, by your model screwing up, that it could cost you money. It becomes a liability."

This liability necessitates an observability layer that can accurately detect when a model begins to negatively skew business outcomes, triggering mandatory retraining or withdrawal from production.
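To make the drift problem concrete, here is a minimal sketch of the kind of check such an observability layer might run: compare a baseline quality metric captured at deployment time against scores on recent production traffic, and flag the model when the drop exceeds a tolerance. The scoring approach, threshold and names here are illustrative assumptions, not a description of any particular product.

```python
"""Illustrative drift check: compares recent output-quality scores against a
baseline window and flags the model for review when quality degrades.
All names, thresholds and the scoring method are hypothetical."""

import statistics
from dataclasses import dataclass


@dataclass
class DriftReport:
    baseline_mean: float
    recent_mean: float
    degraded: bool


def check_quality_drift(
    baseline_scores: list[float],   # e.g., eval scores captured at deploy time
    recent_scores: list[float],     # scores on recent production traffic
    max_drop: float = 0.05,         # tolerated absolute drop before alerting (assumed threshold)
) -> DriftReport:
    baseline_mean = statistics.mean(baseline_scores)
    recent_mean = statistics.mean(recent_scores)
    degraded = (baseline_mean - recent_mean) > max_drop
    return DriftReport(baseline_mean, recent_mean, degraded)


if __name__ == "__main__":
    report = check_quality_drift(
        baseline_scores=[0.91, 0.93, 0.90, 0.92],
        recent_scores=[0.84, 0.82, 0.86, 0.85],
    )
    if report.degraded:
        print(f"Quality dropped from {report.baseline_mean:.2f} to "
              f"{report.recent_mean:.2f} -- flag for retraining or rollback")
```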

Navigating the 'best-of-breed' approach

When asked how organizations are tackling these new observability requirements -- which include monitoring token usage, inference costs and data lineage -- Thurai offered a candid assessment.

"Short answer is poorly -- very, very poorly," he said, explaining that no single, unified product currently meets the need for comprehensive GenAI observability.

Instead, enterprises are relying on "best-of-breed" platforms, stitching together tools for data observability, infrastructure performance and ML model drift detection. He noted that while vendors are actively trying to encroach on each other's spaces, a high-quality, all-in-one platform has yet to materialize.
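As an illustration of the kind of telemetry those stitched-together tools collect, the sketch below tallies token counts and estimated spend per model from individual inference calls. The model name and per-token prices are placeholders; real figures depend on the provider and model.

```python
"""Illustrative usage/cost tracker: accumulates token counts and estimated
spend per model. Model names and per-1K-token prices are placeholder values."""

from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical prices per 1,000 tokens; actual pricing varies by provider.
PRICE_PER_1K = {"example-llm": {"prompt": 0.003, "completion": 0.006}}


@dataclass
class UsageTracker:
    totals: dict = field(
        default_factory=lambda: defaultdict(lambda: {"tokens": 0, "cost": 0.0})
    )

    def record(self, model: str, prompt_tokens: int, completion_tokens: int) -> None:
        prices = PRICE_PER_1K[model]
        cost = (prompt_tokens / 1000) * prices["prompt"] \
             + (completion_tokens / 1000) * prices["completion"]
        entry = self.totals[model]
        entry["tokens"] += prompt_tokens + completion_tokens
        entry["cost"] += cost


if __name__ == "__main__":
    tracker = UsageTracker()
    tracker.record("example-llm", prompt_tokens=1200, completion_tokens=300)
    tracker.record("example-llm", prompt_tokens=800, completion_tokens=450)
    for model, entry in tracker.totals.items():
        print(f"{model}: {entry['tokens']} tokens, ~${entry['cost']:.4f} estimated spend")
```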

The rise of agent ops

Complexity is set to ramp up further with the introduction of AI agents -- autonomous units powered by large language models (LLMs) that can take action on IT systems, such as solving incidents or automating remediation. This shifts the focus from simple AIOps to "Agent Ops," the monitoring and lifecycle management of the agents themselves.


AI agents, like the models they rely on, are prone to drifting or skewing, requiring new oversight into their orchestration and decision-making chains. Thurai noted that some companies are beginning to view these agents as "digital workers" -- a cheaper alternative to human staff.

"Your first choice should be hiring a digital worker, and if you're refusing it, explain why," he said of the growing corporate sentiment.

Ultimately, whether the job involves autonomous agents or general LLM outputs, Thurai concluded that caution remains the best policy in this nascent field.

"It's dangerous if you're going to use AI and execute and implement [its] decision without validating that it could be dangerous," he warned, underscoring the high stakes as enterprises race to use AI's potential while managing its complex, subjective risks.

Editor's note: An editor used AI tools to aid in the generation of this article. Our expert editors always review and edit content before publishing.
