GenAI Ops and organizational hurdles to production
GenAI Ops is an emerging discipline every organization will have to face, further complicated by the emergence of AI agents, according to one industry expert.
Ready or not, here comes GenAI Ops, and it's about to put your organization's structure and operational strategy to the test.
IDC Senior Research Director Nancy Gohring refers to this comprehensive operational challenge as GenAI Operations (GenAI Ops) in an interview on the IT Ops Query podcast. GenAI Ops is distinct from MLOps, which covers the entire application lifecycle for machine learning workflows.
With GenAI, Gohring's focus is on the operations piece.
"Once we get this thing in production, what do we have to think about?" she said. "There's a lot to think about that's very different than the traditional non-GenAI application."
The organizational quagmire of GenAI operations
Unlike conventional applications, GenAI's large language models (LLMs) present unique risks related to the quality of their output. Gohring noted that model drift can occur for various reasons, such as a data pipeline breaking, which "causes the model to skew its output." Detecting that skew often falls to new personas, such as product managers, who are likely to notice degraded quality before an operations team does.
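One way to make that kind of skew visible to an operations team, rather than leaving it to a product manager's intuition, is to track a rolling quality score for model outputs and alert when it drops below an expected baseline. The following is a minimal sketch of that idea, not a description of any specific vendor's tooling; the scoring itself (human ratings, automated evals) is assumed to happen elsewhere.

```python
from collections import deque


class OutputQualityMonitor:
    """Tracks a rolling quality score for model outputs and flags skew.

    Assumes each output has already been scored in [0, 1] by some
    upstream evaluation step (human review, automated eval, etc.).
    """

    def __init__(self, window: int = 100, baseline: float = 0.90,
                 tolerance: float = 0.10):
        self.scores = deque(maxlen=window)  # most recent scores only
        self.baseline = baseline            # expected mean quality
        self.tolerance = tolerance          # allowed drop before alerting

    def record(self, score: float) -> None:
        self.scores.append(score)

    def is_skewed(self) -> bool:
        """True if the rolling mean has fallen below baseline - tolerance."""
        if not self.scores:
            return False
        mean = sum(self.scores) / len(self.scores)
        return mean < self.baseline - self.tolerance
```

A broken data pipeline would show up here as a sustained drop in the rolling mean, turning a quietly degrading model into an explicit alert.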
Thus, the deployment of GenAI is forcing organizations to reassess ownership and responsibility within IT. Gohring highlighted the shifting organizational boundaries required to manage these applications, noting the confusion over who owns which part of the performance stack.
"You have data scientists, potentially, you have AI engineers, and then you also probably have traditional developers. And so when it comes to operations, who does what?" Gohring asked.
She explained that while traditional operations staff typically handle latency and error rates, data scientists and AI engineers are often involved in monitoring quality, accuracy and hallucination, topics that fall outside the traditional operations skill set. This requires a new, centralized approach to governance, security and granular FinOps, though she noted that she's "not really seeing best practices that are very enshrined" in early adopter organizations just yet.
Controlling the agentic Wild West
The GenAI Ops problem is further complicated by the rapid emergence of AI agents, autonomous units that can execute multi-step actions using LLMs. Enterprise adoption is slowed further by concerns about a profound lack of insight into these agentic workflows, Gohring said.
"Is this agent going to keep working and working and hitting LLMs a million times, then I'm going to get this insane bill for something that's kind of a stupid task?" she asked.
Gohring explained that collecting data on what an agent is doing is the necessary first step to implementing cost controls and preventing runaway bills. She also noted that work is ongoing within groups like OpenTelemetry to extend the standard, "particularly around agents and agentic workflows."
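The OpenTelemetry conventions for agents are still evolving, but the kind of per-call data that makes runaway bills visible is straightforward. The sketch below is a plain-Python illustration, not OpenTelemetry API code; the field names and pricing scheme are assumptions for the example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class AgentCallRecord:
    """One LLM call made by an agent: the minimum worth logging."""
    model: str
    prompt_tokens: int
    completion_tokens: int
    cost_usd: float
    timestamp: float = field(default_factory=time.time)


class AgentTelemetry:
    """Accumulates per-call records so runaway loops show up in the data."""

    def __init__(self):
        self.calls: list[AgentCallRecord] = []

    def record(self, model: str, prompt_tokens: int,
               completion_tokens: int, price_per_1k_tokens: float) -> None:
        # Simplified pricing: one flat per-1k-token rate per model.
        cost = (prompt_tokens + completion_tokens) / 1000 * price_per_1k_tokens
        self.calls.append(
            AgentCallRecord(model, prompt_tokens, completion_tokens, cost))

    def call_count(self) -> int:
        return len(self.calls)

    def total_cost(self) -> float:
        return sum(c.cost_usd for c in self.calls)
```

With totals like these collected per task, an "agent hit the LLM a million times" scenario becomes an alertable number rather than a surprise on the invoice.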
By instrumenting and controlling agents, organizations can apply automation controls to reduce these risks, such as routing low-value requests to cheaper LLMs or capping the number of agent "turns" permitted in response to a task.
The path to centralized AI observability
Gohring suggested that the market for GenAI observability tools may follow an evolutionary path similar to traditional observability, with consolidation eventually occurring between model evaluation tools and production monitoring platforms. She predicted a "very logical connection" between pre-production evaluation and production monitoring, especially as outputs from production are used to refine and re-test models.
For now, however, establishing an AI center of excellence is proving useful for organizations that need a centralized approach and an adaptable strategy to handle the rapid, fundamental changes in AI technology.
Editor's note: An editor used AI tools to aid in the generation of this article. Our expert editors always review and edit content before publishing.