KubeCon EU 2026: AI tightens the dev-prod loop

AI is reshaping platform engineering. At KubeCon EU 2026, vendors showcased autonomous troubleshooting, observability-driven agents and tools to manage shadow AI.

Development teams need platforms that help them build faster and more cost-efficiently than the competition.

Increasingly, platform teams are turning to AI to close the gap between what developers need and what infrastructure delivers. That means automating troubleshooting so incidents resolve in minutes instead of hours, governing model usage so AI adoption doesn't become a shadow IT problem, and giving AI agents the observability data they need to reason about production systems autonomously.

At KubeCon EU 2026 in Amsterdam, that's exactly what the vendor ecosystem was building toward in the platform layer, from observability and troubleshooting to feature flag management.

In the first half of our two-part KubeCon roundup series, we explore how vendors are using AI to close the feedback loop between production systems and developer action.

Autonomous troubleshooting: Four approaches, one direction

Developer platforms focus on closing the feedback loop between production performance and developer action. The faster a team can go from "something broke" to "here's what happened and how to fix it," the faster they ship. That only works when developers and operators share the same platform -- when observability, troubleshooting and deployment control exist on a single surface rather than in a chain of handoffs. Four vendors at KubeCon showed distinct approaches to making that happen.

From AI copilot to AI colleague: When the agent closes the investigation for you

Martin Mao, CEO at Chronosphere, described how its AI-guided troubleshooting platform has moved from single-threaded to parallel investigation. The previous release would take one step at a time, let the human review that step, then move on to the next. Now, it explores everything in the knowledge graph in parallel. "It's almost trying to go down multiple hypothesis paths in parallel, plus it can do all the analysis there for you and come back with a conclusion," Mao said. End users are much more accepting of getting the whole analysis at once rather than stepping through it piece by piece, he added.
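
The parallel-hypothesis pattern Mao describes can be sketched in a few lines. This is a minimal illustration, not Chronosphere's implementation: the hypothesis checks and the incident fields (`deploys`, `cpu_pct`, `upstream_errors`) are hypothetical stand-ins for queries against a real knowledge graph.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical hypothesis checks; a real system would query a knowledge
# graph of metrics, traces and topology for each one.
def check_deploy_regression(incident):
    return {"hypothesis": "recent deploy", "evidence": incident["deploys"] > 0}

def check_resource_exhaustion(incident):
    return {"hypothesis": "resource exhaustion", "evidence": incident["cpu_pct"] > 90}

def check_upstream_failure(incident):
    return {"hypothesis": "upstream dependency", "evidence": incident["upstream_errors"] > 0}

def investigate(incident):
    """Run all hypothesis paths in parallel and return one conclusion."""
    checks = [check_deploy_regression, check_resource_exhaustion, check_upstream_failure]
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        results = list(pool.map(lambda check: check(incident), checks))
    supported = [r["hypothesis"] for r in results if r["evidence"]]
    return supported or ["inconclusive"]

print(investigate({"deploys": 1, "cpu_pct": 45, "upstream_errors": 0}))
# ['recent deploy']
```

The point of the design is that no path blocks another: all hypotheses are tested concurrently and the user sees a single conclusion rather than a step-by-step walkthrough.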

That's a meaningful shift from an AI copilot that needs constant prompting to something closer to an AI colleague that closes out investigations on its own. Since Palo Alto Networks' $3.35 billion acquisition of Chronosphere closed in January 2026, this capability slots directly into Cortex AgentiX, Palo Alto's agentic SOC platform. The strategic play: close the detection, investigation and remediation loop autonomously, with Chronosphere providing the observability half of that equation. Together, this brings the enterprise closer to the formerly elusive vision of autonomous operations.

Taking cognitive load to zero with domain-specific agents

SUSE's cloud-native developer platform received an upgrade to Liz, its Rancher Prime AI assistant. Liz now has access to a crew of specialized agents responsible for observability, security, virtualization, fleet management and Linux configuration, with the goal of "taking cognitive load down almost to zero," said Peter Smails, GM of Cloud Native Platforms at SUSE.

The specialization enables depth. The observability agent is built on the topology-aware engine from StackState -- the Dutch startup SUSE acquired in mid-2024 -- with a "Time Traveling Topology" that lets it compare system state before and during an incident, providing a grounded chain of evidence rather than a statistical guess. A generalist agent couldn't be that deeply grounded in every domain. The crew also supports the Model Context Protocol (MCP), so third-party tools can plug in without custom integration code.
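
The before-versus-during comparison at the heart of that approach reduces to a diff over topology snapshots. A minimal sketch, assuming a hypothetical service-to-dependencies representation (not StackState's actual data model):

```python
# Hypothetical topology snapshots: service -> set of dependencies.
before = {
    "checkout": {"payments", "inventory"},
    "payments": {"postgres"},
}
during = {
    "checkout": {"payments", "inventory"},
    "payments": {"postgres", "fraud-api"},  # new dependency appeared mid-incident
}

def topology_diff(before, during):
    """Return the dependency edges added or removed between two snapshots."""
    def edges(topo):
        return {(svc, dep) for svc, deps in topo.items() for dep in deps}
    return {"added": edges(during) - edges(before),
            "removed": edges(before) - edges(during)}

print(topology_diff(before, during))
# {'added': {('payments', 'fraud-api')}, 'removed': set()}
```

The evidence is structural, not statistical: the agent can point to exactly which edge appeared or disappeared when the incident began.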

When eBPF meets OpenTelemetry: Giving coding agents a window into production

"Observability is turning into the operating grid for the kind of modern AI-native software development cycle," said Shahar Azalay, groundcover's CEO. "groundcover can now delegate the coding agents' context for production, so you can build better."

The underlying data fusion is what makes this work. An eBPF sensor captures kernel-level telemetry automatically -- full request/response payloads, headers, query parameters -- while OpenTelemetry handles distributed tracing. The two streams get stitched through Kubernetes metadata, so an OTel trace gets enriched with actual HTTP payloads, user IDs, and cross-AZ indicators. On top of that combined data layer, agent mode lets developers "troubleshoot and build inside the platform" and "hotfix better with the actual context for production" -- all running natively in the customer's own cloud via their bring-your-own-cloud approach.
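The stitching step can be illustrated with a stdlib-only sketch. The record shapes and field names here are hypothetical, not groundcover's schema; the idea is simply that eBPF events and OTel spans are joined on shared Kubernetes metadata (the pod) plus time overlap.

```python
# Hypothetical records: one from an eBPF sensor, one from an OTel trace.
ebpf_events = [
    {"pod": "checkout-7f9c", "ts": 1710000001.2,
     "http_path": "/cart", "payload": '{"user_id": "u-42"}', "az": "eu-west-1a"},
]
otel_spans = [
    {"trace_id": "abc123", "pod": "checkout-7f9c",
     "start": 1710000001.1, "end": 1710000001.4},
]

def enrich(spans, events):
    """Attach kernel-level payloads to spans sharing a pod and overlapping in time."""
    out = []
    for span in spans:
        enriched = dict(span)
        for ev in events:
            if ev["pod"] == span["pod"] and span["start"] <= ev["ts"] <= span["end"]:
                enriched.update(payload=ev["payload"], az=ev["az"])
        out.append(enriched)
    return out

print(enrich(otel_spans, ebpf_events)[0]["payload"])
# {"user_id": "u-42"}
```

The result is a trace a coding agent can actually reason about: not just "this span was slow," but the request body, the user and the availability zone involved.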

groundcover is betting that the observability platform becomes where coding agents go to understand what's actually happening in production.

From passive monitoring to active deployment control

Dynatrace is closing the feedback loop from a different angle entirely -- by making feature rollouts themselves observable. In January 2026, Dynatrace acquired DevCycle, a Toronto-based feature management platform built on the OpenFeature standard -- now a CNCF project -- and is integrating it directly into the observability platform. As a result, flag-evaluation events stream into Dynatrace in real time, so when a new feature causes a spike in errors or latency, the platform can pinpoint which specific toggle is driving the incident, without anyone digging through commits or deployment logs. That turns feature flags from a developer convenience into an observable runtime primitive.

The practical implication for development velocity is significant. Teams can roll out features progressively -- canary deployments, targeted user cohorts, percentage-based ramps -- with the observability platform watching for degradation and triggering automated rollbacks when things go wrong. In a world where AI-generated code is accelerating the pace of commits, that safety net becomes essential. You can ship faster because the platform catches problems before they reach everyone. Dynatrace is effectively converting itself from passive monitoring into an active control plane that governs not just what's happening in production, but what's allowed to stay there.
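The core correlation is simple to sketch: group flag-evaluation events by flag and variant, compute an error rate per group, and flag any variant that exceeds a threshold as a rollback candidate. This is an illustrative stdlib-only sketch, not the DevCycle or Dynatrace API; event fields and the threshold are assumptions.

```python
from collections import defaultdict

# Hypothetical flag-evaluation events, each tagged with whether the
# request it served ended in an error.
events = [
    {"flag": "new-checkout", "variant": "on", "error": True},
    {"flag": "new-checkout", "variant": "on", "error": True},
    {"flag": "new-checkout", "variant": "off", "error": False},
    {"flag": "dark-mode", "variant": "on", "error": False},
]

def flags_to_roll_back(events, max_error_rate=0.5):
    """Pinpoint (flag, variant) pairs whose error rate exceeds the threshold."""
    totals, errors = defaultdict(int), defaultdict(int)
    for ev in events:
        key = (ev["flag"], ev["variant"])
        totals[key] += 1
        errors[key] += ev["error"]
    return [key for key in totals if errors[key] / totals[key] > max_error_rate]

print(flags_to_roll_back(events))
# [('new-checkout', 'on')]
```

Because evaluation events carry the variant, the platform can distinguish "the new checkout is broken" from "checkout is broken," and roll back only the offending toggle.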

No more AI graveyard: Bringing shadow AI into the enterprise fold

Autonomous troubleshooting is one thing, but platform teams face an equally urgent problem. Developers are already using AI models across the organization, and nobody has full visibility into what's being spent, what data is leaving the building, or which models are actually being used. At the same time, the cost of running these workloads is disconnected from the performance and security data that should inform optimization decisions. Let's take a look at two vendors attacking this challenge.

Bringing the value of shadow AI to the enterprise

CAST AI CTO Leon Kuperman introduced Kimchi at KubeCon this year. "Kimchi is a new product that allows ML engineers and traditional engineers -- DevOps engineers -- to start using our open-source coding models and reasoning models in conjunction with higher-tier models like Opus or Codex," said Kuperman. The idea: route between 50+ models through a single OpenAI SDK-compatible API so you can "drastically increase your token usage without worrying about the costs because these tokens are literally a tenth of the price." For agentic coding workflows where, as Kuperman noted, "the amount of input tokens is something like 280 to 1 relative to output tokens," that model routing matters.
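A routing policy along these lines can be sketched as a simple heuristic: send input-heavy agentic steps to a cheap open model and reserve the premium model for the rest. Model names, prices and the ratio cutoff below are illustrative assumptions, not Kimchi's actual policy.

```python
# Hypothetical routing table: model names and per-million-token prices
# are illustrative, not CAST AI's catalog.
MODELS = {
    "open-coder-cheap": {"usd_per_mtok": 0.15},
    "premium-reasoner": {"usd_per_mtok": 1.50},
}

def route(task):
    """Send bulk, input-heavy steps to the cheap open model; reserve the
    premium model for low-volume reasoning steps."""
    ratio = task["input_tokens"] / max(task["output_tokens"], 1)
    return "open-coder-cheap" if ratio > 50 else "premium-reasoner"

# An agentic coding step with the ~280:1 input/output ratio Kuperman cites:
print(route({"input_tokens": 280_000, "output_tokens": 1_000}))
# open-coder-cheap
```

With input tokens dominating by two orders of magnitude, even a crude policy like this shifts the bulk of spend onto tokens that are, in Kuperman's words, a tenth of the price.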

But the bigger story is governance. Engineers across most organizations are spinning up AI coding assistants with their own API keys and zero centralized visibility. Kimchi routes all of that through a single gateway with per-engineer, per-team and per-project attribution, with models deployable in the customer's own VPC. It's the infrastructure layer that turns shadow AI into managed AI -- giving platform teams visibility without blocking engineers from using the tools they want.
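Once every request flows through one gateway, attribution is just an aggregation over the gateway's request log. A minimal sketch, assuming hypothetical log fields (not Kimchi's schema):

```python
from collections import Counter

def attribute(requests):
    """Tally token usage per engineer, team and project from gateway logs."""
    usage = Counter()
    for req in requests:
        for dim in ("engineer", "team", "project"):
            usage[(dim, req[dim])] += req["tokens"]
    return usage

requests = [
    {"engineer": "ana", "team": "platform", "project": "billing", "tokens": 1200},
    {"engineer": "bo",  "team": "platform", "project": "search",  "tokens": 800},
]
print(attribute(requests)[("team", "platform")])
# 2000
```

The same tally, keyed by engineer or project instead of team, gives platform teams the visibility that per-developer API keys never could.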

When cost, performance and security finally talk to each other

IBM brought what its portfolio product manager, Lorcan Cooke, called a response to "all the feedback that we're getting from the analysts, from our customers" -- an integrated experience across Concert, Instana, Kubecost and Turbonomic with "correlations across the products." IBM assembled these pieces through acquisitions (Instana in 2020, Turbonomic in 2021, Kubecost in September 2024), and the problem has always been that they operated as separate products with separate UIs.

What's shipping now is bidirectional integration. Turbonomic ingests Instana's application telemetry for cost-aware optimization with real performance context, Kubecost feeds real-time Kubernetes cost data into both, and Concert correlates CVE exposure from Instana's topology maps with resilience assessments. They're also building agent observability into watsonx Orchestrate using OpenTelemetry standards -- positioning observability as the data backbone for agentic automation, not just monitoring.

Torsten Volk is principal analyst at Omdia covering application modernization, cloud-native applications, DevOps, hybrid cloud and observability.
Omdia is a division of Informa TechTarget. Its analysts have business relationships with technology vendors.
