KubeCon EU 2026: AI tightens the dev-prod loop

AI is reshaping platform engineering. At KubeCon EU 2026, vendors showcased autonomous troubleshooting, observability-driven agents and tools to manage shadow AI.

Development teams need platforms that help them build faster and more cost-efficiently than the competition.

Increasingly, platform teams are turning to AI to close the gap between what developers need and what infrastructure delivers. That means automating troubleshooting so incidents resolve in minutes instead of hours, governing model usage so AI adoption doesn't become a shadow IT problem, and giving AI agents the observability data they need to reason about production systems autonomously.

At KubeCon EU 2026 in Amsterdam, that's exactly what the vendor ecosystem was building toward in the platform layer, from observability and troubleshooting to feature flag management.

In the first half of our two-part KubeCon roundup series, we explore how vendors are using AI to close the feedback loop between production systems and developer action.

Autonomous troubleshooting: Four approaches, one direction

Developer platforms focus on closing the feedback loop between production performance and developer action. The faster a team can go from "something broke" to "here's what happened and how to fix it," the faster they ship. That only works when developers and operators share the same platform -- when observability, troubleshooting and deployment control exist on a single surface rather than in a chain of handoffs. Four vendors at KubeCon showed distinct approaches to making that happen.

From AI copilot to AI colleague: When the agent closes the investigation for you

Martin Mao, CEO at Chronosphere, described how its AI-guided troubleshooting platform has moved from single-threaded to parallel investigation. The previous release would take one step at a time, let the human review that step, then move on to the next. Now, it explores everything in the knowledge graph in parallel. "It's almost trying to go down multiple hypothesis paths in parallel, plus it can do all the analysis there for you and come back with a conclusion," Mao said. End users are much more accepting of getting the whole analysis at once rather than stepping through it piece by piece, he added.
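
The parallel-hypothesis pattern Mao describes can be sketched in a few lines. This is a minimal illustration, not Chronosphere's implementation: the hypothesis checks and the incident fields (`deploys`, `cpu_pct`, `upstream_errors`) are hypothetical stand-ins for queries against a real knowledge graph.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical hypothesis checks; a real system would query a knowledge
# graph of metrics, traces and topology for each one.
def check_deploy_regression(incident):
    return {"hypothesis": "recent deploy", "evidence": incident["deploys"] > 0}

def check_resource_exhaustion(incident):
    return {"hypothesis": "resource exhaustion", "evidence": incident["cpu_pct"] > 90}

def check_upstream_failure(incident):
    return {"hypothesis": "upstream dependency", "evidence": incident["upstream_errors"] > 0}

def investigate(incident):
    """Run all hypothesis paths in parallel and return one conclusion."""
    checks = [check_deploy_regression, check_resource_exhaustion, check_upstream_failure]
    with ThreadPoolExecutor(max_workers=len(checks)) as pool:
        results = list(pool.map(lambda check: check(incident), checks))
    supported = [r["hypothesis"] for r in results if r["evidence"]]
    return supported or ["inconclusive"]

print(investigate({"deploys": 1, "cpu_pct": 45, "upstream_errors": 0}))
# ['recent deploy']
```

The point of the design is that no path blocks another: all hypotheses are tested concurrently and the user sees a single conclusion rather than a step-by-step walkthrough.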

That's a meaningful shift from an AI copilot that needs constant prompting to something closer to an AI colleague that closes out investigations on its own. Since Palo Alto Networks' $3.35 billion acquisition of Chronosphere closed in January 2026, this capability slots directly into Cortex AgentiX, Palo Alto's agentic SOC platform. The strategic play: close the detection, investigation and remediation loop autonomously, with Chronosphere providing the observability half of that equation. Together, this brings the enterprise closer to the formerly elusive vision of autonomous operations.

Taking cognitive load to zero with domain-specific agents

SUSE's cloud-native developer platform received an upgrade to Liz, its Rancher Prime AI assistant. Liz now has access to a crew of specialized agents responsible for observability, security, virtualization, fleet management and Linux configuration, with the goal of "taking cognitive load down almost to zero," said Peter Smails, GM of Cloud Native Platforms at SUSE.

The specialization enables depth. The observability agent is built on the topology-aware engine from StackState -- the Dutch startup SUSE acquired in mid-2024 -- with a "Time Traveling Topology" that lets it compare system state before and during an incident, providing a grounded chain of evidence rather than a statistical guess. A generalist agent couldn't be that deeply grounded in every domain. The crew also supports the Model Context Protocol (MCP), so third-party tools can plug in without custom integration code.
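
The before-versus-during comparison at the heart of that approach reduces to a diff over topology snapshots. A minimal sketch, assuming a hypothetical service-to-dependencies representation (not StackState's actual data model):

```python
# Hypothetical topology snapshots: service -> set of dependencies.
before = {
    "checkout": {"payments", "inventory"},
    "payments": {"postgres"},
}
during = {
    "checkout": {"payments", "inventory"},
    "payments": {"postgres", "fraud-api"},  # new dependency appeared mid-incident
}

def topology_diff(before, during):
    """Return the dependency edges added or removed between two snapshots."""
    def edges(topo):
        return {(svc, dep) for svc, deps in topo.items() for dep in deps}
    return {"added": edges(during) - edges(before),
            "removed": edges(before) - edges(during)}

print(topology_diff(before, during))
# {'added': {('payments', 'fraud-api')}, 'removed': set()}
```

The evidence is structural, not statistical: the agent can point to exactly which edge appeared or disappeared when the incident began.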

When eBPF meets OpenTelemetry: Giving coding agents a window into production

"Observability is turning into the operating grid for the kind of modern AI-native software development cycle," said Shahar Azalay, groundcover's CEO. "groundcover can now delegate the coding agents' context for production, so you can build better."

The underlying data fusion is what makes this work. An eBPF sensor captures kernel-level telemetry automatically -- full request/response payloads, headers, query parameters -- while OpenTelemetry handles distributed tracing. The two streams get stitched through Kubernetes metadata, so an OTel trace gets enriched with actual HTTP payloads, user IDs, and cross-AZ indicators. On top of that combined data layer, agent mode lets developers "troubleshoot and build inside the platform" and "hotfix better with the actual context for production" -- all running natively in the customer's own cloud via their bring-your-own-cloud approach.
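The stitching step can be illustrated with a stdlib-only sketch. The record shapes and field names here are hypothetical, not groundcover's schema; the idea is simply that eBPF events and OTel spans are joined on shared Kubernetes metadata (the pod) plus time overlap.

```python
# Hypothetical records: one from an eBPF sensor, one from an OTel trace.
ebpf_events = [
    {"pod": "checkout-7f9c", "ts": 1710000001.2,
     "http_path": "/cart", "payload": '{"user_id": "u-42"}', "az": "eu-west-1a"},
]
otel_spans = [
    {"trace_id": "abc123", "pod": "checkout-7f9c",
     "start": 1710000001.1, "end": 1710000001.4},
]

def enrich(spans, events):
    """Attach kernel-level payloads to spans sharing a pod and overlapping in time."""
    out = []
    for span in spans:
        enriched = dict(span)
        for ev in events:
            if ev["pod"] == span["pod"] and span["start"] <= ev["ts"] <= span["end"]:
                enriched.update(payload=ev["payload"], az=ev["az"])
        out.append(enriched)
    return out

print(enrich(otel_spans, ebpf_events)[0]["payload"])
# {"user_id": "u-42"}
```

The result is a trace a coding agent can actually reason about: not just "this span was slow," but the request body, the user and the availability zone involved.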

groundcover is betting that the observability platform becomes where coding agents go to understand what's actually happening in production.

From passive monitoring to active deployment control

Dynatrace is closing the feedback loop from a different angle entirely -- by making feature rollouts themselves observable. In January 2026, Dynatrace acquired DevCycle, a Toronto-based feature management platform built on the OpenFeature standard -- now a CNCF project -- and is integrating it directly into the observability platform. As a result, flag-evaluation events stream into Dynatrace in real time, so when a new feature causes a spike in errors or latency, the platform can pinpoint which specific toggle is driving the incident, without anyone digging through commits or deployment logs. That turns feature flags from a developer convenience into an observable runtime primitive.

The practical implication for development velocity is significant. Teams can roll out features progressively -- canary deployments, targeted user cohorts, percentage-based ramps -- with the observability platform watching for degradation and triggering automated rollbacks when things go wrong. In a world where AI-generated code is accelerating the pace of commits, that safety net becomes essential. You can ship faster because the platform catches problems before they reach everyone. Dynatrace is effectively converting itself from passive monitoring into an active control plane that governs not just what's happening in production, but what's allowed to stay there.
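The core correlation is simple to sketch: group flag-evaluation events by flag and variant, compute an error rate per group, and flag any variant that exceeds a threshold as a rollback candidate. This is an illustrative stdlib-only sketch, not the DevCycle or Dynatrace API; event fields and the threshold are assumptions.

```python
from collections import defaultdict

# Hypothetical flag-evaluation events, each tagged with whether the
# request it served ended in an error.
events = [
    {"flag": "new-checkout", "variant": "on", "error": True},
    {"flag": "new-checkout", "variant": "on", "error": True},
    {"flag": "new-checkout", "variant": "off", "error": False},
    {"flag": "dark-mode", "variant": "on", "error": False},
]

def flags_to_roll_back(events, max_error_rate=0.5):
    """Pinpoint (flag, variant) pairs whose error rate exceeds the threshold."""
    totals, errors = defaultdict(int), defaultdict(int)
    for ev in events:
        key = (ev["flag"], ev["variant"])
        totals[key] += 1
        errors[key] += ev["error"]
    return [key for key in totals if errors[key] / totals[key] > max_error_rate]

print(flags_to_roll_back(events))
# [('new-checkout', 'on')]
```

Because evaluation events carry the variant, the platform can distinguish "the new checkout is broken" from "checkout is broken," and roll back only the offending toggle.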

No more AI graveyard: Bringing shadow AI into the enterprise fold

Autonomous troubleshooting is one thing, but platform teams face an equally urgent problem. Developers are already using AI models across the organization, and nobody has full visibility into what's being spent, what data is leaving the building, or which models are actually being used. At the same time, the cost of running these workloads is disconnected from the performance and security data that should inform optimization decisions. Let's take a look at two vendors attacking this challenge.

Bringing the value of shadow AI to the enterprise

CAST AI CTO Leon Kuperman introduced Kimchi at KubeCon this year. "Kimchi is a new product that allows ML engineers and traditional engineers -- DevOps engineers -- to start using our open-source coding models and reasoning models in conjunction with higher-tier models like Opus or Codex," said Kuperman. The idea: route between 50+ models through a single OpenAI SDK-compatible API so you can "drastically increase your token usage without worrying about the costs because these tokens are literally a tenth of the price." For agentic coding workflows where, as Kuperman noted, "the amount of input tokens is something like 280 to 1 relative to output tokens," that model routing matters.
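A routing policy along these lines can be sketched as a simple heuristic: send input-heavy agentic steps to a cheap open model and reserve the premium model for the rest. Model names, prices and the ratio cutoff below are illustrative assumptions, not Kimchi's actual policy.

```python
# Hypothetical routing table: model names and per-million-token prices
# are illustrative, not CAST AI's catalog.
MODELS = {
    "open-coder-cheap": {"usd_per_mtok": 0.15},
    "premium-reasoner": {"usd_per_mtok": 1.50},
}

def route(task):
    """Send bulk, input-heavy steps to the cheap open model; reserve the
    premium model for low-volume reasoning steps."""
    ratio = task["input_tokens"] / max(task["output_tokens"], 1)
    return "open-coder-cheap" if ratio > 50 else "premium-reasoner"

# An agentic coding step with the ~280:1 input/output ratio Kuperman cites:
print(route({"input_tokens": 280_000, "output_tokens": 1_000}))
# open-coder-cheap
```

With input tokens dominating by two orders of magnitude, even a crude policy like this shifts the bulk of spend onto tokens that are, in Kuperman's words, a tenth of the price.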

But the bigger story is governance. Engineers across most organizations are spinning up AI coding assistants with their own API keys and zero centralized visibility. Kimchi routes all of that through a single gateway with per-engineer, per-team and per-project attribution, with models deployable in the customer's own VPC. It's the infrastructure layer that turns shadow AI into managed AI -- giving platform teams visibility without blocking engineers from using the tools they want.
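Once every request flows through one gateway, attribution is just an aggregation over the gateway's request log. A minimal sketch, assuming hypothetical log fields (not Kimchi's schema):

```python
from collections import Counter

def attribute(requests):
    """Tally token usage per engineer, team and project from gateway logs."""
    usage = Counter()
    for req in requests:
        for dim in ("engineer", "team", "project"):
            usage[(dim, req[dim])] += req["tokens"]
    return usage

requests = [
    {"engineer": "ana", "team": "platform", "project": "billing", "tokens": 1200},
    {"engineer": "bo",  "team": "platform", "project": "search",  "tokens": 800},
]
print(attribute(requests)[("team", "platform")])
# 2000
```

The same tally, keyed by engineer or project instead of team, gives platform teams the visibility that per-developer API keys never could.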

When cost, performance and security finally talk to each other

IBM brought what its portfolio product manager, Lorcan Cooke, called a response to "all the feedback that we're getting from the analysts, from our customers" -- an integrated experience across Concert, Instana, Kubecost and Turbonomic with "correlations across the products." IBM assembled these pieces through acquisitions (Instana in 2020, Turbonomic in 2021, Kubecost in September 2024), and the problem has always been that they operated as separate products with separate UIs.

What's shipping now is bidirectional integration. Turbonomic ingests Instana's application telemetry for cost-aware optimization with real performance context, Kubecost feeds real-time Kubernetes cost data into both, and Concert correlates CVE exposure from Instana's topology maps with resilience assessments. They're also building agent observability into watsonx Orchestrate using OpenTelemetry standards -- positioning observability as the data backbone for agentic automation, not just monitoring.

Torsten Volk is principal analyst at Omdia covering application modernization, cloud-native applications, DevOps, hybrid cloud and observability.
Omdia is a division of Informa TechTarget. Its analysts have business relationships with technology vendors.
