KubeCon EU 2026: Infrastructure catches up to AI

AI is reshaping infrastructure demands. At KubeCon EU 2026, vendors focused on Kubernetes abstraction, GPU management and automation to move AI from pilot to production.

As AI-driven workflows move from experimentation into production, infrastructure is becoming a limiting factor.

Platform teams must now ensure that what infrastructure delivers -- from Kubernetes abstractions to GPU orchestration and workload isolation -- can support the scale and complexity of AI workloads.

At KubeCon EU 2026 in Amsterdam, vendors showcased what they were building across the infrastructure stack, from infrastructure-as-code and Kubernetes abstraction to GPU virtualization and bare metal automation.

In the first part of our KubeCon roundup, we covered observability agents and shadow AI. In this part we'll explore how infrastructure is evolving to support AI-driven development in production.

Making infrastructure disappear for developers

The tools that sit on top of infrastructure only work if the infrastructure itself is simple enough to consume. That's still not the case for most organizations -- platform teams spend too much time wiring up Kubernetes, writing IaC, patching base images and maintaining abstractions that leak. Three vendors at KubeCon were attacking this from different directions: making infrastructure generation AI-native, making Kubernetes opinionated enough that developers never touch it directly, and hardening the container base layer for the long haul.

Infrastructure-as-Code meets AI-as-workflow

Spacelift launched Spacelift Intelligence, which co-founder Marcin Wyszynski described as allowing you to "query infrastructure, refactor infrastructure, and build infrastructure at the speed in which you can type in your prompt." What makes this different from pasting Terraform into ChatGPT is context.


Spacelift already orchestrates Terraform, OpenTofu, Pulumi, Ansible, CloudFormation and custom scripts (bash, PowerShell), so Intelligence has awareness of your existing stacks, state files, modules and run history. The guardrail layer is equally important -- Open Policy Agent with Rego policies enforces rules at every stage. Those same policies apply to anything Intelligence generates. This is AI acting as a workflow participant inside the governance machinery, not a code generator sitting outside it.
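To make that governance point concrete, a plan policy in this style can block destructive changes before they apply -- whether a human or Intelligence wrote the plan. This is a minimal sketch, not Spacelift's actual policy schema; the package name and the `input.terraform.resource_changes` shape are assumptions based on how such policies mirror Terraform's JSON plan output:

```rego
# Sketch of an OPA plan policy: reject any run that would delete an S3 bucket.
# Input shape assumed to follow Terraform's JSON plan format.
package spacelift

deny[msg] {
  change := input.terraform.resource_changes[_]
  change.type == "aws_s3_bucket"
  change.change.actions[_] == "delete"
  msg := sprintf("plan would delete bucket %s", [change.address])
}
```

Because the policy evaluates the plan rather than the source, it applies identically to hand-written and AI-generated infrastructure.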

Making Kubernetes invisible to the developer

AWS focused on abstracting away Kubernetes complexity to allow developers to build and deploy applications without having to learn the intricacies of Kubernetes itself. Their EKS Capabilities framework bundles managed Argo CD, KRO and ACK so platform teams can offer self-service deployment, resource composition, and AWS integration while application developers just consume the abstractions.
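KRO is the composition piece: platform teams define a resource graph once, and developers consume it as a simple custom API. A minimal sketch of a ResourceGraphDefinition -- the field layout follows KRO's documented shape, but the names, schema fields and defaults here are invented for illustration:

```yaml
# Illustrative only: a platform team exposes "WebApp" as a one-field API
# that expands into a full Deployment behind the scenes.
apiVersion: kro.run/v1alpha1
kind: ResourceGraphDefinition
metadata:
  name: webapp
spec:
  schema:
    apiVersion: v1alpha1
    kind: WebApp
    spec:
      name: string
      replicas: integer | default=2
  resources:
    - id: deployment
      template:
        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: ${schema.spec.name}
        spec:
          replicas: ${schema.spec.replicas}
          # selector and pod template omitted for brevity
```

A developer then creates a two-line `WebApp` object and never sees the Deployment underneath.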

Sai Vennam, Principal Solutions Architect at AWS, described the framework as "three capabilities that, working together, allow customers to deploy applications quicker."

Cedar takes this further by unifying authorization into a single auditable language, and the Node Monitoring Agent automates node health detection and repair. AWS is betting that developers shouldn't need to understand Kubernetes to benefit from it: platform teams handle that complexity so developers can focus on code.
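Cedar policies read as explicit permit/forbid statements over principals, actions and resources, which is what makes them auditable. A minimal sketch -- the entity and action names below are hypothetical, not AWS's published EKS schema:

```cedar
// Illustrative entities: Group, Action and Namespace names are invented.
permit (
  principal in Group::"platform-team",
  action == Action::"UpdateDeployment",
  resource in Namespace::"payments"
) when {
  context.environment == "production"
};
```

Every allow or deny decision traces back to a statement like this, rather than to RBAC rules scattered across YAML manifests.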

The boring infrastructure bet that's probably right

Canonical focuses on a less glamorous but equally critical problem: making it simple for developers to secure their containers by shrinking the attack surface. The company now offers 10-year container support (extendable to 15 years), so customers can build their own containers, strip them down to the bare minimum and keep them patched for a decade.

"For verticals like banking systems, industrials, or even telcos that want a cadence which is much slower, that's a game changer," said Cedric Gigot, VP of Product Management at Canonical. Canonical's chiseled containers strip Ubuntu packages down to only the file slices needed, producing minimal images still backed by the company's full CVE patching pipeline.
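The chiselling workflow can be sketched as a multi-stage container build: chisel cuts only the named package slices into a staging root, which then becomes the entire image. The invocation and slice names below follow the tool's documented `package_subset` convention but are illustrative:

```dockerfile
# Stage 1: cut only the required file slices from Ubuntu packages.
FROM ubuntu:24.04 AS chiseler
# The chisel binary is assumed to be in the build context,
# obtained separately from Canonical's releases.
COPY chisel /usr/local/bin/chisel
RUN chisel cut --release ubuntu-24.04 --root /staging \
      base-files_base libc6_libs ca-certificates_data

# Stage 2: the staging root is the whole image -- no shell, no package manager.
FROM scratch
COPY --from=chiseler /staging /
```

The resulting image contains nothing an attacker can use as a foothold beyond the slices the application actually needs.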

The timeline commitments are aggressive: 15-year Kubernetes LTS starting with 1.32 and 12-year LTS for any open-source Docker image. The AI angle is strategic: NVIDIA chose KubeCon to donate the GPU Dynamic Resource Allocation driver to the CNCF, with Canonical Kubernetes as the foundation. Being the boring, reliable, long-lived infrastructure layer underneath AI workloads is a bet that's probably right.

Getting AI pilots into production

Development teams can build the smartest AI applications in the world, but if the underlying infrastructure can't handle GPU optimization, multi-tenancy and workload isolation at enterprise scale, those applications never leave the lab. Several vendors are attacking this production-readiness gap from different directions.

The pilot worked. Now what?

"Last year was all about experimentation and a lot of people have found that you can't just rush in and expect to get results," said Paul Thompson from Spectro Cloud. Spectro is positioning PaletteAI as the bridge from pilot to production, so organizations can start "making use of all this infrastructure they're buying" and achieve "the promise of AI in 2026."

On the sovereign cloud side, the challenges are specific: "GPU optimization and compute optimization, the policies that you have, the areas of service, and last but not least, the multi-tenancy challenges," said Thompson.


PaletteAI became generally available in March 2026 and is now included in the NVIDIA Enterprise AI Factory validated design. Spectro Cloud also announced integrations with Netris, WEKA, Aviz Networks and 6WIND in a single week at GTC, each addressing a layer PaletteAI doesn't own natively.

The open-source off-ramp for GPU-powered AI

SUSE Virtualization -- the enterprise distribution of Harvester -- is systematically eliminating the reasons enterprises give for staying on VMware and increasingly framing that as an AI workload play.

The headline in version 1.7 is native NVIDIA MIG vGPU support: Harvester detects MIG-capable GPUs (A100, H100, H200) and partitions them into isolated instances with dedicated compute, memory, and cache -- hardware-level spatial partitioning, not time-slicing. That directly targets VMware's vSphere + NVIDIA AI Enterprise stack, which has been one of the more defensible reasons to stay.
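Once the GPU has been carved into MIG instances, NVIDIA's Kubernetes device plugin can expose each slice as a schedulable resource that a pod requests like CPU or memory. A hedged sketch -- the exact resource name depends on the GPU model and the plugin's mig-strategy setting, and the image reference is hypothetical:

```yaml
# Pod requesting one isolated MIG slice; the 1g.10gb profile name matches
# an H100-style partition and is illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: mig-inference
spec:
  containers:
    - name: inference
      image: registry.example.com/inference:latest  # hypothetical image
      resources:
        limits:
          nvidia.com/mig-1g.10gb: 1  # one hardware-isolated GPU instance
```

The scheduler treats each slice as a discrete device, so tenants sharing one physical GPU never share compute, memory or cache.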

VM Auto Balance -- the DRS equivalent -- continuously rebalances workloads. Live Storage Migration moves VM disks without downtime, and upgrades automatically live-migrate VMs off nodes before upgrading them. SUSE called it "the modern alternative for VMware off-ramping." For organizations that want to run AI workloads on open-source infrastructure without proprietary licensing surprises, the case is getting stronger.

GPU isolation for regulated AI: The case for keeping the hypervisor

VMware (by Broadcom), however, was at KubeCon too -- and they weren't playing defense.

Timmy Carr walked me through VMware Cloud Foundation's approach to GPU virtualization through vSphere Kubernetes Service (VKS). The key move: GPU slicing at the hypervisor level.

VCF takes NVIDIA MIG partitions, presents those slices through the vSphere hypervisor to VMs, and can slice the slice further for different tenants. This is hardware-enforced GPU isolation that bare metal Kubernetes with device plugins can't match with the same granularity. VKS 3.6 shipped with Kubernetes 1.35 support and NVIDIA AI Enterprise certification as the recommended path for DGX/HGX deployments. For enterprises running AI workloads where GPU isolation matters -- particularly regulated industries where workload separation isn't optional -- VCF's hypervisor-level partitioning is a capability the open-source alternatives haven't fully replicated yet.

Filling in the Kubernetes gap: Automating the physical hardware layer

None of the developer-facing gains -- autonomous troubleshooting, governed model usage, production-ready AI workloads -- matter if organizations can't efficiently provision and manage the physical infrastructure underneath. Development velocity stalls when platform teams are still racking servers by hand and migrating storage manually. Two vendors at the show are solving the layer below Kubernetes.

Bare metal lifecycle as a platform

vCluster Labs (formerly Loft Labs) launched vMetal, "born from the need to help neoclouds primarily, but also AI factories" to "take physical infrastructure and transform that into managed services," said Tom Brightbill, the company's first VP of product. "There's a big amount of work between plugging hardware into a wall and offering that forward as a managed service that you can charge for, track and observe."


vMetal automates bare metal discovery, OS provisioning and networking configuration, then feeds into vCluster's virtual control planes for multi-tenant Kubernetes on top. vCluster Certified Stacks -- validated by NVIDIA as Run:ai conformant -- complete the picture with GPU scheduling and workload controls. For sovereign cloud operators who can't just use a hyperscaler, it's "a more EKS-like experience" on independently owned infrastructure.

VM storage muscle memory, now in Kubernetes

Portworx (by Pure Storage) shipped Kube Datastore, which recreates the VM datastore experience customers know from virtualization platforms, implemented in Kubernetes-native terms. Customers can "do storage migration, VMotion, those kind of capabilities" they were used to from VMware, now expressed in Kubernetes concepts.

Prashant Rathi, product leader at Portworx, framed Kube Datastore as Kubernetes-native storage that replicates familiar VM capabilities while supporting AI workloads.

The key innovation is the distinction between static pools (local NVMe with replication) and dynamic pools (shared storage with single-replica "repl-1" volumes that detach and reattach during failures). They also added granular VM protection -- "file and directory level restores." The production numbers speak for themselves: "100,000 plus volumes in production as well as more than 30,000 VMs." Those are not pilot numbers.
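That pool distinction surfaces to users as ordinary StorageClasses. A sketch of what a single-replica class might look like -- the `repl` and `io_profile` parameters follow Portworx's documented conventions, while the class name and values here are illustrative:

```yaml
# Illustrative StorageClass for a dynamic pool backed by shared storage.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: px-dynamic-repl1        # hypothetical name
provisioner: pxd.portworx.com   # Portworx CSI provisioner
parameters:
  repl: "1"           # single replica; failover relies on shared-storage reattach
  io_profile: "auto"
```

Applications consume it through a standard PersistentVolumeClaim, which is exactly the "muscle memory" pitch: familiar datastore semantics behind stock Kubernetes objects.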

Takeaways

What KubeCon Amsterdam 2026 made clear is that the cloud-native ecosystem is making two moves simultaneously: embedding AI into the platform itself -- autonomous troubleshooting, AI-generated infrastructure, governed model routing -- and making the platform ready for AI workloads, from GPU partitioning to bare metal lifecycle automation. Both are required, and neither is sufficient on its own. The risk, as always, is complexity. Every new agent, gateway and integration layer adds another thing for platform teams to evaluate, integrate and maintain. The vendors who make the complex feel simple -- not by hiding it, but by handling it -- are the ones worth watching.

Torsten Volk is principal analyst at Omdia covering application modernization, cloud-native applications, DevOps, hybrid cloud and observability.
Omdia is a division of Informa TechTarget. Its analysts have business relationships with technology vendors.
