
Kubernetes AI progress in 2025 and the road ahead

Presentations at KubeCon 2025 detailed efforts since last year's conference to enhance support for AI on Kubernetes platforms and previewed what's ahead.

ATLANTA -- Kubernetes AI prompted a bevy of updates and new projects within the Cloud Native Computing Foundation (CNCF) this year, designed to help platform engineers keep pace with the breakneck speed of AI development.

Growing connections between Kubernetes, other cloud-native infrastructure projects and AI were a prominent theme at KubeCon + CloudNativeCon North America 2025 this week. Major topics included the increasing use of cloud-native technology by back-end developers supporting AI engineers, the rapid adoption of open source AI tools, and CNCF efforts to establish standards for Kubernetes AI.

Another thread of conversation followed up on requests platform engineers made at KubeCon 2024 for Kubernetes AI improvements, including smoother cluster upgrades, in-place pod resizing, new resource scheduling for GPUs, and more sophisticated, reliable orchestration of HPC and AI frameworks.

"Kubernetes was following a typical adoption curve until a couple of years ago, and then there was this big plot twist of the AI age, and a couple years ago we were caught a little flat-footed," said Jago McLeod, director of engineering for Kubernetes and Google Kubernetes Engine (GKE) at Google, during a keynote presentation. "We were in the midst of a real transformation in this space."

Jago McLeod, director of engineering for Kubernetes and GKE at Google, presents on recent Kubernetes improvements during a KubeCon keynote.

K8s upgrades: 'Rollback is finally here'

One of the most common longstanding complaints among platform engineers is that Kubernetes cluster upgrades are too hard to manage and difficult to revert if something goes wrong, a problem that has intensified as generative AI workloads drive major infrastructure growth and demand more frequent upgrades, according to McLeod.

A new approach to Kubernetes rollback for minor version upgrades, contributed upstream by Google and available in GKE, introduces a two-step upgrade process that preserves an emulated version of the previous control plane, making it easier to revert changes without disrupting services, according to a Google blog post. It will also support skipping minor versions, rather than requiring users to keep pace with each of Kubernetes' three releases per year, McLeod said.

"This has taken literally a decade of effort," he said. "We knew early on that we needed to do this, and it was just really hard to pull off. So now rollback is really here."

DRA sets up improved node management

Another effort that gained momentum two years ago during KubeCon + CloudNativeCon North America, dynamic resource allocation (DRA), reached stable status in Kubernetes 1.34, released in August. DRA is part of efforts to change Kubernetes' "relationship with hardware," according to McLeod. Spurred by the need to use expensive, scarce GPU resources for AI workloads more efficiently than Kubernetes' CPU-oriented resource model allowed, DRA makes it possible for Kubernetes pods to share specialized hardware more flexibly and in smaller increments.
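As a minimal sketch, a DRA request under the resource.k8s.io/v1 API that went stable in 1.34 looks roughly like the following; the device class name and container image are illustrative stand-ins for what a vendor's DRA driver and a real workload would supply, and field names should be double-checked against the 1.34 API reference:

```yaml
# A claim template requesting exactly one device from an
# (illustrative) GPU device class installed by a DRA driver.
apiVersion: resource.k8s.io/v1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        exactly:
          deviceClassName: gpu.example.com
---
# The pod references the claim rather than a fixed extended resource,
# letting the scheduler allocate and share devices dynamically.
apiVersion: v1
kind: Pod
metadata:
  name: inference-pod
spec:
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
  containers:
  - name: model-server
    image: registry.example.com/model-server:latest   # illustrative
    resources:
      claims:
      - name: gpu
```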

Another key upstream initiative for Kubernetes node management is in-place pod resize (IPPR), released in beta with version 1.33 in April. IPPR lets users change the CPU and memory resources assigned to containers without requiring a pod restart. In the past, distributed web applications commonly hosted on Kubernetes could tolerate such restarts; model training and inference workloads, however, are much more sensitive to such disruptions.
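A minimal sketch of a pod opting in to in-place resizes, assuming a 1.33+ cluster with the feature enabled; the image is an illustrative placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: trainer
spec:
  containers:
  - name: trainer
    image: registry.example.com/trainer:latest   # illustrative
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired    # resize CPU without a restart
    - resourceName: memory
      restartPolicy: NotRequired    # memory too, if the app tolerates it
    resources:
      requests:
        cpu: "2"
        memory: 4Gi
      limits:
        cpu: "2"
        memory: 4Gi
```

On recent releases, new values are applied through the pod's resize subresource (for example, kubectl patch with --subresource resize), which is also the hook a future vertical pod autoscaler would use; exact kubectl support varies by version.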


IPPR will serve as the basis for further advancements in resource scheduling for Kubernetes nodes, such as a vertical pod autoscaler that automates in-place pod resizing, said Lucy Sweet, an engineer at Uber and chair of the Kubernetes Node Lifecycle Working Group, in an interview with Informa TechTarget.

"We've also been thinking about how we deal with disruption even after scheduling," Sweet said. "We have a new standard called eviction request that gives you guarantees on controlled termination. So if you're running a training job, you can checkpoint before you get terminated, and you get a warning. Right now in Kubernetes, you get little to no warning of termination and a grace period of seconds, which is not fun, especially when you have a big job that's not going to finish in that grace period."

Beyond the node: Framework-aware orchestration

Kubernetes maintainers have been working with members of high-performance computing (HPC) projects such as Slurm and Ray to make the Kubernetes scheduler aware of job dependencies and resource requirements for HPC workload orchestration frameworks.
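The keynote did not walk through specific APIs, but one concrete CNCF expression of this direction is Kueue, which admits a batch job only when all of its pods can be scheduled together, gang-style, rather than letting a half-placed training job strand GPUs. A minimal sketch, assuming a Kueue installation with a LocalQueue named team-queue already configured:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: distributed-training
  labels:
    kueue.x-k8s.io/queue-name: team-queue   # assumed LocalQueue name
spec:
  parallelism: 4
  completions: 4
  suspend: true    # Kueue unsuspends the job once all 4 pods fit
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: registry.example.com/trainer:latest   # illustrative
        resources:
          requests:
            cpu: "4"
```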

"We're starting to see a shift from automation to autonomy [both] within Kubernetes and on top of Kubernetes," McLeod said. "This is really just the logical extension of the declarative API and decoupled controllers getting smarter."

In the longer term, two new CNCF sandbox projects, Kubernetes AI Toolchain Operator (KAITO) and KubeFleet, propose new ways to simplify running AI inference on Kubernetes and manage multiple clusters at global scale, according to a keynote presentation on Thursday by Jeremy Rickard, principal software engineer at Microsoft.

"KAITO is possibly the simplest way to run and serve AI inferencing in Kubernetes, [using] a workspace construct where you can pass on your model and then defer all the GPU provisioning tasks to its GPU provisioner," Rickard said. "You can also ground your model using fine-tuning and RAG [retrieval-augmented generation] capabilities and parameters that it provides."

Beth Pariseau, a senior news writer for Informa TechTarget, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.
