CNCF Kubernetes AI program faces scrutiny from IT analysts
CNCF is positioning Kubernetes as a standard infrastructure for AI, but analysts questioned the level of participation by one major AI player in its new conformance program.
ATLANTA – The Cloud Native Computing Foundation's new Certified Kubernetes AI Conformance Program calls for a set of standards to ensure portability, interoperability and reliability for AI workloads. However, officials acknowledged that the market must still mature before this vision is realized.
The program, modeled on the existing Certified Kubernetes Conformance Program, launched in beta six months ago at KubeCon + CloudNativeCon Japan and reached a version 1.0 release with an initial set of certified vendor participants here this week at KubeCon + CloudNativeCon North America. The vendors with certified products and major contributors to the project, listed on a slide shown during the day-one keynote presentation, included Akamai, Alibaba Cloud, AWS, Broadcom, CoreWeave, DaoCloud, Google Cloud, Kubermatic, Microsoft Azure, Nvidia, Oracle, Red Hat and SUSE.
Cloud Native Computing Foundation (CNCF) officials at KubeCon cited the success of the Certified Kubernetes Conformance Program -- which began with 10 members and has expanded to more than 100 -- as a model for their ambitions for the Kubernetes AI Conformance Program.
"You can't go to any major cloud service or private cloud option and not have Kubernetes conformant, stable and compatible across these things. … It basically brought the whole industry together," said Chris Aniszczyk, CTO of CNCF, during the keynote. "[Now,] AI is booming. What can we do to ensure that the stability and global cooperation that communities brought to [other] workloads, we can do for AI?"
The project's primary goals, according to its public documentation, are to "Simplify AI/ML on Kubernetes and accelerate adoption; guarantee interoperability and portability for AI workloads [and] enable ecosystem growth for AI tools on an industry-standard foundation."
During a press conference here Tuesday, CNCF and Linux Foundation leaders expounded on the convergence between cloud-native infrastructure and the delivery of generative AI workloads, and the heady opportunity they now see emerging.
"It's not just an opportunity for CNCF, but throughout the Linux Foundation, to get us to the next 'Linux moment' in AI, where we have a full stack of open source software housed at neutral foundations that are helping in [model] pre-training, training, post-training, inference and agents," said Jim Zemlin, executive director at the Linux Foundation, during opening remarks at the press session.
Citing open source projects that have emerged to orchestrate AI workloads, such as vLLM, AIBrix and the "PARK" stack comprising PyTorch, AI, Ray and Kubernetes, Zemlin predicted that a standard open source framework for AI agents will emerge over the next three to five years.
Over the next six months, Zemlin said, consensus will emerge in the open source community about standard AI agent communication protocols, such as Google's Agent2Agent, which was donated to the Linux Foundation in June.
The CNCF touted its projects' growing connection to AI workloads during KubeCon + CloudNativeCon North America 2025.
Whither Nvidia?
During the press conference, CNCF and Linux Foundation leaders fielded pointed questions from multiple attendees who asked why AI titan Nvidia wasn't a more prominent presence at the launch of the new conformance program.
"Obviously, Nvidia is the center of gravity for all this," said Steven Dickens, CEO at HyperFrame Research. "What's your plan on pulling them into the Conformance Program and making sure that the interface points in technology, such as Google, are there?"
Aniszczyk answered that Nvidia reps have participated in AI Conformance Program meetings, but that the company doesn't have a Kubernetes-as-a-service product similar to those being certified by the program. Nvidia has been a CNCF member since 2018 and has contributed to projects such as Kubernetes dynamic resource allocation, support for which is among the first mandatory requirements of the Certified Kubernetes AI Conformance Program.
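For context, dynamic resource allocation lets Kubernetes workloads request devices such as GPUs through first-class API objects rather than vendor-specific extended resource counts. A minimal sketch follows, assuming a hypothetical `gpu.example.com` device class; the `resource.k8s.io` API group has moved through several beta versions, so the exact version string may differ by Kubernetes release:

```yaml
# A claim for one device from a (hypothetical) GPU device class.
apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.example.com
---
# A pod that consumes the claim instead of requesting
# a vendor-specific extended resource such as nvidia.com/gpu.
apiVersion: v1
kind: Pod
metadata:
  name: inference-pod
spec:
  containers:
  - name: worker
    image: example.com/inference:latest
    resources:
      claims:
      - name: gpu
  resourceClaims:
  - name: gpu
    resourceClaimName: single-gpu
```

Because the claim references a device class rather than a hardware vendor, the same manifest can in principle be scheduled against any accelerator whose driver publishes that class -- the kind of portability the conformance program is meant to guarantee.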
Still, projects mentioned by CNCF leaders as examples of emerging open source standards for AI inference workloads, including vLLM and AIBrix, must integrate with Nvidia's proprietary Compute Unified Device Architecture (CUDA) parallel computing framework to run on Nvidia's GPU hardware, a market the company dominates.
Alternatives to GPUs, such as Google's TPUs, have gained some steam in the past year, including recent integrations with vLLM and a cloud partnership, announced this week, with Vast Data, in which Nvidia is an investor. However, Nvidia has achieved a $5 trillion market valuation and secured deals worth tens of billions of dollars to supply its GPUs to major cloud providers and AI companies, including AWS and OpenAI.
Chris Lamb, Nvidia’s vice president of GPU computing software, made a keynote appearance at KubeCon + CloudNativeCon 2024, and Bob Wise, now vice president of engineering and operations at Nvidia's DGX Cloud, represented AWS on the CNCF governing board from 2020 to 2022.
But Nvidia President and CEO Jensen Huang, the public face of the company who has been a staple on the conference keynote circuit in recent years, hasn't appeared at any major CNCF event, Dickens countered.
"Turning up to meetings, some engineers contributing a bit of code, all great, fantastic," he said. "Nvidia's the ball game [in AI]. Are they going to be a tier one [player, with] Jensen on stage in a leather jacket at KubeCon?"
Potentially further muddying the waters, Nvidia has been more publicly vocal about components it recently donated to open source Kubernetes-adjacent projects not governed by CNCF, including Red Hat's llm-d in May.
Nvidia also publicized its own new open source Kubernetes API, Grove, this week during KubeCon, which it did not donate to CNCF. The project "defines the structure and lifecycle of single- and multi-node AI inference workloads, such as those deployed with NVIDIA Dynamo, while enabling them to scale efficiently in Kubernetes-based environments," according to a company blog post.
The risks of the 'Nvidia economy'
Other industry analysts among the press conference attendees asked whether the Linux Foundation and CNCF would take on breaking Nvidia's "stranglehold" on the AI market, the way Linux had taken on Microsoft's Windows and the CNCF had offered a strong open source alternative to proprietary middleware.
"I don't see a lot of enablement for open source alternatives to CUDA," said Stephen O'Grady, principal analyst and co-founder at RedMonk, during the press conference Q&A. "What are you doing to make the fundamental primitives for AI compute not owned by one company?"
In response, Zemlin made another analogy to the early days of public cloud dominance by AWS.
"If you look at market share now, it's fairly split -- you have Amazon, Microsoft, Alibaba, Oracle, Google -- it's a little bit more balanced," he said. "Any time you have a technical revolution like AI, it will be crazy, and then things diversify naturally over time."
Another attendee said in an interview after the press conference that he hopes a strong open source alternative to CUDA will emerge.
"If Nvidia keeps dominating as it is, there has to be an alternative," said Larry Carvalho, principal consultant at RobustCloud. "I've been reading that the US has become the Nvidia economy, with a $5 trillion market cap for one firm -- that's a massive amount of gravity on one company. People talk about data gravity. This is compute gravity."
Beth Pariseau, a senior news writer for Informa TechTarget, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT