Red Hat amps up open source AI infrastructure pitch
With a new AI Inference Server and a distributed inference open source project, Red Hat hopes to capitalize on the surging popularity of open source AI.
BOSTON -- As enterprises continue to grapple with AI ROI, Red Hat offered up fresh open source AI infrastructure updates this week as an alternative to costlier hosted proprietary services.
The vendor updated its Red Hat AI portfolio with a set of curated third-party large language models (LLMs), APIs for Model Context Protocol (MCP) and Llama Stack to facilitate AI agent development, and a new Red Hat AI Inference Server based on its acquisition of Neural Magic last year. The company also launched a new open source distributed inference project called llm-d, which integrates the vLLM project that underpins AI Inference Server with distributed compute clusters on Kubernetes.
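Red Hat hasn't published code for the AI Inference Server itself, but the upstream vLLM engine it's built on is openly available. As a minimal sketch of what vLLM serving looks like on a single node -- the workload that llm-d is designed to spread across a Kubernetes cluster -- consider the following, where the model name and sampling settings are illustrative placeholders, not Red Hat defaults:

```python
# Minimal single-node text generation with upstream vLLM, the engine
# underpinning Red Hat AI Inference Server. Model and sampling values
# are illustrative placeholders.
from vllm import LLM, SamplingParams

# Load the model once; vLLM batches requests and pages the KV cache internally.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What is distributed inference?"], params)

for output in outputs:
    print(output.outputs[0].text)
```

llm-d's contribution, per the announcement, is distributing this kind of serving workload across Kubernetes compute clusters rather than a single box.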
Red Hat also folded more upstream open source AI tools into its OpenShift AI Kubernetes platform, including a tech preview of an optimized LLM catalog, support for distributed LLM training through the Kubeflow Training Operator and a tech preview of a centralized data repository for model training and inference based on the Kubeflow Feast feature store.
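Red Hat hasn't detailed the APIs for the Feast-based repository, but upstream Feast is a well-established feature store, and retrieving features at inference time looks roughly like the sketch below; the feature view and entity names come from Feast's own quickstart and are placeholders, not Red Hat specifics:

```python
# Hedged sketch of online feature retrieval with upstream Feast, the
# project behind OpenShift AI's previewed data repository. Names follow
# Feast's quickstart and are placeholders.
from feast import FeatureStore

# Point at a feature repo (a directory containing feature_store.yaml).
store = FeatureStore(repo_path=".")

# Fetch the latest feature values for one entity at inference time.
features = store.get_online_features(
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:avg_daily_trips",
    ],
    entity_rows=[{"driver_id": 1001}],
).to_dict()

print(features)
```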
The news comes amid open source disruption in generative AI this year, including the introduction of DeepSeek, which shook up the AI market by drastically undercutting pricey proprietary models despite training on less advanced hardware.
But the surge in open source AI use preceded DeepSeek. In a McKinsey & Company survey of 703 participants with experience working with AI technology, conducted in December and January, more than 50% said they were using open source AI tools. Meta's Llama and Google's Gemma LLMs were the most popular of those tools at the time.
Red Hat's senior vice president and AI CTO, Brian Stevens, watched open source AI catch up with proprietary models firsthand as CEO of Neural Magic, founded in 2018 to run deep learning models efficiently on CPUs. Then the launch of OpenAI's ChatGPT in 2022 suddenly "flipped" proof-of-concept customers to proprietary models and tools, Stevens recalled during a keynote presentation at Red Hat Summit 2025's Community Day on Monday.
"The launch of ChatGPT changed everything, because what were open source transformers back then all of a sudden went closed source, and moreover … ChatGPT as an LLM was incredibly more powerful than anything that we had in open source at the time," Stevens said. "[It was] a little bit of a dark day."
However, in the three years since, open source AI has made a comeback, according to Stevens, in part because it supports user customization and control over data. The cost of proprietary models and associated hosted AI services has continued to confound enterprises' pursuit of AI ROI, another area where open source AI has become more competitive, Stevens said.
"[Proprietary vendors] talk a lot about licensing and prices based on rate limiting," he said. "When you serve your own [AI] infrastructure, you get to decide all this: rate limiting, [resource] quotas … and [control] your data."
Cost, data privacy and data gravity pull AI inference away from the cloud
Data privacy was where Red Hat's pitch resonated most for one conference attendee, although he has yet to settle on an AI infrastructure product.
"Data privacy is a huge thing for my company, being in healthcare, and control over intellectual property as well," said Nick Cassidy, lead innovation engineer and lead AI product developer at Stellarus, a health services business created as part of a Blue Shield of California restructuring in January. Cassidy emphasized in an interview during Red Hat Summit this week that his personal opinions do not necessarily reflect those of his employer.
"It's important that we figure out if we want to engage with a public model for some things and then private models for others -- there's a balancing act there," he said.
Cassidy said he was intrigued by the LLM compression features developed by Neural Magic, now part of the Red Hat AI Inference Server. These features convert large LLMs into smaller, more efficient versions optimized to run on hardware accelerators from Red Hat partners, with Red Hat handling the coordination between optimized models and partner hardware.
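Neural Magic's compression work continues upstream as the llm-compressor project under the vLLM umbrella. As a hedged sketch based on that project's published quickstart -- exact import paths and arguments have shifted between versions -- one-shot quantization of a small open model looks like this:

```python
# Hedged sketch of one-shot weight quantization with llm-compressor,
# the open source successor to Neural Magic's compression tooling.
# Import paths and arguments follow the project's quickstart and may
# differ across versions; the model and dataset are illustrative.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    dataset="open_platypus",  # calibration data for quantization
    recipe=GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"]),
    max_seq_length=2048,
    num_calibration_samples=512,
    output_dir="TinyLlama-1.1B-W4A16",  # where the compressed model lands
)
```

The result is a four-bit-weight model of the kind vLLM can serve, trading a small amount of accuracy for a much smaller memory footprint.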
"That's pretty exciting, if they're able to cut costs, because cost is going to be major, especially with everything moving from on-premises to cloud-based servers," he said. "I'll be bringing that sort of research back to take a look at and see if it's something that we want to pursue."
Red Hat isn't unique in offering compression tools to make generative AI more efficient. OpenAI also offers model compression, while data management vendors such as Snowflake have introduced optimized models and data caching techniques to lower AI costs.
Cost will ultimately make or break the success of enterprise generative AI efforts, Cassidy predicted, because companies will be reluctant to invest in skills training for employees until they see a clear cost benefit.
In addition to cost and data privacy, industry analysts said Red Hat's AI infrastructure could find a niche in large enterprises with huge amounts of data on-premises that aren't feasible to migrate to the public cloud.
"Governments are doing a ton of massive LLM training on-premises, especially in bare metal and air-gapped systems," said Rob Strechay, an analyst at TheCube Research. "It's happening in the financial services industry as well, because they don't want to bring their intellectual property and petabytes and petabytes of data into the cloud just to train the models."
Initial training of foundation LLMs required more resources than most companies could afford. But as LLMs give way to specialized small language models, and as organizations look to apply trained models to their own data through AI inference, those workloads become more manageable on-premises, Strechay said. Even AWS offers options to integrate AI services such as Amazon Bedrock Agents with on-premises data via AWS Outposts and Local Zones.
"It makes more sense to bring the AI to the data versus the data to the AI," he said.
Of developers and AI agents
In a year in which most major enterprise IT vendors -- including Red Hat's parent company, IBM -- have been obsessed with AI agents, Strechay said Red Hat is so far "sticking to its knitting" in AI infrastructure.
"Red Hat is bringing in AI where it makes sense and trying to make AI easier to manage at scale," he said. "I don't think it's trying to be Amazon Bedrock or Google AI Studio, or any of these other developer services."
Meanwhile, a Red Hat spokesperson hinted in an email to Informa TechTarget last week that more will be revealed about AI agents later in the year.
"We are not currently announcing an agent builder for developers, but we are working on that area and will have more to share in the fall time frame," the spokesperson wrote.
In the meantime, this week's news includes some updates aimed at AI development: Red Hat Developer Hub, for example, is making common AI templates available within a new Advanced Developer Suite, including chatbots, audio-to-text, object detection, code generation and retrieval-augmented generation.
"These integrate with Red Hat OpenShift AI to provide an easy starting point for developers in these scenarios," according to the spokesperson. "Additionally, Podman Desktop is shipping Llama-stack support so developers can quickly start experimenting with Llama-stack locally."
There's something to be said for a pragmatic approach to AI agents amid this year's wave of hype, said Jim Mercer, an analyst at IDC.
"With LLM projects, when we had the 'Big Bang' two years ago, a lot of that failed simply because we abandoned all the rigor that we know we need to build technical solutions just to say we have something," Mercer said. "With Agile, we always said, 'We want to deliver things in smaller chunks and fail fast,' and it feels like Red Hat is focused on that 'start small' approach for building intelligent applications and agentic AI."
Stellarus' Cassidy said he's also interested in trying the new developer suite because of the AI application templates.
"I want to see how much it can accelerate our development process and give us speed to market," he said.
Beth Pariseau, a senior news writer for Informa TechTarget, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.