
Red Hat AI updates target mounting cost, sovereignty worries

Red Hat furthers its hybrid cloud AI push with Model-as-a-Service and sovereignty features amid growing enterprise concerns about ROI and geopolitical risk.

ATLANTA – Against a backdrop of increased regulation, agentic AI and AI inference growth present fresh hurdles for enterprises that want to hop on the AI bandwagon without running afoul of auditors or blowing through budgets.

A set of product updates here this week encapsulates Red Hat's proposed answer to both concerns for enterprise customers: stop paying cloud providers for AI tokens and begin producing them yourself. Red Hat proposes they do so using its Red Hat AI Enterprise platform, a collection of software products developed over the last two years, including the Red Hat AI Inference Server and OpenShift AI, which also ties in with the Red Hat AI Factory rack-scale system launched with Nvidia in February. Red Hat AI bundles open source large language models and small language models with AI inference utilities such as vLLM and llm-d as a basis for platform engineers to deliver AI as a service to internal users.

"For most customers, inference costs really start where inference is being consumed, and that's in the public cloud AI services," said Joe Fernandes, vice president and general manager of Red Hat AI, during a press briefing on May 7. "But at scale, that becomes cost-prohibitive, and so we're talking to customers about providing other options by moving from being just a token consumer to being a token provider in their own self-managed environment."

New features for Red Hat AI 3.4, unveiled this week at Red Hat Summit, add agentic observability and security controls, AI inference management options, and support for delivering AI models as an internally managed platform service. These features include:

  • MLflow tracing for agentic workflow observability, including LLM calls, reasoning steps, tool execution, model responses and token usage.
  • Cryptographic identity management through SPIFFE/SPIRE that supports short-lived credentials and least privilege permissions for AI agents.
  • AI security and safety testing tools acquired with Chatterbox Labs in December.
  • Model-as-a-Service through a new AI gateway that integrates with identity management tools.
  • An evaluation hub and prompt registry that tracks and manages end users' AI experiments with LLMs, AI applications and agents.
  • AutoRAG and AutoML integrations that automate AI tasks such as data set retrieval and machine learning model development.
  • Request prioritization and speculative decoding support in Red Hat AI Inference: the former processes latency-sensitive workloads first, while the latter optimizes AI inference processing to reduce latency.
  • Red Hat AI Inference support for Kubernetes services other than Red Hat OpenShift, including CoreWeave and Azure.
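Of the inference features above, speculative decoding lends itself to a quick illustration: a cheap "draft" model proposes a short run of tokens, and the expensive "target" model verifies them, keeping the longest agreeing prefix so several tokens can be accepted per expensive step. The toy Python below sketches only the general idea; it is not the vLLM or Red Hat AI Inference Server implementation, and the stand-in "models" are hypothetical.

```python
# Toy sketch of speculative decoding. A cheap draft model proposes k
# tokens; the expensive target model checks each position and we keep
# the longest agreeing prefix, falling back to the target's token at
# the first mismatch. Illustrative only -- not vLLM's actual code.

def speculative_step(prefix, draft_model, target_model, k=4):
    """Propose k draft tokens, accept the prefix the target agrees with."""
    # Draft phase: cheap model proposes k tokens autoregressively.
    draft, ctx = [], list(prefix)
    for _ in range(k):
        tok = draft_model(ctx)
        draft.append(tok)
        ctx.append(tok)

    # Verify phase: target checks each proposed position in order.
    accepted, ctx = [], list(prefix)
    for tok in draft:
        expected = target_model(ctx)
        if expected == tok:
            accepted.append(tok)      # draft and target agree; keep it
            ctx.append(tok)
        else:
            accepted.append(expected)  # mismatch: take the target's token
            break
    return accepted

# Deterministic stand-ins for real models: each predicts the next letter.
draft = lambda ctx: chr(ord(ctx[-1]) + 1)
target = lambda ctx: chr(ord(ctx[-1]) + 1)

print(speculative_step(["a"], draft, target))  # ['b', 'c', 'd', 'e']
```

When the draft model guesses well, several tokens land per verification pass, which is why the technique reduces latency; when it guesses poorly, the output is still whatever the target model would have produced.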

Token provider or token consumer?

As enterprises move from generative AI assistants to autonomous AI agents, the industry's focus has shifted from AI model training to AI inference, in which trained models are applied to new data to generate original outputs. These shifts have driven up token costs, particularly for users of third-party services, because AI agents generate far more output tokens per prompt than conversational LLM assistants do, and output tokens are typically priced higher than input tokens.
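That cost asymmetry is easy to see in miniature: a multi-step agent run emits many times the output tokens of a single chat reply, at the higher output-token rate. The sketch below uses hypothetical prices and token counts for illustration, not any provider's actual rates.

```python
# Why agentic workloads inflate bills: output tokens are billed at a
# higher rate than input tokens, and agents emit far more of them.
# Prices and token counts below are hypothetical assumptions.

def request_cost(in_tokens, out_tokens,
                 in_price_per_m=3.0, out_price_per_m=15.0):
    """Dollar cost of one request at assumed per-million-token prices."""
    return (in_tokens / 1e6 * in_price_per_m
            + out_tokens / 1e6 * out_price_per_m)

chat = request_cost(in_tokens=500, out_tokens=300)        # short assistant reply
agent = request_cost(in_tokens=5_000, out_tokens=20_000)  # multi-step agent run

print(f"chat request:  ${chat:.4f}")   # $0.0060
print(f"agent request: ${agent:.4f}")  # $0.3150
```

Under these assumptions a single agent run costs roughly 50 times a chat turn, which is the multiplication behind the token-usage explosion the next paragraph describes.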

The result has been an explosion in token usage among cloud computing users. For example, the Amazon Bedrock inference service processed more tokens in the first quarter of 2026 than it did in all prior years combined, according to an April letter to shareholders by Amazon CEO Andy Jassy.


There are other ways to mitigate these costs, such as AI gateways, and some SaaS vendors have been rethinking performance and pricing for output tokens. But as enterprise AI expands in production, it will force a strategic reckoning for many companies, predicted Rob Strechay, an analyst at TheCube Research and Smuget Consulting.

"It's going to become a board-level thing where it's like, 'Are we really going to basically double our budget because now we have people using AI, or do we have to get rid of people to fund tokens?'" Strechay said. "There's a huge budget fight in the wind here."

Hyperscalers aren't standing still in the face of rising token usage; major cloud providers have begun to offer AI inference services built on specialized chips that cost less than high-end GPUs. These include the Google TPU 8i, due to ship later this year, and Amazon EC2 Inf1 instances, which the cloud provider's website claims cost up to 70% less than GPU-based EC2 instances.

Red Hat competitor Broadcom also proposes an escape from cloud costs with its private AI platform based on VMware. Red Hat has stuck to its hybrid cloud stance, which could strike a middle ground between the expense and toil of managing on-premises AI infrastructure and the cost of renting public cloud AI services, according to Strechay.

"Red Hat is approaching it knowing enterprises are not going to do everything on-premises," Strechay said. "What it's going to do is optimize the infrastructure so it's more efficient, and buyers can start with it in AWS, Azure and Google. And it doesn't matter if they're using AMD, TPUs from Google or Nvidia GPUs -- pick your stack."

Red Hat vs. SUSE in sovereign AI

As generative and agentic AI use has proliferated over the last two years, regulatory frameworks such as the EU AI Act have emerged globally to govern its use. AI systems and associated data repositories are also subject to data privacy laws such as GDPR and concerns about general digital sovereignty.

Various countries have begun to view AI as a national security priority, launching sovereign AI development projects. Concern is growing in the EU and UK this year about US-owned companies' dominance of the cloud computing market, especially in light of regulations that could compel US-based cloud providers to hand over customer data to the government.

On the heels of this month's general availability for IBM's Sovereign Core, which is built on OpenShift, Red Hat beefed up sovereignty features in multiple products, including on-premises telemetry and localized software delivery that don't send monitoring data to, or download updates from, outside the sovereign boundary. Red Hat publicized new sovereign cloud deals with providers such as Core42, Datacom, Fujitsu and NxtGen, and previewed Red Hat AI Factory support for Nvidia Confidential Computing, which encrypts data in use.

These updates constitute a de facto response to SUSE's AI Factory launch at SUSECON in April, which emphasized digital sovereignty, analysts said.

"Both SUSE and Red Hat seem to be iterating toward a pretty similar end state, but for sovereignty, there are many angles to address, specifically sovereign workload control, sovereign geopolitical control and sovereignty compliance," said Brent Ellis, an analyst at Forrester Research. "SUSE is leaning into all three categories; I think Red Hat is leaning into sovereign compliance."

Theoretically, cryptographic enclaves such as Nvidia Confidential Computing, combined with customer control over encryption keys provided with IBM Sovereign Core, could offer some protection against memory scraping as a means for US-based entities to access data in cloud environments, but it's not ironclad, Ellis said.

For some organizations in the EU, no amount of privacy tech would be enough to overcome the fact that Red Hat and its parent company, IBM, are US-based, Ellis said.

"Red Hat is seen as a US subsidiary of an iconic US company, and the four biggest hyperscalers are all US-based," Ellis said. "They want independence from that supply chain, and the shortest path is through infrastructure software like SUSE."

Beth Pariseau, senior news writer for Informa TechTarget, is an award-winning veteran of IT journalism. Have a tip? Email her or connect on LinkedIn.
