
Enterprise IT awaits ripple effect from Nvidia Vera Rubin

Mainstream enterprises are unlikely to own an Nvidia Vera Rubin rack-scale system, but could feel its effect on cloud services, according to analysts and early AI adopters.

When Nvidia's Vera Rubin rack-scale AI inference system ships in late 2026, its impact on enterprise IT will be mostly indirect, industry watchers say -- but still potentially significant.

The Vera Rubin platform, unveiled this month at CES, is a rack-scale hardware and software system designed to support mixture-of-experts (MoE) AI inference. AI inference is a phase of the generative AI workflow in which a large language model (LLM) applies its understanding of training data to generate new output. Nvidia's Vera Rubin launch was hailed by industry experts as a sign that generative AI infrastructure has matured beyond a focus on the initial training of frontier models and into harnessing that training for fresh insights.

This transition will prompt AI infrastructure designs that prioritize efficiency for AI inference alongside raw performance, which was the overwhelming priority for GPU-based model training architectures, said Steven Dickens, CEO and principal analyst at HyperFrame Research.

"We're going to see a bifurcation in the market," Dickens said. "We're going to see the best of best chips needed for training. And then we're going to need the mass, from a GPU perspective, around inference, and you're going to see innovation on both ends of that spectrum."

The promise of efficiency…for whom?

The MoE approach is preferred for AI inference workloads because, unlike monolithic machine learning architectures, it routes each request to a small subset of specialized expert subnetworks rather than activating the entire model, cutting the compute required per token. Nvidia's Vera Rubin is built specifically for the performance characteristics of MoE models, including their high demand for VRAM: every expert must be loaded into memory even though only a few run at a time. The platform also pairs high-end GPUs with less expensive Arm-based CPUs.
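Nvidia has not published Vera Rubin's internals, but the routing pattern itself is simple to illustrate. The following Python sketch shows top-k expert routing with made-up dimensions and randomly initialized weights; the expert count, embedding width and gating scheme are illustrative assumptions, not details of any shipping model.

    # Minimal sketch of top-k mixture-of-experts (MoE) routing, the
    # inference pattern Vera Rubin targets. All dimensions are invented.
    import numpy as np

    rng = np.random.default_rng(0)

    D_MODEL = 64     # token embedding width (hypothetical)
    N_EXPERTS = 8    # total experts resident in GPU memory
    TOP_K = 2        # experts actually activated per token

    # Every expert's weights must sit in VRAM even though only TOP_K
    # of them run per token -- hence MoE's high memory demand.
    experts = [rng.standard_normal((D_MODEL, D_MODEL)) * 0.02
               for _ in range(N_EXPERTS)]
    router = rng.standard_normal((D_MODEL, N_EXPERTS)) * 0.02

    def moe_layer(token: np.ndarray) -> np.ndarray:
        """Route one token through its top-k experts and mix the results."""
        logits = token @ router                  # score each expert
        top = np.argsort(logits)[-TOP_K:]        # pick the k highest-scoring
        weights = np.exp(logits[top])
        weights /= weights.sum()                 # softmax over chosen experts
        # Only TOP_K expert matmuls execute, so per-token compute is a
        # fraction of what a dense model of the same total size would need.
        return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

    out = moe_layer(rng.standard_normal(D_MODEL))
    print(out.shape)  # (64,)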

The Vera Rubin NVL72 rack-scale system combines six new chips: the Rubin GPU, Vera CPU, ConnectX-9 SuperNIC, BlueField-4 data processing unit, NVLink 6 switch and Spectrum-6 Ethernet switch. It also incorporates networking and security hardware designs, as well as management software, to optimize the AI infrastructure stack. Nvidia claims the new system delivers AI inference up to 10 times more efficiently in terms of cost per token than its previous generation of GPU-based Blackwell systems.
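Nvidia has not published the figures behind the 10x claim, but the arithmetic of cost per token is straightforward. The Python sketch below works through it with invented hourly-cost and throughput numbers; holding rack cost constant, a 10x improvement in cost per token is equivalent to a 10x gain in token throughput.

    # Back-of-the-envelope illustration of Nvidia's "up to 10x better
    # cost per token" claim. All dollar and throughput figures below are
    # invented for arithmetic's sake, not published Nvidia numbers.
    HOURLY_COST = 100.0                  # $/hour to run a rack (hypothetical)
    BLACKWELL_TOKENS_PER_SEC = 50_000    # hypothetical baseline throughput

    def cost_per_million_tokens(hourly_cost: float, tokens_per_sec: float) -> float:
        tokens_per_hour = tokens_per_sec * 3600
        return hourly_cost / tokens_per_hour * 1_000_000

    baseline = cost_per_million_tokens(HOURLY_COST, BLACKWELL_TOKENS_PER_SEC)
    # A 10x cost-per-token improvement at the same hourly cost implies
    # 10x the token throughput.
    improved = cost_per_million_tokens(HOURLY_COST, BLACKWELL_TOKENS_PER_SEC * 10)

    print(f"baseline: ${baseline:.3f} per 1M tokens")    # $0.556
    print(f"improved: ${improved:.4f} per 1M tokens")    # $0.0556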

To take advantage of Vera Rubin's advances, however, data center operators must also own advanced, pricey liquid cooling systems and have the expertise to integrate the new units alongside existing deployments. Nvidia and its cloud partners have not published pricing for Vera Rubin, but such systems can start at tens of millions and reach hundreds of millions of dollars in total cost of ownership, putting them outside the reach of most mainstream enterprises, said Naveen Chhabra, an analyst at Forrester Research.

"Cloud hyperscalers have the capital to make these investments, along with the Dells and HPEs of the world, that are making servers and selling them, as well as neoclouds using them to develop their own AI models and renting them to wholesale buyers, and large government procurements," Chhabra said. "The last [potential] early buyer, a very meager percentage, is …companies like Siemens and Mercedes-Benz that have significant investments in factory floor automation.

"I expect it to take at least a year and a half to two years before it can even hit, let's say, a reasonably sized retail store or a reasonably sized financial services company," Chhabra added. "It is not meant for the average enterprise, at least as of today."

The Nvidia Vera Rubin platform is a massive stack of advanced hardware and software that's beyond the budget of most mainstream enterprises, analysts say.

The AI infrastructure ripple effect

Enterprise IT vendor Red Hat has a different view of the target audience for Nvidia Vera Rubin. The infrastructure management software vendor previewed fresh integrations between the Vera Rubin platform and its products, including a new version of Red Hat Enterprise Linux for Nvidia.

The company anticipates demand for Vera Rubin systems, delivered via Red Hat and its channel partners, from enterprise customers, said Chris Wright, Red Hat CTO and senior vice president of global engineering, during a Jan. 8 press briefing.

Chris Wright, CTO and senior vice president of global engineering, Red Hat

"There are important large-scale enterprises that are … sourcing from a public cloud, whether that's a traditional hyperscaler or a neocloud, but also really focused on building their own infrastructure," Wright said.

Wright and Justin Boitano, Nvidia's vice president of enterprise AI products, also emphasized Nvidia's extended support for confidential computing infrastructure on its new Vera CPUs, which they said is designed with security-conscious large enterprises in mind, including financial services companies. One joint customer of Red Hat, Dell and Nvidia -- the defense contractor Northrop Grumman -- described a large-scale on-premises rollout of Blackwell-based systems last year during a presentation at an OpenShift Commons event.

Microsoft Azure also issued its pitch this month to enterprises that might want to rent Vera Rubin capacity, touting in a blog post its experience running previous generations of Nvidia hardware and supporting technologies, from liquid cooling to object storage systems, in its data centers.

Early AI adopter hopes for reduced cloud outages

One early adopter of AI infrastructure, Verint Systems Inc., hopes for at least an indirect effect from cloud hosting providers deploying a more efficient system for AI inference: better data center availability and reliability.

"We have experienced scaling issues across all the hyperscalers when it comes to model inference reliability," wrote Ian Beaver, chief data scientist at the contact center-as-a-service provider in Melville, N.Y., in an emailed statement to Informa TechTarget. "One of our bigger pain points is the smaller regions where cloud providers have not invested heavily in data centers, and we need to keep the data in-region to support our customers’ data residency requirements.  We see more service outages and interruptions in these smaller regions due to demand outstripping availability."

These issues have prompted Verint to build its own software abstraction layer in its AI platform to switch between models, cloud providers and its self-hosted infrastructure to accommodate such outages, Beaver said.
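Verint has not published the design of that abstraction layer. As a rough illustration of the pattern Beaver describes, the Python sketch below tries a prioritized list of inference backends and falls back on failure; the backend names, stub clients and retry policy are all hypothetical.

    # Minimal sketch of a multi-provider failover layer like the one
    # Verint describes. Backend names, stub clients and the retry policy
    # are hypothetical; Verint has not published its implementation.
    import time

    class InferenceBackend:
        """One model endpoint: a hyperscaler region, neocloud, or self-hosted."""
        def __init__(self, name, call_fn):
            self.name = name
            self.call = call_fn  # (prompt: str) -> str; raises on outage

    class FailoverRouter:
        """Try backends in priority order, falling back on errors."""
        def __init__(self, backends, retries_per_backend=1):
            self.backends = backends
            self.retries = retries_per_backend

        def complete(self, prompt: str) -> str:
            last_err = None
            for backend in self.backends:
                for _ in range(self.retries):
                    try:
                        return backend.call(prompt)
                    except Exception as err:   # outage, throttling, timeout
                        last_err = err
                        time.sleep(0.1)        # brief backoff before moving on
            raise RuntimeError(f"all inference backends failed: {last_err}")

    # Stub clients standing in for real SDK calls.
    def cloud_a_api(prompt: str) -> str:
        raise TimeoutError("region overloaded")  # simulate an in-region outage

    def local_model(prompt: str) -> str:
        return f"[self-hosted completion for: {prompt}]"

    # In-region cloud first (for data residency), self-hosted as last resort.
    router = FailoverRouter([
        InferenceBackend("cloud-region-a", cloud_a_api),
        InferenceBackend("self-hosted", local_model),
    ])
    print(router.complete("hello"))  # falls through to the self-hosted backend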

"We hope the availability of more efficient chips for inference will reduce outages, but as AWS and Google already have their own specialized inference chips and still have problems servicing model inference demand, I suspect it will take a while to see these scaling issues decrease," he added. "If these new Nvidia chips help solve the data center scaling problems we encounter today and can also decrease inference costs, then this is good news."

There's no guarantee that lower operational costs from a system such as Nvidia Vera Rubin will translate into lower prices for customers, but customers could potentially get more throughput at the same price point as older generations of GPUs, said Larry Carvalho, independent consultant at RobustCloud.

"Higher throughput may reduce supply constraints on AI capabilities," Carvalho said. "Some of the less efficient older GPUs could relocate to areas with lower energy costs."

AI inference efficiency could also benefit enterprises if cloud providers' AI services can perform the same amount of inference in a shorter period of time, based on advances in hardware, according to Forrester's Chhabra.

"If a 10-year drug discovery and trial phase can be shrunk down to four to five years, it would be a huge win for a pharmaceutical company," he said.

Beth Pariseau, a senior news writer for Informa TechTarget, is an award-winning veteran of IT journalism covering DevOps. Have a tip? Email her or reach out @PariseauTT.
