March 22, 2024

SAN JOSE, Calif. -- Nvidia has won the business of the largest cloud providers with powerful GPUs to run their AI models and services. It's now heading downstream with a broad toolset and partner army focused on the enterprise data center.

This week, Nvidia GTC, the company's annual developer conference, attracted thousands of data scientists and electrical and computer engineers hoping to learn how to build, deploy and manage software unique to AI. Joining the technologists were customers and partners making deals and promoting an industry that analysts say will transform business.

Today's data centers, built around CPU-powered servers that run business software, will need to make room for infrastructure unique to generative AI models. Deploying and running those GenAI models will require new toolsets.

"General purpose computing has run out of steam," Nvidia CEO Jensen Huang said during his opening keynote here this week. "We need another way of doing computing."

Huang unveiled version 5 of the company's AI Enterprise platform with new technology that the executive described as an Nvidia inference microservice, or NIM. Together, the software simplifies the process of creating and developing GenAI applications that leverage Nvidia's CUDA parallel computing platform and programming model for the company's GPUs.

Analysts expect many enterprises to deploy small language models in-house so they can fine-tune them on corporate data without moving sensitive information to a public cloud. Also, running a model in the data center can sometimes be less expensive than running it in the cloud.

Nvidia partners target enterprises

Nvidia's NIM is helpful because it simplifies the process of regularly feeding real-world data to a trained model so it can make up-to-date responses, a process called inference.

Having tools that automate processes related to models means traditional software engineers can do the job instead of hard-to-find AI experts, said Robin Bordoli, chief marketing officer at Weights and Biases, an AI model-training platform maker.

Weights and Biases has integrated its software with Nvidia's inference engine so developers can do training and inferencing from a platform supporting 30 foundation models.

Today, Weights and Biases has 1,000 customers, many of whom are government agencies and life sciences organizations, Bordoli said.

"We're helping the next set of customers, enterprises," he said. "They're never going to build a model from scratch, but they want to take an existing model and fine-tune it on their enterprise data."

Nvidia has built NIM to run as a container on Kubernetes, an open-source container orchestration platform familiar to enterprises, said Patrick McFadin, vice president of developer relations at DataStax, a provider of vector databases for AI applications.

"What I noticed right off the bat is it's deployed using Kubernetes," McFadin said. "People who run infrastructure at large enterprises are using Kubernetes, so they've plugged themselves into that really nicely."

Nvidia partner Dell Technologies offers a variety of PowerEdge servers with Nvidia's AI Enterprise software and GPUs, the H100 and the L40S.

"What we're seeing from the majority of enterprises is to take off-the-shelf models, whether it's large models or small models, and combine them with proprietary enterprise data," said Varun Chhabra, senior vice president of infrastructure and telecom marketing at Dell.

Dell believes Retrieval Augmented Generation (RAG) will be as important as inferencing within enterprises.
RAG is an architecture that incorporates an information retrieval system, letting a model draw on private data without that data being built into the model itself. "RAG is a big area of focus for us," Chhabra said.

The most significant benefit of Nvidia NIMs is packaging many of the microservices needed for inferencing in a single container, Chhabra said. "It does that in a turnkey fashion."

AI software and the accelerated computing needed to run it are changing the data center, Chhabra said. "It definitely feels like we're at an inflection point," he said. "That complete rebuild of the data center is coming."