Nvidia took aggressive steps to strengthen its position in the AI supercomputing market this week, outlining plans to deliver multiple systems by year's end that help developers and users build and deploy AI-based applications faster.
The centerpiece of the announcements is a large-memory Nvidia DGX system using the company's new GH200 Grace Hopper Superchip, which is tightly coupled with Nvidia's NVLink Switch System. The new system is purpose-built to create next-generation models for generative AI language applications as well as recommender systems and data analytics workloads.
By coupling NVLink interconnect technology with the NVLink Switch, the system can link up to 256 GH200 superchips and let them act as a single GPU. This gives the system 1 exaflop of AI performance and up to 144 terabytes of shared memory, or about 500 times more memory than the previous-generation DGX A100 system.
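The memory figures above can be sanity-checked with simple arithmetic. A minimal sketch follows; the baseline is an assumption on my part, since the article does not say which DGX A100 configuration the "about 500 times" comparison uses (Nvidia sold both 320 GB and 640 GB variants):

```python
# Back-of-the-envelope check of the DGX GH200 memory figures cited above.
# Assumption: the baseline is a DGX A100 with 320 GB of GPU memory
# (8 x 40 GB A100 GPUs); a 640 GB variant also existed.

TOTAL_SHARED_MEMORY_TB = 144   # DGX GH200 shared memory, per Nvidia
NUM_SUPERCHIPS = 256           # GH200 superchips linked via NVLink Switch
DGX_A100_MEMORY_GB = 320       # assumed baseline configuration

per_superchip_gb = TOTAL_SHARED_MEMORY_TB * 1000 / NUM_SUPERCHIPS
ratio_vs_a100 = TOTAL_SHARED_MEMORY_TB * 1000 / DGX_A100_MEMORY_GB

print(f"Memory per GH200 superchip: ~{per_superchip_gb:.0f} GB")
print(f"Shared memory vs. baseline DGX A100: ~{ratio_vs_a100:.0f}x")
```

Under that assumed 320 GB baseline, the ratio works out to roughly 450x, which is consistent with the "about 500 times" claim.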
In his keynote at the Computex 2023 conference in Taiwan over the weekend, Nvidia CEO Jensen Huang said that generative AI, large language models and recommender systems are "the digital engines of the modern economy." He claimed machines such as the DGX GH200, with its added speed and networking capabilities, will "expand the frontiers of AI."
Jack Gold, an analyst at J. Gold Associates LLC, said, "The issue most supercomputers face is a lot of the compute limitations they have has to do with bandwidth restrictions. Communicating among many chips, or from chips to memory, can slow overall performance. So anything you can do to increase the bandwidth among all those connections can be a huge benefit to how your system is going to perform."
Nvidia said Google, Meta and Microsoft will be the first to have access to GH200, primarily to explore potentially new capabilities for generative AI workloads. AWS will offer the DGX GH200's design via the Nvidia MGX server specification, a modular reference architecture to help other manufacturers and cloud providers build as many as 100 server variations that support a range of AI-based high-performance computing and Omniverse applications.
Software bundled with the system includes Nvidia Base Command, which provides AI workflow management, enterprise-class cluster management, libraries that accelerate compute, storage and network infrastructure, and other system software tuned to run AI-based workloads. Also included is Nvidia AI Enterprise, a software layer supplying developers and users with 100 frameworks, pretrained models and an assortment of development tools designed to simplify the deployment of AI applications into production environments.
"Adding essentially starter models can be a big deal for many companies that don't have the money to build their own [AI] models from scratch," Gold said. "That could take some shops months to train a large AI model, and the associated costs can be monstrous."
Nvidia announced a second AI supercomputer, Helios, made up of four DGX GH200 computers that will be used internally by Nvidia development and research teams. Each of the four GH200 systems will be connected to the Nvidia Quantum-2 InfiniBand network and used to train large AI models.
"This system [Helios] is a research cluster that was built at the Center for Scientific Computing," said Ian Buck, vice president of accelerated computing at Nvidia. "It will be used for climate, weather material science for genetic research and other massively complicated complex scientific problems that are important for the National Computing Center of Taiwan."
Helios is expected to be up and running by the end of this year, Nvidia said.
Nvidia unveiled plans for a third AI supercomputer for Israel-based researchers. The system, called Israel-1, will deliver up to 8 exaflops of AI computing and will be partly operational by year's end, Nvidia said.
In a related announcement, Nvidia and SoftBank Corp. said they are working jointly on a new platform for generative AI as well as for 5G- and 6G-capable applications. The platform will use the MGX reference architecture and leverage Arm Neoverse-based GH200 Superchips.
SoftBank plans to roll out the platform at new AI data centers in Japan, which it said it will build in concert with Nvidia to host AI applications and services on a multi-tenant common server platform.
Supermicro and Quanta Cloud Technology said they hope to be the first to market with systems based on the MGX design this August. Systems from each company will contain the GH200 Grace Hopper chip.
As Editor At Large with TechTarget's News Group, Ed Scannell is responsible for writing and reporting breaking news, news analysis and features focused on technology issues and trends affecting corporate IT professionals.