NetApp is jumping into the AI storage market with a platform that combines OnTap-powered All Flash FAS arrays with...
NetApp AI OnTap Converged Infrastructure is a validated architecture combining NetApp's FAS A800 all-flash NVMe array for NFS storage with integrated Nvidia DGX-1 servers and graphics processing units (GPUs). NetApp said the reference design verified four DGX servers to one FAS A800, although customers can start with a 1-to-1 ratio and nondisruptively scale as needed.
"The audience for this type of architecture is data scientists," said Octavian Tanase, senior vice president of NetApp's Data OnTap system group, during a live webcast this week. "We want to make it simple for them. We want to eliminate complexity (and give) a solution that can be integrated and deployed with confidence, from the edge to the core to the cloud. We believe they will be able to adopt this with a lot of confidence."
The product is intended to help companies implement data analytics that bridge core data center, edge computing and cloud environments, said Jim McHugh, a vice president and general manager at Nvidia Corp. He said Nvidia DGX servers build on the foundation of Nvidia Cuda GPUs developed for general-purpose computing.
"Every industry is figuring 'we need better insights,' but better insights means a new computing block," McHugh said. "Data is really the new source code. When you don't spend time writing all the features and going through QA, you're letting data drive the solutions. That takes an incredible amount of parallel computing."
The joint NetApp-Nvidia product reflects a surge in AI and machine learning, which require scalable storage to ingest reams of data and highly powerful parallel processing to analyze it.
Capacity and scaling of NetApp AI OnTap
The NetApp FAS A800 system supports 30 TB NVMe SSDs with multistream write capabilities, scaling to 2 PB of raw capacity in a 2U shelf. The system scales from 364 TB in two nodes to 24 nodes and 74 PB. NetApp said a 24-node FAS A800 cluster delivers up to 300 gigabits per second of throughput and 11.4 million IOPS. It supports 100 Gigabit Ethernet and 32 Gbps Fibre Channel network connectivity.
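For a rough sense of what the quoted cluster maximums imply per node, the figures can be divided evenly across the 24 nodes. This is only a back-of-the-envelope sketch: the even split is an assumption (real clusters rarely scale perfectly linearly), and the variable and function names are illustrative, not anything NetApp publishes.

```python
# Vendor-stated maximums for a 24-node cluster, as quoted in the article.
CLUSTER_NODES = 24
CLUSTER_RAW_PB = 74                # petabytes of raw capacity
CLUSTER_THROUGHPUT_GBIT_S = 300    # gigabits per second, as quoted
CLUSTER_IOPS = 11_400_000


def per_node(total, nodes=CLUSTER_NODES):
    """Split a cluster-wide figure evenly across nodes (an assumption)."""
    return total / nodes


raw_pb_per_node = per_node(CLUSTER_RAW_PB)
throughput_per_node = per_node(CLUSTER_THROUGHPUT_GBIT_S)
iops_per_node = per_node(CLUSTER_IOPS)

print(f"Raw capacity per node: {raw_pb_per_node:.2f} PB")
print(f"Throughput per node:   {throughput_per_node:.1f} Gbit/s")
print(f"IOPS per node:         {iops_per_node:,.0f}")
```

Under that assumption, each node accounts for about 3 PB of raw capacity, 12.5 gigabits per second of throughput and 475,000 IOPS.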
The NetApp AI storage platform is tested to minimize deployment risks, the vendors said. A NetApp AI OnTap cluster can scale to multiple racks with additional network switches and storage controller pairs. The product integrates NetApp Data Fabric technologies to move AI data between edge, core and cloud environments, Tanase said.
NetApp AI OnTap is based on OnTap 9.4, which handles enterprise data management, protection and replication. Each DGX server packs eight Nvidia Tesla V100 GPUs, configured in a hybrid cube-mesh topology to use Nvidia's NVLink network transport as a high-bandwidth, low-latency fabric. The design is intended to eliminate traffic bottlenecks that occur with PCIe-based interconnects.
DGX-1 servers support multinode clustering via Remote Direct Memory Access-capable fabrics.
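The server-to-array ratios NetApp cited translate directly into GPU counts per storage array. The short sketch below works through that arithmetic using only the numbers in the article; the function name and constants are illustrative assumptions, not part of any NetApp or Nvidia tool.

```python
GPUS_PER_DGX1 = 8            # Tesla V100 GPUs per DGX-1 server, per the article
VERIFIED_DGX_PER_ARRAY = 4   # DGX-1 servers per A800 in the reference design


def gpus_per_array(dgx_servers):
    """Return the number of GPUs fed by one storage array for a given server count."""
    if dgx_servers < 1:
        raise ValueError("at least one DGX-1 server is required")
    return dgx_servers * GPUS_PER_DGX1


starting_gpus = gpus_per_array(1)                        # 1-to-1 starting point
verified_gpus = gpus_per_array(VERIFIED_DGX_PER_ARRAY)   # verified 4-to-1 design

print(f"Starting configuration: {starting_gpus} GPUs per array")
print(f"Verified configuration: {verified_gpus} GPUs per array")
```

A customer starting at the 1-to-1 ratio would have eight GPUs drawing on one array, growing to 32 GPUs at the verified four-server configuration.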
Enterprises struggle to size, deploy AI projects
AI storage is a hot topic among enterprise customers, said Scott Webb, who manages the global storage practice at World Wide Technology (WWT) in St. Louis, a NetApp technology partner.
"In our customer workshops, AI is now a main use case. Customers are trying to figure out the complexity. DGX and AI on a NetApp flash back end is a winning combination. It's not only the performance, but the ability for a customer to start small and [scale] as their use cases grow," Webb said.
John Woodall, a vice president of engineering at systems integrator Integrated Archive Systems, based in Palo Alto, Calif., cited NetApp Data Fabric as a key enabler for AI storage deployments.
"The speeds and feeds are very important in AI, but that becomes a game of leapfrog. With the Data Fabric, I think NetApp has been able to give customers more control about where to apply their assets," Woodall said.