GPUs, CPUs and Storage: Bringing AI and ML to Data Centers Everywhere
Leaders from Nvidia discuss capitalizing on GPUs and new parallel programming models to make artificial intelligence and machine learning available everywhere.
00:39 [promo video playing] Speaker 1: I am an explorer, searching for the origins of our universe, and charting a safer path to other worlds. I am a helper, moving us forward one step at a time, and giving voice to every emotion.
01:22 Speaker 2 (promo video): I love you.
01:26 S1 (promo video): I am a healer, modeling the future of medicine and finding a needle in a haystack when every second counts. I am a visionary, uncovering masterpieces lost to the ages and finding new adventures in a galaxy far, far away. I am a builder, driving perfection in everything we create. I am even the narrator of the story you are watching. And the composer of the music. And when the world faces its greatest challenge, I give us the power to take it on together. I am AI.
03:37 Manuvir Das: Hello everyone and thank you for joining us at this Nvidia session at the Flash Memory Summit. We hope you were as inspired watching that video that we just played for you, as all of us at Nvidia are. This is why we come to work every day, it's our life's work. And the reason we're here with you today is because we would like to work together with you on advancing this work. There are two speakers from Nvidia with you today. My name is Manuvir Das, I work in enterprise computing.
04:10 Kevin Deierling: And I'm Kevin Deierling, I came to Nvidia through the Mellanox acquisition, and I run marketing for our networking business units, and we're excited to be here and talk about the intersection of AI, networking and storage.
04:30 MD: Thank you, Kevin. The reason we are so focused on AI at Nvidia is because of the life-changing things it can do. You can see some examples of that in the slide, whether it's the fight against the coronavirus, discovering new drugs, or empowering humans to do great things. If you think about what has changed AI in the last decade, it's really a scale of two kinds: a scale in the amount of compute that is available through GPUs to process a large amount of data, and a scale in the amount of data that is now available to actually train these models so that they can operate effectively.
05:12 MD: So, what this really means is that the effectiveness of AI is really dependent on data, access to that data from the compute. And as we all know, that's really a problem of storage and how we do storage effectively.
So, here's the problem. As the workload is running and processing this large amount of data, it's constantly being read from the data repository into the powerful compute, and then the results are being written back. And this is true in any kind of computation. You can see from the horizontal bar shown on this picture at the top, where you're running on a regular CPU, the way you arrange your algorithms is so that you can be efficient, most of the time is spent on the compute on the CPU and a little bit of time is spent on the I/O.
However, when you accelerate the compute part of the workload onto GPUs, you get the pattern that is shown on the bar below that, where you dramatically reduce the amount of time spent in the computation on the actual compute, which means that the bottleneck now becomes the storage I/O.
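[Editor's illustration] The shift described with the two bars can be sketched with simple arithmetic. This is a minimal sketch, not something shown in the session: the 90 s / 10 s split and the 30x compute speedup are hypothetical numbers chosen only for illustration, and the function name `io_fraction` is invented here.

```python
def io_fraction(compute_s: float, io_s: float, compute_speedup: float) -> float:
    """Return the fraction of total run time spent on I/O after the
    compute portion is accelerated by `compute_speedup`."""
    accelerated_compute_s = compute_s / compute_speedup
    return io_s / (accelerated_compute_s + io_s)


# Hypothetical workload: 90 s of compute and 10 s of I/O on a CPU.
cpu_only = io_fraction(90.0, 10.0, compute_speedup=1.0)

# Same workload with compute assumed to run 30x faster on GPUs;
# the I/O time is unchanged.
accelerated = io_fraction(90.0, 10.0, compute_speedup=30.0)

print(f"I/O share on CPU:          {cpu_only:.0%}")
print(f"I/O share with GPU compute: {accelerated:.0%}")
```

Under these assumed numbers, I/O goes from about a tenth of the run time to roughly three quarters of it, even though the I/O itself did not get any slower.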
06:27 MD: Unfortunately, the problem is even worse, because beyond the fact of the compute getting compressed, if you think about modern application architectures, they create even more East/West traffic, based on whether you're scaling the application horizontally, you're sharding the application in some way so that it becomes embarrassingly parallel. And more and more now, with streaming data coming in at the edge and other places where a lot of data is flowing through the system. All of this together means that the pressure on the traffic, the I/O traffic, between servers in the data center, is really growing dramatically.
07:09 MD: So, as we have practiced AI at Nvidia, and the reality is Nvidia not just produces technology for AI, we are one of the largest consumers of AI ourselves. So, what we have realized is that we really must solve the I/O bottleneck in order for AI to flourish. And we've done this by creating a set of technologies that we call Magnum IO, that are all about how we move the bits around. How do we move bits from one GPU to another across servers and within a server? How do we move bits from GPUs in a server to storage that is outside the server? How do we use the network through which the bits are already flowing to piggyback some of the computation that needs to be done even as the bits are flowing through the network to dramatically reduce the amount of network traffic?
08:02 MD: So, here's a schematic of what Magnum IO looks like as a software platform. It's what all of our AI software stack is based on. But it really requires a lot of work with people who are involved in developing high performance storage. And that's why we're here talking to you today.
08:26 MD: I would like to stress a key point of realization that we've had as we've done this work, which is that the fundamental technology we're using to make I/O go fast across the network in our AI workloads is RDMA, which as you all know, is a technology that has existed for quite some time. But all of our stack, everything we talk about in Magnum IO, is all based on the effective use of RDMA.
08:57 MD: Now, the really nice thing about RDMA is that it's an industry technology, it's not specific to AI in any way. And so, we can see that over time, the ecosystem around RDMA, the adoption of RDMA, in a variety of contexts, has grown quite dramatically. And this is what we can benefit from.
09:18 MD: So, we need your help. And that's why we're here today. Because in order for RDMA and the Magnum IO technologies to have the effect that we think they can continue to have, we need all of you to work with us on embracing these technologies within your own stacks, for storage and other things, to really make them fly.
We would like to stress that the focus should really be on how do we build performance storage at high scale. That's the unique problem that AI exposes. For a long time in storage, there's been a dichotomy between performance versus scale. But really, what we need here is the combination of both things. And then finally, the second part of this conversation today from Kevin is about the technology we've worked on for the DPU for which Nvidia has a piece of hardware called the BlueField. And this is really an opportunity... It is a place where you can do your innovation and place your software to get the best effect for I/O acceleration in the data center. And Kevin's going to be talking to you about that.
10:36 KD: Thank you, Manuvir. Indeed, the data center architecture needs to evolve to meet the requirements of artificial intelligence workloads. And really, there's a new element that forms the trinity with the CPU, the GPU, and now the DPU, or the data processing unit. Because each of these elements is good at certain workloads. The CPU is good at running applications, GPUs are good at running accelerated computing and massive parallelism. And the DPU is fantastic for the I/O tasks and managing the data. And that's what's important for this storage.
11:23 KD: So, the DPU actually changes the way that data center infrastructure works, because we can now take that infrastructure, instead of running it on the CPU, and run it on the DPU. So, things like software-defined networking, software-defined storage and software-defined security, as well as all of the management, can now run on the DPU.
So, what we do here is we offload, we accelerate, and we isolate all of these capabilities and run them on the DPU. And that works for virtual machines, that works for containers. And because we've freed up the CPU, we're actually able to run more application processing. So not only do we get the benefits of running faster, but we actually get more efficiency, because there's more CPU cores available for that application, because we've offloaded that to the DPU.
12:16 KD: At Nvidia, our DPU is the BlueField-2. And it really provides data center infrastructure on a chip. This is a massive device that has an incredible amount of performance, almost 7 billion transistors and embedded ARM processors that are 64-bit. It does 100 gigabits per second of IPsec. It does regular expression, video streaming. And most important for the storage community, it can do over 5 million NVMe IOPS. And if you look at the capabilities, the compute capacity of the DPU, if you try to do everything in software, it would take over 125 CPU cores to do what the DPU can do. So, an incredible amount of compute and data processing capabilities are offloaded from the CPU cores.
13:18 KD: We have all of this great DPU horsepower, but without software, of course, you can't take advantage of that. That's the other area where Nvidia is investing heavily. We've developed something called DOCA, which is our data-center-infrastructure-on-a-chip architecture. This is an entire software platform. It really supports all of the things we talked about, software-defined storage, security, networking, we have a tremendous number of accelerators and also management and telemetry, which are becoming more and more important.
What's vital here is that we're not asking partners like yourselves or our enterprise partners to change the way they do things. We're supporting VMware, we're supporting Common Analytics Framework. And really, everything is familiar with the way that you design and use your systems today. We do all of that, but now it's running on top of DOCA in the DPU. And there's lots of different partners here for security and telemetry, video streaming, load balancers and firewalls. We have lots of partners that we'll talk about, but really, we're not asking you to change anything. We're just going to make it run faster and more secure.
14:40 KD: The great thing here is that DOCA is actually a platform that will remain constant. Even as we progress with new devices like the BlueField-2X, the software platform will stay the same. You can leverage all of the investment that you've made on top of DOCA. With BlueField-2X, we combined the DPU with an Ampere GPU. And now we can bring to bear a tightly integrated system that has been validated and integrated with AI and data processing capabilities. We fully utilize GPUDirect and CUDA, which are the AI platforms that allow you to develop all sorts of applications across a range of different vertical business workloads. The investment that you make today on DOCA with our BlueField will carry forward with BlueField-2X and BlueField-3 as we move forward.
15:46 KD: But not only are we committed to a long-term road map for our DPUs and a software application framework with DOCA, but also, we're investing heavily in our partners so that we can certify systems and enable a secure hybrid cloud for enterprises. These are just several of our partners that we're working with, who are validating systems that are using our BlueField-2, our ConnectX adapters, and our Ethernet and InfiniBand switches, along with really, the integrated building blocks, which are EGX and HGX, that incorporate the Ampere and the networking. All of this is really being integrated and validated with our OEM partners. We invite more of our OEM partners to come work with us, validate these, and really enable all of the AI workloads and the data processing acceleration that we're able to achieve with the DPU. If you're not on this list, please come work with us. We're anxious to enable AI everywhere.
16:57 KD: Of course, storage is a critical part of enabling AI because, with the massive data sets and the AI workloads, we need efficient access to the storage and the data. Here, we're also validating GPUDirect Storage partners. And so, we have growing ecosystem partners, we have development partners, as well as partners that we're involved with testing today. Again, we'd like to work with more partners so that we can bring to bear all of the efficient access to data that's needed to put AI everywhere in the enterprise.
17:38 KD: With that, I'd like to close and say that to put AI everywhere, we need your help. We need the storage community to embrace RDMA and Magnum IO accelerations, and really to figure out how we can deliver AI performance at scale everywhere in the enterprise. Come work with us at Nvidia with our new DPUs, with our AI and our GPUs, our integrated platforms like EGX and HGX. For the storage community, we want to work with you to certify GPUDirect Storage, and for the server OEMs, to certify systems and incorporate these AI building blocks. The world is changing dramatically with AI, and we want you to be part of that transformation with us.
18:36 KD: Thank you very much for joining us today. We're excited to work with you to bring AI to every business because every business is becoming AI. For those of you that are interested to learn more, we have some links here for Magnum IO and for certified systems. We'd like to work with you to qualify those systems. For those of you that are here live, we're going to open this up now for some live Q and A. We have people that are experts in Magnum IO and in our network and in DPUs. Thank you so much for joining us and let's go change the world together and make AI everywhere.