Guest Post

SmartNICs: The Key to High-Speed Converged Networks

The SmartNIC can improve both throughput and scalability, and can provide a variety of functions, including cybersecurity, storage functions such as deduplication and mirroring, and more. This is a discussion exploring the topic.

00:03 John Kim: Hello everyone, welcome to Flash Memory Summit, Session D-10 on SmartNICs. I'm your moderator for today, John Kim from Nvidia, and we have a great panel discussion here today to talk about SmartNICs, the key to high-speed converged networking.

So, our speakers today on the panel are Manish Muthal, vice president, data center group and general manager for cloud and enterprise acceleration division at Intel. And then we have Bob Doud, senior director of marketing at Pensando; followed by Eliot Rosen, marketing manager for compute and connectivity . . . Sorry, for the compute and connectivity business unit at Broadcom; and finally, Rob Davis, vice president of storage technology at Nvidia in the networking business unit.

So, let's get started with this panel. So, first up will be Manish, and then afterward, we'll take your questions and I think we'll have some great conversations here in this panel. Manish, please go ahead.

01:03 Manish Muthal: Thank you, John. Hello, everyone. It's great to be here today on this panel to share my thoughts and Intel's vision and strategy for SmartNICs. The fundamental premise of SmartNICs, as most of you know, is to accelerate key infrastructural services -- such as networking, storage, security -- to help scale data center infrastructure to higher port speeds within virtualized and bare-metal environments, especially driven by the trend towards software-defined infrastructure, software-defined networking, software-defined storage, software-defined security and so on. All of these trends actually are adding up to significant load, as represented on the host CPU, and by being able to accelerate that off the host CPU, you effectively are driving increased performance per TCO for these data centers and helping them scale. SmartNICs are thus intended to drive increased performance and network agility, enable new use cases like virtualized switching, disaggregated storage, disaggregated acceleration, while freeing up more CPU cores to scale application performance, thus effectively improving the performance per TCO.

02:19 MM: Now at Intel, we are investing in developing and offering to our customers focused solutions for SmartNICs at the silicon level, with Xeon SOCs, with FPGAs that are optimized for SmartNICs with NIC controllers, as well as at the platform level, where we have development platforms and complete production platforms that are specifically optimized for data center and edge use cases within the cloud, both public and private cloud, and communication segments. We don't believe, at Intel, that one-size-fits-all approach really works for customers in this space, and as a result, we're offering customers programmable choice points around platforms that are modular, that are scalable, that are flexible and extensible. This enables the tailoring of the underlying hardware and the software to address customers' unique needs for infrastructure acceleration.

03:18 MM: Finally, the last piece of our strategy is the ecosystem. We are investing in growing the hardware and software ecosystem of IP, of adaptor, of system, of software partners and solutions partners that allow us to go scale our platform offerings to the broad market across multiple end segments, like cloud, communications and enterprise. Thank you.

03:43 JK: Manish, thank you. Great presentation, I'm sure we'll come back with a lot of questions for you afterward. So, next, we have Bob Doud of Pensando. Bob, if you're ready, please go ahead.

03:56 Bob Doud: All right, thank you, John, and welcome, everybody, to this panel. At Pensando, we're not just developing a SmartNIC. We are developing an entire platform we call the distributed services platform. By distributing services, and as Manish said, things like storage, networking, security, visibility, offloads, essentially at the edge of every server, you get far superior scalability of the network. You get better visibility because you're deploying these services right at the server edge, very close to the applications, and you get excellent performance because you can offload functions that would otherwise be run on the server itself. And because you're essentially putting this kind of services capability at every server edge, the scalability is really remarkable. As you add more servers to the data center, you get more services commensurately.

05:00 BD: So, the platform that Pensando is rolling out consists of three elements. The first and the foundational element is the hardware, the distributed services card shown on our slide here in the center, and basically, it's a card centered around an ASIC that we have developed that has a multicore ARM control plane together with a P4 programmable data pipeline engine that achieves wire speed performance at over 100 gig and can offload all the data plane services from the server. And then we also have targeted crypto and compression and deduplication engines on the device as well. On top of that hardware, we have deployed over 150 software engineers here at Pensando to actually write the applications that are running on this device, so this is part of the solution approach that we're taking.

05:54 BD: So, we are developing solutions like distributed firewall, storage functions like NVMe over Fabrics, NVMe virtualization and so on, as well as visibility functions like ERSPAN and the ability to capture network data and transmit it back up to the last element of our platform, which is our centralized controller, the policy and services manager. So, the PSM is a central management controller, it's high availability, runs on multiple servers, it's a distributed microservices-based architecture using RESTful APIs and it essentially can manage the entire network of cards from lifecycle management, firmware updates, things like that, to deploying policy, capturing telemetry data in real time and so on. So, really with all these three elements, we're able to bring to market a complete turnkey solution that the customer does not have to necessarily do any development on.

06:55 BD: Now, we do have some customers, as an example, a storage target company is using our product embedded into their array where they do want to do their own custom software development, and that's something we do facilitate as well with the open programming environments, but much of our marketplace is served by a turnkey solution package that we offer. So, that's a quick summary of what we're doing here at Pensando, and we look forward to your questions at the end of these presentations. Thank you.

07:24 JK: Bob, thank you very much, that was also a great presentation, so now let's move on. We have Eliot Rosen of Broadcom. Let me go ahead and advance the slide.

07:35 Eliot Rosen: Thanks, John, and again, thanks for inviting us to this panel. So, let me start out with a kind of a basic overview of what we believe are the core tenets of a robust SmartNIC. So, one, it's server-class CPUs to enable efficient offloading of applications and host services. Two, it's high-performance NIC for data path to maximize packet-per-second throughput and minimize both the host and the SmartNIC CPU utilization. Third, it's additional offload engines to improve system-level performance. And then finally, a multi-tier approach to security protection.

OK, so let me just cover a little bit about our Stingray SmartNIC technology and how we cover the bases that I mentioned before. So, one, we offer server-class ARM CPUs running at an industry-leading 3 GHz. We integrate our NetXtreme NIC IP, including our hardware-based TruFlow advanced packet processing engine. We add additional accelerator engines for things like storage and crypto. We provide interfaces like NVMe and VirtIO to simplify host driver management.

08:48 ER: And then we protect everything with our well-established security technology called BroadSafe, which is a silicon root of trust technology. And on top of that, since 2017, we've been shipping Stingray and we have over three years of real-world experience deploying into both hyperscaler cloud and also enterprise environments.

Just kind of briefly, just to touch on our open platform and ecosystem view here, the key goal is really to provide an open environment, with standard operating systems and tools to essentially speed time to market. The goal here is really to drive broad adoption of SmartNICs and enable third-party ecosystems. And then, finally, I just want to finish off my presentation, since this is FMS, by talking about storage and introducing a storage application that we think exemplifies what can be accomplished with SmartNIC technology and how that can be deployed to disrupt the market.

09:50 ER: So, I want to introduce here Nebulon and their cloud-defined storage. And so, Nebulon was actually founded by 3PAR and HP 3PAR execs -- their mission is to really simplify enterprise storage. And they do that by collapsing their entire enterprise data path services stack into essentially a storage array on a PCIe NIC. They call it their services processing unit or SPU, and that's based on our Stingray technology, as well as our SAS IOC technology. A couple of things to point out is that this is completely OS- and hypervisor-agnostic, and what that means is that there's no dependency on the host at all, that the host essentially sees a SAS interface and uses standard SAS drivers.

The other thing, or the last thing here, just to point out is, what really brings this all together is that they have a technology called Nebulon ON, which simplifies the cloud-based management. And so, they automate that and they provide a full set of tools and APIs to manage all the storage that is within their network. And, essentially, we think that's a great example of how SmartNICs can be deployed in new and innovative ways. Thanks, John.

11:15 JK: Eliot, thank you. A very interesting example there. OK, our fourth panelist and last before we get into the questions is now Rob Davis of Nvidia. Rob, please go ahead.

11:31 Rob Davis: Thank you, John, and thanks to the Flash Memory Summit for putting together this panel with us. Hello, everybody. So, what is a DPU anyway? Everyone's been talking about SmartNICs. Is this supposed to be a SmartNIC panel? And it is, but like usual in high-tech with new technologies, there are multiple names for similar solutions. And Nvidia sees the DPU as the companion to the CPU and GPU for accelerating secure data movement around the data center. Our DPU is based on the BlueField ASIC. The major components are the ConnectX for I/O, the ARM CPU, a PCI switch and multiple data accelerators. For Ethernet NICs greater than 10 gigabit, ConnectX has 70% market share with millions shipped every quarter. The ConnectX-6 Dx can do speeds from 10 to 200 gigabit Ethernet, and it can also do InfiniBand, the No. 1 networking technology for HPC and clustering Nvidia GPUs in a data center. So, the I/O is very solid. Then, on top of that, we add a multi-core 2.5 GHz ARM CPU and a PCI switch for application flexibility.

12:55 RD: Last are our accelerators, which are hardware engines for offloading the ARM or the host CPU from cycle-intensive or low-latency-dependent functions like RDMA, NVMe and crypto, compression, VirtIO . . . dedup, CRC-64 and many more. Our customers interface to all this through DOCA, a framework of interfaces which stay constant from generation to generation, and this is for hardware independence. So, like CUDA is for our GPUs, it enables the customer applications to easily migrate between BlueField generations. DPUs come in many flavors, single-chip server cards with varying core counts, and configurations and form factors, including even OCP form factor, very small OCP form factor. Next build, John. And we even have a version called the BlueField-2X that has an integrated GPU on the card for applications like traffic shaping and steering and dynamic security orchestration, low abnormality detection and automated response. Thank you.

14:16 JK: Rob, thank you. So, a very interesting idea combining a DPU and a GPU. So, hey, great presentation, everyone. A great start. We do have some questions here, so let me start out with some questions, and I guess I'll go ahead and send each question to the person that I think makes the most sense to start with, and we'll hear what everyone thinks.

So, the first question is sort of a basic one, and it is . . . What is the definition of a SmartNIC? So, Manish, why don't we start with you? Maybe from your standpoint, what is the definition of a SmartNIC? Manish, you're still muted. If you could just unmute there.

14:52 MM: There you go. Yeah, so I think as Rob alluded to, there is a variety of terminology that is actually out there in the market with respect to SmartNICs. There is IPU, there is DPU, but fundamentally what it really comes down to is by definition, a SmartNIC is a network interface card with the smarts to accelerate host networking functions, but most importantly, is adaptable to both existing use cases and emerging use cases. What you've seen over the last couple of years is that the use cases that can be mapped onto SmartNICs are growing from simple networking functions, to more complex networking functions, to storage functions like NVMe front-end virtualization, to security functions like root of trust and encryption, to complete offloads of the hypervisor to enable bare-metal use cases, all the way to disaggregated storage, disaggregated acceleration-type architectures to go enable the next-generation disaggregated data center.

15:55 MM: But, fundamentally, what it comes down to, as some of my fellow panelists already talked to, is a NIC controller element and a programmable compute element, either in the form of an FPGA or in the form of an SoC with some built-in offloads that can deliver significantly better performance and agility at significantly improved TCO. Now, it's your turn, John. You unmute.

16:23 RD: You're muted, John.

16:24 JK: Manish, thank you. And I got caught out by the mute button as well. Would anyone else like to weigh in with an add-on or a different opinion of what is a SmartNIC?

16:37 ER: I mean I'll jump in just for a minute here, John. So, I think in general, we all have a little different interpretation of what a SmartNIC is, but to Manish's point, you're essentially trying to do a couple of things, right? You're trying to offload host services as applications become higher performance and then network demands become greater over time. You want to have some flexibility within the network side of the host server, so to speak, to provide additional CPU capabilities, in our opinion, and then network function capabilities. So, essentially, it's to maximize, in many cases, kind of the real-estate footprint and the performance that you can get out of that as well as network performance.

17:27 JK: Eliot, it makes sense. Thank you for adding that.

17:32 RD: I think, John, I would add one thing. I think SmartNICs, DPUs, from our perspective, have come along because in reality -- and probably Manish isn't going to like this being from Intel -- Moore's Law has basically ended and because of that, CPUs can't get faster and we have lots and lots of cores, which is one way to get them faster. But I think what DPUs do is offload the security, the data movement, the networking, the storage functions from those CPUs and let the CPUs use the cycles for the applications the customers buy them for.

18:15 MM: So, just to kind of add to that, Rob, right? The way we look at this from an Intel perspective is not necessarily whether Moore's Law is scaling or not, but more from a fundamental aspect of what is the best place to provide compute cycles. And, in most cases, we would all agree that the best place to provide compute is as close to the source of the data as possible. So, the same tenets that apply to, say, computational storage, where you want to move compute functions closer to storage, the same things apply to SmartNICs where you want to do the operations, compute operations, as close to the source of the data because it's most power efficient, it's most performance efficient, and hence, it's most TCO efficient to kind of do the performance, do the processing of certain capability near the source of the data. So, it's just kind of a different take on that.

19:04 RD: And I think that there's also the GPU piece of this, right? So, it's not only the CPU doing the applications now. GPUs are accelerating applications, and so I think DPUs kind of provide that middle ground of managing what the CPU is doing, what the DPU is doing, and making sure that the security and the data movement is all orchestrated correctly and securely for whichever one of those parts of the computer system of the data center is right for the particular application. John, you're still muted somehow.

19:49 JK: All right, very interesting, thank you. I always love a little spice in the panel discussion. Let's move on, however, to another question. Another question we have is, it's clear that SmartNICs can provide an offload benefit in the storage target, so, of course, I'm a storage guy, so that makes a lot of sense just to me. But what is the justification for having one in the initiator servers or in the regular servers or clients? So, maybe Bob from Pensando, let's start with you on that.

20:14 BD: Yeah, thanks, John. Yeah, it's pretty clear that there are multiple benefits to running some of these functions at the client side, the initiator in the case of a storage transaction. First of all, if you can do an operation like compression at the source, then of course you're using less bandwidth over the network, you're sending less bits on the wire, and that's naturally going to improve performance. It's going to clog up the network in the data center less, and it's also going to reduce latency because latency is going to be dependent on how many bits you're having to move across the wire both to and back again when you get acknowledgements and so on. So, that's one example.

When you do security, if you're doing encryption, for example, whether it's data at rest or data in flight, of course, encrypting as close to the source is a tenet of good security. You want to protect the data for as much of its lifecycle as possible, so doing that closer to the applications is certainly a good thing.

21:11 BD: And then lastly, the idea of getting telemetry on storage data specifically to allow you to measure transactions and monitor lost packets and retries and things like that, right at the source is also a very useful feature for the network administrator. All of these things really argue for putting many services there at the initiator side. Obviously, the target can benefit from SmartNIC technology as well, doing similar offloads, compression, deduplication and so on, but it really is a play for both ends of the link.
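Bob's first point, that compressing at the initiator means fewer bits on the wire, can be illustrated with a minimal sketch. It uses Python's standard `zlib` module as a software stand-in for a SmartNIC compression engine; the function name `bytes_on_wire` and the sample payload are hypothetical and chosen only for illustration. How much is actually saved depends entirely on how compressible the data is.

```python
import zlib


def bytes_on_wire(payload: bytes, compress_at_source: bool, level: int = 6) -> int:
    """Return how many bytes would cross the network for one payload.

    zlib stands in for a hardware compression engine (an assumption for
    illustration; a real SmartNIC would do this in dedicated silicon).
    """
    if compress_at_source:
        return len(zlib.compress(payload, level))
    return len(payload)


# Hypothetical, highly repetitive log-like data (not from the panel).
payload = b"timestamp=0 status=OK latency_us=120\n" * 1000

raw = bytes_on_wire(payload, compress_at_source=False)
packed = bytes_on_wire(payload, compress_at_source=True)
print(f"raw: {raw} bytes, compressed: {packed} bytes, ratio: {raw / packed:.1f}x")
```

Repetitive data like this compresses very well; already-compressed or encrypted data would show little or no saving, which is one reason compression is usually applied before encryption.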

21:50 MM: So, if I may add to Bob, I think there's an entire continuum of services, like you mentioned, from compression to encryption. Even stuff like key-value offloads, right? You can do key-value offloads on the initiator side and save a bunch of network traffic for object store, right? But if you think beyond that and think in terms of, for example, bare-metal environments, in bare-metal environments, you have no choice but to be able to . . . offload storage device models to the SmartNIC to be able to offer bare-metal tenant services. So, that's another classic example of why you would want to have this capability on the initiator side.

22:25 RD: And one thing I would add to it is not only security from encryption, but security from isolate . . . Isolisat . . . Anyway, the word I'm trying to say you guys know is . . . And because of that, what the SmartNIC is able to do, especially through NVMe emulation and the I/O function being on the same card, is you can isolate a bare-metal solution, like a cloud solution from any type of security issues that occur within that server, even if it's a fault in the CPU that allows a server, a hole for a hacker to get in, because any I/O, whether it's storage or whether it's networking going in and out of that bare-metal machine, has to go through the SmartNIC, which is running software that's completely isolated and from the provider. Not from whatever operating system or whatever hypervisor or application might be loaded by the user.

23:26 ER: Yeah, I . . .

23:27 JK: So, I like . . . Oh, Eliot, I was just going to say . . . Go ahead, Eliot, go ahead and comment on it.

23:31 ER: Yeah. I was going to agree with that actually, because I was going to bring up that point. Offloading the services, all the storage services that Bob and Manish were talking about, is very important. But the other aspect is, especially in bare-metal environments, which seem to be predominant in what's going on right now, providing that secure root of trust, air-gap isolation, by being able to essentially plug in and look like an NVMe device, and then doing all of the storage services off of the back end, is something that we see as well.

24:04 RD: And the one thing we haven't talked a lot about anyway, is the network acceleration for things like OVS. We have a technology called ASAP2 which allows you to offload the CPU from basically trying to figure out where all the packets need to go in a virtual system. So, there's a whole lot of different use cases, and I think the DPUs are in the perfect position to take them over.
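The general idea behind offloading a virtual switch can be sketched as a flow cache: the first packet of a flow takes a software slow path that decides where it goes, and that decision is then installed in a table so later packets of the same flow are handled without the host CPU. The sketch below is a conceptual model only, not how ASAP2 or OVS is actually implemented; the class `VirtualSwitch`, the simplified three-field match key, and the port names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Simplified match key: (src_ip, dst_ip, dst_port). Real switches match on more fields.
FlowKey = Tuple[str, str, int]


@dataclass
class VirtualSwitch:
    # Policy consulted by the software slow path: dst_ip -> output port name.
    policy: Dict[str, str]
    # Flow cache standing in for the NIC's hardware flow table.
    offloaded_flows: Dict[FlowKey, str] = field(default_factory=dict)
    slow_path_hits: int = 0
    fast_path_hits: int = 0

    def forward(self, key: FlowKey) -> str:
        """Return the action for a packet, installing a flow entry on a miss."""
        action = self.offloaded_flows.get(key)
        if action is not None:
            # Fast path: the flow is already in the table, so no policy lookup.
            self.fast_path_hits += 1
            return action
        # Slow path: the host software decides, then offloads the decision.
        self.slow_path_hits += 1
        action = self.policy.get(key[1], "drop")
        self.offloaded_flows[key] = action
        return action


switch = VirtualSwitch(policy={"10.0.0.2": "vport1", "10.0.0.3": "vport2"})
flow = ("10.0.0.1", "10.0.0.2", 443)
for _ in range(5):
    switch.forward(flow)
print(f"slow path: {switch.slow_path_hits}, fast path: {switch.fast_path_hits}")
```

In this model only the first packet of each flow costs a policy lookup; every later packet of the same flow is a table hit, which is the part a SmartNIC can serve in hardware.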

24:33 JK: So, it sounds like while we don't all agree on exactly which services are best run at which times, we're all in agreement there's a good use case to run a SmartNIC both at the target and the initiator, so I definitely find that very encouraging. We have another question here, and this is maybe a . . . Actually, it's somewhat relevant to some of the things you guys just said in answering the last question, but it's, what trends are driving offloads to SmartNICs? Eliot, why don't we start with you on that one.

25:00 ER: Sure, so we started seeing this when we first started introducing our SmartNIC technology three years ago. And those trends really started in the cloud, in the hyperscalers, so I think Rob mentioned before OVS. So, that was a really big use case where when you start looking at some of these services like OVS, they'll consume two, four, six CPU cores. And, again, for hyperscalers, especially who are generating revenue off of those cores, you want to be able to offload those services -- that's a great area that a SmartNIC can fit into. But the requirements and the trends and what's happened with SmartNICs have become more pervasive than just in the cloud environment. So, we've seen different levels of requirements for security offloading functions through guys who are providing CDN services, to enterprise companies who are looking to do end-to-end security within their environments. Again, storage offloads, NVMe over Fabrics is one.

26:04 ER: There's multiple others. I had mentioned Nebulon before, they're doing enterprise-class storage -- that's a good example of how you can offload all of those services and put those onto a SmartNIC. And there's many other examples, I'm sure the other guys have them as well. But yeah, we're seeing lots of use case trends, this kind of started in the cloud and is really evolving now into the enterprise world as well.
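Eliot's figure of two, four or six cores consumed by a service like OVS can be put in proportion with some simple arithmetic. The sketch below assumes a hypothetical 32-core server; that number and the function name `cores_reclaimed` are illustrative assumptions, not figures from the panel.

```python
def cores_reclaimed(total_cores: int, infra_cores: int) -> float:
    """Fraction of a server's cores freed for applications if the
    infrastructure services using infra_cores move to a SmartNIC."""
    if not 0 <= infra_cores <= total_cores:
        raise ValueError("infra_cores must be between 0 and total_cores")
    return infra_cores / total_cores


TOTAL_CORES = 32  # hypothetical server size, not a figure from the panel

for infra in (2, 4, 6):
    share = cores_reclaimed(TOTAL_CORES, infra)
    print(f"{infra} cores on OVS -> {share:.1%} of the server returned to applications")
```

For a provider that sells cores, that fraction is capacity that can be rented out again, which is the revenue argument made above.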

26:28 BD: Yeah, if I could interject, I think Eliot is bringing up both enterprise and cloud, and something that I think is a trend that is becoming more and more interesting is that enterprises, especially the larger ones that are running big data centers, they're kind of looking over their shoulder at the cloud guys and saying, "You know, they've got some really interesting ideas. They've kind of turned things upside down, they're getting rid of appliances, they're just doing everything as a server." And that's the cloud model: step and repeat, homogeneous array of compute, and then if you need certain networking functions, just run it on a server. And, so, the enterprise guys are essentially wanting to build a private cloud. They're wanting to build their networks to look like the cloud guys because it's more fungible and easy to deploy services and so on. And, so, the idea of SmartNICs in the servers just leverages on that whole concept of being able to deploy services like distributed firewall or encryption or storage functions on any server at any time and then repurpose them because these things are reprogrammable.

27:34 JK: All right, Eliot, Bob, thank you.

27:35 ER: I would add to Bob's, John just quickly . . . I just got two minutes.

27:39 JK: Go ahead.

27:39 ER: OK, just real quickly. In the last two minutes I would add to what Bob said and that's from the enterprise perspective, I think what you see happening with VMware, and they're looking to move a lot of the functions, actually, maybe even their whole hypervisor onto the DPUs in order to do basically a lot of the kind of things we just talked about already. The CPU is becoming more of a resource along with the GPU, so they can direct traffic depending on what their customers want to run on their VMs to the right CPU or the GPU, depending on the function they need. You see companies maybe that are doing hyper-converged like Nutanix, or even VMware, really is what they're doing, thinking about using SmartNICs for offloading a lot of the functions that they provide as the base of their systems because what their customers really want to do is use those CPUs to run their applications, not to run their storage networks and their security networks.

28:50 MM: Effectively, in software-defined data centers, the SmartNICs become the orchestration point for managing and orchestrating all of the disaggregated resources within a software-defined data center, right?

29:04 BD: Exactly, Manish.

29:04 JK: Makes sense. Very good. OK, I'm going to try to squeeze in one more question here before we're out of time. That question is . . . So, Rob, you mentioned DPUs. I'm sure there are some people wondering what is the difference between a SmartNIC and a DPU? So, Rob, if you can take that and we'll see if we can have time for multiple responses here.

29:19 RD: All right. I think I've kind of covered it. I think SmartNICs . . . A lot of the ConnectX products we ship are SmartNICs from the functionalities we've talked about like network acceleration, right? So, I think a DPU is a little bit more. I think our CEO in his talk earlier this month said DPUs are going to represent one of the three major pillars of computing going forward: the CPU for general-purpose computing; the GPU for accelerated computing; and the DPU, which moves data around the data center, does the data processing -- and I think in one quick quote, that's the difference.

29:57 JK: OK, Rob, thank you. It looks like we are just hitting our 30-minute time limit, so I think this was a very exciting panel. Thank you all very much for contributing and for answering the questions. It's more lively than I expected, so that's always a good thing. So, with that everyone, thank you to all our viewers, thank you for watching this exciting panel discussion about SmartNIC acceleration, and we'll go ahead and sign off on this session, and I hope to see all of you at another session soon.

30:24 MM: Thank you.
