00:03 Kurtis Bowman: Hello, my name is Kurtis Bowman, and I'm the president of the Gen-Z Consortium. I'm also a member of Dell's server CTO team. And today, I'm going to be talking to you about Gen-Z.
Let me go ahead and get started with the presentation. Gen-Z is an ultra-high-speed interface for system-to-system communication, so let's jump right into: what is the Gen-Z fabric?
00:32 KB: So, the Gen-Z fabric is a modern memory-semantic fabric, meaning I talk on this fabric with typical read/write load/store operations -- I don't need a driver to communicate between the CPU and the endpoint devices. And those endpoint devices are going to be in pools: pools of memory or persistent memory, accelerators, NVMe drives. And what Gen-Z brings is true high performance, because it has high bandwidth as well as low latency. It's got a scalable architecture, and it's also highly reliable: it allows for multiple links and built-in retries, and it has hardware-enforced security. It also drops into your environment without requiring code changes, so you don't need a new OS, you don't need updated applications, and you can just start to use it right away. And because Gen-Z is an open standard, it allows components to be made by multiple vendors, and it's easy for you to find what you need for your environment by shopping across those vendors.
01:48 KB: So, let's talk a little bit about the bandwidth. Right here, I'm doing a bit of a comparison. If you look at DDR5, the typical bandwidth on a channel is about 50 gigabytes per second. Very respectable. If you look at PCIe Gen 5, that's going to be 128 gigabytes per second. With Gen-Z, you're going to see that one port is 200 gigabytes per second, and that's using our current 50 gigabit per second PHY. We're also working on a spec for a 100 gigabit per second PHY, and that will give you twice the bandwidth, or 400 gigabytes per second. So when I want to go to my pools of memory or my pools of accelerators, I've got a lot more bandwidth than I might have with some of the more legacy interfaces.
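The port numbers above follow from simple per-lane arithmetic. A minimal sketch, assuming the 200 GB/s figure corresponds to an aggregate of 32 lanes at 50 Gb/s (the lane count is my assumption to make the talk's figures consistent; encoding overhead is ignored):

```python
def port_bandwidth_gbytes(lanes: int, gbits_per_lane: float) -> float:
    """Aggregate port bandwidth in GB/s (8 bits per byte; encoding overhead ignored)."""
    return lanes * gbits_per_lane / 8

# Figures from the talk; the 32-lane width is an assumption.
print(port_bandwidth_gbytes(32, 50))   # current 50 Gb/s PHY -> 200.0 GB/s
print(port_bandwidth_gbytes(32, 100))  # planned 100 Gb/s PHY -> 400.0 GB/s
```

Doubling the PHY rate while holding the lane count fixed is what doubles the port to 400 GB/s.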
02:37 KB: In addition, it comes with security. Being a modern fabric, it has security built in. We know that front of mind for most people is: how do I make sure I'm in a secure environment? So, Gen-Z does a couple of things. It brings keys to the equation. The first one is an access key. If you don't have access to a particular pool of devices -- let's call that memory here -- then as my computer tries to access that pool, it actually gets stopped at the switch. It never even makes it to the component, because the switch knows I don't have access to that pool of memory.
03:16 KB: In other situations, I do have access to the pool of memory, as I'm supposed to, but I don't have access to all of the memory -- only to my area. So we have region keys that make it so that when I'm inside the pool, I can only access the memory I'm supposed to get to, not the memory of a different system. In addition to that, there are places where, because of multi-tenancy, you may want to also encrypt that data so that you've got data protection at rest as well as in flight. Gen-Z provides that, and allows for cryptographically signed packets to be used in the communication between the compute node and an endpoint like memory.
04:00 KB: And then finally, we try to stop man-in-the-middle and replay attacks, and we do that by making sure that the packets have to come in order. In addition, they have timestamps, so that if you start to replay an old packet, it's going to come in with the wrong timestamp and be kicked out as well. So you can see that security is really important, and a modern fabric like this has it built in.
04:29 KB: Now, let's go back to our system. One of the things that's going on these days is that innovation is occurring at a fantastic rate. It's wonderful for our industry: we're seeing new types of memory come out, particularly persistent memories, we're seeing new types of accelerators come out all the time, and we're seeing new types of storage come out. So while you have this system set up with your allocations of your different pools, as new memory comes out, you want to add that to your system, and you can do that by just adding another box. As you add another box to the system, you put that new memory in, and you're able to use it on one or more of your servers.
05:08 KB: The same would be true if new accelerators came out: you'd be able to put those in and use them in your system. Now, one advantage of this is that I'm also spreading my load out. A lot of data centers might be power-limited or cooling-limited in a particular rack or a particular row. I can now spread that load across my whole data center, and put it where I can get optimized cooling and optimized power to these components. And when I think about the heat coming out of CPUs and the heat we expect out of GPUs, being able to separate those puts them into some nice environments where they both get fresh air, which helps with their performance as well as their longevity.
06:00 KB: Now, let's talk about use cases. Four in particular that I want to talk about are composability, shared memory, in-memory computing, and then the memory tiers that are coming with the memory-storage convergence we see happening.
So, as we look at this slide, it's showing the composability, where I can take CPUs, accelerators, networks, different types of memory, and different types of storage, and combine them together to make bare-metal servers, depending on what I'm doing. You can see the web server is very compact and uses very little, compared to something like a database server, where I've got a large amount of compute along with different types of memory: I've got DRAM, I've got persistent memory, and I've got plenty of storage. It's all put together to bring the required horsepower on the compute nodes, on the communication side, as well as on the storage side.
07:04 KB: Now, imagine in the future, instead of me doing that by knowing what I need, I can be told by a VM what its requirements are, or I could be told by a container environment -- maybe Kubernetes is telling me, this is the kind of system I need built to deploy the workload. Now, that starts to be automated.
As we get better at this, what do we start to see? Well, first, we start to see servers that are right-sized. There's not going to be a whole lot of over-provisioning that occurs, so we're able to shrink the size of those servers, and maybe shrink the energy usage to what's actually needed, because I know it's easy to grow: if I need to add more memory, more compute, more storage, more communication bandwidth, I can do that. It also unlocks the trapped resources that might be in a system. So if I buy a server today with a lot of storage but don't use it all, I can now allocate that storage to another server.
08:08 KB: But if I have to do the same thing with memory, right now I can't do it, because I don't have any medium fast enough to get between the two servers. Gen-Z allows for communication between systems that's fast enough to support sharing of memory between them. Because I can purchase those resources independently, I actually save money, and I become much more agile. And I increase my reliability: if a server goes down, I don't necessarily have all the components inside that server, so I can deploy another server against those resources and pick up where I left off.
08:51 KB: The pieces that I want to retire start to become more interesting, because I could start to use them in something like a print server. And then finally, as I talked about, as technologies evolve, it's easy for me to add them into the system while I'm retiring some of the older versions.
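The composition flow described above can be sketched in a few lines. Everything here is hypothetical: the pool sizes, the resource names, and the request format are illustrative stand-ins for whatever a fabric manager or Kubernetes-driven orchestrator would actually use:

```python
# Hypothetical sketch of composing right-sized bare-metal servers from
# disaggregated pools. Pool sizes and resource names are illustrative.
pools = {"cpu_cores": 256, "dram_gb": 8192, "gpus": 16, "nvme_tb": 500}

def compose(request: dict, pools: dict) -> dict:
    """Carve a server out of the shared pools, failing if any pool
    cannot cover the request (no over-provisioning required)."""
    if any(pools[r] < amount for r, amount in request.items()):
        raise RuntimeError("insufficient pooled resources")
    for r, amount in request.items():
        pools[r] -= amount        # on decompose, resources would return to the pool
    return dict(request)

web_server = compose({"cpu_cores": 4, "dram_gb": 16}, pools)
db_server = compose({"cpu_cores": 64, "dram_gb": 4096, "nvme_tb": 50}, pools)
print(pools["dram_gb"])           # remaining pooled DRAM for other servers
```

The compact web server and the heavyweight database server draw from the same pools, which is what makes the otherwise trapped resources reusable.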
All right, back to our Gen-Z environment. Let's talk about what it does for you economically. I want to share my resources, and if I think about a large-memory system, what does that mean to me? Well, let's say I'm doing something that requires maybe four terabytes per server. A large in-memory database would be a good example of that.
09:35 KB: To get four terabytes of memory, I actually need to use 32 128-gigabyte DIMMs. When I did a quick look on the internet, I found the cost of these to be about $32,000 just for the memory, because 128-gigabyte DIMMs aren't currently the sweet spot in our industry.
If instead I decompose the system, putting some memory inside the server and more in a memory pool, then I can buy sweet-spot DIMMs: 16 DIMMs for the server at 32 gigabytes apiece. I take the rest of the memory and put it outside in a memory box, and now I have 112 32-gigabyte DIMMs out there. I've cut my memory cost in half, so I'm down to about $16,000 in memory cost versus the $32,000. This is a great way to save money, plus it keeps you agile.
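The arithmetic behind those two configurations checks out. In the sketch below, the per-DIMM prices are back-solved from the speaker's ballpark totals, not quoted market prices:

```python
# Checking the talk's DIMM arithmetic. Per-DIMM prices are back-solved
# from the speaker's ballpark totals ($32,000 and $16,000), not market quotes.
TARGET_GB = 4 * 1024                 # 4 TB per server

all_large = 32 * 128                 # 32 x 128 GB DIMMs, all in the server
split = 16 * 32 + 112 * 32           # 16 x 32 GB local + 112 x 32 GB pooled
assert all_large == TARGET_GB and split == TARGET_GB  # same total capacity

price_128gb = 1000                   # implied: $32,000 / 32 DIMMs
price_32gb = 125                     # implied: $16,000 / 128 DIMMs (sweet spot)

print(32 * price_128gb)              # all-128 GB configuration: $32,000
print((16 + 112) * price_32gb)       # decomposed configuration: $16,000
```

Both layouts deliver the same 4 TB; the savings come entirely from buying the commodity sweet-spot capacity point.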
10:36 KB: Let's talk about in-memory compute. We all want to take advantage of the speed that we can get. Right now, we tend to move data to the compute node, because that's what we've always done. But if we can leave that data in one place and have the actors come to it to transform it, then we really decrease the amount of power needed to move data around, as well as the time it takes to get to results, because we're not moving that data around.
11:12 KB: In addition, I can use something like a CPU to curate the data. Then I can message over to my accelerator to say, "Okay, it's your turn, I've got the data formatted. Go at it, do your work."
When it's done, it messages back to me that it's completed its work and tells me where the results are, or tells me that it's ready for its next step, and we continue in that cycle where the data stays in one place and we simply bring the actors to it. As I look forward, I think that's one of the most important pieces we're going to see with Gen-Z: being able to leave your data in one place and have actors come to it.
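That CPU-to-accelerator handoff can be sketched with threads standing in for the two actors. The shared dictionary stands in for fabric-attached memory; the message fields and names are illustrative, not any real Gen-Z API:

```python
import queue
import threading

# Sketch of "bring the actors to the data": the dataset stays in one shared
# pool; CPU and accelerator only exchange small control messages.
shared_pool = {"data": [3, 1, 2]}            # stand-in for fabric-attached memory
to_accel, to_cpu = queue.Queue(), queue.Queue()

def accelerator():
    msg = to_accel.get()                      # "your turn, the data is formatted"
    shared_pool[msg["key"]].sort()            # transform the data in place
    to_cpu.put({"done": True, "key": msg["key"]})  # report where the results are

worker = threading.Thread(target=accelerator)
worker.start()

shared_pool["data"].append(0)                 # CPU curates the data first
to_accel.put({"key": "data"})                 # hand off without copying the data
reply = to_cpu.get()                          # wait for the completion message
worker.join()
print(shared_pool[reply["key"]])              # results stayed in place: [0, 1, 2, 3]
```

Only the few-byte messages cross between actors; the dataset itself never moves, which is where the power and latency savings come from.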
12:00 KB: And then finally, the memory tiers. If you look at what we're doing today, we've got DRAM, and then maybe a little quick cache off on the side, and then we go right out to storage. What we think we're going to see in the future is that the fastest memory will be on-package memory, which has about the same latency as directly attached DRAM but much, much higher bandwidth.
We're going to take advantage of that higher bandwidth, and we're still going to have DRAM attached to most systems. But when we want to go out to our next layer or tier of memory, we're going to go across fabrics like Gen-Z to get to that memory and manipulate the data in that space. And what I think we're going to see is OSes and applications starting to change so they understand the characteristics of the memory they're running against: keeping their execution code close, but putting their data a little further away. What I expect is that the very hot data will be in on-package memory or directly attached DRAM.
13:15 KB: The hot data will be in this Gen-Z attached memory, and then finally, the colder data will still be out on drives. The reason for that is what you see at the top of this slide: DRAM is going to be at about 100 nanoseconds of latency, and this persistent memory is going to be in the 200 to 300 nanosecond range -- still much, much faster than going to something like flash or rotational media. That ability to tier all the way out to my storage makes a really nice environment for software to take advantage of, and OSes will learn how to use it.
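A minimal sketch of the placement policy an OS or application might apply across those tiers. The latency figures come from the talk; the tier names, the "temperature" score, and the thresholds are my assumptions for illustration:

```python
# Illustrative placement policy for the tiers described above. Latencies are
# from the talk; the temperature thresholds are assumed for this sketch.
TIERS = [
    ("on-package / direct DRAM", 100),     # ~100 ns
    ("Gen-Z persistent memory", 250),      # ~200-300 ns
    ("flash / rotational storage", 100_000),
]

def place(access_heat: float) -> str:
    """Map a data region's 'temperature' (0.0 cold .. 1.0 very hot) to a tier."""
    if access_heat > 0.8:
        return TIERS[0][0]                 # very hot: keep next to the cores
    if access_heat > 0.3:
        return TIERS[1][0]                 # hot: fabric-attached memory
    return TIERS[2][0]                     # cold: out on drives

print(place(0.9))   # on-package / direct DRAM
print(place(0.5))   # Gen-Z persistent memory
print(place(0.1))   # flash / rotational storage
```

The point is that the middle tier's 200 to 300 nanoseconds is close enough to DRAM that software can treat it as memory rather than as storage.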
14:01 KB: Now, let me tell you a little bit about what Gen-Z is doing in its proof-of-concept work. We're a consortium whose product is actually the spec; we want to make sure this stuff works for everybody, so we've been working with our members to show what's possible. Last year, about this same time, we showed the ability to take servers with AMD, Intel, and IBM processors, from companies like Dell, HPE, and IBM, and put them onto our Gen-Z fabric, going through a switch and then out to media boxes. I'll get to the media boxes in just a minute, but suffice it to say, these boxes are where we can put the large amounts of memory I've been talking about and still allocate it to the different servers.
We did that at Supercomputing, and we were able to show very good results: we saw about a 5X improvement in latency compared to going to NVMe, and we showed the ability to allocate memory to all these different servers through the Gen-Z fabric.
15:17 KB: A big part of that was the media box, so let me talk to you a little bit about it. This is something Gen-Z is actually offering to its members as a development kit. It supports six ZMMs, or Gen-Z memory modules, that Smart Modular makes, and these modules come in 64, 128, and 256 gigabyte capacities that go right into the media box, giving you a large amount of memory out there that can either be allocated to one server or assigned to multiple servers.
In addition, Smart did what they call a micro-development kit. One of the things we really want to do is enable the industry and the software folks to do their work -- enable things like in-band management and fabric managers to be developed in a fairly low-cost environment. This micro-development kit comes at a very low cost and still allows you to do the work you need to do for your software.
16:27 KB: And finally, we've got a new project under way, and that is to allow those accelerators to be plugged in. We're calling it Just a Bunch of Slots, and it allows GPUs, FPGAs, and NVMe drives to be plugged in and to use a Gen-Z feature called Virtual PCIe, or Logical PCIe, which allows you to put PCIe devices at the end of a Gen-Z link and then assign them back to the compute nodes. This is really important, because now I can disaggregate my whole environment and then compose it back up. We're really trying to prove the last tenet of Gen-Z here: the data will show that we can allocate memory and allocate PCIe devices across the Gen-Z fabric, making a fully decomposed, or fully composable, environment possible.
17:29 KB: And then, what's next? I always like to tell people what's next. We are ready for the future. We've really thought through what the future is going to need -- the kind of bandwidth, the kind of latency -- and a lot of that comes down to what I call wrangling the data. There are a lot of folks out there talking about the amount of data that will be generated each year; it's somewhere in the 165 to 180 zettabytes per year range. With that, we need to make quick use of that data to get to our insights.
18:02 KB: Gen-Z helps you do that by putting the storage, the memory, and the persistent memory where you need it, and then allowing compute to come to that area to do all the data analysis that's needed. In addition, Gen-Z treats everybody as a first-class citizen, so FPGAs, GPUs, and CPUs can all get to the shared memory with the same access rights. That makes it very easy for me to simply message between those types of devices to tell them who's the next actor and what's the next thing to do.
And we see this as a real benefit because, as I mentioned before, it will reduce the power you need to get to your results, and it will reduce the time you need to get your results. We believe that time to results is the new measure people will be using. So with a low-latency, high-bandwidth, flexible fabric, you're going to get to your results faster, and that's what businesses are going to measure when they start to look at the types of fabrics they need to put into their data centers going forward.
19:15 KB: Finally, we've got very broad industry support here. As you look at the members of the Gen-Z Consortium, you'll notice that we have vendors from all walks of life -- everything we need to build up a full ecosystem, from the CPU vendors to the accelerator vendors, the connector vendors, the silicon vendors, the ODMs, the OEMs. Everybody's involved here, so that we can get an entire ecosystem up and running. And we actually see that occurring out in the timeframe of, say, 2023 or 2024. Certainly, by the middle of the decade, you're going to see a lot of Gen-Z systems out in data centers.
20:02 KB: And finally, one of the things we'd love to see is more members. If some of what I talked about is interesting to you, I encourage you to go to our website, genzconsortium.org, and take a look. In addition, there's lots of educational material out there: we have a YouTube channel you can check out, and we have a number of presentations that talk about many of the tenets Gen-Z brings. You can also access the full Gen-Z spec set, which would allow you to look at it and decide if it's right for you or not. And if you want to keep up with what we're doing, follow us on LinkedIn and Twitter. With that, I'll move to the Q&A session, so let's see if there are any questions. If there are no questions today, then please drop by the Gen-Z booth -- we've got a virtual booth here at Flash Memory Summit and we'd love to talk to you there.
20:58 KB: Thank you for your time.