NVMe Technology: Powering the Connected Universe
President of NVM Express Amber Huffman talks about fixing the memory and storage hierarchy with NVMe at the center, refactoring for the next decade of growth within NVMe, and more.
Download the presentation: NVM Express Technology: Powering the connected universe
00:00 Amber Huffman: I'm Amber Huffman, I'm a fellow and chief technologist of IP engineering at Intel and also president of NVM Express. So, I'm pleased to talk to you about NVMe today.
00:12 AH: So, there are four things I want to talk to you about today. The first is how we've gone about fixing the memory and storage hierarchy with NVMe at the center. The second is refactoring for the next decade of growth within NVMe. Then we're going to touch on the architectural advancements in NVM Express that we've made in the past year, and then look forward to computational storage.
00:38 AH: So, let's look at fixing the memory and storage hierarchy. Now, you may recall, Intel has been talking about this for a long time, and you've seen many charts that look like this one or its variants. What this chart shows you is that memory performance is lagging the speed increases we're seeing in compute performance. So, one of our key problems is: how do we address that gap, where compute performance continues to grow at an exponential rate while memory performance grows at a linear rate? This is where innovations such as 3D XPoint have been pretty critical. And as we evolve, graphics and AI workloads are some of the most memory-bound and memory-starved workloads, so there has never been a time when we needed this advancement more.
01:28 AH: That was two years ago; let's look at where things stand now. For more than a decade, Intel has been showing a version of the memory and storage hierarchy chart you see here, highlighting the cost-performance gap in storage, between tertiary storage, often hard drives today, and secondary storage with SSDs, as well as a storage performance gap and a capacity gap. So how do we address that storage performance gap with something like Optane SSDs, how do we address that capacity gap as we move up to persistent memory and beyond, and how are we building up through that hierarchy?
Now, we first envisioned NVM Express, I remember, back in 2006, as the high-performance SSD interface that would enable the birth of a new memory type, which became 3D XPoint. And to make the progress we have achieved a little simpler to follow, I think we need a more engineering-oriented chart than a pyramid. So that's why I wanted to show you this different way of looking at things, so we can see what progress we're making.
02:33 AH: So, in this chart, the top edge of each box denotes the performance or bandwidth that's delivered, and the right edge denotes capacity. It's pretty funny to look at this and see the cliffs between the different technologies we have today. So how do we address some of these challenges?
02:51 AH: For reference, tertiary storage with hard disks has much higher capacity than DRAM, and compute caches with SRAM have much higher bandwidth than 3D NAND flash. But let's start with the basics and move forward by looking at the underlying media.
Now, for this crowd, these three memory types are all pretty darn familiar. I really enjoy geeking out on these types of pictures when somebody in marketing hires a graphics artist to draw them. And so, I wanted to share them with you and really talk about some of the advances in both 3D XPoint and 3D NAND.
03:26 AH: So, I put these different cells side by side so you can see the differences for yourself. The memory arrays look completely different, the cells themselves look completely different, and based on their cell architectures they each have very different characteristics. But what's really stunning about putting these images side by side is seeing them drawn to relative scale with each other, which is what that white circle is about. That DRAM cell in the white circle is what's required to hold just a single bit of memory. Meanwhile, that same circle in the 3D XPoint picture in the middle covers a much smaller cell, and a smaller one yet again as we look at 3D NAND.
And thanks to QLC technology, we can store four bits of memory in that single cell in the 3D NAND circle, a cell far smaller than the DRAM one. Now, when you think about the ability of the two technologies on the right, 3D XPoint and 3D NAND, to scale, they can scale by stacking vertical layers, which DRAM can't do. So you see very clearly why DRAM as a memory for capacity is going to remain limited. As we keep growing to zettabytes and beyond, the higher-density options of 3D XPoint and NAND are needed in the memory hierarchy.
04:44 AH: Next, I'd like to take you through the 3D NAND roadmap. Intel has 30 years of experience with flash cell technologies; when I started at Intel, there was plenty of work on NOR. Our 3D NAND technology is the highest areal density storage technology in the industry today. Now, the industry has moved toward 128 layers as the next generational target.
05:12 AH: What Intel has done instead is go to 144 layers. In September 2019, at the Intel Memory Day in Seoul, Korea, we announced that Intel is skipping over the 128 layers most of the industry is developing and jumping straight to 144 layers. The first live demo of that 144-layer technology was in Rob Crooke's presentation. We remain on track to take this technology into production by the end of the year. Our four-bits-per-cell QLC technology is really helping change the economics of storage and will help fill that cost-performance gap within storage that I showed earlier, allowing SSDs to continue to displace hard drives as capacity needs grow.
05:53 AH: Now let's spend a little bit of time on 3D XPoint. We were very proud to deliver the first generation back in 2017, and Optane products based on 3D XPoint have been delivered to the market, as have persistent memory products.
We've been working hard on this memory technology and have doubled the density, going from the two-deck Optane media of 2017 to the four-deck media we're working on today for second-generation Optane. We learned a lot from our first-generation Optane SSD about how to deliver more performance. Our second-generation SSD will move to PCIe Gen 4, and our goal is to exceed a 2X performance gain. You'll see lower latency from this SSD, especially with new optimizations for single-sector reads. So, get ready for that.
06:43 AH: Now, the thing that I love to see is that this is innovation powered by NVM Express. The media is awesome, and it takes a ton of breakthroughs; I'm super impressed by the folks who work on the media at Intel and across the industry. But what was pretty cool was seeing the way NVMe provides the pathway to deliver all this value to the end customer.
And so, if you look at the plethora of products, whether Optane SSDs, Optane plus QLC NAND SSDs, or 3D NAND SSDs, and just look at the form factors on this page (I don't have a picture of a BGA SSD because it would be a little too small), it goes all the way from BGA to the quote-unquote "ruler SSDs." So it spans the gamut of fast, simple, and truly scalable, from the small to the large.
Now, if we look at that memory and storage hierarchy again, the picture of the gaps, we've really filled in quite a bit of what were gaps, shown in blue. Persistent memory, which I really didn't talk about, has filled the gap by lowering the cliff from DRAM all the way down to flash. And we have the 3D NAND SSDs and the Optane SSDs, which are giving us the performance, bandwidth, and capacity to fill these gaps.
08:02 AH: Now, all of that on the right is powered by NVM Express. We've really moved from all the gaps of 10 years ago to today, where we're seeing a lot of solutions that look pretty amazing for our industry. A large part of that innovation is powered by NVM Express, especially as it paves the way for a new memory type that we can then use higher up in the hierarchy.
Now, if you look at this growth, NVMe powering that connected universe, just take a look at the unit growth and the proliferation in capacity. The data in this chart is breathtaking; so is the speed at which we proliferated. The first NVMe SSD launched only in the second half of 2014; that's not that long ago, a little more than five years, I guess six, now. The growth in each segment, though, is really on a 10X scale. Whether you look at enterprise, cloud, or client, comparing the 2016 numbers against the 2020 numbers, you're talking around 10X growth in the number of units, on the left, and on the right, the average capacity just keeps going up and up and up.
09:13 AH: So then, how has NVMe technology grown in terms of the amount of media shipped? We grew from three exabytes in 2016 to 29 exabytes in 2019, and for 2020 the projection is 54 exabytes. So obviously, NVMe technology is required, especially in our work-from-home, study-from-home lives of today, and in powering all of our lives for tomorrow.
So now I want to talk a little bit about the next decade. We've had a decade of growth; what does the next decade of growth look like, and how do we make sure we deliver it? If I look at NVMe, we've had an evolution of a technology. This isn't exactly a kitten, but back when we were kittens, imagining a technology that could become an industry leader, a lion, how did we evolve? We had a few stepwise focuses on our journey to where we are today.
10:10 AH: The first step, back in 2010, was: how do we unify PCI Express SSDs? Our focus then was defining the NVMe architecture and command set, unifying those PCIe SSDs around a common interface, and doing things as boring as getting inbox drivers into the major operating systems. That was the first wave.
The second wave was really looking at how we scale over fabrics. This was about bringing on that next wave of broad deployments: understanding how we can scale over Ethernet, over InfiniBand, over all of these different fabrics that people want to utilize, and making sure that NVMe works seamlessly. That was the second stage.
10:52 AH: The third stage is where we are today, which is really: how do we enable innovation? The key thing we're looking at now is how to define what the core of NVMe is and facilitate innovation, enabling new command sets and new use cases without hurting that core. And how do we make sure that, when we make a change, we don't end up causing unforeseen issues with other parts of the specification or other features?
When we've looked at the specification over the past three or four years, what we've seen is that one of the reasons people loved the original NVMe spec is that it was 100 pages and it was easy. Then, as we added NVMe over Fabrics with RDMA and later the TCP layer, we kept adding more and more features over time: Zoned Namespaces, key value, and now we're on the frontier of computational storage.
11:48 AH: Well, the challenge is that whenever you start a spec and then just start sprouting things on top of it, it evolves, and it sometimes becomes a question of, "Are we having issues with this feature or that feature, and how are they interacting?" You really need to clean that up to make sure you aren't causing unintended consequences when you make spec changes and add new features. What we're seeing is that all this expansion is awesome, but it's too complicated and too dependent on human glue: who was the expert who wrote that feature, are they still around 10 years later, and can they tell you how this works and that works? So we've tried to look at this differently: how do we refactor NVM Express so that it's back to the clean, original base specification we wrote in 2010, on that same footing in 2020?
12:44 AH: So, what we've done is refactor. We now have an NVMe base specification that integrates the core functionality of fabrics as well. We've cleanly segregated out the command sets, which you can see on the top: block I/O, key value, Zoned Namespaces. And on the bottom, we've got the clean transport layers, which include PCI Express, RDMA, and TCP.
And that gets us back to our core values: the center is fast, simple, and scalable, and it can layer on additional capabilities. That way we can cleanly foster innovation: if you're interested in adding capabilities to NVMe, you need to understand the base specification and then just layer on top of it in an extensible way. And if you're not using key value, or you're not using RDMA, you can completely ignore those pieces and not worry that they have any impact on what you're doing.
13:41 AH: So how this looks is specification families. We have the core, which covers whether you're running PCIe or fabrics on top of it; we have the command sets, each a clean specification of its own, one for key value and a separate one for Zoned Namespaces; and we have the transport layers I spoke of. We're breaking these apart into very clean, modular capabilities that you can easily understand and build on. We also have, not shown, the management interface specification, which was a separate modular specification from the start and continues to be. So, this is the foundation for NVMe innovation moving forward.
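To make the layering concrete, here is a minimal sketch of how a host stack might compose these families, plugging one transport and one command set into a common core. All of the type and function names are hypothetical illustrations, not definitions from the NVMe specifications; only the 64-byte submission entry, the 16-byte completion entry, and the Command Set Identifier concept come from the spec.

```c
/* Hypothetical sketch of the refactored layering: a host stack composes
 * one transport and one command set against a common core. All names
 * here are illustrative, not from the NVMe specifications. */
#include <stdint.h>

/* Transport layer: PCIe, RDMA, or TCP would each implement this. */
struct nvme_transport_ops {
    int (*connect)(void *ctx);
    int (*submit)(void *ctx, const void *sqe, void *cqe);
};

/* Command set layer: block I/O, Key Value, or Zoned would each implement
 * this, identified by its Command Set Identifier (CSI). */
struct nvme_cmdset_ops {
    uint8_t csi;
    int (*build_io)(void *req, void *sqe);  /* encode a request into an SQE */
};

/* The base spec is the glue; neither layer depends on the other. */
struct nvme_host {
    const struct nvme_transport_ops *xport;
    const struct nvme_cmdset_ops *cmdset;
    void *xport_ctx;
};

static int nvme_host_io(struct nvme_host *h, void *req)
{
    uint8_t sqe[64], cqe[16];  /* submission/completion entry sizes per spec */
    int rc = h->cmdset->build_io(req, sqe);
    if (rc != 0)
        return rc;
    return h->xport->submit(h->xport_ctx, sqe, cqe);
}
```

The point of this shape is that key value code never needs to know whether it is riding over PCIe or TCP, and a transport never needs to know which command set built the entry it carries.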
14:26 AH: Now, let's talk about some of the recent NVMe architecture advancements, because you're probably wondering what we've been doing lately beyond refactoring. This is the NVMe roadmap of what we've delivered over the past five years; we'd love to go further back, but it gets a little noisy. You can see we've kept up a steady beat rate of capabilities over those five years, including, if you look back at 2019, NVMe over Fabrics with TCP and I/O Determinism, key capabilities that people are taking advantage of in today's products. With NVMe 2.0, we had long resisted changing the major version number; you know we've gone from 1.0 to 1.1 to 1.2 to 1.3 to 1.4 and been boring. But we finally feel the refactoring is a major re-architecture that merits a 2.0 version number. So the new refactored NVMe base spec will be called NVMe 2.0, including the merger with fabrics. What I want to share with you next is how we're enabling that clean separation of capabilities, and what some of those new capabilities are.
15:37 AH: So, the first thing I want to highlight is namespace types. If we're going to cleanly segregate things, we have to be able to know what the different capabilities are and how to discover them. That's a long-winded way of saying: in this picture I have four NVMe controllers, and behind each of them I have a couple of namespaces.
Now, what this is showing is that my blue namespaces have one command set, my green ones have a different command set, say key value, and my purple ones have yet another command set, say zoned. What this allows me to do is query each of my controllers, understand the namespaces attached and which command sets those namespaces support, and then, as the host, flexibly use the right command set and move forward. This is how we cleanly support the underlying infrastructure that lets us add other command sets over time, support block I/O versus key value versus zoned today, and even layer on computational capabilities in the future.
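As a sketch of what that host-side discovery could look like, the loop below reads each namespace's Command Set Identifier (CSI) and dispatches accordingly. The CSI values (0h for NVM, 1h for Key Value, 2h for Zoned Namespaces) are the ones the specifications assign; the nvme_identify_ns_csi() helper, however, is an assumed convenience wrapper, not a real library call.

```c
/* Host-side discovery sketch for namespace types. The CSI values are the
 * ones the specifications assign; nvme_identify_ns_csi() is an assumed
 * helper (e.g. wrapping the Identify descriptor list), not a real API. */
#include <stdint.h>
#include <stdio.h>

enum nvme_csi {
    NVME_CSI_NVM   = 0x0,  /* traditional block I/O command set */
    NVME_CSI_KV    = 0x1,  /* Key Value command set */
    NVME_CSI_ZONED = 0x2,  /* Zoned Namespace command set */
};

/* Assumed helper: report the CSI for one namespace on this controller. */
extern int nvme_identify_ns_csi(int ctrl_fd, uint32_t nsid, uint8_t *csi);

static void attach_namespace(int ctrl_fd, uint32_t nsid)
{
    uint8_t csi;

    if (nvme_identify_ns_csi(ctrl_fd, nsid, &csi) != 0)
        return;
    switch (csi) {
    case NVME_CSI_NVM:   printf("ns %u: block I/O\n", (unsigned)nsid); break;
    case NVME_CSI_KV:    printf("ns %u: key value\n", (unsigned)nsid); break;
    case NVME_CSI_ZONED: printf("ns %u: zoned\n",     (unsigned)nsid); break;
    default:             printf("ns %u: unknown CSI %u\n",
                                (unsigned)nsid, (unsigned)csi);        break;
    }
}
```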
16:49 AH: Now, you've heard about Zoned Namespaces; we keep talking about zoned, and it won an Innovation Award at Flash Memory Summit this year, so we're pleased to have the great work the team delivered be recognized. What zoned is really about is what I'm showing at the top: we have several zones, and each zone is associated with its own set of LBAs, logical block addresses, with the amount of storage varying from zone to zone.
Now, what's the key thing about zoned? I won't make you read the statement sheet, because that seems mean, but what we're really going after is evolving to address underlying media changes with larger erase block sizes. What a zone requires is that you write sequentially within it. That lets us take advantage of media types that are coming on board, more 3D NAND QLC capabilities with very large erase block sizes, where we want to write in order as much as possible to avoid unnecessary write amplification and over-provisioning.
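That sequential-write rule is easy to model. Below is a minimal, illustrative sketch of the zone contract as a host might track it, assuming the usual ZNS notions of a per-zone write pointer and a zone reset; it is a behavioral model, not code from the Zoned Namespace specification.

```c
/* Illustrative model of the zone contract: each zone has a write pointer,
 * writes must land exactly on it, and a reset rewinds it. This mirrors
 * ZNS behavior but is not code from the specification. */
#include <stdint.h>
#include <stdbool.h>

struct zone {
    uint64_t start_lba;  /* first LBA of the zone */
    uint64_t capacity;   /* number of writable LBAs in the zone */
    uint64_t write_ptr;  /* next LBA that may be written */
};

/* A write is valid only if it is sequential: it must start at the write
 * pointer and stay within the zone's writable capacity. */
static bool zone_write(struct zone *z, uint64_t lba, uint64_t nlb)
{
    if (lba != z->write_ptr ||
        lba + nlb > z->start_lba + z->capacity)
        return false;    /* the device would reject this write */
    z->write_ptr += nlb; /* advance, mirroring the device's pointer */
    return true;
}

/* Resetting a zone makes the whole zone writable again from the start. */
static void zone_reset(struct zone *z)
{
    z->write_ptr = z->start_lba;
}
```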
17:47 AH: So this capability establishes a host-to-device contract that lets us get those benefits without the device having to do anything fancy or wear itself out unnecessarily. That's one key capability, and paired with it, to a great extent, is Endurance Groups. With Endurance Groups, you can gather media into sets, media units, and understand the wear on each particular media unit. That lets you decide how to lay things out based on access patterns, and recognize when a certain area of your SSD has become too worn. So that's another key feature, often paired with Zoned Namespaces.
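For a rough idea of the visibility Endurance Groups give the host, here is a simplified sketch of a few of the wear counters the Endurance Group Information log page exposes. The field list is abbreviated and the layout is not byte-accurate to the specification, and the derived write-amplification figure is an illustrative calculation, not a spec-defined field.

```c
/* Simplified sketch of the wear visibility an Endurance Group gives the
 * host. The fields are a subset and the layout is not byte-accurate to
 * the Endurance Group Information log page in the specification. */
#include <stdint.h>

struct endurance_group_info {
    uint8_t  critical_warning;    /* e.g. spare capacity below threshold */
    uint8_t  available_spare;     /* remaining spare, as a percentage */
    uint8_t  percentage_used;     /* vendor estimate of life used */
    uint64_t data_units_written;  /* host writes to this media set */
    uint64_t media_units_written; /* physical writes, incl. amplification */
};

/* Illustrative derived metric: write amplification for the group is the
 * ratio of physical media writes to host data writes. */
static double write_amplification(const struct endurance_group_info *e)
{
    if (e->data_units_written == 0)
        return 0.0;
    return (double)e->media_units_written / (double)e->data_units_written;
}
```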
18:34 AH: Moving on to one other feature I'd really like to highlight: domains and partitions. A key thing as we move forward with NVMe is that it's not just the SSD in your laptop or a few SSDs in a server; we really have large-scale solutions where you have a lot of alternate paths to a set of media, and you also need to handle redundancy and other capabilities.
In this example, I've got five different NVMe over Fabrics controllers, each with its own path, and for much of the media, the namespaces behind them, from namespace A to namespace E, there are multiple paths to reach each namespace. What this allows us to do is make sure we always have a path, and to do segregated updates of the different pieces. For example, I can update the firmware, pull telemetry, or run maintenance flows on fabric controller number five while fabric controllers one through four are still doing their work. So that's another key piece of enabling us to scale.
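A toy sketch of that idea, with invented names: mark one fabric controller as under maintenance and route I/O through any of the remaining live paths to the same namespaces.

```c
/* Toy sketch of segregated maintenance; all names are invented. */
#include <stdbool.h>
#include <stddef.h>

#define NUM_CTRL 5

struct fabric_ctrl {
    int  id;              /* controller 1..5 in the example */
    bool in_maintenance;  /* e.g. a firmware update in progress */
};

/* Pick any live path; with five controllers, taking number five down
 * for maintenance still leaves four ways to reach the media. */
static struct fabric_ctrl *pick_path(struct fabric_ctrl ctrl[NUM_CTRL])
{
    for (size_t i = 0; i < NUM_CTRL; i++)
        if (!ctrl[i].in_maintenance)
            return &ctrl[i];
    return NULL;  /* no live path; the design aims to make this impossible */
}
```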
19:49 AH: So this is the non-sexy part of NVMe; I'm sure it's all so exciting, as you're realizing already. But we're doing a lot of infrastructure enhancements and continuing on that path, from the simple copy command to telemetry enhancements. All these pieces are the highways that enable us to scale across client, cloud, and enterprise. We have enhancements in management telemetry, large storage systems, all these pieces, and we always try to ensure that they scale from the low end, low end being client (client matters, and it's the lower-power end, I should say), all the way to the enterprise.
20:30 AH: Now, you might wonder: what's the future? One of the key things I wanted to talk about, one that excites me, is the future evolution toward computational storage. Now, one thing you've probably seen is that the database universe is complicated. We have all these data warehouses appearing, and they store lots of data, and they don't tend to store it in block format.
What I'm showing here in the middle is a compressed, encrypted, arbitrary format; it's not laid out in nice blocks. They've got these two pools, and how you understand them often depends on reading through the data; it's just complicated. You have data stored compressed and encrypted, which is another complication. And one of the frustrating things is that these formats and data are constantly evolving; there's not a single industry standards body saying, "This is the database and you shall comply." And there can't be; the innovation is too rapid and too complex. So a key thing we have to figure out: if I'm querying the storage table on the right, my superhero list, I've got eight superheroes here, but in a much bigger database with a ton of superheroes, how do I find the superheroes that live in New York?
21:45 AH: So, let's look at finding that needle in the haystack, which is where I see computational storage as a critical aspect we want to start working through. On the left, I have this compressed, encrypted, arbitrary format, and I really don't want to transfer that entire database over to the host just to run this filter; just give me a count of the number of names that are in New York. So what I want to do, on the device itself, on the NVMe side, is decrypt the names, then decompress them to get my superheroes, of which I've only shown eight here, and then filter the names on New York. I get Bruce Wayne, Jessica Jones, and Peter Parker; I'm sure they're having a great time together. And then I aggregate my count to three.
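From the host's point of view, the offload might look like the sketch below: push a filter program down, run it, and get back a single count instead of the whole table. Every function name here is hypothetical; NVMe has not standardized these commands, and this is only meant to show the shape of the data flow.

```c
/* Hypothetical host-side view of the offload: download a filter program,
 * run it against a namespace, and receive only the aggregated count.
 * None of these functions are standardized NVMe commands. */
#include <stdint.h>
#include <stdio.h>

/* Assumed helpers: load a program onto the device, then execute it. */
extern int csx_load_program(int fd, const void *bytecode, size_t len);
extern int csx_run(int fd, uint32_t nsid, const char *arg, uint64_t *out);

int count_heroes_in(int fd, uint32_t nsid, const void *prog, size_t len)
{
    uint64_t count = 0;

    if (csx_load_program(fd, prog, len) != 0)
        return -1;
    /* Device side: decrypt -> decompress -> filter city == "New York"
     * -> aggregate. Only the count crosses the bus, not the table. */
    if (csx_run(fd, nsid, "New York", &count) != 0)
        return -1;
    printf("heroes in New York: %llu\n", (unsigned long long)count);
    return 0;
}
```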
22:38 AH: Now, this is much better done on the device side than transferring all that data across and looking for the needle in the haystack. So what we're trying to do in NVMe is establish the infrastructure for computational storage offloads, but do it in a way that accommodates an environment that's going to keep changing. What we're working on is a foundation: how do you enable a host to download a program that can be invoked in a standard way on the device? And the reason it's a program is that the data formats keep changing.
So, I need to download a program in hardware-agnostic bytecode that can run locally on the device, on whatever CPU or engine the vendor chooses. That downloaded program understands the data format, so you don't have to bake the data format into a hardware offload; it simply arrives with the program. As these database formats keep evolving, you just evolve your program; it's not a big deal.
23:42 AH: We also allow fixed programs, so that if the device has a decrypt or a decompress function, it can be called locally from within the device. The program that you download is eBPF: eBPF, the extended Berkeley Packet Filter, is a hardware-agnostic bytecode format widely used in networking that we're leveraging. Your downloaded program can call into those fixed programs and operate on data in device memory, shown as the computational program memory. So get familiar with the computational storage work we're doing, because we're setting the foundation and making sure it's a flexible foundation; this is a fast-moving area with fast-moving requirements, and rather than being fixed, we want to be flexible and enable wherever the industry chooses to go. So, get on that journey with us.
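To give a flavor of it, here is what such a downloaded filter might look like in the restricted C that is typically compiled to eBPF bytecode. The csx_* helpers standing in for the device's fixed decrypt/decompress functions and record accessors are hypothetical; a real program would use whatever helper interface the computational storage work ends up standardizing.

```c
/* What a downloaded filter might look like in the restricted C commonly
 * compiled to eBPF bytecode. The csx_* helpers, standing in for the
 * device's fixed functions and record accessors, are hypothetical. */
#include <stdint.h>

extern long csx_decrypt(const void *src, void *dst, uint32_t len);
extern long csx_decompress(const void *src, void *dst, uint32_t len);
extern const char *csx_record_city(const void *rows, uint32_t idx);
extern int csx_streq(const char *a, const char *b);

/* Runs on the device against one chunk held in computational program
 * memory; returns how many records match, for the host to aggregate. */
int filter_city(const void *chunk, uint32_t len,
                void *plain, void *rows, uint32_t nrecords)
{
    int matches = 0;

    if (csx_decrypt(chunk, plain, len) < 0)    /* fixed function: decrypt */
        return -1;
    if (csx_decompress(plain, rows, len) < 0)  /* fixed function: inflate */
        return -1;
    for (uint32_t i = 0; i < nrecords; i++)
        if (csx_streq(csx_record_city(rows, i), "New York"))
            matches++;  /* Bruce Wayne, Jessica Jones, Peter Parker */
    return matches;
}
```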
24:33 AH: So, that covers the key things I wanted to touch on today. On the evolution of NVMe technology, I feel very blessed to have worked with a lot of great people over the past 10 years, and I'm looking forward to working with all of you, to seeing you all again next year, in person I hope, and to how we keep moving NVMe forward over the next decade and keep roaring like a lion, although we're just little kittens inside. So, thank you. With that, let's move to Q&A.
25:08 Speaker 2: Thanks Amber. The first question is, when will the NVMe 2.0 specification be released and what are the most exciting new features?
25:16 AH: The NVMe 2.0 specification is trending toward the end of Q1. As you can imagine, with all the work we've done over the past decade, it takes a long time and a lot of hands-on work by many people to pull all of that together in a clean way and make sure it's fully validated and thoroughly reviewed, so we didn't introduce any issues. So the end of Q1 is currently what it's looking like for our release date. The new features I'm most excited about are things I already touched upon: Zoned Namespaces, which we've also separately ratified, key value, and the continued evolution of fabrics, including the ability to discover fabrics in larger-scale implementations. Those are a few I would highlight.
26:12 S2: The next question is, what effect will computational storage have on NVM Express technology? How has the NVM Express organization been supporting the growth of computational storage?
26:22 AH: I'm pretty excited about computational storage. We've been collaborating with SNIA and looking at the end use cases for probably 18 months, two years now. We've learned enough to understand how to create that base infrastructure, and we've moved on that in earnest in the past six months. What you're going to see with NVMe, just as with the baseline of the original specification and with fabrics, is that we're establishing a foundation, and then you'll see a lot of evolution over the next five years. As we see more of that evolution, we'll build more and more. I'm just excited that computational storage is an area that's going to keep evolving; with the amount of data, the amount of analytics, and the amount of AI, it's going to evolve more and more, and I think NVMe is at the center of it, with this foundational infrastructure allowing people to innovate.