Download the presentation: A look at Intel's new memory technology
00:02 Alper Ilkbahar: Good afternoon everyone. Even though it's only virtual, I'm really glad to be with you here today, and thank you all so very much for making the time to attend this talk. Before we start, most importantly, I hope that you, your families, and your teammates are all doing well and staying safe. Once again, thank you for being here with us.
I had the privilege of giving a talk in this conference about a year ago, and this was right after we launched the first generation of the Intel Optane persistent memory products. So, I thought I'd come back this year and give you an update about the progress we've made over the past year, and how things are going and where we are going with this new technology. With that, let's get going.
00:50 AI: One of the major trends that we see in the data center is this insatiable, exponential growth in demand for processing power. In the data-driven, data-centric world we live in, the enormous amount of data being created and the desire to process it and extract insights and value from it are really driving the demand for more and more processing power.
The unit by which I tend to measure processing power is the CPU core. With that in mind, I'll show you how our Intel CPU core counts have increased on our Xeon processors over time and over generations of different products.
Now, this growth is happening at an exponential rate despite the fact that Moore's law is slowing down and the scaling of transistors has slowed. Architectural innovations, such as the disaggregated integration of small chiplets on a package to build larger and larger CPUs, have continued the scaling path that Moore started many years ago.
02:05 AI: The same, however, cannot be said for DRAM. DRAM technology scaling followed the original Moore's law up until the mid-'90s, scaling at about 4x in density every three years. In the 2000s, that scaling slowed down to about 2x density every two years, and over the past decade or so it has been going at a rate of about 2x density scaling every four years.
This is just the density scaling; cost-per-bit scaling has actually been running even slower than that. We're seeing maybe 10% to 12% per-bit cost scaling on an annual basis. So, when you look at what a balanced computer looks like, the memory bandwidth and the memory capacity need to scale in proportion to the compute power. This is Amdahl's law.
03:02 AI: So what I did is take the core growth that I showed on the left side of the slide and apply it to a curve, the yellow curve here, which shows the memory needed to scale commensurately with the core growth we're seeing. Against that memory need, I'm also drawing a blue curve showing what DRAM technology scaling is capable of delivering.
So, as you can see, we have an exponentially growing gap between the need for memory and what the technology can deliver. A gap like this creates a significant bottleneck for our server platforms. As a matter of fact, we view this gap as one of the major bottlenecks and limitations in the data center going forward, and many of our customers share the same view. So, let's take a quick look at what we're doing about this.
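The compounding effect behind these two curves can be sketched in a few lines. The DRAM rate (2x density every four years) is the figure from the talk; the core-count growth rate used here is a hypothetical illustration, not a quoted Intel roadmap number.

```python
# Sketch of the widening compute-vs-memory gap described in the talk.
# DRAM scaling (2x every 4 years) is from the talk; the core-count
# growth rate (2x every 2 years) is an assumed illustrative figure.

def annual_factor(x_per_period: float, years: float) -> float:
    """Convert 'x-fold growth every N years' into a per-year factor."""
    return x_per_period ** (1.0 / years)

dram = annual_factor(2.0, 4.0)    # ~1.19x per year
cores = annual_factor(2.0, 2.0)   # ~1.41x per year (hypothetical)

for year in range(0, 11, 2):
    need = cores ** year          # memory needed to track core growth
    have = dram ** year           # memory that DRAM scaling delivers
    print(f"year {year:2d}: gap = {need / have:.2f}x")
```

Because both curves are exponentials with different exponents, the ratio itself grows exponentially, which is the "exponentially growing gap" on the slide.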
04:18 AI: The memory bottleneck we just talked about, in terms of memory capacity and cost, as well as the storage bottleneck that many of you in the audience are working on, are going to be major bottlenecks in our systems, and we at Intel have a vision of addressing them by re-architecting the memory-storage hierarchy. We envision a two-tiered architecture for both memory and storage. In the case of memory, we see a thin performance tier implemented with DRAM and a capacity tier implemented with Optane persistent memory, which attaches to the DDR bus, is accessed through load and store instructions in hardware, has near-DRAM speed, and yet is significantly cheaper than DRAM.
In the case of storage, we also envision a two-tiered architecture, but here Optane persistent memory is used as the performance tier: it's non-volatile and super fast, and it allows us to achieve the performance metrics, while our traditional storage elements like SSDs, which are significantly slower but a lot more cost-effective, implement the capacity tier.
So, this overall architecture is how we envision solving these major bottlenecks in the data center. What we've discussed so far is more or less a refresher on persistent memory and our vision for it. Hopefully many of you already knew about this, but I wanted to go over it so that you have the context for the rest of the talk.
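To make the load/store access model concrete, here is a minimal sketch that emulates byte-addressable persistence by memory-mapping an ordinary file. The pool path is hypothetical, and a real App Direct deployment would map a DAX-enabled file and flush with PMDK's libpmem primitives (CLWB plus a fence) rather than an mmap flush; this only illustrates the programming model of storing through pointers instead of issuing read/write syscalls.

```python
# Sketch: load/store-style access to "persistent memory", emulated with
# an mmap'ed file. Real App Direct mode maps device-DAX or fsdax files
# and persists with PMDK (libpmem) flush primitives, not mmap.flush().
import mmap
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "pmem.bin")  # hypothetical pool file
size = 4096

with open(path, "wb") as f:
    f.write(b"\0" * size)                 # pre-size the "pool"

with open(path, "r+b") as f:
    pm = mmap.mmap(f.fileno(), size)      # byte-addressable view of the pool
    pm[0:5] = b"hello"                    # plain stores, no I/O syscalls
    pm.flush()                            # stand-in for a real persist barrier
    pm.close()

with open(path, "rb") as f:
    data = f.read(5)
print(data)                               # the store survives because it is file-backed
```

The point of the tier is exactly this: capacity that is addressed like memory, not driven like a block device.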
06:09 AI: Next, I want to talk about how we spent the last year and what we have done around persistent memory since the launch of the product. One of the most important things you have to do when you're introducing a totally new product category to the marketplace is to make sure that you have built a robust ecosystem around that product. We at Intel are really proud of the growing and vibrant ecosystem around Intel Optane persistent memory.
At the foundation of this ecosystem, we have our hardware partners, OEMs and systems integrators. Here you see the logos of pretty much all the major OEMs and systems integrators that have designed, qualified, and made available for sale server platforms with Intel Optane persistent memory. You can go and buy a system today from any of them that supports Intel Optane persistent memory.
07:09 AI: The next tier of the ecosystem is the cloud service providers and communication service providers. These partners of ours either use persistent memory to enable certain services they offer to their customers, or directly offer instances with Intel Optane persistent memory that you can go and buy today. As you can imagine, this is a very competitive area, and many of the companies in this space view their innovation around persistent memory as a competitive advantage and choose to remain a bit quiet about it. So, we are only showing the names we have permission to use.
And at the top of the ecosystem layers, you see the software vendors. These are ISVs we've been working with, in some cases for many years. They have either qualified their existing software on persistent memory, optimized it for persistent memory, or in many cases created totally new, innovative software around it, supporting the revolution we have going on. I really need to take a quick minute and thank all of our partners, our customers, and those of you in the audience today who have contributed to this ecosystem through your hard work, your wonderful ideas, and your innovation in a field that is so ripe with fantastic growth. Thank you all very much for all your contributions.
08:52 AI: Next, I'd like to talk about our customers. We've been in the marketplace for about a year. How have we been doing? I'm happy to report that right now about 200 Fortune 500 companies have either directly deployed Optane persistent memory or are in the middle of POCs. We have been really fortunate to see a conversion rate from POC, or proof-of-concept testing, to sales or deployment of over 85%. This is a pretty amazing number; you don't get these kinds of numbers with typical new products. And these POC conversions to date have actually resulted in over 270 production deployments. We are really pleased with the momentum we're seeing in the marketplace.
09:57 AI: When you look at our customers and ask, "How are our customers using persistent memory? What benefits are they seeing?" we typically see three categories of deployments.
In one category, which we call TCO savings, we see customers with applications and workloads that require large amounts of memory. By replacing DRAM in their systems with persistent memory without losing performance, these customers are able to reduce their overall system cost and gain significant TCO savings in the process. Overall system TCO savings of 30% to 40% are the type of really impressive numbers we have gotten used to seeing in this domain.
The second category of customer use cases is around increased throughput. What we typically see here are workloads that are otherwise memory-bottlenecked, memory-constrained. By using persistent memory to increase the memory capacity in the system significantly, we remove these memory bottlenecks and unleash the processing power of the servers. Improvements of 2x, 3x, or 4x in the amount of work getting done, the number of jobs being run, are not uncommon in these kinds of environments.
11:20 AI: And finally, the third category is faster time to insights. Our customers here are typically those who are using the persistent nature of our memory, either by removing certain storage bottlenecks or by using unique features like instantaneous recovery or fast start-up, features that give them better uptime, better availability, and higher productivity. Given the incredible characteristics of persistent memory as a storage device, seeing improvements of an order of magnitude or higher is really not uncommon in this area.
One of the biggest pieces of news for us this year has been the introduction of our second-generation persistent memory, the persistent memory 200 series. For this new product, we designed a brand-new media controller. This media controller not only increases our performance by about 25% over the first generation, but also allows us to attach Optane to the next generation of higher-speed DDR4 buses. We introduced this product along with the third-generation Xeon Scalable processor earlier this year, and it will also support the upcoming Xeon processor when we introduce it later this year.
12:46 AI: So, in order to showcase the performance of the 200 series, I'm going to rely on this graph here and compare it against a couple of other devices. The first one, in green, is an NVMe SSD using 3D NAND from Intel; the yellow curve is our Optane SSD, another NVMe device. The two Optane persistent memories are our first generation, the blue line, and the second generation, the 200 series, the orange line. I like this kind of graph because it shows the read latency on the y-axis as a function of the bandwidth, the traffic, running on the device. This is a good representation of how the device would perform under real-life workloads: you really want to see some traffic, and then the performance of the device while that traffic is running on it.
13:42 AI: So, let's first look at the latency compared to an SSD. At around one gigabyte per second of traffic, a modest level, you'll see that Optane persistent memory actually delivers 1000x lower latency than a NAND SSD. It's just a phenomenal difference between these two types of devices. When you look at the bandwidth, we deliver up to 3.8 times higher bandwidth than a NAND SSD as well. From generation to generation, between the blue curve and the orange curve, you'll see roughly a 25% improvement in performance; this is what the new second-generation product offers. These are just really fantastic results, and we will talk about how we are translating these kinds of performance numbers into the architecture we've discussed, to resolve some of the memory and storage bottlenecks in the data center.
14:48 AI: Whenever I talk about Optane persistent memory, one of the first workloads we characterize is SAP HANA. SAP HANA is an in-memory database that uses really large amounts of memory, and SAP has really optimized it for Optane persistent memory. So, we took our 200-series product, and our IT department tested its performance running SAP HANA in two different configurations. In the first one, we were able to increase the memory size in our systems by 2x at a similar cost compared to a DRAM-only system, while delivering a six-times improvement in database restart times. Significant improvements.
At the same time, we also had a second setup, where a very large database was implemented across multiple servers. By increasing the memory capacity in each server, we were able to reduce the total number of servers by 50% while still delivering the same data capacity, without any performance loss, achieving 52% lower hardware cost.
So, these are pretty compelling numbers for SAP HANA users: really great TCO improvements, as well as restart-time improvements that reduce downtime and increase the availability and reliability of their systems. Very compelling use cases, improvements, and value propositions for this product.
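The consolidation arithmetic behind numbers like these can be sketched as a toy TCO model. Every unit price, chassis cost, and capacity below is hypothetical, chosen only to illustrate how fewer servers with larger, cheaper memory can carry the same total capacity at lower cost; none of them are Intel or SAP figures.

```python
# Toy TCO sketch of the "fewer servers, same capacity" consolidation
# described for SAP HANA. All prices and sizes are hypothetical.

def cluster_cost(n_servers: int, dram_gb: int, pmem_gb: int,
                 dram_per_gb: float = 10.0,   # hypothetical $/GB DRAM
                 pmem_per_gb: float = 4.0,    # hypothetical $/GB pmem
                 chassis: float = 15_000.0):  # hypothetical per-server cost
    """Total hardware cost of a cluster under the assumed unit prices."""
    return n_servers * (chassis + dram_gb * dram_per_gb + pmem_gb * pmem_per_gb)

# Baseline: 4 servers, 1536 GB DRAM each (6 TB total).
baseline = cluster_cost(4, 1536, 0)
# Consolidated: 2 servers, 768 GB DRAM + 2304 GB pmem each (same 6 TB total).
consolidated = cluster_cost(2, 768, 2304)

print(f"baseline ${baseline:,.0f} vs consolidated ${consolidated:,.0f}")
print(f"savings: {1 - consolidated / baseline:.0%}")
```

With these illustrative prices the two-server configuration comes out markedly cheaper at identical capacity, which is the shape of the 52% result quoted in the talk.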
16:23 AI: Next, I want to talk about a totally new innovation from Intel, combining Intel persistent memory with a software innovation we call DAOS, which stands for Distributed Asynchronous Object Storage. It's a software stack that we developed with our HPC, high-performance computing, customers. Our HPC customers were seeing bottlenecks in their storage solutions and wanted to re-architect their storage subsystems to get around them. What kind of real-world problems were they seeing?
16:57 AI: Here's a couple of examples from the HPC world. The first is HPC simulations. One of the problems we have in these environments is that storage devices are block-based, so you have to take the data you're creating, serialize it into blocks, write those blocks to your devices, and then do the reverse when you read the data back.
When you do that, you create overhead in your software doing the serialization and deserialization. This is even more amplified in the AI and analytics case, where we see new types of data structures: much smaller, much more unstructured pieces of data that end up distributed across many different blocks.
17:54 AI: The issue with this kind of architecture shows up when you have an HPC environment with a significant number of users and huge numbers of IOPS running across multiple processes. It's quite probable that one process accesses a block to get a piece of data, and while that block is being accessed, the entire block is locked and cannot be accessed by other processes. So, if another process wants data from the same block, it has to just sit and wait. When you have millions and millions of IOPS running on your device non-stop, these clashes happen pretty frequently and really start bringing down your overall performance. So how did we solve this problem?
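The effect of block-granular locking can be illustrated with a toy, single-threaded model: accesses to different small records still collide whenever the records happen to share a 4 KB block, whereas byte-addressable access only collides on the exact same record. The sizes and the uniform access pattern are illustrative, not a model of any real DAOS workload.

```python
# Toy model of block-lock contention: independent 64-byte records that
# share a 4 KB block serialize behind a single lock, even though they
# are different data. All sizes and the access pattern are illustrative.
import random

BLOCK = 4096
RECORD = 64                              # small AI/analytics-style records
records_per_block = BLOCK // RECORD      # 64 records share one block lock

random.seed(0)
accesses = [random.randrange(100_000) for _ in range(50_000)]  # record ids

# Block-granular locking: consecutive accesses to *different* records
# still clash if the records live in the same block.
conflicts = sum(1 for a, b in zip(accesses, accesses[1:])
                if a != b and a // records_per_block == b // records_per_block)

# Byte-addressable access: only touching the very same record clashes.
byte_conflicts = sum(1 for a, b in zip(accesses, accesses[1:]) if a == b)

print(f"block-lock clashes: {conflicts}, same-record clashes: {byte_conflicts}")
```

Even in this crude model the block-level scheme produces many spurious clashes that fine-grained, byte-addressable access avoids entirely.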
18:44 AI: DAOS combines Intel Optane persistent memory with traditional block storage devices. By placing the right data type on the right storage device, it works around some of the problems we talked about earlier. The byte-addressable, super-fast Intel Optane persistent memory receives the small data structures that require low latency and frequent access: metadata, indices, and low-latency I/Os all go to the persistent memory. Your bulk data, which is naturally aligned to blocks already, ends up going to storage devices like NAND-based SSDs, or even Optane-based SSDs.
So, by placing the right data type on the right device, as we talked about, some of the serialization overhead in software is eliminated. The locks we talked about, caused by multiple processes clashing on the same block, are eliminated as well. As a result, we see a significant storage performance boost from DAOS.
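The placement idea can be sketched as a tiny routing policy. The threshold, the `kind` labels, and the function itself are hypothetical illustrations of the concept, not the actual DAOS placement logic.

```python
# Sketch of "right data type on the right device": small, latency-
# sensitive items go to persistent memory; block-aligned bulk data goes
# to SSDs. Threshold and categories are hypothetical, not DAOS's policy.

BLOCK = 4096  # bytes

def place(kind: str, size: int) -> str:
    """Pick a target device class for a write (illustrative policy)."""
    if kind in ("metadata", "index") or size < BLOCK:
        return "pmem"   # byte-addressable Optane persistent memory
    return "ssd"        # NAND- or Optane-based block SSD

writes = [("metadata", 128), ("index", 512), ("data", 64), ("data", 1 << 20)]
for kind, size in writes:
    print(f"{kind:8s} {size:8d} B -> {place(kind, size)}")
```

Routing this way means the small, frequently clashing items never sit behind a block lock at all, while large writes still enjoy the SSDs' cheap sequential bandwidth.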
Another important consideration for us was making sure that developers could consume DAOS easily, without having to rewrite their code. To ensure that, we made sure DAOS implemented popular APIs, so developers can simply choose the API they are already using in their software and plug into DAOS through that existing interface. We are seeing some really incredible results with DAOS, and on the next slide I'm going to share some of them with you.
20:26 AI: So, here are the great results I've been talking about. We've been breaking world records in storage with DAOS. The IO500 list is a list of high-performance-computing storage benchmark results that people gather and report globally. And we, with Intel DAOS, placed at the top of the list, scoring roughly 2x the number-two solution while using only 15% of the number of nodes.
On a more equal playing field, the 10 Node Challenge, where the number of nodes is kept constant across all the different solutions, the number one, two, and three scores are all based on Intel DAOS and persistent memory. Intel placed number one, and our partners at the Texas Advanced Computing Center and Argonne National Laboratory placed number two and three, significantly outperforming anything else out there today. So, we're really thrilled about the performance we're seeing with DAOS. It is an open-source software stack available to you, and I hope you get a chance to go check it out and see how we've achieved these amazing results with DAOS and persistent memory.
21:50 AI: Whenever I talk about persistent memory and storage, I have to talk about Oracle Exadata. This is the result of a multi-year, very close collaboration we've had with Oracle, and I'm really proud of the relationship, the partnership, and the results we got out of it. By embracing the persistent memory architecture concepts and really embodying them in their product, Exadata engineers were able to marry RDMA technology, over converged Ethernet, with persistent memory in their storage servers as the hot tier, exactly the two-tiered architecture we talked about. They achieved 2.5x higher IOPS compared to their previous generation of Exadata appliance, reaching 16 million IOPS on the appliance. They also achieved 10x better latency than the previous generation: less than 19 microseconds for database reads, which is phenomenal. Following this great market reception and success, Oracle recently announced that they are going to make Exadata X8M available in their cloud offering. And from the data that I have seen, this is a major competitive advantage for Oracle over their competition right now. So, we're really thankful to Oracle for the fantastic work they have done.
23:26 AI: We talked about Oracle using RDMA as a key ingredient in their solution, and RDMA really is phenomenal when combined with persistent memory in storage applications. This is especially true for replication, which tends to be a key feature of storage subsystems and, in many cases, particularly in transactional processing, a bottleneck for storage system performance. In the graph on the left, I'm showing latency numbers comparing replication to an NVMe SSD over TCP/IP or RDMA versus replication to Intel persistent memory over RDMA. You can see that the improvements in latency are more than an order of magnitude, more than 10x better, with persistent memory. On the right side, you can see the breakdown of those improvements.
24:24 AI: Not only is the media significantly faster, but a lot of the savings actually come from the fact that, with an RDMA protocol and a memory infrastructure, we entirely eliminate the software overhead associated with moving the data. Until recently, implementing RDMA has been sort of the domain of experts who have really specialized in it. So, we thought we needed to democratize this wonderful technology. We have very recently made available in our developer kit, PMDK, the Persistent Memory Development Kit, a new set of libraries called librpma that, we believe, is going to make it significantly easier to integrate and use RDMA in your solutions. We are really excited about this and looking forward to hearing your feedback, your questions, and your suggestions on how you are going to use RDMA with persistent memory in your solutions.
25:37 AI: Before I close my talk, I wanted to remind you of some of the resources available to you regarding persistent memory. First and foremost, our book on programming persistent memory. If we were able to do this in person, I would have had the pleasure of giving each one of you a hard copy, but no worries: we're still making it available as a free download on our website. There you'll also find more detailed implementation information on products and solutions for persistent memory, and on how our customers are innovating around, deploying, and taking advantage of this phenomenal technology. So, I would greatly appreciate it if you could go check out these sites to learn more.
26:29 AI: And finally, I would like to thank each one of you so very much for being here and giving me your attention for the past 30 minutes. As always, we all at Intel are looking forward to collaborating and working with each one of you, to further the persistent memory revolution together. Thank you all very much and stay safe.