Challenges in Hyperscale: What Hyperscalers Care About
00:07 Ross Stenfort: Alright, welcome to our talk today. Today's talk is Flash Through the Lens of Hyperscale. I'm Ross Stenfort of Facebook. We also have Lee Prewitt of Microsoft and Paul Kaler from HPE. So, with that, let's get started.
00:35 RS: Let's talk about Facebook and its mission. Facebook's mission is to give people the power to build community and bring the world closer together. Today, we have 1 billion people on Instagram and 1.3 billion people using Messenger; in total, 3.1 billion people use Facebook products today.
Here you see one aisle from one data center, where you see lots of hardware that's optimized to give our users the best experience possible when they use our applications. Here you see a picture of our data centers. Each of these is roughly four football fields long, houses large amounts of hardware, and enables us to meet our users' needs. And with that, let me pass it to Lee.
01:30 Lee Prewitt: Hey, so, morning everybody. This is Lee Prewitt from Microsoft. Just want to talk a little bit about Microsoft's mission statement: Empower every person and every organization on the planet to achieve more. So next slide.
So, Ross talked a little bit about the scale of Facebook and their data centers. Microsoft Azure also has quite a lot of global scale. You can talk about different metrics: the amount of fiber we've laid for the backbone, the amount of data that gets transferred, the regions where Azure has clusters of data centers, of which there are now over 100, and of course millions and millions of servers. Next slide.
02:23 LP: So one of the things that Microsoft is doing, because all of these data centers have an enormous ecological footprint, is looking at ways to mitigate that, as well as bring services closer to the people that are using them. And with that, we did a project here called Natick, where we took a big steel tube, stuffed it full of servers and sank it in the ocean.
03:00 LP: So, we sank that data center in the ocean, and we were able to use it for a little over two years. We pulled it back up recently and took a look at what happened. We found that, with the dry nitrogen atmosphere and the fact that nobody disturbed those servers, the failure rate for the components inside that data center tube was about 1/8th of what it would have been on dry land. So that should inform you of some of the things that we're going to talk about next. So, next slide. The next section is hyperscale deployment challenges, and, Ross, take it away.
03:48 RS: Thank you, Lee. So, let's talk about some of our form factor challenges and needs versus trends. On the left here, you'll see the hyperscale needs: the IOPS per terabyte need to scale linearly, so as the capacity grows, the performance also needs to grow linearly. Low airflow is critical to ensure data center airflow is sufficient. And serviceability really matters to us, and solutions need to scale for the future. Then on the right, if you look at the market trends, the M.2 is unable to scale the IOPS per terabyte past 2 TB, due to its power and performance limitations. NAND dies are getting larger, resulting in larger-capacity drives. And PCIe Gen 4.0 and Gen 5.0 are coming, which increase power.
And so, as you see in the table on the right, as the capacity increases, the power needed for the device also increases. However, it's not all bad news. If you look at the watts per terabyte in the right-hand column, you see the trend is actually decreasing.
So, what does this all mean? If we go to the conclusion here: to scale IOPS per terabyte, hyperscalers will not be able to deploy M.2s past 2 TB. As capacity increases, device power budgets will increase while watts per terabyte decrease, driving increased efficiency in the data center. Now, Lee, if you want to talk about flash form factors.
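The trend Ross describes, absolute power rising with capacity while power per terabyte falls, can be sketched with a few lines of arithmetic. The capacity and power figures below are hypothetical examples, not the actual numbers from the slide's table:

```python
# Illustrative sketch of the watts-per-terabyte trend. The capacity/power
# pairs below are invented for illustration, not taken from the talk.
device_power = {  # capacity (TB) -> device power budget (W)
    2: 8.25,
    4: 12.0,
    8: 16.0,
    16: 20.0,
}

def watts_per_tb(capacity_tb: float, power_w: float) -> float:
    """Efficiency metric: lower is better for the data center."""
    return power_w / capacity_tb

ratios = [watts_per_tb(cap, pwr) for cap, pwr in sorted(device_power.items())]

# Absolute power grows with capacity, but the per-terabyte figure falls,
# which is the data center efficiency gain called out in the talk.
assert all(a > b for a, b in zip(ratios, ratios[1:]))
print(ratios)
```

The same shape holds for any numbers where power grows sub-linearly with capacity, which is the pattern the slide's table shows.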
05:27 LP: Yes, so if M.2 isn't going to cut it anymore, what can we do? About two-and-a-half, three years ago, we formed the EDSFF consortium to explore moving to new form factors that were specifically designed to be data-center-friendly. And with that, we were able to come up with several form factors for the data center, including the E1.L, and specifically here, the E1.S.
And with that, we find that the ability for these devices to share a connector, share a form factor and share the innards of the device, with the ability to then put different packaging around them to fit different power envelopes, is super powerful. The E1.S form factor allows things like hot-plug support and works at different rack heights, especially 1U for the hyperscale folks, while the E3 variants work very well for the enterprise folks. It has a path to Gen 4.0 and Gen 5.0 and is fully standardized in SNIA. And as the graphic here shows, with all the different samples you can see there, we have very broad market support going forward. Next.
07:02 RS: So, let's talk about next-gen E1.S flash platforms. Here you'll see a couple of platforms. If we start on the left, you'll see a picture of an E1.S latch and a 1OU blade. You'll see chassis with both 1OU and 2OU blades in them. These provide excellent density, low airflow, flexible CPU-to-flash ratios, and excellent serviceability. For those of you who have more interest in these boxes, I'd encourage you to come visit OCP Storage, where Facebook is working on donating these platforms to OCP. And with that, Lee, do you want to talk about debug challenges?
07:50 LP: Sure. So as we talked about with Project Natick, where we had sunk a data center in the ocean, you can see where the ability to debug what's happening in that data center, how to service it, and how to keep everything running as best as possible has certain challenges. You can't just send a tech out to replace a device, or go out there with your JTAG debugger and access those devices to see what's happening. And so the ability to do remote debugging is critical. So, obviously, as you see here: no JTAG, no UART, no physical access. And so with that, we need the drive to be able to tell us what's happening through telemetry, through logs, through SMART data, and all of that has to be rich, robust and, frankly, human-readable. Next.
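One concrete form the drive-reported telemetry Lee mentions takes is the NVMe SMART/Health Information log page (Log Identifier 02h). A minimal sketch of turning that raw log page into something human-readable might look like the following; the field offsets follow the NVMe base specification, but the 512-byte buffer here is synthetic test data, not output from a real drive:

```python
# Sketch: decode a raw NVMe SMART/Health log page (Log Identifier 02h)
# into human-readable telemetry. Offsets per the NVMe base spec; the
# buffer below is synthetic, not read from real hardware.

def parse_smart_log(buf: bytes) -> dict:
    assert len(buf) >= 512, "SMART/Health log page is 512 bytes"
    return {
        "critical_warning": buf[0],
        # Composite temperature is reported in Kelvin; ~273 K = 0 C.
        "composite_temp_c": int.from_bytes(buf[1:3], "little") - 273,
        "available_spare_pct": buf[3],
        "spare_threshold_pct": buf[4],
        "percentage_used": buf[5],
        "data_units_read": int.from_bytes(buf[32:48], "little"),
        "data_units_written": int.from_bytes(buf[48:64], "little"),
        "power_on_hours": int.from_bytes(buf[144:160], "little"),
        "unsafe_shutdowns": int.from_bytes(buf[160:176], "little"),
        "media_errors": int.from_bytes(buf[176:192], "little"),
    }

# Synthetic example: a healthy drive at 35 C with 3% wear.
raw = bytearray(512)
raw[1:3] = (308).to_bytes(2, "little")        # 308 K == 35 C
raw[3] = 100                                   # available spare (%)
raw[4] = 10                                    # spare threshold (%)
raw[5] = 3                                     # percentage used
raw[144:160] = (4200).to_bytes(16, "little")   # power-on hours

log = parse_smart_log(bytes(raw))
print(f"temp={log['composite_temp_c']}C wear={log['percentage_used']}% "
      f"poh={log['power_on_hours']}h")
```

In a fleet without physical access, a parser like this sits behind whatever remote channel retrieves the log bytes; the point is that the drive's raw data has to decode into fields an operator can act on.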
09:00 RS: And with that, let's talk about OCP NVMe Cloud SSD. So, if I start here with the hyperscale SSD market problem, the problem is everybody wants their SSDs with firmware customization that is similar but different. And of course, everybody wants their SSDs now. What that leads to is: customers battle with marketing over who gets their SSDs first, engineering struggles with conflicting priorities, engineering lacks infinite resources and struggles to build the SSDs, and quality suffers due to the resources, time and inability to focus on a single solution. On top of that, bugs are very painful to fix in branches, port back to the trunk and validate. The end result is engineering is overwhelmed, schedules slip due to lack of resources, and everybody struggles.
09:55 LP: So, what's the answer to that? Facebook and Microsoft have a modest proposal to try to bring everybody together. We want to combine, at least to start, the Facebook and Microsoft requirements for SSDs in our data centers into one unified specification, to align the industry and help everybody's time to market. Next slide.
10:28 RS: So, with that, here you see a picture of the NVMe cloud SSD specification, which has been contributed to OCP. There's a link at the bottom where you can go get it. And version 1.0 is available now.
10:45 LP: So, what does this new specification encompass? Currently, it's around 70 pages long and has, I believe, 430 individual line-item IDs. We actually ID'd each requirement so that it can be tracked and is testable. It covers things around NVM Express, the protocol, and specifically, where NVM Express may call things out as optional but they're required for our data centers, that's called out. It covers things around PCI Express, the logging that we talked about earlier, reliability, thermals and, especially, security, which is a major topic. So, everything needed to build an NVMe Cloud SSD. Next.
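Because every requirement carries its own ID, compliance results can be rolled up mechanically, with each failure tracing back to one spec line item. A hypothetical sketch (the requirement IDs and verdicts below are invented for illustration, not taken from the actual specification):

```python
# Hypothetical sketch: tracking per-requirement compliance verdicts for a
# spec where every line item has a trackable ID. IDs and results are
# invented, not from the OCP NVMe Cloud SSD specification itself.
results = {
    "NVME-REQ-001": "pass",
    "NVME-REQ-002": "pass",
    "PCIE-REQ-001": "fail",
    "SEC-REQ-001": "pass",
    "LOG-REQ-001": "not_tested",
}

def summarize(results: dict) -> dict:
    """Roll individual line-item verdicts up into a compliance summary."""
    summary = {"pass": 0, "fail": 0, "not_tested": 0}
    for verdict in results.values():
        summary[verdict] += 1
    return summary

def failing_ids(results: dict) -> list:
    """IDs blocking compliance; each traces back to one spec line item."""
    return sorted(rid for rid, v in results.items() if v == "fail")

print(summarize(results), failing_ids(results))
```

This is the property that makes third-party compliance suites like the ones discussed next possible: testable, individually addressable requirements rather than prose.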
11:38 RS: So, what are the system makers saying about this? If we start off here, we have Jonathan Hinkle at Lenovo: "This new cloud SSD specification is an excellent development from Microsoft and Facebook. We previously aligned with them, Intel and other key industry movers to develop the EDSFF drive standards such as E1.S, optimized to create new value in data center systems. Now their cloud SSD spec takes us a step further, providing further industry alignment through common requirements and reducing unnecessary cost for drive suppliers in the industry." Paul, would you like to comment here?
12:25 Paul Kaler: Sure, so HPE sees a significant value in combining the requirements for cloud and enterprise system providers into one open spec, thereby improving the time to market and quality of the drives by reducing multiple development streams and test efforts. Those all increase cost and fragment the resources of the industry today. And we have a little bit more to share on this later in the deck that'll shed some light on the unified capabilities that we're targeting.
12:56 RS: Thanks, Paul, we look forward to hearing more from you later in this presentation. So how are the industry and the ecosystem embracing this? "UNH-IOL is excited to be working on compliance tests for the OCP NVMe Cloud SSD specification. This will serve as an excellent complement to our existing NVMe compliance suites." That's from David Wolf at UNH. At the bottom of the slide here, you'll see that there is a posted test plan from UNH, so I encourage you to go take a look at it.
13:38 LP: So, from our partners at Teledyne LeCroy and OakGate: "Teledyne LeCroy, the industry standard for protocol compliance, has developed a comprehensive OCP Cloud SSD test solution that includes both software and services offerings. The test suite is available as part of the OakGate test and validation software for SVF Pro and Enduro, and additional services are offered through Austin Labs for conformance and pre-compliance testing to the test specification." And that's from Aaron Masters, VP of Engineering, OakGate products.
And also, Quarch: "Quarch is developing a range of scripts for OCP NVMe Cloud SSD compliance testing. These scripts will allow effective testing of power, performance and power-loss requirements." And that's from Mike Dearman, Quarch CEO. Next slide.
14:29 LP: So, key takeaways. It benefits system makers and SSD providers; what do we mean by that? If we have one spec, then there aren't multiple firmwares; there's hopefully a single firmware, which becomes much better tested across all of our partners, the vendors, and our own test rigs internally as well. This makes sure that everything works together as best as possible. It also enables additional collaboration between hyperscalers and the industry. Again, as you can see from the quotes above, people are aligning behind this, and if everybody is running the same sorts of testing against the same drive requirements, then many eyes make for shallow bugs, which makes for a more robust ecosystem overall. Next slide.
15:31 RS: So, if there are questions about what hyperscalers require, the NVMe Cloud SSD specification is public and available now. This document enables the industry to be successful in partnering with hyperscalers, and I encourage you to go take a look at it. So, with that, the next question is: what are the next steps with OCP NVMe Cloud SSD? Let me hand things off to Paul to talk about that.
16:03 PK: Thanks, Ross. So, I wanted to talk a little bit about how HPE sees value in what's going on with the OCP NVMe spec and some of its benefits. Traditionally, HPE has had our own custom firmware specifications for drives, and we developed those because we saw critical value and benefit in them. One of those values was ensuring that we would get consistent behavior across our drive portfolio.
As Lee mentioned earlier, there are a lot of optional features in industry standards, and we need to be able to specify which of those optional features we really need to have as mandatory for our drives. Then, as I think everybody's aware, there's a lot of vague language in the industry specs that can be interpreted one way or another, so being able to clarify that vague language was important to us; that was a key reason why we had our own firmware specifications. Assurance of supply was another big one. Once you get that consistent behavior, it enables multi-sourcing, because now you've got multiple suppliers all delivering drives that behave the same way, which is great for assurance of supply.
17:10 PK: And I think Lee mentioned this earlier too, about trying to get better issue resolution and debug. We had additional telemetry and metadata logs so that we would be able to get very specific detail about what a drive is doing and what kind of failure mechanisms were involved, and of course that always helps with getting faster issue resolution.
And the other big one was that, overall, we got improved quality. We could spec out the best practices from lessons we've learned over time, so a lot of that value was put into our specs. And when we first read the new OCP NVMe spec, we saw, "Hey, this has a lot of commonality, a lot of the same types of benefits that we saw in our firmware specifications." Those features and requirements were a pretty good overlap with ours, and so we started thinking about how we could get even more leverage with the OCP spec, maybe by creating a common unified spec between cloud and enterprise use cases. That not only achieves our original value benefits, all the reasons why we had our own custom specs, but it also increases your ability to drive economies of scale.
18:23 PK: And so those open requirements can also enable third-party compliance tests to work more fully, right? They have the complete specification of what your drive supports, so they know how to test it. In the past, we might have had some special telemetry or metadata logs that they might not have known about. Now that everything is out in the open, it's easier for them to develop really robust and complete third-party compliance tests. And of course, as was mentioned earlier, that benefits both the drive suppliers and the system providers like HPE, because you have more eyeballs looking at the same code, the same firmware, and the same compliance test suites, and all of that results in better quality and quicker time to market.
19:09 PK: So HPE is working with Microsoft and Facebook, the original authors of the OCP Cloud SSD spec, to get to a common unified spec for both enterprise and cloud use cases. At the time this slide was created, we didn't have Dell on board yet, but they have recently signed on, so I can announce that Dell and HPE are both working with Microsoft and Facebook to bring in more of the enterprise requirements.
And so, we are all planning to release an updated specification in December of this year. We just recently announced this to the OCP Storage working group on October 8th, so just a little while ago. If you want to look at the slides that were presented at that time, you can get them at the OCP Storage wiki; there's a link there to the monthly meeting notes and minutes, where you can see what we presented to the OCP Storage working group. So, I think that brings us to the end. Thank you very much for your time, and I look forward to any future questions.
20:20 RS: Yes, thank you very much. I appreciate you listening to us today.
20:25 LP: Yeah, thank you everybody.