Storage Environments With NVMe-oF Technology
Chuck Piercey, director of product management at Kioxia America, explores how the NVMe-oF protocol enables a new deployment model for storage called "disaggregation," which enhances Kubernetes deployments.
00:06 Chuck Piercey: Hi, my name is Chuck Piercey. I'm the Director of Product Management at Kioxia America for a software product called KumoScale, which focuses on software-defined storage. Today, I hope to spend about 15-20 minutes talking about how the NVMe-oF protocol enables a new deployment model for storage called "disaggregation" that enhances Kubernetes deployments in a really exciting way.
My core takeaway is that Kubernetes was made for NVMe-oF, or vice versa, and NVMe-oF storage really delivers the last mile of Kubernetes orchestration. I'll do this by talking first about NVMe, then about NVMe-oF, then about what NVMe-oF enables, and finally about what that does for Kubernetes.
01:04 CP: So, first, NVMe. The NVMe specification was introduced to provide a set of instructions and commands built from the ground up for flash-based storage media. It utilizes the PCIe interface in locally attached server slots to deliver fast SSD performance. This protocol has a number of architectural features that give it tremendous performance advantages.
First, protocol efficiency. NVMe dramatically reduces the internal locking that is needed to serialize I/O, which in turn improves interrupt handling efficiency. It also supports Message Signaled Interrupts (MSI-X) and interrupt steering to prevent bottlenecks at the CPU level, enabling massive scalability. When combined with the large number of cores available in modern processors, the NVMe protocol delivers massive parallelism for connected devices, increasing I/O throughput while minimizing I/O latency.
02:04 CP: Second, command set efficiency. The NVMe protocol utilizes a streamlined and simple command set that requires less than half the number of CPU instructions to process one I/O request. When compared to legacy protocols like SCSI, it also delivers higher input/output operations per second (IOPS) per CPU instruction cycle and lower I/O latency in the host software stack.
02:28 CP: And third, it has massive queue parallelism. With 65,535 parallel queues, each with a queue depth of 65,535 entries, the NVMe protocol supports massive I/O parallelism. Each CPU has a private queue to communicate with the storage device and is capable of achieving high I/O speed because there are no locks between the queues that live on separate processing cores. Since each controller has its own set of queues per CPU, I/O throughput increases linearly with the number of available processing cores. This is like going from a single-lane highway to a highway 65,000 lanes wide. It's a tremendous improvement in I/O.
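To make that per-core queueing model concrete, here is a minimal, illustrative Python sketch -- a toy model, not actual driver code -- showing why giving each core its own private submission/completion queue pair removes the need for cross-core locking:

```python
from collections import deque
from dataclasses import dataclass, field

# Toy model only: real NVMe queues live in host memory and are drained by the
# controller via doorbell registers, but the ownership rule is the same --
# one queue pair per core, so no core ever contends on another core's queue.

MAX_QUEUE_DEPTH = 65_535  # NVMe spec limit on entries per queue

@dataclass
class QueuePair:
    core_id: int
    submission: deque = field(default_factory=deque)
    completion: deque = field(default_factory=deque)

    def submit(self, command: str) -> None:
        # No lock needed: only the owning core touches this queue.
        if len(self.submission) >= MAX_QUEUE_DEPTH:
            raise RuntimeError("submission queue full")
        self.submission.append(command)

# One queue pair per core: parallelism scales linearly with the core count.
cores = 16
queues = [QueuePair(core_id=c) for c in range(cores)]
queues[3].submit("READ LBA 0x1000, 8 blocks")
print(f"{cores} cores x up to {MAX_QUEUE_DEPTH:,} outstanding commands each")
```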
03:15 CP: The NVMe-oF specification extends these technical advantages over a variety of network transports. Just as the NVMe protocol replaces the older SCSI protocol, the NVMe-oF protocol replaces the older iSCSI and iSER network storage protocols. The NVMe-oF protocol enables servers to access remote NVMe-based SSDs over standard network protocols with about the same performance and latency as locally attached drives. The specification can accomplish this because it uses Remote Direct Memory Access (RDMA) to efficiently access the memory of another computer without using the CPU or the operating system of either one. This enables storage I/O to bypass the software stack entirely, delivering streamlined, low-latency performance with minimal CPU consumption. NVMe-oF is now supported by the majority of commercial and open source operating systems.
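To give a feel for what this looks like in practice, here is a minimal sketch that drives the standard nvme-cli tool from Python to discover and connect to a remote target over RDMA (swap the transport for tcp if that's what your fabric uses). The address, port and NVMe Qualified Name below are placeholder values, not a real deployment:

```python
import subprocess

# Placeholder values -- substitute your storage node's address and the NQN it exports.
TARGET_ADDR = "10.0.0.5"
TARGET_PORT = "4420"  # conventional NVMe-oF service port
TARGET_NQN = "nqn.2020-01.com.example:vol0"

# Ask the target which subsystems it exports.
subprocess.run(
    ["nvme", "discover", "-t", "rdma", "-a", TARGET_ADDR, "-s", TARGET_PORT],
    check=True,
)

# Connect; the remote namespace then shows up as a local /dev/nvmeXnY block device.
subprocess.run(
    ["nvme", "connect", "-t", "rdma", "-n", TARGET_NQN,
     "-a", TARGET_ADDR, "-s", TARGET_PORT],
    check=True,
)
```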
04:16 CP: The protocol is evolving to support security, multiple targets and in-band authentication. This will let you support secure tunneling, have multiple network paths to a target, and have highly secure in-band authentication as sponsored by the Trusted Computing Group. Because of these architectural advantages, NVMe-oF delivers remote volume performance rivaling locally attached flash drives, both in IOPS and access latency.
This chart shows the relative IOPS between NVMe-oF and DAS, or direct-attached storage, and you can see they're on par; there's a slight tax for going remote, but nothing significant. This has implications for how you should think about data center storage architectures. You should be leveraging NVMe-oF to connect compute and storage resources together more flexibly and cost-effectively.
05:14 CP: There are a couple of trends that are going to help with this, but first, let's talk about what people do today. Today, data center architectures are built around direct-attached storage because of performance; it used to be that that was SCSI. You really wanted your storage right next to the workload, but there are problems with this.
With advancements in scale-out applications and virtualization, directly connected server flash memory has become a utilization bottleneck, as locally attached drives have either too much flash storage or not enough. This forces customers to provision either more storage than is required or less, and you end up with under-utilization of these valuable resources and increasing capital expenses that translate into higher per-user costs.
Further, the latest SSDs are simply bigger and faster than what a single server needs. We at Kioxia have customers testing 30-terabyte SSDs today. These are incredible devices with huge performance and huge capacity.
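A quick back-of-the-envelope calculation -- the figures are assumptions for illustration, not Kioxia data -- shows how fast locally attached flash goes underutilized and how much a shared pool can save:

```python
# Assumed figures for illustration only.
servers = 20
ssd_capacity_tb = 30   # one locally attached 30 TB SSD per server
avg_need_tb = 8        # capacity a typical workload on that server actually uses

das_total_tb = servers * ssd_capacity_tb
das_utilization = (servers * avg_need_tb) / das_total_tb

# Disaggregated alternative: right-sized volumes from a shared pool,
# with ~20% headroom kept for growth.
pooled_total_tb = servers * avg_need_tb * 1.2

print(f"DAS: {das_total_tb} TB deployed, {das_utilization:.0%} utilized")
print(f"Shared pool: {pooled_total_tb:.0f} TB deployed for the same workloads")
```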
06:00 CP: The NVMe-oF protocol should change how architects design and evaluate data center storage systems going forward. The combination of high-density NVMe SSDs with the plummeting price of flash makes storage dollars per IOPS and storage read latency the critical metrics for data center storage architectures based on the NVMe-oF specification. In particular, next-generation storage infrastructures can leverage the protocol to completely disaggregate storage media into standard white box storage servers where these tremendous devices can be fully utilized across data center workloads.
06:52 CP: So, the solution for this is to disaggregate into a standard white box server at the bottom of racks. This lets you create right-size virtual volumes for each client and lets you optimize across devices within a storage node, and you are able to share this tremendous flash storage across many, many more compute nodes.
07:12 CP: There are several other technological trends that are further driving disaggregation. First, the PCIe bus itself. With the advent of PCIe Gen 4, most servers can now deliver 16 gigabytes per second of internal throughput, removing internal I/O bottlenecks.
Second, the storage media itself. As I mentioned, a single NVMe-based SSD can deliver millions of IOPS, which far exceeds the requirements of a normal workload running on a single compute server. At the same time, SSD densities are up to 30 terabytes and beyond per drive, also exceeding what a single compute node can effectively use. These two trends increase the economic value of shared storage.
Third, fast networks. Data centers globally are replacing slower network connections with 100, 200 and 400 Gigabit Ethernet, which removes bandwidth limitations and bottlenecks across the data center.
08:11 CP: Finally, and this is a big change, orchestration. Orchestrators like Kubernetes need storage mobility -- the mobility that shared storage delivers. Otherwise, moving data-intensive storage volumes requires a high-latency copy operation that limits the ability of orchestrators to optimize workload placement across servers. Effectively, locally attached storage creates a gravity well around your workload. The combination of high-density NVMe SSDs with the plummeting price of flash makes storage dollars per IOPS and storage read latency the critical metrics for data center storage architecture based on NVMe-oF.
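To put a rough number on that copy penalty -- again with assumed figures -- consider relocating a workload whose volume holds 10 TB over a 100 GbE link, versus simply reconnecting a shared NVMe-oF volume:

```python
# Assumed figures: 10 TB volume, 100 GbE link at ~80% effective throughput.
volume_tb = 10
link_gbit_s = 100
efficiency = 0.8

copy_seconds = (volume_tb * 8_000) / (link_gbit_s * efficiency)  # 1 TB = 8,000 Gbit
print(f"Copying the volume first: ~{copy_seconds / 60:.0f} minutes of data movement")
print("Reconnecting a shared NVMe-oF volume: effectively instantaneous")
```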
08:56 CP: There are many benefits to disaggregation. You get fewer node types, fewer compute and storage nodes, increased infrastructure flexibility, and a significant reduction in overall resources required, including the amount of flash required. In addition, the flexibility and the ability to quickly provision for different customer needs will increase overall storage utilization. So, net-net, you get a significant reduction in total resource requirements and number of servers, and a significant reduction in total flash media cost, because you're able to share that high-density flash media.
09:34 CP: Okay, now on to Kubernetes. Kubernetes container orchestration is a platform that helps build platforms, and it has made infrastructure fungible in a way that was inconceivable five years ago. It delivers a highly flexible and efficient form of virtualization that transforms what used to be a hardware problem into infrastructure as code.
Traditional data centers built with locally attached storage resources on each server make persistent data a challenge for Kubernetes-based applications. Kubernetes orchestration uses lightweight container technology in place of VMs to manage workloads and all of their dependencies in isolation from other applications and system processes. However, containers are more mobile than VMs, and traditional storage infrastructures have had a hard time keeping pace. You write a bunch of data onto one drive, then the workload moves to another server, and now you've got a problem.
While locally attached storage provides good performance, when Kubernetes orchestration schedules a data-intensive container like that onto another server, it forces a storage copy that has high latency, and this inhibits container adoption for data-intensive applications.
10:46 CP: The NVMe-oF specification is a major step forward for Kubernetes infrastructures because it enables shared storage across the network at an access latency similar to locally attached drives. Multiple NVMe drives can be centralized into standard storage nodes with high-bandwidth network interfaces, as I show in this diagram here, and then you can serve these virtual volumes up to the compute-only nodes running Kubernetes containerized workloads.
So, what I'm showing here in the slide is a three-way replicated volume. The client only sees one logical volume, but through standard NVMe RAID, you're actually connected over NVMe-oF to multiple back ends. This gives you resilient data, and because you're using shared storage, we're able to serve up just the storage that this workload needs. When Kubernetes moves the workload, it simply reconnects to the volumes without losing any of their data.
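As a sketch of how a containerized workload would consume such a volume, the snippet below uses the official Kubernetes Python client to request a claim against a hypothetical StorageClass named nvmeof-replicated, standing in for whatever NVMe-oF-capable CSI driver fronts the storage nodes; the names are made up, and on newer client versions the resources object is called V1VolumeResourceRequirements instead:

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running in a pod
core = client.CoreV1Api()

# "nvmeof-replicated" is a hypothetical StorageClass backed by an
# NVMe-oF-capable CSI driver on the shared storage nodes.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="orders-db-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="nvmeof-replicated",
        resources=client.V1ResourceRequirements(requests={"storage": "200Gi"}),
    ),
)
core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)
```

When the pod that mounts this claim gets rescheduled, the claim stays bound to the same remote volume; the new node just connects to it over the fabric.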
11:50 CP: This centralization of storage resources adds a level of on-demand flexibility and cost efficiency. When a workload requires additional storage, the volume allocation can be scaled up on demand, under the covers, transparently. When an application's storage needs shrink or disappear entirely, the storage resource can be added back to the pool for other workloads. This flexibility is really a marriage made in heaven between these two technologies, because NVMe-oF delivers the last mile for Kubernetes, solving the storage problem and giving you a very mobile backplane for your storage infrastructure.
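Growing such a volume on demand can then be as simple as patching the claim, assuming the StorageClass sets allowVolumeExpansion: true and the CSI driver supports online expansion; this continues the hypothetical example above:

```python
from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# Grow the claim from 200Gi to 400Gi; the extra capacity comes out of the
# shared pool, with no data movement on the compute node.
core.patch_namespaced_persistent_volume_claim(
    name="orders-db-data",
    namespace="default",
    body={"spec": {"resources": {"requests": {"storage": "400Gi"}}}},
)
```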
12:33 CP: So, what should you look for when selecting an NVMe-oF solution? NVMe-oF fundamentally redefines the data center landscape. NVMe reinvented storage I/O, and extending it over standard network transports gives you a whole new model for how you deploy in a data center. The result is a sea change in the way data centers are built and deployed. Instead of spreading storage across compute nodes, flash devices can be disaggregated into shared storage services deployed across failure zones and regions. The result is increased resource utilization and end-to-end data resilience. Companies in this business need to aggressively evaluate and adopt NVMe-oF storage infrastructures in their data centers.
13:17 CP: There are a few features that you should look for in an NVMe-oF native storage solution. First, you want true NVMe-oF performance. Look for a solution that hooks volumes to targets directly using the protocol and then gets out of the way, so you take full advantage of the technological breakthroughs represented by NVMe-oF.
Second, you want a solution that is analytics-driven and flash media-aware for volume mapping. Flash storage technology has evolved considerably, and this requires that a storage solution understand the performance characteristics of the different flash media types, like QLC and TLC; manage efficiency versus load; minimize write amplification; and analytically map each workload type to the best available media type.
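As a toy illustration of that kind of analytics-driven placement -- the threshold and figures below are invented for the example, not a product rule -- a policy might track write amplification and steer write-heavy volumes to TLC while parking read-mostly volumes on QLC:

```python
def write_amplification(nand_bytes_written: float, host_bytes_written: float) -> float:
    """Write amplification factor: bytes the flash actually wrote vs. bytes the host asked for."""
    return nand_bytes_written / host_bytes_written

def pick_media(write_fraction: float) -> str:
    """Toy policy: write-heavy volumes go to TLC, read-mostly volumes to QLC.
    The 0.3 cutoff is an illustrative assumption."""
    return "TLC" if write_fraction > 0.3 else "QLC"

print(write_amplification(nand_bytes_written=3.2e12, host_bytes_written=2.0e12))  # 1.6
print(pick_media(0.05))  # read-mostly analytics volume -> QLC
print(pick_media(0.60))  # write-heavy OLTP volume -> TLC
```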
Third, you want online volume migration, something like what I showed you in the diagram, to support Kubernetes workload mobility. You also want the basic features -- thin provisioning, volume expansion, data resiliency and QoS management -- so you can keep customer satisfaction high, as well as snapshots.
14:28 CP: Finally, you want support for both RDMA and TCP/IP networks for deployment flexibility, and you want your solution to support bare-metal, Kubernetes and OpenStack environments, so you can drive it into any of your data center environments. We have a solution called KumoScale that we believe meets these criteria. We invite you to come take a look at our website, kumoscale.kioxia.com. And with that, I thank you for spending time with me today.