NVMe over Fabrics (NVMe-oF)
NVMe over Fabrics, also known as NVMe-oF and non-volatile memory express over fabrics, is a protocol specification designed to connect hosts to storage across a network fabric using the NVMe protocol.
The protocol is designed to enable data transfers between a host computer and a target solid-state storage device or system over a network -- accomplished through a NVMe message-based command. Data transfers can be transferred through methods such as Ethernet, Fibre Channel (FC) or InfiniBand.
NVM Express Inc. is the nonprofit organization that published version 1.0 of the NVMe specification on March 1, 2011. Later, on June 5, 2016, the same organization published version 1.0 of the NVMe-oF specification. NVMe version 1.3 was then released in May 2017. This update added features to enhance security, resource sharing and solid-state drive (SSD) endurance.
The NVM Express organization estimated that 90% of the NVMe-oF protocol is the same as the NVMe protocol, which is designed for local use over a computer's Peripheral Component Interconnect Express (PCIe) bus.
Vendors are working on developing a mature enterprise ecosystem that supports end-to-end NVMe over Fabrics, including the server operating system, server hypervisor, network adapter cards, storage OS and storage drives. In addition, storage area network (SAN) switch vendors -- not limited to Cisco Systems Inc. and Mellanox Technologies -- are trying to position 32 gigabits per second (Gbps) FC as the logical fabric for NVMe flash.
Since the initial development of NVMe-oF, there have been multiple implementations of the protocol, such as NVMe-oF using remote direct memory access (RDMA), FC or Transmission Control Protocol/Internet Protocol (TCP/IP).
Uses of NVMe over Fabrics
Although it is still a relatively young technology, NVMe-oF has been widely incorporated into network architectures. Using NVMe-oF can help provide a state-of-the-art storage protocol that can take full advantage of today's SSDs. The protocol can also help in bridging the gaps between direct-attached storage (DAS) and SANs, enabling organizations to support workloads that require high throughputs and low latencies.
Initial deployments of NVMe were DAS in servers, with NVMe flash cards replacing traditional SSDs as the storage media. This arrangement offers promising high-performance gains when compared with existing all-flash storage, but it also has its drawbacks. NVMe requires the addition of third-party software tools to optimize write endurance and data services. Bottlenecks persist in NVMe arrays at the level of the storage controller.
Other use cases for NVMe-oF include optimizing real-time analytics, as well as playing roles in artificial intelligence (AI) and machine learning.
The use of NVMe-oF is a relatively new phase in the evolution of the technology, paving the way for the arrival of rack-scale flash systems that integrate native, end-to-end data management. The pace of mainstream adoption will depend on how quickly across-the-stack development of the NVMe ecosystem occurs.
Benefits of NVMe over Fabrics
Benefits of NVMe-based storage drives include the following:
- low latency;
- additional parallel requests;
- increased overall performance;
- reduction of the length of the OS storage stacks on the server side;
- improvements pertaining to storage array performance;
- faster end solution with a move from Serial-Attached SCSI (SAS)/Serial Advanced Technology Attachment (SATA) drives to NVMe SSDs; and
- variety of implementation types for different scenarios.
Technical characteristics of NVMe over Fabrics
Some technical characteristics of NVMe-oF include the following:
- high speed;
- low latency over networks;
- credit-based flow control;
- ability to scale out up to thousands of other devices;
- multipath support of the fabric to enable multiple paths between the NVMe host initiator and storage target simultaneously; and
- multihost support of the fabric to enable sending and receiving of commands from multiple hosts and storage subsystems simultaneously.
NVMe over Fabrics vs. NVMe: Key differences
NVMe is an alternative to the Small Computer System Interface (SCSI) standard for connecting and transferring data between a host and a peripheral target storage device or system. NVMe is designed for use with faster media, such as SSDs and post-flash memory-based technologies. The NVMe standard speeds access times by several orders of magnitude compared to the SCSI and SATA protocols developed for rotating media.
NVMe supports 64,000 queues, each with a queue depth of up to 64,000 commands. All input/output (I/O) commands, along with the subsequent responses, operate on the same processor core, parlaying multicore processors into a high level of parallelism. I/O locking is not required, since each application thread gets a dedicated queue.
NVMe-based devices transfer data using a PCIe serial expansion slot, meaning there is no need for a dedicated hardware controller to route network storage traffic. Using NVMe, a host-based PCIe SSD is able to transfer data more efficiently to a storage target or subsystem.
One of the main distinctions between NVMe and NVMe over Fabrics is the transport-mapping mechanism for sending and receiving commands or responses. NVMe-oF uses a message-based model for communication between a host and a target storage device. Local NVMe will map commands and responses to shared memory in the host over the PCIe interface protocol.
While it mirrors the performance characteristics of PCIe Gen 3, NVMe lacks a native messaging layer to direct traffic between remote hosts and NVMe SSDs in an array. NVMe-oF is the industry's response to developing a messaging layer.
NVME over Fabrics using RDMA
NVMe-oF use of RDMA is defined by a technical subgroup of the NVM Express organization. Mappings available include RDMA over Converged Ethernet (RoCE) and Internet Wide Area RDMA Protocol (iWARP) for Ethernet and InfiniBand.
RDMA is a memory-to-memory transport mechanism between two computers. Data is sent from one memory address space to another, without invoking the OS or the processor. Lower overhead and faster access and response time to queries are the result, with latency usually in microseconds (μs).
NVMe serves as the protocol to move storage traffic across RDMA over Fabrics. The protocol provides a common language for compute servers and storage to communicate regarding the transfer of data.
NVMe over Fabrics using RDMA essentially requires implementing a new storage network that bumps up performance. The trade-off is reduced scalability compared to the FC protocol.
NVMe over Fabrics using Fibre Channel
NVMe over Fabrics using Fibre Channel (FC-NVMe) was developed by the T11 committee of the International Committee for Information Technology Standards (INCITS). FC enables the mapping of other protocols on top of it, such as NVMe, SCSI and IBM's proprietary Fibre Connection (Ficon), to send data and commands between host and target storage devices.
FC-NVMe and Gen 6 FC can coexist in the same infrastructure, enabling data centers to avoid a forklift upgrade.
Customers use firmware to upgrade existing FC network switches, provided the host bus adapters (HBAs) support 16 Gbps or 32 Gbps FC and NVMe-oF-capable storage targets.
The FC protocol supports access to shared NVMe flash, but there is a performance hit imposed to interpret and translate encapsulated SCSI commands to NVMe commands. The Fibre Channel Industry Association (FCIA) is helping to drive standards for backward-compatible FC-NVMe implementations, enabling a single FC-NVMe adapter to support SCSI-based disks, traditional SSDs and PCIe-connected NVMe flash cards.
NVMe over Fabrics using TCP/IP
One of the newer developments regarding NVMe-oF includes the development of NVMe-oF using TCP/IP. NVMe-oF can now support TCP transport binding. NVMe over TCP makes it possible to use NVMe-oF across a standard Ethernet network. There is also no need to make configuration changes or implement any special equipment with the use of NVMe-oF TCP/IP. Because the transport binding can be used over any Ethernet network or the internet, the challenges commonly involved in implementing any additional equipment and configurations are eliminated.
TCP is a widely accepted standard for establishing and maintaining network communications when exchanging data across a network. TCP will work in conjunction with IP, as both protocols used together facilitate communications across the internet and private networks. The TCP transport binding in NVMe-oF defines how the data between a host and a non-volatile memory subsystem are encapsulated and delivered.
The TCP binding will also define how queues, capsules and data are mapped, which supports TCP communications between NVMe-oF hosts and controllers through IP networks.
NVMe-oF using TCP/IP is a good choice for organizations that wish to utilize their Ethernet infrastructure. This will also give developers the ability to migrate NVMe technology away from Internet SCSI (iSCSI). As an example, an organization that doesn't want to deal with any potential hassles included in implementing NVMe over Fabrics using RDMA can instead take advantage of NVMe-oF using TCP/IP on a Linux kernel.
Dennis Martin, founder and president of analyst firm Demartek, explains the roles of NVMExpress and NVMe over Fabrics.
Storage industry support for NVMe and NVMe-oF
Established storage vendors and startups alike are competing for a position within the market. All-flash NVMe and NVMe-oF storage products include the following:
- DataDirect Networks (DDN) Flashscale;
- Datrium DVX hybrid system;
- Kaminario K2.N;
- NetApp Fabric-Attached Storage (FAS) arrays, including Flash Cache with NVMe SSD connectivity;
- Pure Storage FlashArray//X; and
- Tegile IntelliFlash (acquired by Western Digital Corp. in 2017 and then sold to DDN in 2019).
In December 2017, IBM previewed an NVMe-oF InfiniBand configuration integrating its Power9 Systems and FlashSystem V9000, a product that is geared for cognitive workloads that ingest massive quantities of data.
In 2017, Hewlett Packard Enterprise introduced its HPE Persistent Memory Server-side flash storage using ProLiant Gen9 servers and NVMe-compliant Persistent Memory PCIe SSDs.
Dell EMC was one of the first storage vendors to bring an all-flash NVMe product to market. The DSSD D5 array was built with Dell PowerEdge servers and a proprietary NVMe over PCIe network mesh. The product was shelved in 2017 due to poor sales.
A handful of startups have also launched NVMe all-flash arrays:
- Apeiron Data Systems uses NVMe drives for media and houses data services in field-programmable gate arrays (FPGAs) instead of servers attached to storage arrays.
- E8 Storage (bought by Amazon in 2019) uses its software to replicate snapshots from the E8-D24 NVMe flash array to attached branded compute servers, a design that aims to reduce management overhead on the array.
- Excelero software-defined storage runs on any standard servers.
- Mangstor MX6300 NVMe-oF arrays are based on Dell EMC PowerEdge outfitted with branded NVMe PCIe cards.
- Pavilion Data Systems has a branded Pavilion Memory Array built with commodity network interface cards (NICs), PCIe switches and processors. Pavilion's 4U appliance contains 20 storage controllers and 40 Ethernet ports, which connect to 72 NVMe SSDs using the internal PCIe switch network.
- Vexata Inc. offers its VX-100 and Vexata Active Data Fabric distributed software. The vendor's Ethernet-connected NVMe array includes a front-end controller, a cut-through router based on FPGAs and data nodes that manage I/O schedules and metadata.
Chipmakers, network vendors prep the market
Computer hardware vendors broke new ground on NVMe over Fabrics technologies in 2017. Networking vendors are waiting for storage vendors to catch up and start selling NVMe-oF-based arrays.
FC switch rivals Brocade and Cisco each rolled out 32 Gbps Gen 6 FC gear that supports NVMe flash traffic, including FC-NVMe fabric capabilities. Also entering the fray was Cavium, refreshing the QLogic Gen 6 FC and FastLinQ Ethernet adapters for NVMe-oF.
Marvell introduced its 88SS1093 NVMe SSD controllers, featuring an advanced design that places its low-density parity check technology for triple-level cell (TLC) NAND flash devices running on top of multi-level cell (MLC) NAND.
Mellanox Technologies has developed an NVMe-oF storage reference architecture based on its BlueField system-on-a-chip (SoC) programmable processors. Similar to hyper-converged infrastructure (HCI), BlueField integrates compute, networking, security, storage and virtualization tools in a single device.
Microsemi Corp. teamed with American Megatrends (AMI) to develop an NVMe-oF reference architecture. The system incorporates Microsemi Switchtec PCIe switches in Intel Rack Scale Design (RSD) disaggregated composable infrastructure hardware running AMI's Fabric Management Firmware.
Among drive-makers, Intel Corp. led the way with dual-ported 3D NAND-based NVMe SSDs and Intel Optane NVMe drives, which are based on 3D XPoint memory technology developed by Intel and chipmaker Micron Technology, Inc. Intel claims Optane NVMe drives are approximately eight times faster than NAND flash memory-based NVMe PCIe SSDs.
Micron rolled out its 9200 series of NVMe SSDs and also branched into selling storage, launching the Micron Accelerated Solutions NVMe reference architecture and Micron SolidScale NVMe-oF-based appliances.
Seagate Technology introduced its Nytro 5000 M.2 NVMe SSD and started sampling a 64 terabyte (TB) NVMe add-in card.
Using Hardware Acceleration to Increase NVMe Storage Performance
Evolution of Ethernet-attached NVMe-oF Devices and Platforms