Which NVMe fabric will win out?
Are you wondering how NVMe over fabrics will shake out? Expect RDMA to take the early lead, but storage pros will continue their love affair with Fibre Channel.
You have several fabric options when deploying NVMe over Fabrics (NVMe-oF). NVMe over Fibre Channel is one, using the Fibre Channel networking storage pros know well. Most other NVMe-oF options are based on remote direct memory access (RDMA), including RDMA over Converged Ethernet and iWARP.
NVMe will enter many data centers in extremely high-performance applications that typically require dedicated hardware. Of the NVMe fabric choices, I expect that RDMA fabrics will take an early lead. The specialized hardware required for NVMe over RDMA, the RDMA-enabled network interface card (RNIC), is already available. Almost every 10 Gigabit Ethernet NIC is also an RNIC, although they don't all support the same RDMA protocol, and you can't mix and match them.
As NVMe replaces SCSI as the lingua franca of storage, enterprises will access full-featured arrays, such as Dell EMC PowerMax or NetApp All-Flash FAS, over the Fibre Channel (FC) they know and love. Most Ethernet NVMe will end up on TCP, which runs over standard Ethernet equipment without requiring specialized RNICs or host bus adapters (HBAs). RDMA will stick around for controller-to-shelf connections in arrays, such as Pure Storage's FlashArray//X, but I don't expect it to be mainstream.
NVMe fabric recommendations
While reliable RDMA over Converged Ethernet (RoCE) and Internet Wide Area RDMA Protocol (iWARP) can theoretically run over networks without special configuration, RoCE vendors still recommend that users enable priority flow control (PFC) and Explicit Congestion Notification (ECN) to eliminate dropped packets, and enabling them requires some configuration work.
RDMA offloads can reduce latency and CPU utilization, but as Intel Xeon processing power increases, I expect NVMe fabric patterns to follow the path iSCSI users have gone down. In the early days of iSCSI, iSCSI HBAs and TCP offload engine cards were needed. But as CPU power increased, users soon discovered the software initiators built into their OSes and hypervisors provided plenty of performance while using just a small fraction of the server's CPU. Today, many NICs and converged network adapters offer iSCSI and TCP offloads, but when I ask around, I can't find a single user who actually relies on them.
The Converged Ethernet in RoCE comes from data center bridging (DCB), which came about to enable FC over Ethernet by eliminating dropped packets from Ethernet. I view the DCB switch setup as analogous to jumbo frames in an iSCSI SAN. Both require the network admin to set a few parameters on every port to get better performance, and both get flaky when even one or two devices or ports are misconfigured. Service providers with network automation can deal with that easily, while system admins who have to wait two weeks for the network team to configure the two ports for a new server will probably decide to just go TCP.
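To make the "one or two ports" problem concrete, here is a minimal sketch of the kind of check a network automation job might run before an RDMA fabric goes live. Everything in it is an assumption for illustration: the PortConfig record, the find_misconfigured helper, the port names and the 9,000-byte MTU are hypothetical, and a real job would pull these settings from the switches and adapters rather than from a hard-coded list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortConfig:
    """Hypothetical snapshot of the per-port settings discussed above."""
    name: str
    pfc_enabled: bool
    ecn_enabled: bool
    mtu: int


def find_misconfigured(ports, expected_mtu=9000):
    """Return {port name: [problems]} for ports that deviate from the target settings.

    The target (PFC on, ECN on, jumbo-frame MTU) is an assumed policy,
    not a vendor recommendation.
    """
    problems = {}
    for port in ports:
        issues = []
        if not port.pfc_enabled:
            issues.append("PFC disabled")
        if not port.ecn_enabled:
            issues.append("ECN disabled")
        if port.mtu != expected_mtu:
            issues.append(f"MTU {port.mtu}, expected {expected_mtu}")
        if issues:
            problems[port.name] = issues
    return problems


# Made-up inventory: two of the three ports were missed during setup.
ports = [
    PortConfig("eth1/1", pfc_enabled=True, ecn_enabled=True, mtu=9000),
    PortConfig("eth1/2", pfc_enabled=True, ecn_enabled=False, mtu=9000),
    PortConfig("eth1/3", pfc_enabled=True, ecn_enabled=True, mtu=1500),
]

report = find_misconfigured(ports)
for name, issues in report.items():
    print(f"{name}: {', '.join(issues)}")
```

A shop with this kind of automation catches the stray port before it causes trouble. A shop without it finds out when the fabric gets flaky, which is a large part of why I expect many admins to choose TCP.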
NVMe-oF is much more than just a faster replacement for SANs, and the fabric you choose will do more than increase speed. Fabric-attached JBOFs, especially hard drive JBOFs such as Western Digital's OpenFlex D3000, are also about composability and the flexibility it creates, but that's another story.