IT professionals tend to see the march of technology as monotonic and progressive -- always moving in one direction toward greater improvement, however that's measured. The impression is generally valid: No one makes CRT TVs, rotary landline phones or cassette recorders anymore because there's no market for them, given the vastly superior replacement technologies.
However, sometimes there's room for multiple generations of the same underlying technology because of secondary differences like performance, reliability, durability, lifespan and cost. Such is the case with NAND flash memory, where the progression from single-level cell (SLC) to multi-level cell (MLC), triple-level cell (TLC) and, now, quad-level cell (QLC) storage technology has left enough performance gaps between them to allow space for the different forms of NAND flash in the modern data center.
The fundamental trade-off between performance and capacity is discussed in an earlier article; here, our focus is on the two highest-density technologies, QLC vs. TLC.
You might think that QLC, with four bits per cell, is an evolutionary extension that increases flash memory density by 33% and could completely replace TLC, with three bits per cell, for high-capacity SSD uses. However, TLC technology has improved in terms of durability and performance to create roles for both types of NAND. Read on to find out more about the QLC vs. TLC debate and why you might want both in your storage systems.
Basic differences between QLC and TLC
Buyers of flash memory immediately face a choice of the underlying storage technology. Yes, all use semiconductor memory cells instead of ferromagnetic polarization to store bits, but unlike hard disks, flash memory devices can hold one, two, three or four bits per storage cell. The more bits per cell, the higher the storage capacity per chip. Given the benefits of multibit cells and technology advances that have mitigated most of their drawbacks, TLC and QLC dominate the consumer market and are rapidly displacing SLC for all but the most stringent workloads in enterprise storage systems.
Let's dive a little deeper into the differences between TLC and QLC. Flash memory cells use electrical charge trapped on an insulated plate or layer to modulate the current flow in a transistor. The two most common cell structures are floating gate and charge trap. Both surround a storage layer with insulating material that isolates the stored electrons; the storage layer is conductive polysilicon in a floating-gate cell and insulating silicon nitride in a charge-trap cell. In each, the amount of stored charge affects the voltage that must be applied to a control gate to allow current to flow across the transistor channel, i.e., to turn the transistor on. In an SLC, there is only a single threshold voltage, and the transistor is either on or off.
Instead of a single threshold voltage, a TLC has seven, which separate eight charge states (including the "zero" state) to store three bits per cell: 000, 001, 010, 011, 100, 101, 110 and 111.
Similarly, a QLC can hold 16 charge states, corresponding to 16 threshold voltage levels that map to four bits per cell: 0000, 0001, 0010, … 1111. Because all of these levels must fit within a small voltage window, typically the 3.3 V supply, going from TLC to QLC roughly halves the gap between adjacent levels, from about 470 mV (3.3 V across seven gaps) to 220 mV (3.3 V across 15 gaps). That significantly increases sensitivity to noise, process variation and chip defects.
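The spacing figures above follow from dividing the voltage window evenly among the gaps between charge states. The Python sketch below reproduces that arithmetic and adds a toy read model to show why narrower gaps matter. The even spacing, the midpoint read references, the plain binary state-to-bits mapping and the 150 mV disturbance are all simplifying assumptions for illustration, not how any particular device works.

```python
SUPPLY_V = 3.3  # voltage window cited above; assumed fully usable for charge states


def level_spacing_mv(bits_per_cell: int, window_v: float = SUPPLY_V) -> float:
    """Gap between adjacent charge states in mV, assuming even spacing."""
    states = 2 ** bits_per_cell  # 8 for TLC, 16 for QLC
    gaps = states - 1            # 7 for TLC, 15 for QLC
    return window_v / gaps * 1000


def read_state(cell_vt: float, bits_per_cell: int, window_v: float = SUPPLY_V) -> str:
    """Decode a cell by counting how many read-reference voltages its Vt exceeds.

    Toy model: states are evenly spaced from 0 V to window_v, each read
    reference sits midway between two adjacent states, and the state index
    maps to bits in plain binary order (real NAND typically uses a Gray code).
    """
    states = 2 ** bits_per_cell
    step = window_v / (states - 1)
    refs = [step * (i + 0.5) for i in range(states - 1)]
    state = sum(cell_vt > ref for ref in refs)
    return format(state, f"0{bits_per_cell}b")


for name, bits in (("TLC", 3), ("QLC", 4)):
    print(f"{name}: {2 ** bits} states, ~{level_spacing_mv(bits):.0f} mV apart")

# Apply the same hypothetical 150 mV disturbance to a programmed cell of each type
noise_v = 0.15
tlc_vt = 2 * level_spacing_mv(3) / 1000 + noise_v  # TLC cell programmed to state 2
qlc_vt = 4 * level_spacing_mv(4) / 1000 + noise_v  # QLC cell programmed to state 4
print(read_state(tlc_vt, 3))  # still reads "010"
print(read_state(qlc_vt, 4))  # reads "0101" (state 5) instead of "0100"
```

Under these assumptions, a disturbance that the TLC cell absorbs pushes the QLC cell across a read reference, which is the reduced noise margin described above.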
As mentioned in the previous article, the essential difference between TLC and QLC is a tradeoff between performance and capacity. In particular, QLC provides:
- Lower cost per bit, i.e., cost per unit of capacity.
- Higher density, i.e., more capacity in the same physical footprint.
- Fewer flash devices per system of the same capacity, which can lower total cost of ownership.
In contrast, TLC offers:
- More writes per cell (endurance), although, due to the density difference, QLC drives usually spec a higher number of total bytes written.
- Higher performance, particularly for small block sizes.
- Slightly better reliability, although the gap is narrowing due to improved error correction algorithms and the ability of QLC devices to sacrifice a greater number of cells for redundancy without significantly degrading overall capacity.
The capacity advantage of QLC is so compelling that many all-flash array (AFA) vendors have adapted their controller software and cache design to mitigate, if not entirely eliminate, its drawbacks compared to TLC and SLC. Thus, don't be surprised to see a growing number of enterprise AFAs use QLC as the medium of choice.
The throughput vs. endurance tradeoff
Memory technology seldom gives you something for nothing; pushing the boundaries in one direction usually sacrifices features and performance in another dimension. As NAND flash cell technology has evolved, packing more bits into each cell has traded higher density for slower I/O throughput, higher read latency and lower endurance.
The tradeoff between throughput and endurance versus capacity and cost is the reason some storage systems still use SLC devices. The durability of SLC devices makes them ideal for write-intensive transaction processing workloads. However, the new classes of applications in machine learning, big data analytics and streaming media involve an increasing number of workloads that predominantly read data, rather than write it, minimizing the importance of flash durability.
QLC vs. TLC: The trend toward fewer writes per day
According to research firm Forward Insights, fewer than 20% of SSDs sold in 2018 were spec'd at more than one drive write per day (DWPD). By 2023, estimates are that 85% of drives sold will be the low-endurance models spec'd at one or fewer DWPD. DWPD is the number of times a drive's full capacity can be written each day, on average, over its guaranteed endurance period, typically five years. A 1 TB drive spec'd at one DWPD can sustain an average of 1 TB of data writes every day for five years.
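Because DWPD is normalized to capacity, converting it to total data written over the warranty period makes the rating concrete. This is a minimal sketch assuming 365-day years and the five-year period mentioned above; the function name is a hypothetical helper, and vendors may define or round their total-bytes-written specs differently.

```python
def total_tb_written(capacity_tb: float, dwpd: float, warranty_years: float = 5) -> float:
    """Total data (TB) a drive is rated to absorb over its warranty period.

    Assumes 365-day years; vendor specs may round or be defined differently.
    """
    return capacity_tb * dwpd * 365 * warranty_years


# The 1 TB, 1 DWPD example: 1 TB per day for five years
print(total_tb_written(1.0, 1.0))  # 1825.0
```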
The trend toward fewer writes per day bolsters the case for QLC drives. They have the shortest endurance of any current flash device because of tight tolerances on the charge levels for each bit state in a memory cell along with tighter spacings and thinner insulating gates in leading-edge flash manufacturing processes.
You must be careful when comparing DWPD specs across device types, because DWPD is a relative measure that depends on total drive capacity. As Micron Technology pointed out, a 960 GB TLC SSD rated at 1 DWPD has similar overall endurance to a 1.92 TB QLC SSD rated at 0.5 DWPD for a given workload. Although the QLC DWPD spec is lower, both devices can absorb the same amount of data per day: 960 GB. Thus, for workloads that almost exclusively read data, the QLC device is a better choice because of its greater capacity.
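Normalizing each rating to gigabytes per day shows why the two drives in Micron's example are comparable. A minimal sketch: the `Drive` class is a hypothetical helper, and 1.92 TB is treated as 1,920 GB (decimal units).

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    capacity_gb: float
    dwpd: float

    def daily_write_budget_gb(self) -> float:
        """Average GB that can be written per day over the rated period."""
        return self.capacity_gb * self.dwpd


tlc = Drive("960 GB TLC", 960, 1.0)
qlc = Drive("1.92 TB QLC", 1920, 0.5)

for d in (tlc, qlc):
    print(f"{d.name}: {d.daily_write_budget_gb():.0f} GB/day over {d.capacity_gb:.0f} GB")
# 960 GB TLC: 960 GB/day over 960 GB
# 1.92 TB QLC: 960 GB/day over 1920 GB
```

Both drives absorb the same 960 GB of writes per day, but the QLC drive stores twice as much data.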
How QLC and TLC complement each other
The QLC market is focused on read-dominant workloads; it's not trying to displace TLC devices, but rather to replace HDDs. This is Micron's justification for continuing to manufacture both types of devices. Micron has made a compelling argument that TLC and QLC are complementary, with QLC filling the gaps between TLC flash and HDD magnetic disks. Indeed, Micron makes the case that because SSDs don't wear when reading data, whereas HDDs do, QLC endurance is superior to HDDs for workloads where the data write pattern includes a high percentage of large sequential transfers.
We can more precisely delineate the workload characteristics best suited to QLC vs. TLC drives by looking closely at the I/O patterns of various applications. According to Micron modeling, QLC drives are the better choice for read-intensive workloads down to a read/write mix of about 70/30 for small or random transfers, and about 50/50 for applications with large sequential writes. Conversely, TLC drives are better for write-dominant workloads, except for the minority of heavily transactional applications that might require an SLC drive that can handle 5 or 10 DWPD.
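The crossover points above can be expressed as a rough decision rule. This is a hypothetical helper, not Micron's model: it encodes only the approximate 70/30 and 50/50 read/write thresholds quoted above, and the 5 DWPD cutoff for recommending SLC is an assumption taken from the "5 or 10 DWPD" figure.

```python
def suggest_nand(read_fraction: float, large_sequential_writes: bool,
                 required_dwpd: float = 0.0) -> str:
    """Rule-of-thumb media choice from a workload's read share and write pattern.

    Hypothetical: thresholds are the approximate crossovers cited in the text;
    real sizing should use vendor endurance and performance data.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0 and 1")
    if required_dwpd >= 5:
        return "SLC"  # heavily transactional, write-intensive workloads
    # Large sequential writes are gentler on QLC, so the crossover is lower
    crossover = 0.5 if large_sequential_writes else 0.7
    return "QLC" if read_fraction >= crossover else "TLC"


print(suggest_nand(0.9, large_sequential_writes=False))  # QLC
print(suggest_nand(0.6, large_sequential_writes=False))  # TLC
print(suggest_nand(0.6, large_sequential_writes=True))   # QLC
```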
Fortuitously, many of the enterprise applications experiencing the fastest growth and highest uptake have a preponderance of data reads compared to writes. These include:
- data analytics using data lakes or distributed big data applications like Hadoop;
- AI applications using machine or deep learning;
- NoSQL databases;
- large object stores using Ceph, Gluster, Lustre and others; and
- streaming media and content delivery networks.
System design matters, too
The design of QLC devices, which use larger memory blocks than TLC, suits I/O with large-block sequential transfers, but not the small random I/O typical of many databases.
One way storage systems can work around this limitation is by coupling an array of QLC SSDs with a non-volatile write buffer, such as a non-volatile DIMM (NVDIMM), that performs write coalescing. Small random writes are buffered until enough accumulate to fill a data block; the system then writes them sequentially to a disk volume or file system as a single transfer. The buffer can be battery- or capacitor-backed DRAM modules, Optane persistent memory modules, or even a high-endurance SLC or TLC NVMe drive.
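The coalescing step described above can be sketched as a buffer that absorbs small writes and emits full blocks. This is a minimal Python sketch under stated assumptions: `backend_write` is a hypothetical stand-in for a sequential write to the QLC volume, the in-memory `bytearray` stands in for the non-volatile buffer, and the sketch simply appends writes as a log without tracking each one's logical address, which a real system would also need (for example, via a mapping table).

```python
class CoalescingWriteBuffer:
    """Hypothetical write-coalescing buffer placed in front of a QLC volume."""

    def __init__(self, block_size: int, backend_write):
        self.block_size = block_size        # size of one sequential transfer
        self.backend_write = backend_write  # stand-in for the QLC write path
        self.pending = bytearray()          # stands in for NVDIMM/PMem/SSD buffer

    def write(self, data: bytes) -> None:
        """Absorb a small write; emit each full block as one sequential transfer."""
        self.pending.extend(data)
        while len(self.pending) >= self.block_size:
            block = bytes(self.pending[: self.block_size])
            del self.pending[: self.block_size]
            self.backend_write(block)

    def flush(self) -> None:
        """Write out any remaining partial block, e.g., on shutdown."""
        if self.pending:
            self.backend_write(bytes(self.pending))
            self.pending.clear()


transfers = []
buf = CoalescingWriteBuffer(block_size=16, backend_write=transfers.append)
for i in range(10):
    buf.write(b"rec%d" % i)  # ten small 4-byte writes
buf.flush()
print([len(t) for t in transfers])  # [16, 16, 8]
```

Ten small writes reach the backend as three larger transfers; in practice, the block size would match the QLC device's preferred write unit.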
Both Microsoft and Western Digital have detailed an approach for using QLC with NVMe Zoned Namespaces (ZNS). Enterprise storage systems using QLC are already available, such as Pure Storage's FlashArray//C, which was released in 2019.
With a sizable advantage in capacity and lower cost per bit, expect QLC devices to proliferate in enterprise data centers alongside TLC and SLC devices. Storage systems will intelligently manage workload placement according to the I/O characteristics and requirements of each application.