IT professionals tend to see the march of technology as monotonic and progressive -- always moving in one direction toward greater improvement, however that's measured. The impression is generally valid: No one makes picture tube TVs, rotary landline phones or cassette recorders anymore because there's no market for them, given the vastly superior replacement technologies.
However, sometimes, there's room for multiple generations of the same underlying technology because of secondary differences, like performance, reliability, durability, life span and cost. Such is the case with NAND flash memory, where the progression from single-level cell (SLC) to multi-level cell (MLC), triple-level cell (TLC), quad-level cell (QLC) and emerging penta-level cell (PLC) storage technology has left enough performance gaps between them to allow space for the different forms of NAND flash in the modern data center.
This article examines the fundamental tradeoff between performance and capacity, focusing on the two highest-density technologies in current use: QLC and TLC. PLC is still in its formative stages of development, and it is too early for commercial adoption.
You might think that QLC, with 4 bits per cell, is an evolutionary extension that increases flash memory density by 33% and could completely replace TLC, with 3 bits per cell, for high-capacity SSD uses. However, TLC technology has improved in terms of durability and performance to create roles for both types of NAND. Read on to find out more about the QLC vs. TLC debate and why you might want both in your storage systems.
Basic differences between QLC and TLC
Buyers of flash memory immediately face a choice of the underlying storage technology. Yes, all use semiconductor memory cells instead of ferromagnetic polarization to store bits, but unlike hard disks, flash memory devices can hold 1, 2, 3, 4 and, soon, 5 bits per storage cell. The more bits per cell, the higher the storage capacity per flash chip. Given the benefits of multibit cells and technology advances that have mitigated most drawbacks, TLC and QLC dominate the consumer market and are rapidly displacing SLC for all but the most write-demanding workloads in enterprise storage systems.
Let's dive a little deeper into the differences between TLC and QLC. Flash memory cells use electrical charge trapped on an insulated plate or layer to modulate the current flow in a transistor. The two most common cell structures are floating gate and charge trap. Both surround a storage layer with insulating material that isolates the stored electrons; the storage layer is conducting polysilicon in a floating gate cell and insulating silicon nitride in a charge trap cell. In each, the amount of charge held in the storage layer sets the voltage that must be applied to a control gate to enable current to flow across the transistor channel, i.e., to turn the transistor on. In SLC, there is only a single threshold voltage, and the transistor is either on or off.
Instead of a single threshold voltage, TLC has seven -- plus the zero state -- to enable storing 3 bits per cell, i.e.:
- 000 (lowest voltage state).
- 001 through 110 (six intermediate voltage states).
- 111 (highest voltage state).
Similarly, QLC can hold 16 unique charge states in its storage layer, corresponding to 16 threshold voltage levels that map to 4 bits per cell: 0000, 0001, 0010, … 1111. Given the small supply voltages involved -- typically, 3.3 volts -- going from TLC to QLC roughly halves the voltage difference between adjacent levels, e.g., from about 470 millivolts to 220 mV, which significantly increases exposure to noise, process variations and chip defects. In effect, QLC chips are harder to manufacture and far more sensitive to circuit noise and errors.
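The arithmetic behind those figures can be sketched in a few lines of Python. This is a simplified illustration, not a device model: it assumes the charge states are spread evenly across the full 3.3 V range, which real NAND does not do exactly, and the function names are hypothetical. Under that assumption, the spacing works out to roughly 470 mV for TLC and 220 mV for QLC.

```python
def charge_states(bits_per_cell: int) -> int:
    """Number of distinct charge states needed to store the given bits."""
    return 2 ** bits_per_cell


def bit_patterns(bits_per_cell: int) -> list[str]:
    """All bit patterns a cell can represent, e.g. 000 ... 111 for TLC."""
    return [format(i, f"0{bits_per_cell}b") for i in range(charge_states(bits_per_cell))]


def level_spacing_mv(bits_per_cell: int, window_volts: float = 3.3) -> float:
    """Voltage gap between adjacent states, in millivolts.

    Assumption: states are evenly spaced across the whole voltage window.
    """
    gaps = charge_states(bits_per_cell) - 1  # N states leave N - 1 gaps
    return window_volts / gaps * 1000


for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3), ("QLC", 4), ("PLC", 5)]:
    print(f"{name}: {charge_states(bits)} states, "
          f"about {level_spacing_mv(bits):.0f} mV between states")
```

Each added bit doubles the number of states that must fit in the same voltage window, which is why the margin for error shrinks so quickly.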
The essential difference between TLC and QLC is a tradeoff between performance and capacity. In particular, QLC provides the following:
- Lower cost per bit, i.e., cost per unit of capacity.
- Higher density, i.e., more storage capacity in the same physical footprint.
- Fewer flash devices per system of the same capacity, which can lower TCO.
In contrast, TLC offers the following:
- More writes per cell (write endurance) -- although, due to the density difference, QLC drives usually spec a higher number of total bytes written.
- Higher performance, particularly for small block sizes.
- Slightly better reliability -- although, the gap is narrowing due to improved error correction algorithms and the ability of QLC devices to sacrifice a greater number of cells for redundancy without significantly degrading overall capacity.
The capacity advantage of QLC is so compelling that many all-flash array (AFA) vendors have adapted their controller software and cache design to mitigate, if not entirely eliminate, its drawbacks compared to TLC and SLC. Thus, don't be surprised to see a growing number of enterprise AFAs use QLC as the medium of choice. For example, systems such as Dell PowerStore spec NVMe TLC SSDs, while other systems such as Pure Storage FlashArray//C spec all-QLC arrays.
The throughput vs. endurance tradeoff
Memory technology seldom gives you something for nothing; pushing the boundaries in one direction usually sacrifices features and performance in another dimension. The evolution of NAND flash memory cell technology has resulted in the tradeoff of higher density through packing more bits per cell for slower I/O throughput, higher read latency and lower endurance.
The tradeoff between throughput and endurance versus capacity and cost is the reason some storage systems still use SLC devices. The durability of SLC devices makes them ideal for write-intensive transaction processing workloads. However, the new classes of applications in machine learning, big data analytics and streaming media involve an increasing number of workloads that predominantly read data, rather than write it, minimizing the importance of flash durability. Reading doesn't degrade the storage cells of NVM technologies. Wear occurs only during writes, where each cell must be erased and then rewritten with new or changed data. This process stresses the underlying electronic elements of the flash cells involved, gradually shortening their working life with each erase/rewrite (write) cycle.
QLC vs. TLC: The trend toward fewer writes per day
According to research firm Forward Insights, fewer than 20% of SSDs sold in 2022 were spec'd at more than 1 drive write per day (DWPD). Beyond 2023, estimates are that over 85% of drives sold will be low-endurance models spec'd at 1 DWPD or less. DWPD expresses the amount of data written to a drive each day in proportion to its total capacity, and it is used to specify guaranteed drive endurance over the warranty period, commonly five years.
In other words, DWPD measures the number of times a user or system can overwrite the drive's entire capacity each day of its expected working life. For example, consider a 200 GB SSD with a five-year lifetime (expressed by its warranty). If DWPD is 1, that means users can write 200 GB onto the drive every day for five years. This works out to 365 TB of cumulative writes to the flash drive before it may need replacement.
If DWPD is less than 1, the write endurance is lower, and less data can be written each day. If DWPD is more than 1, the endurance is higher, and more data can be written each day. For example, if the same 200 GB SSD had a DWPD of 4, its cumulative write endurance would be 1,460 TB before it may need replacement.
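The DWPD arithmetic above can be reproduced with a short Python sketch. It assumes decimal units (1 TB = 1,000 GB) and 365-day years with no leap days, which is how the 365 TB and 1,460 TB figures work out; the function name is hypothetical.

```python
def cumulative_writes_tb(capacity_gb: float, dwpd: float, warranty_years: float = 5.0) -> float:
    """Total data that can be written over the warranty period, in TB.

    Assumptions: 365 days per year, 1 TB = 1,000 GB.
    """
    days = warranty_years * 365
    return capacity_gb * dwpd * days / 1000


print(cumulative_writes_tb(200, 1))  # 200 GB drive at 1 DWPD -> 365.0
print(cumulative_writes_tb(200, 4))  # same drive at 4 DWPD  -> 1460.0
```

Vendors often publish this cumulative figure directly as terabytes written, so converting between the two makes drive specs easier to compare.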
The trend toward fewer writes per day bolsters the case for QLC drives, whose main weakness is endurance. They have the lowest endurance of any flash device currently on the market because of tight tolerances on the charge levels for each bit state in a memory cell, along with tighter spacings and thinner insulating gates in leading-edge flash manufacturing processes. PLC flash has even lower endurance but is not yet in commercial storage devices.
Compare DWPD specs across device types carefully because DWPD is a relative measure that's a function of total drive capacity. As Micron Technology pointed out, a 960 GB TLC SSD rated at 1 DWPD has similar overall endurance to a 1.92 TB QLC SSD rated at 0.5 DWPD for a particular workload. Although the QLC DWPD spec is lower, the total amount of data that can be written per day is the same for the two devices. Thus, for workloads that almost exclusively read data, the QLC device is a better choice because of its increased capacity.
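To see why the two ratings are equivalent, it helps to convert DWPD into an absolute daily write budget. The following Python sketch does that for the two drives in the Micron example; the `Drive` class and its field names are illustrative only, not part of any vendor tool.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    capacity_gb: float
    dwpd: float

    def daily_write_budget_gb(self) -> float:
        """Absolute amount of data the drive is rated to absorb per day."""
        return self.capacity_gb * self.dwpd


tlc = Drive("960 GB TLC", capacity_gb=960, dwpd=1.0)
qlc = Drive("1.92 TB QLC", capacity_gb=1920, dwpd=0.5)

for drive in (tlc, qlc):
    print(f"{drive.name}: {drive.daily_write_budget_gb():.0f} GB of writes per day")
```

Both drives come out at 960 GB of writes per day, so the lower DWPD figure on the QLC drive reflects its larger capacity rather than a smaller write budget.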
The underlying message is simple: Understand the application that the flash storage is intended to support, and select the non-volatile storage technology with the performance, capacity and endurance that's appropriate for the needs of the application.
How QLC and TLC complement each other
The QLC market is focused on read-dominant workloads; it's not trying to displace TLC -- or even SLC -- devices, but rather to replace HDDs. This is Micron's justification for continuing to manufacture both types of devices. Micron has made a compelling argument that TLC and QLC are complementary, with QLC filling the gap between TLC flash and HDDs. Indeed, Micron makes the case that, because SSDs don't wear when reading data, whereas HDDs do, QLC endurance is superior to that of HDDs for workloads where the data write pattern includes a high percentage of large sequential transfers.
We can more precisely delineate the workload characteristics best suited to QLC vs. TLC drives by looking closely at the I/O patterns of various applications. According to Micron modeling, QLC drives are best for read-intensive workloads until the read/write mix reaches about 70/30 for small or random data transfers, with a 50/50 ratio for applications with large sequential writes. Conversely, TLC flash drives are better for write-dominant workloads, except for the minority of heavily transactional applications that might require an SLC drive that can handle 5 or 10 DWPD.
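As a rough illustration, those guidelines can be written as a simple decision function in Python. The 70/30 and 50/50 crossover points come from the Micron modeling cited above; the function name, the parameter names and the choice of 5 DWPD as the point where SLC comes into play are assumptions made for this sketch, and a real sizing exercise would weigh many more factors.

```python
SLC_DWPD_CUTOFF = 5.0  # assumption: the low end of the "5 or 10 DWPD" range


def suggest_flash_type(read_fraction: float,
                       large_sequential_writes: bool,
                       required_dwpd: float = 0.0) -> str:
    """Suggest a NAND type from a workload's read/write mix.

    read_fraction is the share of I/O that is reads, from 0.0 to 1.0.
    """
    if not 0.0 <= read_fraction <= 1.0:
        raise ValueError("read_fraction must be between 0.0 and 1.0")

    # QLC holds up down to a 50/50 mix when writes are large and sequential,
    # but needs about 70% reads when writes are small or random.
    qlc_min_read = 0.50 if large_sequential_writes else 0.70
    if read_fraction >= qlc_min_read:
        return "QLC"

    # Heavily transactional workloads with very high write rates.
    if required_dwpd >= SLC_DWPD_CUTOFF:
        return "SLC"

    return "TLC"


print(suggest_flash_type(0.80, large_sequential_writes=False))  # read-heavy analytics
print(suggest_flash_type(0.60, large_sequential_writes=False))  # mixed random I/O
print(suggest_flash_type(0.60, large_sequential_writes=True))   # mixed, sequential writes
```

The same 60/40 workload lands on different media depending on whether its writes are sequential, which is the main point of the Micron guidance.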
These figures are likely to shift as PLC flash devices find commercial release and adoption -- especially where PLC devices currently experience write endurance in single digits. However, the overall argument that more sensitive flash devices are better suited to read-intensive workloads should still hold true.
These concepts also reinforce the notion of matching the storage technology to the application. Fortuitously, many of the enterprise applications experiencing the fastest growth and highest uptake, such as data analytics and machine learning, have a preponderance of data reads compared to writes.
QLC vs. TLC system design
The design of QLC devices, which use larger memory blocks than TLC, suits I/O with large-block sequential transfers but not small, random I/O typical of many databases.
One way that storage systems can work around this limitation is by coupling an array of QLC SSDs with a non-volatile write buffer, such as a non-volatile DIMM (NVDIMM), that performs write coalescing. Small random writes are buffered until enough have accumulated to fill a data block, and then the system writes them through sequentially to a volume or file system as a single transfer. The write buffer can be battery- or capacitor-backed DRAM modules, non-volatile Optane persistent memory modules, or even a high-durability SLC or TLC NVMe drive.
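The coalescing idea can be sketched in Python as follows. This is a toy model under stated assumptions: an in-memory `bytearray` stands in for the non-volatile buffer, the `WriteCoalescer` class and its `sink` callback are hypothetical names, and the logical-address mapping and crash recovery that a real storage system needs are left out.

```python
from typing import Callable


class WriteCoalescer:
    """Accumulates small writes and emits them as full-size blocks."""

    def __init__(self, block_size: int, sink: Callable[[bytes], None]) -> None:
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.block_size = block_size
        self.sink = sink            # called once per large sequential write
        self.pending = bytearray()  # stand-in for the non-volatile buffer

    def write(self, data: bytes) -> None:
        """Buffer a small write; emit a block whenever one fills up."""
        self.pending.extend(data)
        while len(self.pending) >= self.block_size:
            block = bytes(self.pending[:self.block_size])
            del self.pending[:self.block_size]
            self.sink(block)

    def flush(self) -> None:
        """Force out whatever is left, e.g. at shutdown."""
        if self.pending:
            self.sink(bytes(self.pending))
            self.pending.clear()


# Usage: five 3-byte writes against an 8-byte block size.
flushed: list[bytes] = []
coalescer = WriteCoalescer(block_size=8, sink=flushed.append)
for _ in range(5):
    coalescer.write(b"abc")
coalescer.flush()
print(flushed)  # one full 8-byte block, then the 7-byte remainder
```

Five small writes reach the backing store as two transfers, which is the pattern that suits the large blocks QLC devices prefer.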
Both Microsoft and Western Digital have detailed an approach for using QLC with NVMe Zoned Namespaces. Enterprise storage systems using QLC are available, with Pure Storage supporting the FlashArray//C since 2019.
With a sizable advantage in capacity and lower cost per bit, expect QLC devices to continue proliferating in enterprise data centers alongside TLC and SLC devices. Storage systems will increasingly rely on management software to intelligently manage workload placement according to the I/O characteristics and requirements of each application.
Best uses for QLC SSDs and TLC SSDs
Ultimately, QLC NVM devices, such as SSDs, are best used for the most read-intensive applications demanding the highest and most cost-efficient storage capacity. These include the following:
- Data analytics using data lakes or distributed big data applications, such as Hadoop.
- AI applications using machine or deep learning.
- NoSQL databases.
- Large object stores using Ceph, Gluster, Lustre and others.
- Streaming media and content delivery networks.
TLC and earlier flash SSDs, such as SLC, along with other NVM-based storage devices, are often best suited for applications with a larger proportion of writes to reads. These devices service more traditional enterprise applications as HDD replacements. Use examples include the following:
- General databases.
- HR and finance applications.
- CRM and ERP applications.
- Software development tasks.
TLC and QLC devices aren't mutually exclusive, and drives based on each technology can be deployed in the same enterprise -- even within the same servers -- to best support the unique needs of varied enterprise applications. In addition, NVM technologies constantly evolve, with better fabrication techniques and with practices such as error correction and wear leveling that enhance reliability and durability. PLC flash and flash-based devices are on the horizon. So, this shouldn't be the final word on NVM uses.