Problems with using SSDs for write caching and how to avoid them

Marc Staimer looks at using SSDs for write caching and explains why most storage SSD caching is write-through and not write-back.

Ever wonder why most storage SSD caching is write-through and not write-back? The two are similar, except that write-back cache accelerates both reads and writes, whereas write-through cache accelerates only reads. So why aren't the majority of SSD caches write-back? That's the $64,000 question.

Write-through cache accelerates reads only. It does nothing for writes, which are placed directly on the HDDs, hence the term "write-through." Data is moved into the SSD cache from the HDDs based on policies such as age of the data, frequency of access, time since last access, access requests from different sources, and more. Clever algorithms help determine which data is hot and should or should not be in the SSD cache. Data can also be pinned in the cache (placed there at the administrator's discretion and never moved out) until an administrator deliberately deletes it from the cache.
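The behavior described above can be illustrated with a minimal sketch. It is not taken from any particular product: the class name WriteThroughCache, the dictionary standing in for the HDDs, and the simple least-recently-used eviction rule are all assumptions made for illustration. Real caches weigh several policies at once.

```python
from collections import OrderedDict


class WriteThroughCache:
    """Toy write-through cache (hypothetical): reads are cached, writes go to the HDDs."""

    def __init__(self, backing, capacity):
        self.backing = backing        # dict standing in for the HDD tier
        self.capacity = capacity      # maximum number of blocks held in the SSD cache
        self.cache = OrderedDict()    # block -> data, least recently used first
        self.pinned = set()           # blocks an administrator has pinned

    def read(self, block):
        if block in self.cache:
            self.cache.move_to_end(block)   # mark as most recently used
            return self.cache[block], "ssd"
        data = self.backing[block]          # miss: fetch from the HDDs
        self._admit(block, data)
        return data, "hdd"

    def write(self, block, data):
        self.backing[block] = data          # the write completes on the HDDs
        if block in self.cache:
            self.cache[block] = data        # keep any cached copy consistent

    def pin(self, block):
        self.pinned.add(block)
        if block not in self.cache:
            self._admit(block, self.backing[block])

    def unpin(self, block):
        self.pinned.discard(block)

    def _admit(self, block, data):
        self.cache[block] = data
        self.cache.move_to_end(block)
        # Evict least recently used unpinned blocks until the cache fits again.
        while len(self.cache) > self.capacity:
            victim = next((b for b in self.cache if b not in self.pinned), None)
            if victim is None:
                break                       # everything is pinned; nothing to evict
            del self.cache[victim]
```

In this sketch a write never gets faster, because it always waits on the HDDs. Only a repeated read of the same block is served from the SSD.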

Write-back cache accelerates both writes and reads. Writes go directly into the SSD cache and are acknowledged as soon as the write to the SSD completes. The cached data is then copied to the HDDs. It stays in the cache according to policies similar to those of write-through cache (age, frequency of access, time since last access, etc.). When policy thresholds are met, the data is removed from the cache.
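A rough sketch of the write-back flow, for comparison. The class name WriteBackCache, the "dirty" set, and the explicit flush() call are assumptions for illustration. A real cache copies data to the HDDs in the background rather than on demand.

```python
class WriteBackCache:
    """Toy write-back cache (hypothetical): writes are acknowledged from the SSD."""

    def __init__(self, backing):
        self.backing = backing   # dict standing in for the HDD tier
        self.cache = {}          # block -> data held on the SSD
        self.dirty = set()       # blocks not yet copied to the HDDs

    def write(self, block, data):
        self.cache[block] = data
        self.dirty.add(block)
        return "ack"             # acknowledged once the SSD holds the data

    def read(self, block):
        if block in self.cache:
            return self.cache[block]
        data = self.backing[block]
        self.cache[block] = data
        return data

    def flush(self):
        """Copy every dirty block to the HDDs and return how many were copied."""
        flushed = 0
        for block in list(self.dirty):
            self.backing[block] = self.cache[block]
            flushed += 1
        self.dirty.clear()
        return flushed

    def evict(self, block):
        """Remove a block from the cache; dirty blocks must be flushed first."""
        if block in self.dirty:
            raise ValueError("block must be flushed before eviction")
        self.cache.pop(block, None)
```

Note that every write lands on the SSD in this model, which is exactly what drives the wear concern discussed later in the article.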

SSD write-back cache looks like it should be a no-brainer, yet it is more the exception than the rule. Understanding why requires a bit of background.

HDDs do very well at sequential data writes. Their sequential write performance matches that of SSDs and in some cases is a bit better. SSDs have a huge performance advantage in random write IOPS (essential for virtual servers and virtual desktops). This means an SSD write cache speeds up only some writes, not all of them, which reduces its value for write caching.

SSDs today are based on NAND flash chips, a technology with a limited life span. Bits are written to cells. (SLC stores one bit per cell; MLC and eMLC store two bits per cell.) Before a cell can be overwritten it must be erased, and each erase wears down the cell's insulating layer a little more. Each cell therefore tolerates only a limited number of erase/write cycles (a.k.a. write cycles). SLC has the most with approximately 100,000, followed by eMLC with approximately 40,000, and MLC with fewer than 10,000. Write-back caching greatly increases the number of writes to the SSDs, accelerating wear and shortening the life of a relatively expensive asset.
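To see how cycle counts translate into service life, here is a back-of-the-envelope estimate. The function name, the 200 GB capacity, and the 2,000 GB per day write load are hypothetical figures chosen for illustration. The calculation also assumes perfectly even wear leveling, which real drives only approximate. The cycle counts are the approximate values quoted above, with 10,000 used as an upper bound for MLC.

```python
def years_until_worn_out(capacity_gb, write_cycles, host_writes_gb_per_day,
                         write_amplification=1.0):
    """Estimate SSD life in years, assuming writes are spread evenly over all cells."""
    total_writable_gb = capacity_gb * write_cycles
    nand_writes_gb_per_day = host_writes_gb_per_day * write_amplification
    return total_writable_gb / nand_writes_gb_per_day / 365


# Approximate cycle counts from the article (MLC figure is an upper bound).
CYCLES = {"SLC": 100_000, "eMLC": 40_000, "MLC": 10_000}

if __name__ == "__main__":
    for cell_type, cycles in CYCLES.items():
        # Hypothetical 200 GB cache device absorbing 2,000 GB of writes per day.
        years = years_until_worn_out(200, cycles, 2_000)
        print(f"{cell_type}: about {years:.1f} years")
```

Whatever the exact inputs, the estimate scales inversely with the write load: doubling the daily writes, as a busy write-back cache might, halves the expected life.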

Working around the SSD write caching problems

One way to get the benefits of write-back caching without wearing out the SSDs too soon is to use DRAM and/or NVRAM. DRAM and NVRAM are 20 times faster than the fastest SSD, meaning 20 times faster write response times. This approach takes sophisticated software, which is available in some server-based caching products and a few hybrid storage systems. The DRAM/NVRAM is the target for the write-back caching: it takes ownership of the writes and acknowledges them. The data is then immediately drained (moved) to the HDDs. The SSDs are used for write-through caching, where policy-based hot data is pulled into the SSDs from the HDDs. This tag-team approach provides the best of both worlds.
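The tag-team arrangement can be sketched as follows. The class name TieredStore, the counter of SSD writes, and the choice to invalidate a cached block when it is overwritten are all assumptions for illustration. Products that implement this design may handle stale cache entries differently.

```python
class TieredStore:
    """Toy model (hypothetical): NVRAM absorbs writes, SSD caches reads, HDD holds all data."""

    def __init__(self, hdd):
        self.hdd = hdd        # dict standing in for the HDD tier
        self.nvram = {}       # write-back buffer
        self.ssd = {}         # write-through read cache
        self.ssd_writes = 0   # counts blocks written to the SSD

    def write(self, block, data):
        self.nvram[block] = data     # NVRAM takes ownership of the write
        self.ssd.pop(block, None)    # assumption: drop any stale cached copy
        return "ack"

    def drain(self):
        """Move everything buffered in NVRAM down to the HDDs."""
        self.hdd.update(self.nvram)
        self.nvram.clear()

    def read(self, block):
        if block in self.nvram:
            return self.nvram[block], "nvram"
        if block in self.ssd:
            return self.ssd[block], "ssd"
        data = self.hdd[block]
        self.ssd[block] = data       # hot data is pulled into the SSD from the HDDs
        self.ssd_writes += 1
        return data, "hdd"
```

In this model the SSD is written only when a read miss pulls a block into the cache, never on the write path, which is how the design spares the flash from write-back wear.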

The problem with this workaround is DRAM's volatility. In other words, how is the data in the DRAM protected if there is an unexpected power outage before the data has drained to the HDDs? That problem can be solved by using NVRAM with a super capacitor that provides time for the NVRAM to drain to either the HDDs or SSDs. It can also be solved with battery backup to the DRAM that provides time for the DRAM to drain to the HDDs or SSDs.
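How much time the super capacitor or battery has to provide depends on how much undrained data sits in memory and how fast it can be moved. A simple way to reason about it is sketched below. The function names, the 50% safety margin, and the sample figures are hypothetical, and real hold-up requirements depend on the hardware involved.

```python
def drain_seconds(dirty_mb, drain_mb_per_s):
    """Seconds needed to move the undrained data to persistent storage."""
    if drain_mb_per_s <= 0:
        raise ValueError("drain rate must be positive")
    return dirty_mb / drain_mb_per_s


def max_safe_dirty_mb(holdup_s, drain_mb_per_s, safety_margin=0.5):
    """Largest amount of undrained data that fits in the hold-up time.

    safety_margin is the fraction of the hold-up time actually budgeted
    for draining (an assumed value, not a vendor figure).
    """
    return holdup_s * drain_mb_per_s * safety_margin


if __name__ == "__main__":
    # Hypothetical example: 4,096 MB of dirty data drained at 512 MB/s.
    print(drain_seconds(4096, 512), "seconds needed")
    # Hypothetical example: 10 seconds of hold-up time at the same drain rate.
    print(max_safe_dirty_mb(10, 512), "MB can safely stay undrained")
```

Capping the amount of undrained data at a limit like this keeps the drain time inside what the backup power can cover.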

SSDs have made caching more cost-effective. Write-back caching and write-through caching are both excellent ways to increase user and application read performance. Write-back caching can also accelerate write performance, but at the expense of a greatly reduced SSD life. Integrating DRAM and/or NVRAM can overcome this problem and improve write performance even further.

About the author:
Marc Staimer is the founder, senior analyst and CDS of Dragon Slayer Consulting in Beaverton, Ore.
