Conventional SSDs handle much of their own internal management, but they still present complexities that users must work around.
A write into flash memory is anything but straightforward. Data can only be written to an area that has already been erased -- there's no overwriting old data without first erasing the block it resides in. Worse, flash wears out after a certain number of erase/write cycles, a limit that depends on the type of flash. For example, quad-level cell flash can start to lose bits after only a couple of hundred erase/write cycles.
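These two constraints -- erase-before-write and a finite erase budget -- can be sketched in a few lines. The page count and cycle limit below are hypothetical, chosen only to mirror the couple-hundred-cycle figure above; real geometries and endurance ratings vary by device.

```python
# Minimal model of NAND flash's erase-before-write rule and wear-out.
# Page count and cycle limit are illustrative, not real device parameters.

class NandBlock:
    """A flash block: pages can be programmed only after the whole block is erased."""

    def __init__(self, pages_per_block=4, max_pe_cycles=300):
        self.pages = [None] * pages_per_block   # None = erased, ready to program
        self.pe_cycles = 0                      # erase count drives wear-out
        self.max_pe_cycles = max_pe_cycles      # e.g. a few hundred for QLC

    def program(self, page_index, data):
        if self.pages[page_index] is not None:
            # No in-place overwrite: the containing block must be erased first.
            raise RuntimeError("page already programmed; erase the block first")
        self.pages[page_index] = data

    def erase(self):
        if self.pe_cycles >= self.max_pe_cycles:
            raise RuntimeError("block worn out")    # endurance limit reached
        self.pages = [None] * len(self.pages)
        self.pe_cycles += 1

block = NandBlock()
block.program(0, b"hello")
try:
    block.program(0, b"world")      # in-place overwrite fails
except RuntimeError as err:
    print(err)
block.erase()                       # erase the whole block...
block.program(0, b"world")          # ...and only then does the rewrite succeed
```

The controller's job, described next, is to hide exactly this dance from the host.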
The SSD controller can cleverly hide the effects of both the erase-before-write requirement and the wear-out mechanism, but the techniques that hide them create a data-juggling nightmare for the SSD controller's designer. They can also lead to unexpected timing issues for SSD users who are unaware of flash's need to erase before writing and its tendency to wear out. Proper SSD management, then, becomes paramount.
4 best practices to manage conventional SSDs
Some straightforward methods help users deal with the intricacies of conventional SSD management.
1. Keep your SSDs hungry for data
Latency issues arise when SSDs get full.
A new SSD and one that has undergone a heavy write workload for a couple of years may appear on the console to have the same amount of free space. Important differences, however, may lurk behind that number and require deeper investigation through the drive's self-monitoring, analysis and reporting technology (SMART) attributes.
How does a system administrator figure out what's happening within the SSD? How is it wearing? How much overprovisioning is left? These questions and others are answered by the SSD's SMART attributes.
Although no standard dictates how SMART attributes are organized within an SSD, most SSDs offer their own mix that includes answers to the questions about wear and overprovisioning, and the system administrator can monitor them. The lack of standards, though, makes it unattractive to mix many different vendors' SSDs in the same system.
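A sketch of what that monitoring can look like: the snippet below parses a table in the style of `smartctl -A` output. The two attributes shown, Media_Wearout_Indicator and Available_Reservd_Space, appear on some drives, but -- as noted above -- names, IDs, and meanings vary by vendor, so treat this sample as an assumption about one drive family, not a universal layout.

```python
# Sketch: pull wear and overprovisioning indicators out of smartctl-style
# output. The sample table and attribute names are examples; vendors differ.

SAMPLE_SMARTCTL_OUTPUT = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      RAW_VALUE
  5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   0
232 Available_Reservd_Space 0x0033   099   099   010    Pre-fail  0
233 Media_Wearout_Indicator 0x0032   097   097   000    Old_age   0
"""

def parse_smart_attributes(text):
    """Return {attribute_name: normalized_value} from smartctl -A style text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 5 and fields[0].isdigit():
            attrs[fields[1]] = int(fields[3])   # normalized VALUE column
    return attrs

attrs = parse_smart_attributes(SAMPLE_SMARTCTL_OUTPUT)
print(attrs["Media_Wearout_Indicator"])   # 97 -> roughly 3% of rated life used
print(attrs["Available_Reservd_Space"])   # 99 -> spare area nearly untouched
```

Normalized SMART values typically count down from 100 (or 200) toward a failure threshold, which is why a lower number here means more wear.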
2. Don't put your SSD into a critical timing path
It's not difficult for the SSD's controller to find an erased page and remap a write to that page -- rather than the one requested by the application -- but the situation gets increasingly tricky when the SSD doesn't have much spare room. At some point, there aren't enough erased pages available to accept a write command. A garbage collection routine must then run through the flash, copying the remaining valid pages out of mostly invalid blocks into a single fresh block so that the mostly invalid blocks can be erased.

Garbage collection is slow, and the SSD can stall until the routine completes. In extreme cases, these delays may last several seconds. As a result, time-sensitive routines should not be configured in a way that allows an SSD delay to interfere with their performance.
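The copy-and-erase step can be sketched as follows. Block and page counts are illustrative; the point is that every valid page in a donor block must be relocated before the block can be reclaimed, which is why the routine takes time.

```python
# Sketch of garbage collection: copy the valid pages out of mostly invalid
# blocks into one fresh block, then erase the donors. Sizes are illustrative.

def garbage_collect(blocks, free_block):
    """blocks: list of page lists, where None marks an invalidated page.
    Valid pages are packed into free_block; donor blocks are erased ([])."""
    moved = 0
    for block in blocks:
        for page in block:
            if page is not None:        # still-valid data must be preserved
                free_block.append(page)
                moved += 1
        block.clear()                   # whole-block erase frees the space
    return moved

# Two mostly invalid blocks: only one valid page remains in each.
blocks = [["A", None, None, None], [None, None, "B", None]]
free_block = []
moved = garbage_collect(blocks, free_block)
print(moved)        # 2 valid pages copied
print(free_block)   # ['A', 'B']
print(blocks)       # [[], []] -- both donor blocks now erased and reusable
```

Note that two blocks' worth of space came back for the price of copying only two pages -- the payoff is best when donor blocks are mostly invalid, and worst when the drive is full and every block still holds plenty of valid data.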
3. Use the trim command
Trim is a way to give the SSD permission to erase a block. Without it, the SSD must second-guess the host, and pages that could safely be erased for the benefit of system timing may instead be left alone, prematurely slowing the SSD's performance.
SSDs must keep track of overwritten pages or look for other clues to figure out which pages are no longer valid. The SSD has no natural mechanism to know what the server is thinking, so it essentially guesses. Data loss is unacceptable, so the guessing algorithms err on the side of caution, and many pages that contain invalidated data may be overlooked and remain unavailable.
To overcome this problem, the ATA command set added a command called trim, through which applications and the OS can tell the SSD which data is no longer needed, making internal SSD management -- particularly garbage collection -- more efficient. Garbage collection no longer needs to run as often as it did before trim existed. Older software that doesn't use the trim command puts an unnecessary burden on an SSD and causes slowdowns.
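The difference trim makes can be shown with a toy flash translation layer. This is a deliberately simplified model, not how any real controller is implemented: without trim, the drive only learns a page is dead when its logical address is overwritten; with trim, the host declares it dead immediately.

```python
# Toy model of why trim helps: the SSD only learns a page is dead when its
# LBA is overwritten. Trim lets the host declare pages invalid right away.

class TinyFtl:
    """Toy flash translation layer: maps logical block addresses to pages."""

    def __init__(self):
        self.mapping = {}                 # LBA -> page data

    def write(self, lba, data):
        self.mapping[lba] = data          # overwriting an LBA invalidates the old page

    def trim(self, lbas):
        for lba in lbas:
            self.mapping.pop(lba, None)   # host says: this data is gone, reclaim it

    def valid_pages(self):
        return len(self.mapping)

ftl = TinyFtl()
for lba in range(4):
    ftl.write(lba, f"data{lba}")
# The host deletes a file occupying LBAs 1-3. Without trim, the SSD still
# treats those pages as valid and must copy them during garbage collection.
print(ftl.valid_pages())                  # 4
ftl.trim([1, 2, 3])                       # with trim, they can simply be erased
print(ftl.valid_pages())                  # 1
```

In the no-trim case, those three dead pages would sit in the mapping until each address happened to be rewritten, inflating every garbage collection pass in the meantime.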
4. Check the SSD's SMART attributes periodically and consider overprovisioning
A good cadence is to check every time you do a backup. The SMART attributes tell you how the SSD is wearing and how much overprovisioning is still available.
One strategy to speed up SSD performance is to maintain a hidden reserve of flash within the SSD. This overprovisioning provides more room for the controller to perform SSD management. In other words, the controller has less difficulty finding erased pages to write into. The more overprovisioned NAND in an SSD, the faster that SSD performs under heavy workloads.
The trouble with overprovisioning is that it increases the SSD's cost. On average, about 80% of the cost of an SSD is its NAND flash chips. If you increase overprovisioning by 10% of the SSD's stated capacity, you add roughly 10% more NAND, raising the SSD's price by about 8%. Overprovisioning works for users who can afford it.
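The arithmetic behind that price figure is simple enough to compute directly from the two numbers above:

```python
# Rough cost arithmetic from the figures above: NAND is ~80% of an SSD's
# cost, so overprovisioning equal to 10% of stated capacity adds ~10% more
# NAND, raising the total price by roughly 0.80 * 0.10 = 8%.

nand_share = 0.80    # fraction of SSD cost that is NAND flash
extra_op = 0.10      # extra NAND, as a fraction of stated capacity

price_increase = nand_share * extra_op
print(f"{price_increase:.0%}")   # 8%
```

The same formula scales to other overprovisioning levels: 28% extra NAND, a common enterprise figure, would add roughly 22% to the price.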
Overprovisioning also hides from the user how badly the SSD might be worn out by heavy write loads. If you don't know what your write load is and if you don't check the SSD's internal SMART attributes, then a sudden failure could catch you by surprise.
Jim Handy is a semiconductor and SSD analyst at Objective Analysis in Los Gatos, Calif.