
How to prevent swap usage from hurting your NVMe devices

Overuse of swap files can cut into the life span of your organization's NVMe-based drives. Find out how to monitor and limit swapping on your Linux systems.

NVMe is the next generation of solid-state storage, improving the performance of SSDs and other storage devices on the network. As more servers are being set up with NVMe, it's time to look at using swap to get the most out of these systems.

Before exploring swap usage on NVMe disks, let's make sure we understand what swap is and why its use was originally avoided on SSDs. Then, we'll explore what that means for using swap on top of NVMe.

The need for swap

Swap is like fire insurance. It's the thing you never want to use but that must always be available. On systems that behave the way you want them to, there should be enough memory to load programs and cache data. If your system has enough RAM, you won't have problems with swap.

Swap comes into play when a system doesn't have enough RAM. A dedicated area on disk, the swap space, then acts as additional RAM.

There are different reasons why a Linux system might need swap space. Maybe it's short on RAM. Or it may be because some server-grade applications, such as Oracle Database or SAP, require a certain amount of swap space. Apart from that, there are exceptional cases where, for instance, your application server has a memory leak. In that case, you risk running out of RAM, which leads to applications that stop working and to unhappy customers.

Swap and SSD

Swap usage on traditional hard disk devices with rotating platters has never been a problem. The situation is different, though, if you're using SSDs, which store data in flash memory cells that have a limited life span. Every write to a flash cell wears it, and at some point, the cell will stop working.

Figure: Enterprise NVMe SSD unit shipments

The expected life of an SSD device is expressed as its terabytes written (TBW) value. This value indicates how many terabytes are expected to be written before the SSD fails. The TBW value for an SSD device is typically listed in the specs as the endurance parameter, and it depends on the quality of the disk. Low-end consumer disks may start failing at as low as 20 TBW, while enterprise server-grade SSDs are typically rated for more than 1,000 TBW. Limiting the use of swap keeps the number of writes down on an SSD.
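To put those TBW figures in perspective, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration, not a vendor formula: the function name and the 50 GB-per-day write volume are assumptions made up for the example, and the calculation assumes a constant write rate and ignores write amplification inside the drive.

```python
def estimated_lifespan_years(tbw_rating_tb: float, daily_writes_gb: float) -> float:
    """Years until the rated TBW is reached at a constant daily write volume.

    Hypothetical helper for illustration; real drives also suffer from
    write amplification, which this sketch ignores.
    """
    if daily_writes_gb <= 0:
        raise ValueError("daily_writes_gb must be positive")
    daily_writes_tb = daily_writes_gb / 1000  # decimal units, as in drive specs
    days = tbw_rating_tb / daily_writes_tb
    return days / 365


# Assumed workload: 50 GB written per day, swap traffic included.
for rating in (20, 1000):  # the low-end and enterprise figures from the text
    years = estimated_lifespan_years(rating, daily_writes_gb=50)
    print(f"{rating} TBW at 50 GB/day: {years:.1f} years")
```

Under that assumed workload, the 20 TBW disk reaches its rating in a little over a year, while the 1,000 TBW disk would take more than five decades. Heavy swap traffic matters far more on the former than on the latter.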

Swap usage on NVMe

The main difference between conventional SSDs and NVMe devices is the way they connect to the system bus. At a physical level, both device types use flash cells to store data, which means there's no fundamental difference regarding TBW between the two. But what does that mean for how swap is used on NVMe devices? It means that the rules that apply to non-NVMe SSDs also apply to NVMe SSDs.

Let's be clear: If you're short on memory and swap is actively being used all the time, you would be better off putting it on an HDD. But that's not the case for a typical Linux system. Most servers and workstations that use Linux do fine with the physical RAM that's installed.


However, if the Linux kernel on such systems starts swapping out memory pages to disk, it's important to look at whether active or inactive memory pages are being swapped. An inactive memory page is one that has been allocated but has not been accessed recently. If your system is only swapping out inactive memory pages, there's nothing to worry about and no reason to avoid swap usage on either SSD or NVMe drives. That's because inactive memory pages typically stay where they are after being swapped out, so far less data is written to swap than when active pages are repeatedly moved in and out.

The best way to find out if your system is using swap actively or not is by running the vmstat command, which provides a systemwide view of performance, including processes, swap usage, memory, paging and CPU activity.

For instance, the vmstat 2 100 command prints 100 reports at 2-second intervals, showing how actively swap is being used on your system. Look at the si and so columns for the swap-in and swap-out rates. The first line of vmstat output reports averages since the last reboot and can be ignored. Monitor what's happening in these two columns as the command produces the remaining output. If you don't see any significant activity, then there's nothing to worry about. If you do see significant activity, your SSD device is slowly wearing out, and you'd be better off looking into adding more RAM to your system.
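Watching the columns by eye works, but the same check can be scripted. The following Python sketch parses captured vmstat output, skips the first report and pulls out the swap-in and swap-out values. The function names, the threshold parameter and the sample text are all made up for illustration; the sample is not output from a real system.

```python
def swap_samples(vmstat_text: str, skip_first: bool = True) -> list[tuple[int, int]]:
    """Return (si, so) pairs from vmstat output (hypothetical helper)."""
    si_idx = so_idx = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # The column header names the fields; vmstat repeats it periodically.
        if "si" in fields and "so" in fields:
            si_idx, so_idx = fields.index("si"), fields.index("so")
            continue
        # Data rows consist of numbers only; skip everything else.
        if si_idx is None or not fields or not all(f.isdigit() for f in fields):
            continue
        if len(fields) <= max(si_idx, so_idx):
            continue
        samples.append((int(fields[si_idx]), int(fields[so_idx])))
    # The first report holds averages since boot, so drop it by default.
    return samples[1:] if skip_first else samples


def busy_fraction(samples: list[tuple[int, int]], threshold: int = 0) -> float:
    """Fraction of samples whose si or so value exceeds the threshold."""
    if not samples:
        return 0.0
    busy = sum(1 for si, so in samples if si > threshold or so > threshold)
    return busy / len(samples)


# Made-up sample in the default vmstat layout, for illustration only.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0  10240 812345  20480 512000    3    7    40    55  210  430  2  1 97  0  0
 0  0  10240 812100  20480 512040    0    0     0    12  180  390  1  0 99  0  0
 2  1  18432 401200  20480 498000    0 4096     0  4200  650  900  5  3 80 12  0
"""

samples = swap_samples(SAMPLE)
print(samples, busy_fraction(samples))
```

On a live system, the text could come from something like subprocess.run(["vmstat", "2", "100"], capture_output=True, text=True).stdout. What counts as significant activity is a judgment call, so the threshold is left as a parameter rather than fixed.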
