SPI NAND Host-Side Error Correction
00:00 Salman Rashid: Today, I will discuss trends in the SLC NAND market, especially the evolution of serial NAND. Macronix is a leader in the code-storage flash memory market, which includes NOR and NAND products, and serial NAND is an emerging product that is gaining acceptance in the embedded market.
I will also discuss how the serial interface, also known as SPI, has been widely used in NOR flash memory for many years and why it is becoming popular in NAND products as well. With serial NAND, system designers can switch from NOR to NAND using the same hardware rather than completely redesigning their system for parallel NAND. As indicated by this cover slide, Macronix believes that applications using serial NAND will eventually require the host, not the flash memory, to manage the ECC, and this will be the main focus of my presentation.
01:06 SR: Okay, now that I've explained the benefits of a serial interface, let's discuss what is driving designs from NOR to NAND and the emerging serial NAND trends. Three factors are driving these trends. Number one, cost. Due primarily to its string architecture, a NAND array is almost 60% smaller than a NOR array at the same process node, and NAND flash can also shrink to lower geometries more readily than NOR. This means that NAND can support higher densities at a lower cost.
01:45 SR: Number two, performance. In today's designs, non-volatile memories are typically used to store code, and during system boot-up the code is downloaded to RAM for execution. This is known as a store-and-download application. Although NAND read performance is slower than NOR, it is sufficient for these store-and-download applications. However, there is a performance penalty when using a serial NAND with on-chip ECC compared with a serial NAND using host-based ECC.
Number three, ease of use. As discussed earlier, since serial NAND devices are available in the same footprint as serial NOR, no PCB re-layout is needed. Later in this presentation, I will also discuss the challenges of using serial NAND with on-chip ECC versus the flexibility of serial NAND with host-based ECC. In the next few slides, I'll explain each of these factors in more detail.
02:55 SR: Now, let's look at serial NAND design and cost a bit more closely. Serial NAND devices are based on a standard parallel NAND die with additional control logic to emulate the serial interface and handle ECC. This is typically achieved in one of two ways. Number one, by adding the control logic to the NAND die itself, that is, a monolithic die, which increases the die size and the cost. Or number two, by stacking a controller chip on the NAND die, making it a multi-chip solution. Obviously, the multi-chip package adds cost, so it is generally more expensive than a monolithic solution.
However, to get the most cost-effective serial NAND solution, there is a third option. We can offload the ECC function from the NAND, just as is done with parallel NAND, and shift it to the host IC. A typical 8-bit ECC engine can be implemented with roughly 50K additional gates, and for an entry-level host, for example an MCU or SoC, this is a trivial number of gates. Adding the ECC engine to the host may raise the host cost slightly, but that increase will be much smaller than the die-size impact on a NAND device with on-chip ECC.
04:29 SR: So far, I've focused primarily on the cost benefits of serial NAND; now let's consider the performance aspects. I already mentioned that many designs are based on a store-and-download architecture. This means the system cannot execute code directly from the flash, also known as execute-in-place or XIP, so it reads the code from the NAND into DRAM and executes it from there.
The graph shown illustrates the read performance of different non-volatile memories. Assuming all the devices are running at a similar clock speed, NOR flash has the fastest read throughput, about 66 megabytes per second. The two bars on the right illustrate the read performance of NAND with on-chip ECC, that is, where the NAND device calculates the ECC before the data is read out. This NAND architecture is the slowest, achieving roughly 27 megabytes per second, about 60% lower than NOR. However, if the ECC calculation is performed by the host, the read speed can be increased to 56 megabytes per second, which is close to NOR flash read performance.
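To make these percentages concrete, the short Python sketch below recomputes each device's throughput shortfall relative to NOR from the three figures quoted for the graph. The dictionary keys and the `percent_lower` helper are illustrative names only, not part of any Macronix tool.

```python
# Read throughput quoted for the graph, in MB/s, assuming similar clock speeds.
THROUGHPUT_MBPS = {
    "serial NOR": 66.0,
    "serial NAND, on-chip ECC": 27.0,
    "serial NAND, host ECC": 56.0,
}


def percent_lower(value: float, reference: float) -> float:
    """How far `value` falls below `reference`, as a percentage of `reference`."""
    return (reference - value) / reference * 100.0


if __name__ == "__main__":
    nor = THROUGHPUT_MBPS["serial NOR"]
    for name, mbps in THROUGHPUT_MBPS.items():
        print(f"{name:<26} {mbps:5.1f} MB/s  {percent_lower(mbps, nor):5.1f}% below NOR")
```

On these figures, on-chip ECC gives up about 59% of NOR's throughput, while host-based ECC gives up about 15%.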
05:50 SR: Now, I'm sure you're asking yourself, "Well, what about the ECC calculation latency from the host side?" In the next few slides, I will cover this and show the actual time it takes to read the data.
As discussed on the previous slide, there is a performance difference depending on how the ECC is calculated. On this slide, I will show the NAND read latency when no error is detected and compare the performance of a NAND IC with on-chip ECC versus a NAND where the ECC function is handled by the host. The latency is calculated by adding the time to load the data into the buffer, perform the ECC, and read out the data.
06:41 SR: The diagram on the top represents the time to read data when the ECC is handled by the host. In this case, it takes 25 microseconds to load 2K bytes of data from the NAND array into the NAND cache, and roughly 10 microseconds to read the first 512 bytes from the NAND cache to the host and have the host check the ECC. Therefore, the total time for the host to read the first 512 bytes and check the ECC when there are no errors is 35 microseconds.
On the other hand, the diagram on the bottom shows the time to read 512 bytes of data from a NAND device with on-chip ECC. The NAND performs the read operation on a full 2K-byte page, so it must load the full page into the buffer and perform ECC on the entire page before reading the data out to the host. In this case, the latency with no errors detected is roughly 45 microseconds, almost 30% slower than the host-based ECC example. Please note that in these examples, we're comparing NAND devices with a 2K-byte page size.
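As a rough check on the no-error comparison, this sketch adds up the host-side stages using the times quoted for the two diagrams and compares the result with the quoted 45-microsecond on-chip figure. It treats the on-chip latency as a single quoted total, because the talk does not break it down further; the constant and function names are hypothetical.

```python
# Latency figures quoted in the talk (microseconds), 2K-byte-page serial NAND.
T_LOAD_PAGE_US = 25.0        # NAND array -> NAND cache, full 2K-byte page
T_XFER_CHECK_512_US = 10.0   # first 512 bytes cache -> host, plus host ECC check
T_ONCHIP_NO_ERROR_US = 45.0  # quoted total for the on-chip ECC device


def host_ecc_first_sector_us(load_us: float, xfer_check_us: float) -> float:
    """Time until the host holds the first 512 bytes, ECC-checked, with no errors."""
    return load_us + xfer_check_us


def percent_slower(slow_us: float, fast_us: float) -> float:
    """How much longer `slow_us` takes than `fast_us`, as a percentage of `fast_us`."""
    return (slow_us - fast_us) / fast_us * 100.0


if __name__ == "__main__":
    host = host_ecc_first_sector_us(T_LOAD_PAGE_US, T_XFER_CHECK_512_US)
    print(f"host ECC: {host:.0f} us, on-chip ECC: {T_ONCHIP_NO_ERROR_US:.0f} us")
    print(f"on-chip ECC is {percent_slower(T_ONCHIP_NO_ERROR_US, host):.1f}% slower")
```

With 35 versus 45 microseconds, the on-chip path comes out about 28.6% slower, which matches the "almost 30%" figure.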
08:06 SR: Now, let's compare NAND performance when errors are detected. The diagram on the top shows the NAND latency with host-based ECC. The latency is calculated by adding three sets of timing: the time to load 2K bytes of data from the NAND array into the NAND cache, which is 25 microseconds; the time to read the first 512 bytes from the NAND cache to the host and check ECC, which is 10 microseconds; and the time it takes to correct the errors, another 10 microseconds. Adding these three operations, the first data is ready to read in 45 microseconds.
The diagram on the bottom illustrates the latency for the NAND with on-chip ECC. As explained on the previous slide, serial NAND reads are performed on a full-page basis, so loading the 2K bytes of data, calculating ECC, and correcting any errors takes 70 microseconds before data can be read out. That is about 56% slower.
The key reason for such a large difference between the two ECC methods is the amount of data the ECC engine must process. The on-chip ECC engine must process the full page, while the host can process a quarter page, 512 bytes, at a time. In addition, the NAND's slower clock speed relative to the host may also contribute to the performance difference.
09:48 SR: During NAND read or program operations, bit disturbs can occur. One additional benefit of host-based ECC is the ability to use stronger ECC and prolong the life of the NAND device. When using NAND with on-chip ECC, the device operates with a fixed level of ECC, for example, 8-bit error correction. However, if error correction is performed by the host, the host may be able to support multiple levels of ECC. By using stronger ECC, for example 12 bits versus 8 bits, the NAND device can tolerate a higher number of bit disturbs before the data needs to be refreshed or rewritten. This means the NAND will perform fewer program/erase cycles and ultimately have a longer life.
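The quarter-page advantage can be sketched as a host read loop that hands each 512-byte sector to the caller as soon as it has been transferred, checked, and corrected if needed, instead of waiting for the whole page. This is a minimal timing model, assuming strictly sequential transfer and correction with no overlap and using the 25, 10, and 10 microsecond figures quoted for the host-based diagram; `stream_sectors`, the `Decoder` hook, and `demo_decoder` are hypothetical stand-ins, not a real driver API.

```python
from typing import Callable, Iterator, Tuple

SECTOR_BYTES = 512              # host ECC works on quarter-page chunks of a 2K page

# Timing figures quoted in the talk, in microseconds.
T_LOAD_PAGE_US = 25.0           # NAND array -> NAND cache (full page)
T_XFER_CHECK_US = 10.0          # one 512-byte sector to the host, plus ECC check
T_CORRECT_US = 10.0             # host-side correction when errors are found
T_ONCHIP_WITH_ERRORS_US = 70.0  # quoted figure for the on-chip ECC device

# Hypothetical host ECC hook: returns (corrected sector, number of bits fixed).
Decoder = Callable[[bytes], Tuple[bytes, int]]


def stream_sectors(page: bytes, decode: Decoder) -> Iterator[Tuple[int, bytes, float]]:
    """Yield (sector index, corrected data, ready time in us), one sector at a time.

    Sequential model: load the page once, then transfer and check each sector
    in turn, adding a correction step whenever the decoder fixed any bits.
    """
    t = T_LOAD_PAGE_US
    for offset in range(0, len(page), SECTOR_BYTES):
        data, bits_fixed = decode(page[offset:offset + SECTOR_BYTES])
        t += T_XFER_CHECK_US
        if bits_fixed:
            t += T_CORRECT_US
        yield offset // SECTOR_BYTES, data, t


def demo_decoder(sector: bytes) -> Tuple[bytes, int]:
    """Stand-in decoder that pretends every sector needed one bit corrected."""
    return sector, 1


if __name__ == "__main__":
    _, _, first_ready = next(stream_sectors(bytes(2048), demo_decoder))
    slower = (T_ONCHIP_WITH_ERRORS_US - first_ready) / first_ready * 100.0
    print(f"host ECC, first sector ready: {first_ready:.0f} us")  # 45 us
    print(f"on-chip ECC: {T_ONCHIP_WITH_ERRORS_US:.0f} us ({slower:.0f}% slower)")
```

Only the first sector's ready time matters for the comparison with the 70-microsecond on-chip figure; the remaining sectors keep streaming while the caller is already using the first.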
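The effect of stronger ECC on refresh frequency can be illustrated with a simple, hypothetical read-refresh rule: relocate the data once the number of corrected bits in a sector comes within a fixed margin of what the ECC can correct. The two-bit margin and the function names are assumptions for illustration; real refresh policies are firmware- and vendor-specific.

```python
def needs_refresh(corrected_bits: int, ecc_strength: int, margin: int = 2) -> bool:
    """Hypothetical read-refresh rule: relocate the data once the ECC has to
    correct within `margin` bits of its maximum capability."""
    if not 0 <= margin < ecc_strength:
        raise ValueError("margin must satisfy 0 <= margin < ecc_strength")
    return corrected_bits >= ecc_strength - margin


def first_refresh_trigger(ecc_strength: int, margin: int = 2) -> int:
    """Smallest per-sector corrected-bit count that triggers a refresh."""
    bits = 0
    while not needs_refresh(bits, ecc_strength, margin):
        bits += 1
    return bits


if __name__ == "__main__":
    for strength in (8, 12):
        trigger = first_refresh_trigger(strength)
        print(f"{strength}-bit ECC: refresh at {trigger} corrected bits")
```

With the same margin, a 12-bit engine tolerates four more bit errors per sector than an 8-bit engine before it forces a rewrite, which is the mechanism behind fewer program/erase cycles.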
10:47 SR: The two graphs on this slide demonstrate the improvements in read cycles and program/erase cycles for a NAND device using 12-bit ECC versus the required 8-bit ECC. The NAND device will see at least 1.4 times the life expectancy if the ECC is increased to 12 bits.
In order to offload the ECC from the NAND, the host has to implement the ECC engine, so the MCU absorbs some additional cost. A typical MCU may have approximately three million gates. A BCH 8-bit ECC engine requires approximately 50,000 gates to implement, an increase of roughly 1.7% in gate count. This die-size impact on the MCU is much smaller than the impact on the NAND IC if the ECC engine were added to the NAND.
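The gate-count figure is simple arithmetic on the two numbers quoted, and the sketch below reproduces it. Treating gate count as a proxy for die-size impact follows the talk's reasoning and is only an approximation; the constant names are illustrative.

```python
MCU_GATES = 3_000_000       # "typical MCU" size quoted in the talk
BCH8_ENGINE_GATES = 50_000  # approximate 8-bit BCH ECC engine


def gate_increase_percent(added_gates: int, base_gates: int) -> float:
    """Added logic as a percentage of the existing gate count."""
    return added_gates / base_gates * 100.0


if __name__ == "__main__":
    pct = gate_increase_percent(BCH8_ENGINE_GATES, MCU_GATES)
    print(f"BCH-8 engine adds {pct:.2f}% to the MCU gate count")  # 1.67%
```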
11:54 SR: Also, historically, NOR flash did not require error correction, so the host, for example an MCU, did not need an ECC engine. But moving forward, some NOR products may be required to support error correction to meet ISO 26262 ASIL requirements. This means that in the future we could see more hosts offering the ability to handle ECC.
As mentioned earlier, parallel NAND also comes in two flavors, with and without on-chip ECC. However, host-based ECC is the dominant method of handling error correction for parallel NAND. So in the short term, we see a need for NAND with on-chip ECC to support the migration from serial NOR to serial NAND, but as applications evolve and the serial NAND market matures, we see more and more demand for solutions with lower cost, better performance and longer life expectancy. The best way to meet these requirements will be a host with a built-in ECC engine.
13:13 SR: One topic I have not touched on so far is standards. In the parallel NAND market, most vendors support the ONFI standard. There is no such standard for serial NAND, which can lead to spec differences between vendors. For example, each vendor may require 8-bit error correction, but the spare area may have different portions or regions that are protected by the ECC.
So, it's up to the customer to design firmware that accommodates these differences between vendors. This is unnecessary overhead and limits the flexibility to select different NAND vendors. However, if the ECC function is performed by the host, the application firmware does not have to account for each vendor's variations. As long as the host provides the minimum error correction needed, it can support any vendor's serial NAND device.
14:16 SR: Another example of incompatibility between vendors is the definition of the ECC threshold status register bits. NAND vendors with on-chip ECC use these bits to indicate when data needs to be refreshed and moved to a new block. This can cause headaches too, since each vendor may assign different register bits to this function. Having host-based ECC eliminates this issue.
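To illustrate why differing spare-area coverage matters to firmware, here is a minimal sketch that models each device's ECC-protected spare-area ranges and checks whether a metadata field the firmware wants to store is covered. The vendor names, the 64-byte spare area, and all byte ranges are hypothetical, not taken from any real datasheet.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SpareLayout:
    """Spare-area byte ranges covered by a device's on-chip ECC.

    Ranges are (start, end) offsets into the spare area, end-exclusive.
    """
    vendor: str
    protected: Tuple[Tuple[int, int], ...]

    def is_protected(self, start: int, length: int) -> bool:
        """True if every byte in [start, start + length) is ECC-covered."""
        return all(
            any(lo <= b < hi for lo, hi in self.protected)
            for b in range(start, start + length)
        )


# Hypothetical layouts for two vendors' 64-byte spare areas (not real values).
VENDOR_A = SpareLayout("vendor A", ((4, 64),))
VENDOR_B = SpareLayout("vendor B", ((4, 8), (20, 24), (36, 40), (52, 56)))

if __name__ == "__main__":
    # Firmware that keeps an 8-byte tag at spare offset 16:
    for layout in (VENDOR_A, VENDOR_B):
        covered = layout.is_protected(16, 8)
        print(layout.vendor, "protected" if covered else "NOT protected")
```

The same firmware layout, an 8-byte tag at spare offset 16, is ECC-covered on the hypothetical vendor A part but not on vendor B, so firmware relying on on-chip ECC has to special-case each vendor.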
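The sketch below shows how the same raw status byte can mean different things on two devices, and why host-based ECC sidesteps the problem: the host decoder reports the corrected-bit count itself, so one threshold rule applies to every part. Both vendor bit layouts, the `EccStatus` names, and the negative-count convention for uncorrectable data are hypothetical.

```python
from enum import Enum


class EccStatus(Enum):
    CLEAN = "no bit errors"
    CORRECTED = "errors corrected"
    REFRESH = "corrected, threshold reached: move data to a new block"
    UNCORRECTABLE = "uncorrectable"


def decode_vendor_a(status: int) -> EccStatus:
    """Hypothetical vendor A: two-bit ECC field in bits 5:4."""
    return {
        0b00: EccStatus.CLEAN,
        0b01: EccStatus.CORRECTED,
        0b10: EccStatus.UNCORRECTABLE,
        0b11: EccStatus.REFRESH,
    }[(status >> 4) & 0b11]


def decode_vendor_b(status: int) -> EccStatus:
    """Hypothetical vendor B: three-bit ECC field in bits 6:4."""
    field = (status >> 4) & 0b111
    if field == 0b000:
        return EccStatus.CLEAN
    if field == 0b111:
        return EccStatus.UNCORRECTABLE
    if field >= 0b101:
        return EccStatus.REFRESH
    return EccStatus.CORRECTED


def host_ecc_status(bits_fixed: int, threshold: int) -> EccStatus:
    """Host ECC: one vendor-independent rule. A negative count means the
    decoder could not correct the data (hypothetical convention)."""
    if bits_fixed < 0:
        return EccStatus.UNCORRECTABLE
    if bits_fixed == 0:
        return EccStatus.CLEAN
    return EccStatus.REFRESH if bits_fixed >= threshold else EccStatus.CORRECTED
```

For example, a raw status of `0x30` reads as "refresh needed" on vendor A but only "corrected" on vendor B.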
14:48 SR: So, in summary, as we have discussed during this presentation, there are two ways to implement ECC for serial NAND. Although on-chip ECC is an option, at Macronix we feel host-based ECC offers greater flexibility and more advantages to our customers. Some of these advantages are as follows. Number one, lower system cost. There's a huge cost savings in moving from serial NOR to serial NAND, and in this presentation I discussed the additional savings from removing the on-chip ECC from the NAND die. These savings will more than offset the minor cost increase in the host IC.
Number two, increased performance. NAND flash read and on-chip ECC operations are based on the full page size, 2K bytes or, in some cases, 4K bytes. Because of this, the read performance of a NAND device with on-chip ECC is significantly slower than that of NAND using host-based ECC, which can work on 512-byte chunks. On top of this, the host clock speed can be much faster, so checking and correcting ECC takes much less time.
16:10 SR: Number three, improved compatibility. Each NAND vendor may have different ECC coverage specifications when using on-chip ECC. This can make the firmware more complex for applications that require multi-sourcing. Having the ECC engine on the host eliminates this issue.
And lastly, extended life expectancy. If the host is capable of stronger ECC, for example 12-bit versus 8-bit, the NAND device will rewrite data less frequently. This reduction in program/erase cycles can help extend the life of the NAND.
16:53 SR: This presentation outlines Macronix's views on serial NAND and ECC implementation trends. I hope I have explained the advantages of having the host perform the ECC rather than the serial NAND. Obviously, host support still needs to be developed, and it will take some time before it becomes the norm. In the meantime, Macronix offers both types of serial NAND products, that is, with or without the ECC engine.
17:26 SR: Before I conclude this presentation, there is one suggestion I would like to make. We feel we are in the early stages of a new trend, a move from serial NAND with on-chip ECC to serial NAND with host-based ECC. To support multiple serial NAND devices from different vendors, we recommend that the MCU or SoC disable the serial NAND's internal ECC in the initial boot loader and perform error correction directly in the host ECC engine. This way, you can use serial NAND from multiple vendors with minimal effort.
18:10 SR: I hope you found this presentation useful. My intent was to share with you the emerging trends in the serial NAND market, so you're better equipped to select the right product for your designs. If you have any further questions, you can contact Macronix at any time. Our contact information is on the following slide. Thank you for your participation in this webinar.
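A boot-loader step for disabling on-die ECC might look like the following sketch. It assumes the convention many SPI NAND parts follow, and that the Linux SPI NAND framework also uses: Get Feature `0x0F`, Set Feature `0x1F`, configuration register `0xB0`, and an ECC-enable bit at bit 4. Check each vendor's datasheet, since register maps differ and some parts may not allow on-die ECC to be turned off. The `xfer` bus hook and the `FakeSpiNand` test double are hypothetical.

```python
from typing import Callable, Dict

# Common SPI NAND values (verify against each vendor's datasheet).
CMD_GET_FEATURE = 0x0F
CMD_SET_FEATURE = 0x1F
REG_CFG = 0xB0              # configuration feature register
CFG_ECC_ENABLE = 1 << 4     # on-die ECC enable bit

# Hypothetical bus hook: shift out `tx`, then clock in `rx_len` bytes.
SpiXfer = Callable[[bytes, int], bytes]


def get_feature(xfer: SpiXfer, reg: int) -> int:
    return xfer(bytes([CMD_GET_FEATURE, reg]), 1)[0]


def set_feature(xfer: SpiXfer, reg: int, value: int) -> None:
    xfer(bytes([CMD_SET_FEATURE, reg, value & 0xFF]), 0)


def disable_on_die_ecc(xfer: SpiXfer) -> None:
    """Clear ECC_EN with a read-modify-write, leaving other config bits alone."""
    cfg = get_feature(xfer, REG_CFG)
    set_feature(xfer, REG_CFG, cfg & ~CFG_ECC_ENABLE)
    if get_feature(xfer, REG_CFG) & CFG_ECC_ENABLE:
        raise RuntimeError("on-die ECC is still enabled; host ECC cannot take over")


class FakeSpiNand:
    """Test double that models only the feature registers."""

    def __init__(self) -> None:
        self.regs: Dict[int, int] = {REG_CFG: CFG_ECC_ENABLE}

    def xfer(self, tx: bytes, rx_len: int) -> bytes:
        if tx[0] == CMD_GET_FEATURE:
            return bytes([self.regs.get(tx[1], 0)])[:rx_len]
        if tx[0] == CMD_SET_FEATURE:
            self.regs[tx[1]] = tx[2]
        return b""


if __name__ == "__main__":
    dev = FakeSpiNand()
    disable_on_die_ecc(dev.xfer)
    print(f"CFG register: 0x{dev.regs[REG_CFG]:02X}")  # CFG register: 0x00
```

The read-modify-write matters because the same register typically carries other configuration bits that the boot loader should not disturb.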