SSD Wear Leveling and Reliability
In the prior post, we explored issues with SSD NRRE specifications and testing. In this post, we’ll take a look at how the internal architecture of SSDs affects reliability and the ability of integrators to test flash.
NAND flash has a few limitations that must be overcome to create a useful storage device. The memory cells have finite write endurance and finite data retention, and don’t support direct overwrite. SSD vendors use a technique called wear leveling to address these issues. The goal is to increase the useful lifetime of the device. However, according to the 2nd law of thermodynamics, there is no such thing as a free lunch. (OK, that’s not precisely what it says, but this result can probably be derived from the 2nd law.) Thus, we should expect side effects! We shall examine these here in detail.
A quick intro to flash data storage
NAND flash devices store data by tunneling charge across an insulating layer onto and off of a floating gate. As you have likely guessed, insulators by nature oppose charges crossing them, so forcing charge across one stresses the material. Even though the process is described as tunneling, these operations damage the insulator. Further, some of the charge can become trapped in the insulator. After some number of cycles of moving charge back and forth (programming and erasing), the insulator for a given bit cell will accumulate sufficient damage that it can no longer store data reliably. In general, the more a cell is cycled, the less time it takes for the bit value to change. MLC flash is more vulnerable to this effect than SLC flash, since it takes a smaller charge loss/gain to change the value of a bit. (TLC, or triple-level cell, is even more sensitive in this regard.)
Aside on thermodynamics
To me, this problem of causing damage when changing the value of a memory cell is an issue for a broad class of storage technologies. Other than magnetic recording, most every non-volatile memory has a finite cycle life. This is related to the thermodynamic reversibility of the process. Flipping spins in magnetic recording is highly reversible. Moving charge on and off a capacitor (as in DRAM) is highly reversible, although volatile. The floating gate technology used in flash is not highly reversible. The same is true for phase change (volume changes can cause stress, temperature changes can cause migration, etc.), ferroelectric (volume changes), etc. While some of these technologies may have higher endurance and higher performance, one needs to look at the balance. As I mentioned previously, what matters is failures per unit time under a useful workload. So faster devices require higher endurance and a lower UBER. It’s tough to be non-volatile, fast and high endurance.
A further property of flash is that a data cell can’t be directly overwritten. That is, if a given cell has a ‘0’ stored in it, the value can’t be changed to ‘1’ by writing the location again; it must be erased first. The erase process is significantly slower than the write process, so NAND flash cells are organized into large erase blocks which are erased all at once. This amortizes the slow erase over many bits, increasing the number of bits erased per unit time to provide high throughput.
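To make the no-overwrite constraint concrete, here is a minimal toy model of out-of-place writes. Everything in it (the `ToyFlash` class, the page/block sizes) is invented for illustration; real flash translation layers are vastly more sophisticated, with garbage collection, wear leveling across blocks, and much larger geometries.

```python
# Toy model of out-of-place writes in flash (illustrative only).
# A logical page can't be overwritten in place, so each rewrite goes to a
# fresh physical page and the old copy simply becomes stale.
PAGES_PER_BLOCK = 4  # real erase blocks hold hundreds of pages

class ToyFlash:
    def __init__(self, num_blocks):
        # Each physical page holds (logical_addr, data), or None if erased.
        self.blocks = [[None] * PAGES_PER_BLOCK for _ in range(num_blocks)]
        self.mapping = {}       # logical address -> (block, page)
        self.next = (0, 0)      # next free physical page
        self.erase_counts = [0] * num_blocks

    def write(self, logical_addr, data):
        b, p = self.next
        self.blocks[b][p] = (logical_addr, data)
        self.mapping[logical_addr] = (b, p)   # old copy becomes stale
        # No garbage collection in this toy, so it eventually runs out of space.
        self.next = (b, p + 1) if p + 1 < PAGES_PER_BLOCK else (b + 1, 0)

    def read(self, logical_addr):
        b, p = self.mapping[logical_addr]
        return self.blocks[b][p][1]

    def erase_block(self, b):
        # Erase is per-block: all pages go at once, and wear accumulates.
        self.blocks[b] = [None] * PAGES_PER_BLOCK
        self.erase_counts[b] += 1

flash = ToyFlash(num_blocks=2)
flash.write(0, "v1")
flash.write(0, "v2")          # rewrite lands in a new physical page
print(flash.read(0))          # -> v2
print(flash.mapping[0])       # -> (0, 1): the data physically moved
```

Note that the rewrite never touches the original physical page; reclaiming stale pages is exactly what forces the whole-block erases whose wear the article discusses.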
In SSDs, the term endurance is used to indicate the number of program-erase (P-E) cycles that a device can support before the bit error rate is too high. The term retention is used to indicate the length of time that data can safely be stored in a location before the bit error rate becomes too large.
Endurance is statistical
Contrary to what a spec sheet might lead you to believe, the endurance of a solid state storage device is statistical in nature. For example, a 32nm device might have a specified limit of 3,000 P-E cycles, but in reality there are no such hard limits. As mentioned above, each sector has a number of program-erase cycles beyond which it can’t retain data longer than a certain time, at which point it experiences a non-recoverable read error. This happens when more bit cells in the sector have errors than the sector ECC can correct. (One can consider ECC beyond the sector level, but the principle is the same.) As we will discuss in the next post, there are further factors, such as temperature, which also contribute to the limit for a sector.
Figure 1 shows the cumulative probability of sector failure for an example device at some data age (called a CDF – or cumulative distribution function). At small cycle numbers, most sectors are fine. Eventually, the cycle count becomes so large that nearly all the sectors have failed.
The behavior is such that some sectors fail early – in Figure 1 beginning around 10^4 P-E cycles – while some can last more than 100 times as long. These are indicated in Figure 1 when you mouse over it. The concern for system reliability is the early failure behavior, not the long-term survivors.
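The statistical spread can be illustrated with a quick simulation. The sketch below assumes a lognormal cycles-to-failure distribution with invented parameters (median 30,000 cycles, sigma of 0.6 on the log scale); these numbers are not from any real device, but the shape of the result mimics the figure: a small fraction of sectors fails well before the median.

```python
import bisect, math, random

# Illustrative only: simulate per-sector cycles-to-failure as lognormal
# (a common wear-out assumption; parameters invented, not measured data).
random.seed(1)
MEDIAN_CYCLES = 3e4   # assumed median cycles-to-failure
SIGMA = 0.6           # assumed spread on the log scale
samples = sorted(random.lognormvariate(math.log(MEDIAN_CYCLES), SIGMA)
                 for _ in range(100_000))

def cdf(cycles):
    """Fraction of sectors that have failed by a given P-E cycle count."""
    return bisect.bisect_right(samples, cycles) / len(samples)

# A few sectors fail far below the median; those early failures are what
# set the reliability of the device.
for c in (1e3, 1e4, 3e4, 1e5, 1e6):
    print(f"{c:>9.0f} cycles: fraction failed = {cdf(c):.4f}")
```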
Consider a device with 512 data bytes per sector (4,096 bits). Further, assume it has a BCH ECC capable of correcting 15 error bits, which requires 195 check bits. That gives a sector size of 4,291 bits. If the target NRRE interval is 10^15 bits, then the allowable sector failure probability is 4,291/10^15 ≈ 4.3×10^-12. So we would need to look at where the CDF crosses this value (which is not readily visible in Figure 1). Thus, it clearly is the early failures that dominate the reliability behavior.
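The arithmetic is easy to verify with the numbers from the text:

```python
# Sector failure probability target implied by an NRRE spec of 10^15 bits,
# using the example numbers from the text.
data_bits = 512 * 8          # 512 data bytes per sector
check_bits = 195             # BCH check bits for 15-bit correction
sector_bits = data_bits + check_bits
nrre_interval = 1e15         # one non-recoverable read error per 10^15 bits read

# One sector failure allowed per NRRE interval's worth of bits read:
p_sector = sector_bits / nrre_interval
print(f"sector size: {sector_bits} bits")             # prints 4291 bits
print(f"sector failure probability: {p_sector:.2e}")  # prints 4.29e-12
```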
Data age (retention)
One of the complexities of solid state storage is that the bit error rate is not a constant, but a function of many variables (we’ll get to these shortly). In addition to the P-E cycle count, the bit error rate depends on the age of the data (time since the cell was last programmed). Thus we expect the CDF to move to the left as the data ages (and potentially changes shape). This is shown in Figure 2 via a mouse over, where the red curve shows how the CDF might evolve with increasing data age. Thus, the useful P-E cycle count is reduced as the data ages.
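As a rough illustration of that reduction, the sketch below assumes (purely for illustration, with invented parameters) that data aging shifts a lognormal cycles-to-failure CDF left by a constant factor, and computes the usable P-E cycle count at a fixed sector failure probability target before and after aging:

```python
import math

# Illustrative only: if aging shifts the lognormal cycles-to-failure CDF
# left by a constant factor (an assumption, not measured behavior), the
# usable P-E cycle count at a fixed failure-probability target drops by
# that same factor.
def usable_cycles(p_target, median_cycles, sigma=0.6):
    """Cycle count at which the lognormal CDF reaches p_target."""
    # Invert the standard normal CDF by bisection on math.erf.
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p_target:
            lo = mid
        else:
            hi = mid
    return median_cycles * math.exp(sigma * lo)

P_TARGET = 4.3e-12   # sector failure probability target from the NRRE example
fresh = usable_cycles(P_TARGET, median_cycles=3e4)       # data just written
aged = usable_cycles(P_TARGET, median_cycles=3e4 / 4)    # CDF shifted left 4x
print(f"usable cycles (fresh data): {fresh:.0f}")
print(f"usable cycles (aged data):  {aged:.0f}")
```

The usable count sits far below the median precisely because the target probability is so small; the early tail of the CDF, not the median, governs the spec.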