Wear Leveling and Device Testing
Wear leveling also presents a significant impediment to device testing by integrators. Because wear leveling hides the underlying behavior behind a translation layer, it is impractical to test such devices for non-recoverable read errors. Using the standard read and write interfaces, it isn't possible to determine which physical block backs a logical block, and this relationship can change at any time. When a logical block is written, its physical location almost always changes, and the mapping can change even without external intervention: wear leveling processes move significant amounts of data in the background, using methods that are not known outside the device. Thus, it is not possible to determine the P-E cycle count for a logical block.
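To make the opacity concrete, here is a toy Python sketch of a wear-leveling translation layer. The greedy least-worn policy and all names are my own invention for illustration, not any vendor's actual algorithm. The host hammers a single logical address, yet the wear is silently spread across every physical block, so the host-visible access pattern tells us nothing about any block's P-E count:

```python
class ToyFTL:
    """Toy flash translation layer with greedy wear leveling.

    Illustrative only: real FTL algorithms are proprietary and far
    more complex (spare blocks, garbage collection, static leveling).
    """

    def __init__(self, num_blocks):
        self.l2p = list(range(num_blocks))   # logical -> physical map
        self.pe_count = [0] * num_blocks     # per-physical-block P-E cycles

    def write(self, logical_block):
        # Greedy wear leveling: remap the write to the least-worn
        # physical block, invisibly to the host.
        new_phys = min(range(len(self.pe_count)), key=lambda p: self.pe_count[p])
        self.l2p[logical_block] = new_phys
        self.pe_count[new_phys] += 1

ftl = ToyFTL(num_blocks=8)
# The host sees one "hot" logical address...
for _ in range(80):
    ftl.write(0)
# ...but the wear lands uniformly on all physical blocks.
print(ftl.pe_count)   # -> [10, 10, 10, 10, 10, 10, 10, 10]
```

Even in this trivial model, no sequence of host reads and writes reveals which physical block currently holds logical block 0, or how worn it is.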
The situation is even more complex than I've described here. An integrator needs to test not only how the underlying device behaves, but also how the wear leveling itself behaves. This is genuinely difficult: it is hard to test how the device measures the CDF, what its access histograms look like, how it determines its end-of-life conditions, and so on.
Measuring the UBER (or NRRE rate) is also difficult because the denominator (the number of bits transferred) isn't known. Further, since wear leveling spreads the P-E cycles across the device, a 1,000-device, 1,000-hour test as described in Chapter 3 is even less informative, since there isn't time to cycle the blocks to a high P-E count. Some manufacturers may provide a small, separately wear-leveled partition to aid in testing, but this isn't the same as having a testable device (we still don't know the actual P-E cycle counts, etc.).
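A quick sketch of the denominator problem, with made-up numbers: the integrator can count host transfers, but internal wear-leveling traffic multiplies the physical bits actually moved by an unknown factor, so two quite different UBER figures are consistent with the same observations:

```python
# Illustrative only: the host can count its own transfers, but the
# device's internal wear-leveling traffic is invisible to it.
host_bits_read = 1.0e18            # measurable by the integrator
internal_amplification = 1.7       # hypothetical; unknown in practice
observed_errors = 150              # errors surfaced to the host (hypothetical)

physical_bits_read = host_bits_read * internal_amplification

uber_host_view = observed_errors / host_bits_read       # what we compute
uber_physical = observed_errors / physical_bits_read    # what we'd want
print(f"host view: {uber_host_view:.2e}  physical: {uber_physical:.2e}")
```

Since the amplification factor varies with workload and is not disclosed, the "true" per-bit error rate of the underlying flash cannot be recovered from host-side counts alone.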
There is one more side effect of wear leveling: it delays our ability to measure reliability from field data. While it's beneficial that it takes years for a device to reach a high total P-E cycle count, it also means it can take years before we discover an issue at high cycle count, such as imperfect detection of the onset of wear-out. An integrator can thus experience a false sense of reliability early in a program's life, with problems surfacing only after the installed base is large. This is why integrators need the ability to test devices, and why reliability data needs to be publicly available.
SSDs and RAID
There is one further impact of wear leveling we should consider: the standard methods for computing RAID reliability are invalid for SSDs! There are two key assumptions behind those methods:
- The failure rate is constant in time
- Failures are independent of each other
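Under these two assumptions, the textbook data-loss calculation for a mirrored pair reduces to MTTDL = MTTF² / (2 · MTTR). The sketch below evaluates it with hypothetical device numbers, purely to show what the standard method computes:

```python
# Textbook mirrored-pair (RAID-1) mean time to data loss under the
# standard assumptions: constant (exponential) failure rate and
# independent devices. All numbers are hypothetical.
MTTF_HOURS = 2_000_000   # device mean time to failure
MTTR_HOURS = 24          # rebuild (repair) time

# Data loss requires the second device to fail during the first
# device's repair window: MTTDL = MTTF^2 / (2 * MTTR).
mttdl_hours = MTTF_HOURS ** 2 / (2 * MTTR_HOURS)
print(f"MTTDL ~ {mttdl_hours / 8760:.2e} years")
```

Both assumptions enter this formula directly: the exponential (memoryless) failure model supplies the constant rate, and independence lets the two device failure probabilities simply multiply.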
Sadly, both are false for wear-leveled devices. The first condition is that the failure rate is constant in time. As we have seen, the sector failure rates (NRRE or UBER) are time dependent: they depend on both the cycle count and the data age, each of which is itself time dependent.

The second condition is that there is no correlation between one failure and another. Wear leveling breaks this condition as well when combined with current RAID systems. RAID systems are designed to spread the write workload evenly across devices to limit the impact of hot spots. Mirroring is the simplest case, where the write workload is identical across a pair of devices. Since the devices nominally have the same P-E endurance behavior, we expect a strong correlation when they reach the endurance limit. Looking at Figures 7–9, we can see that the sector failure rates should correlate, violating condition 2. Further, this correlation exists within each device as well: as usage increases, the probability of multiple simultaneous sector failures increases (we call these cluster failures). RAID is not very adept at protecting against cluster failures, and simply increasing the RAID redundancy may not help much.
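A toy Monte Carlo makes the mirrored-pair correlation vivid. The endurance distribution, repair window, and the crude "independent" baseline below are all my own hypothetical choices, not measured values; the point is only the direction of the effect:

```python
import random

random.seed(0)

# Hypothetical parameters for a toy wear-out model.
ENDURANCE_MEAN, ENDURANCE_SD = 3000.0, 100.0   # P-E cycles to wear-out
REPAIR_WINDOW = 50.0                           # cycles of exposure during rebuild

def trial(mirrored):
    # First device wears out when its endurance limit is reached.
    a = random.gauss(ENDURANCE_MEAN, ENDURANCE_SD)
    if mirrored:
        # Mirroring applies the identical write workload to both devices,
        # so the second device ages in lockstep with the first.
        b = random.gauss(ENDURANCE_MEAN, ENDURANCE_SD)
    else:
        # Crude "independent" baseline: second failure time unrelated to
        # the first device's age, spread over the device lifetime.
        b = random.uniform(0, 2 * ENDURANCE_MEAN)
    # Data loss if both devices die within one repair window.
    return abs(a - b) < REPAIR_WINDOW

N = 100_000
p_mirrored = sum(trial(True) for _ in range(N)) / N
p_indep = sum(trial(False) for _ in range(N)) / N
print(p_mirrored, p_indep)   # mirrored double failures are far more likely
```

Because both devices approach the endurance limit together, the double-failure probability in the mirrored case is an order of magnitude or more above the independent baseline, which is exactly the correlation the standard RAID math assumes away.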
Thus, we are likely to need a new RAID architecture (or at least a more sophisticated analysis package). IBM has invented a new RAID architecture specifically designed to address these issues, but I'm not going to discuss it here yet. Besides, we need to understand the device behavior before we can appreciate the RAID design.
I find it ironic that enterprise system vendors would be unlikely to accept an HDD that can't be tested for reliability, yet they are currently willing to accept SSDs on exactly those terms. My goal here is to lift the veil of secrecy surrounding SSD reliability so that everyone can make informed decisions. I feel it is important to achieve this before we suffer a bout of SSD-related system failures. To that end, we will next explore what I have measured from flash devices.