I have shown that the standard SSD reliability specifications result in expected data loss rates per year that are significantly higher than those of HDDs. Here, speed really does kill. I recommend a much tighter NRRE specification for enterprise SSDs of 1 bit in error per 10¹⁸ bits transferred (UBER of 10⁻¹⁸), which should match the reliability of enterprise performance hard disks.
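A short sketch of why speed matters. At a fixed UBER, the expected number of errors is UBER multiplied by bits transferred, so a drive that moves five times as much data per year sees five times as many errors. The throughputs, the 100% duty cycle, the 10⁻¹⁶ comparison value and the helper names below are hypothetical illustration values, not figures from the analysis above. `required_uber` inverts the same relation to show how tight the spec must be to hold errors per year at a target.

```python
# Seconds in a (non-leap) year.
SECONDS_PER_YEAR = 365 * 24 * 3600


def bits_per_year(throughput_mb_s: float, duty_cycle: float = 1.0) -> float:
    """Bits one drive transfers in a year at a sustained rate (MB/s = 10**6 bytes/s)."""
    return throughput_mb_s * 1e6 * 8 * SECONDS_PER_YEAR * duty_cycle


def expected_errors_per_year(uber: float, throughput_mb_s: float,
                             duty_cycle: float = 1.0) -> float:
    """Expected non-recoverable read errors per drive-year: UBER x bits transferred."""
    return uber * bits_per_year(throughput_mb_s, duty_cycle)


def required_uber(target_errors_per_year: float, throughput_mb_s: float,
                  duty_cycle: float = 1.0) -> float:
    """UBER needed to keep one drive at a target error count per year."""
    return target_errors_per_year / bits_per_year(throughput_mb_s, duty_cycle)


if __name__ == "__main__":
    # Hypothetical drives: same UBER, different sustained throughput.
    for mb_s in (100, 500):
        for uber in (1e-16, 1e-18):
            errors = expected_errors_per_year(uber, mb_s)
            print(f"{mb_s} MB/s, UBER {uber:.0e}: {errors:.3f} errors/year")
```

Because the relation is linear, any technology with higher sustained throughput needs a proportionally lower UBER just to stay at the same number of errors per year.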
Given this analysis, it is clear we can’t just assume SSDs are more reliable than HDDs – we need to prove it. Thus we need to test!
A side effect of tightening the NRRE specification will be the increased difficulty of testing such behavior. For example, if we want the test to produce around 100 events, we'd need to transfer 10²⁰ bits (2×10¹⁶ sectors!). That works out to be a 2×10⁸ unit-hour test. Even if we used 25,000 SSDs, such a test would take about a year. Since this is obviously prohibitively expensive, we need to find other test methods, such as acceleration, measuring error rate behavior or capturing field data. When we get to the testing section we'll explore these options.
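The arithmetic behind this test-size estimate can be checked in a few lines. This is a sketch under stated assumptions: it assumes 512-byte sectors (which gives about 2.4×10¹⁶ sectors, consistent with the rough figure in the text) and takes the 2×10⁸ unit-hour figure as an input rather than deriving it, since that number depends on a per-drive transfer rate not stated here. The names `plan_uber_test` and `PlanEstimate` are hypothetical.

```python
from typing import NamedTuple

HOURS_PER_YEAR = 365 * 24  # 8760 hours in a non-leap year


class PlanEstimate(NamedTuple):
    bits: float              # bits that must be transferred
    sectors: float           # the same volume expressed in sectors
    wall_clock_years: float  # calendar time with the fleet running in parallel


def plan_uber_test(target_events: int, uber: float, unit_hours: float,
                   fleet_size: int, sector_bytes: int = 512) -> PlanEstimate:
    # Expected events = UBER x bits transferred, so solve for bits.
    bits = target_events / uber
    # Assumed sector size (512 bytes by default) converts bits to sectors.
    sectors = bits / (sector_bytes * 8)
    # Unit-hours are shared across drives tested in parallel.
    years = unit_hours / fleet_size / HOURS_PER_YEAR
    return PlanEstimate(bits, sectors, years)


if __name__ == "__main__":
    plan = plan_uber_test(target_events=100, uber=1e-18,
                          unit_hours=2e8, fleet_size=25_000)
    print(f"bits:    {plan.bits:.2e}")
    print(f"sectors: {plan.sectors:.2e}")
    print(f"years:   {plan.wall_clock_years:.2f}")
```

With 25,000 drives the 2×10⁸ unit-hours collapse to 8,000 hours of wall-clock time, a little under a year, which is why parallelism alone does not make such a test practical.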
It may look like I am holding SSDs to a higher testing standard than hard disks. Given the dearth of published reliability data, I feel we must.
A note for all those looking at storage technologies potentially faster than flash — you’re going to need to meet even more stringent specifications than I have laid out here. (Just something to keep in mind.)
The trade press is full of SSD performance tests, but for some reason they seem to take reliability for granted. I would like to see some attention paid to reliability testing because, as I have shown, we can't just assume the reliability is sufficient. In future posts, I will show how to test SSDs and present extensive test results. But before we dive into that, we'll take a look at how the internal architecture of SSDs affects both reliability and testability.