All posts by Hetzler

Flash Memory Summit 2015

I have posted my presentations from the 2015 Flash Memory Summit in the Library.

In my Touch Rate talk I showed how Touch Rate can be applied to hybrid storage systems. As mentioned in the talk, Tom Coughlin and I will be working on a second Touch Rate whitepaper covering this new material. I will post the new paper when it’s complete. It will include an updated version of the spreadsheet that supports hybrid (cached or tiered) systems.

Flash Temperature Testing and Modeling

Temperature Testing

This is an expanded version of the temperature modeling section from my Flash Memory Summit 2014 Tutorial T-1.

An accurate temperature model is vital for flash devices, as most vendors rely on accelerated temperature testing to verify retention capabilities. I tested flash at the SSD level, as this is how the devices are integrated into storage systems. All tests were performed using devices supporting a host-managed interface.
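For context on what accelerated temperature testing assumes, the conventional way to translate a high-temperature bake into an equivalent retention time at a lower use temperature is the Arrhenius acceleration factor. The sketch below is that textbook model, not the empirical model derived from these measurements. It assumes a single activation energy; the 1.1 eV value and the function names are illustrative assumptions.

```python
import math

# Boltzmann constant in eV/K
K_B = 8.617333262e-5

def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Arrhenius acceleration factor of a stress (bake) temperature
    relative to a use temperature, both given in degrees C."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / K_B) * (1.0 / t_use_k - 1.0 / t_stress_k))

def equivalent_retention_hours(bake_hours: float, ea_ev: float,
                               t_use_c: float, t_stress_c: float) -> float:
    """Retention time at the use temperature implied by a bake of bake_hours."""
    return bake_hours * arrhenius_af(ea_ev, t_use_c, t_stress_c)

# Hypothetical example: an assumed 1.1 eV activation energy,
# 40C use temperature, 85C bake.
af = arrhenius_af(1.1, 40.0, 85.0)
print(f"acceleration factor: {af:.1f}")
print(f"100 h bake ~ {equivalent_retention_hours(100, 1.1, 40.0, 85.0):.0f} h at 40C")
```

The whole prediction hinges on the activation energy being correct and constant across the temperature range, which is exactly what device-level measurements can confirm or refute.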

Flash Memory Summit 2014 Spreadsheet

I am posting the spreadsheet that accompanies the Flash Memory Summit 2014 Seminar A chart deck. This is a LibreOffice 4 spreadsheet. Note that it includes a macro that extends the range of the cumulative binomial distribution. Interestingly, LibreOffice suffers from the same lack of precision that Excel does. The macro is provided as-is, and is not guaranteed to work in all cases. [Translation: I haven’t had time to debug it and the error handling in LibreOffice basic is abysmal.]
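The precision problem is easy to reproduce outside a spreadsheet. Computing a small upper tail as 1 minus the CDF cancels catastrophically once the tail drops below double-precision resolution (about 1e-16). A minimal sketch of one fix, assuming the goal is P(X ≥ k) for X ~ Binomial(n, p) with 0 < p < 1: sum the tail terms directly in log space. This is an illustration in Python, not the LibreOffice Basic macro in the spreadsheet, and the function names are hypothetical.

```python
import math

def log_binom_pmf(k: int, n: int, p: float) -> float:
    """log P(X = k) for X ~ Binomial(n, p), via log-gamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k), summed directly over the tail (log-sum-exp for stability)."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    logs = [log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    m = max(logs)
    return math.exp(m) * sum(math.exp(x - m) for x in logs)

def naive_upper_tail(k: int, n: int, p: float) -> float:
    """1 - CDF: loses all precision once the tail is below ~1e-16."""
    cdf = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1.0 - cdf

# Hypothetical example: 4096 bits at a bit error probability of 1e-4,
# probability of 20 or more errors.
print(binom_upper_tail(20, 4096, 1e-4))   # tiny but nonzero
print(naive_upper_tail(20, 4096, 1e-4))   # rounding noise or 0.0
```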

The spreadsheet shows how to compute reliability for flash-based storage systems, and compares various RAID architectures when using a DNR ECC approach (described in the chart deck).
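To give a feel for the kind of comparison involved, here is a deliberately simplified sketch: each array loses data when more devices fail than its erasure code tolerates, with device failures assumed independent and equally likely. It ignores sector errors, rebuild windows and the DNR ECC interaction that the spreadsheet models, and the configurations and failure probability below are hypothetical.

```python
from math import comb

def p_loss(n_devices: int, tolerated: int, p_fail: float) -> float:
    """Probability that more than `tolerated` of `n_devices` fail,
    assuming independent device failures with probability p_fail each."""
    return sum(comb(n_devices, i) * p_fail**i * (1 - p_fail)**(n_devices - i)
               for i in range(tolerated + 1, n_devices + 1))

# Hypothetical configurations: (total devices, device failures tolerated)
CONFIGS = {
    "mirror (2 copies)": (2, 1),
    "RAID 5 (7+1)": (8, 1),
    "RAID 6 (6+2)": (8, 2),
}

P_FAIL = 0.01  # hypothetical per-device failure probability over the period

for name, (n, m) in CONFIGS.items():
    efficiency = (n - m) / n  # fraction of raw capacity holding user data
    print(f"{name:18s} efficiency={efficiency:.3f} "
          f"P(loss)={p_loss(n, m, P_FAIL):.2e}")
```

Even this toy model shows the trade-off the spreadsheet explores in detail: mirroring and RAID 6 reach similar loss probabilities, but at very different data efficiencies.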

Note: I have seen a lot of talk in the industry about how new systems use erasure codes instead of RAID. Technically, all RAID designs use erasure codes – even mirroring (that’s called a replication code). I use the term RAID loosely to describe a system utilizing an erasure code to protect against data loss.
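To make the point concrete, the two simplest RAID schemes can be written directly as erasure codes. A minimal sketch, assuming equal-length byte blocks: single XOR parity (as in RAID 5) rebuilds any one lost block from the survivors, and mirroring is a replication code where any surviving copy is the data. The function names are illustrative.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks (zip truncates unequal lengths)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode_parity(data_blocks: list[bytes]) -> list[bytes]:
    """Single-parity erasure code: append the XOR of all data blocks."""
    return data_blocks + [xor_blocks(data_blocks)]

def recover(stripe: list[bytes], lost_index: int) -> bytes:
    """Rebuild any one erased block (data or parity) by XORing the survivors."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

def encode_mirror(block: bytes, copies: int = 2) -> list[bytes]:
    """Replication code: every copy is the data itself."""
    return [block] * copies

stripe = encode_parity([b"abcd", b"efgh", b"ijkl"])
print(recover(stripe, 1))  # rebuilds the second data block from the others
```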

The PMDS codes described here are erasure codes designed to protect against simultaneous device loss and sector loss with high data efficiency. You can read about them here, here and here.

You can download the RAID Reliability Spreadsheet.

The spreadsheet is provided under GPLv2.

Flash Memory Summit 2014


I will be at the 2014 Flash Memory Summit presenting in two tutorials.

The first is Pre-Conference Seminar A: Making Error Correcting Codes Work for Flash Memory.

The second is Tutorial T-1: Measuring Reliability in SSD Storage Systems.

I think you’ll find them both enlightening. In the first, I will present the benefits of optimizing error correction at the flash device level in concert with a RAID system. In the second, I will cover material not yet presented here on SSD reliability measurements, and present an empirical temperature acceleration model for flash derived from device-level measurements. Don’t miss it!

Site remodel

Scraping off the rust

It’s taken some time, but I have remodeled the site; hopefully it’s all working now.
I have also moved the site, so please note we are now at: http://smorgastor.drhetzler.com.
I have updated some of the prior posts to reflect new information and new data, and there will be new content coming shortly. The long break since the last post is due to the remodel (and re-architecting of the site), and to the fact that this is an interesting big data problem: I have many TB of data to sift through and analyze, which has required a significant coding effort.

 

Series SSD: 7. Bit error rate – cycling data (endurance)

Program-Erase Cycling data (endurance)

Author’s note: I am working diligently to get all the data together into a consumable format. Since this is turning out to be a time-consuming process given the volume of data, I will update this post as I get the data ready. Once the data is posted, I’ll get back to the analysis.

Update March 2013

I finally got all the cycling data for the 3xnm-class devices collated, so you can preview it while I write up the analysis. As you can see, there is quite a bit of data. I have posted the data in high-res image galleries so you can see the details.

The Cycling Test

As described in chapter 6, the cycling tests are designed to rapidly reach a given program-erase cycle count, at which point data aging tests are performed. Unless otherwise noted, the erase-to-write dwell time is 250ms if the data isn’t going to be read, and 360ms if it will be read. The write-to-read dwell time is 500ms.
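The dwell-time rules above can be captured in a few lines, for example to estimate how much of a cycling run is spent idling. A minimal sketch: only the three dwell values come from the test description. Which cycles get read back (`read_every`) is a hypothetical parameter, and the time spent in the erase, program and read operations themselves is not included.

```python
# Dwell times from the test description, in seconds
ERASE_TO_WRITE_NO_READ_S = 0.250  # data will not be read after this write
ERASE_TO_WRITE_READ_S = 0.360     # data will be read after this write
WRITE_TO_READ_S = 0.500

def dwell_times(read_back: bool) -> tuple[float, float]:
    """(erase-to-write, write-to-read) dwell for one program-erase cycle.
    Write-to-read dwell is 0 when the cycle is not read back."""
    if read_back:
        return ERASE_TO_WRITE_READ_S, WRITE_TO_READ_S
    return ERASE_TO_WRITE_NO_READ_S, 0.0

def total_dwell_seconds(total_cycles: int, read_every: int) -> float:
    """Total idle dwell over a run, reading back every `read_every` cycles
    (read_every is a hypothetical schedule parameter)."""
    total = 0.0
    for cycle in range(1, total_cycles + 1):
        e2w, w2r = dwell_times(cycle % read_every == 0)
        total += e2w + w2r
    return total

# Hypothetical run: 14k cycles, reading back every 500 cycles
print(f"{total_dwell_seconds(14_000, 500) / 3600:.2f} hours of dwell")
```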

The data is presented as raw bit error rate vs. program-erase cycle count.

70C raw cycling data

The following gallery shows the raw cycling data at 70C for a set of 3xnm devices. Cycle measurements were taken out to 14k cycles. These devices have a specified 3k program-erase cycle limit, a 70C maximum operating temperature and 1-year retention.

The astute observer may have noticed some non-uniformity in the bit error noise characteristics. Namely, there are spikes in the bit error rate. I have decided to call these error fountains. I plan to devote a chapter to this phenomenon.
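For readers who want to look for these spikes in their own data, one simple screen is to compute the raw bit error rate (bit errors divided by bits read) at each cycle count and flag points that sit far above their neighbors. This is a hypothetical sketch, not the method or definition I will use for error fountains; the window size and the 10x threshold are arbitrary assumptions.

```python
from statistics import median

def rber(bit_errors: int, bits_read: int) -> float:
    """Raw bit error rate: observed bit errors per bit read."""
    return bit_errors / bits_read

def flag_spikes(cycles: list[int], rbers: list[float],
                window: int = 5, factor: float = 10.0) -> list[int]:
    """Return the cycle counts whose RBER exceeds `factor` times the median
    RBER of the surrounding points (excluding the point itself)."""
    flagged = []
    half = window // 2
    for i, value in enumerate(rbers):
        lo, hi = max(0, i - half), min(len(rbers), i + half + 1)
        neighbors = rbers[lo:i] + rbers[i + 1:hi]
        if neighbors and value > factor * median(neighbors):
            flagged.append(cycles[i])
    return flagged

# Hypothetical data: a flat RBER with one spike at 5k cycles
cycles = [1000 * k for k in range(1, 9)]
rbers = [1e-6] * 8
rbers[4] = 5e-5
print(flag_spikes(cycles, rbers))
```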

60C raw cycling data

The following gallery shows the raw cycling data at 60C for a set of 3xnm devices. Cycle measurements were taken out to 20k cycles. These devices have a specified 3k program-erase cycle limit, a 70C maximum operating temperature and 1-year retention.

I have run some of these tests to higher cycle counts than were used for the 70C data. Again, the bit error noise characteristics aren’t always uniform. Some of the test parameters were adjusted in some of these tests. I’ll point these out where they show changes in behavior.

40C raw cycling data

The following gallery shows the raw cycling data at 40C for a set of 3xnm devices. Cycle measurements were taken out to 20k cycles. These devices have a specified 3k program-erase cycle limit, a 70C maximum operating temperature and 1-year retention.

I have run some of these tests to higher cycle counts than were used for the 70C data. Again, the bit error noise characteristics aren’t always uniform. Some of the test parameters were adjusted in some of these tests. I’ll point these out where they show changes in behavior.

30C raw cycling data

The following gallery shows the raw cycling data at 30C (well, 28C if you want to get technical) for a set of 3xnm devices. Cycle measurements were taken out to 14k cycles. These devices have a specified 3k program-erase cycle limit, a 70C maximum operating temperature and 1-year retention.

Again, the bit error noise characteristics aren’t always uniform. Some of the test parameters were adjusted in some of these tests. I’ll point these out where they show changes in behavior.