Flash Temperature Testing and Modeling

Temperature Results and Summary

The measured time-to-failure behavior is not consistent with a 1.1 eV activation energy Arrhenius model. In fact, the behavior is decidedly not Arrhenius for any activation energy.

Host Managed Interface

All the data here was obtained using SSDs supporting a host-managed interface (HMI). The direct block addressing and raw read capability allowed measurement of the bit error rate (BER) beyond the ECC limits.
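Because raw (uncorrected) reads are available, the BER can be counted directly rather than inferred from ECC failures. Below is a minimal sketch. It assumes the programmed pattern is known and the raw read has already been returned as bytes. The function names are hypothetical and are not part of any HMI specification. Issuing the raw read itself is drive-specific and is not shown.

```python
# Hypothetical sketch: raw bit error rate from a known written pattern and an
# uncorrected (raw) read of the same data. No ECC correction is applied.


def count_bit_errors(written: bytes, read_back: bytes) -> int:
    """Count bit positions that differ between written and raw-read data."""
    if len(written) != len(read_back):
        raise ValueError("buffers must be the same length")
    # XOR leaves a 1 in every flipped bit position; count the 1s per byte.
    return sum(bin(w ^ r).count("1") for w, r in zip(written, read_back))


def raw_ber(written: bytes, read_back: bytes) -> float:
    """Fraction of bits that flipped between programming and raw read."""
    total_bits = 8 * len(written)
    if total_bits == 0:
        raise ValueError("empty buffers")
    return count_bit_errors(written, read_back) / total_bits


if __name__ == "__main__":
    written = bytes([0xFF, 0x00, 0xAA, 0x55])
    read_back = bytes([0xFE, 0x00, 0xAB, 0x55])  # two flipped bits out of 32
    print(raw_ber(written, read_back))
```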

NAND Temperature Behavior is not Arrhenius

I’m not picking on JPL here; they are wonderful. I did some work at JPL as a graduate student, but not on spacecraft: I was creating F-centers in LiF crystals for an infrared laser using an electron beam. It’s a bit like flash. I was shooting high-energy electrons into an insulator, which caused damage (on purpose).

Consider the acceleration factor between a high temperature of 100°C and a low temperature of 40°C. My measured data (and model) give a 13x acceleration factor. A 1.1 eV Arrhenius model predicts 702x here, an error of 54x! (This is the magnitude of error that causes spacecraft to miss planets.) Even the 0.58 eV effective activation energy model predicts an acceleration factor of 31x.
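The Arrhenius numbers can be reproduced with the standard acceleration factor, a_f = exp((Ea/k_B)(1/T_low - 1/T_high)), with temperatures in kelvin. Here is a small sketch. It assumes the CODATA Boltzmann constant and a 273.15 K offset from °C; the function name is illustrative.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K (CODATA)


def arrhenius_af(ea_ev: float, t_low_c: float, t_high_c: float) -> float:
    """Arrhenius acceleration factor from a low (use) to a high (stress) temperature, both in °C."""
    t_low_k = t_low_c + 273.15
    t_high_k = t_high_c + 273.15
    return math.exp((ea_ev / K_B_EV) * (1.0 / t_low_k - 1.0 / t_high_k))


if __name__ == "__main__":
    measured_af = 13.0  # measured acceleration factor quoted in the text
    for ea in (1.1, 0.58):
        af = arrhenius_af(ea, 40.0, 100.0)
        print(f"Ea = {ea} eV: AF = {af:.1f}x, {af / measured_af:.1f}x the measured value")
```

For 1.1 eV this gives about 702x, roughly 54 times the measured 13x. For 0.58 eV it gives just under 32x.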

Recap of the “Arrh-ain’t-ious” model

$$a_f = \left( e^{\beta^{\gamma} \left( \left(T2-\delta\right)^{\gamma} - \left(T1-\delta\right)^{\gamma} \right)} \right)^{\frac{1}{k+g}}$$
Eqn. 5 recap
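Eqn. 5 can be evaluated directly. The sketch below uses the identity (e^x)^(1/(k+g)) = e^(x/(k+g)), which avoids a huge intermediate exponential. It assumes the temperatures exceed δ and that β > 0, so the fractional powers stay real. Temperatures must be in whatever units the fit used. The parameter values in the usage line are placeholders chosen only to exercise the function. They are not fitted to any data, and the printed number has no physical meaning.

```python
import math


def eqn5_af(t1: float, t2: float, beta: float, gamma: float,
            delta: float, k: float, g: float) -> float:
    """Acceleration factor from temperature t1 to t2 per Eqn. 5.

    Computes exp(beta**gamma * ((t2 - delta)**gamma - (t1 - delta)**gamma) / (k + g)),
    which equals the recap form (e^x)^(1/(k+g)).
    """
    # Negative bases with a fractional gamma would give complex results in Python.
    if t1 <= delta or t2 <= delta:
        raise ValueError("temperatures must exceed delta")
    if beta <= 0:
        raise ValueError("beta must be positive")
    exponent = beta ** gamma * ((t2 - delta) ** gamma - (t1 - delta) ** gamma)
    return math.exp(exponent / (k + g))


if __name__ == "__main__":
    # Placeholder parameters only; NOT the fitted values from this work.
    params = dict(beta=0.05, gamma=2.0, delta=0.0, k=1.0, g=1.0)
    print(eqn5_af(40.0, 100.0, **params))
```

Because the exponent is a difference of a single function of temperature, acceleration factors chain: going from T1 to an intermediate temperature and then on to T2 multiplies to the same factor as going directly from T1 to T2.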

Possible Explanations

As mentioned above, SSD failures are caused by the weakest bits in a population, not by the mean. My SSD test measures a very large population (10^8 bits per data point), whereas most cell-level tests likely use smaller samples.
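A rough way to see why sample size matters is that the worst bit in a sample of n bits typically sits near the 1/n quantile of the population. Here is a minimal sketch. Purely for illustration, it assumes a normally distributed per-bit margin in units of sigma. That distribution and the function name are hypothetical and are not the model used here.

```python
from statistics import NormalDist

# Illustrative assumption: per-bit margin follows a standard normal distribution.
STD_NORMAL = NormalDist()


def weakest_bit_sigma(n_bits: float) -> float:
    """Approximate z-score of the weakest bit observed in a sample of n_bits.

    The minimum of n independent samples typically lands near the 1/n quantile.
    """
    if n_bits <= 1:
        raise ValueError("need more than one bit")
    return STD_NORMAL.inv_cdf(1.0 / n_bits)


if __name__ == "__main__":
    for n in (1e4, 1e6, 1e8):
        print(f"{n:.0e} bits: weakest bit near {weakest_bit_sigma(n):.2f} sigma")
```

Under this assumption, a 10^8-bit data point probes about 5.6 sigma into the tail, while a 10^4-bit sample probes about 3.7 sigma. The two tests are therefore characterizing different parts of the distribution.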

Another point, which I hinted at above, is that accelerated aging tests run over very short time intervals. As I pointed out, a typical accelerated retention test might run for only a few hours to emulate a year’s worth of retention. However, the temperature behavior shows that the acceleration factor at short data ages differs from the factor at longer ages, because the bit error rate vs. age is not exponential. This could be the source of the error, since the acceleration factors at short ages are larger than at longer ages. Thus, without longer, real-time tests for confirmation, the data can be deceiving.
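The arithmetic shows how much rides on the assumed factor. The sketch below assumes a 100°C bake standing in for one year at 40°C and sizes the bake with the 1.1 eV Arrhenius factor. It then asks how much 40°C time that bake represents if the real factor is the measured 13x. Treating 13x as constant over the whole bake is a simplification, since the text notes the factor depends on data age. The result is only an order-of-magnitude illustration.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K
HOURS_PER_YEAR = 24 * 365  # 8760 h, ignoring leap years


def arrhenius_af(ea_ev: float, t_low_c: float, t_high_c: float) -> float:
    """Arrhenius acceleration factor between two temperatures in °C."""
    inv_diff = 1.0 / (t_low_c + 273.15) - 1.0 / (t_high_c + 273.15)
    return math.exp((ea_ev / K_B_EV) * inv_diff)


def bake_hours(target_hours: float, af: float) -> float:
    """Stress-temperature hours needed to emulate target_hours at use temperature."""
    return target_hours / af


if __name__ == "__main__":
    assumed_af = arrhenius_af(1.1, 40.0, 100.0)
    bake = bake_hours(HOURS_PER_YEAR, assumed_af)
    # If the true acceleration is only 13x, the same bake emulates far less time.
    emulated_days = bake * 13.0 / 24
    print(f"bake: {bake:.1f} h at 100 C, emulating {emulated_days:.1f} days at 40 C")
```

A bake of roughly 12.5 hours, sized to represent a year, would then represent less than a week of real retention at 40°C.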

Always Validate Acceleration Models!

I know it’s painful to perform real-time retention tests, but they are critical for an accurate model. It’s also best to test with a full device and to have it perform all operations at temperature, to emulate its behavior in the field. After all, users don’t operate devices in their systems and then put them in ovens to age them. I can’t stress this point enough: an unverified model isn’t suitable for accelerated testing!!!
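One quick check during validation is to back out the activation energy that an Arrhenius model would need to reproduce each measured acceleration factor. The formula is Ea = k_B ln(a_f) / (1/T_low - 1/T_high). If the behavior were Arrhenius, every temperature pair (and every data age) would return the same value. Below is a sketch with a hypothetical helper name.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def apparent_ea(af: float, t_low_c: float, t_high_c: float) -> float:
    """Activation energy (eV) an Arrhenius model needs to match a measured
    acceleration factor between two temperatures given in °C."""
    inv_t_diff = 1.0 / (t_low_c + 273.15) - 1.0 / (t_high_c + 273.15)
    return K_B_EV * math.log(af) / inv_t_diff


if __name__ == "__main__":
    # Measured 13x between 40 C and 100 C, from the text.
    print(f"{apparent_ea(13.0, 40.0, 100.0):.2f} eV")
```

The measured 13x between 40°C and 100°C corresponds to about 0.43 eV for that pair alone, which is neither 1.1 eV nor 0.58 eV. A spread of apparent values across pairs and ages indicates that no single activation energy fits.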
