Series SSD: 6. The test methodology

Test Details

The test suite is written in C and runs on a Windows 7 system. The SSDs are connected to ICH SATA ports on the test machine. The manufacturer’s bad block table is retrieved before the microcode is updated, and the custom microcode is then loaded prior to testing. The bad block table is loaded into the test suite so that the bad blocks are not used during testing.
Data is grouped into stripes of erase blocks according to the logical block mapping on the device. Most of the 3xnm devices tested here have a stripe size of 16 erase blocks.
An oven with a temperature controller is used to keep the devices within ±0.5 °C for the duration of the tests. The devices are fully operational in the oven; the SATA and power cables are connected via a feed-through (OK, they pass through an opening in the oven). The oven temperature is measured with a precision probe, and is sampled and recorded on each SSD IO operation. A low-level driver was written in C by an esteemed colleague who is much better at this level of coding than I am. This driver recognizes the device and issues erase, write and raw read commands to it.

Test Suite

The test suite keeps a history file for each device which tracks the cycle count and test ownership for each erase block stripe. This prevents testing conflicts, as only one test may write to an erase block stripe. It also facilitates restarting tests following an interruption.
The test setup is adjusted using an initialization file. The test state is recorded in a status file, which is updated on each IO operation to facilitate restarts. Data is recorded in a number of files to facilitate analysis. Since the written data patterns are known for each page, it is possible to perform a bit-by-bit compare with the written data and get the exact error patterns. I should note that the raw read command in the microcode only returns the data portion of the page. The reserved area, including the ECC check bits, is not returned, so errors in the reserved area can’t be measured. Thus the error count should be viewed as a minimum for a sector. However, the bit error rates should be sufficiently accurate.

Written Data Patterns

The data is written in units of pages. All pages in an erase stripe are written in LBA sequential order, which results in physical page sequential order within each erase block. I also feel it is important to test the accuracy of the device microcode (and the test code I wrote), so each page starts with meta data that includes a unique fingerprint for the page.
The meta data occupies the first 256 bits of each page written. The 256 bits are encoded as a pair of (128,50,28) BCH code words, which gives 2 × 50 = 100 information bits. With a minimum distance of 28, each code word can correct up to 13 bit errors and detect a 14th out of its 128 bits. (While it might have been better to use a 256-bit BCH code, I happened to have already written a 128-bit encoder/decoder, so it was more expedient.)
The meta data stored is:

Table 1. Page meta data details
bits information
0..7 meta-data type values – rest of table is for meta-data type 1
8..39 Page block address – supports up to 4G pages, which is 16TB with 4kB pages
40..55 Cycle count number – supports up to 64K cycles, well beyond consumer flash ability
56..91 Write time stamp in units of 0.1s (real time)
92..99 Data pattern index – tells what type of data is written in the page

The meta data is also stored in the status file and in the active test state, so that the values can be confirmed on read. This allows for logging of meta data miscompares. We can’t determine whether such miscompares are due to device addressing errors, bit errors exceeding 14 or errors in the test code, so such occurrences are counted, but excluded from the results.

The test package currently supports several different data patterns:

Table 2. Page data patterns by index
Index Pattern
0 0000000000000000
1 1111111111111111
2 0101010101010101
3 0011001100110011
4 1111111011111111
5 0000000100000000
6 1111111001111111
7 0000000110000000
255 pseudo random, seed = (write_time << 8) | (LBA & 0xFF)

The data pattern index is stored in the meta data as shown in Table 1. The low-numbered patterns were chosen to examine whether they stress the system. The pseudo random pattern is used to more accurately represent real-world data. It is likely that many of the devices employ a data scrambler, in which case there wouldn’t appear to be much pattern dependence. The pseudo random pattern seed is selected systematically as shown in Table 2. The pseudo random value generator is not the standard C rand() function; I use one written by my colleague that has better characteristics.

To simplify test definitions, two modes are selectable at run time: all random and mixed. Almost all of the tests used the all random mode. Mixed is a preset mixture of pattern types as shown in Table 3. The list is 64 pages long and repeats every 64 pages. About 75% of the patterns are random.

Table 3. Mixed data pattern configuration
Page #s Patterns
0..4 255, 255, 0, 255, 255
5..9 255, 255, 1, 255, 255
10..14 255, 255, 2, 255, 255
15..19 255, 255, 3, 255, 255
20..24 255, 255, 4, 255, 255
25..29 255, 255, 5, 255, 255
30..34 255, 255, 6, 255, 255
40..45 255, 1, 4, 1, 255, 255
46..51 255, 0, 5, 0, 255, 255
52..57 255, 1, 6, 1, 255, 255
58..63 255, 0, 7, 0, 255, 255

When reading data, the meta data is first decoded, then the exact data pattern can be determined. Once this is done, a bit-for-bit compare can be performed on the data. Thus, an accurate error rate measurement can be obtained even when the number of bit errors exceeds the correction power of the device ECC (and we will show that this does happen).
