January, 2013
http://web.eecs.utk.edu/~jplank/plank/papers/CS-13-704.html
The source code is available at http://web.eecs.utk.edu/~jplank/plank/papers/CS-13-704/sd_code.tar.
@TECHREPORT{p:13:ossd,
  author      = "J. S. Plank",
  title       = "Open Source Encoder and Decoder for {SD} Erasure Codes",
  institution = "University of Tennessee",
  month       = "January",
  year        = "2013",
  number      = "UT-CS-13-704",
  where       = "http://web.eecs.utk.edu/~jplank/plank/papers/CS-13-704.html"
}
The library itself is protected by the New BSD License.
The second program is generate_random_data_file.c, which generates a random input file for sd_code. These input files must be in a specific format, described below. One uses sd_code to encode the data in the input file.
The last program is erase_data.c. When you call it on the encoded file, it will erase data, so that sd_code may decode it.
Finally, there is a text file called All-Coefficients.txt. This file contains valid SD code coefficients for all values of n ≤ 24, m ≤ 3, s ≤ 3 and r ≤ 24. More explanation is given below.
Parameter            Description
---------            -----------
n                    The total number of disks
m                    The number of disks devoted to coding
s                    The number of extra blocks devoted to coding
r                    The number of blocks per stripe on each disk (rows)
w                    The symbol (word) size for the Galois Field
size                 The number of bytes in a block
a0, a1, ..., am+s-1  The m+s coding coefficients
All three programs in this directory create and manipulate files in the following format. The files contain n*r*size words, which are bytes in hexadecimal. The first size words define the bytes in row 0 of disk 0. The next size words define the bytes in row 0 of disk 1. And so on.
To be precise, word i (zero indexed) in a file is byte (i % size) in row (i/(size*n)) of disk (i/size) % n.
For example, the file example_6_2_2_4_data.txt contains data for an SD code with n=6, m=2, s=2, r=4, size=8. Thus, there are 24 blocks on 6 disks, four blocks per disk, eight bytes per block. This file is formatted with eight words per line, so each block is on a separate line:
UNIX> awk '{ printf "%s Block %2d: Disk %d, Row %d\n", $0, n, n%6, n/6; n++ }' < example_6_2_2_4_data.txt
6e c0 0a 91 d0 f2 69 ae Block  0: Disk 0, Row 0
ba b4 48 3b 0b 5c ea 66 Block  1: Disk 1, Row 0
83 81 c8 15 52 b9 29 7d Block  2: Disk 2, Row 0
3a 7d 5a d4 63 29 8b 1e Block  3: Disk 3, Row 0
00 00 00 00 00 00 00 00 Block  4: Disk 4, Row 0
00 00 00 00 00 00 00 00 Block  5: Disk 5, Row 0
bb 20 46 24 07 0d 8c 84 Block  6: Disk 0, Row 1
8b 44 8e e1 52 46 05 3a Block  7: Disk 1, Row 1
1b 63 2e 59 e0 7a 66 62 Block  8: Disk 2, Row 1
04 dc 5d 8b 18 4f fa ea Block  9: Disk 3, Row 1
00 00 00 00 00 00 00 00 Block 10: Disk 4, Row 1
00 00 00 00 00 00 00 00 Block 11: Disk 5, Row 1
4a 2e cb 69 6a ae 27 d7 Block 12: Disk 0, Row 2
9a 3b e3 19 92 01 ad 7e Block 13: Disk 1, Row 2
8d 86 08 30 d4 74 e4 c5 Block 14: Disk 2, Row 2
e5 75 d9 f5 40 37 00 67 Block 15: Disk 3, Row 2
00 00 00 00 00 00 00 00 Block 16: Disk 4, Row 2
00 00 00 00 00 00 00 00 Block 17: Disk 5, Row 2
cd 8f 6e a4 ed b9 4e 2d Block 18: Disk 0, Row 3
1b bd 9a a7 3f ee 76 34 Block 19: Disk 1, Row 3
00 00 00 00 00 00 00 00 Block 20: Disk 2, Row 3
00 00 00 00 00 00 00 00 Block 21: Disk 3, Row 3
00 00 00 00 00 00 00 00 Block 22: Disk 4, Row 3
00 00 00 00 00 00 00 00 Block 23: Disk 5, Row 3
UNIX>
generate_random_data_file n m s r blocksize seed
It then generates a random data file in the above format. The seed is a seed to srand48(). If set to -1, srand48() is seeded by time(0). The data file represents an unencoded SD codeword. Thus, all of the blocks on disks n-m through n-1 contain zeros. Additionally, of the remaining blocks, the s blocks with the highest numbers also contain zeros.
The file above (example_6_2_2_4_data.txt) was created with:
UNIX> generate_random_data_file 6 2 2 4 8 0 > example_6_2_2_4_data.txt

As such, the blocks on disks 4 and 5 are all zero, as are blocks 20 and 21.
sd_code n m s r w size IO(0|1) a0 a1 .. am+s-1 d0 .. dm-1 s0 .. ss-1
The parameters n, m, s, r, and size are all the same as generate_random_data_file. w is the symbol size (the word size of the Galois Field). If IO equals 1, then it will read a file in the format specified above. If IO equals 0, then it will generate a random stripe with the given parameters (which is better for timing). The ai parameters are the coding coefficients (more on these below). The di parameters specify erased disks, and the si parameters specify erased sectors.
sd_code erases the mr+s blocks specified by the di and si, and then decodes them using the SD code defined by the coding coefficients. It prints the entire stripe on standard output in the same format as generate_random_data_file.
For example, we can use sd_code to encode the data from example_6_2_2_4_data.txt by specifying that disks 4 and 5, plus blocks 20 and 21 have been erased. We use 1, 42, 26 and 61 as the coefficients in GF(2^8).
UNIX> sd_code 6 2 2 4 8 8 1 1 42 26 61 4 5 20 21 < example_6_2_2_4_data.txt > example_6_2_2_4_encoded.txt
Timer: 0.000060
UNIX> awk '{ printf "%s Block %2d: Disk %d, Row %d\n", $0, n, n%6, n/6; n++ }' < example_6_2_2_4_encoded.txt
6e c0 0a 91 d0 f2 69 ae Block  0: Disk 0, Row 0
ba b4 48 3b 0b 5c ea 66 Block  1: Disk 1, Row 0
83 81 c8 15 52 b9 29 7d Block  2: Disk 2, Row 0
3a 7d 5a d4 63 29 8b 1e Block  3: Disk 3, Row 0
f2 77 9e 08 c7 97 0a b2 Block  4: Disk 4, Row 0
9f ff 4e 63 2d a9 2b 19 Block  5: Disk 5, Row 0
bb 20 46 24 07 0d 8c 84 Block  6: Disk 0, Row 1
8b 44 8e e1 52 46 05 3a Block  7: Disk 1, Row 1
1b 63 2e 59 e0 7a 66 62 Block  8: Disk 2, Row 1
04 dc 5d 8b 18 4f fa ea Block  9: Disk 3, Row 1
06 5d c5 3f 32 b6 8d 9e Block 10: Disk 4, Row 1
29 86 7e 28 9f c8 98 a8 Block 11: Disk 5, Row 1
4a 2e cb 69 6a ae 27 d7 Block 12: Disk 0, Row 2
9a 3b e3 19 92 01 ad 7e Block 13: Disk 1, Row 2
8d 86 08 30 d4 74 e4 c5 Block 14: Disk 2, Row 2
e5 75 d9 f5 40 37 00 67 Block 15: Disk 3, Row 2
24 55 5c f1 99 dc ab 40 Block 16: Disk 4, Row 2
9c b3 a5 44 f5 30 c5 4b Block 17: Disk 5, Row 2
cd 8f 6e a4 ed b9 4e 2d Block 18: Disk 0, Row 3
1b bd 9a a7 3f ee 76 34 Block 19: Disk 1, Row 3
23 44 a9 cb 78 49 d9 11 Block 20: Disk 2, Row 3
36 b4 b8 c2 f9 14 71 ac Block 21: Disk 3, Row 3
7d 83 b2 ac 66 c6 6a 5b Block 22: Disk 4, Row 3
be 41 57 a6 35 cc fa ff Block 23: Disk 5, Row 3
UNIX>

You'll note that the original data blocks are unchanged. However, the coding blocks have now been calculated according to the SD code.
sd_code only supports w = 8, 16 or 32.
erase_data n m s r blocksize d0 .. dm-1 s0 .. ss-1
It reads a file in the proper format on standard input, and erases all of the blocks on disks d0 to dm-1, plus blocks s0 to ss-1. For example, the command below erases disks 0 and 2 plus blocks 1 and 10 from the encoded example in example_6_2_2_4_encoded.txt:
UNIX> erase_data 6 2 2 4 8 0 2 1 10 < example_6_2_2_4_encoded.txt > example_6_2_2_4_erased.txt
UNIX> awk '{ printf "%s Block %2d: Disk %d, Row %d\n", $0, n, n%6, n/6; n++ }' < example_6_2_2_4_erased.txt
00 00 00 00 00 00 00 00 Block  0: Disk 0, Row 0
00 00 00 00 00 00 00 00 Block  1: Disk 1, Row 0
00 00 00 00 00 00 00 00 Block  2: Disk 2, Row 0
3a 7d 5a d4 63 29 8b 1e Block  3: Disk 3, Row 0
f2 77 9e 08 c7 97 0a b2 Block  4: Disk 4, Row 0
9f ff 4e 63 2d a9 2b 19 Block  5: Disk 5, Row 0
00 00 00 00 00 00 00 00 Block  6: Disk 0, Row 1
8b 44 8e e1 52 46 05 3a Block  7: Disk 1, Row 1
00 00 00 00 00 00 00 00 Block  8: Disk 2, Row 1
04 dc 5d 8b 18 4f fa ea Block  9: Disk 3, Row 1
00 00 00 00 00 00 00 00 Block 10: Disk 4, Row 1
29 86 7e 28 9f c8 98 a8 Block 11: Disk 5, Row 1
00 00 00 00 00 00 00 00 Block 12: Disk 0, Row 2
9a 3b e3 19 92 01 ad 7e Block 13: Disk 1, Row 2
00 00 00 00 00 00 00 00 Block 14: Disk 2, Row 2
e5 75 d9 f5 40 37 00 67 Block 15: Disk 3, Row 2
24 55 5c f1 99 dc ab 40 Block 16: Disk 4, Row 2
9c b3 a5 44 f5 30 c5 4b Block 17: Disk 5, Row 2
00 00 00 00 00 00 00 00 Block 18: Disk 0, Row 3
1b bd 9a a7 3f ee 76 34 Block 19: Disk 1, Row 3
00 00 00 00 00 00 00 00 Block 20: Disk 2, Row 3
36 b4 b8 c2 f9 14 71 ac Block 21: Disk 3, Row 3
7d 83 b2 ac 66 c6 6a 5b Block 22: Disk 4, Row 3
be 41 57 a6 35 cc fa ff Block 23: Disk 5, Row 3
UNIX>

We can now use sd_code to decode from this scenario. The final diff statement will prove that decoding was successful!
UNIX> sd_code 6 2 2 4 8 8 1 1 42 26 61 0 2 1 10 < example_6_2_2_4_erased.txt > example_6_2_2_4_decoded.txt
Timer: 0.000047
UNIX> diff example_6_2_2_4_encoded.txt example_6_2_2_4_decoded.txt
UNIX>
UNIX> sd_code 6 2 2 4 8 1048576 0 1 42 26 61 4 5 20 21
Timer: 0.031875
UNIX>

Since there are 14 MB of data in this stripe (24 total blocks, 10 devoted to coding), the performance of coding is 14/0.031875 = 439 MB/s (this is on my 2.4 GHz Macbook Pro with iTunes, Safari, Photoshop and OpenOffice tooling along in the background).
As mentioned above, sd_code uses the alternate mapping of words to memory for GF(2^16) and GF(2^32). I would recommend using the slower, default versions of the Galois Fields when you go about tweaking this code for yourselves (just call gf_init_easy(&gfm, w) for all values of w). That, or read the "GF Complete" documentation on the alternate memory layouts.
For each set of parameters, if we found valid coefficients for w=8, those are in the file. Otherwise, if we found valid coefficients for w=16, those are in the file. Otherwise, we show the coefficients for w=32. Each line contains a different set of parameters in the following format:
UNIX> grep "^16 3 2 9" All-Coefficients.txt
16 3 2 9 16 1 3561 27596 44295 42660
UNIX>

That shows you that you must use GF(2^16), and the 5 coefficients are 1, 3561, 27596, 44295 and 42660.
Four related codes that have different fault-tolerance properties but address similar target areas are PMDS codes [Blaum-Hafner-Hetzler-2012], LRC codes [Huang-et-al-2012], Pyramid Codes [Huang-Chen-Li-2013], and Intradisk Redundancy [Dholakia-et-al-2008].
This material is based upon work supported by the National Science Foundation under grant CSR-1016636 and its REU supplements. In particular, Will, Adam and Andrew were all funded by REUs for their part in this work.
Finally, I would like to acknowledge the support of IBM's Faculty Award program, which granted me an award in December, 2012.