May, 2013
http://web.eecs.utk.edu/~jplank/plank/papers/CS-13-707.html
The source code is available on Bitbucket:
https://bitbucket.org/jimplank/sd_codes.
This manual is part of that repository, in Manual.html.
.techreport p:13:ossd2 author J. S. Plank title Open Source Encoder and Decoder for SD Erasure Codes - Revision 2.0 institution University of Tennessee month May year 2013 number UT-CS-13-707 where http://web.eecs.utk.edu/~jplank/plank/papers/CS-13-707.html @TECHREPORT{p:13:ossd2, author = "J. S. Plank", title = "Open Source Encoder and Decoder for {SD} Erasure Codes - Revision 2.0", institution = "University of Tennessee", month = "May", year = "2013", number = "UT-CS-13-707", where = "http://web.eecs.utk.edu/~jplank/plank/papers/CS-13-707.html" }
The library itself is protected by the New BSD License.
In the follow-on work, the parity check matrix is specified by two sets of m+s numbers, which we label X and Y. When a coefficient is used in column j of the parity check matrix, its value is:
where division is integer division and "%" is the modulo operator. The construction in the FAST paper may be expressed using this formulation as follows. Every value in GF(2w) is equal to 2z for some value of z between 0 and 2w-2. Let ai in the FAST formulation be equal to 2zi. Then the sets X and Y are defined so that xi = yi = zi.
In revision 1.0 of this code, the encoder/decoder took the ai coefficients on the command line and constructed the code according to the FAST formulation. In revision 2.0, it takes the name of a file holding the parity check matrix, and I have two programs that generate parity check matrices:
The second and third programs are fast_construction.c and general_construction.c. These generate parity check matrices for sd_codec.c.
The fourth program is generate_random_data_file.c, which generates a random input file for sd_codec. These input files need to be in a specific format. One uses sd_codec to encode the data in the input file.
The last program is erase_data.c. When you call it on the encoded file, it will erase data, so that sd_codec may decode it.
Finally, there are two text files: FAST-Coefficients.txt contains the ai for FAST constructions from [Plank-Blaum-Hafner-2013]. These are valid SD code coefficients for all values of n ≤ 24, m ≤ 3, s ≤ 3 and r ≤ 24.
TOS-Coefficients.txt contains coefficients from [Plank-Blaum-2013]. There are many more constructions here for smaller values of w.
Finally, there are some text files that we use in examples. They start with "example."
Parameter | Description |
n | The total number of disks |
m | The number of disks devoted to coding |
s | The number of extra blocks devoted to coding |
r | The number of blocks per stripe on each disk (rows) |
w | The symbol (word) size for the Galois Field |
size | The number of bytes in a block |
H | A (mr+s) by (nr) parity check matrix. |
Sd_codec, generate_random_data_file and erase_data create and manipulate files in the following format. The files contain n*r*size words, which are bytes in hexadecimal. The first size words define the bytes in row 0 of disk 0. The next size words define the bytes in row 0 of disk 1. And so on.
To be precise, word i (zero indexed) in a file is byte (i % size) in row (i/(size*n)) of disk (i/size) % n.
For example, the file example_6_2_2_4_data.txt contains data for an SD code with n=6, m=2, s=2, r=4, size=8. Thus, there are 24 blocks on 6 disks, four blocks per disk, eight bytes per block. This file is formatted with eight words per line, so each block is on a separate line:
UNIX> awk '{ printf "%s Block %2d: Disk %d, Row %d\n", $0, n, n%6, n/6; n++ }' < example_6_2_2_4_data.txt 6e c0 0a 91 d0 f2 69 ae Block 0: Disk 0, Row 0 ba b4 48 3b 0b 5c ea 66 Block 1: Disk 1, Row 0 83 81 c8 15 52 b9 29 7d Block 2: Disk 2, Row 0 3a 7d 5a d4 63 29 8b 1e Block 3: Disk 3, Row 0 00 00 00 00 00 00 00 00 Block 4: Disk 4, Row 0 00 00 00 00 00 00 00 00 Block 5: Disk 5, Row 0 bb 20 46 24 07 0d 8c 84 Block 6: Disk 0, Row 1 8b 44 8e e1 52 46 05 3a Block 7: Disk 1, Row 1 1b 63 2e 59 e0 7a 66 62 Block 8: Disk 2, Row 1 04 dc 5d 8b 18 4f fa ea Block 9: Disk 3, Row 1 00 00 00 00 00 00 00 00 Block 10: Disk 4, Row 1 00 00 00 00 00 00 00 00 Block 11: Disk 5, Row 1 4a 2e cb 69 6a ae 27 d7 Block 12: Disk 0, Row 2 9a 3b e3 19 92 01 ad 7e Block 13: Disk 1, Row 2 8d 86 08 30 d4 74 e4 c5 Block 14: Disk 2, Row 2 e5 75 d9 f5 40 37 00 67 Block 15: Disk 3, Row 2 00 00 00 00 00 00 00 00 Block 16: Disk 4, Row 2 00 00 00 00 00 00 00 00 Block 17: Disk 5, Row 2 cd 8f 6e a4 ed b9 4e 2d Block 18: Disk 0, Row 3 1b bd 9a a7 3f ee 76 34 Block 19: Disk 1, Row 3 00 00 00 00 00 00 00 00 Block 20: Disk 2, Row 3 00 00 00 00 00 00 00 00 Block 21: Disk 3, Row 3 00 00 00 00 00 00 00 00 Block 22: Disk 4, Row 3 00 00 00 00 00 00 00 00 Block 23: Disk 5, Row 3 UNIX>
generate_random_data_file n m s r blocksize seed |
It then generates a random data file in the above format. The seed is a seed to srand48(). If set to -1, srand48() is seeed by time(0). The data file represents an unencoded SD codeword. Thus, all of the blocks on disks n-m through n-1 contain zeros. Additionally, of the remaining blocks, the s blocks with the highest numbers also contain zeros.
The file above (example_6_2_2_4_data.txt) was created with:
UNIX> generate_random_data_file 6 2 2 4 8 0 > example_6_2_2_4_data.txtAs such, the blocks on disks 4 and 5 are all zero, as are blocks 20 and 21.
fast_construction n m s r w a0 ... am+s-1 |
The parameters are as follows:
UNIX> fast_construction 6 2 2 4 8 1 42 26 61 > example_fast_pcm.txt UNIX> cat example_fast_pcm.txt 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 42 48 179 105 28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 127 122 248 8 77 157 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 241 111 224 223 247 147 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 64 82 156 219 95 83 1 26 89 185 145 38 59 36 15 150 96 169 44 223 100 193 85 1 26 89 185 145 38 59 1 61 56 241 41 59 182 97 53 205 44 242 110 115 184 26 120 10 143 173 36 7 179 168 UNIX>
general_construction takes the following parameters:
general_construction n m s r w x0 y0 x1 y1 .. x{m+s-1} y{m+s-1} |
The parameters are the same as fast_construction, only now you must specify the elements of the X and Y sets. It then creates the parity check matrix using the methodology outlined in [Plank-Blaum-2013]. The elements of X and Y may be negative -- they will be converted to a value between 0 and 2w-2.
As a first example, we can create the same parity check matrix as the FAST example by noting that in GF(28), 42 = 2142, 26 = 2105 and 61 = 2228. So:
UNIX> general_construction 6 2 2 4 8 0 0 142 142 105 105 228 228 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 42 48 179 105 28 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 127 122 248 8 77 157 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 241 111 224 223 247 147 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 64 82 156 219 95 83 1 26 89 185 145 38 59 36 15 150 96 169 44 223 100 193 85 1 26 89 185 145 38 59 1 61 56 241 41 59 182 97 53 205 44 242 110 115 184 26 120 10 143 173 36 7 179 168 UNIX>As a second example, in [Plank-Blaum-2013], we give a more general construction for codes where m=2 and s=2. These are: X = { 0, 0, 3, 2 } and Y = { 0, 1, -1, 2 }:
UNIX> general_construction 6 2 2 4 8 0 0 0 1 3 -1 2 2 > example_general_pcm.txt UNIX> cat example_general_pcm.txt 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 4 8 16 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 4 8 16 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 4 8 16 32 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 4 8 16 32 1 142 71 173 216 108 45 152 76 38 19 135 37 156 78 39 157 192 80 40 20 10 5 140 1 4 16 64 29 116 205 19 76 45 180 234 143 6 24 96 157 78 37 148 106 181 238 159 UNIX>
sd_codec n m s r w size IO(0|1) PCM Input Output d0 .. dm-1 s0 .. ss-1 |
The parameters are as follows:
sd_codec erases the mr+s blocks specified by di and si, and then decodes them using the SD code defined by the parity check matrix. It prints the entire stripe to the Output file in the same format as as generate_random_data.
For example, we can use sd_codec to encode the data from example_6_2_2_4_data.txt by specifying that disks 4 and 5, plus blocks 20 and 21 have been erased. We will do this twice -- once for example_fast_pcm.txt and once for example_general_pcm.txt. They will, of course, generate different coding data, but each will be an SD code:
UNIX> sd_codec 6 2 2 4 8 8 1 example_fast_pcm.txt example_6_2_2_4_data.txt example_6_2_2_4_encoded_fast.txt 4 5 20 21 Timer: 0.000051 UNIX> cat example_6_2_2_4_encoded_fast.txt 6e c0 0a 91 d0 f2 69 ae ba b4 48 3b 0b 5c ea 66 83 81 c8 15 52 b9 29 7d 3a 7d 5a d4 63 29 8b 1e f2 77 9e 08 c7 97 0a b2 9f ff 4e 63 2d a9 2b 19 bb 20 46 24 07 0d 8c 84 8b 44 8e e1 52 46 05 3a 1b 63 2e 59 e0 7a 66 62 04 dc 5d 8b 18 4f fa ea 06 5d c5 3f 32 b6 8d 9e 29 86 7e 28 9f c8 98 a8 4a 2e cb 69 6a ae 27 d7 9a 3b e3 19 92 01 ad 7e 8d 86 08 30 d4 74 e4 c5 e5 75 d9 f5 40 37 00 67 24 55 5c f1 99 dc ab 40 9c b3 a5 44 f5 30 c5 4b cd 8f 6e a4 ed b9 4e 2d 1b bd 9a a7 3f ee 76 34 23 44 a9 cb 78 49 d9 11 36 b4 b8 c2 f9 14 71 ac 7d 83 b2 ac 66 c6 6a 5b be 41 57 a6 35 cc fa ff UNIX> sd_codec 6 2 2 4 8 8 1 example_general_pcm.txt example_6_2_2_4_data.txt example_6_2_2_4_encoded_general.txt 4 5 20 21 Timer: 0.000064 UNIX> cat example_6_2_2_4_encoded_general.txt 6e c0 0a 91 d0 f2 69 ae ba b4 48 3b 0b 5c ea 66 83 81 c8 15 52 b9 29 7d 3a 7d 5a d4 63 29 8b 1e ff 3a 78 b5 06 93 b1 6d 92 b2 a8 de ec ad 90 c6 bb 20 46 24 07 0d 8c 84 8b 44 8e e1 52 46 05 3a 1b 63 2e 59 e0 7a 66 62 04 dc 5d 8b 18 4f fa ea 83 70 28 cb 04 f8 1e 8b ac ab 93 dc a9 86 0b bd 4a 2e cb 69 6a ae 27 d7 9a 3b e3 19 92 01 ad 7e 8d 86 08 30 d4 74 e4 c5 e5 75 d9 f5 40 37 00 67 89 3f f1 2f 6b 7a aa 1c 31 d9 08 9a 07 96 c4 17 cd 8f 6e a4 ed b9 4e 2d 1b bd 9a a7 3f ee 76 34 9c 27 79 ba 93 44 d5 46 4a 06 46 bf 6f eb 6e a2 52 23 b6 cb 2c 7a 57 6e 52 30 7d cd 02 82 d4 93 UNIX>You'll note that in each example, the original data blocks are unchanged. However, the coding blocks have been calculated according to the SD codes.
erase_data n m s r blocksize d0 .. dm-1 s0 .. ss-1 |
It reads a file in the proper format on standard input, and erases all of the blocks on disks d0 to dm-1, plus blocks s0 to ss-1. In the commands below, we erase disks 0 and 2 plus blocks 1 and 10 from the two encodded examples.
UNIX> erase_data 6 2 2 4 8 0 2 1 10 < example_6_2_2_4_encoded_fast.txt > example_6_2_2_4_erased_fast.txt UNIX> cat example_6_2_2_4_erased_fast.txt 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 3a 7d 5a d4 63 29 8b 1e f2 77 9e 08 c7 97 0a b2 9f ff 4e 63 2d a9 2b 19 00 00 00 00 00 00 00 00 8b 44 8e e1 52 46 05 3a 00 00 00 00 00 00 00 00 04 dc 5d 8b 18 4f fa ea 00 00 00 00 00 00 00 00 29 86 7e 28 9f c8 98 a8 00 00 00 00 00 00 00 00 9a 3b e3 19 92 01 ad 7e 00 00 00 00 00 00 00 00 e5 75 d9 f5 40 37 00 67 24 55 5c f1 99 dc ab 40 9c b3 a5 44 f5 30 c5 4b 00 00 00 00 00 00 00 00 1b bd 9a a7 3f ee 76 34 00 00 00 00 00 00 00 00 36 b4 b8 c2 f9 14 71 ac 7d 83 b2 ac 66 c6 6a 5b be 41 57 a6 35 cc fa ff UNIX> erase_data 6 2 2 4 8 0 2 1 10 < example_6_2_2_4_encoded_general.txt > example_6_2_2_4_erased_general.txt UNIX> cat example_6_2_2_4_erased_general.txt 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 3a 7d 5a d4 63 29 8b 1e ff 3a 78 b5 06 93 b1 6d 92 b2 a8 de ec ad 90 c6 00 00 00 00 00 00 00 00 8b 44 8e e1 52 46 05 3a 00 00 00 00 00 00 00 00 04 dc 5d 8b 18 4f fa ea 00 00 00 00 00 00 00 00 ac ab 93 dc a9 86 0b bd 00 00 00 00 00 00 00 00 9a 3b e3 19 92 01 ad 7e 00 00 00 00 00 00 00 00 e5 75 d9 f5 40 37 00 67 89 3f f1 2f 6b 7a aa 1c 31 d9 08 9a 07 96 c4 17 00 00 00 00 00 00 00 00 1b bd 9a a7 3f ee 76 34 00 00 00 00 00 00 00 00 4a 06 46 bf 6f eb 6e a2 52 23 b6 cb 2c 7a 57 6e 52 30 7d cd 02 82 d4 93 UNIX>We can now use sd_codec to decode. We'll do so in both cases to demonstrate that the decoded data is identical to the encoded data!
UNIX> sd_codec 6 2 2 4 8 8 1 example_fast_pcm.txt example_6_2_2_4_erased_fast.txt example_6_2_2_4_decoded_fast.txt 0 2 1 10 Timer: 0.000071 UNIX> diff example_6_2_2_4_encoded_fast.txt example_6_2_2_4_decoded_fast.txt UNIX> sd_codec 6 2 2 4 8 8 1 example_general_pcm.txt example_6_2_2_4_erased_general.txt example_6_2_2_4_decoded_general.txt 0 2 1 10 Timer: 0.000082 UNIX> diff example_6_2_2_4_encoded_general.txt example_6_2_2_4_decoded_general.txt UNIX>
UNIX> sd_codec 6 2 2 4 8 1048576 0 example_general_pcm.txt - - 4 5 20 21 Timer: 0.030266 UNIX>Since there are 14 MB of data in this stripe (24 total blocks, 10 devoted to coding), the performance of coding is 14/0.031875 = 439 MB/s (this is on my 2.4 GHz Macbook Pro with iTunes, Safari, Photoshop and OpenOffice tooling along in the background).
As mentioned above, sd_codec uses the alternate mapping of words to memory for GF(216) and GF(32). I would recommend using the slower, default versions of the Galois Fields when you go about tweaking this code for yourselves (just call gf_init_easy(&gfm, w) for all values of w). That, or read the "GF Complete" documentation on the alternate memory layouts.
If we found valid coefficients for w=8, those are in the file. Otherwise, if we found valid coefficients for w=16, those are in the file. Otherwise, we show the coefficients for w=32. Each line contains a different set of parameters in the following format:
UNIX> grep "^16 3 2 9" FAST-Coefficients.txt 16 3 2 9 16 1 3561 27596 44295 42660 UNIX>That shows you that you must use GF(216), and the 5 coefficients are 1, 3561, 27596, 44295 and 42660.
UNIX> grep ' 8 1 3 4' TOS-Coefficients.txt 8 1 3 4 8 0 0 0 30 15 225 135 240 8 1 3 4 16 0 0 24480 29835 28560 17850 32640 35700 8 1 3 4 32 0 0 1 1 2 2 3 3 UNIX>And here is an example of using general_construction to generate a parity check matrix for the first of those codes:
UNIX> general_construction 8 1 3 4 8 0 0 0 30 15 225 135 240 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 96 185 223 59 85 150 89 1 96 185 223 59 85 150 89 1 96 185 223 59 85 150 89 1 96 185 223 59 85 150 89 1 36 100 145 169 26 15 193 59 223 185 96 1 36 100 145 44 89 150 85 59 223 185 96 26 15 193 38 44 89 150 85 1 44 36 89 100 150 145 85 185 193 96 38 1 44 36 89 59 26 223 15 185 193 96 38 150 145 85 169 59 26 223 15 UNIX>As we discover more SD codes, we will update TOS-Coefficients on bitbucket.
Four related codes that have different fault-tolerance properties but address similar target areas are PMDS codes [Blaum-Hafner-Hetzler-2012], LRC codes [Huang-et-al-2012], Pyramid Codes [Huang-Chen-Li-2013], and Intradisk Redundancy [Dholakia-et-al-2008].
This material is based upon work supported by the National Science Foundation under grant CSR-1016636 and its REU supplements. In particular, Will, Adam and Andrew were all funded by REU's for their part in this work.
Finally, the I would like to acknowledge the support of IBM's Faculty Award program, which granted me an award in December, 2012.