Uber-CSHR and X-Sets: C++ Programs for Optimizing Matrix-Based Erasure Codes for Fault-Tolerant Storage Systems

James S. Plank

December 14, 2010

University of Tennessee EECS Department Technical Report UT-CS-10-665.
http://web.eecs.utk.edu/~jplank/plank/papers/CS-10-665/index.html

This technical report contains the user's manual for two programs: Uber-CSHR and X-Sets. These are C++ programs whose intent is to minimize the number of XOR operations required when performing a binary matrix-vector product.

The source code for the two programs are in the following files:

Uber.cpp: Implementing Uber-CSHR. Version 1.0.
X-Sets.cpp: Implementing X-Sets. Version 1.0.
License.txt: License
index.html: This file.
cauchy-40.txt: An example matrix file -- the Cauchy matrix for the value 40 when w=6.
cauchy-10-6-8.txt: An example matrix file -- the Cauchy Reed-Solomon coding matrix for k=10, m=6 and w=8.
makefile: A 10-line makefile.

The C++ programs are standalone programs that should compile and run with any modern C++ compiler. However, since compilers do differ, if you run into warnings and errors, please let me know and I'll fix them.

This material is based upon work supported by the National Science Foundation under grants CNS-0615221 and CSR-1016636, and a research award from Google.

If you use these programs

Please send me an email to let me know how it goes. One of the ways in which I am evaluated both internally and externally is by the impact of my work, and if you have found this library and/or this document useful, I would like to be able to document it. Please send mail to plank@cs.utk.edu.

The library itself is protected by the New BSD License. It is free to use and modify within the bounds of the License. None of the techniques implemented in this library have been patented.

Citation Information

Plain Text:

.techreport     p:10:heur
author          J. S. Plank
title           {Uber-CSHR} and {X-Sets}: {C++} Programs for Optimizing Matrix-Based Erasure
                Codes for Fault-Tolerant Storage Systems
institution     University of Tennessee
month           December
year            2010
number          CS-10-665
where           http://web.eecs.utk.edu/~jplank/plank/papers/CS-10-665/index.html

Bibtex:

@TECHREPORT{p:10:heur,
        author = "J. S. Plank",
        title = "{Uber-CSHR} and {X-Sets}: {C++} Programs for Optimizing Matrix-Based Erasure
                Codes for Fault-Tolerant Storage Systems",
        institution = "University of Tennessee",
        month = "December",
        year = "2010",
        number = "CS-10-665",
        where = "http://web.eecs.utk.edu/~jplank/plank/papers/CS-10-665/index.html"
}

Reference Material

This manual assumes that you have read the following paper:

[Plank10]: J. S. Plank, C. D. Schuman and B. D. Robison, "Heuristics for Optimizing Matrix-Based Erasure Codes for Fault-Tolerant Storage Systems," Technical Report CS-10-664, University of Tennessee, December, 2010. http://web.eecs.utk.edu/~jplank/plank/papers/CS-10-664.html.

In that paper, the problem is stated, and the two algorithms are defined and evaluated. This manual simply explains how to use the programs that implement them. The two example matrix files (cauchy-40.txt and cauchy-10-6-8.txt) are the ones used in the paper.

Uber-CSHR

As detailed in [Plank10], Uber-CSHR has two parameters:

T|I: Whether target elements are used as starting points (T), or all intermediate elements are used as starting points (I).
L: How many combinations of starting points or intermediate elements are considered.

As such, Uber.cpp takes two command line arguments (examples from a Unix prompt):

UNIX> make Uber
g++ -o Uber -O3 Uber.cpp
UNIX> ./Uber
usage: Uber I|T L
UNIX>

The matrix is given on standard input. Each non-blank line is a row of the matrix, and may only contain zeros, ones and whitespace. Each row must have the same number of zeros and ones.

For example, when we run this on cauchy-40.txt, with T and L=1, we get the following output:

UNIX> ./Uber T 1 < cauchy-40.txt 
100000
010000
001000
000100
000010
000001
010100
011100
011110
001110
001111
101111
100111
010010
010011
101000
101001
17
UNIX>

This is a schedule in the same format as [Plank10]. The first c rows compose an identity matrix, and the remaining rows except for the last are elements that may be created as the XOR of two previous elements. The last line of output is the size of the schedule. The number of XORs required is the schedule size minus c.

This output is the same schedule as in [Plank10], which requires 11 XOR's. If we set T|I to I and L to two, we get an optimal schedule of 9 XOR's:

UNIX> ./Uber I 2 < cauchy-40.txt
100000
010000
001000
000100
000010
000001
010100
011100
011110
001110
001111
010011
101111
100111
101001
15
UNIX>

The running time of the heuristic is roughly X^L where X is the number of targets for T and the size of the schedule for I. As such, when matrices get large, the values of L become more limited. The following running times are on my janky old 2007 MacBook Pro with a 2.16 GHz Intel Core 2 Duo processor. The program has been written to keep the memory footprint very low, so you can bump up the value of L and let it run for a year if the feeling moves you.

UNIX> time sh -c "./Uber T 1 < cauchy-10-6-8.txt | tail -n 1"
1578
0.038u 0.017s 0:00.08 50.0%	0+0k 0+2io 3pf+0w
UNIX> time sh -c "./Uber T 2 < cauchy-10-6-8.txt | tail -n 1"
1425
0.045u 0.014s 0:00.05 100.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./Uber T 3 < cauchy-10-6-8.txt | tail -n 1"
1316
0.209u 0.018s 0:00.21 100.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./Uber T 4 < cauchy-10-6-8.txt | tail -n 1"
1282
2.041u 0.025s 0:02.06 100.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./Uber T 5 < cauchy-10-6-8.txt | tail -n 1"
1255
17.992u 0.082s 0:18.11 99.7%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./Uber I 1 < cauchy-10-6-8.txt | tail -n 1"
1465
0.040u 0.016s 0:00.04 125.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./Uber I 2 < cauchy-10-6-8.txt | tail -n 1"
1220
6.570u 0.034s 0:06.60 100.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./Uber I 3 < cauchy-10-6-8.txt | tail -n 1"
944
1048.757u 3.980s 17:36.31 99.6%	0+0k 0+0io 0pf+0w
UNIX>

X-Sets

The heuristics based on X-Sets are much more heavyweight than Uber-CSHR. Again, this manual defers to [Plank10] for explanation. The program has four command-line arguments:

Thresh -- when you are attempting to create new X-Sets for a given target, if a newly created X-Set's size is within thresh of the minimum sized X-Set for that target, then you may add the new one to the target's list. Obviously, smaller values of thresh will run faster than larger values, and they will have a lower memory footprint.
L -- as with Uber-CSHR, this defines how many combinations of intermediates one considers for starting points. Program running times are exponential in the value of L, so this impacts the running time much more than Thresh.
Give-Up-When -- this parameter is not described in [Plank10], nor is it used in the paper. However, when all potential sums have a weight of Give-Up-When or less, then the X-Sets program defaults to the MW selection technique. This can speed things up quite a bit with bigger matrices.
Technique. This is one of:
- MW: Choose any sum with maximum weight.
- MW_SS: Of all sums with maximum weight, choose one whose target is the closest to being complete.
- UBER_XSET: This reverses the above order -- of all targets that are closest to being complete, choose the sum with maximum weight. This is a hybrid of Uber-CSHR and X-Sets.
- MW_SMALLEST_SUM: Don't worry about this one -- it doesn't work very well.
- MW_MATCHING: Find a maximum matching of all sums with maximum weight, and choose any sum from that matching.
- MW_SQ: For all sums with maximum weight, perform one step of lookahead and choose the best sum.
- SUBEX: Huang's original SUBEX algorithm.
MW and SUBEX are the fastest. MW_SQ is the slowest. The rest are in the middle.
In terms of schedules produced, either UBER_XSET or MW_SQ produces the best. Usually, you'd like to try to set L to two.

As with Uber, you put the matrix on standard input. For output, it prints intermediate values of how many targets are left and what elements are generated. After that, it prints the schedule, and finally the size of the schedule. So, for example with the small Cauchy matrix, we get the following. This follows the example in Figure 10 of [Plank10].

UNIX> make X-Sets
g++ -o X-Sets -O3 X-Sets.cpp
UNIX> ./X-Sets 0 1 0 UBER_XSET < cauchy-40.txt
Targets Left  6.  New Element 010100 Edge Weight: 2 (TH 4.1)
Targets Left  5.  New Element 000011 Edge Weight: 3
Targets Left  5.  New Element 010011 Edge Weight: 1 (TH 4.1)
Targets Left  4.  New Element 000111 Edge Weight: 2
Targets Left  4.  New Element 001111 Edge Weight: 1 (TH 4.1)
Targets Left  3.  New Element 100111 Edge Weight: 1 (TH 4.1)
Targets Left  2.  New Element 001001 Edge Weight: 1
Targets Left  2.  New Element 101001 Edge Weight: 1 (TH 4.1)
Targets Left  1.  New Element 001010 Edge Weight: 1
Targets Left  1.  New Element 011110 Edge Weight: 1 (TH 4.1)
100000
010000
001000
000100
000010
000001
010100
000011
010011
000111
001111
100111
001001
101001
001010
011110
16
UNIX> ./X-Sets 0 2 0 UBER_XSET < cauchy-40.txt
Targets Left  6.  New Element 010100 Edge Weight: 2 (TH 4.1)
Targets Left  5.  New Element 000011 Edge Weight: 3
Targets Left  5.  New Element 010011 Edge Weight: 1 (TH 4.1)
Targets Left  4.  New Element 000111 Edge Weight: 2
Targets Left  4.  New Element 001111 Edge Weight: 1 (TH 4.1)
Targets Left  3.  New Element 100111 Edge Weight: 1 (TH 4.1)
Targets Left  2.  New Element 001110 Edge Weight: 2
Targets Left  2.  New Element 011110 Edge Weight: 1 (TH 4.1)
Targets Left  1.  New Element 101001 Edge Weight: 1 (TH 4.1)
100000
010000
001000
000100
000010
000001
010100
000011
010011
000111
001111
100111
001110
011110
101001
15
UNIX>

In terms of timings, this program is much slower than Uber, since it carries around quite a bit more baggage. Again on my 2007 Mac:

UNIX> time sh -c "./X-Sets 0 0 0 SUBEX < cauchy-10-6-8.txt | tail -n 1"
855
2.198u 0.044s 0:02.23 100.0%	0+0k 0+1io 0pf+0w
UNIX> time sh -c "./X-Sets 0 1 0 MW < cauchy-10-6-8.txt | tail -n 1"
841
2.300u 0.048s 0:02.34 100.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./X-Sets 0 2 0 MW < cauchy-10-6-8.txt | tail -n 1"
841
98.924u 0.313s 1:39.47 99.7%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./X-Sets 0 1 0 MW_MATCHING < cauchy-10-6-8.txt | tail -n 1"
835
3.714u 0.052s 0:03.92 95.9%	0+0k 0+3io 0pf+0w
UNIX> time sh -c "./X-Sets 0 1 0 MW_SQ < cauchy-10-6-8.txt | tail -n 1"
820
584.686u 3.145s 9:54.20 98.9%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./X-Sets 0 1 0 UBER_XSET < cauchy-10-6-8.txt | tail -n 1"
888
3.283u 0.060s 0:03.34 100.0%	0+0k 0+6io 0pf+0w
UNIX>

These timings are pretty similar to those in Figure 17 of [Plank10] (those were for decoding, though, rather than encoding). The schedules are the same as Figure 16.

Since the programs use the standard template library (STL) and implementation of the STL differ from machine to machine, the schedules that you get may differ from the ones generated here.