Uber-CSHR and X-Sets: C++ Programs for Optimizing Matrix-Based Erasure Codes for Fault-Tolerant Storage Systems

James S. Plank

December 14, 2010

University of Tennessee EECS Department Technical Report UT-CS-10-665.
http://www.cs.utk.edu/~plank/plank/papers/CS-10-665/index.html


This technical report contains the user's manual for two programs: Uber-CSHR and X-Sets. These are C++ programs whose intent is to minimize the number of XOR operations required when performing a binary matrix-vector product.

The source code for the two programs are in the following files:

The C++ programs are standalone programs that should compile and run with any modern C++ compiler. However, since compilers do differ, if you run into warnings and errors, please let me know and I'll fix them.


This material is based upon work supported by the National Science Foundation under grants CNS-0615221 and CSR-1016636, and a research award from Google.

If you use these programs

Please send me an email to let me know how it goes. One of the ways in which I am evaluated both internally and externally is by the impact of my work, and if you have found this library and/or this document useful, I would like to be able to document it. Please send mail to plank@cs.utk.edu.

The library itself is protected by the New BSD License. It is free to use and modify within the bounds of the License. None of the techniques implemented in this library have been patented.

Citation Information


Reference Material

This manual assumes that you have read the following paper:

In that paper, the problem is stated, and the two algorithms are defined and evaluated. This manual simply explains how to use the programs that implement them. The two example matrix files (cauchy-40.txt and cauchy-10-6-8.txt) are the ones used in the paper.


Uber-CSHR

As detailed in [Plank10], Uber-CSHR has two parameters: As such, Uber.cpp takes two command line arguments (examples from a Unix prompt):
UNIX> make Uber
g++ -o Uber -O3 Uber.cpp
UNIX> ./Uber
usage: Uber I|T L
UNIX> 
The matrix is given on standard input. Each non-blank line is a row of the matrix, and may only contain zeros, ones and whitespace. Each row must have the same number of zeros and ones.

For example, when we run this on cauchy-40.txt, with T and L=1, we get the following output:

UNIX> ./Uber T 1 < cauchy-40.txt 
100000
010000
001000
000100
000010
000001
010100
011100
011110
001110
001111
101111
100111
010010
010011
101000
101001
17
UNIX> 
This is a schedule in the same format as [Plank10]. The first c rows compose an identity matrix, and the remaining rows except for the last are elements that may be created as the XOR of two previous elements. The last line of output is the size of the schedule. The number of XORs required is the schedule size minus c.

This output is the same schedule as in [Plank10], which requires 11 XOR's. If we set T|I to I and L to two, we get an optimal schedule of 9 XOR's:

UNIX> ./Uber I 2 < cauchy-40.txt
100000
010000
001000
000100
000010
000001
010100
011100
011110
001110
001111
010011
101111
100111
101001
15
UNIX> 
The running time of the heuristic is roughly XL where X is the number of targets for T and the size of the schedule for I. As such, when matrices get large, the values of L become more limited. The following running times are on my janky old 2007 MacBook Pro with a 2.16 GHz Intel Core 2 Duo processor. The program has been written to keep the memory footprint very low, so you can bump up the value of L and let it run for a year if the feeling moves you.
UNIX> time sh -c "./Uber T 1 < cauchy-10-6-8.txt | tail -n 1"
1578
0.038u 0.017s 0:00.08 50.0%	0+0k 0+2io 3pf+0w
UNIX> time sh -c "./Uber T 2 < cauchy-10-6-8.txt | tail -n 1"
1425
0.045u 0.014s 0:00.05 100.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./Uber T 3 < cauchy-10-6-8.txt | tail -n 1"
1316
0.209u 0.018s 0:00.21 100.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./Uber T 4 < cauchy-10-6-8.txt | tail -n 1"
1282
2.041u 0.025s 0:02.06 100.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./Uber T 5 < cauchy-10-6-8.txt | tail -n 1"
1255
17.992u 0.082s 0:18.11 99.7%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./Uber I 1 < cauchy-10-6-8.txt | tail -n 1"
1465
0.040u 0.016s 0:00.04 125.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./Uber I 2 < cauchy-10-6-8.txt | tail -n 1"
1220
6.570u 0.034s 0:06.60 100.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./Uber I 3 < cauchy-10-6-8.txt | tail -n 1"
944
1048.757u 3.980s 17:36.31 99.6%	0+0k 0+0io 0pf+0w
UNIX>

X-Sets

The heuristics based on X-Sets are much more heavyweight than Uber-CSHR. Again, this manual defers to [Plank10] for explanation. The program has four command-line arguments:

As with Uber, you put the matrix on standard input. For output, it prints intermediate values of how many targets are left and what elements are generated. After that, it prints the schedule, and finally the size of the schedule. So, for example with the small Cauchy matrix, we get the following. This follows the example in Figure 10 of [Plank10].

UNIX> make X-Sets
g++ -o X-Sets -O3 X-Sets.cpp
UNIX> ./X-Sets 0 1 0 UBER_XSET < cauchy-40.txt
Targets Left  6.  New Element 010100 Edge Weight: 2 (TH 4.1)
Targets Left  5.  New Element 000011 Edge Weight: 3
Targets Left  5.  New Element 010011 Edge Weight: 1 (TH 4.1)
Targets Left  4.  New Element 000111 Edge Weight: 2
Targets Left  4.  New Element 001111 Edge Weight: 1 (TH 4.1)
Targets Left  3.  New Element 100111 Edge Weight: 1 (TH 4.1)
Targets Left  2.  New Element 001001 Edge Weight: 1
Targets Left  2.  New Element 101001 Edge Weight: 1 (TH 4.1)
Targets Left  1.  New Element 001010 Edge Weight: 1
Targets Left  1.  New Element 011110 Edge Weight: 1 (TH 4.1)
100000
010000
001000
000100
000010
000001
010100
000011
010011
000111
001111
100111
001001
101001
001010
011110
16
UNIX> ./X-Sets 0 2 0 UBER_XSET < cauchy-40.txt
Targets Left  6.  New Element 010100 Edge Weight: 2 (TH 4.1)
Targets Left  5.  New Element 000011 Edge Weight: 3
Targets Left  5.  New Element 010011 Edge Weight: 1 (TH 4.1)
Targets Left  4.  New Element 000111 Edge Weight: 2
Targets Left  4.  New Element 001111 Edge Weight: 1 (TH 4.1)
Targets Left  3.  New Element 100111 Edge Weight: 1 (TH 4.1)
Targets Left  2.  New Element 001110 Edge Weight: 2
Targets Left  2.  New Element 011110 Edge Weight: 1 (TH 4.1)
Targets Left  1.  New Element 101001 Edge Weight: 1 (TH 4.1)
100000
010000
001000
000100
000010
000001
010100
000011
010011
000111
001111
100111
001110
011110
101001
15
UNIX> 
In terms of timings, this program is much slower than Uber, since it carries around quite a bit more baggage. Again on my 2007 Mac:
UNIX> time sh -c "./X-Sets 0 0 0 SUBEX < cauchy-10-6-8.txt | tail -n 1"
855
2.198u 0.044s 0:02.23 100.0%	0+0k 0+1io 0pf+0w
UNIX> time sh -c "./X-Sets 0 1 0 MW < cauchy-10-6-8.txt | tail -n 1"
841
2.300u 0.048s 0:02.34 100.0%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./X-Sets 0 2 0 MW < cauchy-10-6-8.txt | tail -n 1"
841
98.924u 0.313s 1:39.47 99.7%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./X-Sets 0 1 0 MW_MATCHING < cauchy-10-6-8.txt | tail -n 1"
835
3.714u 0.052s 0:03.92 95.9%	0+0k 0+3io 0pf+0w
UNIX> time sh -c "./X-Sets 0 1 0 MW_SQ < cauchy-10-6-8.txt | tail -n 1"
820
584.686u 3.145s 9:54.20 98.9%	0+0k 0+0io 0pf+0w
UNIX> time sh -c "./X-Sets 0 1 0 UBER_XSET < cauchy-10-6-8.txt | tail -n 1"
888
3.283u 0.060s 0:03.34 100.0%	0+0k 0+6io 0pf+0w
UNIX> 
These timings are pretty similar to those in Figure 17 of [Plank10] (those were for decoding, though, rather than encoding). The schedules are the same as Figure 16.

Since the programs use the standard template library (STL) and implementation of the STL differ from machine to machine, the schedules that you get may differ from the ones generated here.