December 14, 2010
University of Tennessee EECS Department Technical Report UT-CS-10-665.
http://web.eecs.utk.edu/~jplank/plank/papers/CS-10-665/index.html
The source code for the two programs are in the following files:
The C++ programs are standalone programs that should compile and run with any modern C++ compiler. However, since compilers do differ, if you run into warnings and errors, please let me know and I'll fix them.
The library itself is protected by the New BSD License. It is free to use and modify within the bounds of the License. None of the techniques implemented in this library have been patented.
.techreport p:10:heur author J. S. Plank title {Uber-CSHR} and {X-Sets}: {C++} Programs for Optimizing Matrix-Based Erasure Codes for Fault-Tolerant Storage Systems institution University of Tennessee month December year 2010 number CS-10-665 where http://web.eecs.utk.edu/~jplank/plank/papers/CS-10-665/index.html
@TECHREPORT{p:10:heur, author = "J. S. Plank", title = "{Uber-CSHR} and {X-Sets}: {C++} Programs for Optimizing Matrix-Based Erasure Codes for Fault-Tolerant Storage Systems", institution = "University of Tennessee", month = "December", year = "2010", number = "CS-10-665", where = "http://web.eecs.utk.edu/~jplank/plank/papers/CS-10-665/index.html" }
In that paper, the problem is stated, and the two algorithms are defined and evaluated. This manual simply explains how to use the programs that implement them. The two example matrix files (cauchy-40.txt and cauchy-10-6-8.txt) are the ones used in the paper.
UNIX> make Uber g++ -o Uber -O3 Uber.cpp UNIX> ./Uber usage: Uber I|T L UNIX>The matrix is given on standard input. Each non-blank line is a row of the matrix, and may only contain zeros, ones and whitespace. Each row must have the same number of zeros and ones.
For example, when we run this on cauchy-40.txt, with T and L=1, we get the following output:
UNIX> ./Uber T 1 < cauchy-40.txt 100000 010000 001000 000100 000010 000001 010100 011100 011110 001110 001111 101111 100111 010010 010011 101000 101001 17 UNIX>This is a schedule in the same format as [Plank10]. The first c rows compose an identity matrix, and the remaining rows except for the last are elements that may be created as the XOR of two previous elements. The last line of output is the size of the schedule. The number of XORs required is the schedule size minus c.
This output is the same schedule as in [Plank10], which requires 11 XOR's. If we set T|I to I and L to two, we get an optimal schedule of 9 XOR's:
UNIX> ./Uber I 2 < cauchy-40.txt 100000 010000 001000 000100 000010 000001 010100 011100 011110 001110 001111 010011 101111 100111 101001 15 UNIX>The running time of the heuristic is roughly XL where X is the number of targets for T and the size of the schedule for I. As such, when matrices get large, the values of L become more limited. The following running times are on my janky old 2007 MacBook Pro with a 2.16 GHz Intel Core 2 Duo processor. The program has been written to keep the memory footprint very low, so you can bump up the value of L and let it run for a year if the feeling moves you.
UNIX> time sh -c "./Uber T 1 < cauchy-10-6-8.txt | tail -n 1" 1578 0.038u 0.017s 0:00.08 50.0% 0+0k 0+2io 3pf+0w UNIX> time sh -c "./Uber T 2 < cauchy-10-6-8.txt | tail -n 1" 1425 0.045u 0.014s 0:00.05 100.0% 0+0k 0+0io 0pf+0w UNIX> time sh -c "./Uber T 3 < cauchy-10-6-8.txt | tail -n 1" 1316 0.209u 0.018s 0:00.21 100.0% 0+0k 0+0io 0pf+0w UNIX> time sh -c "./Uber T 4 < cauchy-10-6-8.txt | tail -n 1" 1282 2.041u 0.025s 0:02.06 100.0% 0+0k 0+0io 0pf+0w UNIX> time sh -c "./Uber T 5 < cauchy-10-6-8.txt | tail -n 1" 1255 17.992u 0.082s 0:18.11 99.7% 0+0k 0+0io 0pf+0w UNIX> time sh -c "./Uber I 1 < cauchy-10-6-8.txt | tail -n 1" 1465 0.040u 0.016s 0:00.04 125.0% 0+0k 0+0io 0pf+0w UNIX> time sh -c "./Uber I 2 < cauchy-10-6-8.txt | tail -n 1" 1220 6.570u 0.034s 0:06.60 100.0% 0+0k 0+0io 0pf+0w UNIX> time sh -c "./Uber I 3 < cauchy-10-6-8.txt | tail -n 1" 944 1048.757u 3.980s 17:36.31 99.6% 0+0k 0+0io 0pf+0w UNIX>
MW and SUBEX are the fastest. MW_SQ is the slowest. The rest are in the middle.
In terms of schedules produced, either UBER_XSET or MW_SQ produces the best. Usually, you'd like to try to set L to two.
As with Uber, you put the matrix on standard input. For output, it prints intermediate values of how many targets are left and what elements are generated. After that, it prints the schedule, and finally the size of the schedule. So, for example with the small Cauchy matrix, we get the following. This follows the example in Figure 10 of [Plank10].
UNIX> make X-Sets g++ -o X-Sets -O3 X-Sets.cpp UNIX> ./X-Sets 0 1 0 UBER_XSET < cauchy-40.txt Targets Left 6. New Element 010100 Edge Weight: 2 (TH 4.1) Targets Left 5. New Element 000011 Edge Weight: 3 Targets Left 5. New Element 010011 Edge Weight: 1 (TH 4.1) Targets Left 4. New Element 000111 Edge Weight: 2 Targets Left 4. New Element 001111 Edge Weight: 1 (TH 4.1) Targets Left 3. New Element 100111 Edge Weight: 1 (TH 4.1) Targets Left 2. New Element 001001 Edge Weight: 1 Targets Left 2. New Element 101001 Edge Weight: 1 (TH 4.1) Targets Left 1. New Element 001010 Edge Weight: 1 Targets Left 1. New Element 011110 Edge Weight: 1 (TH 4.1) 100000 010000 001000 000100 000010 000001 010100 000011 010011 000111 001111 100111 001001 101001 001010 011110 16 UNIX> ./X-Sets 0 2 0 UBER_XSET < cauchy-40.txt Targets Left 6. New Element 010100 Edge Weight: 2 (TH 4.1) Targets Left 5. New Element 000011 Edge Weight: 3 Targets Left 5. New Element 010011 Edge Weight: 1 (TH 4.1) Targets Left 4. New Element 000111 Edge Weight: 2 Targets Left 4. New Element 001111 Edge Weight: 1 (TH 4.1) Targets Left 3. New Element 100111 Edge Weight: 1 (TH 4.1) Targets Left 2. New Element 001110 Edge Weight: 2 Targets Left 2. New Element 011110 Edge Weight: 1 (TH 4.1) Targets Left 1. New Element 101001 Edge Weight: 1 (TH 4.1) 100000 010000 001000 000100 000010 000001 010100 000011 010011 000111 001111 100111 001110 011110 101001 15 UNIX>In terms of timings, this program is much slower than Uber, since it carries around quite a bit more baggage. Again on my 2007 Mac:
UNIX> time sh -c "./X-Sets 0 0 0 SUBEX < cauchy-10-6-8.txt | tail -n 1" 855 2.198u 0.044s 0:02.23 100.0% 0+0k 0+1io 0pf+0w UNIX> time sh -c "./X-Sets 0 1 0 MW < cauchy-10-6-8.txt | tail -n 1" 841 2.300u 0.048s 0:02.34 100.0% 0+0k 0+0io 0pf+0w UNIX> time sh -c "./X-Sets 0 2 0 MW < cauchy-10-6-8.txt | tail -n 1" 841 98.924u 0.313s 1:39.47 99.7% 0+0k 0+0io 0pf+0w UNIX> time sh -c "./X-Sets 0 1 0 MW_MATCHING < cauchy-10-6-8.txt | tail -n 1" 835 3.714u 0.052s 0:03.92 95.9% 0+0k 0+3io 0pf+0w UNIX> time sh -c "./X-Sets 0 1 0 MW_SQ < cauchy-10-6-8.txt | tail -n 1" 820 584.686u 3.145s 9:54.20 98.9% 0+0k 0+0io 0pf+0w UNIX> time sh -c "./X-Sets 0 1 0 UBER_XSET < cauchy-10-6-8.txt | tail -n 1" 888 3.283u 0.060s 0:03.34 100.0% 0+0k 0+6io 0pf+0w UNIX>These timings are pretty similar to those in Figure 17 of [Plank10] (those were for decoding, though, rather than encoding). The schedules are the same as Figure 16.
Since the programs use the standard template library (STL) and implementation of the STL differ from machine to machine, the schedules that you get may differ from the ones generated here.