Heuristics for Optimizing Matrix-Based Erasure Codes for Fault-Tolerant Storage Systems

James S. Plank
Catherine D. Schuman
B. Devin Robison

December 10, 2010

Technical Report UT-CS-10-664
EECS Department
University of Tennessee
Knoxville, TN 37996

This paper appears in DSN-2012: The 42nd Annual IEEE/IFIP Internaltional Conference on Dependable Systems and Networks, Boston, MA, June, 2012.

Please see the web page for that paper, and please cite that paper in preference to this one.

PDF of the paper.


Large scale, archival and wide-area storage systems use erasure codes to protect users from losing data due to the inevitable failures that occur. All but the most basic erasure codes employ bit-matrices to perform encoding and decoding. These bit-matrices are massaged so that encoding and decoding become described by lists of exclusive-or (XOR) operations.

When converting matrices to lists of XOR operations, there are CPU savings that can result from strategically scheduling the XOR operations and leveraging intermediate results so that fewer XOR's are performed. It is an open problem to derive a schedule from a bit-matrix that minimizes the number of XOR operations.

We attack this open problem, deriving two new heuristics called Uber-CHRS and X-Sets to schedule encoding and decoding bit-matrices with reduced XOR operations. We evaluate these heuristics in a variety of realistic erasure coding settings and demonstrate that they are a significant improvement over previously published heuristics. In particular, a hybrid of the two heuristics, which we call Uber-XSet, provides consistently good schedules across all of our tests. We provide an open-source implementation of these heuristics so that practitioners may leverage our work.

Citation Information