Optimizing Cauchy Reed-Solomon Codes for Fault-Tolerant Storage Applications

James S. Plank

December 8, 2005

Technical Report UT-CS-05-569
Department of Computer Science
University of Tennessee
Knoxville, TN 37996

PDF: http://web.eecs.utk.edu/~jplank/plank/papers/CS-05-569.pdf

A slight variant of this paper was submitted and accepted to the 5th IEEE International Symposium on Network Computing and Applications (NCA06). See this link for citation information about that paper. Since NCA has a page limit of eight pages, that paper is more or less a hatchet job of this paper, which cuts out all tutorial material and a section or two on codes for larger w. I would recommend that you read this version and cite that one. If this work gets journalized, I will put a link to that here.


Abstract

In the past few years, all manner of storage systems, ranging from disk array systems to distributed and wide-area systems, have started to grapple with the reality of tolerating multiple simultaneous failures of storage nodes. Unlike the single failure case, which is optimally handled with RAID Level-5 parity, the multiple failure case is more difficult because optimal general purpose strategies are not yet known.

Erasure coding is the field of research that deals with these strategies, and this field has blossomed in recent years. Despite this research, the decades-old strategy of Reed-Solomon coding remains the only space-optimal (MDS) code for all but the smallest storage systems. The best-performing implementations of Reed-Solomon coding employ a variant called Cauchy Reed-Solomon coding, developed in the mid-1990s.

In this paper, we present an improvement to Cauchy Reed-Solomon coding that is based on optimizing the Cauchy distribution matrix. We detail an algorithm for generating good matrices and then evaluate the performance of encoding using all manner of Reed-Solomon coding, plus the best MDS codes from the literature. The improvements over the original Cauchy Reed-Solomon codes are as much as 83% in realistic scenarios, and average roughly 10% over all cases that we tested.


Citation Information