SD Codes: Erasure Codes Designed for How Storage Systems Really Fail

James S. Plank, EECS Department, University of Tennessee
Mario Blaum, Independent Contractor to IBM Almaden Research Center
James L. Hafner, IBM Research Division, Almaden Research Center

Appearing in FAST 2013: 11th USENIX Conference on File and Storage Technologies, San Jose, CA, February 2013.

PDF of the paper.
The open source software for this paper is here.


Traditionally, when storage systems employ erasure codes, the codes are designed to tolerate the failures of entire disks. However, the most common types of failures are latent sector failures, which affect only individual disk sectors, and block failures, which arise through wear on SSDs. This paper introduces SD codes, which are designed to tolerate combinations of disk and sector failures. As such, they consume far fewer storage resources than traditional erasure codes. We specify the codes in enough detail for the storage practitioner to employ them, discuss their practical properties, and detail an open-source implementation.
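
The storage saving can be illustrated with a little arithmetic. The sketch below is not taken from the paper's software; it assumes a stripe made of n disks with r sectors each, and compares reserving m + s whole disks for coding against reserving m whole disks plus s individual sectors. The function names and the parameter values (n = 8, r = 16, m = 1, s = 1) are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def overhead_whole_disks(n, m, s):
    """Coding fraction when each of the m + s tolerated failures
    is covered by a whole coding disk (assumed baseline layout)."""
    return Fraction(m + s, n)


def overhead_disks_plus_sectors(n, r, m, s):
    """Coding fraction when m whole disks plus s single sectors per
    stripe hold coding data (assumed layout: n disks x r sectors)."""
    return Fraction(m * r + s, n * r)


if __name__ == "__main__":
    n, r, m, s = 8, 16, 1, 1  # hypothetical stripe geometry
    print(overhead_whole_disks(n, m, s))            # 1/4
    print(overhead_disks_plus_sectors(n, r, m, s))  # 17/128
```

Under these assumed values, covering one disk failure plus one sector failure costs 17/128 (about 13%) of the stripe instead of 1/4 (25%). The gap widens as r grows, because the s extra sectors are amortized over more rows.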

