``Fault-Tolerance in the Network Storage Stack''

Scott Atchley, Stephen Soltesz, James S. Plank, Micah Beck, and Terry Moore.

IEEE Workshop on Fault-Tolerant Parallel and Distributed Systems, Ft. Lauderdale, FL, April, 2002.

Available via anonymous ftp to cs.utk.edu in pub/plank/papers/FTPDS-02.ps.Z and pub/plank/papers/FTPDS-02.pdf.

Abstract

This paper addresses the issue of fault-tolerance in applications that make use of network storage. A network storage abstraction called the Network Storage Stack is presented, along with its constituent parts. In particular, a data type called the exNode is detailed, along with tools that allow it to be used to implement a wide-area, striped and replicated file.

Using these tools, we evaluate the fault-tolerance of several exNode ``files,'' composed of variable-size blocks stored on 14 different machines at five locations throughout the United States. The results demonstrate that while failures in using network storage occur frequently, the tools built on the Network Storage Stack tolerate them gracefully, and with good performance.


Citation Information