Yes, this is a long one, but it's an excellent survey of the field (and the margins are big).
Also available as: Tech Report CSD-97-938, University of California, Berkeley, 1997.
Experimental Assessment of Workstation Failures and Their Impact on Checkpointing Systems, James S. Plank and Wael R. Elwasif, 28th International Symposium on Fault-Tolerant Computing, June 1998, pp. 48-57.
NetSolve: An Environment for Deploying Fault-Tolerant Computing, James S. Plank, Henri Casanova, Jack J. Dongarra, and Terry Moore, FastAbstracts Session, 28th International Symposium on Fault-Tolerant Computing, June 1998.
The paper is in Rachel's box -- please make yourself a copy.
If you can't get it online, the paper is in Rachel's box -- please make yourself a copy.
Home page of the Parallel Data Lab at CMU.