``Design, Implementation, and Performance of Checkpointing in NetSolve''

Adnan Agbaria and James S. Plank.

International Conference on Dependable Systems and Networks (FTCS-30 and DCCA-8), New York, NY, June, 2000.

Available via anonymous ftp to cs.utk.edu in pub/plank/papers/FTCS30.ps.Z and pub/plank/papers/FTCS30.pdf.Z.

Abstract

While a variety of checkpointing techniques and systems have been documented for long-running programs, they are typically not available to programmers who are not systems experts. This paper details a project that combines three technologies, NetSolve, Starfish, and IBP, to integrate fault tolerance seamlessly into long-running applications. We discuss the design and implementation of this project and present performance results from executions on both local and wide-area networks.


Citation Information