James S. Plank,
Kai Li and
Michael A. Puening.
IEEE Transactions on Parallel and Distributed Systems,
9(10), October 1998, pp. 972-986.
The precursor of this paper is available
as UT CS Technical Report 97-380.
Diskless Checkpointing is a technique for checkpointing the state of
a long-running computation on a distributed system without relying on
stable storage. As such, it eliminates the performance bottleneck of
traditional checkpointing on distributed systems. In this paper, we
motivate diskless checkpointing and present the basic diskless
checkpointing scheme along with several variants for improved
performance. The performance of the basic scheme and its variants is
evaluated on a high-performance network of workstations and compared
to traditional disk-based checkpointing. We conclude that diskless
checkpointing is a desirable alternative to disk-based checkpointing
that can improve the performance of distributed applications in the
face of failures.
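The abstract does not spell out how state survives a failure without stable storage. As a minimal sketch, assuming a parity-based scheme in which each application process keeps its checkpoint in memory and one extra process holds the bitwise XOR of all of them, the loss of a single process can be repaired from the survivors. The function names and the sample data below are hypothetical and are not taken from the paper's implementation.

```python
from functools import reduce
from typing import Sequence


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Return the bitwise XOR of two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("checkpoints must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def encode_parity(checkpoints: Sequence[bytes]) -> bytes:
    """Compute the parity block that an extra checkpoint process would hold."""
    return reduce(xor_bytes, checkpoints)


def recover(survivors: Sequence[bytes], parity: bytes) -> bytes:
    """Rebuild the one lost checkpoint from the surviving ones plus parity."""
    return reduce(xor_bytes, survivors, parity)


# Three hypothetical application processes, each with an in-memory checkpoint.
checkpoints = [b"state-A0", b"state-B1", b"state-C2"]
parity = encode_parity(checkpoints)

# Simulate the failure of process 1 and rebuild its checkpoint.
lost = 1
survivors = [c for i, c in enumerate(checkpoints) if i != lost]
restored = recover(survivors, parity)
```

In this sketch a single parity block tolerates only one failure at a time, and the checkpoints are assumed to be padded to a common length before encoding.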
- Plain Text:
author J. S. Plank and K. Li and M. A. Puening
title Diskless Checkpointing
journal IEEE Transactions on Parallel and Distributed Systems
volume 9
number 10
month October
year 1998
pages 972-986
- BibTeX:
author = "J. S. Plank and K. Li and M. A. Puening",
title = "Diskless Checkpointing",
journal = "IEEE Transactions on Parallel and Distributed Systems",
volume = "9",
number = "10",
month = "October",
year = "1998",
pages = "972-986",
where = "http://web.eecs.utk.edu/~plank/plank/papers/CS-97-380.html"