``Memory Exclusion: Optimizing the Performance
of Checkpointing Systems''
James S. Plank,
Technical Report UT-CS-96-335, University of Tennessee, August, 1996.
A revised version of this paper appears in
Software -- Practice and Experience,
Volume 29, Number 2, pp. 125-142, 1999. Please cite that
paper in preference to this technical report.
Available via anonymous ftp to cs.utk.edu in
Checkpointing systems are a convenient way for users to make their
programs fault-tolerant by intermittently saving program state to
disk, and restoring that state following a failure.
The main concern with checkpointing
is the overhead that it adds to the running time of the program.
This paper describes memory exclusion, an important class of
optimizations that reduce the overhead of checkpointing.
These optimizations have been implemented in two checkpointers:
libckpt, which works on Unix-based workstations, and
libNXckpt, which works on the Intel Paragon. Both checkpointers
are publicly available at no cost. We have checkpointed
various long-running applications with both checkpointers and have
explored the performance improvements that may be gained through
memory exclusion. Results from these experiments are presented and
show that the improvements are significant. We conclude that
all checkpointing systems should include primitives allowing programmers
and users to gain the full benefits of memory exclusion.
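To make the idea of memory exclusion concrete, here is a minimal sketch in Python. It is not the interface of libckpt or libNXckpt: the class ToyCheckpointer, its methods, and the region names are all hypothetical, and byte arrays stand in for regions of program memory. The sketch only assumes that a program can tell the checkpointer which regions do not need to be saved, so that those regions are skipped when the checkpoint is written.

```python
import pickle


class ToyCheckpointer:
    """Hypothetical illustration only; not the libckpt or libNXckpt API."""

    def __init__(self):
        self.regions = {}      # name -> bytearray standing in for a memory region
        self.excluded = set()  # names of regions to skip at the next checkpoint

    def register(self, name, size):
        """Create a zero-filled region and return it for the program to use."""
        self.regions[name] = bytearray(size)
        return self.regions[name]

    def exclude(self, name):
        """Mark a region as not needed in checkpoints (for example, scratch space)."""
        self.excluded.add(name)

    def include(self, name):
        """Undo a previous exclude() so the region is saved again."""
        self.excluded.discard(name)

    def checkpoint(self):
        """Serialize every region that is not currently excluded."""
        saved = {
            name: bytes(buf)
            for name, buf in self.regions.items()
            if name not in self.excluded
        }
        return pickle.dumps(saved)

    def restore(self, blob):
        """Copy saved contents back in place; excluded regions are left untouched."""
        for name, data in pickle.loads(blob).items():
            self.regions[name][:] = data
```

In this toy model, a program that registers a small "state" region and a large "scratch" region, and then excludes "scratch", produces a checkpoint whose size is dominated by "state" alone. That reduction in the amount of data written is the source of the overhead savings the abstract refers to.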
Raw Data for the paper
The raw data for the paper is
This link also resolves the apparent anomaly in the data in Figure 3,
addressed in a footnote in the paper. Please see the link for more
- Plain Text:
author J. S. Plank and Y. Chen and K. Li and M. Beck and G. Kingsley
title Memory Exclusion: Optimizing the Performance of Checkpointing Systems
institution University of Tennessee
author = "J. S. Plank and Y. Chen and K. Li and M. Beck and G. Kingsley",
title = "Memory Exclusion: Optimizing the Performance of Checkpointing Systems",
institution = "University of Tennessee",
number = "UT-CS-96-335",
month = "August",
year = "1996",
where = "http://web.eecs.utk.edu/~jplank/plank/papers/CS-96-335.html"