``The Average Availability of Parallel Checkpointing Systems and Its
Importance in Selecting Runtime Parameters''
James S. Plank,
Michael G. Thomason.
FTCS-29:
29th International Symposium on Fault-Tolerant
Computing, Madison, WI, June 1999, pp. 250-259.
The journal version of this paper was published in JPDC in 2001 and
expands on this work. Please cite that paper in preference to
this one.
Available via anonymous ftp to cs.utk.edu in
pub/plank/papers/FTCS29.ps and
pub/plank/papers/FTCS29.pdf.
MATLAB scripts for this work are also available.
Abstract
Performance prediction of checkpointing systems in the presence of
failures is a well-studied research area. While the literature
abounds with performance models of checkpointing systems, none
address the issue of selecting runtime parameters other than the
optimal checkpointing interval. In particular, the issue of
processor allocation is typically ignored. In this paper, we briefly
present a performance model for long-running parallel computations
that execute with checkpointing enabled. We then discuss how it is
relevant to today's parallel computing environments and software, and
present case studies of using the model to select runtime parameters.
Citation Information
- Plain Text:
author J. S. Plank and M. G. Thomason
title The Average Availability of Parallel Checkpointing
Systems and Its Importance in Selecting Runtime
Parameters
booktitle 29th International Symposium on Fault-Tolerant Computing
address Madison, WI
month June
year 1999
pages 250--259
where http://web.eecs.utk.edu/~jplank/plank/papers/FTCS29.html
- Bibtex:
@INPROCEEDINGS{pt:99:aa,
author = "J. S. Plank and M. G. Thomason",
title = "The Average Availability of Parallel Checkpointing
Systems and Its Importance in Selecting Runtime
Parameters",
booktitle = "29th International Symposium on Fault-Tolerant Computing",
address = "Madison, WI",
month = "June",
year = "1999",
pages = "250--259",
where = "http://web.eecs.utk.edu/~jplank/plank/papers/FTCS29.html"
}