Journal of Parallel and Distributed Computing, Vol. 61, No. 11, November, 2001, pp. 1570-1590.
Available via anonymous ftp to cs.utk.edu in pub/plank/papers/JPDC01.pdf and pub/plank/papers/JPDC01.ps.Z.
Matlab scripts for this work are here.
Keywords: Checkpointing, performance prediction, parameter selection, parallel computation, Markov chain, exponential failure and repair distributions.
author: J. S. Plank and M. G. Thomason
title: Processor Allocation and Checkpoint Interval Selection in Cluster Computing Systems
journal: Journal of Parallel and Distributed Computing
publisher: Academic Press
month: November
year: 2001
volume: 61
number: 11
pages: 1570-1590
where: http://www.idealibrary.com/links/toc/jpdc/61/11/0 http://web.eecs.utk.edu/~jplank/plank/papers/JPDC01.html
@ARTICLE{pt:01:pac,
  author = "J. S. Plank and M. G. Thomason",
  title = "Processor Allocation and Checkpoint Interval Selection in Cluster Computing Systems",
  journal = "Journal of Parallel and Distributed Computing",
  publisher = "Academic Press",
  month = "November",
  year = "2001",
  volume = "61",
  number = "11",
  pages = "1570-1590",
  where = "http://www.idealibrary.com/links/toc/jpdc/61/11/0 http://web.eecs.utk.edu/~jplank/plank/papers/JPDC01.html"
}