James S. Plank and Kai Li
IEEE Parallel and Distributed Technologies, 2(2), Summer, 1994, pp. 62--67.
The main result of this paper is that we can checkpoint a multicomputer the size of the iPSC/860 well enough to provide fault tolerance and coarse-grained job swapping in an environment that previously had neither. We also draw conclusions about the nature of consistent checkpointing algorithms and about the effectiveness of two optimizations: main-memory checkpointing and checkpoint compression.
An alpha release of ickp has been made available to the Intel community.
The base version of this paper is ``Performance Results of Ickp -- A Consistent Checkpointer on the iPSC/860''. A PostScript copy of that paper is available.