``Performance Results of Ickp -- A Consistent Checkpointer on the iPSC/860''

James S. Plank and Kai Li

Scalable High Performance Computing Conference, Knoxville, TN, May, 1994, pp. 686--693.

Available via anonymous ftp to cs.utk.edu in pub/plank/papers/SHPCC94.ps.Z.

Abstract

This paper presents performance results of ickp, a program that checkpoints applications written for the Intel iPSC/860. ickp is the first implementation of general-purpose checkpointing on a multicomputer. ickp implements three consistent checkpointing algorithms, two optimizations, and recovery. This paper reports the results of testing ickp on many benchmark programs. These benchmarks are useful iPSC/860 programs written by other scientists, not toy benchmarks written to test the checkpointer.

The main result of this paper is that we can sufficiently checkpoint a multicomputer of the size of the iPSC/860, thereby achieving fault-tolerance and coarse-grained job-swapping in an environment where there previously was none. We also draw conclusions on the nature of consistent checkpointing algorithms and on the effectiveness of two optimizations -- main memory checkpointing and checkpoint compression.

An alpha release of ickp has been made available to the Intel community.

Postscript of the paper