Experimental Assessment of Workstation Failures and Their Impact on Checkpointing Systems

FTCS-28: 28th International Symposium on Fault-Tolerant Computing, Munich, June 1998, pages 48-57.

Available via anonymous ftp to cs.utk.edu in pub/plank/papers/FTCS28.ps and pub/plank/papers/FTCS28.pdf.

Abstract

In the past twenty years, there has been a wealth of theoretical research on minimizing the expected running time of a program in the presence of failures by employing checkpointing and rollback recovery. In the same time period, there has been little experimental research to corroborate these results. In this paper, we study the results of three separate projects that monitor failure in workstation networks. Our goals are twofold. The first is to see how these results correlate with the theoretical results, and the second is to assess their impact on strategies for checkpointing long-running computations on workstations and networks of workstations. A surprising result of our work is that although the base assumptions of the theoretical research do not hold, many of the results are still applicable.

Citation Information

Plain Text:

author          J. S. Plank and W. R. Elwasif
title           Experimental Assessment of Workstation Failures and
                Their Impact on Checkpointing Systems
booktitle       28th International Symposium on Fault-Tolerant Computing
address         Munich
month           June
year            1998
pages           48-57

Bibtex:

@INPROCEEDINGS{pe:98:eaw,
        author = "J. S. Plank and W. R. Elwasif",
        title = "Experimental Assessment of Workstation Failures and
                Their Impact on Checkpointing Systems",
        booktitle = "28th International Symposium on Fault-Tolerant Computing",
        address = "Munich",
        month = "June",
        year = "1998",
        pages = "48-57",
        where = "http://web.eecs.utk.edu/~jplank/plank/papers/FTCS28.html"
}