"The Average Availability of Uniprocessor
Checkpointing Systems, Revisited"
James S. Plank and
Michael G. Thomason.
Technical Report UT-CS-98-400, University of Tennessee, August, 1998.
Available via anonymous ftp to cs.utk.edu in
pub/plank/papers/CS-98-400.ps.Z.
Abstract
Performance prediction of checkpointing systems in the presence of
failures is a well-studied research area. This paper makes three
small contributions to it. First, we show how to use
the concept of availability from reliability theory as a
metric for checkpointing systems. Second, we study the
average availability of uniprocessor checkpointing systems, using
the libckpt checkpointer as a model. This model deviates slightly
from previous checkpointing models. We employ Bernoulli trials
to derive an expression for the availability of such a checkpointing system,
and then use this expression to calculate the checkpoint interval that
maximizes availability. Third, we present another derivation of the
availability based on a direct calculation of average segment uptime.
For the exponential failure distribution function, these two derivations
are equivalent. The latter derivation provides a simple way to
numerically approximate availability for other failure distribution
functions. We conclude with examples of applying these results.
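
As a rough illustration of what choosing the checkpoint interval to maximize availability can look like, the following Python sketch uses a simplified model that is an assumption made here, not the model derived in the report. Each segment consists of T seconds of useful work followed by C seconds of checkpointing; failures arrive at exponential rate lam; a failure discards the current segment and costs R seconds of recovery, during which further failures may occur. The function names and parameters (availability, best_interval, T, C, R, lam) are hypothetical.

```python
import math


def availability(T, lam, C, R):
    """Long-run fraction of wall-clock time spent on useful work.

    Assumed model (not taken from the report): a segment is T seconds
    of work plus C seconds of checkpointing; a failure discards the
    segment and costs R seconds of recovery before the retry, and
    failures can also strike during recovery.
    """
    # Expected wall-clock time to commit one segment under exponential
    # failures with rate lam, for the assumed model above.
    expected_segment_time = math.exp(lam * R) * math.expm1(lam * (T + C)) / lam
    return T / expected_segment_time


def best_interval(lam, C, R, iterations=200):
    """Ternary search for the T that maximizes availability.

    In this assumed model the availability rises and then falls as T
    grows, and the maximum lies below the mean time between failures
    1/lam, so [0, 1/lam] is used as the search bracket.
    """
    lo, hi = 0.0, 1.0 / lam
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if availability(m1, lam, C, R) < availability(m2, lam, C, R):
            lo = m1  # the maximum is to the right of m1
        else:
            hi = m2  # the maximum is to the left of m2
    return (lo + hi) / 2
```

For example, best_interval(1 / 86400, 60, 60) searches for the interval that maximizes availability when failures occur on average once a day and both checkpointing and recovery take one minute. Other failure distributions have no such closed form in general and would need numerical approximation instead.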