Checkpointing Java

This project was abandoned in the mid-1990s. I keep this page around since some people maintain pointers to it. However, I don't have any good code or libraries for Java checkpointing. -- Jim

Principal People

Professors

James S. Plank

Graduate Students

Michael Puening (Masters)

Description

This project concerns saving the state of Java programs so that they may be restored at a later point in time, either on the same machine or on another one. Applications include fault tolerance and computation mobility, whether for load balancing or for reasons of geographical necessity.

The neat thing about Java, as far as checkpointing is concerned, is that checkpoints can be taken in a machine-independent format, so a Java program can be checkpointed on one architecture and restored on a machine of a different architecture. This is a very powerful capability, and one that is extremely difficult to provide for standard programming languages like C.

This Web page describes the ongoing projects in Java checkpointing at the University of Tennessee. There are various levels of functionality that we are striving for; these are described below.

Introduction and some basics

Java is a language that has become extremely popular due to the World Wide Web. Designed by Sun Microsystems, Java resembles C++ with type safety and automatic memory management (garbage collection). Java programs are compiled into a machine-independent format called "bytecode," which is then either executed by an interpreter (the "virtual machine") or itself compiled into native machine code for higher performance. Typically, Java programs are either "applets" or "applications." Applications are standalone programs executed by a virtual machine, while applets are code intended to be loaded into and interpreted by a browser like Netscape. Applets are more complex than applications because they expect to interact with the current environment of the browser, while applications look more like standard programs.

Java has no pointer types, and its computation model guarantees memory safety. Thus, when a user executes a Java application or applet from an untrusted source, he or she can be assured that the Java program will not use memory in unknown or perhaps malicious ways. For this reason, Java has become the language of choice for downloading executable code over the Internet. Moreover, Java has become popular for code distribution because it is guaranteed to be portable, and because bytecode rather than source code can be distributed, which limits software piracy. The main limitation of Java is performance, which is improving as virtual machines improve (see "The Java Linpack Benchmark" or "The Java Performance Report" for more detail on Java performance).

In this project, the goal is to provide architecture-independent checkpointing (AIC) for Java applications. The Java bytecode has been defined so that the types of all variables may be identified, with none of the problems associated with languages like C (where, for example, an integer and a pointer cannot be told apart in a raw memory image). Moreover, the architecture-independent format of the bytecode provides a convenient storage format. Since Java is gaining popularity, it is a natural choice of language for AIC.

This project has already started. The planned stages are as follows:

Stage 1: Instrument a virtual machine to perform AIC of Java applications

We have started with a virtual machine called "Kaffe," with public domain source code. We have modified the virtual machine to take architecture-independent checkpoints, which are currently in a format that can only be recovered by another instrumented Kaffe virtual machine. However, since Kaffe is portable across architectures, so are the checkpoints.

The checkpointer builds on the mark-and-sweep code of the Kaffe garbage collector. Instead of marking memory for collection or retention, the checkpointer writes each reachable object to the checkpoint. Object checkpointing is a recursive procedure: each variable of an object is either a primitive type or an object itself. Primitive types can be checkpointed easily; object variables are checkpointed recursively. Object inheritance is also handled recursively: the subclass variables are checkpointed, and then the object is checkpointed using the checkpointing methodology for its superclass.
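The recursive walk described above can be sketched in plain Java with reflection. This is a hypothetical illustration, not the Kaffe code: the class names (GraphCheckpointer, Shape, Circle) are invented, the checkpoint is a readable text trace rather than a real storage format, and it assumes the graph holds only user-defined classes whose fields are primitives or references (no arrays, strings, or other JDK classes). An identity map plays the role of the garbage collector's mark bits, so shared or cyclic objects are written once and later visits emit a back-reference.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical example classes used to exercise the walker.
class Shape {
    int id;
}

class Circle extends Shape {
    int radius;
    Circle next; // may form a cycle
}

// Sketch of a mark-phase-style checkpointer. Assumes only user-defined classes
// with primitive or reference fields (no arrays, strings, or JDK classes).
class GraphCheckpointer {
    // Stands in for GC mark bits: first visit writes the object, later visits write "ref #n".
    private final Map<Object, Integer> ids = new IdentityHashMap<>();
    private final StringBuilder out = new StringBuilder();

    String checkpoint(Object root) {
        try {
            visit(root);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    private void visit(Object obj) throws IllegalAccessException {
        if (obj == null) {
            out.append("null\n");
            return;
        }
        Integer seen = ids.get(obj);
        if (seen != null) {
            out.append("ref #").append(seen).append('\n');
            return;
        }
        int id = ids.size();
        ids.put(obj, id);
        out.append("object #").append(id).append(' ')
           .append(obj.getClass().getSimpleName()).append('\n');
        // Subclass fields first, then each superclass in turn.
        for (Class<?> c = obj.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            Field[] fields = c.getDeclaredFields();
            Arrays.sort(fields, Comparator.comparing(Field::getName)); // reflection order is unspecified
            for (Field f : fields) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                f.setAccessible(true);
                out.append(c.getSimpleName()).append('.').append(f.getName()).append(" = ");
                if (f.getType().isPrimitive()) {
                    out.append(f.get(obj)).append('\n'); // primitive: stored directly
                } else {
                    visit(f.get(obj));                   // reference: checkpointed recursively
                }
            }
        }
    }
}
```

For two Circle objects that point at each other, the trace lists the root, then the second circle nested under the root's next field, then a back-reference to the root; each object's Circle fields appear before its Shape.id field.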

All types are stored in an architecture-independent format. Moreover, the interpreter is structured in the manner of Theimer and Hayes [TH] so that the execution state of the virtual machine may be restored. State external to the application's process is not checkpointed, which means that the bulk of the Java API (e.g., the window system and sockets) cannot be checkpointed. Since the checkpointer is instrumented in the virtual machine rather than in the browser, applet checkpointing is not supported. However, for standalone programs the checkpointer is functional, and this is exciting.
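One standard way to store primitive types in an architecture-independent format is to fix the byte order. The sketch below uses java.io.DataOutputStream, which is specified to write multi-byte values high byte first regardless of the host's native byte order, and writes doubles as their IEEE 754 bit patterns. The class PortableEncoding and its two-field layout are hypothetical; this is not the format Kaffe actually uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sketch: a fixed, big-endian layout for two primitive values.
// The bytes are the same whether the checkpoint is written on a
// little-endian or a big-endian host.
class PortableEncoding {
    static final class State {
        final int counter;
        final double value;

        State(int counter, double value) {
            this.counter = counter;
            this.value = value;
        }
    }

    static byte[] encode(int counter, double value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(counter);  // 4 bytes, high byte first
            out.writeDouble(value); // 8 bytes: IEEE 754 bit pattern, high byte first
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static State decode(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int counter = in.readInt();   // read back in the same fixed order
            double value = in.readDouble();
            return new State(counter, value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Encoding the int 0x12345678 always yields the bytes 12 34 56 78 at the start of the checkpoint, so a machine of any architecture reading it with decode recovers the same values.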

Stage 2: Format the checkpoints as bytecode

Java bytecode is a machine-independent format for specifying safe executable content. It provides a natural format in which to store architecture-independent checkpoints. Since object class definitions are already stored in bytecode, data storage is straightforward. The subtle part will be encapsulating the machine state in bytecode; however, this should be possible by creating recoverable subclasses of objects, storing them as bytecode, and using them only to recover the execution state. The powerful by-product of using bytecode for checkpoints is that checkpoints may be recovered by any virtual machine, even one without support for checkpointing. Of course, in this case the recovered program will not checkpoint further. Even so, this is an extra degree of functionality.
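The idea of a checkpoint that is itself executable code can be illustrated with a much simpler stand-in. The hypothetical sketch below emits Java source text, not bytecode, for a recovery class whose recover() method rebuilds one object's field values; any standard Java compiler and virtual machine could then run it, with no checkpointing support involved. It captures only heap data for a single object, not execution state, and it assumes a top-level class with a no-arg constructor and non-private int, long, boolean, or double fields. The names SourceCheckpoint, Counter, and emit are invented for illustration.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical application class to be checkpointed.
class Counter {
    int count;
    long total;
    boolean done;
}

// Sketch: writes a checkpoint as Java source for a recovery class.
// Source text is used only because it is readable; the real goal is bytecode.
// Assumes a top-level class, a no-arg constructor, and non-private primitive fields.
class SourceCheckpoint {
    static String emit(Object obj, String recoveryClass) {
        String type = obj.getClass().getName();
        StringBuilder src = new StringBuilder();
        src.append("public class ").append(recoveryClass).append(" {\n");
        src.append("    public static ").append(type).append(" recover() {\n");
        src.append("        ").append(type).append(" o = new ").append(type).append("();\n");
        Field[] fields = obj.getClass().getDeclaredFields();
        Arrays.sort(fields, Comparator.comparing(Field::getName)); // deterministic output
        for (Field f : fields) {
            if (Modifier.isStatic(f.getModifiers())) continue;
            src.append("        o.").append(f.getName())
               .append(" = ").append(literal(f, obj)).append(";\n");
        }
        src.append("        return o;\n    }\n}\n");
        return src.toString();
    }

    // Renders one field's current value as a Java literal.
    private static String literal(Field f, Object obj) {
        try {
            f.setAccessible(true);
            Class<?> t = f.getType();
            if (t == int.class) return Integer.toString(f.getInt(obj));
            if (t == long.class) return f.getLong(obj) + "L";
            if (t == boolean.class) return Boolean.toString(f.getBoolean(obj));
            if (t == double.class) // exact bit pattern, so NaN and infinities survive too
                return "Double.longBitsToDouble(" + Double.doubleToRawLongBits(f.getDouble(obj)) + "L)";
            throw new IllegalArgumentException("unsupported field type: " + t);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

As the text notes, an object recovered this way is an ordinary object: nothing in the generated class arranges for further checkpoints.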

Stage 3: Allow objects to checkpoint themselves

Conceptually, Java checkpointing can be viewed as invoking, for each accessible object, a checkpoint method that stores the object into a checkpoint, and a recover method that recovers the object from a checkpoint. There are default checkpointing methods for each primitive type, and the checkpointing of aggregate types is built on top of these primitive defaults. Recovery is similar. In Stage 3, we will allow programmers to provide their own checkpoint and recover methods that override the defaults for particular objects.
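The checkpoint/recover view can be sketched as a Java interface. Everything here is a hypothetical illustration rather than an existing API: the names Checkpointable, checkpoint, recover, Vec, Particle, and Checkpoints are invented, and the primitive write and read methods of java.io.DataOutputStream and DataInputStream stand in for the per-primitive defaults. The aggregate Particle is checkpointed by composing those defaults with the checkpoint method of the object it contains.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical interface: each object knows how to save and restore itself.
interface Checkpointable {
    void checkpoint(DataOutputStream out) throws IOException;
    void recover(DataInputStream in) throws IOException;
}

class Vec implements Checkpointable {
    double x, y;

    public void checkpoint(DataOutputStream out) throws IOException {
        out.writeDouble(x); // primitive defaults
        out.writeDouble(y);
    }

    public void recover(DataInputStream in) throws IOException {
        x = in.readDouble(); // read back in the same order
        y = in.readDouble();
    }
}

class Particle implements Checkpointable {
    int id;
    Vec position = new Vec();

    public void checkpoint(DataOutputStream out) throws IOException {
        out.writeInt(id);         // primitive default
        position.checkpoint(out); // aggregate built on the contained object's method
    }

    public void recover(DataInputStream in) throws IOException {
        id = in.readInt();
        position.recover(in);
    }
}

// Hypothetical helpers that run an object's methods against an in-memory buffer.
class Checkpoints {
    static byte[] save(Checkpointable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            obj.checkpoint(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static void restore(Checkpointable obj, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            obj.recover(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A Particle's checkpoint is 20 bytes (one int and two doubles); restoring it into a fresh Particle reproduces the id and position.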

The reasons for this are twofold. First, programmers may exploit performance optimizations, since the default methods may save more than is necessary. For example, an object may be able to rebuild significant parts of its state from a smaller part of its state. Those parts may then be excluded from the checkpoint and simply rebuilt upon recovery. Moreover, programmers can use this as the interface to memory exclusion [PBKL,PCLBK]. Second, a programmer must sometimes define checkpoint and recover methods for correctness, namely when the object relies on native (non-Java) code. For example, the part of the Java API that deals with the state of the window system must provide its own checkpointing method, since it relies on state outside of the Java process. By using the interface of checkpoint and recover methods, we can enable checkpointing on large parts of the Java API in a clean fashion, and this may pave the way for clean implementations of applet checkpointing.
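The first reason, excluding state that can be rebuilt, can be illustrated with a hypothetical object whose large derived table is recomputed from a single saved integer. The Checkpointable interface and Checkpoints helpers are the same invented names used for the Stage 3 illustration, repeated so this block stands alone; PrimeTable is also hypothetical. The sieve table is never written: the checkpoint holds only the limit, and recover() rebuilds the table from it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical interface (not an existing Java API).
interface Checkpointable {
    void checkpoint(DataOutputStream out) throws IOException;
    void recover(DataInputStream in) throws IOException;
}

class PrimeTable implements Checkpointable {
    private int limit;
    private boolean[] composite; // derived state: excluded from the checkpoint

    PrimeTable(int limit) {
        this.limit = limit;
        rebuild();
    }

    // Sieve of Eratosthenes: recomputes the table from `limit` alone.
    private void rebuild() {
        composite = new boolean[limit + 1];
        for (int i = 2; (long) i * i <= limit; i++)
            if (!composite[i])
                for (int j = i * i; j <= limit; j += i) composite[j] = true;
    }

    boolean isPrime(int k) {
        return k >= 2 && k <= limit && !composite[k];
    }

    public void checkpoint(DataOutputStream out) throws IOException {
        out.writeInt(limit); // only the small part of the state is saved
    }

    public void recover(DataInputStream in) throws IOException {
        limit = in.readInt();
        rebuild(); // the large part is rebuilt on recovery
    }
}

// Hypothetical helpers that run an object's methods against an in-memory buffer.
class Checkpoints {
    static byte[] save(Checkpointable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            obj.checkpoint(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static void restore(Checkpointable obj, byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            obj.recover(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The checkpoint is 4 bytes however large the limit is; the trade-off is the CPU time spent re-running the sieve at recovery.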

Stage 4: Explore the possibility of Java bytecode checkpointing itself

The obvious last step is to let the Java bytecode itself direct the checkpointing, through compilation, extensions to the Java API, or both. With this functionality, any Java application is automatically empowered with AIC, because the checkpointing is defined in the bytecode, and checkpoints are themselves bytecode, which will direct further checkpointing.

This functionality may not be feasible, but it represents a direction of research that will be pursued.

References

[PCLBK] James S. Plank, Yuqun Chen, Kai Li, Micah Beck and Gerry Kingsley, Memory Exclusion: Optimizing the Performance of Checkpointing Systems, University of Tennessee Technical Report UT-CS-96-335, August, 1996.
[PBKL] James S. Plank, Micah Beck, Gerry Kingsley and Kai Li, Libckpt: Transparent Checkpointing under Unix, Conference Proceedings, Usenix Winter 1995 Technical Conference, New Orleans, LA, January, 1995, pp. 213-223.
[TH] M. Theimer and B. Hayes, Heterogeneous Process Migration by Recompilation, 11th International Conference on Distributed Computing Systems, pp. 18-25, 1991.