The Java bytecode verification problem

Vijay Saraswat
vj@research.att.com
AT&T Research , 180 Park Avenue, Florham Park NJ 07932

The problem of byte code verification for Java raises interesting technical challenges, not the least because it is not well-formulated. The primary material for the specification of this problem is considered to be the description given in JVMS (P.118-137), and the (somewhat incomparable) C implementation provided with Sun's JDK implementation of Java.

We pursue a first-principles approach to this problem. Eschewing both of these "definitions", we start by analyzing the structure of the Java Virtual Machine (JVM), as given in JVMS. This yields, for each of the 200-odd opcodes, a description of constraints that can be verified statically, prior to execution, in such a way that the constraints are guaranteed to hold at run-time, and hence need not be checked explicitly. We identify four stages of verification: off-line verification, in-situ verification, resolution-time verification and link-time verification. Off-line verification works only on the input class-file, generating constraints on the class-hierarchy in the execution context. These must must be checked during in-situ verification in the context of the executing JVM. This distinction breaks a vicious cycle that arises from the fact that Java allows arbitrary user-defined Java code to run during the byte-code verification process.

We present a compositional translation T from (a syntactic variant of) class files to concurrent constraint programs. Intuitively, for any class file C, running T(C) establishes whether or not C satisfies the off-line constraints; if it does, T(C) also generates the in-situ constraints, if any. This "declarative" reconstruction of the byte-code verification problem shows how to handle tricky technical issues such as the initialization of newly allocated objects, and the execution of subroutine calls in a purely declarative manner. (This technique of compositional translation into (concurrent) constraint programs is a general technique applicable to other type-analysis problems as well.)

In the light of this analysis, it becomes possible to classify the "structural constraints" presented in JVMS (P.118-137) as being necessary or merely sufficient.

With a formal analysis of the operational behavior of the JVM, currently under development, we expect to be able to prove that T is sound, in that the execution of (code in) a class C will not halt unexpectedly if T(C) does not generate false .