We analyze and prove some structural properties of Java runs which are needed for the proof that Java is type safe, and for the correctness proof of the compilation of Java to JVM bytecode. This includes the reachability analysis for expressions and statements and the rules of de_nite assignment for local variables
Most structural properties of Java runs are intuitively clear. Therefore a reader not interested in the technical details of mathematical proofs should skip this section and move directly. Some structural properties proved in this section are used for the proof that Java is type safe. Most structural properties are used later in the correctness proof of the compilation of Java to bytecode of the JVM.
The evaluation of expressions or statements may not terminate at all. An expression can contain an invocation of a recursive function which does not return. A statement can contain a loop which does not terminate. Our main goal is to prove a theorem relating the states of a possibly in_nite run of the ASM of Java with the recursive structure of expressions and statements.
In order to prove properties about runs of ASMs it is convenient to add the time as a subscript to the dynamic functions. Therefore by fn we denote the dynamic function f in the nth state of the run of the ASM. For example, posn is the value of the dynamic function pos in the nth state of a run of execJavaThread.
To some dynamic functions we add threads as superscripts. For example, posqn is the value of pos for thread q in the nth state of the run. The thread superscripts allow us to make statements about the local states of threads without distinguishing between the current thread and other threads.
De_nition (Context functions). For each thread q the functions
n , posqn
are de_ned in such a way that, if the
current thread in state n is q, then
112 8. Java is type safe
; methn = methqn
; restbodyn= restbodyq
posn = posqn
; localsn= localsqn
e current thread in state n is di_erent from q, then
contn(q) = (framesqn
n ; posqn
The functions are de_ned only if q is a valid pointer of type Thread in the nth state of the run, i.e., if q 2 dom(heapn) and classOfn(q) _h Thread.
When a thread executes Java code, control walks through the abstract syntax trees of method bodies. It enters expressions and statements, evaluates them and yields the results upwards for further processing. When methods are invoked or classes are initialized, new frames are pushed on the stack of the thread. When the methods return, the frames are popped again.
What does it mean that a thread is inside a phrase (expression or statement) in a certain interval of the run of the ASM? It means that during the interval the current position of the thread is always inside the phrase. Since methods may be invoked inside the phrase, new frames are created and the current position walks through the bodies of the invoked methods. Hence the current position leaves the phrase, although in the view of the parent frame the position is still inside the phrase.
We write l1 vpre l2, if the list l1 is a pre_x of l2. For positions _ and _,_ vpre _ means that _ is below _ in the abstract syntax tree. For stacks, frames1 vpre frames2 means that frames2 is an extension of frames1.
De_nition (Inside). Thread q is inside _ on the interval [m; n], if for all i with m _ i _ n
2. if framesqm
i , then _ vpre posq
What does it mean that a thread enters a phrase in a certain state of the run of the ASM? It means that in that state in the run, the thread is the active thread and the phrase is the phrase at the current position in the body of the current method, and it is unevaluated. At some positions, however, static initializers are called for the initialization of classes. When control returns from the class initialization, the phrase at the current position is still the same as before the initialization and so the thread `enters' the expression a second time. This second `entering' is excluded in the following de_nition.
De_nition (Enter). Thread q enters _ at m, if
1. threadm = q,
2. posm = _,
3. there is no k < m such that q is inside _ on the interval [k;m].
What does it mean that a thread evaluates a phrase in a certain interval of the run of the ASM? It means that the thread enters the phrase at the beginning of the interval and stays inside the phrase during the interval. The only changes made to restbody are inside the phrase. The local environment can be changed and extended during the evaluation because expressions and statements have side e_ects. The stacks of frames at the beginning and at the end of the interval are the same.
De_nition (Evaluation). Thread q evaluates _ on [m; n], if
1. q enters _ at m,
2. q is inside _ on the interval [m; n],
n = restbodyqm
['=_], where ' = restbodyq
) _ dom(localsqn
Given the endpoint of the evaluation of an expression or statement, the beginning of the evaluation is uniquely determined, since it is not allowed that the thread just returned from a class initialization.
Lemma . If q evaluates _ on the interval [m; n] and q evaluates _ on the interval [m0; n], then m = m0.
The ASM formalism allows us to speak about steps in a run of a machine.
Since the steps in the run of JavaT are di_erent from the steps of the single threads, we need some terminology to speak about the steps of an individual thread. The previous step of a thread is (the number of) its step which was done in the last state in the run of the ASM where the thread has been the current thread.
De_nition (Previous step). The previous step of thread q before n is m, if
1. m < n,
2. threadm = q,
3. for each i, if m < i < n, then threadi 6= q.
The walk of the current position through the body of a method can branch at certain positions. For most positions in the abstract syntax tree, however, when they are entered the preceding position is uniquely determined. It is called the predecessor of the position.
De_nition (Predecessor). Position _ is a predecessor of position _ in method _ (written _ __ _), if for each thread q of state n which is in _ at n and enters _ at n there is a previous step m of q before n such that
The direct subexpressions of an expression
Lemma (Predecessor). If the direct subexpressions (Table above) of _exp in the body of _ are _1exp1; : : : ; _n expn, then a __ _1 and _i __ _i+1 for each i with 1 _ i < n. For other kinds of phrases the predecessors are listed in Table below.
Proof. By induction on the run of execJavaThread. If the rule executed in the step of the ASM moves the current position upwards or does not move the position at all, nothing has to be shown, because in that case by de_nition, the thread in the current position does not enter an expression or statement.
Hence the rules which move the current positions inwards or forwards have to be considered only. ut
The stack of a thread may grow and shrink. New frames are pushed on the stack when class or instance methods are invoked, when class initializers are executed, or when new instances are created and a constructor is called.
The positions which cause the growth of the stack are categorized as follows:
De_nition (Position categories). We say that
1. position _ is a method invocation position for _ in:
The predecessors of expressions and statements
For each thread the current method is either a class method, an instance method, a constructor, or a class initialization method. If the stack of the thread is empty, then the current method is the run method of the thread.
De_nition (Run method). We say that C=m is the run method of thread q, if
1. m = run(),
2. getField(q; Thread=target) = r ,
3. classOf (r ) = A,
4. lookup(A; Thread=run()) = C.
The ASM rules which change the stack of a thread are invokeMethod and exitMethod. A new frame is pushed with invokeMethod when a class is initialized, when a class or instance method is invoked, or when a new instance is created and a constructor is invoked. A frame is popped with exitMethod when a value is returned to the invoking frame or when an exception is passed to the invoking frame. The next lemma simply states, that the current method has been called at some point in the past. The lemma must be proved simultaneously with Lemma.
Lemma 3 (Invocation). For each thread q of state n in the run of execJavaThread, either methqn
is the run method of q and framesqn
empty stack, or there exists k and m such that k _ m < n and
_ ( ; ; _; ),
2. one of the following is true:
is a class or instance method and posqm
is a method invocation
position for _ in the body of methqm
is a constructor and posqm
is a constructor invocation position
for _ in the body of methqm
is a <clinit> method, posqm
= _ and _ is a class initialization
position in the body of methqm
3. for each i, if m < i _ n, then framesqn
4. q evaluates _ on the interval [k;m].
Proof. By induction on the run of execJavaThread. In most steps of the run the frame stack of a thread remains unchanged and nothing has to be shown. A class or instance method is invoked at a method invocation position.
A constructor is invoked at a constructor invocation position. A <clinit> method is invoked at a class initialization position. ut
The following lemma says that execJavaThread respects the recursive structure of expressions and statements. At each point in the computation there is a unique point in the past where control entered the expression or statement at the current position. In the interval between the entry point and the current point the expression is evaluated according to Def. below.
This means that all changes made to restbody occur in the subtree rooted at the current position. Outside of this area, restbody is not touched.
Lemma (Evaluation). For each thread q of state n in the run of execJavaThread, if posqn = _, then there exists an m _ n such that thread q evaluates _ on the interval [m; n].
Proof. By induction on the run of execJavaThread. Lemma is used when values or exceptions are returned to the parent frame. ut
It is not possible that control enters an expression or statement and returns to the root position of the expression or statement before it is completely evaluated. At the current position in restbody there is always an unevaluated expression, an unevaluated statement, a value, the constant Norm, or an abruption.
Lemma . For each thread q of state n in the run of execJavaThread,
is neither a value nor the constant Norm nor an abruption,
is an unevaluated expression or an unevaluated
Proof. By induction on the run of execJavaThread. Lemma is used when the current position moves to the next subexpression. ut