Here we analyze the dynamic constraints, which are checked at run-time by the defensive VM, to extract from them link-time checkable conditions on bytecode which, if satis_ed, imply that the defensive VM executes the code without violating any run-time check for bytecode type assignments). We re_ne the compilation function by an appropriate type frame, which is associated to every instruction, and show that code, if generated from a correct Java program and following this certifying compilation scheme, does satisfy these link-time checkable bytecode conditions.
Later we will show that for a large class of programs bytecode type assignments can be computed using a _xed point computation. The defensive VM checks dynamic constraints at run-time, at the price of slowing down the performance of the system. The slow down can be avoided if checks are performed only once, namely before running the fast but trustful machine. This is the task of the Java bytecode veri_er.
At link-time, run-time data are not available. For example, the veri_er does not know which reference value will be created by a New instruction or which path the execution will take and, usually, it is impossible to check all possible paths. To compensate this, the Java bytecode veri_er imposes certain restrictions on the bytecode. For example, the veri_er veri_es every independent execution path through the code to ensure that for the veri_cation all reachable instructions in the code are veri_ed at least once. Similarly the veri_er abstracts from the concrete reference generated by a New instruction and restricts the form of the program in such a way that link-time information is su_cient to keep track of un-initialized instances.
Here we formulate such conditions by the notion of a bytecode type assignment. A bytecode type assignment assigns type frames to the code indices of method bodies and must satisfy several constraints. Our notion of bytecode type assignments di_ers from similar notions in the literature; for example it does not use the so-called subroutine call stacks of Stata and Abadi or the modi_cation histories of Qian. Our typing rules for return addresses are simpler than O'Callahan's type system.
We explain the problems with bytecode veri_cation and with current bytecode veri_ers. In particular we show that SUN's treatment of subroutines is inconsistent. We de_ned by stepwise re_nement the possible successors of (certifying) type frame for the de_nition of bytecode type assignments.
Bytecode veri_cation is complicated because of the instructions Jsr(s) and Ret(x ). Without these instructions bytecode veri_cation would be much simpler.
Here we explain the general problems of bytecode veri_cation, and show the particular problems and inconsistencies encountered in current bytecode veri_er implementations.
Why are subroutines problematic?
The instructions Jsr(s) and Ret(x) are used to implement the finally block of Java try statements. The finally block is called with the instruction Jsr(s) and the bytecode starting at code index s is called a subroutine. The instruction Ret(x) is used to return from the subroutine. Control jumps to the address stored in the local variable x . Since the same finally block may be called from di_erent positions, a subroutine is executed in di_erent environments. The subroutine must be polymorphic in the variables which are not modi_ed by the subroutine.
In Fig. below, the finally block is called from several positions: (1) Before the expression i * i is returned, its value is saved in a new local variable x, and then the finally block is executed. Afterwards the saved value of x is returned. (2) The finally block is called at the end of the try block, where the variable j has a value. 3) Because of errors which could occur in the try block, the compiler inserts a default handler which does the following: the value of a possible error or exception is saved in a temporary variable y and then the finally block is executed. After termination of the finally block, the saved value of y is thrown again.
the subroutine S is called from three di_erent positions.
Each time exactly one of the variables j, x, y has a value, whereas the other two variables are unde_ned. When S is called from block A, the variable x contains an int. When S is called at label C, the variable j contains an int.
When S is called in the exception handler H, the variable y contains a pointer which is compatible with Throwable.
Polymorphism is one of the problems with subroutines. Another problem is that subroutines are not always left via a Ret(x ) instruction. Subroutines
A polymorphic subroutine (; CD)
can also be left via a Cond(p; j) instruction or via a Throw. Unfortunately, it is not obvious from the bytecode where exactly a subroutine is left. It is also not obvious to which subroutine an instruction belongs. For example, the finally block can be left via a break statement. This fact is reected in the bytecode: the instruction `ifne R' jumps out of the subroutine S to the label R which belongs to the top-level. It is also possible to jump out of a subroutine to an enclosing subroutine.
Breaking out of a subroutine to the top level (; CD)
the instruction `ifne R1' jumps out of the subroutine S2 to the enclosing subroutine S1.
Breaking out of a subroutine to an enclosing subroutine (; CD)
Moreover, a subroutine can be left via an exception handler. the subroutine S throws an exception of type E. This exception is then caught by the top-level handler H. In some cases it is not so clear which variables are modi_ed by a subroutine just by looking at the bytecode.
4 Jumping out of a subroutine with an exception handler (; CD)
In Fig. below we see in the Java program on the left-hand side that the variable b is not modi_ed in the finally block. In the bytecode on the right-hand side, however, the only way to reach the instruction `istore b' is via the subroutine S. Therefore, why should the variable b not be considered as modi_ed by the subroutine S? Sun's treatment of subroutines is inconsistent. For example, the JVM speci_cation requires that subroutines are not recursive. Sun's JDK 1.2 veri_er, however, accepts some recursive subroutines and rejects others.
Which variables are modi_ed by the subroutine? (; CD)
The bytecode on the left-hand side in Fig. below is rejected with the error `Recursive call to jsr entry'. The bytecode on the right-hand side in , however, is accepted by Sun's veri_er although it is as recursive as the rejected bytecode. It turns out that recursive subroutines are harmless, because return addresses can be stored in local variables only . They cannot be stored in an array. They cannot be loaded on the operand stack. The only operations which are allowed with return addresses are duplication or deletion. Therefore, our notion of bytecode type assignment is focused on polymorphism and allows trivial forms of recursion.
Inconsistencies in Sun's JDK 1.2 veri_er (; CD)
Fig. below shows an example of a valid Java program which is rejected by Sun's JDK 1.2 veri_er[14, bug no. 4268120]. This should not happen, since a veri_er should at least accept all bytecode that has been generated by a correct compiler from valid Java programs.
A valid Java program rejected by Sun's JDK 1.2 veri_er (; CD)
The program in Fig. below suggests that bytecode veri_cation is not possible at all, because it is rejected by any bytecode veri_er we have tried including JDK 1.2, JDK 1.3, Netscape 4.73-4.76, Microsoft VM for Java 5.0 and 5.5 and the Kimera Veri_er. The problem is that in the eyes of the veri_er the variable i is unusable at the end of the method at the return i instruction, whereas according to the JLS [19, x16.2.14] the variable i is de_nitely assigned after the try statement [14, bug no. 4381996]. Our rules of de_nite assignment for the try statementstronger and therefore the program is already rejected by our
A valid Java program rejected by all known veri_ers (; CD)
Another Java program rejected by all known veri_ers (; CD)
What is the type of variable x in the bytecode? (; CD)
compiler. Fig. above shows a similar problem for labeled statements.
Why sets of reference types?
One of the main tasks of bytecode veri_cation is to infer the types of local variables in method bodies. Only the types of the arguments of a method are known because they are contained in the method signature in the class _le.
The types of the other local variables are not known and must be inferred. contains an example with a local variable x of type Comparable for the type hierarchy). Since the class Integer as well as class String implement the interface Comparable, both arguments i and s of the method can be assigned to the variable x in the body.
In the corresponding bytecode, the type of x is not known.
When the veri_er reaches label B during its static analysis, the variable x can contain an Integer or a String. Since both types are reference types, the veri_er tries to merge the two types. The JVM speci_cation says in that `the merged type of two references is the _rst common superclass of the two types and that such a reference type always exists because the type Object is a superclass of all class and interface types.' In our example the _rst common superclass of Integer and String is the class Object Hence, the type assigned to variable x at label B is Object. The method invocation at the next instruction, however, requires an argument of type Comparable and so the veri_cation process fails.
Sun's JDK 1.2 veri_er does not reject the bytecode. Instead it inserts an additional run-time check for methods with arguments of interface type.
Hence, the compatibility of method arguments has to be checked at runtime in contradiction to the JVM speci_cation. A better solution is to allow sets of reference types in the bytecode veri_cation process. The type of variable x at label B above is then the set fInteger; Stringg. The meaning of this type is `either Integer or String'. At the method invocation
A piece of the Java type hierarchy
in the next instruction, the veri_er has just to check that each element of the set of reference types assigned to x is a subtype of Comparable. No additional run-time checks are needed.
Therefore we extend the universe of verify types to include also _nite sets of reference types. Remember that reference types are class types, interface types, array types and the special type Null which is used as the type of the null pointer.
data VerifyType = : : :
j Powerset(Null j Class j Interface j Array)
A single reference type R is often identi_ed with the singleton set fRg. Examples of sets of reference types are:
fInteger; Stringg; fComparableg; fint[ ]; float[ ]g
The compatibility relation v has to be extended to such sets. Therefore we need the notion of an upper bound of a set of reference types.
De_nition (Upper bound). A type A is an upper bound of a set_ of reference types, if B _ A for every B 2 _ .
For example, the upper bounds of the verify type fFloat; Integerg are the following types: Number; Object; Comparable:
The classes Integer, Float and Double are direct subclasses of Number and implement the interface Comparable. The class Number is a direct subclass of Object and does not implement Comparable.
De_nition (Compatibility). Let _ and _ be _nite sets of reference types. We de_ne _ v _ i_ each upper bound of _ is an upper bound of _.
For example, the following compatibility relations are true: fInteger; Stringg v fComparableg
Reason: Integer and String implement the interface Comparable. fDoubleg v fInteger; Floatg
Reason: The upper bounds of fInteger,Floatg are Number, Comparable, Object.
De_nition (Array type). A set _ of reference types is called an array type, if there exists a type A such that A[ ] is an upper bound of _ .
For example, the verify type fint[ ]; float[ ]g is not an array type, because its only upper bounds are Object, Cloneable and Serializable. The verify type fInteger[ ]; Float[ ]g, is an array type, because Object[ ] is an upper bound. The verify type fNullg is an array type, too.
In the soundness and completeness proof for the bytecode veri_er the following properties of the extended compatibility relation are relevant:
Lemma (Properties of sets of reference types).
V1. _ v _ .
V2. _ v _ and _ v _ =) _ v _ .
V3. _ v fAg () A is an upper bound of _ .
V4. If _ v _ and _ v fObject[ ]g, then fA j A[ ] 2 _g v fA j A[ ] 2 _g.
Proof. The properties can be derived using the subtype relation for reference types in Def.