Printer Friendly

Type elaboration and subtype completion for Java bytecode.

1. INTRODUCTION

Type elaboration is a technique for type inference on verifiable Java bytecode. Verification provides type consistency rules that are based upon program flow and safety checking. These are weaker than the rules for static typechecking in Java source, and there are verifiable bytecode programs that are not typable under the Java typing rules. The central issue is that the type system of bytecode verification is based upon sets of types, and there may not be names for all of these sets in the Java type hierarchy. Subtype completion is a technique for adding a minimal number of new type names to make the bytecode typable.

The present work was motivated by our goal of using a strongly typed intermediate representation as part of the Marmot bytecode-to-native-code compiler [Fitzgerald et al. 2000]. Strongly typed programming languages have long been recognized as improving program correctness and enhancing efficient implementation. More recently, it has been observed that type-based intermediate representations and type-based compilation can extend these advantages to a compiler itself [Morrisett 1995; Tarditi 1996; Leroy and Ohori 1998]. The use of types provides benefits in debugging the compiler, as well as in optimization and garbage collection.

Many Java compilers use (or at least will accept) Java bytecode [Lindholm and Yellin 1999] instead of Java source programs as input [IBM 1998; Instantiations, Inc. 1998; NaturalBridge, LLC 1998; SuperCede, Inc. 1998]. However, some of the type information in the original Java source program is lost during the initial translation to bytecode. In order to use a typed intermediate representation, it is first necessary to reconstruct a strongly typed representation from the partially typed bytecode.

The type system that we have chosen as the basis of our typed intermediate representation is a relatively simple one that has several advantages. First, it is very similar to the type system of Java. Second, it does not require sets of types representations, as does bytecode verification. Third, it is easy for a human to read, verify, and comprehend the typings. Fourth, it arises in a principled way from the (implicit) type system of bytecode verification. Finally, it is efficient to typecheck a program under this type system.

Type elaboration is performed once, near the beginning of the compilation. After that, the fully typed program can be efficiently typechecked. In the standard mode of operation, the Marmot compiler will typecheck the program 10 times, and even on our largest benchmarks, each typecheck takes less than a second. During debugging, the system can perform dozens of typechecks in order to identify and isolate faults.

In addition to its utility in type-directed compilation, type elaboration for bytecode solves a practical problem for Java decompilers such as Mocha that attempt to reconstruct Java source from Java bytecode. Once again, the simplicity of the underlying type system and its similarity to the original Java type system are advantages for this problem, since the goal is to reconstruct the original program typings.

In order to describe type elaboration, it is necessary that we examine three languages: Java source code, Java bytecode, and a typed intermediate representation that we refer to as the Java Intermediate Representation, JIR. These three languages have distinct, but related type systems. The Java source language has a traditionally defined type system presented as part of the standard language definition [Gosling et al. 1996]. Bytecode, per se, is untyped (or partially typed), but the rules for bytecode verification provide further consistency requirements based on dataflow and safety checking. Finally, the type system for JIR is closely related to the original type system for Java, and can be given a conventional set of typing rules. This paper investigates the formal relationship between these three typed languages and their type systems. Our main technical contributions are:

(1) We present a practical algorithm for type elaboration that accepts any verifiable bytecode.

(2) We describe a technique called subtype completion, which transforms a subtyping system by conservatively extending its type hierarchy to a lattice. The technique is founded on the Dedekind-MacNeille completion, a mathematical technique for embedding a poset in a lattice.

(3) We formalize type-based safety checking present in Java bytecode verification. We show that subtype completion performed on a Java-like type system inserts exactly the extra types needed for verification, and it gives rise to a strongly typed intermediate representation (JIR) in a principled manner, together with a provably correct type inference algorithm.

The paper is organized as follows. In Sections 2 and 3, the types of bytecode, Java, and JIR are discussed. In Sections 4 through 6, we present an abstraction of the problem and present the technical results on subtype completion. In Section 7, we return from the abstract problem to the concrete problem, and describe a number of other issues that must be addressed in implementing type elaboration, along with some comments on complexity and performance. Finally, we survey related work in Section 8 and offer conclusions in Section 9.

2. TYPES IN JAVA BYTECODE

Java front-end compilers are responsible for translating Java source to bytecode. Java source code programs include complete static type information, but front-end compilers omit some of that information in the conversion to bytecode. The loss of type information in bytecode is apparent in several places:

--Local variables do not have type information.

--Evaluation stack locations are untyped.

--Values represented as small integers (defined as booleans, bytes, shorts, chars, and integers) are convolved within bytecode methods.

Separating the various types of small integers is especially useful because they are semantically distinct and valid in differing contexts. For example, while the representation of a boolean may be as another small integer type, boolean values should not be used in arithmetic expressions, and integer values should not be used where a boolean is expected.

Although bytecode has lost some of the original typing of the Java source, nevertheless, much of the original type information from the program source has been preserved:

--All class fields maintain representations of the originally declared types.

--All function formals have types.

--The return value of a method, if any, has a type.

--Verified bytecode implies certain internal consistency in the use of types for locals and stack temporaries.

What is missing is most of the type information within a method.

The bytecode specification includes provisions for debug types on locals. This would appear to simplify at least part of the problem of elaboration. Unfortunately, (a) the debug types for locals are optional, (b) they do not distinguish between small integer types, (c) debug information is not available for the VM stack locations, and (d) they are wrong or incomplete in some bytecode files.

Type elaboration assumes verified bytecode as input, but does not depend upon the local variable debug information. If it is available, and deemed reliable, it is easy to employ. Bytecode verification assures various dynamic safety properties of the input program. It does not directly solve the problem of type elaboration because it does not distinguish between small integer types and handles multiple inheritance issues differently than the Java type system does.

3. THE TYPE SYSTEMS OF JAVA AND JIR

The type system of Java is defined, in part, by the widening conversions of the language [Gosling et al. 1996]. These encompass both the subtyping rules for reference types and the implicit coercions for numeric types.

All of the widening conversions may be combined to form a partial ordering of the types of Java. We write A < B if there is a widening conversion from type A to type B. Figure 1 shows an example type hierarchy represented as a partial-order diagram. (1)

[FIGURE 1 OMITTED]

The type system of JIR is the same as that of Java except that all primitive numeric types are incomparable (i.e., there are no implicit representation-changing coercions) and the JIR system contains extra types in the subtype hierarchy.

Type hierarchies for Java programs, as in other object-oriented languages, are partial orders, but not necessarily lattices. Figure 2 contains the outline of four interface definitions that give rise to the fragment of a type hierarchy shown. In this hierarchy, there is no least upper bound of I and J and no greatest lower bound of SI and SJ.
Fig. 2. Type hierarchy representing multiple
inheritance using interfaces.

interface SI {
    void siMeth();
}

interface SJ {
    void sjMeth();
}

interface I extends SI, SJ {
    ...
}

interface J extends SI, SJ {
    ...
}


Figure 3 contains a sample method (shown in pseudocode rather than bytecode for clarity). The task for type elaboration is to solve for the type of local variable x relative to the type hierarchy in Figure 2. We can see from the definitions of x that its type must be a supertype of both I and J, and further from the two uses, it must be a subtype of both SI and SJ.
Fig. 3. Sample method that can not be typed
in the given type hierarchy.

void foo(boolean flag, I i, J j) {
    if (flag) {
        x = i;
    } else {
        x = j;
    }
    x.siMeth();
    x.sjMeth();
}


This method is an interesting case in bytecode verification. Verification defines the merging of two occurrences of a local variable at join points (such as the control point succeeding the if) as containing "an instance of the first common superclass of the two types" [Lindholm and Yellin 1999, p. 146]. As has been noted by Qian [1998], Goldberg [1997], and others, it is not clear how this should be interpreted when the types in question are interfaces, especially when multiple inheritance is involved. (2)

There appears to be general agreement, both in the work on formalizing Java bytecode verification and in the actual implementations of bytecode verifiers, that this step in verification requires that sets of types be employed. Moreover, the merge of the type states at the join point is the union of the possible types. Under this interpretation, the method in Figure 3 is verifiable, but not typable in the given type hierarchy [Goldberg 1997; Coglio et al. 1998; Qian 1998; Pusch 1999; Gagnon and Hendren 1999].

4. AN ABSTRACT TYPE SYSTEM FOR JAVA

In this section, we begin to study the problem of type elaboration in depth. We consider an abstract and simplified version of the problem which focuses on the core issue of how the three type systems (Java, bytecode, and JIR) are related. The typed language J is an abstraction of Java which captures the salient properties of the type elaboration problem. The syntax of the language J and its types are shown in Figure 4.
Fig. 4. Syntax of J and its types.

The language of J

Local variables          z
Parameters               x
Field names              a
Method names             f
Constants                c

Expressions              e ::= c | x | z | e.a | e.f([e.sub.1], ...,
                         [e.sub.n])

L Expressions            le ::= z | e.a

Statements               s ::= le = e | [return.sub.f] e | if(e)
                         [s.sub.1] else [s.sub.2] | let z = e in
                         s | [s.sub.1]; [s.sub.2]

Declarations             d ::= [omega] :: f([x.sub.1] :
                         [[tau].sub.1], ..., [x.sub.n] :
                         [[tau].sub.n]){s}

The Types of J

Primitive types          [pi]

Reference types          [omega] ::= null| ...

Base types ([T.sub.0])   [tau] ::= unit| [omega] | [pi]

Types (T)                [sigma] ::= [tau] | [omega].[tau] |
                         [tau] [right arrow] [tau]'


4.1 The Language J and Its Types

The language J defined in Figure 4 includes method declarations with typed parameters and assignable local variables introduced in let-statements. The types of J (ranged over by [sigma]) are built from base types (ranged over by [tau]) which include the special type unit together with reference types (ranged over by [omega]) and primitive types (ranged over by [pi]). Reference types include the special type null and object types. Primitive types include the boolean type and the numerical types of Java. Let T denote the set of all types, and let [T.sub.0] denote the set of base types. Base types are assumed to be organized as a poset H = <[T.sub.0], <[less than or equal to]>, which defines a subtype or inheritance hierarchy.

The expression form e.a is field selection. A field type is a pair [omega].[tau], where [omega] is an object type in which the field exists, and [tau] the type of the field itself. The form e.f([e.sub.1], ..., [e.sub.n]) is method invocation, and method types have the form ([[tau].sub.1], ..., [[tau].sub.n]) [right arrow] [tau]'. We assume that parameters, ranged over by x, include unique tokens [this.sub.[omega]] for each reference type [omega]. Local variables are introduced using let-statements of the form let z = e in s. Method definitions are accommodated by declarations [omega] :: f([x.sub.1], ..., [x.sub.n]){s}, which define a method f with body s, within the reference type [omega]. A return-statement is assumed to be tagged with the name of the method in whose declaration it occurs; any return statement in the declaration of method f must have the form [return.sub.f] e. A phrase, M, is either a statement, a declaration, or an expression.

Type system J. The type system J is defined in Figure 5. These rules define derivable typing judgments of the form [SIGMA]; A [??] M: [tau]. The intended reading of such a judgment is that, under the typing assumptions given by the signature [SIGMA] and the type environment A, the phrase M has type [tau]. A signature is a function mapping field names, method names, parameters, and constants to types. For M to be typable, all the field names, method names, and constants occurring in M must be given types by [SIGMA]. The signature is intended to model declared types and the types of the basic constants of the language, which include predefined functions, such as arithmetic functions. It follows that only local (let-bound) variables have no declared types. A type environment, A, is a set of type assumptions of the form z : [tau], which assigns type [tau] to local variable z. Only one assumption may occur for a given variable. (3) The rules of Figure 5 are parametric in a given subtype hierarchy H and the signature [SIGMA].

[FIGURE 5 OMITTED]

We need to require for a method name f that it is appropriately declared in the hierarchy of reference types. For this purpose, let Decl([omega]) denote the set of method names f such that f is declared as a method of [omega], with [omega] a reference type in H. Furthermore, define

sig([omega], f) = {[omega]' [member of] H | [omega] [less than or equal to] [omega]', f [member of] Decl([omega]')}.

In other words, sig([omega], f) denotes the set of all reference types above [omega] in H that declare a method named f. Whenever a method name f is used, we can now require that it must have a declared type by requiring sig([omega], f) [not equal to] 0. Also, by suitably renaming method names, we can assume without loss of generality that every method name f can be given a single declared type, [SIGMA](f). For a set S of reference types we define

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Notice that under this definition one has S [subset or equal to] S' => sig(S', f) [subset or equal to] sig(S, f).

We let J([SIGMA], H) denote the system obtained by using a specific hierarchy H and signature [SIGMA] in the rules for system J.

The type inference problem for J is to reconstruct types for the local variables of a program in such a way that the program is well typed according to the rules of Figure 5. More precisely, given a phrase M, a signature [SIGMA], and a hierarchy H, the type inference problem is to decide whether there exists a type [tau] and an assignment A of types to the locals of M such that [SIGMA]; A [??] M : [tau]. Note that because H can contain arbitrary finite subposets of interface hierarchies, the type inference problem for system J is NP-complete by reduction from satisfiability of inequalities over posets. The reduction follows from previous results on the complexity of subtype inference [Lincoln and Mitchell 1992; Tiuryn 1992; Pratt and Tiuryn 1996; Benke 1993; Hoang and Mitchell 1995], and a proof of NP-completeness for a framework similar to ours can be found in Gagnon and Hendren [1999].

5. BYTECODE VERIFICATION

This section formalizes the relevant part of the rules of Java bytecode verification as they apply to our intermediate language J. It takes the form of a system of flow constraints generated from a program and a set of safety-checking rules. The intention is that a program passes the verifier if and only if the safety-checking rules are satisfied by the least solution to the flow-system generated from the program.

5.1 Flow System

The flow system shown in Figure 6 conservatively approximates the set of types of the values that a subexpression can evaluate to. The system of constraints generated from phrase M is denoted F [??]M[??]. For the purpose of defining the flow system, we assume that each variable in the program has been renamed, so that the names are distinct. For each distinct occurrence of a subexpression e in the program, where e is not a local variable and not a parameter, the system uses a distinct flow variable named [[chi].sub.e] to describe the possible types of e. Moreover, for each local variable z and each parameter x there will be a unique flow variable [[chi].sub.z], respectively [[chi].sub.x]. Flow variables range over finite subsets of [T.sub.0]. All constraints take the form [[chi].sub.e] [subset or equal to] [[chi].sub.e'] or [[chi].sub.e] = [tau]. The latter form is a shorthand for [[chi].sub.e] = {[tau]}.

[FIGURE 6 OMITTED]

The flow rules shown in Figure 6 should be understood in conjunction with the verification rules shown in Figure 7. Notice that the flow rules of Figure 6 exploit type declarations to localize the flow computation as much as possible. For example, because every field name a is assumed to have a declared type [SIGMA](a), there is no flow rule for an assignment e.a = e'. Instead, the flow variable [[chi].sub.e.a] gets the singleton value of the declared field type, [tau], where [SIGMA](a) = [omega].[tau]. The verification rules, however, will contain a check to make sure that, for an expression of the form e.a = e', the flow value associated with e' is consistent with the declared type [tau] (see rule (S2) of Figure 7).
Fig. 7. Verification rules for system J.

(S1) For every occurrence of a subexpression of the form e.a:
check that [[chi].sub.e] [??] [omega], where [SIGMA](a) = [omega].[tau]

(S2) For every occurrence of a statement of the form e.a = e':
check that [[chi].sub.e'] [??] [tau], where [SIGMA](a) = [omega].[tau]

(S3) For every occurrence of a subexpression of the form e.f([e.sub.1],
..., [e.sub.n]): check that sig([[chi].sub.e], f) [not equal to] 0 and
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], where [SIGMA](f)
= ([[tau].sub.1], ..., [[tau].sub.n]) [right arrow] [tau]'

(S4) For every occurrence of a statement of the form [return.sub.f] e:
check that [[chi].sub.e] [??] [tau]', where [SIGMA](f) = [tau]
[right arrow] [tau]'

(S5) For every occurrence of a statement of the form if(e)
[s.sub.1] else [s.sub.2]: check that [[chi].sub.e] [??] boolean


LEMMA 5.1. For every phrase M, the constraint system F [??]M[??] has a least solution.

PROOF. See Appendix A. []

5.2 Verification Rules

For a given phrase M, let [[chi].sub.e] [subset or equal to] [T.sub.0] denote the meaning of flow variable [[chi].sub.e] under the least solution to F [??]M[??]. Then the verification rules for M are as defined in Figure 7. These rules use the notation S [??] [tau], where S is a subset of [T.sub.0], defined by setting S [??] [tau] iff [for all] [tau]' [member of] S. [tau]' [less than or equal to] [tau]. If the least solution to F [??]M[??] satisfies the verification rules of Figure 7 using signature [SIGMA] and hierarchy H, then we say that M is safe with respect to [SIGMA] and H.

LEMMA 5.2. If M is typable in system J([SIGMA], H), then M is safe with respect to [SIGMA] and H.

PROOF. By induction on the proof that M types in system J([SIGMA], H). []

Note that verification accepts more programs than system J. Consider the method foo shown in Figure 3. It cannot be typed in System J, because the conditional requires x to have a type larger than both I and J, and the method invocations require x to have a type smaller than both SI and SJ; but no such type exists in the hierarchy shown in Figure 2. However, since [[chi].sub.x] = {I, J} holds for the least solution to F [??]foo[??], we have [[chi].sub.x] [??] SI and [[chi].sub.x] [??] SJ, and hence it is easy to see that foo satisfies the verification rules of Figure 7.

6. SUBTYPE COMPLETION AND THE JIR TYPE SYSTEM

We will now show how the type system of JIR emerges by a completion of system J. The resulting JIR type system is a conservative extension of system J, which accepts exactly the verifiable (safe) programs. The construction is an application of the Dedekind-MacNeille completion known from lattice theory [MacNeille 1937; Birkhoff 1995; Davey and Priestley 1990]. Intuitively, this completion technique enriches the hierarchy H to a minimal lattice, by inserting missing least upper bounds and greatest lower bounds into H. Minimality means that the completion inserts new elements only where necessary.

Other completion methods are known, such as powerset completion and ideal completion [Davey and Priestley 1990]. Moreover, the set-based flow-system of Section 5 is of course sufficient for verification. However, these methods introduce "unnecessary" elements. Consider a hierarchy P with types {[perpendicular to], A, B, C, D, [??]}, ordered by x [less than or equal to] [??] and [perpendicular to] [less than or equal to] x for all x in {A, B, C, D}. Ideal completion will include a distinct element for each of the 16 subsets of {A, B, C, D}. Each such element represents the least upper bound of the types in the subset. Flow-based safety checking may consider any subset of P. In contrast, Dedekind-MacNeille completion inserts no elements at all, since P is already a lattice and hence the minimal lattice containing itself.

The result that type elaboration captures bytecode verification shows that, even though flow-based safety checking may consider many more sets than are produced by completion, the distinctions that can be made using these "extra" sets are irrelevant for safety.

6.1 The Dedekind-MacNeille Completion

Let <P, [less than or equal to]> be an arbitrary poset. The Dedekind-MacNeille completion of P, denoted DM(P), is the least complete lattice containing P as an isomorphic subposet. An order ideal is a downward closed subset of P, and the principal ideal generated from an element, x, is defined as [down arrow] x = {y [member of] P | y [less than or equal to] x}. The poset P is contained in DM(P) by the embedding x [??] [down arrow] x. In particular, DM(P) preserves existing joins and meets in P. (4) We proceed to outline how DM(P) is constructed from P. If A [subset or equal to] P and x [member of] P we write x [less than or equal to] A if and only if x [less than or equal to] y for all y [member of] A, and we write A [less than or equal to] x if and only if y [less than or equal to] x for all y [member of] A. Define the sets [A.sup.u] and [A.sup.l] as

[A.sup.u] = {x [member of] P | A [less than or equal to] x} and [A.sup.l] = {x [member of] P | x [less than or equal to] A}.

Define the operator C by C(A) = [A.sup.ul]; then one has the basic properties A [subset or equal to] C(A), A [subset or equal to] B implies C(A) [subset or equal to] C(B), and C(C(A)) = C(A). Now define the family of subsets DM (P) by setting

DM(P) = {A [subset or equal to] P | C(A) = A},

i.e., DM(P) is the family of all subsets of P that are closed with respect to the operator C, ordered by set inclusion. Meet and join in DM(P) are given by

(1) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where the operator C is as defined above. All elements of DM(P) are order ideals, but not every ideal is an element of DM(P).

6.2 The JIR Type System

We now show how the type system of JIR arises from that of J([SIGMA], H) by applying the Dedekind-MacNeille completion to J. Its definition is given in Figure 8. This system is isomorphic to system J([SIGMA], H) (Figure 5), in which the type structure has been reinterpreted over DM(H). Its base types are just the elements of DM(H), ranged over by I. We construct method types I [right arrow] I' by setting

I [right arrow] I' = {([[tau].sub.1], ..., [[tau].sub.n]) [right arrow] [tau]' | [[tau].sub.i] [member of] [I.sub.i], [tau]' [member of] I'},

and for a set [OMEGA] [member of] DM(H) consisting of object types and a base type element I we define the field type [OMEGA].I by setting

[OMEGA].I = {[omega].[tau] | [omega] [member of] [OMEGA], [tau] [member of] I}.

We translate types [sigma] of J to types dm([sigma]) of JIR by taking principal ideals of base types. (5)

[FIGURE 8 OMITTED]

We write Boolean = [down arrow] boolean, Unit = [down arrow] unit and Null = [down arrow] null. A signature [SIGMA] is then translated to the signature DM([SIGMA]) = dm [omicron] [SIGMA]. The resulting system is denoted JIR(DM([SIGMA]), DM(H)).

The order relation in Figure 8 can be thought of in two (isomorphic) ways: as set inclusion, or as an extension of the order on H. It is useful to stress the latter view in the context of decompilation, where we desire typings that are as close as possible to the type system J. The completion DM(H) is instrumental

to this by inserting only a minimal number of new points into H. In the example of Figure 2, this amounts to inserting a new type, IJ, which is the least upper bound of I and J and the greatest lower bound of SI and SJ.

6.3 Safety, Typability, and Type Inference

This section contains our main theoretical results on typability and type inference for JIR. We characterize the type inference problem for JIR, and show that the type system of JIR exactly captures the notion of safety of bytecode verification.

Type elaboration for JIR reconstructs types for local (let-bound) variables. Type elaboration can be performed by solving a system of equalities and inequalities between types and type variables generated from the subexpressions of a given program, as specified in Figure 9. The constraint system generated from the phrase M is denoted I [??] M [??]. Type variables [alpha] [member of] TyVar range over elements of DM(H). The constraint generation rules are given in terms of a signature [SIGMA]. Constraints take the form [xi] [less than or equal to] [xi]', where [xi] and [xi]' are either a constant from DM(H) or a type variable, [alpha].

[FIGURE 9 OMITTED]

The following lemma records the fact that typability in the JIR type system is exactly captured by satisfiability of the constraint systems I [??] M [??].

LEMMA 6.1. A program M is typable in JIR(DM([SIGMA]), DM(H)) if and only if the system I [??] M [??] is satisfiable in DM(H). Moreover, every solution to I [??] M [??] corresponds to a valid typing derivation in JIR(DM([SIGMA]), DM(H)).

PROOF. The constraint system is a close reformulation of the typing rules. A detailed proof can be constructed along the lines of Theorem 2.1 in Wand [1987] and Kozen et al. [1994], and is omitted. []

Our next theorem establishes that the type system of JIR exactly captures the bytecode verification system defined in Section 5. The proof essentially consists of showing that the least solution to F [??]M[??] can be translated to a minimal solution to I [??] M [??], provided that M is safe, and, conversely that a minimal solution to I [??] M [??] can be translated to the least solution to F [??]M[??] such that the solution satisfies the verification rules.

THEOREM (SOUNDNESS AND COMPLETENESS). A program M is typable in system JIR(DM([SIGMA]), DM(H)) if and only if M is safe with respect to [SIGMA] and H.

PROOF. See Appendix A for the proof. []

We now consider type inference. By Lemma 6.1, type inference for JIR reduces to solving type constraints over the lattice DM(H). Our next theorem characterizes the least solution to a satisfiable type constraint system. The theorem yields a low-order polynomial type inference algorithm, and it is the foundation for the JIR type inference algorithm used in type elaboration.

Note that all type constants in I [??] M [??] are principal ideals, of the form [down arrow] [tau]. To solve the constraints, it is not necessary to actually form these ideals, because we can represent a principal ideal, [down arrow][tau], by its generator, [tau]. Accordingly, we define a translation [??]*[??] on types, given by [??][down arrow][tau][??] = [tau], [??][alpha][??] = [alpha], and we lift the translation to constraint sets C of the form I [??] M [??] by defining [??]C[??] = {[??][tau][??] [less than or equal to] [??][tau]'[??] | [tau] [less than or equal to] [tau]' [member of] C}. If C = I[??] M [??], and [alpha] is a variable in C, define the set [D.sub.[alpha]] by setting

[D.sub.[alpha]]={[tau] [member of] H | [tau] [less than or equal to] [alpha] [member of] [[??]C[??].sup.*]}

where [[??]C[??].sup.*] is the transitive closure of [??]C[??].

THEOREM 6.3. Let C = I [??] M [??] and assume that C is satisfiable. Then there is a unique least solution [mu] to C in DM(H), which is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

PROOF. See Appendix A for the proof. []

Even though DM(H) may be exponentially large in the size of H, the solution formula in Theorem 6.3 only relies on types which are present in the constraint set. Therefore, solving a system I [??] M [??] derived from a program M constructs only the types in DM(H) which are necessary for typing the particular program M. One can regard this as a kind of lazy completion, which avoids the exponential blow up; Theorem 6.3 gives rise to a polynomial time type inference algorithm. More details on how type elaboration is implemented are given in Section 7.

We will show one more property of DM(H) in the following lemma. It provides an optimization of the formula in Theorem 6.3, and it gives a succinct representation of the sets produced by completion. Whenever a set in DM(H) is isomorphic to an element in H, the representation gives back that element automatically. It is therefore the basis of "decompiling" the types of DM(H) back to the types of H.

To state the lemma, we give a few definitions. For subsets A and B of H, we define the relation [??] by

A [??] B iff [for all] x [member of] B. [there exists]y [member of] A. y [less than or equal to] x

where the relation [less than or equal to] is the order relation of H. If A is a subset of H and x [member of] A, then x is called a minimal element of A iff y = x for any element y [member of] A with y [less than or equal to] x. Let Min A denote the set of minimal elements of A.

LEMMA 6.4. Let A and B be subsets of the poset H. If H has no infinite descending chains, then one has

[A.sup.ul] [subset or equal to] [B.sup.ul] [??] Min ([A.sup.u]) [??] Min ([B.sup.u]).

PROOF. See Appendix A for the proof. []

The previous lemma is applicable to solution sets of the form [(D.sub.[alpha]]).sup.ul] from Theorem 6.3. The lemma shows that the function that maps a set of the form [A.sup.ul] to the set Min ([A.sup.u]) is an order isomorphism. (6) We can therefore represent (more efficiently) the operation [*.sup.l] by the operation Min. Moreover, since [phi]([tau]) = [down arrow][tau] embeds H into DM(H), we know that

(2) [[phi].sup.-1]([A.sup.ul]) = [tau] [??] Min([A.sup.u]) = {[tau]}

(if [A.sup.ul] = [down arrow][tau], then [A.sup.u] = [A.sup.ulu] = [up arrow][tau]; hence Min([A.sup.u]) = {[tau]}. Conversely, if Min([A.sup.u]) = {[tau]}, then [A.sup.ul] = [{[tau]}.sup.l] = [down arrow][tau]). This shows that the representation given by Lemma 6.4 automatically maps elements of DM(H) back to H, whenever this is possible. The application of Lemma 6.4 in type elaboration is discussed in more detail in Section 7.

7. IMPLEMENTING TYPE ELABORATION

Type elaboration has been implemented as part of the Marmot optimizing compiler [Fitzgerald et al. 2000]. This section describes how type elaboration was implemented for the full Java bytecode. In addition to the issues formalized in the abstract system of the last few sections, a number of other, more pragmatic issues must be addressed, including that of typing small integer variables and the Java definition of covariant array subtyping.

7.1 Preliminary Processing

Because the stack-based bytecode is not a convenient compiler intermediate form for reasons beyond type elaboration, bytecode is first converted to a conventional temporary-variable based intermediate form, JIR. In JIR, references to the interpreter stack have been replaced by explicit temporaries which may be treated as normal local variables. (7)

It is legal in bytecode for a single local variable to hold values of distinct types at different places in the method. For example, local_3 may be used as both an int and as an Object within a single method. In verifiable bytecode, this is valid only if the lifetimes of the two uses of the local do not overlap (ignoring subroutines for the moment). Because type elaboration is required to assign a single type to each variable, it is necessary to separate any ambiguous uses of locals. This can be accomplished by renaming all uses of variables with distinct lifetimes. In Marmot, this is accomplished as a by-product of converting to SSA form [Cytron et al. 1989]. SSA form has the property that all static assignments to a variable have unique names. In the current example, the first name might become local_3' 1 with type int and the second local_3' 2 with type Object.

Java bytecode includes instructions that support a form of lightweight subroutines which preserve the local variable context. Such subroutines may be used to represent the finally part of try/finally handlers. The verification rules for locals and their interactions with subroutines are complex. They allow, for example, multiple types for the same live local so long as that local is not referenced in the finally block. As Freund [1998] and O'Callahan [1999] have noted, the space savings from using bytecode subroutines does not appear to justify the substantial additional complexity.

In Marmot, we chose to eliminate these subroutines by inline expansion. Type elaboration could be made to support subroutines directly, and in fact, we prototyped the code to support them, but eventually decided that the simpler expedient of inlining them was the more elegant solution.

For the purposes of the following description of type elaboration, it is assumed that the bytecode input has been preprocessed such that all local variables and stack temporaries have been assigned designators, and all ambiguous uses with distinct lifetimes have been separated.

7.2 The Type Elaboration Algorithm

After preprocessing to JIR, all of the locals that did not have manifest types (either declared or credible debug information) in the bytecode are assigned unique type variables of the form [[alpha].sub.n].

Step 1: Constraint Collection. Constraint collection proceeds on a per-method basis using constraint formation rules analogous to those given in Figure 9. Equality constraints, x = y, are represented as two inequality constraints, x [less than or equal to] y and y [less than or equal to] x.

For the purpose of constraint collection, small integer constants are given the type of the smallest containing small integer type. The signatures for phi applications, inserted as part of the translation to SSA form, are types of the from [[alpha].sup.n] [right arrow] [alpha] where n is the arity for the phi function, and [alpha] is a fresh type variable for each distinct phi.

Step 2: Constraint Closure. Java defines covariant subtyping for array types: by definition A [] [less than or equal to] B [] iff A [less than or equal to] B for all reference types A and B. This rule requires that additional constraints be added to the constraint set in a process called constraint closure. Whenever a constraint is established between two potential array types, another constraint is induced between their element types. For example, if A [] [less than or equal to] B [] is established, then A [less than or equal to] B is also added to the constraint set. Further, recursively, if either A or B is a potential array type, then a constraint relating their element types is added. A potential array type is defined to be an explicit array type A[] or a type variable [alpha] that is related to (greater or less than or equal) to a potential array type.

To relate a type variable's element type to a potential array type, say A[], a fresh type variable is created, and then [alpha] = [[alpha].sub.elt][] is added to the constraints. Then A [less than or equal to] [[alpha].sub.elt] or A [greater than or equal to] [[alpha].sub.elt], as appropriate, is added to the constraint set. This in turn may require further closure on the constraint set. If a potential array eventually turns out not to be an array, then the type variable introduced as its "element" type is disregarded.

The result of constraint collection and constraint closure is a finite set of constraints of the form A [less than or equal to] B, which relate types and type variables as employed in the program. The next task is to solve the constraints, i.e., to find an assignment of the types for all type variables such that the constraints are satisfied.

Step 3: Cycle Elimination. Type elaboration first eliminates cycles in the constraint set by computing the strongly connected component of the constraints under the order relation [Tarjan 1972], and examines the acyclic directed hypergraph induced from the constraint graph by collapsing the nodes in a strongly connected component. Since our type structure is a partial order, all types within a strongly connected component, SCC, are equal, and all type variables in it will receive the same assignment in the solution. The resulting graph is called the SCC graph and represents a partial order, the SCC order.

The SCC graph is then traversed in depth-first order, and the types in each strongly connected component are computed using Theorem 6.3.

Step 4: Constructing Filters. In this step, order filters (8) of the original type hierarchy are used in the construction of solution types. This is done by computing the values [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] for all variables [alpha] in the constraint graph, according to Theorem 6.3. Each such value is a subset of H. Note that the intersection of filters is a filter. The filters are computed incrementally by a single depth-first traversal of the SCC graph, where the filter for each node is computed as follows:

(1) If the node contains a base type, T, the solution type is the principal filter generated from T. The principal filters may be precomputed and cached for each source type.

(2) If the node only contains type variables, the solution type is the intersection of the solution types of all immediate predecessors (lower elements in the SCC order).

(3) The solution computed by the previous two steps is cached at each node, so that it can be used to incrementally compute the solutions for its immediate successors in the SCC order in 2.

Step 5: Constructing Types. In this step, we compute minimal elements of the filter intersections computed in Step 4. This turns the filters computed in Step 4 into types (isomorphic to those) of DM(H) and at the same time maps them back to H whenever possible. This step is founded on Lemma 6.4, which allows us to represent the value [A.sup.l] by MinA, where A is a filter intersection. The type hierarchy H may have infinite descending chains due to the array type constructor:

Object [greater than or equal to] Object[] [greater than or equal to] Object[][] [greater than or equal to] ...

However, for any given program, the depth of array types will be bounded, and all descending chains will be finite for the section of H that matters for typing the given program. Lemma 6.4 is therefore applicable.

Step 6: Applying the Solution. One pragmatic issue is that it may not be possible to separate the uses of the small integers. A node in the SCC graph will normally contain at most one primitive type. For example, Object and Clonable would not be in the same SCC node in a verifiable program, since they are distinct in the type hierarchy partial order. However, it is possible for small integer types, e.g., boolean and short, to occur in the same node. This may happen because bytecode verification does not distinguish between these types, and so bytecode may legally use a value of one small integer type where the another is expected (e.g., a short 1 as a boolean condition).

This situation arises rarely in practice. When it does, all type calculations are based upon the join of the small integer types contained in the node. This is equivalent to equating, for example boolean and short, in the type hierarchy for the method being elaborated.

To complete type elaboration, the solution is recorded for each type variable. Further, any implied widening conversions that involve representational changes (e.g., short to integer) are made manifest by inserting explicit coercions in the JIR.

If the bytecode for a method convolves integer types in a way that causes a larger integer value to be used in a context expecting a smaller integer, then applying the solution will also introduce narrowing conversions for small integers. This is the only place where narrowing conversions (ones that lose information) are introduced by type elaboration.

A final pragmatic issue is that the precise definitions of classes and interfaces in the Java type system can, in rare instances, preclude inserting the desired completion point into the type hierarchy. Figure 10 shows a problematic case. This figure shows a crown-like class hierarchy with a combination of classes and interfaces. The point BD is a completion type for classes B and D. If BD is a class then D would require multiple inheritance of superclasses. If BD were an interface, then the instance variables of B and D are not representable (Java interfaces may include static fields, but not instance fields).

[FIGURE 10 OMITTED]

There are several possible work-arounds for this problem. The JIR type system could be modified to allow either a class or an interface to be inserted. Alternately, the point may be typed as Object and down-casts at uses inserted. Note, that since these casts can never fail, they may be implemented without any run-time cost. We chose the last option because it keeps the type system of JIR close to the original Java type system and because it can be implemented with almost the same mechanism used to narrow small integer types.

7.3 Complexity

As implemented in Marmot, in the worst case, the preprocessing of the bytecode (conversion to temporary form, lifetime splitting via SSA, and subroutine inlining) is exponential in the size of the original bytecode. It would suffice to implement preprocessing using a quadratic algorithm, resulting in a linear output. (9) Let m represent the size of the program after preprocessing.

Constraint collection (Step 1) takes time O(m) in the input size of the program (i.e., after preprocessing). Constraint closure (Step 2) takes time and results in a constraint graph that is O(d * m) = O([m.sup.2]) where d is the maximal depth of array type constructors. SCC formation (Step 3) is linear in the size of its input graph: O(d * m) in this case. Calculating filters (Step 4) of lower bounds involves intersecting subsets of H. The sets are of size bounded by h, where h is the size of the hierarchy H. An intersection is performed at most once for each edge in the closed constraint graph. The intersections take time O(h) = O(m) per edge, hence O(d * m * h) in total. Precomputing the filters (represented as boolean vectors of length h) for all type constants requires O(h) calls to reachability in H (viewed as a graph) and hence is O([h.sup.2]). The total cost of Step 4 is therefore O([h.sup.2] + d * m * h) = O([m.sup.3]). Constructing types (Step 5) by computing minimal elements of a set A [subset or equal to] H can be done by processing the elements of A in reverse postorder: for each [x.sub.i] [member of] A taken in that order, we mark all unmarked nodes in H reachable from x in H. If [x.sub.i] is unmarked, we add it to the set Min A. This step visits an edge in H at most once (edges from marked nodes are not visited again); hence Step 5 can be done in time O(h). Finally, applying the solution (Step 6) takes O(m) time. We have shown:

THEOREM 7.1. Let m be the size of the preprocessed program; let d be the maximal depth of array constructors in the program; and let h be the size of the hierarchy H. Then type elaboration can be computed in time O([h.sup.2] + d * m * h) = O([m.sup.3]).

Thus, the complexity of type elaboration excluding preprocessing is O([m.sup.3]). If preprocessing were implemented using linear encoding of subroutines and variable lifetime splitting, then O(m) = O(n) and type elaboration takes O([n.sup.3]) time, where n is the size of the original program. In practice, there is some evidence that the conversion to SSA and subroutine inlining are linear in the average case [Cytron et al. 1989; Freund 1998]. Also, d and h will be small relative to n in most cases.

7.4 Empirical Results

While the worst-case complexity of type elaboration is in terms of the overall program size, it is interesting to examine the cost of type elaboration relative to the size of the individual methods. Figure 11 shows the time it takes to type elaborate a method relative to the size of the unpreprocessed bytecode method. Note that the data are presented on a log-log scale, which makes the data for the very small method appear distinctly on the left of the graph. The data for this figure represents type elaboration run over approximately 22,300 methods from 65 programs. (10) The methods range in size from 1 to 200,634 bytes. The largest method, a class initialization method written by an LALR parser-generator, took 18.5 seconds to type elaborate. Data were collected on an otherwise idle dual-processor Pentium II/300MHz processor computer running Windows NT/4.

[FIGURE 11 OMITTED]

Type elaboration is run early in the optimization pipeline, and hence must solve types for many more methods and many more locals than will persist at later stages. Type elaboration times for entire benchmark programs range from 0.3 seconds to 45.5 seconds, representing approximately 4% of the compilation time. For comparison, SSA conversion represents approximately 5% of compilation time.

Only 2 of the 65 programs had any methods that required that the small integer types be combined (one had 3 methods that convolved small integers; the other had 27 methods).

8. RELATED WORK

Gagnon and Hendren [1999] also present an algorithm for type inference for Java bytecodes based upon solving constraints. Although superficially similar, the approaches differ in important ways. Gagnon and Hendren do not introduce new types. Type elaboration successfully types verifiable bytecode that Gagnon and Hendren's technique can not. Type elaboration is polynomial whereas their technique is exponential and type elaboration supports typing of the small integer types.

Palsberg and O'Keefe [1995] have shown that their notion of safety analysis is equivalent to typability in a subtyping system. A major difference to our results is that their type system is already lattice-based, and the problem of completion does not arise in their work.

Previous work, including Lincoln and Mitchell [1992], Tiuryn [1992], Pratt and Tiuryn [1996], and Benke [1993], has established that subtype inference over arbitrary posets is intractable. However, if the subtype order on base types is a lattice, the problem becomes PTIME solvable [Tiuryn 1992]. Subtype completion exploits this fact by transforming a poset to a lattice. The transformation simplifies the problem and makes it easier to solve. As we have shown, this simplified formulation is adequate for our notion of type safety.

The use of upward and downward closed sets, corresponding to order-filters and order-ideals of types, is standard in many works on object-oriented analysis (e.g., Bacon [1997]). Ait-Kaci et al. [1989] present data structures for efficient lattice operations on partially ordered sets representing inheritance hierarchies. See also Caseau [1993], which describes an algorithm for lattice completion.

There has been a substantial amount of work on formalizing various aspects of bytecode verification [Qian 1998; Stata and Abadi 1998; Coglio et al. 1998; Yelland 1999; Pusch 1999; O'Callahan 1999; Goldberg 1997]. Only a subset of these directly address the issue of verification with class hierarchies that have multiple inheritance. Others are focused on particular aspects of verification such as the polymorphism in subroutines [Stata and Abadi 1998; Freund and Mitchell 1999; O'Callahan 1999] or initialization of locals [Freund and Mitchell 1998]. Goldberg [1997] and Qian [1999] formalize bytecode using dataflow and safety. Coglio et al. [1998] formalize verification using constraints. A number of papers, including Oheimb and Nipkow [1999] and Syme [1997] have formalized the Java type system. League et al. [1999] describe a typed intermediate language for Java based upon system [F.sup.[omega]].

Shivers [1991] presents a technique recovering what we called small integer types for Scheme. That domain is different, and more difficult, because the programs are not statically typed.

9. CONCLUSION

This paper has presented type elaboration, a practical algorithm for recovering a strongly typed intermediate representation from partially typed verifiable bytecode. We showed that the type system of JIR arises in a principled way from the type system of Java and the flow-based verification of Java bytecode. For an abstract formalization of the problem, we showed that the technique of subtype completion adds just enough types to the subtype hierarchy to precisely correspond to the rules for bytecode verification, allowing us to derive a provably correct type inference algorithm.

In addition to issues caused by multiple inheritance, a practical implementation of type elaboration must work for verifiable bytecode that includes distinct variables assigned the same name, convolved small integer types, and the Java rule for covariant array subtyping. We have sketched how these pragmatic issues are addressed in the implementation of type elaboration in Marmot. Although the asymptotic complexity of type elaboration is cubic, it appears to be acceptable in practice.

Type elaboration and the use of a strongly typed intermediate representation has proved to be a useful technique in the development of the Marmot compiler. Although having precise and accurate types is useful in optimization, the main benefit has been as a machine-verifiable weak semantic correctness check used in debugging the optimizations.

Of course, typechecking the intermediate representation can find only some errors in the compiler. The type system that we employ is relatively weak. By employing stronger type systems, it would be possible to catch a larger class of incorrect transformations [League et al. 1999; Morrisett et al. 1997; Necula 1998].

In addition to being too weak, the type system can also be too strong. A minor annoyance in employing a strongly typed intermediate representation is that occasionally one must augment optimization transformations so as to preserve typability of the intermediate representation. Even in the simple case of copy propagation, transformations must be inhibited in certain cases. For example, it is not legal to propagate a null constant into an array indexing context (because the element type is not explicitly stated). Nor is it legal to propagate a smaller array type into the left-hand side of an array assignment replacing a larger array type (because of issues precipitated by the covariant array subtyping rules of Java).

The technique of subtype completion can transform an intractable type inference system into a tractable one while preserving essential safety properties. This may be of independent interest beyond its application to type elaboration.

APPENDIX

A. PROOFS

A.1 Proof of Lemma 5.1

Before proving the lemma, we give a technical definition. For a system F [??]M[??] of flow constraints (Figure 6), we divide its flow variables into two categories, typed and untyped. A flow variable [[chi].sub.e] is called typed, if e is a formal parameter, e is of the form [this.sub.[omega]], e is a method invocation, e is a constant, or e is a field selection. A flow variable is untyped if it is not typed.

LEMMA 5.1 For every phrase M, the constraint system F [??]M[??] has a least solution.

PROOF. To see that any system of the form F [??]M[??] has a least solution if it has any solutions, notice that the constraints in F [??]M[??] have one of the forms [chi] = [tau] or [chi] [subset or equal to] [chi]'. Moreover, constraint variables range over sets of types, which form a lattice. Constraint resolution for systems of the form F [??]M[??] therefore falls within the framework of monotone inequalities over a lattice, which implies the existence of least solutions to solvable constraint systems [Rehof and Mogensen 1999].

To see that every system F [??]M[??] indeed has a solution, note, from the definition of F, that every constraint has the form [[chi].sup.t] = [tau] or [[chi].sub.e] [subset or equal to] [[chi].sup.u], where [[chi].sub.e] is either typed or untyped, [[chi].sup.t] is typed, and [[chi].sup.u] is untyped; moreover, whenever we have [[chi].sup.t] = [[tau].sub.1] and [[chi].sup.t] = [[tau].sub.2] then [[tau].sub.1] = [[tau].sub.2]. It then easily follows that every system can be solved by choosing sufficiently large sets of types for the untyped variables. []

A.2 Proof of Theorem 6.2

THEOREM 6.2 (SOUNDNESS AND COMPLETENESS) A program M is typable in system JIR(DM([SIGMA]), DM(H)) if and only if M is safe with respect to [SIGMA] and H.

PROOF. We begin by proving soundness, i.e., whenever M has a type in system JIR(DM([SIGMA]), DM(H)), then M is safe with respect to [SIGMA] and H.

To prove soundness, assume that M is typable, so that (by Lemma 6.1) we have a least solution [mu] : TyVar [right arrow] DM(H) to the constraint system I [??]M[??] (the fact that a least solution exists follows by an argument similar to the one given in the proof of Lemma 5.1).

We now proceed to define a function [phi] (depending on [mu]), which maps flow variables to subsets of H. It will be shown that [phi] satisfies the constraints in F [??]M[??] as well as all the safety checks of Figure 7. Soundness follows from the existence of [phi] with these properties, because if any one solution to F [??]M[??] satisfies the safety checking rules, then so does the least solution to F [??]M[??] (observe that all checks have the form S [??] [tau], and if S satisfies this condition, then so does any subset of S).

In order to define the map [phi], let Max S denote the set of all maximal elements of the subset S [subset or equal to] H (an element x [member of] S is maximal, if and only if x [less than or equal to] y implies x = y for all y [member of] S). The map [phi] : FlowVar [right arrow] p(H) is then defined by

[phi] ([[chi].sup.t.sub.e] = Max [mu]([[alpha].sub.e])

[phi] ([[chi].sup.u.sub.e] = [mu]([[alpha].sub.e])

where [[chi].sup.t.sub.e] is a typed variable, and [[chi].sup.u.sub.e] is an untyped variable (definitions of typed and untyped flow variables are given in Section A.1). We claim that [phi] solves the system F [??]M[??] and satisfies all safety checks. Notice that all flow constraints in F [??]M[??] have one of the forms [[chi].sup.t.sub.e] = [tau] or [[chi].sub.e] [subset or equal to] [[chi].sup.u.sub.e'], where [[chi].sub.e] is either typed or untyped. Since Max S [subset or equal to] S, it follows that one has

(3) [mu]([[alpha].sub.e]) [subset or equal to] [mu]([[alpha].sub.e']) [??] [phi]([[chi].sub.e]) [subset or equal to] [phi]([[chi].sup.u.sub.e'])

regardless of whether [[chi].sub.e] is typed or untyped. We will use this observation freely in the following. We now show that [phi] solves F [??]M[??] and satisfies all safety checks. We do so by considering all constraints generated from an arbitrary occurrence of a phrase in M. We consider each occurrence by cases over the form of the phrase, and for each such occurrence we consider the associated constraints and safety checks according to the definitions in Figure 6 and Figure 7:

--Case e.f([e.sub.1], ..., [e.sub.n]) with constraint [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], where [SIGMA](f) = [tau] [right arrow] [tau]'.

The map [phi] solves the flow constraint: One has line [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] in I [??]M[??], and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is typed. Hence,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

as desired.

The map [phi] satisfies the safety checks: The relevant safety check is (S3), requiring [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. One has [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] in I [??]M[??], hence [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], from which we get that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], hence also [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and therefore [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASC as desired. Moreover, the safety condition sig([[chi].sub.e]) [not equal to] 0 is satisfied, because the constraint sig([[alpha].sub.e]) [not equal to] 0 is in I[??]M[??].

--Case e.a, with constraint [[chi].sub.e.a] = [tau], where [SIGMA](a) = [omega].[tau].

One has DM([SIGMA])(a) = ([down arrow][omega]).([down arrow] [tau]) and {[[alpha].sub.e] [less than or equal to] [down arrow][omega], [[alpha].sub.e.a] = [down arrow][tau]} [subset or equal to] I [??]M[??]. Therefore, [mu]([[alpha].sub.e.a]) = [down arrow][tau]. Since [[chi].sub.e.a] is typed, one has [phi]([[chi].sub.e.a]) = Max [mu]([[alpha].sub.e.a]) = Max([down arrow] [tau]) = {[tau]}, so [phi] is seen to satisfy the flow constraint.

The safety check requires [phi]([[chi].sub.e]) [??] [omega]. Because [[alpha].sub.e] [less than or equal to] [down arrow][omega] is in I [??]M[??], one has [mu]([[alpha].sub.e]) [subset or equal to] [down arrow][omega], and therefore [mu]([[alpha].sub.e]) ?? [omega]. Because we have Max [mu]([[alpha].sub.e]) [subset or equal to] [mu]([[alpha].sub.e]), it follows that [phi]([[chi].sub.e]) ?? [omega], as desired.

--Case z = e with constraint [[chi].sub.e] [subset or equal to] [[chi].sub.z].

We have [[alpha].sub.e] [less than or equal to] [[alpha].sub.z] in I [??]M[??], so [mu]([[alpha].sub.e]) [subset or equal to] [mu]([[alpha].sub.z]), hence [phi]([[chi].sub.e]) [subset or equal to] [phi]([[chi].sub.z]). Here, we used that z is untyped, which guarantees that [phi]([[chi].sub.z]) = [mu]([[alpha].sub.z]), whereas [phi]([[chi].sub.e]) is either [mu]([[alpha].sub.e]) or Min [mu]([[alpha].sub.e]), and the desired inclusion holds in both cases. There is no safety check associated with an assignment statement.

--Case x with constraint [[chi].sub.x] = [tau], where [SIGMA](x) = [tau]. We have [[alpha].sub.x] = [down arrow] [tau] in I [??]M[??], so that (using the fact that [[chi].sub.x] is typed) one gets [phi]([[chi].sub.x]) - Max [mu]([[alpha].sub.x]) = Max([down arrow] [tau]) = {[tau]}, as desired.

There is no safety check associated with a parameter occurrence.

--Case [omega] :: f([x.sub.1] : [[tau].sub.1], ..., [x.sub.n] : [[tau].sub.n]){s} with constraints [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], where [SIGMA](f) = ([[tau].sub.1], ..., [[tau].sub.n]) [right arrow] [tau]'.

One has [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] are all typed. It follows that one has [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and the flow constraints are seen to be satisfied under [phi]. There is no safety check associated with a declaration statement.

The remaining cases are similar and left out. This concludes the soundness proof.

To prove completeness, we need to show that, whenever the least solution to F [??]M[??] satisfies the safety checking rules of Figure 7, then the constraint system I [??]M[??] has a solution in DM(H), so that (by Lemma 6.1) M types. We will do this by defining a mapping [psi] from type variables to DM(H) depending on the least solution to F [??]M[??]. The function [psi] : TyVar [right arrow] DM(H) is defined by

[psi]([[alpha].sub.e]) = [([[chi].sub.e]).sup.ul]

We show that all constraints generated from an arbitrary occurrence of a phrase in M are satisfied under the mapping [psi], provided that the least solution to F [??]M[??] satisfies all safety checks. We proceed by cases over the form of phrases:

--Case e.f([e.sub.1], ..., [e.sub.n]) with constraints [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], where [SIGMA](f) = ([[tau].sub.1], ..., [[tau].sub.n]) [right arrow] [tau]'.

One has [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] in F [??]M[??], so that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] holds, and therefore we have [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] which shows that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and so [psi] solves the first type inequality mentioned above. To see that the second condition, sig([[alpha].sub.e], f) [not equal to] 0, is satisfied, consider that by (S3) one has [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], hence for some [omega]' [member of] [([[chi].sub.e]).sup.u] one has sig([omega]', f) [not equal to] 0. It follows that for each [omega]" [member of] [([[chi].sub.e]).sup.ul] one has sig([omega]", f) [not equal to] 0, which shows that the constraint is satisfied as desired. To see that the third inequality is also satisfied under [psi], consider that the safety check (S3) requires that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. It follows that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and therefore one has [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], thereby showing [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], as desired.

--Case e.a with constraints [[alpha].sub.e] [less than or equal to] [down arrow] [omega], [[alpha].sub.e.a] = [down arrow] [tau], where [SIGMA](a) = [omega].[tau].

One has [[chi].sub.e.a] = [tau] in F [??]M[??], hence [[chi].sub.e.a] = {tau]}, and hence [psi]([[alpha].sub.e.a]) = [{[tau]}.sup.ul] = [down arrow] [tau], proving that [psi] satisfies the constraint [[alpha].sub.e.a] = [down arrow] [tau]. To see that [psi] satisfies the constraint [[alpha].sub.e] [less than or equal to] [down arrow] [omega], consider that the safety checking rules require that [[chi].sub.e] [??] [omega], which implies [[chi].sub.e] [subset or equal to] [down arrow] [omega], and hence [psi]([[alpha].sub.e]) = [([[chi].sub.e]).sup.ul] [subset or equal to] [([down arrow] [omega]).sup.ul] = [down arrow] [omega], as desired.

--Case z = e with constraint [[alpha].sub.e] [less than or equal to] [[alpha].sub.z].

One has [[chi].sub.e] [subset or equal to] [[chi].sub.z] in F [??]M[??], hence [[chi].sub.e] [subset or equal to] [[chi].sub.z], hence [([[chi].sub.e]).sup.ul] [subset or equal to] [([[chi].sub.z]).sup.ul], which shows that [psi]([[alpha].sub.e]) [subset or equal to] [psi]([[alpha].sub.z]), as desired.

--Case e.a = e' with constraint [[alpha].sub.e]', [less than or equal to] [[alpha].sub.e.a], where [SIGMA](a) = [omega].[tau].

One has [[chi].sub.e.a] = [tau] by the constraints in F [??]M[??], and moreover [[chi].sub.e'] [??] [tau] by the safety checking rule (S2). It follows from the former relation that [psi]([[alpha].sub.e.a]) = [{[tau]}.sup.ul] = [down arrow] [tau], and since [[chi].sub.e'] [??] [tau], we get [[chi].sub.e'] [subset or equal to] [down arrow] [tau], which implies [psi]([[alpha].sub.e],) [subset or equal to] [down arrow] [tau] = [psi]([[alpha].sub.e.a]). This shows that [psi] satisfies the inequality generated in this case.

--Case x with constraint [[alpha].sub.x] = [down arrow][tau] where [SIGMA](x) = [tau].

By the definition of F [??]M[??], one has [[chi].sub.x] = [tau], and therefore [psi]([[alpha].sub.x]) = [{[tau]}.sup.ul] = [down arrow][tau], showing that [psi] satisfies the inequality in this case.

--Case [omega] :: f([x.sub.1] : [[tau].sub.1], ..., [x.sub.n] : [[tau].sub.n]){s} with constraints [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], where [SIGMA](f) = ([[tau].sub.1], ..., [[tau].sub.n]) [right arrow] [tau]'.

One has [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] by the definition of F [??]M[??]. These relations imply that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The remaining cases are similar. This concludes the proof of completeness. []

A.3 Proof of Theorem 6.3

In order to prove Theorem 6.3 we need to understand the satisfiability problem for inequalities over the lattice DM(H). Here it is useful to look at an abstract version of the problem first, by recalling a general lemma concerning inequalities over an arbitrary complete lattice. In order to state this lemma, we need a few definitions. Let P be any poset. A system of inequalities over P is a finite set of formal inequalities of the form [alpha] [less than or equal to] [alpha]', k [less than or equal to] [alpha] or [alpha] [less than or equal to] k, where [alpha] and [alpha]' range over variables and k ranges over constants drawn from P. A system C is said to be satisfiable in P if and only if there exists a substitution v from variables to elements of P such that [upsilon]([xi]) [[less than or equal to].sub.P] [upsilon]([xi]') is true in P, for all [xi] [less than or equal to] [xi]' [member of] C (here [xi] and [xi]' range over variables or constants from C). The closure of a system C, denoted [C.sup.*], is the least system of inequalities containing C and satisfying the condition

([xi] [less than or equal to] [xi]') [member of] [C.sup.*], ([xi]' [less than or equal to] [xi]") [member of] [C.sup.*] [??] ([xi] [less than or equal to] [xi]") [member of] [C.sup.*].

A system C is said to be consistent if and only if we have

([k.sub.1] [less than or equal to] [k.sub.2]) [member of] [C.sup.*] [??] [k.sub.1] [[less than or equal to].sub.P] [k.sub.2]

for all [k.sub.1], [k.sub.2] [member of] P. Finally, for a system of inequalities, C, define for each variable [alpha] in C the subset [[down arrow].sub.C] ([alpha]) [subset or equal to] P, given by

[[down arrow].sub.C] ([alpha]) = {k [member of] P | (k [less than or equal to] [alpha]) [member of] [C.sup.*]}.

Then we have

LEMMA A.1. Let L be a complete lattice, and let C be a system of inequalities over L. Then C is satisfiable if and only if C is consistent. Moreover, if C is satisfiable, then the substitution [mu] given by

[mu]([alpha]) = [disjunction] [[down arrow].sub.C] ([alpha])

is the least solution to C.

PROOF. It is standard that the substitution [mu] solves C, whenever C is consistent, and proofs of this fact in a subtyping setting can be found in, for example, Lincoln and Mitchell [1992] and Tiuryn [1992]. It is easy to verify that [mu] must be the least solution to C, if indeed [mu] is a solution. []

To prove Theorem 6.3, recall the definition of [D.sub.[alpha]],

[D.sub.[alpha]] = {[tau] [member of] H | [tau] [less than or equal to] [alpha] [member of] [[??] C [??].sup.*]}

THEOREM 6.3 Let C = I [??] M [??], and assume that C is satisfiable. Then there is a unique least solution [mu] to C in DM(H), which is given by

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

PROOF. We apply Lemma A.1 to the case L = DM(H), which allows us to characterize the least solution [mu] to I [??] M [??] by

[mu] ([alpha]) = [disjunction] [[down arrow].sub.C] ([alpha])

with C = I [??] M [??]. Now recall that we have [[disjunction].sub.i[member of]I] [A.sub.i] = [([[Union].sub.i[member of]I] [A.sub.i]).sup.u], by equation (1). Moreover, it is easy to verify, using definitions, that one has [([[union].sub.i[member of]I] [A.sub.i]).sub.u] = [[??].sub.i[member of]I][([A.sub.i]).sup.u] and [([down arrow]x]).sup.u] = [up arrow]x. Using these identities, we can compute as follows:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

which proves that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] solves C. At the first equation in the calculation above we used the fact that the only constants occurring in constraint systems of the form I [??] M [??] are principal ideals, of the form [down arrow][tau].

To see that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], it is sufficient (by previous equations) to show that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. To see the inclusion from right to left, suppose that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Then [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Since [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], it foll [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], hence x [member of] [([D.sub.[alpha]]).sup.u]. To see the other inclusion, assume x [member of] [([D.sub.[alpha]]).sup.u]. Then for any y [member of] [D.sub.[alpha]] one has x [greater than or equal to] y. Hence x [greater than or equal to] [[down arrow]]y for all y [member of] [D.sub.[alpha]], and therefore also [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], hence [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. []

A.4 Proof of Lemma 6.4

In order to prove Lemma 6.4 we first prove a couple of auxiliary lemmas. Recall that we define A [??] B by

A [??] B iff [for all] x [member of] B. [there exists]y [member of] A. y [less than or equal to] x.

LEMMA A.2. Let A and B be upward closed subsets of a poser P. Then

A [subset or equal to] B [??] B [??] A.

PROOF. Easy. []

LEMMA A.3. Let A, B be subsets of a poset P. Then

[A.sup.ul] [subset or equal to] [B.sup.ul] [??] [A.sup.u] [??] [B.sup.u].

PROOF. ([??]). One has

[A.sup.ul] [subset or equal to] [B.sup.ul] [??] [([B.sup.ul]).sup.u] [subset or equal to] [([A.sup.ul]).sup.u] [??] [B.sup.u] [subset or equal to] [A.sup.u]

by the identity [A.sup.ulu] = [A.sup.u] and antimonotonicity (with respect to inclusion) of the operations [*.sup.u] and [*.sup.l]. Now, [B.sup.u] [subset or equal to] [A.sup.u] implies [A.sup.u] [??] [B.sup.u], by Lemma A.2, which is applicable since [A.sup.u] and [B.sup.u] are upward closed sets.

(<=). Assume [A.sup.u] [??] [B.sup.u]. Since [*.sup.l] is antimonotonic with respect to inclusion, it is sufficient to show that [B.sup.u] [subset or equal to] [A.sup.u]. So let x [member of] [B.sup.u]. Then, by the assumption, [A.sup.u] [??] [B.sup.u], there exists y [member of] [A.sup.u] with y [less than or equal to] x. Since [A.sup.u] is upward closed and y [member of] [A.sup.u] and y [less than or equal to] x, it follows that x [member of] [A.sup.u]. We have shown [B.sup.u] [subset or equal to] [A.sup.u] as desired. []

LEMMA A.4. Let A and B be subsets of the poset H. If H has no infinite descending chains, then one has

A [??] B [??] Min A [??] Min B.

PROOF. (=>). Assume A [??] B. Let x [member of] Min B. Then, by the assumption, there is y [member of] A such that y [less than or equal to] x. Since H has no infinite descending chains, we can find [y.sub.m] [member of] Min A such that [y.sub.m] [less than or equal to] y. So we have [y.sub.m] [less than or equal to] x, hence we have shown Min A [??] Min B.

(<=) Assume Min A [??] Min B. Let x [member of] B. Then, since H has no infinite descending chains, we can find [x.sub.m] [member of] Min B such that [x.sub.m] [less than or equal to] x. Then, by the assumption, there is y [member of] Min A such that y [less than or equal to] [x.sub.m], hence also y [less than or equal to] x. This shows that A [??] B. []

LEMMA 6.4 Let A and B be subsets of the poser H. If H has no infinite descending chains, then one has

[A.sup.ul] [subset or equal to] [B.sup.ul] [??] Min ([A.sup.u]) [??] Min ([B.sup.u])

PROOF. Follows from Lemma A.3 together with Lemma A.4. []

ACKNOWLEDGMENTS

We would like to thank Erik Ruf, Bjarne Steensgaard, Bob Fitzgerald, and David Tarditi for ideas, implementation, and suggestions on improving the presentation.

(1) In Java the type boolean may not be widened to any other type. However, to solve for the small integer types, it is useful to posit widening conversions for boolean for use within (and only during) type elaboration.

(2) One strict reading of this language would have the first "superclass" of any interface be the Object type, as it is the only "class" that is a supertype of an interface. However, this reading would reject as unverifiable bytecode corresponding to legal Java programs. For example, consider having a method with a local declared as an Enumeration, an interface type, which is initialized in one branch of a conditional as a VectorEnumeration, a class implementing an enumeration for a vector, and in the other branch of the conditional as a HashtableEnumeration. After the join, under the strict reading, this variable could not be used as an Enumeration, but only as an Object.

(3) In order to apply the present framework to a program, the program's variables, field names, method names, and parameters may have to be renamed appropriately. We tacitly assume that this has been done.

(4) Hence, if P is already a lattice, then DM acts as the identity, modulo isomorphism.

(5) In detail, let dm([tau]) = [down arrow] [tau], dm([omega].[tau]) = ([down arrow] [omega]).([down arrow] [tau]), and dm(([[tau].sub.1], ..., [[tau].sub.n]) [right arrow] [tau]) = ([down arrow] [[tau].sub.1], ..., [down arrow] [[tau].sub.n]) [right arrow] ([down arrow] [tau]).

(6) This map is indeed a function: let f(X) = Min ([X.sup.u]), for any X [subset or equal to] H. Then f is a well-defined function, and one has f([A.sup.ul]) = Min ([A.sup.ulu]) = Min ([A.sup.u]), by the identity [A.sup.ulu] = [A.sup.u].

(7) While it is convenient to have unique names for variables during type elaboration, it would be possible to modify the type elaboration algorithms to work directly on the Java bytecode. Stack locations would be identified by both stack depth and program point.

(8) An order filter is an upward closed subset of a poset. The principal filter generated from an element x is denoted [up arrow] x.

(9) Subroutines may be supported using the jsr and ret instructions or encoded as in Freund [1998], and lifetime splitting without SSA conversion in O([n.sup.2]) time and O(n) space where n is the size of the bytecode program.

(10) There is some duplication of methods between the different programs caused, principally, by the runtime system.

REFERENCES

AIT-KACI, H., BOYER, R., LINCOLN, P., AND NASR, R. 1989. Efficient implementation of lattice operations. ACM Transactions on Programming Languages and Systems 11, 1 (January), 115-146.

BACON, D. F. 1997. Fast and effective optimization of statically typed object-oriented languages. Ph.D. thesis, U.C. Berkeley.

BENKE, M. 1993. Efficient type reconstruction in the presence of inheritance. In Mathematical Foundations of Computer Science (MFCS). Springer Verlag, LNCS 711, 272-280.

BIRKHOFF, G. 1995. Lattice Theory, third ed. Colloquium Publications, vol. 25. American Mathematical Society, Providence, RI.

CASEAU, Y. 1993. Efficient handling of multiple inheritance hierarchies. In Proceedings OOPSLA '93. Washington, DC, USA, 271-287.

COGLIO, A., GOLDBERG, A., AND QIAN, Z. 1998. Towards a provably-correct implementation of the JVM bytecode verifier. Tech. Rep. KES.U.98.5, Kestrel Institute. August 1998. Also available in Proceedings of the OOPSLA '98 Workshop on the Formal Underpinnings of Java, Vancouver, B.C., October 1998.

CYTRON, R., FERRANTE, J., ROSEN, B. K., WEGMAN, M. N., AND ZADECK, F. K. 1989. An efficient method of computing static single assignment form. In Proceedings of the Sixteenth Annual ACM Symposium on Principles of Programming Languages.

DAVEY, B. A. AND PRIESTLEY, H. A. 1990. Introduction to Lattices and Order. Cambridge Mathematical Textbooks, Cambridge University Press.

FITZGERALD, R., KNOBLOCK, W. B., RUF, E., STEENSGAARD, B., AND TARDITI, D. 2000. Marmot: An optimizing compiler for Java. Software: Practice and Experience 30, 3 (Mar.), 199-232.

FREUND, S. N. 1998. The costs and benefits of Java bytecode subroutines. In Formal Underpinnings of Java Workshop at OOPSLA. http://ww-dse.doc.ic.ac.edu/~sue/oopsla/cfp.html.

FREUND, S. N. AND MITCHELL, J. C. 1998. A type system for object initialization in the Java bytecode language. In Proceedings OOPSLA '98, ACM SIGPLAN Notices. 310-328.

FREUND, S. N. AND MITCHELL, J. C. 1999. A type system for Java bytecode subroutines and exceptions. Tech. rep., Stanford University, Computer Science Department. April.

GAGNON, E. AND HENDREN, L. 1999. Intra, procedural inference of static types for Java bytecode. Tech. Rep. 1, McGill University.

GOLDBERG, A. 1997. A specification of Java loading and bytecode verification. Tech. Rep. KES.U.92.1, Kestrel Institute. December.

GOSLING, J., JOY, B., AND STEELE, G. 1996. The Java Language Specification. The Java Series. Addison-Wesley, Reading, MA, USA.

HOANG, M. AND MITCHELL, J. C. 1995. Lower bounds on type inference with subtypes. In Proc. 22nd Annual ACM Symposium on Principles of Programming Languages (POPL). ACM Press, 176-185.

IBM. 1998. IBM high performance compiler for Java: An optimizing native code compiler for Java applications, http://www.alphaworks.ibm.com/graphics.nsf/system/graphics/HPCJ/ $file/highpcj.html.

INSTANTIATIONS, INC. 1998. Jove: Super optimizing deployment environment for Java. http://www.instantiations.com/javaspeed/jovereport.htm.

KOZEN, D., PALSBERG, J., AND SCHWARTZBACH, M. I. 1994. Efficient inference of partial types. Journal of Computer and System Sciences 49, 2, 306-324.

LEAGUE, C., SHAO, Z., AND TRIFONOV, V. 1999. Representing java classes in a typed intermediate language. In Proceedings of the 1999 ACM SIGPLAn International Conference on Functional Programming.

LEROY, X. AND OHORI, A., Eds. 1998. Types in Compilation. Number 1473 in LNCS. Springer-Verlag.

LINCOLN, P. AND MITCHELL, J. C. 1992. Algorithmic aspects of type inference with subtypes. In Proceedings of the Nineteenth Annual ACM Symposium on Principles of Programming Languages. 293-304.

LINDHOLM, T. AND YELLIN, F. 1999. The Java Virtual Machine Specification, Second Edition ed. Addison-Wesley.

MACNEILLE, H. M. 1937. Partially ordered sets. Transactions of the American Mathematical Society 42, 90-96.

MORRISETT, G., WALKER, D., CRARY, K., AND GLEW, N. 1997. From system F to typed assembly language. Tech. Rep. TR97-1651, Cornell University.

MORRISETT, J. G. 1995. Compiling with types. Ph.D. thesis, Carnegie Mellon University. Published as CMU Technical Report CMU-CS-95-226.

NATURALBRIDGE, LLC. 1998. Bullettrain Java compiler technology. http://www.naturalbridge.com/.

NECULA, G. C. 1998. Compiling with proofs. Ph.D. thesis, Carnegie Mellon University.

O'CALLAHAN, R. 1999. A simple, comprehensive type system for Java bytecode subroutines. In Proceedings 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 70-78.

OHEIMB, D. V. AND NIPKOW, T. 1999. Machine-checking the Java specification: Proving type-safety. In Formal Syntax and Semantics of Java, J. Alves-Foss, Ed. LNCS, vol. 1523. Springer-Verlag, 119-156.

PALSBERG, J. AND O'KEEFE, P. 1995. A type system equivalent to flow analysis. ACM Transactions on Programming Languages and Systems 17, 4 (July), 576-599.

PRATT, V. AND TIURYN, J. 1996. Satisfiability of inequalities in a poset. Fundamenta Informaticae 28, 1-2, 165-182.

PUSCH, C. 1999. Proving the soundness of a Java bytecode verifier specification in Isabelle/HOL. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS'99), W. R. Cleaveland, Ed. LNCS, vol. 1579. Springer-Verlag, 89-103.

QIAN, Z. 1998. A formal specification of a large subset of Java virtual machine instructions for objects, methods and subroutines. In Formal Syntax and Semantics of Java, J. Alves-Foss, Ed. Number 1523 in LNCS. Springer-Verlag.

QIAN, Z. 1999. Least types for memory locations in (Java) bytecode. Tech. rep., Kestrel Institute. http://www.kestrel.edu/HTML/people/ qian/pub-list.html.

REHOF, J. AND MOGENSEN, T. AE.. 1999. Tractable constraints in finite semilattices. Science of Computer Programming 35, 2, 191-221.

SHIVERS, O. 1991. Data-flow analysis and type recovery in scheme. In Topics in Advanced Language Implementation, P. Lee, Ed. The MIT Press, Chapter 3, 47-87.

STATA, R. AND ABADI, M. 1998. A type system for Java bytecode subroutines. In Proceedings 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 149-160.

SUPERCEDE, INC. 1998. SuperCede for Java, Version 2.03, Upgrade Edition. http://www.supercede.com/.

SYME, D. 1997. Proving JavaS type soundness. Tech. Rep. 427, University of Cambridge Computer Laboratory.

TARDITI, D. 1996. Design and implementation of code optimizations for a type-directed compiler for standard ml. Ph.D. thesis, Carnegie Mellon University.

TARJAN, R. E. 1972. Depth first search and linear graph algorithms. SIAM Journal on Computing 1, 2, 146-160.

TIURYN, J. 1992. Subtype inequalities. In Proc. 7th Annual IEEE Symp. on Logic in Computer Science (LICS), Santa Cruz, California. IEEE Computer Society Press, 308-315.

WAND, M. 1987. A simple algorithm and proof for type inference. Fundamenta Informaticae X, 115-122.

YELLAND, P. M. 1999. A compositional account of the Java virtual machine. In Proceedings 26th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 57-59.

Received February 2000; accepted November 2000

Authors address: Microsoft Research, Redmond, WA, 98052; email {toddk, rehof}@microsoft.com.
COPYRIGHT 2001 Association for Computing Machinery, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2001 Gale, Cengage Learning. All rights reserved.

Article Details
Printer friendly Cite/link Email Feedback
Author:Knoblock, Todd B.; Rehof, Jakob
Publication:ACM Transactions on Programming Languages & Systems
Geographic Code:1USA
Date:Mar 1, 2001
Words:14047
Previous Article:Synthesis of current programs for an atomic read/write model of computation.
Topics:

Terms of use | Privacy policy | Copyright © 2019 Farlex, Inc. | Feedback | For webmasters