# Polymorphic splitting: an effective polyvariant flow analysis.

1. INTRODUCTIONAdvanced programming languages such as Scheme [Clinger and Rees 1991] and ML [Milner et al. 1990] encourage a programming style that makes extensive use of data and procedural abstraction. Higher degrees of abstraction generally entail higher run-time overheads; hence, sophisticated compiler optimizations are essential if programs written in these languages are to compete with those written in lower-level languages such as C. Furthermore, because Scheme and ML procedures tend to be small, these optimizations must be interprocedural if they are to be effective.

Interprocedural optimizations require interprocedural control-flow and data-flow information that expresses where procedures are called and where data values are passed. In its purest form, the flow analysis problem for these languages involves determining the set of procedures that can reach a given application point and the set of values that can reach a given argument position in an application [Sestoft 1988]. This information can be used to drive program optimizations such as run-time check elimination, inlining, and unboxing that are global in nature, as well as optimizations such as constant folding and loop-invariant code motion that are typically based on special-purpose local analyses.

This article presents a flow analysis framework for call-by-value, higher-order languages. The implementation of the framework is parameterized over different approximations of exact values to abstract values and can thus be used to construct a spectrum of analyses with different cost and accuracy characteristics. A simple monovariant analysis such as OCFA [Shivers 1991] or set-based analysis (SBA) [Heintze 1994] associates with each subexpression of the program a single abstract value describing its potential results. More precise polyvariant analyses [Bulyonkov 1984; Jones and Muchnick 1979] associate several abstract values with each subexpression, corresponding to the different contexts in which the subexpressions are reached. We introduce a novel polyvariance technique called polymorphic splitting that uses let-expressions as syntactic clues to increase precision. Polymorphic splitting borrows ideas from Hindley-Milner [Hindley 1969; Milner 1978] polymorphic type inference systems to create a form of polyvariance that mimics polymorphism. Polymorphic splitting allows a flow analysis to avoid merging information between unrelated calls to a polymorphic function, yielding more precise flow information.

To investigate its utility, we have used our analysis to drive two optimizations. The first is run-time check elimination. Dynamically typed languages such as Scheme require run-time checks at every use of a primitive operator. Furthermore, every function application requires a run-time check to ensure that the function subexpression evaluates to a procedure. Run-time checks are also needed in statically typed languages such as ML that allow operations to be applied to arguments whose domain is smaller than their static type. For instance, the operation hd has type [Alpha] list [approaches] [Alpha], but may be applied only to nonempty lists. Run-time checks can add significant overhead to program execution, but checks at a given operator can be safely avoided if all possible values for the operator's arguments can be proven to lie within the operator's domain. As our analysis yields sets of abstract values that approximate the values of the arguments, it is easy to determine when a run-time check can be eliminated: the arguments' abstract values must denote a subset of the operator's domain.

The second optimization we have studied is inlining. Inlining trades code space for time by replacing a procedure call with the called procedure's body. Our flow analysis enables effective inlining because it identifies the set of procedures that may be invoked at each call site in a program. At a call site, our analysis associates a set of abstract closures with the function position of the call. These abstract closures approximate the exact closures that may be applied at that call site. (Informally, a single abstract closure might correspond to a set of exact closures having the same code pointer, but closed over different environments.) It is now easy to determine when a call site is a candidate for inlining: there must be a single abstract closure applied at the call site. Our particular flow analysis based on polymorphic splitting also facilitates constructing different size estimates for a procedure at different call sites based on the portions of the procedure that may be reached from those sites. For example, large procedures can be aggressively inlined when static determination of conditional tests in their bodies permits entire blocks of code to be pruned.

The implementation of our analysis handles all of the constructs and operators specified in the [R.sup.4]RS Scheme report, including variable-arity procedures, data structures, first-class continuations, and assignment. Experimental results for realistic Scheme programs indicate that polymorphic splitting is extremely precise and has reasonable cost. Furthermore, our polyvariant analysis is often cheaper to compute than less precise monovariant analyses.

We have used our analysis to optimize a collection of Scheme programs. We perform optimization by source-to-source transformation and use Chez Scheme [Dybvig 1987] to compile the resulting programs to native machine code. Together, our run-time check elimination and inlining optimizations yield significant performance improvements over the unoptimized programs. Our optimizer typically yields a 20 to 40% improvement for the programs in our benchmark suite, with some programs running four times as fast.(1) Comparisons of polymorphic splitting to simple flow analyses (closure analysis [Palsberg 1995; Sestoft 1988] and OCFA [Shivers 1990]) and to soft typing [Wright and Cartwright 1997] indicate that polymorphic splitting is significantly more effective than these analyses in eliminating run-time checks.

The remainder of the article is organized as follows. The next section discusses our motivation for exploring flow analysis as an optimization tool. Section 3 presents a precise definition of our flow analysis framework, including a discussion of polymorphic splitting. The fourth section describes an algorithm for computing flow information. Section 5 describes our run-time check elimination and flow-directed inlining optimizations and presents experimental results. Section 6 discusses related work, and Section 7 provides a summary.

2. MOTIVATION

Higher-order functional languages provide rich and expressive programming platforms. However, realizing efficient implementations of these languages has proven to be challenging. This is primarily because the abstract programming style encouraged by functional languages is significantly different from the programming style found in more traditional imperative languages. Because higher-order procedures, abstract datatypes, and sophisticated control abstractions (e.g., continuations) are the fundamental building blocks of functional languages, conventional compiler optimization techniques developed for imperative languages have little effect when applied to programs using these features. A basic premise underlying the motivation of this article is that optimization strategies developed for imperative languages will be only marginally useful in building high-performance implementations of functional languages unless they are used in conjunction with more powerful interprocedural analyses.

In imperative languages, procedures are large, and their call sites are syntactically apparent. Hence intraprocedural local analyses can gather sufficient information to enable useful optimizations. Such local analyses can be adapted to obtain some useful information for functional languages. For example, syntactic heuristics can sometimes be used to identify the procedures that may be applied at a given call site. Thus, in the Scheme expression

(let ([f (lambda (x) x)]) (f 1))

it is easy to recognize that f is only applied at the call (f 1). However, simple syntactic analyses can be easily foiled:

(let ([g (lambda (b,) h)] [f (lambda (x) x)]) ((g f) 1))

In order for a syntactic analysis to correctly determine that the call (g f) yields f and that f is only applied at ((g f) 1), the analysis must track how f is passed through g. Since procedure g might be arbitrarily complex, effective syntactic analysis of its body is impossible.

For higher-order languages, a flow analysis must construct some approximation to the set of procedures applied at a given call site and some approximation to the set of values returned from a call. Control-flow frameworks such as OCFA compute such information. Such monovariant analyses are low-order polynomial-time analyses that effectively determine all potential call sites for all procedures, but merge the values of arguments from all call sites. In the above example, OCFA will correctly indicate that (g f) can return only f. The analysis associates an abstract value with (g f) that describes procedure f. Such an abstract value might be a label that uniquely identifies the lambda-expression bound to f.

Because of the merging performed at calls and returns, monovariant analyses fail to separate the argument values supplied by the two calls to f in the following expression:

(let ([f (lambda (x) x)]) (begin (f #t) (+ (f 1) 0)))

A monovariant analysis computes an abstract value for x that describes the set {1, #t}. Type-specific optimizations must therefore assume that (f 1) can return either an integer or a boolean. Hence monovariant analyses are too weak to eliminate the run-time check at +.

A more sophisticated polyvariant flow analysis avoids merging information from different calls by associating a distinct context with each call. For instance, the call site where a procedure is applied can be used to disambiguate different bindings to the procedure's arguments. In the above example, such an analysis (commonly referred to as 1CFA) deduces that f is applied to 1 and returns 1 in the first call, and that f is applied to #t and returns #t in the second call. With 1CFA, two separate abstract values are constructed for x, one for each call site of f. Hence 1CFA is powerful enough to eliminate the run-time check at +.

1CFA is an instance of a polyvariant flow-analysis framework based on call-strings. Call-string-based analyses use the N most dynamically recent call sites as a disambiguation context, for some small, fixed N. Because the point at which a procedure is defined may be far removed from its point of use, however, call-string-based analyses are highly sensitive to program structure. It is easy to extend the above example so that 1CFA computes overly conservative information:

(let ([f (lambda (x) x)] [g (lambda (a b) (a b))]) (begin (g f #t) (+ (g f 1))))

Here, there is an intervening call between each point where f is referenced and the point where it is applied. 1CFA computes an abstract value for f's result that describes both i and #t. Correctly disambiguating the two calls to f made via the two calls to g requires a 2CFA analysis that preserves two levels of call history.

Another polyvariance technique that has been used in implementations of some object-oriented languages relies on the types of a procedure's or method's arguments to provide a disambiguation context [Chambers and Ungar 1990; Plevyak and Chien 1994]. Under such a type-directed analysis, the two calls to g in the above example are discriminated by their arguments' types. The first call to g is analyzed in the context <procedure, integer>, while the second call is analyzed in <procedure, boolean>. Because type-directed analyses rely on approximations to the values of a call's arguments (the arguments' types), type-directed analyses are less dependent on the syntactic structure of a program than call-string-based analyses. Nonetheless, their effectiveness is also closely tied to how much procedure call history is preserved. While a type-directed analysis need preserve only one level of call history to avoid unwanted merging in the above example, it is easy to construct similar examples where more history is necessary.

In contrast to these flow analysis techniques, polymorphic type inference algorithms [Milner 1978] handle all of the above examples without undesirable merging. Polymorphic type systems disambiguate different calls to a polymorphic procedure by effectively duplicating the procedure wherever it is referenced. Polymorphic type systems are insensitive to the dynamic sequence of calls separating the construction of a value from its use. Consequently, they are less dependent on the syntactic structure of programs.

We therefore consider an alternative form of polyvariance called polymorphic splitting that uses let-expressions as clues to determine when to introduce new analysis contexts. Let f be a procedure bound by a let-expression, and let [f.sub.1], . . ., [f.sub.n] be distinct occurrences of f within the let-expression's body. Our polymorphic splitting analysis associates a distinct abstract closure with each of the occurrences [f.sub.i]. The analysis associates different bindings for arguments to different abstract closures, thus separating abstract values from different calls.

3. FLOW ANALYSIS FRAMEWORK

This section presents a formal definition of our polymorphic splitting flow analysis. We first describe a simple higher-order source language that includes let-expressions as a distinct syntactic form. We present a small-step operational semantics for an enriched version of this language that has explicit notation for closures and local environments. We then use a constraint relation to define a flow analysis of a program in this enriched language and show a soundness result that relates the meaning of expressions as defined by the semantics to the abstract values computed by the flow analysis. Finally, we show how to accommodate recursive definitions, data constructors, and assignment in the analysis.

3.1 Source Language

The abstract syntax for our Scheme-like source language has labeled expressions [e.sup.l] [element of] Exp of the form

[Mathematical Expression Omitted]

[Mathematical Expression Omitted]

where c [element of] Const are constants; p [element of] Prim are primitives (eg. +, [divided by]); x,y,z [element of] Var are variables; and l [element of] Label are labels. Labels, which are not present in the surface syntax available to programmers, are discussed below. A lambda-expression constructs an anonymous procedure; terms beginning with p are uses of primitive operations; if provides conditional execution; and, let provides nonrecursive local bindings. Free variables (FV) and bound variables (BV) are defined as usual for a lexically scoped language [Barendregt 1984], with let binding its variable x in [e.sub.b].

Programs are expressions with no free variables. In order to unambiguously name different components of a program, we assume programs have two properties. First, since we use labels to identify subterms of a program, each subterm of a program must have a unique label. Second, since we refer to a specific variable by its name, we assume that all fee and bound variables in a program are distinct. This condition can be met by renaming bound variables appropriately.

3.2 Exact Semantics

Like Scheme and ML, our source language has an ordinary call-by-value semantics. We specify the semantics as a small-step contextual rewriting system. To represent intermediate states of evaluation, we extend the source language with constructs to represent closure values and local environments as follows:

[e.sup.l] :: = . . . [absolute value of [(close [Rho] f).sup.l]] [(bind [Rho] e).sup.l]

[v.sup.l] :: = [c.sup.l] [absolute value of [(close [Rho] f).sup.l]]

[Mathematical Expression Omitted]

We identify a subset of expressions as values, denoted v. A close-expression (closure) represents the value of a procedure and is a value. A bind-expression represents an activation record, holding values for the free variables of a procedure being evaluated. The first component of a close- or bind-expression is a finite list containing pairs of variables and values. We treat this list as an environment, i.e., a finite function. We write p([x.sub.i]) for the unique [v.sub.i] associated with [x.sub.i] in [Rho].

We use two forms of evaluation contexts [Barendregt 1984; Felleisen and Friedman 1986] to encode control information, i.e., to find the next redex:

[Mathematical Expression Omitted]

[Mathematical Expression Omitted]

An evaluation context is a term with a single "hole" [] in the place of one subterm. We write E[e] to denote the expression resulting from "filling" the hole of context E with e, with free variables of e captured by bindings in E. The grammar for evaluation contexts specifies where holes may appear and thus controls the order in which the subexpressions of a term are evaluated. For example, the arguments of primitive operations and procedure calls are evaluated from left to right, as the grammar requires all arguments left of the hole to be values. Singular contexts S are a subset of evaluation contexts E and are used by various reductions to gain access to the lexical binding of a variable.

Figure 1 presents the rewriting rules for the semantics. The first rule indicates that a variable's value is obtained from the environment of the closest enclosing bind-expression. Evaluating a lambda-expression yields a closure value whose environment contains bindings for all of the lambda-expression's free variables, i.e., a fiat closure. When a closure is applied to a value, apply-closure constructs a new activation record that holds bindings for the closure's free variables and its formal parameters.(2) Rule primitive uses a partial function [Delta] to handle primitive applications. The return rule discards activation frames when a called procedure returns a value. The remaining rules handle let-expressions and conditionals. A program e is evaluated by reducing the term (bind [null set] e), where [null set] is the empty environment.

Note that reductions are label-preserving, since the contractum of a redex inherits the same label as the redex. For example, lookup replaces [x.sup.l] with [v.sup.l] even though p(x) = [v.sup.l[prime]]. This property will be important later when we relate the abstract values computed by the flow analysis to the exact values specified by the exact semantics.

We sometimes distinguish variables bound by let-expressions as follows. A variable occurrence [Mathematical Expression Omitted] with a subscript indicates that x is bound by a let-expression (let [([x [e.sub.1]) [e.sub.2]).sup.l[prime]] with label l[prime], and that the expression [e.sub.1] is either a lambda- or close-expression.

3.3 Flow Analysis

A flow analysis determines the sets of values that may be bound to variables and returned from subexpressions. We name these sets abstract values; in other words, a single abstract value represents a set of exact values. Polyvariant analyses associate multiple abstract values with each variable and expression, distinguishing them by contours that abstract a program's run-time state. We call these variable-contour pairs and label-contour pairs program points. Let Avalue be some set of abstract values and Contour be some set of contours. A flow analysis of a program P is a function

[Mathematical Expression Omitted]

that maps the program points of P to abstract values. F<x, [Kappa]> identifies the values that may be bound to x in contour [Kappa]. F<l, [Kappa]> identifies the values that may be returned by the subexpression labeled l in contour [Kappa].

Abstract values for our framework are defined as follows:

a [element of] Avalue = P(Aconst [union] Aclosure)

b [element of] Aconst = {true, false, number, nil}

<l, p, [Kappa]> [Lambda] [element of] Aclosure = Label x Aenv x Contour

[Mathematical Expression Omitted]

[Kappa] [element of] Contour = [Label.sup.*]

where P denotes powerset, and [Mathematical Expression Omitted] denotes a finite function. An abstract value a is a set of abstract constants and abstract closures. The abstract constants true and false each denote a single exact value, while the abstract constant number denotes a set of exact values. An abstract closure <l, [Rho], [Kappa]>[Lambda] identifies procedures created from the expression [(lambda ([x.sub.1] . . . [x.sub.n]) [e.sub.b]).sup.l]. The contour [Kappa] of an abstract closure, paired with an argument [x.sub.i] or a label of a subexpression of [e.sub.b], determines the program points for the body of the abstract closure. Thus two abstract closures that share the same label but use different contours will have different program points. The abstract environment [Rho] of an abstract closure records the contours in which its free variables are bound.

Our polymorphic splitting analysis represents contours as finite strings of labels. Contours abstract different execution contexts and are used to distinguish argument bindings and subexpression results arising at different points in a program's execution. The set of contours must be finite to ensure that the analysis computes finite information, i.e., that the analysis terminates. Since the subsets of variables and labels that a program uses are also finite, and since the lengths of contour strings are bounded by lexical nesting depth, the flow analysis can yield only finitely many abstract values for a particular program.

3.4 Constraints for Safety

Figure 2 defines a relation

[Mathematical Expression Omitted]

that imposes safety constraints on a flow function F. These constraints ensure that F conservatively approximates the behavior of P, as we show in Section 3.7.

The rules for constants and lambda-expressions require the corresponding program points to include appropriate values. The function AbsValOf maps a constant to its abstract value. Thus F, [Rho], [Mathematical Expression Omitted] holds if and only if number [Epsilon] F<l, [Kappa]>, given that AbsValOf(4) = number. Similarly, F, [Rho], [Mathematical Expression Omitted] (lambda [([x.sub.1] . . . [x.sub.n]) e).sup.l] holds if and only if the abstract closure [Mathematical Expression Omitted] is in F<l, [Kappa]>.

The rule for primitive applications "evaluates" the subexpressions [e.sub.1] . . . [e.sub.n] from left to right by requiring F, [Theta], [Mathematical Expression Omitted] to hold whenever F<[l.sub.j], [Kappa]> [not equal to] [null set] for all j [less than] i. In other words, F, [Theta], [Mathematical Expression Omitted] must hold whenever the preceding subexpressions might yield values. The notation X [implies] Y means that Y must hold whenever X holds. When F<[l.sub.i], [Kappa]> [not equal to] [null set] for all i, the rule for primitives also requires AbsResultOf(p) [element of] F<l,[Kappa]>. The function AbsResultOf maps a primitive to its abstract result; for example, AbsResultOf(+) = number.

The remaining rules introduce constraints between program points. These constraints ensure that the abstract values of program points safely approximate the exact values those program points can take on during execution. The variable rule constrains the abstract evaluation of a variable x to include the abstract value bound to it in contour p(x). Rule var applies for both lambda-bound variables and let-bound variables where the binding is neither a lambda-expression nor close-expression. For each abstract closure that can arise in the function position of an application (i.e., for each abstract closure in F<[l.sub.0],[Kappa]>), the application rule introduces two kinds of constraints. The first kind, F<[l.sub.i],[Kappa]> [subset or equal to] F<[x.sub.i],[Kappa][prime]>, requires the formal parameters of the abstract closure to include the abstract values of the actual parameters. The second kind, F<[l.sub.b], [Kappa][prime]> [subset or equal to] F<l, [Kappa]>, requires the application point to include the abstract value arising from the evaluation of the body of the abstract closure.

In the rule for if-expressions, evaluation of the then-clause (resp., the else-clause) is conditional upon #t (resp., #f) arising as a possible value of the test. If the abstract value of the test includes both true and false, both the then-clause and the else-clause contribute to the result of the conditional expression. If the test always diverges and its abstract evaluation yields the empty set, neither the then-clause nor the else-clause is evaluated. The analysis is thus sensitive to whether subexpressions are reachable. Suppose that the program contains (if [Mathematical Expression Omitted]) and that the analysis determines that [e.sub.1] can never yield #t in any contour, i.e., true [not element of] F<[l.sub.1], [Kappa]> for any [Kappa]. Then abstract evaluation of [e.sub.2] never commences, since

true [Mathematical Expression Omitted] and F<[l.sub.2],[Kappa]> [subset or equal to] F<1, [Kappa]>

holds without requiring F, [Theta], [Mathematical Expression Omitted] to hold. Similarly, the conditional relations in the rules for primitives and calls cause the analysis to respect left-to-right order of evaluation of subexpressions. Sensitivity to reachability is significant for polyvariant analyses because many expressions are unreachable in particular contours.

None of the rules discussed so far construct new contours. To provide polyvariance, the two rules in the second part of the figure construct new contours for let-bound closures in a manner that mimics the actions of a polymorphic type system. Hence we call this form of polyvariance polymorphic splitting. The [e.sub.1] subexpression of a let-expression evaluates in a contour [Kappa]: l obtained by appending l to the label string [Kappa]. This contour is captured in any abstract closures created by [e.sub.1]. Because a contour behaves like a lexical environment, and since only let-expressions extend contours, the nesting depth of let-expressions within their [e.sub.1] subexpressions bounds a contour's length. Rule let-var applies at a use of a let-bound variable where the binding is either a lambda- or close-expression.(3) Hence, at [Mathematical Expression Omitted], the contours in abstract closures bound to x are modified by replacing occurrences of l[double prime] with l (the notation [l[double prime]/l][Kappa] in the figure means the contour derived by replacing l[double prime] with l in [Kappa]). In this manner, different uses of the same abstract closure evaluate in different contours. The figure presents a rule for let-expressions with a single variable; extending this rule to multiple variables is straightforward.

The third part of Figure 2 includes rules for close- and bind-expressions. Although these expressions do not appear in source programs, their flow analysis rules are necessary for the inductive proof of soundness. The rules use an auxiliary agreement relation [Mathematical Expression Omitted] to relate an exact environment [Rho] to an abstract environment [Theta]. This agreement relation requires the variables defined by [Theta] to have abstract values which approximate the exact values defined by [Rho]. The rule for close-expressions requires F(l, [Kappa]) to include an abstract closure whose abstract environment agrees with the environment of the closure. The rule for (bind [Rho] e) requires that there exist an abstract environment [Theta][prime] agreeing with [Rho] such that the safety constraints hold for e in [Theta][prime].

3.5 Discussion

To illustrate polymorphic splitting, consider the expression

(let ([f (lambda (x) (lambda (y) x))]) ((f 0) (f #t)))

which yields 0 and whose least flow analysis is shown in Figure 3. Because f is a let-bound procedure, each of its two calls are analyzed in separate contours. In the first call (f 0), abstract values for the variables and subexpressions of f are recorded in contour [1]. In the second call (f #t), abstract values are recorded in contour [4]. Hence abstract values from the two calls are not merged, and the result of the analysis for program point <0, []> is {number}.

The following semantically equivalent expression leads to a less precise result:

((lambda (f) ((f 0) (f #t))) (lambda (x) (lambda (y) x)))

Figure 4 presents its least flow analysis. The abstract value for the result of this expression is {number, true}. Because a polymorphic splitting analysis introduces new contours only at references to let-bound procedures, the two applications (f 0) and (f #t) are analyzed in the same contour. Hence arguments to f from the two calls are merged.

Although polymorphic splitting mimics polymorphism, the approximation defined by the flow analysis is less precise than Hindley-Milner typing in some cases. Consider the following modification of the above example:

(let ([f (lambda (x) (lambda (y) x))]) (let ([g (lambda (z) (f z))]) ((g 0) (g #t))))

To faithfully model polymorphism, we require f to be evaluated in two distinct contours, one for each call to g. However, f's closure is split to the same contour for either call to g, because both calls go through the same call to f. Consequently, the abstract value for the result of this expression is {number, true}.

It is well known that the usual quantification rules for type variables in a Hindley-Milner type system are equivalent to a substitution rule [Mitchell 1990]:

[Mathematical Expression Omitted]

We can capture this behavior in an alternative rule for splitting let-bound variables:

[Mathematical Expression Omitted]

Instead of performing a substitution on contours, this rule ignores the contour of the abstract closure and replaces it with a new contour [Kappa] : l that extends the contour n of the call site with the label l of the variable occurrence. By appending labels to contour strings in this manner, different uses of polymorphic procedures will always be associated with different contours. With the above rule, the abstract value for the result of the preceding example is {number}.

Experimental results, however, indicate that the extra precision provided by the above rule is offset by substantially higher analysis costs, and it only marginally contributes to improving the optimizations we have considered. Indeed, for the 10 benchmarks described later in this article, analysis of two exhausted available memory with the more precise rule, while another took 30 times longer to analyze. On the other hand, for run-time check elimination, an optimization discussed in Section 5.2, only one benchmark showed any noticeable improvement (roughly 20%) as a consequence of using the more precise rule.

3.6 [Mathematical Expression Omitted] is Well Defined

Because [Mathematical Expression Omitted] is not defined inductively on the structure of expressions, as witnessed by the rule for applications, it is not immediately clear how to interpret Figure 2. Following Nielson and Nielson [1997], we define [Mathematical Expression Omitted] coinductively as the greatest fixed point of the underlying functional.

Let Rp : Flow x Aenv x Contour x Exp be a family of relations for program P. Define a functional [A.sub.P] : [R.sub.p] [approaches] [R.sub.p], similar to Figure 2, as follows:

[A.sub.P]([R.sub.P])

= {<F, [Theta], [Kappa], [c.sup.l]> [where] AbsValOf(c) [element of] F <l, [Kappa]>}

[Mathematical Expression Omitted]

[Mathematical Expression Omitted]

U . . . remaining cases adapted mutatis mutandis

We define F, [Theta], [Mathematical Expression Omitted] as the greatest fixed point of this functional. This fixed point is guaranteed to exist because [A.sub.P] is monotonic. We want the greatest fixed point, rather than the least fixed point, because many useful programs would not have flow functions under the least fixed point. (Consider a program containing a loop whose integer bounds are abstracted to number. For [Mathematical Expression Omitted] to hold for the loop body, [Mathematical Expression Omitted] must hold for the iterative application within the loop body. But this application requires [Mathematical Expression Omitted] to hold for the loop body.)

We now show that for any program P, there exists a unique least flow analysis that adheres to the safety constraints imposed by [Mathematical Expression Omitted].

THEOREM 3.6.1 (LEAST FLOW ANALYSIS). Given program P, initial contour [[Kappa].sub.0], and closed expression [[e.sub.0].sup.[l.sub.o]], there exists a unique minimal function F such that F, [Phi], [Mathematical Expression Omitted] holds.

PROOF. For any given program P, we can easily construct a flow function [F.sub.max] such that [F.sub.max], [Phi], [Mathematical Expression Omitted] by including all possible abstract closures and abstract constants relevant to P in every program point. [F.sub.max] is finite, since it uses only finitely many contours. We then impose a natural partial ordering on flow functions for P, and we write [F.sub.1] [subset or equal to] [F.sub.2] when [F.sub.1] is smaller than [F.sub.2]. By coinduction, we prove that if [F.sub.1], [[Theta].sub.0], [Mathematical Expression Omitted] and [F.sub.2], [[Theta].sub.0], [Mathematical Expression Omitted] then [F.sub.1] [intersection] [F.sub.2], [[Theta].sub.0], [Mathematical Expression Omitted].

3.7 Soundness

A flow analysis is sound if the abstract values it computes for various program points are conservative approximations of the exact values that those program points take on during execution. We prove soundness via a subject reduction theorem [Curry and Feys 1958] that relates reductions in the exact semantics to the constraints found in the specification of the flow analysis. In the following, we say that an expression e in program P is safe if F, [Theta], [Mathematical Expression Omitted] holds for some [Theta] and [Kappa]. Subject reduction states that if an expression [e.sub.1] reduces to [e.sub.2] and [e.sub.1] is safe, then [e.sub.2] must also be safe.

THEOREM 3.7.1 (SUBJECT REDUCTION). If F, [[Theta].sub.0], [Mathematical Expression Omitted] and [Mathematical Expression Omitted] then F, [[Theta].sub.0], [Mathematical Expression Omitted].

PROOF. We require two supporting lemmas for the proof. The first lemma states that the redex of a safe expression is also safe.

LEMMA 3.7.2. If F, [Theta], [Mathematical Expression Omitted] then [Theta][prime] and [Kappa][prime] exist such that F, [Theta][prime], [Mathematical Expression Omitted] Furthermore, if E = S for some S, then [Theta][prime] = [Theta].

The proof of this lemma proceeds by induction on the structure of E.

The second lemma states that the replacement of a safe subexpression by another safe subexpression yields a safe expression.

LEMMA 3.7.3. If F, [Theta], [Mathematical Expression Omitted] and F, [Theta][prime], [Mathematical Expression Omitted] and F, [Theta][prime], [Mathematical Expression Omitted] then F, [Theta], [Mathematical Expression Omitted].

The proof of this lemma proceeds by induction on the structure of E.

The proof of the subject reduction theorem proceeds by case analysis on the form of the reduction.

Case [Mathematical Expression Omitted] where [Rho](x) = [v.sup.[l.sub.0]]. We have F, [Theta], [Mathematical Expression Omitted] by Lemma 3.7.2. Hence there exist [Theta][prime], [Kappa][prime] such that

[Mathematical Expression Omitted] (1)

and

[Mathematical Expression Omitted] (2)

by bind. From (1) by the definition of [Mathematical Expression Omitted] we have

[Mathematical Expression Omitted] (3)

and

F<[l.sub.0], [Kappa][double prime]> [subset or equal to] F <x, [Theta][prime](x>). (4)

From (2) we have

[Mathematical Expression Omitted] (5)

by Lemma 3.7.2.

Suppose v = c. Then AbsValOf(c) [element of] F<[l.sub.0],[Kappa][double prime]> by const with (3). Hence AbsValOf(c) [element of] F<x,[Theta][prime](x>) by (4). By var, [Mathematical Expression Omitted]. Hence

[Mathematical Expression Omitted] (6)

by const. The result follows from (5) and (6) by Lemma 3.7.3.

Otherwise v = (close [Rho][prime] [e.sup.l[prime]]). Then F, [Mathematical Expression Omitted] and <l[prime], [[Theta].sub.v], [[Kappa].sub.v]>[Lambda] [element of] F<[l.sub.0], [Kappa][double prime]> by close with (3). Hence (l[prime],[[Theta].sub.v], [[Kappa].sub.v])[Lambda] [element of] F<x, [Theta][prime](x)> by (4). By either let-var or var according to how x is bound, <l[prime], [[Theta].sub.v], [[Kappa][prime].sub.v]>[Lambda] [element of] F<l, [Kappa][triple prime]> (where [[Kappa][prime].sub.v] is different from [[Kappa].sub.v] if x is bound by let). Hence

[Mathematical Expression Omitted] (7)

by close. The result follows from (5) and (7) by Lemma 3.7.3.

[Mathematical Expression Omitted]

[Mathematical Expression Omitted].

By Lemma 3.7.2, we have F,[Theta], [Mathematical Expression Omitted] (bind [Rho] S[(lambda [([x.sub.1] . . . [x.sub.n]) e).sup.l]]). Hence

[Mathematical Expression Omitted] (8)

and

[Mathematical Expression Omitted]

by bind. From the latter we have

[Mathematical Expression Omitted] (9)

by Lemma 3.7.2. Hence

[Mathematical Expression Omitted] (10)

by lam. Then from (8) and (10) we have

[Mathematical Expression Omitted] (11)

by close. The result follows from (9) and (11) by Lemma 3.7.3.

[Mathematical Expression Omitted]

By Lemma 3.7.2 we have

[Mathematical Expression Omitted]. (12)

By ap we have

[Mathematical Expression Omitted] (13)

for i = 1 . . . n and

[Mathematical Expression Omitted] (14)

From the latter by close we have

[Mathematical Expression Omitted] (15)

and [<[l.sub.c], [Theta][prime], [Kappa][prime]>.sub.[Lambda]] [Epsilon] F <[l.sub.1], [Kappa]>. Together with (13) we have

[Mathematical Expression Omitted] (16)

and

F<[L.sub.i], [Kappa]> [subset of equal to] F<[x.sup.i], [Kappa][prime]> (17)

for i = 1 . . . n, as well as

F<[L.sub.b], [Kappa][prime]> [subset of equal to] F<l, [Kappa]> (18)

by ap. By the variable conventions, x [not an element of] dom([Rho]); hence with (15) we have

[Mathematical Expression Omitted].

With (13) and (17) we have

[Mathematical Expression Omitted],

and together with (16) and (18) we have

F, [Theta], [Kappa] [[follows from].sup.P] [(bind [Rho][[x.sub.1] [approaches] [v.sub.1]] . . . [[x.sub.n] [approaches] [v.sub.n]] [e.sub.b]).sup.l] (19)

by bind. The result follows from (12) and (19) by Lemma 3.7.3.

Case E[[(bind [Rho] [v.sup.l[prime]]).sup.l]], [approaches] E[[v.sup.l]]. We have

F, [Theta], [Kappa] [Mathematical Expression Omitted] [(bind [Rho] [v.sup.l[prime]]).sup.l] (20)

by Lemma 3.7.2. Then

F, [Theta], [Kappa][prime] [[follows from].sup.P] [v.sup.l][prime] (21)

and

F<l[prime], [Kappa][prime]> [subset or equal to] F<l, [Kappa]> (22)

by bind.

Suppose v = c. Then AbsValOf(c) [Epsilon] F(l[prime], [Kappa][prime]) by const with (21). Hence AbsValOf(c) [Epsilon] F(l, [Kappa]) by (22). Hence

F, [Theta], [Kappa] [Mathematical Expression Omitted] [v.sup.l] (23)

by const. The result follows from (20) and (23) by Lemma 3.7.3.

Otherwise v = (close [Rho][prime] [e.sup.l][prime]). Then F, [Mathematical Expression Omitted] and [<l[prime], [[Theta].sub.v], [[Kappa].sub.v]>.sub.[Lambda]] [Epsilon] F <l[prime], [Kappa][prime]> by close with (21). Hence [<l[prime], [[Theta].sub.v], [[Kappa].sub.v]>.sub.[Lambda]] [Epsilon] F <l, [Kappa]> by (22). Hence

F, [Theta][prime], [Kappa] [[follows from].sup.P] [v.sub.l] (24)

by close. The result follows from (20) and (24) by Lemma 3.7.3.

[Mathematical Expression Omitted].

Then

[Mathematical Expression Omitted]

by Lemma 3.7.2. Hence

[Mathematical Expression Omitted] (25)

and

[Mathematical Expression Omitted] (26)

by bind. From the latter, F, [Theta][prime], [Kappa][double prime] [[follows from].sup.P] [Mathematical Expression Omitted] by Lemma 3.7.2. Hence

[Mathematical Expression Omitted] (27)

and

F<[l.sub.1], [[Kappa][double prime]: l]> [subset or equal to] F<x, [Kappa][double prime]> (28)

by let. Now, either v = c or v = (close [Rho][prime] [e.sup.l][prime]). In the first case, F <x, [Kappa][double prime]> [not equal to] [Phi] by const with (27) and (28). In the second case, F <x, [Kappa][double prime]> [not equal to] [Phi] by close with (27) and (28). Hence

[Mathematical Expression Omitted] (29)

and

F<[l.sub.b], [Kappa][double prime]> [subset or equal to] F <l, [Kappa][double prime]> (30)

by let. Now, by the definition of [Mathematical Expression Omitted] with (27), (28), and (25), we have

[Mathematical Expression Omitted].

Together with (29) and (30) we obtain

[Mathematical Expression Omitted]

by bind. The result follows by Lemma 3.7.3 together with (26).

The remaining cases for primitive application and if are straightforward.

3.8 Recursion, Data Constructors, and Assignment

To add recursion, data constructors, and assignment, we extend our source language with several new syntactic forms:

[Mathematical Expression Omitted]

The letrec form constructs a named recursive procedure, binding x in both f and [e.sub.b]; cons constructs mutable pairs; car and cdr are the first and second projections on pairs, and set-car! and set-cdr! mutate the components of pairs. A letrec-expression binds its variable x in f and [e.sub.b]. The bound expression f of a letrec-expression must be a procedure. We extend our subscript convention for variable occurrences as follows: nonrecursive variable references to x that appear with [e.sub.b] but not within f are subscripted [Mathematical Expression Omitted], where l[prime] is the label of the letrec-expression binding x.

We extend the definition of abstract values to include abstract pairs:

a [element of] Avalue = . . . + Apair

[<l, [Kappa]>.sub.[Pi]] [element of] Apair = Label x Contour

An abstract pair [<l, [Kappa]>.sub.[Pi]] identifies pairs constructed from expression [(cons [e.sub.1] [e.sub.2]).sup.l] in contour [Kappa]. Figure 5 extends relation [[follows from].sup.P] to the new syntactic forms. Theorems 3.6.1 and 3.7.1 extend to this larger language.

The Hindley-Milner polymorphic type system does not support polymorphic recursion. Thus, in the expression (letrec ([x f]) [e.sub.b]), only occurrences of x within [e.sub.b] are polymorphic; those within f are not. Similarly, our polyvariant flow analysis introduces new abstract closures via rule rec-var only at occurrences of x within [e.sub.b]. Recursive uses of x within f evaluate via rule in the most recent contour of a nonrecursive use of x. To illustrate, consider the following expression:

(letrec ([last (lambda (1)

(if (null? (cdr 1))

(car l)

(last (cdr 1))))(1)])

(begin

(last(2) (cons 1 (cons 2 '())))

(last(3) (cons "a" (cons "b" '()))))

The abstract closure created for last is [<1, [Phi], []>.sub.[Lambda]] where 1 is the lambda-expression's label. At last(2), the abstract closure for last is split to yield [<1, [last [approaches], [2], [2]>.sub.[Lambda]]. During abstract evaluation of (last(2) . . .), recursive references to last yield the same abstract closure [<1, [last [approaches], [2], [2]>.sub.[Lambda]]. Similarly, during abstract evaluation of (last(3) . . .), the abstract closure for last is split to yield [<1, [last [approaches], [3], [3]>.sub.[Lambda]], and recursive references to last yield the abstract closure [<1, [last [approaches]. [3], [3]>.sub.[Lambda]]. In each case, the recursive calls evaluate in the same contour as the outermost call. Hence last's argument 1 is bound in different contours for the entire evaluation of each recursive application of last.

The remaining rules of Figure 5 handle construction, projection, and mutation of pairs. A pair expression [Mathematical Expression Omitted] creates an abstract pair [<l, [Kappa]>.sub.[Pi]], where [Kappa] is the contour under which the pair expression was reached. The flow map holds abstract values for the components of the abstract pair at F<[l.sub.1], [Kappa]> and F<[l.sub.2], [Kappa]>. The car and cdr expressions return abstract values from these program points. A set-car! expression that might modify the pair [<l, [Kappa]>.sub.[Pi]] merges the abstract values assigned into F<[l.sub.1], [Kappa]>; a set-cdr! expression similarly modifies F<[l.sub.2], [Kappa]>. Since abstract pairs include the contour in which they were created, uses of cons that occur inside polyvariant procedures will not be merged. For example, consider the following program fragment:

(let ([f (lambda (x) (cons x '()))])

(let ([g (f (lambda (y) (+ y 1)))])

(let ([h (f (lambda (z) (not z)))])

((car g) 1)

((car h) true))))

The abstract value of g is a pair whose first component is the procedure associated with (lambda (y) (+ y 1)). The abstract value of h is a pair whose first component is the procedure associated with (lambda (z) (not z)). No merging of these two procedures takes place through x because the analysis associates two different abstract closures with f for each of its two occurrences; x is bound to distinct contours in the two calls.

4. IMPLEMENTATION

The implementation of our analysis consists of approximately 4000 lines of Scheme code. The implementation constructs the least flow analysis for a program P by building a graph in which nodes represent program points containing sets of abstract values, and edges represent containment constraints between the sets of values at nodes [Henglein 1992; Jagannathan and Weeks 1995]. The algorithm constructs the graph incrementally by propagating values across edges to successor nodes and building new pieces of the graph as new closures arrive at applications. The algorithm maintains a work list of nodes holding values that have not yet been propagated to all successors of a node, and it proceeds by repeatedly picking a node from the work list and propagating its values to its successors. The algorithm terminates when the work list is empty.

Procedure Flow, defined in Figure 6, constructs the graph for an expression reached in a particular environment and contour. The figure includes cases for only some of our language constructs; the remaining cases are similar. For constants and lambda-expressions, Flow just adds the appropriate abstract value to program point <l, [Kappa]>. For variable references, Flow adds a simple edge [Mathematical Expression Omitted] between the appropriate program points; l[prime] denotes the label of the reference. Such an edge indicates that every element arriving at <x, [Kappa]> must be propagated to <l[prime], [Kappa][prime]>.

For applications, we use active edges to construct new parts of the program flow graph as new values arrive. An active edge [Mathematical Expression Omitted] calls function f with each new element that is added to <l, [Kappa]>. Indeed, we can use active edges to define simple edges:

[Mathematical Expression Omitted]

A one-shot edge [Mathematical Expression Omitted] calls f with an uninteresting value only the first time a value arrives at <l, [Kappa]>, i.e., when <l, [Kappa]> becomes nonempty. One-shot edges, which can also be defined in terms of active edges [Mathematical Expression Omitted], allow the implementation to build pieces of the flow graph incrementally, adhering to various reachability constraints. In this sense, one-shot edges are related to the representation of conditional types in Aiken et al. [1994].

When an application is reached, procedure Flow first constructs the graph for the first subexpression [e.sub.0]. It then adds one-shot edges that will evaluate subsequent subexpressions as the nodes for their predecessors become nonempty. Finally, when the node for the last subexpression [e.sub.n] becomes nonempty, Flow establishes an active edge from the node for [e.sub.0] that, when a new closure arrives, will construct the graph for the body of the closure and establish appropriate argument and return edges. The argument edges propagate values from the nodes for the subexpressions [e.sub.i] to the nodes for the formal parameters [x.sub.i]. The return edge propagates values from the node for the closure body [e.sub.b] to the call expression's node. In constructing the graph for the body of the closure, some care is needed to ensure that the algorithm does not loop. MemoFlow is a memoized version of Flow that simply cuts off evaluation when called with the same arguments as some previous call.

For conditionals, Flow constructs the subgraph for [e.sub.1]. It then establishes an active edge from the node for [e.sub.1] that will construct the graph for [e.sub.2] or [e.sub.s] as true or false values arrive from [e.sub.1]. For languages such as Scheme, where [e.sub.2] is executed if [e.sub.1] yields any value other than #f, a little more work beyond changing "v = true" to "v [not equal to] false" must be done to ensure that Flow([e.sub.2], [Theta], [Kappa]) is only called once.

For a let-expression with label l, Flow constructs the subgraph for the let-binding in a new contour constructed by extending the current contour with l. Edges are added between the node for the binding variable and the node holding the result of the binding. Finally, the flow graph for the body is constructed, and a new edge is established between the node holding the result of the let-body and the node holding the result of the let-expression.

The implementation of let-bound variables creates an active edge between the variable node <x, [Kappa]> holding the value of the variable and a procedure that splits the contour of abstract closures that arrive at <x, [Kappa]>.

In this analysis, an abstract environment is a full function space from program variables to contours. Since there are O([2.sup.N]) abstract closures for a program of size N, this analysis has worst-case exponential complexity. However, as the next section demonstrates, the algorithm behaves reasonably in practice.

5. OPTIMIZATIONS

This section describes two optimizations driven by our flow analysis: run-time check elimination and inlining. For both optimizations, we provide experimental results indicating their effectiveness. For run-time check elimination, we also provide a comparison between polymorphic splitting and several other kinds of analyses. We show that polymorphic splitting is often more precise and less expensive to compute.

The experimental results reported in the following sections were obtained by using Chez Scheme 5.0a [Dybvig 1987] to compile programs to native MIPS code. The compiled programs were run on a 150MHz SGI Challenge with 1GB of eight-way interleaved physical memory. Times reflect the least of five runs.

5.1 Run-Time Check Elimination

Dynamically typed languages like Scheme require run-time checks at every use of a primitive operator to ensure that the operator's arguments lie in its domain. There are three kinds of run-time checks that may be required in Scheme programs:

(1) An arity check is required at a lambda-expression if the procedure may be applied to an inappropriate number of arguments.

(2) An application check is required at an application if the expression in the call position may yield a value other than a closure.

(3) A primitive check is required at a primitive operation if the operation may be applied to a value of inappropriate type.

Ordinarily, Scheme compilers conservatively insert run-time checks at all primitive operations and calls except in the simplest of cases. Using the information derived by flow-analysis, the need for many run-time checks can be avoided.

Run-time checks at primitive operations can be eliminated if the values of all arguments lie within the expected domains of the primitive operation. In our experimental framework, we assume that there are two versions of every primitive operation: one that includes a run-time check and one that does not. In a real compiler, run-time checks for each argument could be handled independently. Let AbsArgOf(p, i) be a set of abstract values describing the expected domain of the ith argument of primitive operator p. For example, AbsArgOf(+, i) = {num} for i = 1,2. Then a run-time check can be eliminated if the following condition holds:

Run-Time Check Elimination Condition. No run-time check is necessary at [Mathematical Expression Omitted] if

[Mathematical Expression Omitted]

for all i = 1 . . . n.

Similar conditions enable eliminating arity checks and application checks.

5.2 Experimental Results for Run-Time Check Elimination

Table I lists the benchmarks used to test the analysis, their size in pretty-printed lines of source code after prepending library routines, stripping comments, and expanding macros, the number of sites where run-time checks would ordinarily be required in the absence of any optimizations, and the times to analyze them under polymorphic splitting, a OCFA implementation, a 1CFA implementation, a type-directed analysis,(4) and an implementation of soft typing [Wright and Cartwright 1997]. These benchmarks are characteristic of a wide range of Scheme programs, and none were written with knowledge of the analysis used.

The program Lattice enumerates the lattice of maps between two lattices and is mostly first-order. Boyer is a term-rewriting theorem prover. Graphs counts the number of directed graphs with a distinguished root and k vertices, each having [TABULAR DATA FOR TABLE I OMITTED] out-degree at most 2. It makes extensive use of higher-order procedures and is written in a continuation-passing style. Given an n x n random matrix M with {+1, -1} entries, Matrix tests whether M is maximal among all matrices of the same dimension obtainable by simple reordering of rows and columns and negation of any subset of rows and columns. Like Graphs, this program is written in continuation-passing style. Maze generates a random maze and computes a path through it using a union-find algorithm. It makes extensive use of records and uses Scheme's call-with-current-continuation operator, but is primarily a first-order program. Splay is an implementation of splay trees. It makes extensive use of higher-order procedures and pattern-matching macros [Wright and Duba 1993]. Nbody is a Scheme implementation [Zhao 1987] of the Greengard [1987] multipole algorithm for computing gravitational forces on point-masses distributed uniformly in a cube. Dynamic is an implementation of a tagging optimization algorithm [Henglein 1992] for Scheme. It is primarily a first-order program, but has complex control-flow, and many deeply-nested conditional expressions. Nucleic is a constraint satisfaction algorithm used to determine the three-dimensional structure of nucleic acids [Feeley et al. 1994]. It is floating-point intensive and uses an object package implemented with macros and vectors. Nucleic2 uses a different object package as described below.

For the larger benchmarks, soft typing requires less analysis time than any of the flow analyses. Because the soft typing implementation uses a Hindley-Milner type inference framework, the body of a procedure is analyzed only once. Type constraints within the procedure body are calculated using type variables to express the dependence between unspecified inputs and outputs. These constraints are duplicated for different calls to the procedure via polymorphic type variable instantiation without being recalculated from scratch. In contrast, our flow analysis algorithm establishes multiple copies of the constraint rules between program points of a procedure's body, one copy for each contour in which the body is analyzed. If a procedure is never reached, our algorithm is cheaper than type inference because it never calculates any constraints. But in the more common case where a procedure is reached under many contours, our algorithm is more expensive than type inference because it calculates the constraints many times. Nevertheless, we believe the times obtained from our prototype flow analysis implementation indicate that this technique should be suitable for inclusion in an optimizing compiler.

In many cases, the analysis time for polymorphic splitting is less than the analysis time for 0CFA. On the surface, this is counterintuitive: one might imagine a more precise analysis would have greater cost in analysis time. The reason for the significant difference in analysis times is because 0CFA yields coarser approximations, and thus induces more merging. More merging leads to more propagation, which in turn leads to more reevaluation. Consider an abstract procedure value P. This value is represented as a set of abstract closures. As the size of this set becomes bigger, abstract applications of P require analyzing the bodies of more lambda-expressions, since the application rule applies each element in the set defined by P. Because polymorphic splitting results in less merging than 0CFA, abstract value sets are smaller, which leads to shorter analysis times.

Figure 7 shows the percentage of static checks remaining in the benchmarks under the different analyses after run-time check elimination. The static counts measure the textual number of run-time checks remaining in the programs. For each of the benchmarks, polymorphic splitting requires fewer run-time checks than either 0CFA or soft typing. In many cases the difference is significant. Interestingly, 0CFA also outperforms soft typing on several benchmarks. For example, the Lattice program has roughly half as many static checks when analyzed using 0CFA than using soft typing. The primary reason for this is the problem of reverse flow [Wright and Cartwright 1997] in the soft-typing framework that leads to imprecise typing. Reverse flow refers to type information that flows both with and counter to the direction of value flow. Conditionals often cause reverse flow because the type rules for a conditional require both its branches to have the same type. Consequently, type information specific to one branch may cross over to the other, yielding a weaker type signature than necessary for the entire expression. Because of reverse flow, a cascading effect can easily occur in which many unnecessary run-time checks are required, because of overly conservative type signatures. Our flow analysis does not suffer from the reverse flow problem, nor do soft-typing implementations that use a richer type system [Aiken et al. 1994].(5)

Figure 8 shows the percentage of dynamic checks remaining in the benchmarks. The dynamic counts measure the number of times these checks are evaluated in the actual execution of the program. Dynamic counts provide a reasonable metric to determine the effectiveness of an analysis: a good analysis will eliminate run-time checks along control-paths most likely to be exercised during program execution. The figures for dynamic checks indicate that polymorphic splitting eliminates run-time checks at important program sites. The improvement over soft typing is more striking here. For example, about one third as many run-time checks are encountered during the execution of Lattice optimized under polymorphic splitting than under soft typing. About one quarter as many checks are executed for Dynamic. Here also, a simple 0CFA flow analysis generally performs better than soft typing, although for some benchmarks like Browse, Boyer, and Graphs, 0CFA is worse. This indicates that the approximation used by polymorphic splitting is superior to 0CFA. Insofar as the benchmark programs use common Scheme idioms and programming styles, we conclude that Scheme programs use polymorphism in interesting and nontrivial ways. Analyses which are sensitive to polymorphism will outperform those that are not. The other two polyvariant analyses studied, 1CFA and the type-directed analysis, reduce run-time checks as well as polymorphic splitting on six of the eight benchmarks. While 1CFA seems to be prohibitively expensive, type-directed flow analyses appear to share many of the positive features of polymorphic splitting [Jagannathan et al. 1997].

The counts of static and dynamic checks for Nucleic are significantly higher than for the other benchmarks. Nucleic implements a structure package that supports a simple form of inheritance; the main data objects manipulated by the program are defined in terms of structure instances. Structures are implemented as vectors, and symbolic tags are interspersed in this vector to demarcate distinct substructures in the inheritance hierarchy. Since the analyses do not retain distinct type information for individual vector slots, references to these structures invariably incur a run-time check. Thus, although the vast majority of references in the program are to floating-point data, all the analyses require run-time checks at arithmetic operations that are applied to elements of these structures. Nucleic2 is a modified version of Nucleic that uses an alternative representation for structures. This representation separates the tags from the actual data stored in structure slots, enabling the analyses to avoid run-time checks when extracting elements from structures. Nucleic2 incurs significantly fewer run-time checks than Nucleic for all analyses, although these improvements come at the cost of increased analysis times.

Figure 9 presents execution times for the benchmarks after polymorphic splitting has been used to eliminate run-time checks. The top of the graph (1.0 on the y-axis) represents the normalized execution time of the unoptimized program. Execution time is further broken down by color into mutator time excluding run-time checks (black), garbage collection time (shaded), and run-time checking remaining (white). The white sections of the bars indicate run-time checking overhead that the analysis was unable to safely eliminate. For Graphs, Matrix, Dynamic, and Nucleic2, the analysis was sufficiently precise to eliminate essentially all run-time check overhead. Even for those benchmarks where some run-time check overhead remains, the optimization was still able to yield significant performance improvement, as indicated by the gap between 1.0 and the top of each bar.

5.3 Inlining

The second interprocedural optimization we consider is inlining. Polyvariant flow analysis is especially useful here for two reasons. First, because it disambiguates different uses of a procedure at different call sites, polyvariance yields very precise flow information. This enables many call sites to be identified as candidates for inlining. Second, because it analyzes a procedure with respect to a specific call site, polyvariant flow analysis enables specialization of the procedure for the call sites where it is inlined.

Inlining can be performed at a call site only if there is a unique procedure applied at that site. For example, in

(define g (lambda (f x) (f x)))

the call site (f x) is considered a candidate for inlining only if flow analysis determines that there is a single abstract closure associated with f in the call. An abstract closure corresponds to a set of exact closures. These exact closures may be closed over different environments, but they must all share the same code. In other words, for f to be inlined, all calls to g must supply closures that are created from evaluation of the same lambda-expression in the program. We state this condition formally thus:

Inlining Condition 1. If [Mathematical Expression Omitted] is a call site such that

[Mathematical Expression Omitted]

then [(lambda ([x.sub.1] . . . [x.sub.n]) [e.sub.b]).sup.m] can be specialized to contour [Kappa] and inlined at call site l, provided n = p.

The proviso n = p ensures that we do not eliminate applications which raise an error due to an argument count mismatch.

To illustrate how flow analysis enables potential inline sites to be identified, consider the following object-oriented program fragment:

(define make-network (lambda (args) . . . (lambda (msg) (case msg ((open) (lambda (addr) open a new port)) ((close) (lambda (port) close a port)) ((send) (lambda (msg port) send msg to port)) ((receive) (lambda (port) receive from port)) . . .))))

A network is a procedure that dispatches on a request. A network is created by calling make-network with a set of arguments. Given a network N, the expression

((N 'open) "http://www.foo.com")

invokes a procedure to open a connection to the specified address. The flow analysis will associate N with the procedure

(lambda (msg) (case msg . . .))

and (N 'open) with the procedure

(lambda (addr) open a new port). (*)

This approach to identifying potential inline sites is quite different from traditional approaches to inlining that rely on syntactic heuristics. For example, in the expression

(let ([f (lambda (N) . . . ((N 'open) "http://www.foo.com") . . .)]) . . . (f (make-network args))

a conventional optimizer will inline procedure (*) at (. . . "http://www.foo.com") only if it also inlines f, performs significant simplification, and repeats the inline algorithm on the simplified program. In contrast, an inlining optimization using results from a polyvariant analysis may treat the call (. . . "http://www.foo.com") as a candidate for inlining even if f is not inlined. All inlining decisions are made prior to any simplification.

Besides identifying potential inline sites, flow analysis also helps to compute reasonably efficient cost estimates. To illustrate, consider Figure 10 which provides an implementation of Scheme's map procedure. Two local procedures are defined in map's body: map1 is called when map is invoked with a unary procedure, and [map.sup.*] is invoked for all other cases. Because map1 operates over procedures whose arity is known, it is more efficient than [map.sup.*], whose implementation uses an expensive apply operator to handle the variable-arity case.

Now, consider the call (map car m). The cost of inlining map at this call site is dependent on how much specialization can be performed at the inlined site. Because the inlining algorithm uses results from a polyvariant flow analysis, it can tailor its cost estimate based on the specific context in which a call occurs. In this call, flow analysis reveals that the conditional test (null? args) will be true, and hence the else-branch will not be taken. Thus, the cost estimate for this call need not include the else-branch or the definition of [map.sup.*]. If map is inlined, the outer conditional test, the definition of [map.sup.*], and its call will all be pruned in the inlined copy. Figure 11 shows the result of inlining (map car m).

We can define the transformations given in the above example formally. Recall that Inlining Condition 1 determines an initial set of call sites where inlining can be performed. For such a call site l, we can specialize the unique procedure m called at l. Specialization involves (1) pruning code within m that is never reached from call site l and (2) recursively inlining at call sites within m.

Pruning unreachable code is most important for conditional expressions. Let <m, [Rho], [Kappa]>[Lambda] be the unique closure called at l. To prune unreachable code at a conditional expression [Mathematical Expression Omitted] within m, we use contour [Kappa] to find an upper bound F<[l.sub.1], [Kappa]> on the set of values that the test expression [e.sub.1] can yield when m is called from l. If F<[l.sub.1], [Kappa]> does not include true, the consequent [e.sub.2] can be pruned. If F<[l.sub.1], [Kappa]> does not include false, the alternative [e.sub.3] can be pruned. If F<[l.sub.1], [Kappa]> includes neither true nor false, both [e.sub.2] and [e.sub.3] can be pruned.

Pruning is also possible for most other expression forms. For example, assuming subexpressions are evaluated from left to right in an application, all subexpressions to the right of a subexpression whose abstract value is empty can be pruned. To simplify the presentation, we have included pruning only for conditionals. Our flow-directed inlining implementation prunes wherever possible.

The second kind of specialization is recursive inlining at call sites within a procedure that is being inlined. Inlining Condition 1 identifies call sites where inlining is possible within the unspecialized procedures of the original program. To identify candidates for inlining within a specialized procedure, we employ a similar condition that takes into account the specialization contour.

Inlining Condition 2. Let <m, [Rho], [Kappa]>[Lambda] be an abstract closure that is being inlined at call site l and specialized to contour [Kappa]. A call site [Mathematical Expression Omitted] within the expression labeled by m can be inlined in the specialized version of m if

[Mathematical Expression Omitted]

and closure [Mathematical Expression Omitted] has arity [Mathematical Expression Omitted].

That is, we can inline [Mathematical Expression Omitted] at call site [Mathematical Expression Omitted] within the specialized version of m if the flow analysis identifies a unique abstract closure at program point [Mathematical Expression Omitted]. Support for recursive inlining follows naturally from a polyvariant flow analysis.

After inlining, [[Beta].sub.v]-substitution, dead-code elimination, and local code transformations are performed. These transformations include restructuring procedure definitions and calls to eliminate unused formal parameters. These optimizations are all syntactic and use no flow information. The simplifier performs no transformations that violate the results of the flow analysis. Hence, other optimizations could use flow information generated for the original program when operating over the inlined version. Figure 12 shows the result of applying the simplifier to the inlined code for (map car m).

5.3.1 Inlining Procedures with Free Variables. Our discussion of inlining has so far neglected an important issue in a language with higher-order procedures and lexical scoping. How do inlined procedures gain access to the values of their free variables?

To answer this question, we define a target language that adds an ordered list of free variables [[z.sub.1] . . . [z.sub.m]] to each lambda-expression:

(lambda [[z.sub.1] . . . [z.sub.m]] ([x.sub.1] . . . [x.sub.n]) [e.sub.b])

The target language also includes an expression form

(cl-ref [e.sub.1] n)

to access the nth element of a closure's free variable list. The cl-ref operator (short for closure-reference) requires [e.sub.1] to evaluate to a closure.

When a procedure with free variables is inlined, all references to these variables are replaced with cl-ref operations. To illustrate, suppose that we inline (lambda ([x.sub.1] . . . [x.sub.n]) [e.sub.b]) which has {[Z.sub.1] . . . [z.sub.m]} free at ([e.sub.0] [e.sub.1] . . . [e.sub.n]). The original lambda-expression is replaced with

(lambda [[z.sub.1] . . . [z.sub.m]] ([x.sub.1] . . . [x.sub.n]) [e.sub.b]).

We replace the call with

((lambda (w [x.sub.1] . . . [x.sub.n]) [e[prime].sub.b]) [e.sub.n] [e.sub.1] . . . [e.sub.n])

where w is a new variable, and [e[prime].sub.b] is a specialized copy of [e.sub.b] with references to [z.sub.i] replaced by (cl-ref w i).

The cl-ref operation is easy to implement in a compiler that uses fiat closures [Appel 1992]. A closure record consists of a code pointer and the values of the variables in the free variable list for that procedure. A compiler that uses linked closures may require additional information in the cl-ref operation to indicate the forms of the linkage structures.

5.3.2 An Inlining Algorithm. Figure 13 collects the above observations into a complete inlining algorithm. Given a flow analysis F for a program P, the transformation I[[e]]K[Mu] takes a subexpression e of P, a contour [Kappa], and a loop map [Mu], and produces an equivalent inlined and specialized expression. The initial contour for P is the special contour ?, and the initial loop map is the empty map. The contour ? denotes the union of all possible contours, as in Inlining Condition 1. In other words,

[Mathematical Expression Omitted].

The contour parameter [Kappa] of the algorithm selects a particular context in which to specialize subexpressions (see the condition F<[l.sub.0], n> = {<l[prime], [Rho][prime], [Kappa][prime]>[Lambda]} in the rule for applications, and the various conditions in the rule for if-expressions). Since the original copy of a lambda-expression must not be specialized to any specific contour, the transformation of its body is performed in contour ?.

To prevent uncontrolled unwinding of recursive calls, the loop map [Mu] maps label-contour pairs to either variables or the special symbol ??. If, while inlining procedure l[prime] in contour [Kappa][prime], the algorithm encounters another call site for l[prime] in contour [Kappa][prime] (see the second case of the rule for applications), it constructs a loop by binding a fresh variable to the inlined procedure at the initial call site with letrec. In fact, the algorithm introduces a letrec for every inlined procedure, whether recursive or not, but subsequent local simplification removes any unnecessary bindings. For letrec-expressions occurring in the program, and not introduced by the inlining transformation, the letrec-bound variable y is bound to ?? in the loop-map. This mapping ensures that calls to y are not transformed to calls which pass an extra closure argument, since only inlined procedures bound to letrec-expressions introduced by the transformation expect an additional closure record as their first argument.

This method of controlling unwinding avoids unrolling loops. Loop unrolling can be a valuable optimization and would be easy to include in this framework. We have intentionally avoided unrolling loops in order to isolate the benefits of inlining. Similarly, the algorithm would be simpler if we closure-converted all procedures regardless of whether they were inlined or not, but this would be expensive because all our transformations are performed at the source-language level.

5.3.3 Making Inlining Decisions. Inlining every call site that is a candidate for inlining is impractical, as it can lead to exponential code growth. To limit inlining to those sites where it is likely to be most profitable, Figure 13 requires the predicate Inline? to hold for the procedure body when specialized for the contour and loop map under consideration. This predicate estimate the sizes of the generated code for the inlined procedure at a particular call site and limits inlining to cases where the generated code is smaller than some threshold size.(6) In the next section, we present performance results for various inlining thresholds.

Table II. Code Size Ratios for Various Inlining Thresholds Ratio of object code size to original object code size for various inlining thresholds Program 50 100 200 500 1000 Lattice 0.86 0.86 0.88 1.38 2.49 Boyer 0.92 1.00 1.03 0.89 0.87 Graphs 0.74 0.80 0.75 0.73 0.72 Matrix 0.92 0.96 0.95 1.00 1.16 Maze 0.89 0.91 0.88 0.86 0.83 Splay 0.91 0.91 0.91 0.91 0.91 Nbody 1.12 1.38 1.47 2.01 2.28 Dynamic 1.41 1.53 2.15 2.25 2.25 Nucleic 0.77 0.79 1.09 1.22 1.61 Nucleic2 0.86 0.87 1.15 1.15 1.69

5.4 Experimental Results for Inlining

We have implemented a source-to-source flow-directed inlining optimization for the full [R.sup.4]RS Scheme language [Clinger and Rees 1991]. Given a Scheme program and an inline threshold T, our optimizer inlines procedures whose specialized size is estimated to be less than T.

Without access to Chez's internal closure representation, it is impossible to implement cl-ref with the same efficiency as a variable reference. Indeed, using a faithful implementation of the algorithm from the previous section, we found the premium for accessing variables via cl-ref masked the benefit of inlining. Therefore, the results presented in this section reflect an inlining strategy in which the Inline? predicate requires inlined procedures be closed up to top-level variables. Under this constraint, the inlining algorithm never generates cl-ref operations. The algorithm will still allow a procedure P that has a free variable x to be inlined at call site C if (1) x occurs in a conditional branch which can be eliminated in the specialized copy of P at C or (2) x refers to a procedure that will be inlined. In either case, a free variable in the source does not appear in the specialized version. Surprisingly, this inlining strategy leads to consistent performance improvements for most programs we have tested. For many of the benchmarks in our test suite, the performance improvements are significant. We would expect even greater improvements with an efficient implementation of cl-ref, since this would enable inlining open procedures.

We applied our optimizer to the benchmarks listed in Table I. Table II indicate code sizes after inlining for various size thresholds.(7) For example, the call (map car m) from Figures 10, 11, and 12 is inlined at thresholds above 60. Threshold 0 reflects dead-code elimination, constant folding, useless binding elimination, [[Beta].sub.v] substitution, as well as inlining of primitives and procedures used only once.

Code size ratios less than one indicate that the inlined program is smaller than the original. For example, at threshold 1000, the inlined version of Graphs is still 28% smaller than the original program. For most benchmarks, object code size grows quite slowly as the inlining threshold increases. There are two reasons for this. First, Scheme programs typically consist of many small procedures, so higher inlining thresholds expose relatively fewer opportunities for inlining. Second, even when relatively large procedures become candidates for inlining, specialization and local simplification significantly reduce the size of the inlined copy.

Figure 14 presents execution times for these benchmarks under different inlining thresholds. The leftmost bar indicates the execution times of the original programs after local simplification (i.e., threshold 0). The dark section of each bar indicates mutator time, and the shaded section indicates collector time. The y-axis of each graph measures normalized total execution time, which is indicated by the total heights of the bars. Mutator execution time, the time taken by the application when garbage collection overhead is discounted, is indicated by the tops of the dark bars. For example, at size threshold 200, Graphs shows a nearly 40% performance improvement in total execution time.

As size thresholds increase, performance either increases or remains roughly constant. This indicates that our size metric is benign, and inlining decisions rarely impact performance negatively. On average, the best performance occurs at size thresholds of about 200. At smaller thresholds, our implementation inlines small library procedures like cadr or a specialized version of map, but is unlikely to inline nontrivial user-defined procedures. At higher thresholds, more user-defined procedures are inlined, and more opportunities for specialization of these procedures become apparent. Few additional procedures are inlined at thresholds above 500; thus, we see little further performance improvement. Both Nbody and Matrix show performance improvement of roughly 10%, and Nucleic exhibits no improvement. A relatively large percentage of the execution time for these programs is spent in tight loops, and not in calls to out-of-line procedures. Hence, inlining exposes few optimization opportunities. On the other hand, Graphs, Boyer, and Maze make many out-of-line calls relative to the overall computation. Inlining these calls removes significant overhead in these benchmarks.

Garbage collection time and space allocated are not affected by inlining for most of the benchmarks in Figure 14. The exception is Graphs, which shows a reduction of 50% in collection time. This reduction is a consequence of reducing allocation by 43%. Thus inlining, in combination with simplification, can even improve space usage by eliminating intermediate data structures.

5.5 Combining Run-Time Check Elimination and Inlining

As the information computed by flow analysis is not destroyed by run-time check optimization, we can drive both run-time check elimination and inlining with the information from one analysis pass. But a more intelligent combination of the two optimizations is possible than simply applying one after the other. The specialization contours of our inlining optimization enable a stronger condition for detecting when a run-time check can be eliminated than our original condition described in Section 5.1. When a procedure containing a primitive operation is being inlined at contour [Kappa], this stronger condition need only inspect the flow function F at contour [Kappa] to determine if the primitive operation needs a run-time check.

Run-Time Check Elimination under Inlining Condition. Let [<m, [Rho], [Kappa]>.sub.[Lambda]] be an abstract closure that is being inlined and specialized to contour [Kappa]. No run-time check is necessary at primitive [Mathematical Expression Omitted] within the specialized version of m if

F<[l.sub.i], [Kappa]> [subset or equal to] AbsArgOf(p,i)

for i = 1 . . . n.

Consequently, inlining and run-time check elimination used together reduce run-time check overhead below what run-time check elimination alone can achieve.

Figure 15 presents execution times for our benchmarks under this combination of run-time check elimination and inlining. The leftmost bar indicates the original, unoptimized program. The dark section of each bar indicates mutator time; the shaded section indicates collector time; and the white section indicates run-time check overhead. lattice and Maze show the synergistic effect of combining run-time check elimination and inlining. The run-time check overhead remaining in Lattice with no inlining (threshold 0) is cut by 50% at inline thresholds of 50 or greater. Similarly, the run-time check overhead remaining in Maze is significantly reduced at thresholds of 200 or greater. Taken together, run-time check elimination and inlining driven by a polyvariant flow analysis can substantially impact program performance.(8)

6. RELATED WORK

Three independent topics related to our work have been investigated by other researchers: flow analysis, run-time check elimination, and inlining.

6.1 Flow Analysis

Flow analysis can facilitate a number of optimizations essential to implementing realistic high-level languages efficiently. Although flow analysis for imperative languages has been studied for many years [Jones and Muchnick 1979], its application to functional languages is relatively recent. Efforts at building serious implementations of functional languages that employ flow analysis are even newer. Besides our work, we are aware of only a few implementations of interprocedural flow analysis for functional languages [Ashley 1996; Flanagan and Felleisen 1995; Heintze 1994; Serrano and Feeley 1996]. All of these implementations are based on a simple monovariant (0CFA) analysis of the source program.

A well-known approach to polyvariance proposed by Shivers [1991] is to distinguish different procedure calls by call site. In a framework similar to ours, contours are finite strings of call site labels. An N-CFA analysis uses the most recent N call sites to distinguish different invocations of a procedure. While call-string-based analyses are highly precise, they also appear to be quite expensive to compute, as we illustrated in Section 5.2. Polymorphic splitting provides as good or better precision than analyses based on call strings at significantly lower cost.

Type-directed analyses have been investigated for optimizing object-oriented languages [Chambers and Ungar 1990; Diwan et al. 1996]. For example, the Self compiler performs significant program analysis to optimize method calls based on the expected type of the method's arguments. Unlike analyses based on call strings, type-directed analyses for languages like Scheme appear to offer reasonable precision with less cost.

6.2 Run-Time Check Elimination

For many years, optimizing compilers for Lisp-like languages have used type recovery procedures based on simple syntactic properties and local flow analysis to eliminate unnecessary run-time checks [Beer 1987; Kaplan and Ullman 1980; Ma and Kessler 1990; MacLachlan 1992]. These techniques can be reasonably effective [Steenkiste and Hennessy 1987], but because they are intraprocedural they do not handle higher-order functions well.

Shivers [1990] describes a type recovery method based on his polyvariant flow analysis framework using call strings. Our experiments indicate that a flow analysis based on polymorphic splitting is less costly and more effective.

Type inference is an alternative to flow analysis for recovering type information for untyped programs. Soft typing [Aiken et al. 1994; Cartwright and Fagan 1991; Wright and Cartwright 1997] uses traditional polymorphic type inference techniques, extended with union types and recursive types, to derive type information that is then used to eliminate run-time checks. Our experimental results indicate that, for at least the soft-typing system described by Wright and Cartwright [1997], flow analysis is more precise. Richer soft-typing systems such as that described by Aiken et al. [1994] which also support conditional types and subtyping may yield more precise type information, but we are unaware of implementations of these systems for Scheme. Henglein [1992] describes a tagging and checking optimization based on an extension to simple type inference. While the analysis can reduce tagging overheads in many Scheme programs, it does not consider polymorphism or union types and uses a coarse type approximation that is significantly less precise than control-flow analysis obtained via polymorphic splitting. Henglein and Rehof [1995] extend Henglein's system for optimizing run-time tagging and checking to include polymorphism and union types. They use their system to translate Scheme programs to typable ML programs, injecting values tagged into an ML datatype with variants for the different Scheme types. By inferring both tag checking and tagging coercions, their system also permits using untagged data representations (uninjected ML values). Aiken and Fahndrich [1995] show how to use set constraints to infer tagging and checking coercions. Their system subsumes Henglein's original system and is quite similar to Henglein and Rehof's system. While the information inferred by these systems is ideal for eliminating run-time checks, it is of less use for other optimizations like inlining.

6.3 Inlining

Cooper et al. [1991; 1992] studied the effects of inlining for Fortran programs. They determined inlining sites manually and were able to eliminate the majority of dynamically executed procedure calls from their benchmark programs. Furthermore, they found that other optimization opportunities revealed by inlining mitigated object code growth: object code size grew by only 10% even when inlining doubled the size of the source code. However, they were unable to demonstrate consistent speedups for the inlined programs.

In a later study [Cooper et al. 1992], they examined one program that ran 8.5% slower after inlining despite a decrease in code size of 9%. They found that inlining had forced the compiler to make pessimistic assumptions about aliasing and hence scheduled instructions poorly in an inner loop. Better instruction scheduling was possible in the noninlined program because the Fortran standard allows compilers to assume that the arguments to a procedure call do not alias.

Several differences between Scheme and Fortran may explain our better results. First, our inlined programs have smaller procedures than the inlined Fortran programs, so register pressure is less significant. Second, Scheme compilers cannot optimistically assume that parameters to procedure calls do not alias. Hence inlining introduces no additional requirements on instruction scheduling. Finally, and most importantly, as Scheme programs tend to have more procedure calls, there is more procedure call overhead to remove.

Chang et al. [1992] constructed an automatic inlining optimization for C programs. As with Cooper et al.'s experiments with Fortran, Chang et al. were able to eliminate the majority of procedure calls while increasing object code size by only 16%. They obtained consistent performance improvements that averaged about 10%.

Much attention has been devoted to reducing the overhead of dynamically bound method dispatches in object-oriented languages. Method calls that can lead to only one receiver can be replaced with direct procedure calls or inlined. Even if multiple receivers are possible, the most likely receiver can be inlined by preceding the inlined code with an appropriate conditional test. Another solution is to streamline method dispatches for the most common case using either profiling information from test runs [Grove et al. 1995] or inlining trials [Dean and Chambers 1994]. An inlining trial captures the effect of an inlining optimization at a call site in a persistent database that is consulted when making future inlining decisions at similar calls encountered elsewhere in the program or during later compiles. More recently, static analyses have been applied to either inline most dispatches or replace them with direct procedure calls [Agesen and Holzle 1995; Dean et al. 1995; Diwan et al. 1996; Pande and Ryder 1994].

Functional language compilers like Standard ML of New Jersey [Appel 1992] use syntactic heuristics to guide inlining decisions. A call site is a candidate for inlining if the called procedure is syntactically evident; typically, the function expression is a variable that is bound to a [Lambda]-expression by let. Inlining is intertwined with other syntactic optimizations, so additional inlining candidates can arise as optimization proceeds. In contrast, flow-directed inlining separates inlining decisions from post-inlining simplifications. Heuristics that estimate object code size, similar to ours, are used to control code growth. But these heuristics are necessarily more conservative, as they cannot take into account future simplifications that may take place if inlining is performed. Furthermore, because candidates for inlining are identified syntactically, procedures used in a higher-order manner will not be inlined unless other simplifications reduce the higher-order uses to syntactic ones. We believe that for programs which make heavy use of data and procedural abstraction, flow-directed inlining is likely to identify more profitable sites for inlining.

7. CONCLUSION

We have proposed a new polyvariant global program analysis that has a firm theoretical grounding. In addition, we have experimentally investigated the effectiveness of this analysis. for driving global program optimizations. Our results indicate that precise polyvariant analyses like ours can be effective for optimizing functional languages. Indeed, our experiments reveal that precise interprocedural flow information can be computed without prohibitive cost and can often be cheaper to compute than coarser information. We expect that implementations of similar analyses and optimizations for other high-level functional and object-oriented languages will yield similar results. We anticipate that further positive results will lead to the inclusion of sophisticated program analyses into next-generation optimizing compilers for advanced programming languages.

ACKNOWLEDGMENTS

We thank Henry Cejtin, Richard Kelsey, Francois Pessaux, Stephen Weeks, and the anonymous referees for many helpful suggestions.

1 The unoptimized programs were compiled by Chez Scheme under maximum safe optimization (optimize-level 2) after performing the same suite of local optimizations as for the optimized programs.

2 This semantics is not tail-recursive. It is straightforward to add a tail-call rule that pops useless bind-expressions before pushing the new activation record.

3 Our earlier papers [Jagannathan and Wright 1995; Jagannathan and Wright 1996] did not incorporate this value restriction [Wright 1995].

4 The type-directed analysis uses the outermost type constructors of the arguments to the two most recent procedure calls to determine the contour in which to analyze a procedure body.

5 The soft-typing implementation used for Figure 7 analyzes several of the benchmarks more precisely if allowed to use its own definitions of library routines such as map, append, and apply that were carefully written to avoid reverse flow. We prepended the same library routines to the benchmarks for all of the analyses in Figure 7. These routines were written in a natural style, as if they were part of the application.

6 The algorithm in Figure 13 requires constructing the specialized version of the procedure body before deciding if it should be inlined. Our implementation uses an equivalent but more efficient algorithm that estimates the size of the specialized procedure without actually constructing it.

7 Code sizes are smaller than reported in our conference paper [Jagannathan and Wright 1996] because we improved the simplifier.

8 We were unable to determine the reasons behind the unusual increases in run-time check overhead in Lattice at threshold 500 and in Nucleic at higher thresholds. We conjecture that register allocation or instruction selection led to additional spill code or cache conflicts. Indeed, when these programs are run on a similarly equipped Pentium, the magnitudes of these spikes are greatly reduced.

REFERENCES

AGESEN, O. AND HOLZLE, U. 1995. Type feedback vs. type inference: A comparison of optimization techniques for object-oriented languages. In Proceedings of the loth Annual Conference on Object-Oriented Programming, Systems, Languages, and Applications. ACM Press, New York, 91-107.

AIKEN, A. AND FAHNDRICH, M. 1995. Dynamic typing and subtype inference. In Proceedings of the Conference on Functional Programming Languages and Computer Architecture. ACM Press, New York, 182-191.

AIKEN, A., WIMMERS, E. L., AND LAKSHMAN, T. K. 1994. Soft typing with conditional types. In Proceedings of the 21st Annual Symposium on Principles of Programming Languages. ACM Press, New York, 163-173.

APPEL, A. 1992. Compiling with Continuations. Cambridge University Press, New York.

ASHLEY, J. M. 1996. A practical and flexible flow analysis for higher-order languages. In Proceedings of the 23rd Annual Symposium on Principles of Programming Languages. ACM Press, New York, 184-194.

BARENDREGT, H. P. 1984. The Lambda Calculus: Its Syntax and Semantics, Revised ed. Studies in Logic and the Foundations of Mathematics, vol. 103. North-Holland, Amsterdam.

BEER, R. D. 1987. Preliminary report on a practical type inference system for Common Lisp. Lisp Pointers 1, 2, 5-11.

BULYONKOV, M. 1984. Polyvariant mixed computation for analyzer programs. Acta Inf. 21, 473-484.

CARTWRIGHT, R. AND FAGAN, M. 1991. Soft typing. In Proceedings of the Conference on Programming Language Design and Implementation. ACM Press, New York, 278-292.

CHAMBERS, C. AND UNGAR, D. 1990. Iterative type analysis and extended message splitting: Optimizing dynamically-typed object-oriented programs. In Proceedings of the Conference on Programming Language Design and Implementation. ACM Press, New York, 150-164.

CHANG, P. P., MAHLKE, S. A., CHEN, W. Y., AND HWU, W.-M. W. 1992. Profile-guided automatic inline expansion for C programs. Softw. Pract. Exper. 22, 5 (May), 349-369.

CLINGER, W. AND REES, J. 1991. [Revised.sup.4] report on the algorithmic language Scheme. Lisp Pointers 6, 3 (July), 1-55.

COOPER, K., HALL, M., AND TORCZON, L. 1991. An experiment with inline substitution. Softw. Pract. Exper. 21, 6, 581-601.

COOPER, K., HALL, M., AND TORCZON, L. 1992. Unexpected side effects of inline substitution: A case study. ACM Lett. Program. Lang. Syst. 1, 1, 22-32.

CURRY, H. B. AND FEYS, R. 1958. Combinatory Logic. Vol. 1. North-Holland, Amsterdam.

DEAN, J. AND CHAMBERS, C. 1994. Towards better inlining decisions using inlining trials. In Proceedings of the Conference on Lisp and Functional Programming. ACM Press, New York, 294-306.

DEAN, J., CHAMBERS, C., AND GROVE, D. 1995. Selective specialization for object-oriented languages. In Proceedings of the Conference on Programming Language Design and Implementation. ACM Press, New York, 93-102.

DIWAN, A., MOSS, J. E. B., AND MCKINLEY, K. S. 1996. Simple and effective analysis of statically-typed object-oriented programs. In Proceedings of the 11th Annual Conference on Object-Oriented Programming, Systems, Languages, and Applications. ACM Press, New York.

DYBVIG, K. 1987. The Scheme Programming Language. Prentice-Hall, London.

FEELEY, M., TURCOTTE, M., AND LAPALME, G. 1994. Using MultiLisp for solving constraint satisfaction problems: An application to nucleic acid 3D structure determination. J. Lisp Symb. Comput. 7, 2/3, 231-248.

FELLEISEN, M. AND FRIEDMAN, D. P. 1986. Control operators, the SECD-machine, and the [Lambda]calculus. In Formal Description of Programming Concepts. Vol. 3. North-Holland, Amsterdam, 193-217.

FLANAGAN, C. AND FELLEISEN, M. 1995. Set-based analysis for full Scheme and its use in soft-typing. Tech. Rep. TR95-253, Rice Univ., Houston, Tex.

GREENGARD, L. 1987. The Rapid Evaluation of Potential Fields in Particle Systems. ACM Press, New York.

GROVE, D., DEAN, J., GARRETT, C., AND CHAMBERS, C. 1995. Profile-guided receiver class prediction. In Proceedings of the loth Annual Conference on Object-Oriented Programming, Systems, Languages, and Applications. ACM Press, New York, 108-123.

HEINTZE, N. 1994. Set-based analysis of ML programs. In Proceedings of the Conference on Lisp and Functional Programming. ACM Press, New York, 306-317.

HENGLEIN, F. 1992. Global tagging optimization by type inference. In Proceedings of the Conference on Lisp and Functional Programming. ACM Press, New York, 205-215.

HENGLEIN, F. AND REHOF, J. 1995. Safe polymorphic type inference for a dynamically typed language: Translating Scheme to ML. In Proceedings of the Conference on Functional Programming Languages and Computer Architecture. ACM Press, New York, 192-203.

HINDLEY, R. 1969. The principal type-scheme of an object in combinatory logic. Trans. Am. Math. Soc. 146, 29-60.

JAGANNATHAN, S. AND WEEKS, S. 1995. A unified treatment of flow analysis in higher-order languages. In Proceedings of the 22nd Annual Symposium on Principles of Programming Languages. ACM Press, New York, 392-401.

JAGANNATHAN, S., WEEKS, S., AND WRIGHT, A. 1997. Type-directed flow analysis for typed intermediate languages. In Proceedings of the 4th Static Analysis Symposium. Springer-Verlag, Berlin, 232-249.

JAGANNATHAN, S. AND WRIGHT, A. 1995. Effective flow-analysis for avoiding runtime checks. In Proceedings of the 2nd Static Analysis Symposium, Lecture Notes in Computer Science, Vol. 983. Springer-Verlag, Berlin, 207-225.

JAGANNATHAN, S. AND WRIGHT, A. 1996. Flow-directed inlining. In Proceedings of the Conference on Programming Language Design and Implementation. ACM Press, New York, 193-205.

JONES, N. AND MUCHNICK, S. 1979. Flow analysis and optimization of Lisp-like structures. In Proceedings of the 6th Annual Symposium on Principles of Programming Languages. ACM Press, New York, 244-256.

KAPLAN, M. A. AND ULLMAN, J. D. 1980. A scheme for the automatic inference of variable types. J. ACM 27, 1 (Jan.), 128-145.

MA, K. L. AND KESSLER, R. R. 1990. TICL - A type inference system for Common Lisp. Softw. Pract. Exper. 20, 6 (June), 593-623.

MACLACHLAN, R. A. 1992. The Python compiler for CMU Common Lisp. In Proceedings of the Conference on Lisp and Functional Programming. ACM Press, New York, 235-246.

MILNER, R. 1978. A theory of type polymorphism in programming. J. Comput. Syst. Sci. 17, 348-375.

MILNER, R., TOFTE, M., AND HARPER, R. 1990. The Definition of Standard ML. MIT Press, Cambridge, Mass.

MITCHELL, J. 1990. Type systems for programming languages. In Handbook of Theoretical Computer Science. Vol. B. MIT Press, Cambridge, Mass., 367-453.

NIELSON, F. AND NIELSON, H. R. 1997. Infinitary control-flow analysis: A collecting semantics for closure analysis. In Proceedings of the 24th Annual Symposium on Principles of Programming Languages. ACM Press, New York, 332-345.

PALSBERG, J. 1995. Closure analysis in constraint form. ACM Trans. Program. Lang. Syst. 17, 1, 47-62.

PANDE, H. D. AND RYDER, B. G. 1994. Static type determination for C++. In Usenix Conference Proceedings. MIT Press, Cambridge, Mass., 85-97.

PLEVYAK, J. AND CHIEN, A. 1994. Precise concrete type inference for object-oriented languages. In Proceedings of the 9th Annual Conference on Object-Oriented Programming, Systems, Languages, and Applications. ACM Press, New York, 324-340.

SERRANO, M. AND FEELEY, M. 1996. Storage use analysis and its applications. In Proceedings of the Conference on Functional Programming. ACM Press, New York.

SESTOFT, P. 1988. Replacing function parameters by global variables. M.S. thesis, DIKU, Univ. of Copenhagen, Denmark.

SHIVERS, O. 1990. Data-flow analysis and type recovery in Scheme. In Topics in Advanced Language Implementation. MIT Press, Cambridge, Mass.

SHIVERS, O. 1991. Control-flow analysis of higher-order languages or taming lambda. Ph.D. thesis, School of Computer Science, Carnegie-Mellon Univ., Pittsburgh, Pa.

STEENKISTE, P. AND HENNESSY, J. 1987. Tags and type checking in Lisp. In Proceedings of the 2nd Architectural Support for Programming Languages and Systems Symposium. ACM Press, New York, 50-59.

WRIGHT, A. 1995. Simple imperative polymorphism. LISP Symbol. Comput. 8, 343-355.

WRIGHT, A. AND CARTWRIGHT, R. 1997. A practical soft type system for Scheme. ACM Trans. Program. Lang. Syst. 19, 1, 87-152.

WRIGHT, A. K. AND DUBA, B. F. 1993. Pattern matching for Scheme. Unpublished document. Available from http://www.intertrust.com/star/wright/code.html.

ZHAO, F. 1987. An O(N) algorithm for three-dimensional N-body simulations. M.S. thesis, Dept. of Electrical Engineering and Computer Science, MIT, Cambridge, Mass.

Printer friendly Cite/link Email Feedback | |

Author: | Wright, Andrew K.; Jagannathan, Suresh |
---|---|

Publication: | ACM Transactions on Programming Languages & Systems |

Date: | Jan 1, 1998 |

Words: | 15943 |

Previous Article: | Fast algorithms for compressed multimethod dispatch table generation. |

Next Article: | Controlling generalization and polyvariance in partial deduction of normal logic programs. |

Topics: |