# Graph Rewrite Systems for Program Optimization.

1. INTRODUCTIONWriting an optimizer for a programming language is an error-prone and tedious task. To shorten development time, to facilitate validation, and to improve maintainability, a methodology for specification and generation of optimizers is highly desirable, Such a method should be apt for many subtasks of program optimization, should lead to optimizers which execute fast enough, and should not be tied to a specific intermediate language.

This paper demonstrates how graph rewriting can be used for the specification and generation of program optimizations. The central idea of our method is to regard all of the information in an optimizer as a set of relational graphs. Program objects, intermediate code instructions, and predicates of the entities of the program are represented either as nodes or relations. Program analysis and transformation are performed using graph rewriting. Typically, analyzing programs means enlarging graphs with new edges which represent the information, while code transformation means rewriting graphs by deleting and attaching subgraphs. Thus the optimization process is divided into a sequence of graph rewrite system applications, starting with the intermediate representation given by the front-end, and ending with an intermediate graph which is handed over to the back-end for code generation.

However, when trying to specify program optimization with graph rewrite systems, systems may be specified that neither terminate nor yield unique normal forms. Thus some rule-based termination criteria, termination by edge accumulation and termination by subtraction, are defined which result in the class of exhaustive graph rewrite systems. For nondeterministic systems stratification is developed which automatically selects stratified normal forms. This technique is especially important in program optimization, since negation, deletion, and addition of subgraphs disturb the convergence of the systems.

A practical method for generating optimizer components will be judged according to three criteria. First it must be able to express a broad range of program analysis and transformation problems. This is demonstrated by specifying several parts of lazy code motion [Knoop et al. 1994]. Second the generated algorithms must run efficiently so that the components can be used in practical optimizers. A uniform evaluation algorithm is presented, which works efficiently on directed sparse graphs and often results in linear or quadratic algorithms. Third the specification method must be flexible and provide an open system. Because the shape of the graphs is specified in a data model, our method can handle arbitrary source and intermediate languages. The optimizer components can be generated in standard programming languages, and can be easily integrated with existing front-ends or back-ends.

The structure of the paper is as follows. First a short example is presented, the collection of intermediate code expressions. A sketch of the code, which can be generated from the specification, is provided. Section 2 defines exhaustive graph rewrite systems. Section 3 introduces stratification. Then important parts of the lazy code motion optimization are presented, including syntactic equivalence of intermediate expressions (Section 4.2), global data-flow analysis (Section 4.3), and the transformation phase (Section 4.4). Since not all optimizations can be expressed in our framework, its limitations are discussed in Section 4.5. Section 5 deals with the uniform evaluation algorithm, states its cost for our examples, and gives a short overview of other evaluation methods. Section 6 describes practical experiences with the optimizer generator OPTIMIX, which has been developed for our method. The tool has been used to generate several prototypical optimizer parts in the compiler framework CoSy (Esprit project COMPARE) [Alt et al. 1994], and some figures concerning their speed are given. A section about related and future work concludes the paper.

1.1 Motivation

This section contains a short example, a graph rewrite system that attaches all expressions in the statement list of a procedure to the procedure object. The system consists of only one rule, COLLECT-EXPRESSIONS-1, which is drawn in Figure 1 in two forms. In the left form the left-hand side and the right-hand side are separated by the symbol ::=, which means that the left-hand side is rewritten to the right-hand side. If a rule only adds edges, a shorthand form can be used, in which the rule-invariant part is drawn only once, together with dotted added edges. Nodes are marked with their label (e.g., Block) and a number (e.g., :1) which identifies nodes with the same label on the right- and left-hand side.

[Figure 1 ILLUSTRATION OMITTED]

It is assumed that the intermediate representation provides the object types Proc, Block, Strut, and Expr, as well as the directed relations Blocks, Struts, and Exprs. COLLECT-EXPRESSIONS-1 assembles all expressions of all statements in a procedure into a directed relation AllExprs. If an edge Blocks exists between a procedure Proc:1 and a block Block:1 and there is an edge Struts from Block:1 to a statement Stmt:1 and there is an edge Exprs from Stmt:1 to an expression Expr:1, then there will be an edge AllExprs from Proc:1 to Expr:1. Thus the only task which COLLECT-EXPRESSIONS performs is to add edges with label AllExprs to the intermediate representation.

Our goal is to generate optimizers and optimizer parts from specifications automatically. Using this graph rewrite system, a practical optimizer generator should generate an executable procedure which can be embedded into a compiler. OPTIMIX is such a generator, and Figure 2 contains the layout of the code it generates for COLLECT-EXPRESSIONS. In our example it is assumed that the given relations are represented as sets of neighbor sets in objects (directed graphs with embedded neighbor sets). Thus o.a denotes the set of neighbors of object o in relation a. Since all objects, which may be matched by COLLECT-EXPRESSIONS-1, must be reached from within a procedure via the depicted relations, several loops over the relevant neighbor sets are generated. In our case the rule test condition is simple: whenever a suitable permutation of nodes is enumerated (a redex), an edge is added between the current Proc-node and the current Expr-node.(1) If the generated algorithm is investigated superficially, it has cost O([Proc||Blocks||Stmt||Expr|). However, in classical intermediate representations the relations form a tree-like structure which partitions the expressions. Thus a linear algorithm in the number of expressions results.

Fig. 2 An algorithm solving COLLECT-EXPRESSIONS.

Input: Node domains Proc. Relations Blocks, Stmts, Exprs. Output: New relation AllExprs. /* Enumerate all objects of Proc, Block, Stmt, Expr */ forall p [element of] Proc do for all b [element of] p.Blocks do for all e [element of] s.Exprs do /* Redex found. */ p.AllExprs += e;

1.2 Overview of the Method

When specifying program analysis and transformation with graph rewriting, a uniform view of the intermediate language and the analysis information is attained: everything is represented as a graph, and all tasks are performed by graph rewriting. Moreover, specifications can be split and coalesced easily, which enables modular composition of program optimizations. However, when the number of rules are increased, easily nonterminating or nondeterministic systems may be specified. Thus it is better to divide the optimization process into an application sequence of smaller graph rewrite systems (Figure 3). Each of these graph rewrite systems generates a procedure in the implementation language of the compiler, which is appropriately called from the main compiler routine. The optimization process starts with a graph given by the front-end, which is the abstract syntax tree or the intermediate code. First the program is analyzed with several graph rewrite systems which also may comprise data-flow analyses. Then the graph rewrite system for a transformation can be applied. After some iterations of analyses and transformations, the modified code is handed over to the back-end.

[Figure 3 ILLUSTRATION OMITTED]

This scheme has the following advantages. First, a suitable split into smaller graph rewrite systems isolates nonconfluence effects and/or resolves nontermination. Second, for smaller systems faster algorithms can often be generated, because these may have a lower order (Section 5). Third, if no graph rewrite system is found for a certain task, or some system turns out to be too slow, it can always be substituted with a hand-written algorithm. This also holds for the front-end and the back-end; they are implemented as usual.

To make this scheme work, all specified and hand-written algorithms have to refer to a uniform data model of the intermediate representation. Therefore the first step of the method consists of developing a data model or graph schema for the intermediate representation [Schurr 1991]. The following must be modeled:

(1) The types of graph nodes. Among these are intermediate code instructions such as expressions, statements, loops, and procedures. Also, analyzed information can be encoded in nodes of graphs, e.g., variable definitions, or expression equivalence classes.

(2) The types of graph edges (relations). These represent all relevant predicates which should be inferred from the intermediate representation. Some examples are relations such as classical flow dependencies, equivalence relations, calling relations, or control flow relations. In addition, the information which the front-end produces must be modeled, e.g., expression tree pointering or statement lists.

2. TERMINATION ORDERINGS FOR GRAPH REWRITE SYSTEMS

This section defines our concept of graph rewriting and some termination orderings for it. The resulting graph rewrite system classes can be used for program analysis as well as for certain classes of transformations.

2.1 Relational Graph Rewriting

2.1.1 Graphs. Throughout this paper rewritings of directed, relational, and labeled graphs ar discussed. Contrary to other approaches in the literature, multigraphs are not dealt with. Although several edges between two arbitrary nodes are allowed, they are required to have different labels. Thus the graphs consist of a set of nodes with several (directed) relations.

Definition ([Sigma]-GRAPHS). Let [Sigma] be a finite set of constants, the labels. [Sigma] is the disjoint union of two sets [Sigma] = [[Sigma].sub.N] [union] [[Sigma].sub.E], the node and edge labels. Then a (relational) [Sigma]-graph G = (N, E, nlab) consists of the following.

(1) N is a finite set of nodes.

(2) Nodes are labeled with labels from [[Sigma].sub.N] by a function nlab: N [right arrow] [[Sigma].sub.N]. A node with label l is called an l-node. The set [N.sub.l] = {n [element of] N|nlab(n) = l} of all l-nodes forms a node domain.

(3) The edges E are a family of binary relations [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. An edge with label l is called an l-edge.(2)

Implicitly associated with G are several functions, elab: E [right arrow] [[Sigma].sub.E] projects an edge to its label. The functions source, target: E [right arrow] N provide source and target nodes for edges, e [element of] E is incident to source(e) and target(e) while source(e) and target(e) are adjacent. All nodes adjacent to a node form its neighbor set. The function in-degree: N [right arrow] IN is the number of incoming incident edges of a node (in-degree(n) = |{e [element of] E|target(e) = n}|). Similarly, the function out-degree is the number of outgoing incident edges (out-degree(n) = |{e [element of] E|source(e) = n}|). A node with in-degree(n) = 0 is called root. A node with out-degree(n) = 0 is called sink. The functions slab, tlab: [[Sigma].sub.E] [right arrow] [[Sigma].sub.N] project a label of an edge to the labels of the corresponding source and target nodes. If necessary, the following occurs: (1) the components of a [Sigma]-graph G = ([N.sub.G], [E.sub.G], [nlab.sub.G]) are indexed, (2) the functions nlab and elab are extended to sets of items, and (3) in-degree and out-degree are restricted to specific edge labels.

Between two nodes several edges are allowed; however, their labels must be distinct. For the purposes of optimization, it is not necessary to allow several edges of the same label between two nodes: edges represent predicates of their source and target node objects, and a predicate either holds or does not hold. However, it is very important to have different relations in order to represent different kinds of information. On the other hand, although attributes for nodes may be introduced, this is left out for the sake of simplicity. If designed carefully, node attributes do not change the results obtained in this paper, because they can be seen as an extension of the node label set [[Sigma].sub.N].

Several operations on [Sigma]-graphs are standard extensions of set operations over nodes and edges. A [Sigma]-graph G = [G.sub.1] [union] [G.sub.2] is called union of [G.sub.1] and [G.sub.2], if [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. The [Sigma]-Graph-intersection [intersection] and the [Sigma]-Graph-inclusion [subset] are similarly defined. The difference G = [G.sub.1] - [G.sub.2] of [G.sub.1] and [G.sub.2] is ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]). The set of all [Sigma]-graphs over [Sigma] is called [SG.sup.+]. The set of all [Sigma]-graphs over a fixed set of nodes N, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], forms a finite lattice with join [union], meet [intersection], and partial order [subset]. A linear acyclic subgraph G' [subset] G, [E.sub.G'] = {([x.sub.1], [x.sub.2], [l.sub.1]), ([x.sub.2], [x.sub.3], [l.sub.2]), ..., ([x.sub.n-1], [x.sub.n], [l.sub.n-1])|[inverted]A1 [is less than or equal to] i [is less than or equal to] n : [x.sub.i] [element of] [N.sub.G], [l.sub.i] [element of] [[Sigma].sub.E], [inverted]A1 [is less than or equal to] i, j [is less than or equal to] n : [x.sub.i] [is not equal to] [x.sub.j]} is cal a path in G. The function index : N x [SG.sup.+] [right arrow] IN yields the position of a node in a path, starting with 1.

A [Sigma]-Graph-morphism g : G [right arrow] H consists of two functions [g.sub.N] : [N.sub.G] [right arrow] [N.sub.H] and [g.sub.E] : [E.sub.G] [right arrow] [E.sub.H] which fulfill two conditions:

label preservation.

[inverted]An [element of] [N.sub.G] : [nlab.sub.H]([g.sub.N](n)) = [nlab.sub.G](n), [inverted]Ae [element of] [E.sub.G] : [elab.sub.H]([g.sub.E](e)) = [elab.sub.G](e)

incidence preservation.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

If both [g.sub.N] and [g.sub.E] are injective, g is called injective. If both [g.sub.N] and [g.sub.E] are surjective, g is called surjective. If g is injective and surjective, it is an isomorphism, [g.sub.N]/[g.sub.E] are often extended to sets of nodes and edges. If it is clear from the context, [g.sub.N]/[g.sub.E] are also called g. gG abbreviates g(G) to save using brackets.

A tree-shaped complete partial order on node labels could be allowed to model simple inheritance on nodes. Then the smallest element of this partial order, the root of the tree, is the most general label and type schemas for [Sigma]-graphs can be drawn as higraphs [Harel 1988]. Graph morphisms have to respect this node label order: nodes with a more general label are homomorph to nodes with a more specific label. For the sake of simplicity, this is left out.

2.1.2 Graph Rewrite Rules. To define graph rewrite rules and to allow more expressiveness for their left-hand sides, the definition of a [Sigma]-graph has to be extended:

Definition (GRAPH REWRITE RULES). A [Sigma]??-graph J = (N, E, nlab, negated) is a [Sigma]-graph, augmented by a function negated: E [right arrow] IB which tells whether an edge e of the graph is negated (negated(e) = true). J has a related [Sigma]-graph [J.sup.pos] with all negated edges stripped ([J.sup.pos] = (N, E - {e [element of] E|negated(e) = true}, nlab). The set of all [[Sigma].sup.[negation]]-graphs is called [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

A graph rewrite rule r = (L, R) consists of a left-hand side [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and a right-hand side R [element of] [SG.sup.+]. A rule invariant K = L [intersection] R [element of] [SG.sup.+], K [is not equal to] 0 (interface graph), is implicitly associated with a rule r (Figure 4).

[Figure 4 ILLUSTRATION OMITTED]

Negated edges are only allowed in left-hand sides; they are used to test the absence of subgraphs. Each rule has a name (in Figure 4, R-1) and the terms from Table I are used to refer to rule items.

Table I. Definition of Rule Items.

[N.sup.test](r) = [N.sub.L] (rule test nodes) [E.sup.test](r) = [E.sub.L] (rule test edges) [[Sigma].sup.test](r) = [nlab. (rule test node labels) sub.L] ([E.sub.L] [MATHEMATICAL EXPRESSION NOT (rule test edge labels) REPRODUCIBLE IN ASCII] [MATHEMATICAL EXPRESSION NOT (rule test labels) REPRODUCIBLE IN ASCII] [E.sup.inv](r) = [E.sub.K] (rule-invariant nodes) [N.sup.inv](r) = [N.sub.K] (rule-invariant edges) [[Sigma].sup.inv](r) = [nlab. (rule-invariant node labels) sub.K] ([E.sub.K] [MATHEMATICAL EXPRESSION NOT (rule-invariant edge labels) REPRODUCIBLE IN ASCII] [MATHEMATICAL EXPRESSION NOT (rule-invariant labels) REPRODUCIBLE IN ASCII] [N.sup.add](r) = [N.sub.R] - [N. (rule addition nodes) sub.K] [E.sup.add](r) = [E.sub.R] - [E. (rule addition edges) sub.K] [[Sigma].sup.add](r) = [nlab. (rule addition node labels) sub.R] ([N.sub.R] - [N.sub.K]) [MATHEMATICAL EXPRESSION NOT (rule addition edge labels) REPRODUCIBLE IN ASCII] [MATHEMATICAL EXPRESSION NOT (rule addition labels) REPRODUCIBLE IN ASCII] [N.sup.del](r) = [N.sub.L] - [N. (rule deletion nodes) sub.K] [E.sup.del](r) = [E.sub.L] - [E. (rule deletion edges) sub.K] [[Sigma].sup.del](r) = [nlab. (rule deletion node labels) sub.L] ([N.sub.L] - [N.sub.K]) [MATHEMATICAL EXPRESSION NOT (rule deletion edge labels) REPRODUCIBLE IN ASCII] [MATHEMATICAL EXPRESSION NOT (rule potential deletion REPRODUCIBLE IN ASCII] edge labels) [MATHEMATICAL EXPRESSION NOT (rule deletion labels) REPRODUCIBLE IN ASCII]

Definition (RULE APPLICATION). A rule r = (L, R) is applicable to a [Sigma]-graph G, the host graph, if the following conditions hold:

(t1) matching (subgraph test). Find a [Sigma]-Graph-morphism g : [L.sup.pos] [right arrow] G, called occurrence, g([L.sup.pos]) is called redex, its nodes redex nodes, its edges redex edges. Because [L.sup.pos] is used, negated edges from L are neglected, g([L.sup.pos]) is also abbreviated to gL. gL need not be an induced subgraph of G, i.e., if g([N.sub.L]) is the set of redex nodes from gL, not all linking edges of nodes in g([N.sub.L]) need to be in gL. The graph G - gL is called rest graph. Edges from the rest graph into gL may exist (hidden edges, dangling edges).

(t2) negated edge test. (e = ([n.sub.1], [n.sub.2], l) [element of] [E.sub.L], negated(e) = true) ?? [negation][exists](g([n.sub.1]), g([n.sub.2]), l) [element of] [E.sub.G], i.e., if an l-edge in [E.sub.L] is negated, between the corresponding nodes in g([N.sub.L]) no l-edge may exist.

(t3) relation completeness test. [E.sup.add](r) [is not equal to] 0 ?? [exists]e = ([n.sub.1], [n.sub.2], l) [element of] [E.sup.add](r), [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], i.e., not all edges which the application of r would add between nodes with nonadded labels are already in G.

If r is applicable, r derives from G the [Sigma]-graph H by performing the following steps in order (direct derivation G [right arrow] H):

(a1) redex edge deletion. All redex edges matched by rule deletion edges are deleted: [E.sub.G] = [E.sub.G] - [g.sub.E]([E.sup.del](r)).

(a2) hidden edge deletion. All redex edges are deleted which are incident to nodes matched by rule deletion nodes: [E.sub.G] = [E.sub.G] - {([n.sub.1] , [n.sub.2], l) [element of] [E.sub.G]|[n.sub.1] [disjunction] [n.sub.2] [element of] g([N.sup.del](r))}.

(a3) node deletion. All redex nodes matched by rule deletion nodes are deleted: [N.sub.G] = [N.sub.G] - g([N.sup.del](r)).

(a4) node addition. A set of new nodes corresponding to [N.sup.add](r) is added to G: [N.sub.G] = [N.sub.G] [union] {n|m [element of] [N.sup.add](r), [nlab.sub.G](n) = [nlab.sub.R](m)} and the occurrence map g|K is appropriately extended to a mapping gt to reflect the node addition.

(a5) edge addition. A set of new edges corresponding to [E.sup.add](r) is added to G, if they are not already in G : H = ([N.sub.G], [E.sub.G] [union] {(g'([n.sub.1]),g'([n.sub.2]),l)|([n.sub.1], [n.sub.2], l) [element of] [E.sup.add]}, [nlab'.sub.G]) where [nlab.sub.G has been appropriately extended to [nlab'.sub.G]. [E.sub.G] and [E.sub.H] are sets, i.e., after the application of r there will still be only one l-edge between g'([n.sub.1]) and g'([n.sub.2]), so that H is a well-formed [Sigma]-graph.

The relation between rule and redex is illustrated in Figure 5. Occurrence g maps the left-hand side to the host graph, while the extended occurrence g' maps the right-hand side.

[Figure 5 ILLUSTRATION OMITTED]

Definition (RELATIONAL GRAPH REWRITE SYSTEM). A (relational) graph rewrite system G = (S, Z) consists of a set of graph rewrite rules S and a [Sigma]-graph Z (the axiom). The set of all (relational) graph rewrite systems is called RGRS.

Figure 6 contains an example of a graph rewrite system.(3) Rule GRS-1 adds a b-edge between two nodes with labels A and B which are linked by an a-edge (a5). GRS-1 is not applied if there is already an b-edge (t3). GRS-2 adds an e-edge between two A-nodes (a5) which are not already linked by an c-edge (t3). GRS-3 removes a C-node (a3) together with an incoming f-edge (a1) and allocates a new D-node (a4) with incoming g-edge (a5). Figure 7 shows an occurrence of GRS-2 in a possible host graph. The rule matches a noninduced subgraph containing the redex nodes A:1 and A:2.

[Figures 6-7 ILLUSTRATION OMITTED]

In this paper a specific graph rewrite approach is used. As the rewriting deals with relational graphs, it has to be ensured that rewritten graphs still contain only one edge of a certain label between two arbitrary nodes (action (a5)). Additionally, noninduced subgraph tests for left-hand sides are used (matching, condition (t1)). Testing induced subgraphs is not appropriate for the specification of optimization algorithms: if--for the application of a rule--all linking relations between nodes were to be enumerated, specifications would be rather difficult to read and write. Realistic optimizer specifications will work with an enormous amount of relations over the object types of the intermediate representation. It is estimated that a full-fledged optimizer will use about 30-100 object types and around 100-500 relations. Also, testing induced subgraphs prevents specifications from being easily extended; if new relations are added to host graphs, formerly matching rules suddenly need not match anymore [Blostein et al. 1994a]. Noninduced subgraph matching abstracts from all nonrelevant relations in left-hand sides, but if nodes are deleted, all incident edges of the node must also be deleted automatically.

As a consequence of using subgraph matching, a left-hand side that lacks a rule edge between two tested nodes [n.sub.1] and [n.sub.2] matches both redexes with or without a graph edge between g([n.sub.1]) and g([n.sub.2]). Checking the latter is possible with explicitly negated rule edges (condition (t2)).

Then termination must be ensured. This is prepared by condition (t3) which checks that all edges, which should be added between nodes with nonadded label, do not already exist in the graph. This condition emphasizes the relational aspect of our graph rewrite systems, mimicking the relational calculus: rewritings add sets of edges, even if they are already partially found in the graph; rewriting is not performed if all edges between nodes with rule nonadded label already exist. Hence, relations between nodes with nonadded label are enriched step by step up to their completion, and condition (t3) forms the basis for a general termination criterion on relational graph rewrite systems (Sections 2.2 and 2.3).

A further aspect is that none of the approaches reported in the literature seems to meet the efficiency criteria which are required for program optimization. This concerns embedding of rules, gluing conditions, and hidden edges. The rule application conditions are restricted to those that can be tested in constant time when a set of nodes of the host graph is under investigation. This is invariant rule embedding in which hidden edges are not allowed to be redirected, split, or coalesced [Blostein et al. 1994b]. Redexes always refer to a fixed number of nodes and edges. More liberal embedding mechanisms would form additional rule application conditions. Although this resembles the algebraic approach [Ehrig et al. 1990], its gluing condition is not used because it prohibits hidden edges to nodes which are deleted. Hence rule applications need not be pushouts in the category of graphs, and a rule does not always have an inverse. However, our focus is on rewriting and not on parsing.

2.1.3 Derivations. Let G = (S, Z) [element of] RGRS. A derivation in G is a sequence of [Sigma]-graphs [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], linked by direct derivations with the rules [r.sub.i] [element of] S. Then [G.sub.n] is derivable from [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. If there is no r [element of] S which can be applied to [G.sub.n], [G.sub.n] is called a normal form of G. A derivation is called terminating if n is finite, otherwise nonterminating. If all derivations in G are terminating, G is also called terminating.

Two alternative direct derivations of rules [r.sub.1] = ([L.sub.1], [R.sub.1]) and [r.sub.2] = ([L.sub.2], [R.sub.2]) from S are called independent, if they can be interchanged or lead to a direct common reduct: [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the reflexive closure of [right arrow]. Otherwise the direct derivations are called an overlap.

The following definitions hold for G and its direct derivation relation [right arrow]. G is called strongly confluent, if [inverted]A [G.sub.1], [G.sub.2], [G.sub.3] [element of] [Sigma]-graph: [G.sub.2] [left arrow] [G.sub.1] [right arrow] [G.sub.3] ?? [exist]H : [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], i.e., if all pairs of direct derivations are independent. G is called confluent, if [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. G is called convergent if it is terminating and confluent [Dershowitz and Jouannaud 1990]. Such systems deliver a unique normal form [Newman 1942]. Graph rewrite systems with several normal forms are called nondeterministic.

2.2 Termination by Edge Accumulation

A set of graph rewrite rules S terminates on an axiom Z, if every derivation step adds some edges to a particular relation and if the relevant node domains do not grow. As only one edge with a certain label between two nodes is allowed, the derivation process must stop when this relation is complete.

To this end, a specific kind of rule is defined, an edge-accumulative rule. Such a rule adds at least one edge which is only incident to nodes with nonadded rule labels. The labels of these edges, the edge-accumulative-termination labels, denote a set of host graph relations, the termination-relations, which are being completed during the applications of the rule. Together with the incident host graph nodes, these relations form the rule's termination-subgraph.

Definition (EDGE-ACCUMULATIVE RULE). Let G = (S, Z) [element of] RGRS. Define for r = (L, R) [element of] S the set of termination-subgraph node labels

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

the set of edge-accumulative-termination labels

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

the set of termination-subgraph nodes in Z

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

and the termination-relations in Z

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

If [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], i.e., there are rule addition edges which are only incident to nodes with rule nonadded labels, r is called edge-accumulative.

An edge-accumulative rule r with [N.sup.add](r) = [N.sup.del](r) = [E.sup.del](r) = 0 is called an edge-addition rule, because it only adds edges.

For a termination-subgraph of the host graph it is required that its node set is invariant. This complicates the definition of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]: edge-accumulative termination must be be based on rule labels, and not on rule items. For instance, it is not sufficient to base the definition on added edges which are incident to rule-invariant l-nodes, because there could be other l-nodes added by the rule which cause the termination subgraph to grow infinitely. Additionally, the termination relations have to be augmented monotonically, so that edge-accumulative-termination labels must neither belong to [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] nor [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

To ensure termination of a rewrite system, it is important that each rule completes such a termination-subgraph, that no other rule adds nodes to the termination-subgraph of another rule, nor deletes items from it. This behavior is captured in the definition of edge-accumulative graph rewrite systems:

Definition (EDGE-ACCUMULATIVE GRAPH REWRITE SYSTEM). G = (S, Z) [element of] RGRS is called edge-accumulative (an AGRS) if [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [inverted]A [r.sub.1] = ([L.sub.1],[R.sub.1]),[r.sub.2] = ([L.sub.2],[R.sub.2]) [element of] S:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

i.e., all rules are edge-accumulative, every rule has an edge-accumulative termination label that is not deleted by any other rule, and no rule adds nodes to the termination-subgraph of another rule. Then define the set of edge-accumulative-termination labels of G

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and the termination-subgraph of G

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The set of all edge-accumulative graph rewrite systems is called AGRS.

A graph rewrite system with only edge-addition rules is called edge-addition rewrite system (EARS). The set of all edge-addition rewrite systems is called EARS.

In edge-accumulative graph rewrite systems, a subgraph of the axiom can be identified, the termination-subgraph. Its nodes carry labels from [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], its edges from [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Rules may manipulate arbitrary nodes and edges, except that each of them only adds edges to the termination-subgraph. Thus the termination-subgraph is being completed, and the system terminates. EARS are very specific graph rewrite systems because they only add edges to graphs [Assmann 1994]. Clearly EARS [subset] AGRS can be asserted.

Theorem (TERMINATION OF AGRS). Let G = (S, Z) [element of] AGRS. G terminates. If T = ([N.sub.T], [E.sub.T], [nlab.sub.T]) [subset] Z is the termination-subgraph of G and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the set of edge-accumulative-termination labels, a derivation in G has a maximum of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] steps.

PROOF. (1) G [element of] EARS. A derivation [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] of G forms an ascending chain in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] because the rules of 6 only add edges. Due to rule addition action (a5) edges are only added once. Thus every direct derivation leads to another [Sigma]-graph in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Because [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is finite, the chain of the derivation is noetherian. Thus G terminates.

(2) G [element of] AGRS-EARS. Because G adds and deletes items, the inputs of direct derivations need not be subgraphs of the results, and the derivable graphs do not form a lattice. However, every graph derivable from Z contains a subgraph which is restricted to the termination-subgraph nodes [N.sub.T] [subset] [N.sub.Z]. Consider this set of subgraphs [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and an arbitrary direct derivation G [[right arrow].sub.r] H. G has a subgraph [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. According to the definition of edge-accumulative graph rewrite systems, no rule adds nodes to the termination subgraph of any other rule, nor deletes edges from the termination relations of other rules. Hence G [[right arrow].sub.r] H adds at least one edge to a termination-relation of G' and otherwise leaves G' untouched.

Hence H has a subgraph [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] which is the result of applying r to G'. It holds G' [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] H' in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Subsequent direct derivations will never remove edges added by r nor add nodes to [N.sub.T]. Therefore each derivation d = (Z = [G.sub.0] [right arrow] [G.sub.1] [right arrow] ... [right arrow] [G'.sub.n]) in G has an associated ascending chain of subgraphs of the [G.sub.i] in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. This sequence is always finite because [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is finite.

Due to application condition (t3) the rewrite process stops at the latest when [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is reached: rules fail to match when termination-relations are complete. (t3) guarantees that after the completion of the termination subgraph no additional redexes can be found in the manipulated graph. This could be the case if there were a rule that only tests an edge from the termination subgraph but does not modify it, and instead modifies only parts out of the termination subgraph. This is prevented, since (t3) tests relational completeness for all relations that are candidates for termination subgraphs. Then, if a set of termination relations exists, the rewriting stops at the latest when the termination subgraph is completed. Thus d terminates, and hence G.

In both cases the number of edges in a complete graph over [N.sub.T] [subset] [N.sub.Z] with [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] relations is limited by [MATHEMATICAL EXPRESSIONS NOT REPRODUCIBLE IN ASCII]. Because every direct derivation adds an edge, its maximum length is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. []

As an example consider the graph rewrite system in Figure 1. Because [N.sup.add](COLLECT-EXPRESSIONS-1) = 0 = [N.sub.del](COLLECT-EXPRESSIONS-1) = [E.sup.del](COLLECT-EXPRESSIONS-1), COLLECT-EXPRESSIONS is an EARS. It holds [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and COLLECT--EXPRESSIONS terminates on a finite intermediate representation.

Figure 8 details this with two other examples. AGRS can be used to look for patterns in a graph and to allocate an object for each pattern. If no new patterns are created the rewrite process will stop. CREATE-REGISTERS allocates a virtual-register object for each expression which represents the result of the expression. It can be seen that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Figure 9 depicts how a derivation in CREATE-REGISTERS behaves: each rule application adds an edge to the termination-subgraph, while its nodes are left untouched. In essence, found patterns can be marked by termination-relations; when these are complete, rewriting stops. In contradistinction CREATE-EXPRESSIONS additionally creates Expr-nodes. Thus there is no termination-relation and a derivation may be nonterminating.

[Figures 8-9 ILLUSTRATION OMITTED]

2.3 Termination by Subtraction

Also edge or node deletion can be used to achieve termination. Similar to AGRS a termination-subgraph may be identified from which items are subtracted until the system stops. However, with subtraction there are two cases. A rule may be only edge-subtractive, i.e., it deletes only edges from its termination-subgraph. Secondly, a rule may be node-subtractive, which means that additionally nodes from the termination-subgraph are deleted.

Definition (EDGE- AND NODE-SUBTRACTIVE RULE). Let G = (S, Z) [element of] RGRS. Define for r = (L, R) [element of] S the set of edge-termination-subgraph node labels

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

the set of edge-subtraction-termination labels

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

the set of node-subtraction-termination labels

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

the set of termination-subgraph nodes in Z

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

the termination-relations in Z

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

If [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], r is called edge-subtractive. If [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], r is called node-subtractive.

In the edge-subtractive case, node labels that are neither added nor subtracted by a rule are marked by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Subtracted edges that are incident to such nodes form the termination-relations. Hence, nodes and edges in the termination-subgraph carry labels from [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. In the node-subtractive case, termination results from node labels that are deleted and not added by a rule (marked by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]). Hence, nodes and edges in the termination-subgraph carry labels from [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

For rule systems, other rules of the system are required not to add items to a rule's termination-subgraph. The union of all rule-specific termination-subgraphs yields the system's termination-subgraph; its nodes and edges carry labels from [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (edge-subtractive systems) or from [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (subtractive systems):

Definition (EDGE.-SUBTR. AND SUBTRACTIVE GRAPH REWRITE SYSTEM). Define for G = (S, Z) [element of] RGRS the set of edge-subtraction-termination labels of G

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

G = (S,Z) [element of] RGRS is edge-subtractive (an ESGRS), if [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [inverted]A [r.sub.1] = ([L.sub.1],[R.sub.1]),[r.sub.2] = ([L.sub.2], [R.sub.2]) [element of] S :

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

i.e., all rules are edge-subtractive and every rule has an edge-subtraction-termination label which is neither added by any other rule nor has incident node labels which are added by another rule.

G = (S, Z) [element of] RGRS is subtractive (an SGRS), if all rules are edge-subtractive or node-subtractive and [inverted]A [r.sub.1] = ([L.sub.1], [R.sub.1]), [r.sub.2] = ([L.sub.2, [R.sub.2]) [element of] S:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

i.e., every rule has a subtraction-termination label which is not added by any other rule.

Then define the termination-subgraph of G to be the subgraph of Z

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The set of all edge-subtractive graph rewrite systems is called ESGRS. The set of all subtractive graph rewrite systems is called SGRS. Clearly ESGRS [subset] SGRS holds. The union of edge-accumulative and subtractive graph rewrite systems is called the class of exhaustive graph rewrite systems XGRS = AGRS [union] SGRS.

In an edge-subtractive system, every rule deletes at least one edge with a label from [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] from the termination-subgraph. Hence these systems form the counterpart of edge-accumulative systems. In a subtractive system, a rule may also delete nodes from the termination-subgraph. At latest when this subgraph is empty, the system terminates.

Theorem (TERMINATION OF SGRS). Let G = (S, Z) [element of] SGRS. G terminates. If T = ([N.sub.T], [E.sub.T], [nlab.sub.T]) [subset] Z is the termination-subgraph of G and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the set of edge-subtraction-termination labels, a derivation in G has a maximum of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] steps.

PROOF. The number of nodes in T [subset] Z is |NT|, and the number of edges with [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] edge-subtraction-termination labels is maximally [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Thus the length of any derivation in G may be maximally [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. []

Consider Figure 8. If rule CREATE-REGISTERS-1 is reversed, Figure 9 displays an example derivation whose direction is reversed: each rewrite step is (node- and edge-)subtractive [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. As a subtractive graph rewrite rule, which is not edge-subtractive, consider GRS-3 from Figure 6. This rule is neither edge-accumulative [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] nor edge-subtractive [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] but node-subtractive [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

2.4 The Nature of Exhaustive Graph Rewrite Systems

Exhaustive graph rewrite systems rely on the fact that all redex parts in the termination-subgraph of the host graph are reduced step by step. The termination-subgraph is either completed or consumed. Since the termination criteria rely only on the features of the rules they can be decided on in an optimizer generator.

Edge-accumulative systems may create new redex parts in the termination-subgraph, but there will be at most as many of them as the number of edges in the termination-subgraph. Subtractive systems are even more restricted: they do not create subredexes in the termination-subgraph but destroy them. Therefore exhaustive graph rewrite systems can only be used to specify algorithms which perform a finite number of actions depending on the size of the host graph. In fact this is the case for many code optimizations, especially for code motion and replacement algorithms: after calculating the places where code transformations occur, they transform each of the places once. Thus they reduce a number of redexes in proportion to the size of the host graph, and the transformation stops after a finite number of actions. On the other hand, optimizations, for which the number of redexes is not dependent on the size of the intermediate representation, cannot be proven as terminating with our method. This is the case for example for peephole optimization.

EARS have already been investigated [Assmann 1994]. EARS which do not use negation on their left-hand sides are not only terminating, but also strongly confluent, and thus convergent. Because EARS only add edges, they operate on the complete lattice of relations over the nodes of the axiom, Each direct derivation enlarges a relation, and a derivation forms an ascending chain in the lattice. Because rule applications can be interchanged arbitrarily, each rule models a distributive function. Finding a normal form of an EARS is similar to finding a fixpoint in a distributive data-flow framework: all rules are applied to the host graph until a fixpoint, the normal form, is reached. Thus an EARS may be used to model a distributive data-flow framework [Assmann 1994].

3. STRATIFICATION OF GRAPH REWRITE SYSTEMS

If an exhaustive graph rewrite system is not confluent, it has several normal forms. Often it is useful to select one of these automatically. To this end, the concept of stratification is transferred from Datalog to graph rewrite systems [Ceri et al. 1989b], Stratification orders the rules into a list of rule sets (the strata). When the rules are executed in this order they yield one unique result.

Stratification takes the dependencies of the rules into account, While in Datalog these are due to negation, in graph rewrite systems they are additionally caused by addition and deletion of items. Here the major idea is to enlarge the host graph as much as possible before rules are applied that delete items or test negation, With this heuristic, subgraphs are concluded to be absent only if they really cannot be derived. In a lot of cases this turns out to yield a rather natural semantics of the rewrite system.

3.1 Rule Dependencies in Graph Rewrite Systems

A rule of a graph rewrite system may have several effects on a node domain or a relation: test (t), test absence (negation) (n), add (a), and delete (d). Additionally, if the set of edge-accumulative termination labels is nonempty, the add-effect can be subdivided into the effects edgeadd ([bar]a), edgeacc ([bar]a), and restadd (a).

Definition (LABEL EFFECTS OF A RULE). Let G = (S, Z) [element of] RGRS, r [element of] S, l [element of] [Sigma]. Let EFF = {t, n, a, a, a, a, d} be a set of relations between rules and labels defined as follows:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Label effects denote the following, t(r, l) means that a rule r tests a label l [element of] [Sigma]. n(r, l) means that a rule r tests absence of an edge with label l [element of] [[Sigma].sub.E]. a(r, l) means that a rule r adds a label l [element of] [Sigma]. The designated set of labels is further subdivided by [bar]a(r, l), [bar]a(r, l), and a(r, l). [bar]a(r, l) denotes all added edge labels of an edge-additive rule r. [bar]a(r, l) means that a rule r adds an edge-accumulative termination label [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], while the set of added nodes is not empty, a(r, l) means that a rule r adds a label l [element of] [Sigma], which is not from [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. d(r, l) means that a rule r deletes a label l [element of] [Sigma], or potentially deletes an edge label from [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

A dependency between two rules consists of a pair of label effects, related to the same label: the first component specifies the effect of the first rule on a node domain or relation, the second component the effect of the second rule. A dependency concerning a certain label relates two rules [r.sub.1 and [r.sub.2] as follows. When [r.sub.1] is applied, there may be no effect on the redex of [r.sub.2] (harmless influence). Obviously, the two direct derivations can be interchanged. Second, a redex for [r.sub.2] may be created (i.e., [r.sub.2] is supported). Also then, the two direct derivations do not hinder that a direct common reduct exists. Third, the redex of [r.sub.2] may be deleted so that the direct derivations cannot be interchanged and a common reduct cannot be directly derived. Then [r.sub.2] is hindered.

Table II considers the all pairs of Pedex effects for all dependencies. In the case of pairwise-harmless dependencies both rules do not have an effect on each other (at least for one label). In the case of supporting dependencies, only one of the rules supports the other. In the case of pairwise-supporting dependencies, both rules support each other. For dangerous rule dependencies, only one rule of the pair hinders the other rule such that no common reduct exists. Lastly, the nonstratifiable dependency indicates that both rules hinder each other.

[TABULAR DATA II NOT REPRODUCIBLE IN ASCII]

These dependencies form relations between the rules and span up the rule dependencygraph (RDG) of a graph rewrite system.

Definition (RULE DEPENDENCY GRAPH). Let G = (S, Z) [element of] RGRS, l [element of] [Sigma]. The following defines a set of rule dependency relations for a pair of effects x, y [element of] EFF and two rules [r.sub.1], [r.sub.2] [element of] S:

[[Delta].sub.xy]([r.sub.1], [r.sub.2]) :?? [exist]l [element of] [Sigma] : x([r.sub.1], l), y([r.sub.2], l).

The relations [[Delta].sub.tt], [[Delta].sub.tn], [[Delta].sub.nt], [[Delta].sub.nn], [[Delta].sub.aa], [[Delta].sub.aa], and [[Delta].sub.aa] are called pairwise-harmless. The relations [[Delta].sub.ta], [[Delta].sub.at], [[Delta].sub.nd], and [[Delta].sub.dn] are called supporting. The relations [[Delta].sub.ad], [[Delta].sub.da] are called pairwise-supporting. The relations [[Delta].sub.ta], [[Delta].sub.at], [[Delta].sub.an], and [[Delta].sub.na] are called dangerous. The relations [[Delta].sub.dd], [[Delta].sub.[bar]aa], [[Delta].sub.[bar]aa], and [[Delta].sub.[bar]aa] are called nonstratifiable.

Then the rule dependency graph (RDG) of G is the [Sigma]-graph

RDG(G) = (S, [[Delta].sub.at] [union] [[Delta].sub.dn] [union] [[Delta].sub.da] [union] [[Delta].sub.ad] [union] [[Delta].sub.td] [union] [[Delta].sub.an], [union] [[Delta].sub.dd] [union] [[Delta].sub.[bar]aa] [union] [[Delta].sub.[bar]aa] [union] [[Delta].sub.[bar]aa], 0)

The key idea of stratification is to analyze the RDG to divide the rule set into a linear list of rule classes which are executed each after the other. The ordering of the classes fulfils the following requirements: first, addition should be applied not later that testing, and deletion not later that negation. In this way, chances for rules that have a testing or negation effect are enlarged. Second, deletion should be applied after testing. Then chances for rules that that have a testing effect are enlarged. Third, addition should be applied before negation, since only graphs should be derived from the axiom that really cannot be derived. All in all, this strategy enlarges the host graph as much as possible, then applies deletion and finally negation. It has the effect that as many subgraphs as possible are derived from the axiom and the nonexistence of a subgraph is concluded only if it is really not derivable. This leads to normal forms that are rather natural, since for testing rules chances are increased, and negation is only applied, if the items cannot be derived or can be deleted.

The rule dependency classes influence the stratification process as follows (summarized in Table II). Pairwise-harmless dependencies between rules can be neglected for the stratification because they do not prevent that corresponding direct derivations are independent (Section 2.1.3). Thus they are not considered for the RDG. In particular, the dependency relations [[Delta].sub.aa], [[Delta].sub.aa], and [[Delta].sub.[bar]aa] do not create overlaps: either one rule adds items which do not destroy the redex of the other rule, or both rules are edge-additive; then the direct derivations yield a direct common reduct. Hence these dependencies do not disturb strong confluence. All other pairs with edgeacc- and restadd-effect (e.g., [[Delta].sub.aa] or [[Delta].sub.aa]) can be subsumed in the relations with add-effect.

Also supporting dependencies do not lead to overlaps. However, they have to be considered in the stratification because they generate redexes for other rules. Since the adding rule should be applied not later than the testing rule and the deletion rule not later than the negation rule, [[Delta].sub.at] and [[Delta].sub.dn], are included in the RDG. In a pairwise-supporting dependency both rules support each other; hence both [[Delta].sub.da] and [[Delta].sub.an] are included in the RDG.

Dangerous dependencies indicate that their rules may form overlaps, in which the application of one rule destroys the redex of the other so that no common reduct exists, but not vice versa. For testing rules that are hindered, the hindering rule should be applied later. Hence [[Delta].sub.td] is included in the RDG. For negation rules that are hindered by addition, the hindering rule should be applied earlier, since the nonexistence of subgraphs should be concluded only if it really cannot be derived. Hence [[Delta].sub.an] is included in the RDG.

The dependencies [[Delta].sub.dd], [[Delta].sub.aa], [[Delta].sub.aa], and [[Delta].sub.aa] are not stratifiable, because the rules may produce overlaps in which the application of each rule destroys the redex of the other so that no common reduct exists (mutual exclusion). This means that no reasonable order for a stratification can be found. While this clearly holds for pairs of deletion effects, it also holds for certain pairs with edge-accumulation effects. If the rules add edge-accumulative termination edges together with other items each rule may block the other: edge-accumulative termination enforces that when a rule reduces a subredex in the termination subgraph, another competing rule cannot be applied anymore (rule application condition (t3)).

3.2 Stratification of Relational Graph Rewrite Systems

This section formalizes the conditions for a stratification and gives some fundamental theorems about normal forms in stratified graph rewrite systems.

Definition (STRATIFICATION OF A RGRS). Let G = (S, Z) [element of] RGRS. A stratification of G is a partition of S into a list of sets of rules (strata) [[S.sub.1], ..., [S.sub.n]], so that the following conditions hold for the relations of RDG(G): Let [r.sub.1],[r.sub.2] [element of] S. If [r.sub.1] [element of] [S.sub.i] [conjunction] [r.sub.2] [element of] [S.sub.j], then

(1) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(2) [[Delta].sub.at]([r.sub.1], [r.sub.2]) [disjunction] [[Delta].sub.dn]([r.sub.1], [r.sub.2]) ?? i [is less than or equal to] j

(3) [[Delta].sub.da]([r.sub.1], [r.sub.2]) [disjunction] [[Delta].sub.ad]([r.sub.1], [r.sub.2]) ?? i [is less than] j

(4) [[Delta].sub.td]([r.sub.1], [r.sub.2]) [disjunction] [[Delta].sub.an]([r.sub.1], [r.sub.2]) ?? i [is less than] j.

If there is a stratification for S, G is called stratifiable. The set of all stratifiable graph rewrite systems is called STRGRS.

Additionally, if [r.sub.1] [is not equal to] [r.sub.2] is assumed, we have a weak stratification, and G is called weakly stratifiable. The set of all weakly stratifiable graph rewrite systems is called WSTRGRS.

Condition (1) excludes the fact that a stratifiable graph rewrite system contains nonstratifiable rule dependencies. Conditions (2) and (3) mean that if there is a supporting dependency between a rule [r.sub.1] and a rule [r.sub.2], then [r.sub.1] must be evaluated in an earlier or in the same stratum as [r.sub.2]. Since [[Delta].sub.ad] and [[Delta].sub.da] are symmetric, both rules have to be evaluated in the same stratum. Condition (4) means that if there is a dangerous dependency between [r.sub.1] and [r.sub.2], then [r.sub.2] must be evaluated in a later stratum than [r.sub.1]. This implies that the loops of the RDG do not contain dangerous dependencies. If so, rules, which delete parts of the host graph or test the nonexistence of parts, depend upon each other recursively, and the stratification heuristic cannot find an execution order.

Of course, if dangerous self-dependencies exist, stratification is not possible. This is always the case if a rule deletes an item, since a deleted item is also tested, but need not be the case for addition/negation. If additionally [r.sub.i] [is not equal] [r.sub.2] is required, self-dependencies are not considered in the RDG, and weak stratification may be possible. Clearly STRGRS[subset]WSTRGRS holds.

Theorem (STRATIFIED CONVERGENCE). Let G = (S, Z) [element of] XGRS[intersection]STRGRS, and let T = [[S.sub.1], ..., [S.sub.n]] be a stratification of S. Then [S.sub.i], 1 [is less than or equal to] i [is less than or equal to] n, yields a unique normal form.

Let G = (S, Z) [element of] XGRS[intersection]WSTRGRS, and let T = [[S.sub.1], ..., [S.sub.n]] be a weak stratification of S. Suppose every rule of [S.sub.i] does not create overlaps with itself in an arbitrary derivation. Then [S.sub.i] is strongly confluent and yields a unique normal form.

If every stratum [S.sub.i] [element of] T has a unique normal form and the strata are computed in stratification order, the whole process selects a single normal form, the stratified normal form. Then G is called stratified-convergent (resp. weakly stratified-convergent).

PROOF. Since G is terminating, all strata of G must terminate. Within a stratum only pairwise-harmless and supporting rule dependencies are allowed. Rules with such dependencies never form overlaps, i.e., their direct derivations are always independent (see Section 2.1.3). For weakly stratifiable systems which do not consider self-dependencies, it must be required additionally that each rule does not produce overlaps with itself. Then all pairs of direct derivations from rules in the stratum are independent; the stratum is strongly confluent and therefore yields a unique normal form.

If all strata of G yield a unique normal form, and the strata are computed in stratification order, the whole derivation process also delivers a single normal form. []

The above heuristic is a generalization of the stratification of Datalog?? to graph rewrite systems. The rule dependency graph of a Datalog??-program results as a special case of our RDG and only contains relations with add- and negation-effect on edges [Ceri et al. 1989a].

Additionally, weak stratification has been introduced. In such a system, the stratification process deals well with overlaps created by different rules, but the absence of overlaps created from a single rule can only be decided if a host graph is given. For instance, if a rule deletes items and can be applied competitively on two overlapping subgraphs, these form an overlap, and the corresponding stratum does not yield a unique normal form. Thus a weak stratification on the rule set may be possible, but whether or not a single stratified normal form exists can only be checked by investigating all direct derivations on the host graph. As a nonstratifiable system may be weakly stratifiable, it is expected that weakly stratifiable systems allow a greater range of optimizations to be specified. Additionally, experienced users may be able to assert whether single rules may create overlaps because they know about the shape of the intermediate representation. Even if this is not possible, weak stratification reduces the number of normal forms enormously.

If edge-accumulation was not used for termination the stratification conditions could be simpler. Then rule application condition (t3) could be dropped so that edge-accumulative rules would never create overlaps with others. Then for stratification, it would not be necessary to consider the subclasses of the a-effect, and only [[Delta].sub.dd]-dependencies would be nonstratifiable.

It is possible to stratify a rule dependency graph according to a different heuristic. For instance, rules could be defered that hinder other rules. Then [[Delta].sub.an] would be included in the RDG in its inverse form [[Delta].sub.na], and a different normal form would result, although still being unique.

Similar to Datalog??, all stratifications of a stratified-convergent XGRS lead to the same normal form. Thus such a system is a function:

Theorem (UNIQUENESS OF STRATIFIED NORMAL FORMS). Let G = (S, Z) [element of] XGRS[intersection]STRGRS, and let [T.sub.1] and [T.sub.2] be two arbitrary stratifications of S. Then [T.sub.1] and [T.sub.2] yield the same normal form.

Let G = (S, Z) [element of] XGRS[intersection]WSTRGRS and let [T.sub.1] and [T.sub.2] be two arbitrary weak stratifications of S. Suppose that every rule of S does not create overlaps with itself in an arbitrary derivation. Then [T.sub.1] and [T.sub.2] yield the same normal form.

For the following, let [RDG.sub.dang](G) = (S, [[Delta].sub.td][union][[Delta].sub.an] ??) be the subgraph of RDG(G) with all dangerous dependencies and [RDG.sub.undang] (G) = (S, [[Delta].sub.at] [union] [[Delta].sub.dn] [union] [[Delta].sub.ad], ??) be the subgraph of RDG(G) with all nondangerous dependencies. According to the stratification conditions, [RDG.sub.dang](G) is acyclic while [RDG.sub.undang] may be cyclic. However, all nodes in a cycle in the latter graph need to be grouped into one stratum. Hence we can abstract a cycle into one single node for stratification, and can refer to the acyclic condensation of [RDG.sub.undang] (G), AC.

PROOF. The proof has several parts (see Figure 10):

[Figure 10 ILLUSTRATION OMITTED]

(1) It is shown that every stratification T has associated at least one canonical stratification T' with the same normal form (lemma NORMAL TO CANONICAL STRATIFICATIONS). The strata of T' are the strongly connected components (SCC) in the RDG, i.e., the nodes in the acyclic condensation AC of the RDG. Canonical stratifications can be produced by a transformation process which splits off nodes from strata if they are not involved in an NCC of [RDG.sub.undang]. The set of canonical stratifications of T is called [T]; all of them have the same normal form.

(2) Canonical stratifications have the same strata but differ in their ordering. However, all topological sortings of AC--which correspond to the canonical stratifications--have the same normal form. This can be proved by a second transformation algorithm which transforms them into each other while preserving the normal form (lemma CANONICAL TO CANONICAL STRATIFICATIONS). Hence, arbitrary stratifications [T.sub.1] and [T.sub.2] whose sets of canonical stratifications do not overlap ([[T.sub.1]] [intersection] [[T.sub.2]] = 0) have the same normal form which had to be shown.

The proof for stratifications also holds for weak stratifications, supposed every rule does not create overlaps with itself. []

Lemma (NORMAL TO CANONICAL STRATIFICATIONS). Let G = (S, Z) [element of] XGRS[intersection]STRGRS and let T be an arbitrary stratification of S. A stratification is called canonical if its strata are the strongly connected components of the [RDG.sub.undang] (G).

Then T can be transformed to a canonical stratification T' with the same normal form. T' need not be unique. T and its associated set of canonical stratifications [T] have the same normal form.

PROOF. The proof is constructive. A canonical stratification has minimal-sized strata and a maximal number of them, since its strata cannot be split. They appear in two forms:

Multinode SCC. When a node (a rule) is involved in a loop of [RDG.sub.undang] (G) it belongs together with all other nodes of the SCC to the same multinode stratum. For all nodes [r.sub.1], [r.sub.2] in the stratum holds that [r.sub.1] is reachable in [RDG.sub.unadang] (G) from [r.sub.2] and vice versa, as required for the SCC.

Singlenode SCC (Trivial SCC). When a node does not belong to a loop of the [RDG.sub.undang](G), it must be alone in a stratum, since the subgraphs of [RDG.sub.undang](G), which are not contained in loops, are acyclic and do not belong to the same SCC.

In an arbitrary stratification, due to the SCC feature, all nodes/rules of an SCC must be contained in a single stratum. However, a stratum may contain other nodes which do not belong to the SCC. Hence, an SCC may be a real subgraph of a stratum but not vice versa. Define a fat stratum to a stratum which is a real supergraph of an SCC (Figure 11). Transform an arbitrary stratification by splitting fat strata as follows. When all strata are nonfat, i.e., SCC, a canonical stratification is reached.

[Figure 11 ILLUSTRATION OMITTED]

(1) Nodes can be split off from fat strata when they are not linked to any other node of the stratum. Then all derivations of G are unchanged by the split, and the normal form is preserved.

(2) Nodes can be split off when they do not belong to a multinode SCC. In particular, this holds for root or sink nodes of a fat stratum which do not belong to a multinode SCC. All derivations in G under the old stratification are still possible under the transformed stratification, since the different strata grouping of rules linked by undangerous dependencies preserves the normal form (Figure 12). When such a root/sink node is split off of a fat stratum a set of new root/sink nodes of the transformed stratum results, which can be split off again, or a nonfat stratum is reached.

[Figure 12 ILLUSTRATION OMITTED]

In a canonical stratification, only a finite number of redexes for the split-off transformation exists. Since each split-off destroys one of these redexes, the transformation terminates, although it is not confluent. However, all resulting canonical stratifications (the set [T]) have the same normal form, since every split-off preserves the normal form of the original stratification T. []

Additionally, every canonical stratification corresponds to a topological sorting (topsort) of the nodes of AC. This is due to the stratification conditions and leads to the following lemma.

Lemma (CANONICAL TO CANONICAL STRATIFICATIONS). Let G = (S, Z) [element of] XGRS[intersection]STRGRS. Then two canonical stratifications [T.sub.1] and [T.sub.2] have the same normal form.

PROOF. Canonical stratifications build on the acyclic condensation AC of the RDG and have the same number of nodes. They can only differ in their stratum order, and this order must be a topsort of AC, since only a topsort respects the stratification conditions. (A topsort of an acyclic directed graph G is a total order [is less than] on the nodes such that if ([n.sub.1], [n.sub.2]) [element of] G then [n.sub.1] [is less than] [n.sub.2]). Hence, topsorts of AC and canonical stratifications correspond one-to-one to each other. For the proof, consider only topsorts with consecutive numberings 1, ..., |[N.sub.AC]|. For brevity, each node of the AC is identified with a stratum.

First of all, if there are multiple roots of AC, or several nonconnected subgraphs, create an artificial root and link all roots of subgraphs as children to it.(4) Such an artificial root can be thought of as an identity rule which does not modify the host graph and does not change normal forms of any canonical stratification.

Fix an arbitrary topsort of the AC as canonical stratification T. Choose another canonical stratification [T.sub.1] based on another topsort. It will be shown that by a systematic transformation, [T.sub.1] can be transformed into T, without changing the normal form of [T.sub.1]. The transformation is applied in a forward sweep over AC in which nodes/strata are renumbered that appear "on the same level". The algorithm works as follows.

(0) Let tnum(n) be the topological number of a node n under a canonical stratification T. Let marked(T), resp. unmarked(T) be the set of marked and unmarked nodes of a stratification T. Start with the root of AC, which has number 1, and mark it. Set i = 1.

(1) Define the set of renumber candidates of T, candidates(T), as the nodes in T whose ancestors in AC have been marked already and which are reachable by exactly one path from marked nodes. Those nodes are all direct successors of marked nodes which are not comparable plus those which are reachable from them on one path. Do the same for [T.sub.i].

candidates(T) and candidates([T.sub.i]) consist of the same number of nodes. Also the topological numbers of candidates(T) and candidates([T.sub.i]) must be the same, since

--the topological numberings of T and [T.sub.i] are consecutive and

--all nodes in T and [T.sub.i] fulfill the topsort condition: if ([n.sub.1], [n.sub.2]) [element of] AC then [n.sub.1] [is less than] [n.sub.2] which means in particular that

--all topological numbers of candidates must be greater than all numbers of marked nodes ([inverted]At [element of] {T, [T.sub.i]} : [inverted]Ac [element of] candidates(t), [inverted]Am [element of] marked(t) : [tnum.sub.t](c) [is greater than] [tnum.sub.t](m)) and

--all topological numbers of candidates must be smaller than all numbers of all other unmarked nodes ([inverted]At [element of] {T, [T.sub.i]}: [inverted]Ac [element of] candidates(t), [inverted]Am [element of] unmarked(t) -- candidates(t) : [tnum.sub.t](c) [is less than] [tnum.sub.t](m)).

(2) No matter how T has topsorted its set of renumber candidates candidates(T), the set of candidates candidates([T.sub.i]) can be renumbered such that the numbering of candidates(T) results (Figure 13). There are two cases.

[Figure 13 ILLUSTRATION OMITTED]

(2a) Swaps of nodes on the same distance level. Either candidates(T) as well as candidates([T.sub.i]) contains only elements which have distance 1 from marked nodes (i.e., they lie on the same level of distance from the marked nodes). Then all candidates are noncomparable: they are not adjacent to each other. In the stratification, they can be renumbered arbitrarily; the topsort condition is still fulfilled. Since they are noncomparable, they correspond to strata whose rules are not dependent on each other. This assures that the rules work on different parts of the host graph and do not create overlaps with each other in G. Hence, all direct derivations in G with rules from candidates([T.sub.i]) can be interchanged freely, and a swap of numbers for an arbitrary pair of nodes from candidates([T.sub.i]) preserves the normal form of the stratification.

(2b) Swaps of nodes on different levels. Let [AC.sub.candidates](T) be the subgraph of AC restricted to candidates(T). [AC.sub.candidates](T) (and then also [AC.sub.candidates]([T.sub.i])) may contain elements which have distance k [is greater than] 1 from marked nodes (i.e., there are different levels of distance from the marked nodes). Then they contain several candidates comparable to each other, namely those which lie on one path in [AC.sub.candidates](T), but lie on different levels. Comparable nodes, however, cannot be renumbered arbitrarily; the topsort condition must be ensured for swappings, i.e., the numbering may be "rotated" in the following way.

In this case, [AC.sub.candidates](T) is a forest of trees, and this forest can be grouped into levels, where each node n in a level is marked by the distance from the forest root nodes, dist(n) (Figure 14). Now, candidates(T) can be renumbered to a standardized form(5) in which every node on level k is has a smaller topsort number than every node on level k + 1:

[inverted]Ax, dist(x) = k, [inverted]Ay, dist(y) = k + 1: tnum(x) [is less than] tnum(y)

[Figure 14 ILLUSTRATION OMITTED]

Clearly, such a candidate set in normal form fulfils the topsort condition, i.e., belongs to a canonical stratification.

candidates(T) can be transformed to standardized form retaining the normal form of its underlying stratification. The algorithm applies the swap rule as long as it can be applied (Algorithm 15, Figure 14 upper step). To be correct, the swap rule in Figure 15 must be applied starting from lower levels toward the upper levels. Then the swap rule is topsort compatible, since smaller numbers are moved to lower levels. After one swap, the resulting numbering still fulfils the topsort condition for all comparable nodes on the paths of the swapped nodes. Every swap leads to another canonical stratification; and since the rules which are swapped are noncompatible, the swap retains normal forms. When the rule cannot be applied any more, a stratification is reached whose candidate set is in standardized form.

Fig. 15. An algorithm with a rule to swap noncomparable nodes in candidate sets.

Input: candidates(T). Output: New numbering in standardized form. maxdepth := max({dist(n)|n [element of] candidates(T)}); Levels [l.sub.k] := {n [element of] candidates(T)|dist(n) = k} from 1 ... maxdepth; for m = 1 to maxdepth-1 do change = TRUE; while change do change := FALSE; choose [n.sub.1] such that tnum([n.sub.1]) = max({tnum([l.sub.m])}); choose [n.sub.2] such that tnum([n.sub.2]) = min({tnum([l.sub.m]+1)}); if not comparable([n.sub.1], [n.sub.2]) and tnum([n.sub.1]) [is less than] tnum([n.sub.2]) then swap(tnum([n.sub.1]), tnum([n.sub.2])); change := TRUE; end end end

Standardized forms, however, can be transformed into each other by swaps of noncomparable nodes in each level (Figure 14 lower step), as in (2a). Since these swaps retain the normal forms of the stratifications, candidates(T) can be renumbered into any standardized form with the same normal forms for the underlying stratifications. In the same way, candidates([T.sub.i]) can be renumbered into any standardized form under retaining the normal form for the underlying stratifications.

In consequence, candidates([T.sub.i]) can be renumbered into candidates(T) without changing the normal form of [T.sub.i]. And additionally, candidates(T) can be renumbered into candidates([T.sub.i]) without changing the normal form of T (which is what we wanted).

As a whole, the entire renumbering process of (1) and (2) results in a stratification [T.sub.i]+1 which has the same normal form as [T.sub.i] and is more similar to T. After step (2), it is invariant that the subgraph of all marked nodes in [T.sub.i]+1 has the same topological order as the subgraph containing all marked nodes of T (compare to Figure 10).

(3) As the next step, mark all renumber candidates in candidates(T) and candidates([T.sub.i]) and empty both sets.

(4) If no unmarked nodes are left, stop. Otherwise increase i and go to step (1).

The algorithm terminates, since each node of AC is marked once. Since both T and [T.sub.1] refer to the same graph AC the algorithm reorders an arbitrary canonical stratification [T.sub.1] into another arbitrary canonical stratification T. Since all transformations preserve the normal form of [T.sub.1], [T.sub.2], ... [T.sub.n] = T, T must have the same normal form as [T.sub.1], and the lemma follows. []

In summary, two stratifications [T.sub.1] and [T.sub.2] may differ only in three aspects:

(1) how do they group rules to strata which have supporting dependencies?

(2) how to they order areas in the RDG that are not connected?

(3) how do the stratifications topologically sort subgraphs in the RDG with dangerous dependencies only?

An ordering of the RDG compatible to the stratification conditions may answer these questions differently. However, always the same normal form results, no matter how question (1) is answered (explained by lemma NORMAL TO CANONICAL STRATIFICATIONS) and how question (2) and (3) are answered (explained by lemma CANONICAL TO CANONICAL STRATIFICATIONS).

(Weak) stratification handles certain conflicts between rules and guarantees functional semantics for otherwise nonconfluent graph rewrite systems. This allows to specify more optimization problems deterministically with graph rewrite systems. The next section shows several examples, terminating as well as stratifiable graph rewrite systems for program optimization.

4. HOW TO USE GRAPH REWRITE SYSTEMS FOR OPTIMIZATION

Developing an optimization has to model the optimizer's data, to prepare as well as to perform global analysis, and to transform. This section demonstrates how these tasks can be specified with graph rewriting, with the example of lazy code motion [Knoop et al. 1994]. This optimization is a variant of the elimination of partial redundancies (also called busy code motion [Morel and Renvoise 1979]). Partial redundancy elimination removes repeated computations of the same value. To this end, it inserts computations higher up in the code so that their results can be reused later on in the program. Redundancy elimination is especially beneficial for array-intensive computations because it moves invariant address expressions out of loops.

Lazy code motion additionally tries to reduce expression life-times. First the same information as in busy code motion is computed: for each expression the earliest place in the code is determined (earliest information) for which the most reuses are possible and for which the most computations are saved, so that an expression is computationally optimal. Then those places further down in the code are determined, which still are computationally optimal, but yield a shortest register life-time for the computed values (latest information).

Figure 16 contains two loops (1) and (2) with two loop entry headers (LH1) and (LH2). First only expression x := a+b is regarded. Suppose it is a redundant expression in loop 2 and not invalidated in loop 1. Then the earliest place to move it would be (LH1) (indicated by the predicate EARLIEST(x)); the latest place would be (LH2), indicated by the predicate LATEST(x). Thus, busy code motion would move a+b out of loop body (2) up into LH1, while lazy code motion would move it into LH2. This results clearly in a shorter register life-time and a better register allocation.

[Figure 16 ILLUSTRATION OMITTED]

4.1 Designing the Data Model

Our running example relies on the data model in Figure 17. The diagram is a higraph, i.e., an entity-relationship model extended by single inheritance on nodes [Harel 1988]. Boxes denote object types; box inclusion denotes inheritance; diamond shapes depict n:m-relations, and simple lines 1:1-relations. Arrows are used to indicate the direction of the relations. Object types are used, which appear frequently in intermediate representations, such as blocks, statements, and expressions. Several of them are produced by the front-end and form the basis for the optimization process (Proc, Block, Assign, LoadIntConst, Plus, IntConst). Others are produced by the optimizer: AssReg and UseReg objects denote assignments and uses of (virtual) registers (Register); objects of type ExprClass denote syntactically equivalent expressions. Also several relations are produced by the front-end (Blocks, Struts, Exprs, Left, Right), while the others are the results of optimizer components.

[Figure 17 ILLUSTRATION OMITTED]

For the examples in this section, simple inheritance for node types is allowed, i.e., the node label space is hierarchical (tree-like). This was left out in Section 2.1.1 to simplify definitions. However, all results in this paper can be extended to this case without problems, because general labels are abbreviations for more specific labels: node sets with a more general label are partitioned into node sets with more specific labels, and in rule application, a rule l-node matches all graph nodes with label l or labels below l in the label tree. Thus a rule can be regarded simply as shorthand for a family of rules, one rule for each combination of labels.(6)

4.2 Expression Equivalence (Value Numbering)

The first step of lazy code motion is to assemble all intermediate expressions of a statement list from a procedure. COLLECT-EXPRESSIONS from Section 1.1 can be used for this purpose. The result is unique because COLLECT-EXPRESSIONS is an EARS without negation.

The second step of lazy code motion maps all syntactically equal expressions to each other. Figure 18 contains the core of a specification. In intermediate languages, two kinds of expressions can be distinguished: those without operand expressions (leaf expressions), and those with operands (nonleaf expressions). Among the former there are operators such as LoadIntConst,(7) among the latter there are operators such as Plus. Two expressions are equivalent, if they use the same objects and if their operands are equivalent. Then the expressions should be related in the relation eq. For LoadIntConst expressions, the constant object used must be equal (COMPUTE-EQUIVALENCE-1). For Plus expressions their operands must be equivalent (COMPUTE-EQUIVALENCE-2). For a complete specification similar rules for all subtypes of expressions have to be added.

[Figure 18 ILLUSTRATION OMITTED]

COMPUTE-EQUIVALENCE is an EARS without negation, and thus convergent [Assmann 1994]. A stratification groups all rules into one stratum. Since the global data-flow analysis can be performed on the classes of equivalent expressions--if define/use/kill information is taken into account--the information about equivalent expressions can be used to reduce the number of objects considered. Thus the next step should summarize all strongly connected components of the eq-relation into a set of objects, the expression classes (node domain ExprClass). This can be specified with an AGRS that allocates the objects, one for each partition, and an EARS which transitively links all reachable expressions. Based on this information global data-flow analysis can be performed.

4.3 Global Data-Flow Analysis

In this section the last two parts of the global analysis for lazy code motion are presented: the recursive equation system for isolation analysis and the nonrecursive equation system which determines the places of transformations at the end of blocks. The remainder of the lazy code motion analysis can be specified similarly.(8)

The data-flow information is represented with several relations. The basis of the analysis are the results from the first equation systems in Knoop [1994]. EARLIEST_IN/OUT denotes block entries/exits which are the earliest points in the program where an expression is computationally optimal. LATEST_IN/OUT denotes those block entries/exits where expressions have to be inserted to achieve the shortest register life-time and computational optimality. If code is inserted as indicated by the LATEST_IN/OUT relations, expressions are also inserted which have only one use in the same block of the insert. These expressions are called isolated and are marked by the isolation analysis with the relations ISOLATED_IN/OUT. Figure 16 contains a simple example: expression z = c+d is latest but also isolated at (LH2), as the expression is only used once. The figure indicates that the introduction of an intermediate value h2 does not save computations. Thus z = c+d should not be moved. Finally, starting from the isolation relations, the relations INSERT/REPLACE_IN/OUT are computed, which denote those block entries/exits where expressions are to be inserted or replaced.

Isolation analysis uses all-quantified variables. All-quantifiers in data-flow analysis can be mimicked with an ESGRS that approaches the greatest fixpoint of the ISOLATED_IN/OUT-relations, starting from the complete relations, subtracting edges step by step. Also, all-quantifiers can be eliminated with a transformation of the equations. This is possible because must-data-flow analyses on complete lattices have a dual may-analysis [Marlowe and Ryder 1990]. Approaching the greatest fixpoint in a must-problem can always be performed by dualizing the problem, approaching the smallest fixpoint in the corresponding may-problem, and dualizing the solution. Thus a relation SOCIAL_IN/OUT is introduced, which is dual to ISOLATED_IN/OUT. For this let b1, b2 [element of] Blocks.

ISOLATED-IN(b1) = EARLIEST_OUT(b1) [disjunction] ISOLATED-OUT(b1)

ISOLATED_OUT(b1) = [inverted]A b2 Block-succ(b1) :(EARLIEST_IN(b2) [disjunction] ??COMP_IN(b2) [conjunction] ISOLATED_IN(b2))

(dualizing) ?? (SOCIAL_IN(b1) = [negation]ISOLATED_IN(b1); SOCIAL_OUT(b1) = [negation]ISOLATED_OUT(b1))

[negation]SOCIAL_IN(b1) = EARLIEST_OUT(b1) [disjunction] [negation]SOCIAL_OUT(b1)

[negation]SOCIAL_OUT(b1) = [inverted]A b2 [element of] Block-succ(b1) :(EARLIEST_IN(b2) [disjunction] ??COMP_IN(b2) [conjunction] [negation]SOCIAL_IN(b2))

ISOLATED_IN (b1) = [negation]SOCIAL_IN (b1)

ISOLATED_OUT (b1) = [negation]SOCIAL_OUT (b1))

(negation elim.) ??

SOCIAL-IN(b1) = [negation](EARLIEST_OUT(b1) V [negation]SOCIAL_OUT(b1))

SOCIAL_OUT(b1) = [exist]b2 [element of] Block-succ(b1) : ??(EARLIEST_IN(b2) [disjunction] ??COMP_IN(b2) [conjunction] ??SOCIAL_IN(b2))

ISOLATED_IN (b1) = [negation]SOCIAL_IN(b1)

ISOLATED_OUT(b1) = [negation]SOCIAL_OUT(b1))

??

SOCIAL_IN(b1) = ??EARLIEST-OUT(b1) [conjunction] SOCIAL_OUT(b1)

SOCIAL_OUT(b1) = [exist]b2 [element of] Block-succ(b1): [negation]EARLIEST_IN(b2) [conjunction] (COMP_IN(b2) [disjunction] SOCIAL_IN(b2))

ISOLATED_IN(b1) = SOCIAL-IN (b1)

ISOLATED_OUT (b1) = [negation]SOCIAL_OUT(b1))

The last equation system is in a form that can be specified with EARS (Figure 19). Before the system can be evaluated, the computed relations have to be initialized appropriately. For instance, in our case exit-isolation must hold for exit-nodes of the control-flow graph. This is left out for simplicity: if the rules for initialization do not disturb convergence of the system, they may be specified together with the equations. Otherwise a stratification may be possible, or the initialization must be specified in a rule system that is executed before the main analysis.

[Figure 19 ILLUSTRATION OMITTED]

The rules denote the following. An expression is social at the entry of a block if it is not earliest and social at the exit of the block (LCM-ANALYSIS-1). If the expression is earliest at the exit of the block, its life-range begins there and is isolated at the entry of the block. In the second case, the expression is used in later blocks and cannot be isolated at the entry.

Rules LCM-ANALYSIS-2-4 carry these conditions over to the case for exits of blocks. An expression is social at the exit of a block Block:1, if a successor block Block:2 exists, which conforms to the following conditions. First, the expression must not be earliest at the entry of Block:2. Then again, its life-range would start at the entry of Block:2, and the expression might be isolated at the exit of Block: 1. Second, the expression must be either computed or social at the entry of Block:2 (denoted by relation COMP_SOC_IN). In the first case, the expression is live in Block: 2, i.e., its liveness extends to the exit of Block: 1. In the second case it is used later on. Thus socialness (and isolation) at the entry and exit of a block is expressed by recursive rules, defined over the successors of the block. This makes a fixpoint evaluation necessary.

Finally, rules LCM-ANALYSIS-5-7 compute the necessary relations for a transformation (only block entries are regarded). An expression must be inserted at the entry of a block, if it is latest and social there (LCM-ANALYSIS-6). An expression is to be replaced at the entry of a block if it is computed there (LCM-ANALYSIS-7) and not isolated-and-latest (LCM-ANALYSIS-5, relation ISO_LAT_IN). Thus expressions are only moved when they are social.

LCM-ANALYSIS uses negation, and stratification must be applied. The rule dependency graph of our example is given in Figure 20. A stratification consists of three strata of which the first is cyclic. One possible solution is [{LCM-ANALYSIS-1,..., LCM-ANALYSIS-4, LCM-ANALYSIS-6}, {LCM- ANALYSIS-5}, {LCM- AN ALYSIS- 7}].

[Figure 20 ILLUSTRATION OMITTED]

4.4 Transformation

Our examples are concluded by specifying some of the necessary transformation rules for lazy code motion (Figure 21). Only the parts of the transformation at the blocks entries are regarded. Also a relation LocallnsertIN is required, which denotes for each block entry the exact point in the statement list, where an expression of an expression class can be inserted. This relation must be computed with a local analysis and is required to prove life-time-optimality ([Knoop et al. 1994, Section 2.4]). In the sequel, these points are referred to, when it is spoken of an entry of a block.

[Figure 21 ILLUSTRATION OMITTED]

LCM-TRAFO-1 inserts an expression at the entry of a block. It creates a new statement which computes the value of the expression and stores it in a register. Edges INSERT_IN and LocalInsertIN are deleted in order to prevent LCMTRAFO-1 from being applied again. LCM-TRAFO-2 replaces a partially redundant expression at the entry of a block whose expression class is marked by an edge REPLACE_IN. The rule replaces the redundant expression with a UseReg expression which reuses the already computed value of the class from the corresponding register. To simplify presentation, it is assumed that only one expression is to be replaced per block.

LCM-TRAFO is edge-subtractive. Although its rules add items they do not produce a new subredex in the termination subgraph: LCM-TRAFO-1 subtracts the edges INSERT_IN and LocallnsertIN, LCM-TRAFO-2 the edge REPLACE_IN. Thus LCM-TRAFO terminates. LCM-TRAFO need not be strongly confluent, because LCM-TRAFO-2 deletes expressions. However, the rule dependency graph in Figure 22 can be weakly stratified with one stratum. Since self-dependencies are neglected for weak stratification, it must be asserted that LCM-TRAFO-2 does not form overlaps with itself. In deed, because each expression class may be handled separately in the data-flow analysis, all transformations can be carried out in parallel for each expression. Thus according to Section 3, LCM-TRAFO is weakly stratified-convergent, and the transformation yields one unique result.

[Figure 22 ILLUSTRATION OMITTED]

Table III summarizes the features of our graph rewrite system examples.

Table III. The Features of the Graph Rewrite System Examples.

example termination normal forms #strata COLLECT-EXPRESSIONS edge-accumulative convergent 1 COMPUTE-EQUIVALENCE edge-accumulative convergent 1 LCM-ANALYSIS edge-accumulative stratified- 3 convergent LCM-TRAFO edge-subtractive weakly 1 stratified- convergent

4.5 Usefulness and Limitations

Exhaustive graph rewrite systems can be used in the following areas:

(1) Preparation work for analysis and transformation: To be able to apply dataflow equations or transformation rules, a lot of preparatory work has to be done. Many of the algorithms collecting information, querying the intermediate representation, and solving reachability problems can be specified with our method.

(2) Global data-flow analysis: With EARS the classical distributive problems and control flow analysis, such as dominator analysis or dominance frontiers, can be specified. If an analysis is not distributive (e.g., alias analysis), stratification helps to structure a specification.

(3) Mark-transform transformations: As explained in Section 2.4, XGRS allow only a limited number of redexes in the graph. Hence those transformations can be specified which mark redexes in the intermediate representation (by analysis) and then perform the transformation once. Examples are code motion, deadcode elimination, or strength reduction.

On the other hand our method has several limits:

(1) Currently there is no methodology on how to specify general abstract interpretations with graph rewrite systems.

(2) In interprocedural analysis, instead of chaotic iteration special evaluation strategies must be used [Shark and Pnueli 1981; Knoop and Steffen 1992; Reps et al. 1995]. Currently these have to be modeled in the rewrite specifications explicitly, which should be transparent.

(3) Several optimizations can be specified with RGRS which are not exhaustive. Especially interesting here are syntactic transformations that do not need global analysis, such as peephole optimization, constant propagation with partial evaluation [Binkley 1994], or idiom recognition [Pinter and Pinter 1994]. These systems need specific termination proofs, and of course more general termination criteria should be investigated.

(4) Weakly stratifiable transformations that create overlaps do not yield unique normal forms.

(5) As general rule embedding is not allowed, a rule only matches a fixed number of nodes. Thus those transformations, which refer to an arbitrary set of nodes, cannot be specified.

5. METACODE GENERATION BY ORDER EVALUATION

In this section a generic algorithm to execute an XGRS, the order evaluation, is presented. The idea of the algorithm is similar to a nested-loop join in relational algebra: edge relations are joined to find all the redexes in a graph. However, in contrast to databases, order evaluation assumes a node-based, directed representation of the graph. This is required for an optimizer because it has to share data with front-ends and back-ends. In addition, the cost of the algorithm does not refer to the amount of edges in a relation but only to the maximal out-degree of a node under a certain relation. Thus it works best on sparse and width-limited graphs which are often encountered in program analysis. Section 5.4 gives a short overview of other evaluation methods.

5.1 Overview

Order evaluation is based on the notion that all redexes of a rule will be found, if all host graph nodes are considered which can be homomorphic to root nodes of left-hand sides. All other redexes are found by traversing neighbor sets, starting from the redex root nodes. To accomplish this, the order of a rule set has to be calculated, which is the maximal number of root nodes in left-hand sides. If a lefthand side does not have a root, an arbitrary node is chosen as an artificial root, normalizing the order to at least 1.

Definition (ORDER OF A RULE SET). Let S be a set of graph rewrite rules, r = (L, R) [element of] S. Define the signature sig(L) to the multiset of the root node labels in a left-hand side L. Define the order k(S) to be the maximum of the cardinalities of sig over all left-hand sides, normalized to 1:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Let G = (S, Z) [element of] STRGRS, and let [[S.sub.1],..., [S.sub.n]] be a stratification of S. Then the order of G is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

A rule equivalence class is a set of rules [r.sub.i] = {[r.sub.j] [element of] S|sig([L.sub.i]) [subset or equal to] sig([L.sub.j]) [disjunction] sig([L.sub.i]) [contains] sig([L.sub.j])}. The set of all equivalence classes is called [disjunction](S). sig is extended to rule equivalence classes [Nu] [element of] [disjunction] as follows: sig([Nu]) = sig([L.sup.']) : ([L.sup.'],[R.sup.']) [element of] [disjunction] [inverted]A [element of][disjunction]: sig(L) [subset or equal to] sig([L.sup.']). The maximal number of elements in a signature of an equivalence class [Nu] is called k([Nu]) =|sig([Nu])|. It holds [inverted]A [element of] V(S) : k([Nu]) [is less than or equal to] k(S).

k(S) or k(G) is abbreviated to k, if clear in the context. To find a redex in the host graph, for a left-hand side L an edge-disjoint path cover PC(L) has to be computed:

Definition (EDGE-DISJOINT PATH COVER). Let L E Z-Graph. An edge-disjoint path cover PC(L) is a tuple (PS, I, B) with the following components. PS is a set of paths in L, for which the following conditions hold:

--The paths from PS cover [L.sup.pos], i.e.,[inverted]Ae [EL.sup.pos][element of]PS: e [element of] [E.sub.p]

--The paths from PS intersect with each other only at beginning and end points: [inverted]A p1, p2 [element of] PS: [inverted]An [element of] [N.sub.p1][union] [N.sub.p2]: ([in-degree.sub.p2] (n) = 0 [disjunction] [out-degree.sub.p2 (n)] = 0). [[conjunction] (in-degree.sub.p2](n) = 0 [disjunction] [out-degree.sub.p2](n) = 0 ).

I = {n|[in-degree.sub.L](n)[is greater than]1} is the set of inner intersection points of PS. B = {n|[out-degree.sub.L](n) [is greater than] 1, [in-degree.sub.L](n) [is greater than] 0} is the set of inner branch points of PS.

A path p [element of] PS which starts at an inner branch point is called inner path. Also, the length of the longest path is called l; the number of paths is called p = |PS|. For referencing, the paths carry an arbitrary number (PS = {[P.sub.1],...,[P.sub.p]}).

Figure 23 contains an edge-disjoint path cover of LCM-TRAFO-2, containing two paths. Because the intersection node Block is a root of LCM-TRAFO-2, I = {ExprClass}.

[Figure 23 ILLUSTRATION OMITTED]

These definitions are used in the generic algorithm XGRS-ORDER in Figures 24 and 25. The algorithm is always generated for a set of rules S and relies on several generic parameters, which are constant for a concrete rule set. Hence some loops are marked by forall-const and can be unrolled in a concrete algorithm instance. Also, statements marked with let-const or if-const can be evaluated at generation time. In the generic algorithms, the neighbor set of a node x under relation l is denoted x.l.

Fig. 24. Generic algorithm XGRS-ORDER,

Input: Generic: rule set S with order k, rule equivalence classes V(S), path covers PC(L) = (PS, I, B) for all (L, R) [element of] S. Nongeneric: axiom Z = (N, E, nlab).

Output: Nongeneric: added relations.

Global: change: boolean;

/* Mark nodes as nondeleted */ forall-const [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] forall y [element of] [N.sub.i] do y.deleted := FALSE; /* Fixpoint loop: enumeration of redexes */ change := TRUE; 11: while change = TRUE do change := FALSE; 12: forall-const [Nu] [element of] [disjunction] (S) do let {l1,...,[l.sub.k([Nu]])} = sig([Nu]) in 13: forall ([x.sub.1],... ,[xk([Nu])]) [element of] (N.sub.l1]x...x[[N.sub.l.sub.k([Nu]]]) do /* Tests of rules with overlapping signature */ 14: forall-const r = (L, R) [element of] [Nu] do let-const [z.sub.1],...,[z.sub.k(L)]= project(sig(L), [x.sub.1],...,[x.sub.k([Nu]])) in /* Test rule r with roots [z1,...,[z.sub.k(L)]*/ RULETEST(r, [z1],...[z.sub.k(L)]); end /* Cleanup relations of graph (due to rule action (a2))*/ forall-const ([n.sub.1], [n.sub.2],l) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] m := nlabL(n.sub.1); forall [y.sub.1] [element of] [N.sub.m] do forall [y.sub.2] [element of] [y.sub.1].l do if [y.sub.2].deleted then [y.sub.1].l -= [y.sub.2];

Fig. 25. Generic procedure RULETEST in XGRS-ORDER.

/* Rule test for all redexes that can be reached from root nodes */ procedure RULETEST(r = (L, Fi), [x.sub.1],..., [X.sub.k(L)]) return BOOLEAN begin /* Cross product of all host graph paths induced by [P.sub.1],..., [P.sub.p] [element of] PS(PC(L)).*/ /* Traversal of the neighbor sets, starting from root nodes [x.sub.1],..., [X.sub.k(L)] */ 15: forall (([y.sub.11],...,[y.sub.1l]),...,([y.sub.p1],..., [y.sub.pl])) [element of] (([P.sub.1] x ... x [P.sub.p]) ([x.sub.1],...,[x.sub.k(L)])) do /* Test equality join conditions */ 16: forall-const n [element of] I(PC(L)), n [element of] [N.sub.pj], n [element of] [N.sub.pj] do a := index(n, [P.sub.i]); b := index(n, [P.sub.j]); if [Y.sub.ia] [is not equal to][y.sub.jb] then continue 15; end /*--A redex candidate found. Test rule application conditions. --*/ /* (t1) Test matching: it is only necessary to test whether node has been deleted already */ forall-const ([n.sub.i], [n.sub.2],l) [element of] [E.sub.pi] do a :=- index([n.sub.i], [P.sub.i]); b := index([n.sub.2], [P.sub.i]); if [y.sub.ia].deleted or [y.sub.ib].deleted then continue 15; end /* (t2) Test all negated rule edges */ forall-const e = ([n1],[n2],l) [element of] [E.sub.L], negated(e) -- TRUE, [n.sub.i] [element of] [N.sub.pi], [n.sub.2] [element of] N[N.sub.pj] do a := index([n.sub.i], [P.sub.i]); b := index([n.sub.2], [P.sub.j]); if [y.sub.jb] [element of] [y.sub.ia].l then continue 15; end /* (t3) Test whether all added edges are already there */ if [E.sup.add] [is not equal to] 0 begin AllEdgesThere := TRUE; forall-const ([n.sub.i],[n.sub.2], l) [element of] [E.sup.add](r), nlab([n.sub.1]), nlab([n.sub.2]) [element of] [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] [n.sub.i] [element of] [N.sub.pi],[n.sub.2] [element of] [N.sub.pj] do a := index([n.sub.i], [P.sub.i]); b := index ([n.sub.2], [P.sub.i]); if [y.sub.jb] [is not an element] [y.sub.ia].l then AllEdgesThere := FALSE; end if AllEdgesThere = TRUE then continue 15; /* Terminating relation is complete */ end /*-- All tests passed: redex found. */ /* (a1) Delete redex edges.*/ forall-const ([n.sub.1], [n.sub.2], l) [element of] [E.sup.del](r), [n.sub.1],[n.sub.2] [element of] [N.sub.pi] do a := index(n.sub.1, [P.sub.i]); b := index([n.sub.2], [P.sub.i]); [y.sub.ia].l -= [y.sub.ib]; end /* (a3) Delete nodes by invalidation.*/ forall-const [n.sub.i] [element of] [N.sup.del](r) do a := index(n.sub.i, P.sub.i); [y.sub.ia].deleted := TRUE; end /* (a4) Add nodes and memorize them in mapping newnodes. */ newnodes := 0; forall-const [n.sub.1] [element of] [N.sup.add](r) do l := n[lab.sub.R](n.sub.1); z := new(1); newnodes(n.sub.1) := z; [N.sub.1]+=z; end /* (a5) Add all edges of the rule */ forall-const ([n.sub.1], [n.sub.2], l) [element of] [E.sup.add](r) do if-const [n.sub.1] [element of] [N.sup.add](r) then [y.sub.1] := newnodes([n.sub.1]); else [n.sub.1] [element of] [N.sub.pi];a := index ([n.sub.1],[P.sub.i]); [y.sub.1] := [y.sub.ia]; end if-const [n.sub.2][element of] [N.sup.aad](r) then [y.sub.2] :=newnodes([n.sub.2]); else [n.sub.2] [element of] [N.sub.pj];b := index ([n.sub.2],[P.sub.j]); [y.sub.2] :=[Y.sub.jb]; end [y.sub.1].l += [y.sub.2]; end change := TRUE; end end

XGRS-ORDER consists of the following parts. First, all deletable host graph nodes are marked as valid. Then all permutations of possible redex root nodes are enumerated. All rules of a rule equivalence class [Nu] [element of] V(S) are handled together (loop 13 and 14). For one single permutation of root nodes and for one rule r = (L, R), algorithm RULETEST reduces all redexes which are in the host graph currently. RULETEST is called with a set of rule-specific redex root nodes ([z.sub.1],..., [z.sub.k](L)) which are projected by a function project from the equivalence-class-specific set ([x.sub.1],...,[x.sub.k(v)]). The projection consists of a simple algorithm which associates with each label I [element of] sig(L) an l-node from {[x.sub.1],...,[x.sub.k(v)]) such that the signature of L is covered. Since this depends on rule features only, the projection function as well as loop 14 can be fully expanded by the generator. Lastly, the rule tests have to be embedded into a fixpoint loop (loop 11). After the normal form is reached, the host graph relations must be cleaned up, i.e., edges to deleted nodes have to be deleted.

In RULETEST, for one single permutation of root nodes and for one rule, all reachable redexes are found, traversing neighbor sets of nodes with nested loops. In essence, RULETEST performs a nested-loop join on the host graph, induced by the edge-disjoint path cover PC(L) = (PS, I, B) of the rule's left-hand side L. The paths in PS indicate which host graph paths have to be enumerated for the nested-loop join, i.e., each path p in PS denotes a path expression of a single-source path expression problem on the host graph [Tarjan 1981]. It can easily be seen that the set of all paths of a single-source path expression problem can be calculated by a nesting of loops. This nesting has to enumerate all nodes which are reachable under the set of relations contained in the path expression. Figure 26 shows a loop nesting induced by the path 1 of the edge-disjoint path cover for LCM-TRAFO-2. When applied to an appropriate host graph node, in the innermost loop all relevant host graph nodes are enumerated which are reachable under the path expression problem induced by the path.

Fig. 26. Above: Code skeletons for the loop nestings that solve the two path expression problems induced by LCM-TRAFO-2. Middle: Loops for path expression problem 2 nested into loops for problem 1. Down: Alternative nesting yielding the same result..

/* Loop nesting to enumerate path 1 */ forall [exprclass.sub.1] [element of] block.REPLACE-IN do forall register [element of] [exprclass.sub.1].InRegister do /* .. */ /* Loop nesting to enumerate path 2 */ forall strut [element of] block.Stmts do forall expr [element of] stmt.Exprs do forall [exprclass.sub.2] [element of] expr.Computes do /* .. */ /* -- Loop nesting 2 nested in loop nesting 1 -- */ /* Loop nesting to enumerate path 1 */ forall [exprclass.sub.1] [element of] block. REPLACE_IN do forall register [element of] [exprclass.sub.1].InRegister do /* Loop nesting to enumerate path 2 */ forall stmt [element of] block. Struts do forall expr [element of] stmt.Exprs do forall [exprclass.sub.2] [element of] expr.Computes do /* -- Test equality join condition -- */ if [exprclass.sub.1] [is not equal to] [exprclass.sub.2] then continue; /* -- Loop nesting 1 nested in loop nesting 2 -- */ /* Loop nesting to enumerate path 2 */ forall stmt [element of] block.Struts do forall expr [element of] stmt.Exprs do forall [exprclass.sub.2] [element of] expr.Computes do /* Loop nesting to enumerate path 1 */ forall [exprclass.sub.1] [element of] block. REPLACE_IN do forall register [element of] [exprclass.sub.1]. InRegister do /* -- Test equality join condition -- */ if [exprclass.sub.1] [is not equal to] [exprclass.sub.2] then continue ;

In order to enumerate all redex nodes for a rule r = (L, R) correctly, the path problems have to be nested again. This is expressed in loop 15 of RULETEST as a cross-product of path expression problems on k(L) root nodes:

forall (([y.sub.11], ..., [y.sub.11]), ..., ([y.sub.p1], ..., [y.sub.p1)) [element of] (([P.sub.1] x ... x [P.sub.p])([X.sub.1], ..., [X.sub.k(L)])) do

The algorithm works, no matter how the path expression problems are nested; the only condition is that path expression problems of inner paths have to be nested inside loops that correspond to their beginning nodes. Since a beginning point of an inner path is also a branch point of L, each node instance of the branch point is a starting point for the path expression problem of the inner path.

Then the inner intersection points from I indicate which enumerated host graph nodes in the host graph paths have to be tested on equality. This ensures that host graph nodes which instantiate inner intersection points are enumerated by all path expressions problems. In summary, RULETEST computes the cross product over all host graph nodes in all relevant host graph paths (loop 15), and tests equality on certain enumerated nodes, as indicated by I (loop 16).

For instance, the paths from Figure 23 induce a nested-loop join in the host graph with the join condition that host graph nodes which are enumerated for rule node ExprClass are identical. To this end, all nodes have to be cross-enumerated that belong to all host graph paths induced by the two paths of the path cover (Figure 26). Equality need not be tested for host graph nodes which correspond to redex root nodes, because they automatically belong to all host graph paths.

As soon as a valid set of host graph nodes is under consideration in RULETEST, for each permutation of nodes the rule application conditions (t1)-(t3) are tested. The occurrence g is modeled with the path index function and the path indices of the path cover. As host graph paths are enumerated according to the path cover, redex edges are tested by the enumeration.

If a redex has been found, the rule actions (a1)-(a5), except (a2), are executed. New nodes are memorized in a mapping newnodes (a4) so that new edges to new nodes can be entered (a5). This mapping models the extended occurrence g' and can be expanded to symbolic node names in a concrete algorithm. Deletion of hidden edges needs special handling: if rule nodes have to be deleted, the corresponding redex nodes are invalidated (a3), so that they are not considered anymore (t1).

Suppose that LCM-TRAFO-2 was the only rule of LCM-TRAFO. Then from XGRS-ORDER and the edge-disjoint path cover from Figure 23 the code in Figure 27 can be generated. In this simple case there is one edge-subtractive stratum with order 1 and one rule equivalence class. According to the following section, a fixpoint evaluation loop need not be generated. Loop 110 coalesces loops (12-14) from XGRS-ORDER. The outer loops and statements 111 and 112 enumerate the host graph path of path 1, the inner loops and statements 113-115 that of path 2. The host graph paths are joined under the equality join condition [exprclass.sub.1] = [exprclass.sub.2] in order to find the intersection of their enumerated objects. When LCM-TRAFO-1 is added to the system, a fixpoint loop has to be added, since its RDG is cyclic. Its rule test procedure can be called in a second loop over the blocks, generated after loop 110 in Figure 27.

Fig. 27. Order evaluation for LCM-TRAFO-2.

Input: Node domains Block, Strut, Expr, ExprClass, Register. Relations Stmts, Exprs, REPLACE_IN, Computes, InRegister. Output: Modified intermediate code: expressions replaced by UseReg expressions. /* Marking expressions as nondeleted */ forall expr [element of] Expr do expr.deleted := FALSE; /* Fixpoint loop 11 is not necessary, since only one acyclic stratum */ /* Enumeration of redexes (loop (12-14) are collapsed) */ 110: forall block [element of] Block do RULETEST_LCM_TRA FO-2 (block); /* Clean up relations */ forall stmt [element of] Struts do forall expr [element of] stmt.Exprs do /* (a2) Delete all invalid edges. */ if expr.deleted = TRUE then stmt.Exprs -= expr; procedure RULETEST_LCM_TRAFO-2 (block:Block) /* Enumerate path 1 */ 111: forall [exprclass.sub.1] [element of] block. REPLACE_IN do 112: register := [exprclass.sub.1].InRegister; /* No loop necessary, 1:1-relation */ /* Enumerate path 2 */ 113: forall stmt [element of] block.Stmts do 114: forall expr [element of] stmt.Exprs do 115: [exprclass.sub.2] := expr.Computes; /* 1:1-relation */ /* -- Test equality join condition -- */ if [exprclass.sub.1] [is not equal to] [exprclass.sub.2] then continue ; /* -- Redex candidate found. Investigate. -- */ /* (t1) Test matching: test on deleted nodes */ if expr.deleted = TRUE then continue ; /* (t2) There are no negated rule edges */ /* (t3) There are no edges added to nodes with nonadded labels */ /* -- Redex found. Do the transformation -- */ /* (a1) Delete redex edges. */ block.REPLACE-IN -= [exprclass.sub.1]; expr.Computes := NIL; stmt.Exprs -= expr; /* (a3) Delete nodes */ expr.deleted := TRUE; /* (a4) Add nodes */ usereg := new(UseReg); /* (a5) Add all edges of the rule */ stmt.Exprs += usereg; usereg.UsedReg := register; end return TRUE; end

5.2 The Cost of XGRS-ORDER

The following theorem deals with the cost of an algorithm generated from XGRS-ORDER.

Theorem (EVALUATION OF XGRS WITH XGRS-ORDER). Let G = (S, Z) [element of] XGRS with order k. Let l be the length of the longest path of an edge-disjoint path cover over all left-hand sides, p the maximal number of paths in a path cover, n the maximal number of nodes in a node domain with an arbitrary label, e the maximal out-degree of a node concerning an arbitrary edge label. The addition and deletion of an edge shall be performed in constant time.

If G [element of] XGRS has one stratum, it can be evaluated with an algorithm generated from XGRS-ORDER in O([n.sup.k+2][e.sup.lP]).

If G [element of] XGRS has one noncyclic stratum, it can be evaluated with a algorithm generated from XGRS-ORDER in O([n.sup.k][e.sup.lP]).

If G [element of] STRGRS it can be evaluated with the maximal cost of one of its strata.

PROOF. First assume that 6 has one stratum.

(1) Termination. If a fixpoint loop of XGRS-ORDER does not change the host graph, the algorithm stops.

(1a) G [element of] AGRS. Then each loop adds at least one edge to the termination-subgraph T in action (a5). When this is complete, the relation completeness condition (t3) is always false, and the algorithm terminates.

(1b) 6 [element of] SGRS. Then either action (a1) or (a2) will delete items from T. When it is empty, (t1) will not be true anymore, and the algorithm will terminate.

(2) Correctness. XGRS-ORDER enumerates all permutations of root nodes of possible redexes. For one single permutation of root nodes, RULETEST will find all redexes: starting from the root nodes, it traverses all neighbor sets, using the host graph paths which are induced by the edge-disjoint path cover. Thus one fixpoint loop of XGRS-ORDER will find all present redexes in the host graph. New redexes are handled in the next round. If G is convergent, XGRS-ORDER will compute the unique normal form; otherwise it will select an arbitrary solution, depending how the code of the concrete algorithm is generated.

(3) Cost of one fixpoint iteration.

(3a) Redex root node permutations. The enumeration of all redex root node permutations costs O([n.sup.k]): Loop 13 enumerates the cross product of k node domains, and k is chosen maximally for all rule equivalence classes v [element of] [subset] S. Loop 12 and 14 handle all rules of G exactly once. Both can be unrolled in a concrete algorithm.

(3b) Rule test. The cost for enumerating all redex nodes to a given set of root nodes is the cost of all nested path expression problems. This is the cost of a loop nesting which nests all loops that enumerate host graph paths induced by a path of the path cover (see Figure 26). Each node has at most e neighbors. Each loop nesting that enumerates the paths of a path expression problem has at most nesting depth l, because a path has maximal length l. Since p loops for p path expression problems have to be nested, this costs O([e.sup.lP]). The actions which have to be performed for rule tests of an XGRS all have constant effort with regard to a given set of redex nodes: the actions for the test of a single redex, namely rule test actions (t1)-(t3), the addition of new items, and the deletion of redex items ((a1), (a3)-(a5)), only depend on the form of the rule. Thus all relevant loops can be unrolled by an optimizer generator and yield constant cost in a generated algorithm.

The main difficulty is the deletion of hidden edges linking the redex and the rest graph (a2). This is not a problem if the host graph is implemented bidirectionally. Otherwise, the sources of incoming hidden edges must be looked up in the graph, which would mean additional cost O(n) for each incoming hidden edge. XGRS-ORDER reduces this to a constant factor per redex test: it marks a node invalid which must be deleted (a3) and tests during the rule test whether an invalid node must be neglected (t1). Hence the cost of RULETEST is O([e.sup.lP]).

(3c) Cost of initialization and cleaning up. At be beginning of the algorithm all nodes which may be deleted have to be marked valid (O(n)). At the end of the rewrite process the graph must be cleaned up, i.e., all invalid edges are finally deleted. To this end, all node domains and relations of deletable edges have to be traversed once (O(ne)). Deleted nodes may either be garbage collected or deallocated with a loop over all corresponding node domains (O(n)).

(4) Cost of fixpoint. If G [element of] AGRS with one stratum, each round of a fixpoint rule may add one edge to the termination-subgraph. Thus the fixpoint evaluation enlarges the costs with a factor O([n.sup.2]). If G [element of] SGRS with one stratum, each round of a fixpoint rule must subtract one item from the termination-subgraph. Thus the fixpoint evaluation enlarges the costs with a factor O([n.sup.2]).

For an AGRS which consists of one acyclic stratum, the fixpoint can be reached in one iteration. The RDG of such a system is acyclic and can be ordered depthfirst. When loop 13 and 14 are exchanged, code may be generated for all r [element of] v along the bottom-up ordering of the RDG.(9) This does not change the cost of the algorithm, because loop 14 is expanded in a concrete algorithm. Because the RDG is acyclic and the rules are applied in depth-first order, rules that appear further up in the RDG will never create redexes for rules that appear lower down in the RDG. Thus a new fixpoint iteration will not find new redexes.

For an SGRS with one acyclic stratum, the same argument holds. When its RDG is ordered depth-first and the system is evaluated along this ordering, each round of RULETEST eliminates all redexes of the current rule from the host graph. This is due to the fact that each rewrite step must delete some items from the termination subgraph, and is not allowed to add something to it. This implies that every reducible redex must have a subredex which is already part of the axiom's termination subgraph. Thus new redexes for other rules may be created, but as no new subredexes are created within the termination subgraph, new redexes for the rule itself are not created. Hence one application of RULETEST reduces all possible redexes of a rule, and a new fixpoint iteration will not find new redexes. After all rules have been handled in depth-first order of the RDG, evaluation stops.

If the graph rewrite system is stratifiable, its cost is the maximum of the cost of all strata.

The order of a graph rewrite system relies on the directions of the relations in the data model. Choosing different directions may result in a different order. If all relations are represented with bidirectional graphs, the order is always the maximal number of nonconnected components in a left-hand side, because bidirectional graphs do not have roots.

XGRS-ORDER can be used for graph rewrite systems because invariant embedding is used: items are modified only in the context of the tested and added nodes, and special care is taken of the deletion of hidden edges. Allowing general embedding [Nagl 1979] complicates the rule test and leads to context-sensitive search during rule application. This enlarges the cost for the evaluation of a graph rewrite system significantly.

Other enumeration strategies. In practice it can be important how redexes in the host graphs are enumerated (loop 15 in RULETEST). The following variants can be imagined; their details go beyond the scope of this paper:

--When join conditions axe tested as early as possible during host graph path enumerations, parts of the search space can be cut off safely. As the placement of join condition tests depends on the arrangement of the nesting, a clever arrangement strategy can be used to generate faster code.

--Enumeration of host graph paths may be replicated in order to cut down search space.

--When additional pattern-matching constraints on nodes are allowed, these can be used for cutting down the search space.

5.3 Order Evaluation for Our Examples

For small rewrite systems, XGRS-ORDER often results in linear or quadratic algorithms, since many relations have a fixed cardinality, are sparse, or are partitioned over some object domains. COLLECT-EXPRESSIONS has order 1. It is an EARS with one acyclic stratum. Because it adds only one edge and has one path in its left-hand side, the XGRS-ORDER algorithm collapses to a simple nested-loop over procedures, blocks, and statements (Figure 2). According to the evaluation theorem O(/Proc/[e.sup.3]) = O([Proc//Block//Stmt//Expr/) would result. However, because expressions are partitioned by procedures, blocks, and statements, O(/Expr/) results. For the same reason the cost of CREATE-REGISTERS is O(/Expr/) instead of O(/Proc//Expr/).

COMPUTE-EQUIVALENCE has order 2, because both rules' left-hand sides contain two roots. Although it has a cyclic stratum, a theorem in Assmann [1995] shows that because it works with expression trees the fixpoint computation can be avoided. Because Left and Right are 1:l-relations, 1 = 1 and p = 4 hold. Thus the cost O(/Expr/[sup.2][e.sup.4]) results. However, e is here much smaller than /Expr/ because the relation eq partitions Expr. If it is regarded to be constant, XGRS-ORDER results in a quadratic algorithm in the number of expressions.

LCM-ANALYSIS has order 1, because Block is the only root node domain of all rules. Because it has a cyclic stratum, fixpoint computation must be applied. Because l = 2, p = 2, the cost O(/{Block,ExprClass}/[sup.3][e.sup.4]) results. Additionally, if EARS have a form like LCM-ANALYSIS and relations are represented by bit-vectors, the standard round-robin iteration can be derived from XGRS-ORDER [Assmann 1994; 1995].

LCM-TRAFO is edge-subtractive. Because k = 1, l = 3, and p = 3, it would have cost O(/Block/[sup.1][e.sup.9]). Computes, Stmts-next, and InRegister are 1:1-relations. Since expressions are partitioned over the blocks, the generated algorithm has cost O(/Expr//ExprClass/).

Order evaluation may be used to solve all exhaustive graph rewrite systems. For convergent systems it delivers the unique normal form. For nondeterministic systems it delivers a correct, but arbitrarily selected result. Thus analyses and transformations may not only be specified, but also executed uniformly.

5.4 Evaluation with Other Methods

For a rewrite system with unique normal forms, an evaluation algorithm is allowed to interchange rule applications arbitrarily. This is the case for EARS, strata in stratified XGRS, but also for Datalog [Ceri et al. 1989b]. Hence the evaluation methods for Datalog can be used for stratifiable XGRS.

Several Datalog evaluation algorithms have been developed. The fundamental methods are bottom-up (forward-chaining, exhaustive) and top-down evaluation (backward-chaining). The former computes all solutions at once, while the later works incrementally and computes only one solution. In graph rewriting, the top-down evaluation corresponds to testing whether a single graph edge can be derived. Bottom-up evaluation corresponds to the construction of all derivable subgraphs and edges, i.e., to reduce to a normal form. Because this strategy has been used throughout the paper, order evaluation is a forward-chaining algorithm. However, depending on the context of a specification, each evaluation method has its advantages.

Order evaluation works naively, i.e., reconsiders old redexes in a new round of the fixpoint loop again. In Datalog, seminaive evaluation memorizes which parts of the database are old and considers only new parts for new rounds. Also tabling methods are prominent (OLDT resolution, query-subquery evaluation), because they reuse partial solutions to cut down the search space [Ceri et al. 1989b]. Also rule sets can be transformed with the magic set transformation which has the effect that tuples are preselected from the database before joins are performed. In graph rewriting, this corresponds to narrowing down the node domains of the host graph.

Since the main prerequisites for these techniques are termination and independence of rule applications, all of them can be used for exhaustive stratifiable graph rewrite systems. Order evaluation works best if the host graphs are sparse; otherwise a Datalog evaluation method may be more apt and more efficient. In summary, the specification method is orthogonal to the evaluation algorithm: a user can instruct the generator to alter the algorithm without changing the specification.

6. PRACTICAL EXPERIENCE

The last section is dedicated to practical experiences with the specification method. To show the validity of the approach, the optimizer generator OPTIMIX has been implemented [Assmann 1995; 1996]. OPTIMIX takes graph rewrite system specifications in textual form and generates order evaluation procedures in plain C, which can be embedded in compilers. The tool has been used to generate several components of a Modula-2 lazy code motion optimizer, using variants of the presented specifications. The generated optimizer parts work on the intermediate language CCMIR(10) for C, Fortran-90, Modula-2, and its parallel dialect Modula-P [Vollmer et al. 1993; Vollmer and Hoffart 1992].

The first important point is the size of the specifications. Table IV shows the approximate number of rules for several examples. Typically, a specification of an analysis has to handle a lot of cases and needs more rules than that of a transformation. Also, the considered analyses are complex. For instance, lazy code motion data-flow analysis contains seven equation systems.

Table IV. Number of Rules in Some Example Specifications.

component approx, number of rules expression equivalence classes 30 data-flow analysis reaching definitions 20 data-flow analysis live copies 20 data-flow analysis lazy code motion 40 transformation copy propagation 5 kernel of lazy code motion transformation 5

Also the relation of generated and hand-written parts in the optimizers is interesting. Because OPTIMIX still has a lot of restrictions, many parts of the optimizer are hand-written. However, it is estimated that with an industrial-strength implementation of the generator 70-80% of an optimizer can be generated. Due to the fact that specifications are very compact, the development time of an optimizer should be greatly reduced. Equation systems, such as LCM-ANALYSIS, can be typed in in a few hours. Also the generated optimizer algorithms can be validated quickly, supposed that the generator generates correct code. Thus it is estimated that about 50% savings in development time are possible.

6.1 Optimization Speed

To test the compilation speed of the generated components, the runtime of the Modula-2 compiler is compared with and without optimization. For this test the Stanford benchmark from Henessy and Nye was compiled, resulting in Figure 28. Column (1) depicts the runtime without optimizer. Column (2) adds the time of the analysis except global data-flow analysis and transformation. This adds factor 5 to the runtime of the compiler. Column (3) additionally contains the global data-flow analysis (factor 6). Column (4) contains the time with all generated components. The compiler slows down about factor 9. For the transformation phase there is an alternative implementation by hand, which results in column (5). This phase is not much faster than the generated transformation phase, since most of the time is spent in the generated analysis components. If this is compared to gcc -04 (6) and sun-cc -03 (7), running on an equivalent C program, they are much faster while performing more optimizations. Nevertheless, for an optimizer with generated parts the results are encouraging.

[Figure 28 ILLUSTRATION OMITTED]

6.2 The Speed of Expression Equivalence

Apparently the optimizer spents a lot of time in program analysis. This is due to the expression equivalence algorithm. Most of the procedures in the benchmark translate quickly, however, there are two which contain up to 1400 expressions, and these slow down the expression equivalence algorithm enormously. Compared to that the data-flow analysis is reasonably fast, also for large procedures. This is assured by Figure 29 in which several compilations of procedures with a different number of expressions are shown. Curve gen-1cm depicts the compilation time with all generated components, hand-1cm that with hand-written transformation phase, and no-trafo everything except transformation. If additionally the global data-flow analysis is skipped, curve no-dfa results. In all cases global data-flow analysis is fast.

[Figure 29 ILLUSTRATION OMITTED]

Apparently order evaluation is at least quadratic for COMPUTE-EQUIVALENCE because it has to compare expressions pairwise (due to the two root nodes in COMPUTE-EQUIVALENCE-1/2). It can be shown that an index structure can be used to speed it up [Assmann 1994]. The index can be applied, since OPTIMIX allows to annotate integer attributes to nodes. Because the constant objects of type IntConst and the LoadIntConst-nodes are in 1.1-relation, the integer objects can be modeled as attribute values of the LoadIntConst-nodes. For such attributes, OPTIMIX offers to construct indices, i.e., mappings between constant values and nodes. In our example, this index can be used for a LoadIntConst-node to look up other LoadIntConst-nodes with the same integer attribute. Hence the index simulates virtual edges between the two root nodes of COMPUTE-EQUIVALENCE-1/2 and reduces COMPUTE-EQUIVALENCE to order 1. The index-based algorithm is compared to order evaluation in Figure 30 (curve index-expr vs. curve order-expr). It turns out to be faster for medium-sized procedures. The algorithm uses a hash table as the index data structure. Its hash function plays an important role: using a inappropriate function may slow the whole compiler down about factor 2. Thus it is important to apply index structures as well as to tune their performance.

[Figure 30 ILLUSTRATION OMITTED]

Apart from that there are other metaoptimizations for COMPUTE-EQUIVALENCE. Currently, the generator uses a fixpoint evaluation algorithm because the system is recursive. Instead it could exploit the fact that the equivalence for expressions can be computed bottom-up in one pass because the relations Left and Right are tree-shaped [Assmann 1995]. Additionally, the semantic knowledge can be used that the constructed relation eq is an equivalence relation. With this knowledge the standard value-numbering algorithm can be generated (O(/Expr//ExpzClass/)) which walks over all expressions linearly and hashes them to their equivalence class [Chow et al. 1996; Assmann 1994; 1995]. Since normally the number of expression classes is small compared to the number of expressions, an almost linear behavior can be achieved.

6.3 The Speed of the Transformation

In our example, the run-time difference of the hand-written and the generated transformation is marginal. Actually, for this example it is very important how loop 15 in algorithm XGRS-ORDER is arranged, i.e., in which order the paths of a path cover are joined with each other. Compared to our other examples, the rule left-hand sides of the transformation are rather complex. In such cases, users can instruct OPTIMIX--with a simple annotation to the graph rewrite rules--to join several paths before others, so that the search space is minimized. The description of these annotations lies beyond the scope of this paper, and they can be looked up in the manual of OPTIMIX [Assmann 1996]. As an alternative, a cost-based query optimizer could be built which infers the optimal arrangement of the loop nestings. This technique is the standard technique in databases to speed up nested-loop joins.

These experiments indicate that other metaoptimization techniques from Datalog and databases should also be applied to OPTIMIX. Indices are important to speed up query evaluation. Advanced query optimizers can improve the arrangement of path enumerations in the RULETEST algorithm, to speed up the nested-loop join. Finally, a specific structure of the database can be exploited to apply faster evaluation strategies. To conclude, an industrial-strength tool which uses semantic knowledge, alterative evaluation strategies, and clever metaoptimizations, could very well generate optimizer parts which reach the velocity of hand-written ones.

7. RELATED WORK

Termination of graph rewrite systems was first tackled in [Plump 1995]. However, the method of forward closures relies on the features of a derivation, and cannot be proved regarding the rules alone. [Plump 1993b; 1993a] treat confluence of graph rewrite systems, but again based on criteria of the derivations. Also, confluence alone is not sufficient, since may optimization problems are inherently nonconfluent. Stratification goes beyond that: it can handle many overlaps automatically and yields normal forms which are rather natural.

Stratification was invented for Datalog?? to handle negation. In the meantime a lot of other methods have been developed [Ullman 1994]. It is an interesting question which of them can be carried over to graph rewrite systems. The idea that Datalog can been used to describe data-flow analysis has been discovered independently by Reps [1994] and Assmann [1994]. Also top-down evaluation of horn-clause logic can be used to specify program analysis [Dawson et al. 1996]. However, because our work is constrained to binary predicates, a special subclass of graph rewrite systems, EARS, result. These can be extended very easily to specify general transformations.

The algorithmic approach to graph rewrite systems is the most similar to ours. The language PROGRES is its most advanced representative [Schurr et al. 1995; Zurndorf 1995]. PROGRES provides an excellent user interface and interactive programming environment but is not designed for batch processing. Currently it does not allow for fixpoint computations; it is tied to an underlying database, and does not provide mechanisms for rule overlapped execution, which is indispensable for program optimization. UBS systems provide a subclass of algorithmic graph rewrite systems which can be handled more efficiently [Dorr 1995]. The major difference is that UBS systems memorize partially matched redexes during rewriting, whereas our work avoids memoization and tries to improve on the search strategy. To our knowledge the technique has not yet been applied to program optimizations, and future work should analyze in which cases which strategy leads to faster algorithms.

Attribute grammar systems such as FNC-2 [Jourdan and Parigot 1991], the Synthesizer Generator [Reps and Teitelbaum 1988], or AG [Grosch 1992] can help the user to specify program analysis, especially when they allow fixpoint evaluations on cyclic attribute dependencies [Jones 1989]. Higher-order attribute grammars also allow program transformations [Vogt et al. 1989], but produce a new tree while traversing an old one. Furthermore, attribute grammars are a tree-based paradigm, so that the handling of arbitrary graphs is not possible.

Several other systems have used relational specifications to describe semantical analysis. Horwitz and Teitelbaum [1986] describe a system that combines relations with attribute grammars. The system's specification language is based on the relational calculus and is able to express simple data-flow analysis. PSG [Bahlke and Snelting 1986] uses so-called context-relations to represent semantic conditions during incremental semantical analysis. When these relations are joined, better typing constraints for variables result, and users can be informed of programming errors early in the editing process. Several other works have used relations together with logic deduction to generate evaluators for noninteractive semantical analysis [Odersky 1993]. It seems that these systems have not yet been applied to program analysis.

Several other tools are known which can generate program analyses. Sharlit [Tjiang and Henessy 1992] and PAG [Alt and Martin 1995] generate global dataflow analyses from lattice-based specifications. With Sharlit users have to supply C code for the lattice elements and flow functions, so that specifications are not fully abstract. Both tools do not aid in the preparation of the global analysis. SPARE [Venkatesh 1989] follows the same approach, but is tied to the Synthesizer Generator. Steffen shows how data-flow analysis algorithms can be generated from modal logic specifications [Steffen 1991]. Although the powerful modal operators allow very short specifications, an application in a real-life compiler is not yet known. All these tools allow for the specification of more complex lattices and flow functions, but our method is more intuitive for the average programmer, as it relies on the familiar concept of graphs.

Only few approaches are known which integrate analyses and transformations. SPECIFY [Kock 1992] additionally provides a proof language but has never been implemented completely. GENESIS [Whitfield and Soffa 1991] provided one of the starting points of this work. It allows to specify preconditions in a variant of first-order logic and supports transformation specifications. However, because fixpoint computations cannot be specified, generation of data-flow analysis is not possible. Also the intermediate language is fixed, and the code generation scheme is ad hoc. Because our method is based on graph rewrite systems and Datalog, it provides a more solid basis for code generation schemes.

Although term rewriting can be used for program optimization purposes, there are only few works on it. Field applies it to program slicing [Field et al. 1995], an analysis technique which filters out code areas on which a particular instruction depends. However, general criteria for termination or normal forms are not given. Also implementations of compiler generators for algebraic specifications have been based on term rewriting [Bergstra et al. 1989]. In this work term rewriting is applied to several phases of compilations, but to our knowledge not yet to optimization.

Term graph rewriting is a technique to implement term rewriting by graph rewriting on shared subterms [Sleep et al. 1993; Kennaway et al. 1994]. A term graph is a graph with a single root which models a term by folding common subtrees. Term graph rewriting is mainly used for the implementation of functional programming languages, and elaborate tools exist [Van Eekelen and Plasmeijer 1995]. A good theoretical work, which is somewhat orthogonal to ours, is [Ariola and Arvind 1993]. It defines a criterion for strong confluence, and investigates semantic features of a graph rewrite system in order to prove the equivalence of two graphs. However, it seems that term graph rewriting has not yet been applied to program optimization.

8. CONCLUSION

The main contributions of the research described in this paper are the following. First several new rule-based termination criteria for graph rewrite systems have been developed which define exhaustive graph rewrite systems. To handle terminating, but nondeterministic systems, stratification for graph rewrite systems has been defined. Using examples from lazy code motion it has been demonstrated that stratified graph rewrite systems are apt to specify important parts of classical program optimizations. Next, a generic evaluation algorithm for all exhaustive graph rewrite systems has been presented, which often leads to linear and quadratic concrete algorithms. It has been shown that our method works in practice by giving some numbers of a concrete implementation of lazy code motion. Thus this paper proposes a new method for the specification and evaluation of program analyses and transformations.

Using graph rewrite systems, a uniform view results on local analysis, global analysis, and transformation. This facilitates the modular composition of optimizers, allowing coalescing and separation at the specification level. Moreover, the stratification heuristic helps to structure complex specifications automatically. Although the methodology does not cover all analyses and transformations, it can deal with a broad range of problems. Since it is independent of the source and intermediate languages, and leads to efficiently executing optimizers, it should greatly facilitate the development of optimizers.

This work associates program optimization, Datalog, and graph rewrite systems. Program analysis can be seen as a process to query the intermediate representation, or as a process to infer explicit knowledge implicit in the intermediate representation. On the other hand, exhaustive graph rewrite systems extend Datalog in a straightforward way to handle transformations. Moreover, the fact that they can be evaluated by Datalog algorithms hints to a new approach for efficient execution of general graph rewrite systems.

Future work will consider nondeterministic graph rewrite systems. It is interesting to investigate how a rule-based cost function can be used to select better or best derivations. As in the case of weighted term rewrite systems for code-generator-generators [Emmelmann 1992] such a method would lead to generated optimizers which automatically improve the quality of their manipulated code. Next, it is likely that new static termination criteria can be found. Finally, stratified rewrite systems provide a powerful rule-based programming paradigm in which the control flow is automatically computed as a result of the stratification. In contrast to logic programming, stratified graph rewrite systems allow general data transformations. Hence they could be used to extend standard programming languages.

ACKNOWLEDGMENTS

I would like to thank my colleagues from the COMPARE group at the University of Karlsruhe: Claus-Thomas Buhl, Helmut Emmelmann, Thomas Muller, Jurgen Vollmer, Andreas Ludwig, and Prof. Gerhard Goos. The group OSCAR at INRIA Rocquencourt, headed by Martin Jourdan, provided an excellent environment to write major parts of this paper. Berthold Hoffmann (University of Bremen) read the paper carefully, and proposed the term exhaustive graph rewrite systems to replace an earlier name. The anonymous reviewers made a lot of important suggestions. Carol-Ina Trudel and Laura Olson helped me improve my English. OPTIMIX is publicly available for non-commercial use [Assmann 1996]. Besides with CoSy, it can collaborate with the Cocktail compiler toolbox or the standard Java environment.

(1) OPTIMIX assumes that all relations, also AllExprs, have been initialized correctly. To this end, another graph rewrite system can be applied before COLLECT-EXPRESSIONS.

(2) For simplicity, edge labels are not overloaded, i.e., if they relate different node types, the labels have to have a distinct name.

(3) To simplify, often a rule set S is also called graph rewrite system although features such as termination or confluency always depend on the axiom.

(4) This step can be left out. Then an extra transformation rule for roots of AC is required later on.

(5) We refrain from calling it a normal form.

(6) Also the cost of the algorithm in Section 5 does not change, because node label matching in a single inheritance hierarchy can be performed in constant time [Blostein et al. 1994a].

(7)According to the data model nodes with label IntConst do not belong to the expressions.

(8) As isolation analysis is rather difficult to understand, it is recommended that this section be read together with Knoop [1994].

(9) Loop 13 and 14 are not exchanged in other cases because then several traversals in loop 13 are economized.

(10) Common COMPARE Medium Intermediate Representation

REFERENCES

ALT, M., ASSMANN, U., AND VAN SOMEREN, H. 1994. Cosy Compiler Phase Embedding with the CoSy Compiler Model. In Compiler Construction (CC), P. A. Fritzson, Ed. Lecture Notes in Computer Science, vol. 786. Springer Verlag, Heidelberg, 278-293.

ALT, M. AND MARTIN, F. 1995. Generation of efficient interprocedural analyzers with PAG. In Static Analysis Symposium, A. Mycroft, Ed. Number 983 in Lecture Notes in Computer Science. Springer Verlag, Heidelberg.

ARIOLA, Z. AND ARVIND. 1993. Graph rewriting systems for efficient compilation. In Term Graph Rewriting--Theory and Practice, M. R. Sleep, M. J. Plasmeijer, and M. C. van Eekelen, Eds. John Wiley and Sons Ltd, New York, Chapter 6, 77-90.

ASSMANN, U. 1994. On Edge Addition Rewrite Systems and Their Relevance to Program Analysis. In 5th Int. Workshop on Graph Grammars and Their Application To Computer Science, Williamsburg, J. Cuny, H. Ehrig, G. Engels, and G. Rozenberg, Eds. Lecture Notes in Computer Science, vol. 1073. Springer Verlag, Heidelberg, 321-335.

ASSMANN, U. 1995. Generierung von Programmoptimierungen mit Graphersetzungssystemen. Oldenbourg, Munchen. GMD-Bericht 262.

ASSMANN, U. 1996. OPTIMIX Language Manual (for OPTIMIX 2.0). Tech. Rep. RT-0195, INRIA Rocquencourt. Available at http://www-rocq.inria.fr/doc/ or http://i44www.info.unikarlsruhe.de/~assmann[/optimix.html].

BAHLKE, R. AND SNELTING, G. 1986. The PSG System: From Formal Language Definitions to Interactive Programming Environments. ACM Transactions on Programming Languages and Systems 8, 4 (October), 547-576.

BERGSTRA, J., HEERING, J., AND KLINT, P. 1989. Algebraic Specification. ACM Press/Addison-Wesley, New York/Reading, MA.

BINKLEY, D. 1994. Interprocedural Constant Propagation using Dependence Graphs and a Data-Flow Model. In Compiler Construction (CC), P. Fritzson, Ed. Lecture Notes in Computer Science, vol. 786. Springer Verlag, Heidelberg, 473-388.

BLOSTEIN, D., FAHMY, H., AND GRBAVEC, A. 1994a. Practical Use of Graph Rewriting. Technical Report Queens University, Kingston, Ontario.

BLOSTEIN, D., FAHMY, H., AND GRBAVEC, A. 1994b. Practical Use of Graph Rewriting. In 5th Workshop on Graph Grammars and Their Application To Computer Science, J. Cuny, Ed. Lecture Notes in Computer Science, vol. 1073. Springer Verlag, Heidelberg, 38-55.

CERI, S., GOTTLOB, G., AND TANCA, L. 1989a. Logic Programming and Databases. Springer Verlag, Heidelberg.

CERI, S., GOTTLOB, G., AND TANCA, L. 1989b. What You Always Wanted to Know About Datalog (And Never Dared to Ask). IEEE Transactions on Knowledge And Data Engineering 1, 1 (Mar.), 146-166.

CHOW, F., CHAN, S., LIU, S.-M., LO, R., AND STREICH, M. 1996. Effective Representation of Aliases and Indirect Memory Operations in SSA Form. In Compiler Construction (CC), T. Gyimothy, Ed. Lecture Notes in Computer Science, vol. 1060. Springer Verlag, Heidelberg, 253-267.

DAWSON, S., RAMAKRISHNAN, C. R., AND WARREN, D. S. 1996. Practical program analysis using general purpose logic programming systems -- A case study. ACM SIGPLAN Notices 31, 5 (May), 117-126.

DERSHOWITZ, N. AND JOUANNAUD, J.-P. 1990. Rewrite systems. In Handbook of Theoretical Computer Science, J. van Leeuwen, Ed. Elsevier Science Publishers, Amsterdam, 243-320.

DORR, H. 1995. Efficient Graph Rewriting and Its Implementation. Lecture Notes in Computer Science, vol. 922. Springer Verlag, Heidelberg.

EHRIG, H., KORFF, M., AND LOWE, M. 1990. Tutorial introduction to the algebraic approach of graph grammars based on double and single pushouts. In 4th International Workshop On Graph Grammars and Their Application to Computer Science, H. Ehrig, H.-J. Kreowski, and G. Rozenberg, Eds. Lecture Notes in Computer Science, vol. 532. Springer Verlag, Heidelberg, 24-37.

EMMELMANN, H. 1992. Code selection by regularly controled term rewriting. In Code Generation -- Concepts, Tools, Techniques, R. Giegerich and S. Graham, Eds. Workshops in Computing. Springer Verlag, Heidelberg.

FIELD, J., RAMALINGAM, G., AND TIP, F. 1995. Parametric Program Slicing. In Conference Record of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL'95). Vol. 22. ACM, San Francisco, California, 379-392.

GROSCH, J. 1992. AG -- An Attribute Evaluator Generator. Tech. rep., Gesellschaft fuer Mathematik und Datenverarbeitung, Forschungstelle Karlsruhe. Aug. Language Manual.

HAREL, D. 1988. On visual formalisms. Communications of the ACM 31, 5 (May), 514-530.

JONES, L. G. 1989. Efficient evaluation of circular attribute grammars. ACM Transactions on Programming Languages and Systems 12, 3 (July), 429-462.

JOURDAN, M. AND PARIGOT, D. 1991. The FNC-2 System User's Guide and Reference Manual. INRIA Rocquencourt.

KENNAWAY, J. R., KLOP, J. W., SLEEP, M. R., AND DE VRIES, F. J. 1994. On the adequacy of graph rewriting for simulating term rewriting. ACM Transactions on Programming Languages and Systems 16, 3, 493-523.

KNOOP, J., RUTHING, O., AND STEFFEN, B. 1994. Optimal code motion: Theory and practice. Transactions on Programming Languages and Systems 16, 4 (July), 1117-1155.

KNOOP, J. AND STEFFEN, B. 1992. The interprocedural coincidence theorem. In Compiler Construction (CC), U. Kastens and P. Pfahler, Eds. Lecture Notes in Computer Science, vol. 641. Springer Verlag, Heidelberg, 125-140.

KOCK, G. 1992. Spezifikation und Verifikation von Optimierungsalgorithmen. GMD Berichte, vol. 201. Oldenbourg, Munchen.

MARLOWE, T. J. AND RYDER, B. G. 1990. Properties of Data Flow Frameworks. Acta Informatica 28, 121-161.

MOREL, E. AND RENVOISE, C. 1979. Global optimization by suppression of partial redundancies. Communications of the ACM 22, 2, 96-103.

NAGL, M. 1979. Graph-Grammatiken, Theorie, Implementierung, Anwendungen. Vieweg, Braunschweig.

NEWMAN, M. 1942. On theories with a combinatorial definition of equivalence. Annals of Mathematics 43, 2, 223-243.

ODERSKY, M. 1993. Defining context-dependent syntax without using contexts. ACM Transactions on Programming Languages and Systems 15, 3 (July), 535-562.

PINTER, S. S. AND PINTER, R. Y. 1994. Program Optimization and Parallelization Using Idioms. ACM Transactions on Programming Languages and Systems 16, 3 (May), 305-327.

PLUMP, D. 1993a. Evaluation of Functional Expresions by Hypergraph Rewriting. Ph.D. thesis, Universitat Bremen.

PLUMP, D. 1993b. Hypergraph rewriting: Critical pairs and undecidability of confluence. In Term Graph Rewriting--Theory and Practice, M. R. Sleep, M. J. Plasmeijer, and M. C. van Eekelen, Eds. John Wiley and Sons Ltd, New York, Chapter 15, 201-214.

PLUMP, D. 1995. On Termination of Graph Rewriting. In Graph Theoretic concepts in Computer Science. Lecture Notes in Computer Science. Springer Verlag, Heidelberg.

REPS, T., HORWITZ, S., AND SAGIV, M. 1995. Precise interprocedural dataflow analysis via graph reachability. In ACM Symposium on Principles of Programming Languages. Vol. 22. ACM, San Francisco, California, 49-61.

REPS, T. W. AND TEITELBAUM, T. 1988. The Synthesizer Generator: A System for Constructing Language-Based Editors. Springer Verlag, Heidelberg.

SCHURR, A. 1991. Operationales Spezifizieren mit programmierten Graphersetzungssystemen. Ph.D. thesis, Universitat Aachen, Deutscher Universitats-Verlag.

SCHURR, A., WINTER, A. J., AND ZURNDORF, A. 1995. Graph Grammar Engineering with PROGRES. In European Software Engineering Conference ESEC 5. Lecture Notes in Computer Science, vol. 989. Springer Verlag, Heidelberg, 219-234.

SHARIR, M. AND PNUELI, A. 1981. Two approaches to interprocedural data flow analysis. In Program Flow Analysis, S. Muchnick and N. Jones, Eds. Prentice Hall, Englewood Cliffs, NJ.

SLEEP, M. R., PLASMEIJER, M. J., AND VAN EEKELEN, M. C. 1993. Term Graph Rewriting--Theory and Practice. John Wiley and Sons Ltd, New York.

STEFFEN, B. 1991. Data flow analysis as model checking. In Proceedings of Theoretical Aspects of Computer Software (TACS), T. Ito and A. R. Meyer, Eds. Lecture Notes in Computer Science, vol. 526. Springer Verlag, Heidelberg, 346-364.

TARJAN, R. E. 1981. A unified approach to path problems. Journal of the ACM 28, 3 (July), 577-593.

TJIANG, S. W. K. AND HENESSY, J. L. 1992. Sharlit -- A tool for building optimizers. SIGPLAN Notices 27, 7 (July), 82-93.

ULLMAN, J. D. 1994. Assigning an appropriate meaning to database logic with negation. In Computers as Our Better Partners, H. Yamada, Y. Kambayashi, and S. Ohta, Eds. World Scientific Press, Singapore, 216-225.

VAN EEKELEN, M. C. J. D. AND PLASMEIJER, R. M. J. 1995. Constructing Medium Sized Efficient Functional Programs in Clean. Lecture Notes in Computer Science 925, 183-227.

VENKATESH, G. A. 1989. A Framework for Construction and Evaluation of High-Level Specifications for Program Analysis Techniques. In SIGPLAN Conference on Programming Language Design and Implementation. ACM SIGPLAN Notices 24, 7 (July), 1-12.

VOGT, H. H., SWIERSTRA, S. D., AND KUIPER, M. F. 1989. Higher order attribute grammars. ACM SIGPLAN Notices 24, 7 (July), 131-145.

VOLLMER, J. AND HOFFART, R. 1992. Modula-P, a language for parallel programming: Definition and implementation on a transputer network. In Proceedings of the 1992 International Conference on Computer Languages ICCL'92, Oakland, California. IEEE, IEEE Computer Society Press, Los Alamitos, CA, 54-64.

VOLLMER, J., VAN SOMEREN, H., AND LIBOUREL, M. 1993. CCMIR Definition. Tech. rep., COMPARE Consortium. Contact info@ace.nl.

WHITFIELD, D. AND SOFFA, M. L. 1991. Automatic Generation of Global Optimizers. In SIGPLAN Conference on Programming Language Design and Implementation. SIGPLAN Notices 26, 6 (June), 120-129.

ZURNDORF, A. 1995. Eine Entwicklungsumgebung fur PROgrammierte GRaphErsetzungsSysteme. Ph.D. thesis, Universitat Aachen (RWTH).

Received July 1996; revised July 1997, March 1998, June 1998, Oct 1998, July 1999, and March 2000; accepted August 2000.

This work has partly been supported by Esprit project No. 5399 COMPARE. Author's address: U. Assmann, Universitat Karlsruhe, Institut fur Programmstrukturen und Datenorganisation, Kaiserstrasse 12, 76128 Karlsruhe, Germany. New address in 2001: PELAB, IDA, Linkoepings Universitet, S-58183 Linkoeping, Sweden.

Printer friendly Cite/link Email Feedback | |

Author: | ASSMANN, UWE |
---|---|

Publication: | ACM Transactions on Programming Languages & Systems |

Geographic Code: | 1USA |

Date: | Jul 1, 2000 |

Words: | 24601 |

Previous Article: | Java Bytecode Compression for Low-End Embedded Systems. |

Next Article: | Program Transformation and Runtime Support for Threaded MPI Execution on Shared-Memory Machines. |

Topics: |