# An Algebraic Approach to Static Analysis of Active Database Rules.

1. INTRODUCTIONAn active database system is a conventional database system extended with a facility for managing active rules (or triggers). Incorporating active rules into a conventional database system has raised considerable interest both in the scientific community and in the commercial world. A number of prototypes that incorporate active rules into relational and object-oriented database systems have been developed to explore different features of active rule languages; triggers are now available in many commercial relational products [Widom and Ceri 1996]. Rules may be activated either by the occurrence of events (event-condition-action or ECA rules), or by the occurrence of particular database states (condition-action or CA rules). When activated, rules perform actions that may range from database manipulation operations to external procedure calls. Active rules provide a powerful mechanism for database system functions that were previously performed by user applications, for example, general integrity constraint maintenance [Ceri and Widom 1990; Ceri et al. 1994] and materialized view maintenance [Ceri and Widom 1991]. Active rules can also be used for advanced features of complex database applications, for example, workflow management [Dayal et al. 1990].

Active database systems are very powerful, but developing even small active rule applications can be a difficult task due to the unstructured and unpredictable nature of rule processing. During rule processing, rules can trigger and "untrigger" each other, and the intermediate and final states of the database can depend upon which rules are triggered and executed in which order. It has been observed in the past [Aiken et al. 1995; Karadimce and Urban 1994; van der Voort and Siebes 1993] that two important and desirable properties of active rule behavior are termination and confluence. A rule set is guaranteed to terminate if, for any database state and initial modification, rule processing cannot continue forever (i.e., rules cannot activate each other indefinitely). A rule set is confluent if, for any database state and initial modification, the final database state after rule processing is independent of the order in which activated rules are executed.

Previous work on active rule analysis such as Aiken et al. [1995], Karadimce and Urban [1994], van der Voort and Siebes [1993], and Weik and Heuer [1995] has developed compile-time techniques that allow a rule programmer to predict in advance aspects of rule behavior such as termination and confluence. These techniques are used to statically analyze a set of rules before installing the rules in the database. Thus, static rule analysis can form the basis of a design methodology and programming environment for active database systems. A different approach has been used in commercial systems, where a runtime limit prevents infinite rule execution (see, e.g., Oracle [1999]). Of course, this approach may abort rule processing when execution would have terminated successfully in a few more steps. Baralis et al. [1995b] describe a technique to improve runtime termination analysis.

In this article we present new techniques for performing static analysis of both condition-action and event-condition-action rules. Our analysis techniques are based on a generally applicable algorithm that is used to determine when the action of one rule can affect the condition of another rule and to determine when rule actions commute. The algorithm uses an extension of relational algebra to model rule conditions and actions. Essentially, the algorithm "propagates" one rule's action through another rule's condition (or action) to determine how the action may affect the condition (or other action); hence we call our algorithm the propagation algorithm. The propagation algorithm is useful for analyzing termination because it can determine when one rule may activate another rule. The propagation algorithm is also useful for analyzing confluence because it can determine when the execution order of two rules is significant. The propagation algorithm determines these properties much more accurately than previous methods (for example, Aiken et al. [1999], Hellerstein and Hsu [1991], Zhou and Hsu [1990]). In addition, since we take a general approach based on relational algebra and since we consider both event-condition-action and condition-action rules, our method is applicable to most active database systems that use the relational model and to triggers expressed in the language of the SQL:1999 standard [SQL3 1999; Eisenberg and Melton 1999].

1.1 Outline of the Article

In Section 2 we describe the algebraic language we use as the basis of our work. Section 3 contains the description of the propagation algorithm, examples of its application, and a proof of its correctness. In Section 4 we describe our event-condition-action (ECA) and condition-action (CA) rule languages, and then show how SQL:1999 triggers are represented in our ECA rule language. Sections 5 and 6 describe how the propagation algorithm is applied to the analysis of termination and confluence for these languages; several examples are included. In Section 7 we present several improvements to our termination and confluence analysis techniques. Section 8 discusses the relationship of our techniques with previous work, while Section 9 draws conclusions and outlines future work.

2. ALGEBRAIC QUERY AND MODIFICATION REPRESENTATION

In this section we describe the extensions to relational algebra that are required to represent general database query and modification operations. This representation is used in the following sections for rule conditions, which are queries on the database, and rule actions, which are data modifications; both are represented by relational algebra expressions. We first define the operators in our extended relational algebra. Then we give the representation of modification operations, and finally some examples of both queries and modifications.

2.1 Algebraic Operators

Based on Ceri and Gottlob [1985] and Klug [1982], we define an extension to relational algebra that allows us to represent any queries that are expressible in SQL (or Quel), with the exception of the handling of duplicates and ordering conditions. We also introduce an extension that allows us to represent the SQL data modification operations insert, delete, and update.

Our extended relational algebra includes the basic relational algebra operators select ([Sigma]), project ([Pi]), cross-product (x), natural join (??), union ([union]), and difference (-), which we do not elaborate on here; see Ullman [1989]. The first two lines of Table I present useful operators derived from the basic operators, while the next three lines present additional operators that we use. In the table, X and A denote attributes; B, [A.sub.1], and [A.sub.2] denote attribute lists; a is an aggregate function; and expr is an expression (explained below). In line 1, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; in line 3, [Alpha] renames the attributes in list [A.sub.1] as [A.sub.2]. In the remainder of the article we adopt the shorthand notation [E.sub.1] ?? [E.sub.2] and [E.sub.1] [bar]?? [E.sub.2] to denote [E.sub.1] [??.sub.p] [E.sub.2] and [E.sub.1] [[bar]??.sub.p] [E.sub.2] when predicate p equates all attributes with the same name in both schema([E.sub.1]) and schema([E.sub.2]) (similar to the natural join). We now discuss the other operators in more detail; then we present the modification operations.

Table I. Additional Algebraic Operators Operator Description [??.sub.p] semijoin with predicate p [[bar]??.sub.p] antisemijoin with predicate p MATHEMATICAL EXPRESSION attribute rename NOT REPRODUCIBLE IN ASCII] E[X = expr] attribute extension and expression evaluation A[X = a(A);B] attribute extension and aggregate function evaluation

2.1.1 Antisemijoin. The antisemijoin operator [[bar]??.sub.p] is introduced to concisely express negative subqueries as they are expressed in SQL (e.g., not exists); negative subqueries frequently appear in rule definitions [Ceri and Widom 1990]. The antisemijoin operator is defined as

[E.sub.1] [[bar]??.sub.p] [E.sub.2] = [E.sub.1] - ([E.sub.1] [??.sub.p] [E.sub.2]).

Note that we could instead define the relational difference operator in terms of antisemijoin: [E.sub.1] - [E.sub.2] = [E.sub.1] [bar]?? [E.sub.2] (with renaming of attributes in [E.sub.1] and [E.sub.2] as necessary). For convenience, we consider only the antisemijoin and not the difference operator in the remainder of the article.

2.1.2 Aggregate Functions and Expression Evaluation. The attribute extension operators allow us to extend a relational expression E with a new attribute; this approach is used for aggregate functions and for modification operations. We have

--the E operator, which computes expressions applied to each tuple of E;

--the A operator, which computes aggregate functions (e.g., max, min, avg, sum, count) over partitions of E.

E is a unary operator applied to a relational expression E, producing a result with schema schema(E) [union] {X}. Recall from Table I that the [Epsilon] operator is expressed as

E[X = expr]E.

expr is an expression evaluated over each tuple t of E (a conventional expression involving attributes of t and constants) yielding one value for each tuple; this value is entered into the new attribute X for each tuple of E. For details of similar operators, see Ceri et al. [1990]; examples are given in later sections.

A is also a unary operator applied to a relational expression E producing a result with schema schema(E) [union] {X}. Recall from Table I that the A operator is expressed as

A[X = a(A);B]E.

B defines a set of attributes on which the result of E is partitioned; each group in the partition contains all tuples with the same B value, a is an aggregate function that is applied to the (multiset of) values contained in the projection of each partition on attribute A, yielding one value for each partition; this value is entered into the new attribute X for each tuple of the partition. The attributes B are optional: when B is omitted, no grouping is performed, and the aggregate function a is applied to the entire result of E, yielding one value; that value is entered into the new attribute X for each tuple of E. For details, see Ceri and Gottlob [1985].

2.1.3 Modification Operations. We represent data modification operations in relational algebra by characterizing the operations in terms of the database state they produce. Table II presents inserts, deletes, and updates by indicating the algebraic expressions that are used to denote the operations and the way in which these expressions are applied to a relation R to produce a new value for R. In the table, [A.sub.u] denotes the attributes of R that are updated, [A'.sub.u], denotes primed versions of these attributes (explained below), and [A.sub.r] = schema(R) - [A.sub.u].

Table II. Algebraic Description of Insert, Delete, and Update Operations

Algebraic Operation Expression New Database State insert [E.sub.ins] R [union] [E.sub.ins] delete [E.sub.del] R [bar]?? [E.sub.del] update [E.sub.upd] MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII

2.1.3.1 Insert Operation. An insert operation is denoted by a relational expression [E.sub.ins]. [E.sub.ins] produces the tuples to be inserted (either a constant tuple or the result of an algebraic expression). The schema of [E.sub.ins] must coincide with the schema of R.

2.1.3.2 Delete Operation. A delete operation is denoted by a relational expression [E.sub.del]. [E.sub.del] produces the tuples to be deleted. The schema of [E.sub.del] must coincide with the schema of R.

2.1.3.3 Update Operation. An update operation is denoted by a relational expression [E.sub.upd]. [E.sub.upd] has schema schema(R) [union] [A'.sub.u], where attributes [A'.sub.u] contain the new values for the updated attributes [A.sub.u]. As convention, the new values for the updated attributes are always assigned the corresponding "primed" attribute names. That is, if attribute A [element of] [A.sub.u] is updated, then the new value for A is assigned to attribute A'. The update operation only operates on tuples already present in the relation to be updated. Thus, [[Pi].sub.schema(R)][E.sub.upd] [subset or equal to] R. [E.sub.upd] is expressed as

[E.sub.upd] = E[[A'.sub.u1] = [expr.sub.1]]E[[A'.sub.u2] = [expr.sub.2]] ... E[[A'.sub.un] = [expr.sub.n]][E.sub.c],

where [E.sub.c] is an expression producing the tuples to be updated (i.e., the "selection condition" of the update operation). The schema of [E.sub.c] must coincide with the schema of R. E[[A'.sub.ui] = [expr.sub.i]] evaluates expression [expr.sub.i] on each tuple of [E.sub.c] and assigns the result to the new attribute [A'.sub.ui].

As specified in Table II, the new state of R after the update operation is the union of two terms.

(1) The first term R [bar]?? [E.sub.upd] includes in the result all tuples in R that are not modified by the update operation.

(2) The second term [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] includes in the result the original values for the nonupdated attributes of the modified tuples and the new values for the modified attributes, with the primed attribute names replaced by the original attribute names.

Given a relational expression E with schema schema(R) [union] [A'.sub.u], we often need the corresponding expression that is compatible in the schema with R and contains either the preupdated (old) or the updated (new) values for the modified attributes. For convenience, we use the abbreviations

[[Rho].sub.old](E) = [[Pi].sub.schema(R)](E)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

2.2 Examples

Examples throughout the article are based on the relations:

ACCOUNT (num, name, balance, rate) CUSTOMER (name, address, city) LOW-ACC (num, start , end).

Relation ACCOUNT contains information on a bank's accounts, while relation CUSTOMER contains information on the bank's customers. Relation LOW-ACC contains a history of all time periods in which an account had a low balance, including the date on which the balance became low and the date on which it became high again (the latter value can be null, indicating that the account is still low). We assume that the italicized attributes are a key for the corresponding relation, although our method does not rely on keys.

As examples of our language we give the algebraic representation of a query and a modification operation.

Example 1. Select the information on all San Francisco (SF) customers' accounts with a balance greater than 5000 and an interest rate less than 3%. In our algebraic language this query is expressed as

([[Sigma].sub.balance [is greater than] 5000 [conjunction] rate [is less than] 3]ACCOUNT) ?? ([[Sigma].sub.city=`SF']CUSTOMER).

Example 2. Raise the interest rate to 2% for all accounts that have an interest rate greater than 1% but less than 2%. In our algebraic language this modification is expressed as

E[rate' = 2][[Sigma].sub.rate [is greater than] 1 [conjunction] rate [is less than] 2]ACCOUNT.

3. THE PROPAGATION ALGORITHM

In this section we describe a general algorithm, which we call the propagation algorithm, that uses syntactic analysis to determine how a database query can be affected by the execution of a data modification operation. We first describe the structure of the algorithm's output. We then present the algorithm itself, along with several examples of its application. Finally, we discuss the correctness of the algorithm.

3.1 Output of the Algorithm

The input to the propagation algorithm is a query and a modification, expressed in our algebraic language. The output of the algorithm is zero or more of each of the operations insert, delete, and update, characterizing how the result of the query may change due to the execution of the modification: if the algorithm produces an insert operation, then the query may contain more data after the modification; if the algorithm produces a delete operation, then the query may contain less data after the modification; if the algorithm produces an update operation, then the query may contain updated data after the modification; if no operations are produced, then the result of the query cannot change due to the modification. The operations produced by our algorithm are represented as relational expressions in the same way that we algebraically represent data modification operations in Section 2.1.3, except here the modifications apply to arbitrary relational expressions instead of only to single relations.

The output of the propagation algorithm may contain more than one each of the insert, delete, and update operations. Let the propagation algorithm, applied to a query Q and a modification M, produce [n.sub.I] insert operations [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], [n.sub.U] update operations [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and [n.sub.D] delete operations [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. We also consider the changes to the result of Q due to the execution of M as represented by negative modifications [E.sup.-] and positive modifications [E.sup.+], defined on the basis of the propagation algorithm's output as

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

[E.sup.-] (Q, M) and [E.sup.+] (Q, M) are, respectively, the changes subtracted from and added to the result of Q. In practice, the number of modifications of each type, that is, [n.sub.I], [n.sub.D], and [n.sub.U], are almost always zero or one (see the examples in Section 3.4).

3.2 The Algorithm

The propagation algorithm takes as input a query Q and a modification M, both expressed in extended relational algebra as defined in Section 2. As an initial filter, if the query Q does not reference the relation modified by M, then clearly M cannot affect the result of Q. Otherwise, M is "propagated" through a tree representation of query Q. The leaves of the tree are relations, and one of these leaves corresponds to the relation R that is modified by M. In order to simplify the propagation process, we assume there is only one reference to R in query Q.(1) Modification M is propagated from the affected relation up the query tree, and it may be transformed into one or more different modification operations during the propagation process. To describe the propagation, we give formal rules specifying how arbitrary modifications are propagated through arbitrary nodes of the tree. After each propagation through a node in the tree, the modifications obtained are checked for "consistency" (explained next). Inconsistent modifications are discarded, while consistent modifications are propagated further. The propagation process continues until the root of the query tree is reached or all modifications have been discarded as inconsistent. At each point during the propagation process, the modifications associated with a node N in the tree indicate the modifications that may occur to N's subtree as a result of performing the original modification M. Hence, the consistent modifications that reach the root of the tree indicate how the original modification M may affect query Q.

A modification produced by the propagation process is consistent when the algebraic expression representing the modification does not contain contradictions; that is, it is satisfiable. Satisfiability of relational expressions is undecidable in the general case, so we can give sufficient but not necessary conditions for satisfiability of the propagated expressions. However, for most expressions that arise in practice, either we can see trivially that the expression is not satisfiable (as in the examples in Section 3.4), or we can verify satisfiability using the tableau method in Ullman [1989].(2)

Figure 1 illustrates the propagation of an insert operation (represented by expression [E.sub.ins]) on relation [R.sub.3] through the nodes of the query tree representing the query Q = ([[Sigma].sub.p1][R.sub.1]) ?? ([[Sigma].sub.p2][R.sub.2] ?? [R.sub.3]). The bold line represents the propagation path of the [E.sub.ins] modification: the [E.sub.ins] modification is first substituted for the affected relation [R.sub.3]. Then, starting from the [bar]??.sub.p3] node, for each node with an operand affected by the [E.sub.ins] modification, the corresponding propagated expression is computed. At the end of the propagation process, a delete operation [E".sub.del] is obtained at the root. As the reader may verify, an insert operation on [R.sub.3] may only cause data satisfying Q to be deleted.

[Figure 1 ILLUSTRATION OMITTED]

The execution time of the propagation algorithm depends on both the time required to expand the propagation rules and the time required to check the satisfiability of the propagated modifications. Although both times in the worst case (when aggregates are present) may grow exponentially with the depth of the query tree, recall that the propagation algorithm is used for compile-time analysis of queries and modifications, not runtime, and that the relational expressions to which it is applied are normally fairly small. Hence, even though the complexity of the propagation algorithm can be exponential, we expect performance to be acceptable in practice. This supposition has been substantiated by our experience applying a subset of the techniques presented in this article to the analysis of actual rule sets expressed in the Chimera language [Widom and Ceri 1996], in the context of the Esprit project IDEA; please see Baralis et al. [1995a] for details.

3.3 The Rules for Propagation

The rules for propagation are given in tables based on the kind of incoming modification: insert and delete modifications in Tables III and IV, respectively, and update modifications in Tables V and VI. Each row in the tables contains the propagated modification(s) [E.sup.out] as a function of the incoming modification [E.sup.in] and the relational operator in the query tree. The column labeled "Applicability Condition" specifies when different propagation rules are used for different cases. In the tables, [A.sub.1], [A.sub.2], and B are attribute lists, [A.sub.jn] = schema([E.sub.1]) [intersection] schema([E.sub.2]), [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] = schema([E.sub.2]), [A.sub.u] are the updated attributes, [A.sub.p] and [A.sub.e] are the attributes involved in predicate p and expression expr, respectively, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] expr, p(B) equates all attributes in list B, and p' ([A.sub.uB]) equates all attributes in [A'.sub.uB] with the corresponding B attributes. Since the natural join, Cartesian product, and union operators are symmetric, without loss of generality we assume that the first operand is modified; analogous rules apply for modifications to the second operand. Observe that aggregate functions require, in addition to the incoming modification, the entire relational expression E to which the aggregate function is applied.

[TABULAR DATA III-VI NOT REPRODUCIBLE IN ASCII]

To provide intuition for the structure of the propagation rules, we give an example of how a rule is built. Consider the rule for the propagation of an insert modification [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] through the node A[X = a(A);B], when B [is not equal to] 0 (i.e., grouping is present), which is given in Table III. The propagation of this modification may yield certain output modifications. The first, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], is the insertion of the new tuples (contained in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]) extended with the appropriate value for the new attribute X, obtained by computing aggregate function a(A) on the relational expression E [union] [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] grouped on attribute B. The second modification, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], performs the update of attribute X for all tuples whose B value was equal to some B value in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. (These tuples are selected by the expression [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].) Again, the new value of attribute X is obtained by computing aggregate function a(A) on the relational expression [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] grouped on attribute B.

The formulas given in Table V do not take into account the internal structure of selection predicates and update expressions. In the case of simple predicates (comparisons between an attribute and a constant(3)) and simple arithmetic update expressions (addition or subtraction of constants from an attribute), it is often possible to eliminate some of the propagated modifications. As an example, consider a query Q = [[Sigma].sub.A=k]E. Suppose attribute A in E is updated by an arbitrary arithmetic expression. From Table V, the propagation through node [[Sigma].sub.A=k] of the incoming update [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] would yield three modifications [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. However, the update operation can only cause the insertion of tuples into Q (that previously did not satisfy the selection predicate), and the deletion of tuples from Q that now do not satisfy the predicate (but did before the update). Tuples that formerly satisfied the predicate and still do cannot change the value of attribute A, so the update modification, when propagated through the selection node [[Sigma].sub.A=k], cannot generate a propagated update modification. As a second example, consider the propagation of the update A = A + 1 through the node [[Sigma].sub.A [is greater than] k]. This update will never cause tuples to be deleted from the expression rooted in [[Sigma].sub.A [is greater than] k]. Thus, the propagated delete operation can be eliminated. Table VII shows the modifications that can be eliminated in the different cases. In the table, "other" indicates an arbitrary arithmetic expression, which in the case of an equality predicate still allows the update operation to be eliminated (as illustrated above).

[TABULAR DATA VII NOT REPRODUCIBLE IN ASCII]

3.4 Examples

We give two examples of applying the propagation algorithm using queries and modifications expressed on the relations defined in Section 2.2. In each example, we fully describe the propagation process and the satisfiability test.

Example 3. Query Q selects the balance and rate of all accounts that have balance [is less than] 500 and rate [is greater than] 0%. Update operation M increases by 1% the rate of all San Francisco customers having accounts with balance [is greater than] 5000 and rate [is less than] 3%. The input to the algorithm is

Q = [[Pi].sub.balance, rate]([[Sigma].sub.balance [is less than] 500 [conjunction] rate [is greater than] 0] ACCOUNT)

M = [E.sub.upd] = E[rate' = 2]([[Sigma].sub.rate [is greater than] 1 [conjunction] rate [is less than] 3] (ACCOUNT ?? ([[Sigma].sub.city =' SF]CUSTOMER))).

Using Table V, the propagation of [E.sub.upd] through the selection operation in Q yields insert and update operations (the delete operation is eliminated; see Table VII). We have

[E'.sub.ins] = [[Rho].sub.new](([[Sigma].sub.balance, [is less than] 500 [conjunction] rate' [is greater than] 0][E.sub.upd]) [bar]?? ([[Sigma].sub.balance [is less than] 500 [conjunction] rate [is greater than] 0] [E.sub.upd]).

[E'.sub.upd] = ([[Sigma].sub.balance [is greater than] 500 [conjunction] rate' [is greater than] 0] [E.sub.upd]) ?? [[Sigma].sub.balance [is less than] 500 [conjunction] rate [is greater than] 0 [E.sub.upd]).

In both cases, predicates balance [is less than] 500 and balance [is greater than] 5000 (the latter from [E.sub.upd]) are contradictory, so both expressions [E'.sub.ins] and [E'.sub.upd] are unsatisfiable. Intuitively, modification M operates on data not read by query Q. We conclude that modification M cannot affect query Q.

Example 4. Query Q selects accounts with balance [is less than] 500 and interest rate [is greater than] 0%. Update operation M sets to 2% the interest rate of all accounts with rate between 1 and 2%. The input to the algorithm is

Q = [[Pi].sub.balance, rate]([[Sigma].sub.balance [is less than] 500 [conjunction] rate [is greater than] 0] ACCOUNT)

M = [E.sub.upd] = E[rate' = 2]([[Sigma].sub.rate [is greater than] 1 [conjunction] rate [is less than] 2] ACCOUNT).

The propagation of [E.sub.upd] through the selection operation in Q yields insert, delete, and update operations

[E'.sub.ins] = ([[Rho].sub.new] (([[Sigma].sub.balance [is less than] 500 [conjunction] rate' [is greater than] 0] [E.sub.upd]) [bar]?? ([[Sigma].sub.balance [is less than] 500 [conjunction] rate [is greater than] 0 [E.sub.upd]))

[E'.sub.del] = [[Rho].sub.old]([[Sigma].sub.balance [is less than] 500 [conjunction] rate' [is greater than] 0] [E.sub.upd]) [bar]?? ([[Sigma].sub.balance [is less than] 500 [conjunction] rate [is greater than] 0 [E.sub.upd]))

[E'.sub.upd] = ([[Sigma].sub.balance [is less than] 500 [conjunction] rate' [is greater than] 0] [E.sub.upd]) ?? ([[Sigma].sub.balance [is less than] 500 [conjunction] rate [is greater than] 0 [E.sub.upd]).

These expressions do not contain contradictory predicates, so they may be satisfiable and the propagation continues.(4) The propagation of [E'.sub.ins], [E'.sub.del], and [E'.sub.upd] through the projection operation in Q yields

[E".sub.ins] = [[Pi].sub.balance, rate] [E'.sub.ins]

[E".sub.del] = [[Pi].sub.balance, rate] [E'.sub.del]

[E".sub.upd] = [[Pi].sub.balance, rate, rate' [E'.sub.upd].

All three expressions may be satisfiable, thus modification M can affect the result of query Q. Furthermore, [E".sub.ins], [E".sub.del], and [E".sub.upd] describe the modifications that may be performed on Q as a result of the execution of M.

3.5 Correctness of the Algorithm

Given a query Q and a modification M as input, the propagation algorithm produces as output a set of modifications that characterize how the result of Q changes due to the execution of M. Intuitively, we say that the propagation algorithm is correct if, applied to an arbitrary query Q and modification M, its output operations produce all modifications on Q caused by performing the original modification M, on any database state. Let Q(d) denote the result of evaluating query Q on database state d. For example, if for some database state d the execution of M causes the insertion of a tuple t into Q(d), then the propagation algorithm outputs an insert operation that produces tuple t on state d. However, the output operations may also yield some modifications that are not actually performed on Q(d), due to the execution of M on d. For example, the algorithm may output an insert operation that, when evaluated in state d includes tuple t, but tuple t is already in Q(d), thus t is not actually inserted. Intuitively, we say that the propagation algorithm is accurate if there is no state d such that its output operations produce a modification that is not performed on Q(d).

The propagation algorithm is correct: it determines if the execution of a modification M may produce a change in Q--if the result is empty, we know Q is always unchanged by M. However, the propagation algorithm is not accurate: it may produce changes that cannot actually occur. In practice, we have found that the slight inaccuracy of the algorithm does not compromise the effectiveness of our analysis techniques.

We proceed to prove that the propagation Aagorithm is correct but not accurate. We first introduce notation that allows us to formalize the intuitive definitions of correctness and accuracy given above.

Definition 1. Let Q be a query and M a modification. Let d be an arbitrary database state. The positive changes [Q.sup.+] and negative changes [Q.sup.-] to Q(d) based on the execution of M are given by

[Q.sup.+](M, d) = Q(d') - Q(d)

[Q.sup.-](M, d) = Q(d) - Q(d'),

where d' is the database state obtained from d by applying modification M.

Now we can formally define the correctness and accuracy of the propagation algorithm.

Definition 2. The propagation algorithm is correct if the following holds. Let the propagation algorithm be applied to an arbitrary query Q and modification M and produce as output a set [E.sup.out] of database modifications. Let [E.sup.+](Q, M) and [E.sup.-](Q, M) be the positive and negative modifications corresponding to [E.sup.out], as defined in Section 3.1. Consider an arbitrary database state d, and let [Q.sup.+](M, d) and [Q.sup.-](M, d) be as in Definition 1. Then [E.sup.-](Q, M)(d) [contains or equal to] [Q.sup.-](M, d) and [E.sup.+](Q, M)(d) [contains or equal to] [Q.sup.+](M, d), where [E.sup.-](Q, M)(d) denotes the evaluation of expression [E.sup.-](Q, M) on database state d (and similarly for [E.sup.+](Q, M)).

Definition 3. The propagation algorithm is accurate if the following holds. Let the propagation algorithm be applied to an arbitrary query Q and modification M and produce as output a set [E.sup.out] of modifications. Let [E.sup.+](Q, M) and [E.sup.-](Q, M) be the positive and negative modifications corresponding to [E.sup.out], as defined in Section 3.1. Consider an arbitrary database state d, and let [Q.sup.+](M, d) and [Q.sup.-](M, d) be as in Definition 1. Then [E.sup.-](Q, M)(d) [subset or equal to] [Q.sup.-](M, d) and [E.sup.+](Q, M)(d) [subset or equal to] [Q.sup.+](M, d), where [E.sup.-](Q, M)(d) denotes the evaluation of expression [E.sup.-](Q, M) on database state d (and similarly for [E.sup.+](Q, M)).

The following theorem proves the correctness of the propagation algorithm. The proof proceeds step by step for each propagation rule given in Tables III-VI, and the proof technique is analogous for all rules. Hence, we outline the proof procedure for only one propagation rule; see Baralis [1994] for a complete proof.

THEOREM 1. The propagation algorithm, based on the propagation rules in Tables III-VI, is correct.

PROOF. (Sketch). Let the propagation algorithm be applied to an arbitrary query Q and modification M and produce as output a set [E.sup.out] of modifications. Let [E.sup.+] (Q, M) and [E.sup.-] (Q, M) be the positive and negative modifications corresponding to [E.sup.out], as defined in Section 3.1. Consider an arbitrary database state d and let [Q.sup.+] (M, d) and [Q.sup.-] (M, d) be as in Definition 1. By Definition 2, to prove correctness we must show that [E.sup.-] (Q, M)(d) [contains or equal to] [Q.sup.-] (M, d) and [E.sup.+] (Q, M)(d) [contains or equal to] [Q.sup.+] (M, d). The proof proceeds by induction on the depth of query tree Q.

Base Case: Q is a single node representing the modified relation R. Consider M = [E.sub.upd], so [E.sup.+] (Q, M) = [[Rho].sub.new] ([E.sub.upd]), [E.sup.-] (Q, M) = [[Rho].sub.old]([E.sub.upd]). The execution of M produces a new database state d' in which Q(d') = (R(d) [bar]?? [[Rho].sub.old]([E.sub.upd])(d)) [union] [[Rho].sub.new]([E.sub.upd])(d). Hence, by Definition 1, [Q.sup.-](M, d) = [[Rho].sub.old]([E.sub.upd])(d) and [Q.sup.+](M, d)= [[Rho].sub.new]([E.sub.upd])(d). Thus, [E.sup.+] (Q, M)(d) = [Q.sup.+] (M, d) and [E.sup.-] (Q, M)(d) = [Q.sup.-] (M, d). M = [E.sub.ins] and M = [E.sup.del] are proved similarly.

Induction Step: Let the root of Q be a unary (resp., binary) relational operator op over a subtree S with incoming modifications [E.sup.in] (resp., over two subtrees S and S', of which S has incoming modifications [E.sup.in]). By the induction hypothesis, we assume the positive and negative modifications [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], obtained by the modifications in [E.sup.in], are such that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], where [S.sup.+] and [S.sup.-] are the positive and negative changes performed on the expression rooted in S as a result of executing the original modification M on d. We must show that if we apply the appropriate propagation rule for op to obtain [E.sup.out] from [E.sup.in], then [E.sup.+] (Q, M) and [E.sup.-] (Q, M) obtained from [E.sup.out] are such that, for all database states d, [E.sup.+] (Q, M)(d) [contains or equal to] [Q.sup.+] (M, d) and [E.sup.-] (Q, M)(d) [contains or equal to] [Q.sup.-] (M, d). As an example, let op be a selection [[Sigma].sub.p] performed over an arbitrary subtree S, and consider an update operation [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] associated with S and performed on an attribute in p. Applying our propagation rules from the second line of Table V, we obtain a triple [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], corresponding to tuples added to, deleted from, and updated in the result of Q = [[Sigma].sub.p]S. Then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

The execution of M produces a new database state d' in which

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Hence, by Definition 1:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

After several algebraic manipulations, we obtain [E.sup.+] (Q, M)(d) [contains or equal to] [Q.sup.+] (M, d) and [E.sup.-] (Q, M)(d) [contains or equal to] [Q.sup.-] (M, d). The other propagation rules are verified similarly. []

The following theorem proves that the propagation algorithm is not accurate. The proof gives a counterexample: a query Q, modification M, and database state d such that [E.sup.-] (Q, M)(d) [contains] [Q.sup.-] (M, d).

THEOREM 2. The propagation algorithm is not accurate.

PROOF. Consider a database d containing a single relation R with attributes A and B, and let R(d) = {(1, 1), (-1, 1)}. By applying the propagation algorithm to query Q = [[Pi].sub.B]R and delete operation [E.sub.del] = [[Sigma].sub.A [is greater than] 0]R, we obtain the propagated modification [E'.sub.del] = [[Pi].sub.B][E.sub.del]. Before the execution of [E.sub.del], Q(d) = {(1)}. After [E.sub.del] is executed, in the new database state d', R(d') = {(-1, 1)} and Q(d')= {(1)}. Then, [Q.sup.+] ([E.sub.del], d) = [Q.sup.-]([E.sub.del], d) = 0. But [E.sup.-](Q, [E.sub.del])(d) = {(1)}. So [E.sup.-] (Q, [E.sub.del])(d) [contains] [Q.sup.-] ([E.sub.del], d). []

4. THE ACTIVE RULE MODEL

In their most general form, active database rules consist of a set of events, an optional condition and an action. An active rule is triggered (i.e., is eligible for evaluation), when any event specified in the rule's event set occurs. If the rule includes a condition, the condition must be true on the current database state for the rule's action to be executed.

Active rules following this paradigm are denoted event-condition-action (or ECA) rules. In the next section we describe the syntax and semantics of a general language for the specification of ECA rules, based on the algebra presented in Section 2. Several active database prototypes allow the specification of rules without triggering events, which are denoted condition-action (or CA) rules. In Section 4.2 we present a variation of our general model that allows the representation of CA rules. Finally, in Section 4.3 we describe the trigger language in the SQL:1999 standard [SQL3 1999; Kulkarni et al. 1998] and discuss how it maps to our ECA rule language.

4.1 Event-Condition-Action Rules

An event-condition-action rule in our language is defined as {T} : C [right arrow] A where:

--{T} denotes the set of triggering events;

--C states the rule's condition as an expression (query) in our extended relational algebra;

--A states the rule's action as a data modification operation expressed using [E.sub.ins], [E.sub.del], or [E.sub.upd] as in Table II.(5)

The possible triggering events correspond to data modifications: ins R (insert into R), del R (delete from R), and upd R.A (update R's attribute A).(6) A rule is triggered when any of its triggering events occurs. When a triggered rule is evaluated, its condition C is true if and only if C [is not equal to] 0 on the current database state. (The rule condition may be omitted, in which case it is always true.) This interpretation of rule conditions is used by several active database rule languages, for example, A-RDL, Chimera, Postgres, and Starburst, as well as commercial trigger systems [Widom and Ceri 1996] and triggers in the SQL:1999 standard [Kulkarni et al. 1998]. The action A is a normal data modification operation executed on the current database state.

Rule processing is invoked after a user or application has performed modifications on the database. In particular, we assume that the basic rule processing algorithm given in Figure 2 is invoked at specific rule processing points, which are determined by the system's rule processing granularity. Rule processing is an iterative loop in which a triggered rule is selected (Step (2) in Figure 2) and, if its condition is true (Step (3)), its action is executed (Step (4)).

Fig. 2. Rule processing algorithm for ECA rules.

(1) repeat until no rule is triggered: (2) select a triggered rule r; (3) if r's condition is true (4) execute r's action;

In our model, we allow the granularity of rule processing invocation with respect to user modifications [Widom and Ceri 1996] to be either tuple or statement. In the former case, a rule is executed once for each tuple affected by the triggering modification (instance-oriented execution), while in the latter it is executed once for all modified tuples (set-oriented execution). When actions perform database changes larger than the rule processing granularity (e.g., when the rule processing granularity is tuple and the action modifies several tuples), the algorithm described in Figure 2 is executed recursively. Different rule granularities may cause substantial differences in the final database state produced by a rule program execution. However, since we do not consider actual database states in our analysis techniques, it turns out that the rule processing granularity is not relevant in the context of rule analysis.

Event-condition coupling describes the transactional relationship between a rule's triggering event and the evaluation of its condition. Our model accommodates both immediate coupling, in which the rule condition is evaluated immediately after the triggering event takes place, and deferred coupling, in which the rule condition is evaluated before the transaction commits. Condition-action coupling describes the transactional relationship between the evaluation of a rule's condition and the execution of its action. In our model, we only consider immediate condition-action coupling. Except for detached coupling, which we do not consider here, coupling modes have no impact on our analysis techniques.

When a rule is triggered, triggering occurs with respect to a database transition -- the changes that have occurred since some previous database state.

Definition 4. A rule's transition denotes the changes to the database that have occurred since the rule was last selected, or since the state prior to the initial user modifications if the rule has not yet been selected during rule processing.

Definition 5. Let [C.sup.old] and C be the results of evaluating the rule's condition in the states before and after the rule's transition (where [C.sup.old] = 0 if the rule is evaluated for the first time during rule processing). The condition's transition [Delta]C is given by

[Delta]C = C - [C.sup.old].

Many ECA rule languages allow the condition and/or the action to refer explicitly to the database changes that caused the rule to become triggered.

Definition 6. A rule's delta relations (also called transition tables) encapsulate the changes that occurred during the rule's transition. For each relation R referenced in the triggering events, the following (self-explanatory) delta relations are available: inserted(R), deleted(R), old-updated(R), and new-updated(R).

The changes stored in a delta relation reflect the logical net effect [Hanson 1992; Simon et al. 1992; Widom and Finkelstein 1990] of the sequence of actual changes made to the database during the rule's transition. (For example, the insertion and subsequent update of the same tuple is considered as the insertion of the updated tuple.)

In this article we do not consider the effect of a conflict resolution policy for selecting among multiple rules with true conditions [Widom and Ceri 1996]. However, as an extension to our framework, it is possible to incorporate conflict resolution using rule priorities; see Section 9.

4.1.1 Examples of ECA Rules. Recall the relations described in Section 2.2. The following rules, which we use for examples throughout the remainder of this article, automatically enforce some of the bank's policies for managing customers and accounts. In the following we give both an informal description of the rules and their corresponding algebraic representations.

[r.sub.1]: Rule bad-account states that if an account has a balance less than 500 and an interest rate greater than 0%, then that account's interest rate is lowered to 0%. The rule is expressed in our language as {T}: C [right arrow] [E.sub.upd], where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and [Delta] (ACCOUNT) = inserted(ACCOUNT) [union] new-updated(ACCOUNT).

[r.sub.2]: Rule raise-rate states that if an account has an interest rate greater than 1% but less than 2%, then that account's interest rate is raised to 2%. The rule is expressed in our language as {T}: C [right arrow] [E.sub.upd], where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and [Delta](ACCOUNT) = inserted(ACCOUNT) [union] new-updated(ACCOUNT).

[r.sub.3]: Rule SF-bonus states that when the number of customers living in San Francisco exceeds 1000, then the interest rate of all San Francisco customers' accounts with a balance greater than 5000 and an interest rate less than 3% is increased by 1%. The rule is expressed in our language as {T} : C [right arrow] [E.sub.upd], where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[r.sub.4]: Rule start-bad states that if an account has a balance less than 500 and is not yet recorded with a null end date in the LOW-ACC relation, then the account is inserted into the LOW-ACC relation with the current date as start date and a null end date. The rule is expressed in our language as {T} : C [right arrow] [E.sub.ins], where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and [Delta](ACCOUNT) = inserted(ACCOUNT) [union] new-updated(ACCOUNT), and today() is a system-defined function returning the current date.

[r.sub.5]: Rule end-bad states that if an account with a null end date in the LOW-ACC relation has a balance of at least 500 in the ACCOUNT relation, then its end date is set to the current date. The rule is expressed in our language as {T} : C [right arrow] [E.sub.upd], where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and [Delta](ACCOUNT) = new-updated(ACCOUNT).

[r.sub.6]: Rule decrease-bad states that if the total number of low days for an account (as recorded in the LOW-ACC relation) is greater than 50 and its current balance is between 500 and 1000, then its interest rate is set to 1% in the ACCOUNT relation. The rule is expressed in our language as {T} : C [right arrow] [E.sub.upd], where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

For simplicity, we suppose that attributes ACCOUNT.num and LOW-ACC.num are not updatable, so we omit the corresponding update events from T.

4.2 Condition-Action Rules

In condition-action (CA) rules, triggering events are omitted. A CA rule is triggered implicitly whenever data are modified so that new data satisfy the rule's condition. Thus, instead of triggering events needing to be defined by the rule programmer, the burden of detecting when a condition may become true is on the active system. For this reason, CA rules are generally considered more "declarative" than ECA rules. As we show in Section 5, CA rules permit more general analysis techniques as well. A condition-action rule in our language is defined as C [right arrow] A where

--C states the rule's condition as an expression in our extended relational algebra of Section 2;

--A states the rule's action as a data modification operation expressed using [E.sub.ins], [E.sub.del], or [E.sub.upd] as in Table II of Section 2.1.3.

When rule C [right arrow] A is evaluated, the condition C is true if and only if the condition's transition [Delta]C (recall Definition 5) is nonempty. That is, the condition is true whenever the query produces "new" tuples. This is identical to the interpretation of conditions in the CA rules of, for example, Ariel [Hanson 1992], RPL [Delcambre and Etheredge 1989], and set-oriented adaptations of OPS5 [Gordin and Pasik 1991]. We observe that this interpretation of CA rule conditions represents the most important difference in how rule conditions are evaluated in the ECA rule model described in the previous section.

The action A is a data modification operation executed on the current database state, identical to ECA rules. In some active database systems, for example, Gordin and Pasik [1991] and Hanson [1992], a rule's action implicitly operates only on the data selected by the condition, rather than on the entire database. We could use a similar rule model here, but it would complicate the syntax and semantics and has no bearing on our analysis methods; see Section 9 for further discussion.

Similar to ECA rules, rule processing is invoked after some set of user or application modifications to the database, according to the chosen rule processing granularity. The basic algorithm for rule processing is given in Figure 3. Analogous to ECA rules, we do not consider a conflict resolution policy for selecting among multiple triggered rules; see Section 9.

Fig. 3. Rule processing algorithm for CA rules.

(1) repeat until no rule has a true condition: (2) select a rule r with a true condition; (3) execute r's action

4.2.1 Examples of CA Rules. We now use our CA rule language to express the six rules described in Section 4.1.1.

[r.sub.1]: Rule bad-account is expressed in our CA rule language as C [right arrow] [E.sub.upd], where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[r.sub.2]: Rule raise-rate is expressed in our CA rule language as C [right arrow] [E.sub.upd], where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[r.sub.3]: Rule SF-bonus is expressed in our CA rule language as C [right arrow] [E.sub.upd], where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[r.sub.4]: Rule start-bad is expressed in our CA rule language as C [right arrow] [E.sub.ins], where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

and today() is a system-defined function returning the current date.

[r.sub.5]: Rule end-bad is expressed in our CA rule language as C [right arrow] [E.sub.upd], where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[r.sub.6]: Rule decrease-bad is expressed in our CA rule language as C [right arrow] [E.sub.upd], where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

4.3 SQL:1999 Triggers

The specification of triggers in SQL:1999 (formerly known as SQL3) has migrated significantly over the years (e.g., see the SQL3 draft descriptions in Widom and Ceri [1996] versus Kulkarni et al. [1998]). We base our description of SQL:1999 triggers on the specification of the trigger language presented in Kulkarni et al. [1998].

The syntax for the definition of an SQL:1999 trigger is given in Figure 4. A trigger can be triggered by a single event type: an insert, update, or delete operation on a single table. In the case of the update event, it is optional to specify a list of updated columns. The (optional) trigger condition can be an arbitrarily complex SQL predicate. If the condition is omitted, it is considered always true. The trigger action is a list of SQL statements within a BEGIN/END block. Any SQL:1999 statement, except those dealing with connections, sessions, and transactions, are allowed in the trigger action. Trigger actions may also be calls to stored procedures.

Fig. 4. SQL:1999 trigger definition syntax.

<trigger definition> ::= CREATE TRIGGER <trigger name> { BEFORE | AFTER } <trigger event> ON <table name> [ REFERENCING <old or new values alias list> ] [ FOR EACH {ROW | STATEMENT} ] [ <trigger condition ] <trigger action> <trigger event> ::= INSERT | DELETE | UPDATE [ OF <column name list> ] <old or new values alias> ::= {OLD | NEW} [AS] <identifier> | {OLD_TABLE | NEW_TABLE} [AS] <identifier>

Each trigger may be considered for execution before or after the triggering operation is actually executed. Each trigger can include a processing granularity: a statement-level trigger executes once for each triggering operation (set-oriented execution), while a row-level trigger executes once for each row affected by the triggering operation (instance-oriented execution). Both the trigger condition and the trigger action can access the current database state. In addition, the old and new values of the modified rows are available by referencing transition tables OLD_TABLE and NEW_TABLE. For row-level triggers, the OLD and NEW correlation variables defined on the corresponding transition tables are also available.

The execution of trigger actions may trigger other triggers (including the same trigger). Earlier versions of the SQL3 draft specified compile-time syntactic restrictions on trigger definition to guarantee termination of trigger execution [Widom and Ceri 1996]. However, these restrictions were found to limit the expressive power of the trigger language excessively, so they have now been removed. Consequently, it is the responsibility of the trigger programmer to prevent infinite trigger execution.

Since more than one trigger with the same triggering event on the same table can be defined, it is possible for more than one trigger to be eligible for execution at the same time. SQL:1999 enforces an ordering based on creation time of triggers.

4.3.1 Examples of SQL:1999 Triggers. We now use the SQL:1999 trigger language to express the six rules described in Section 4.1.1.

[r.sub.1]: Rule bad-account corresponds to two SQL:1999 triggers, bad-account-ins defined for the insert event and bad-account-upd defined for the update events.

create trigger bad-account-ins after insert on ACCOUNT referencing new as NA when (NA.balance<500 and NA.rate>0) for each row begin update ACCOUNT set rate=0 where balance<500 and rate>0; end create trigger bad-account-upd after update of balance, rate on ACCOUNT referencing new as NA when (NA.balance<500 and NA.rate>0) for each row begin update ACCOUNT set rate=0 where balance<500 and rate>0; end

[r.sub.2]: Rule raise-rate corresponds to two SQL:1999 triggers, raise-rate-ins defined for the insert event and raise-rate-upd defined for the update event.

create trigger raise-rate-ins after insert on ACCOUNT referencing new as NA when (NA.rate>1 and NA.rate<2) for each row begin update ACCOUNT set rate=2 where rate>1 and rate<2; end create trigger raise-rate-upd after update of rate on ACCOUNT referencing new as NA when (NA.rate>1 and NA.rate<2) for each row begin update ACCOUNT set rate=2 where rate>1 and rate<2; end

[r.sub.3]: Rule SF-bonus corresponds to two SQL:1999 triggers, SF-bonus-ins defined for the insert event and SF-bonus-upd defined for the update event.

create trigger SF-bonus-ins after insert on CUSTOMER when (1000<select count(name) from CUSTOMER where city=`SF') for each statement begin update ACCOUNT set rate=rate+1 where balance>5000 and rate<3 and name in (select name from CUSTOMER where city=`SF'); end create trigger SF-bonus-upd after update of city on CUSTOMER when (1000<select count(name) from CUSTOMER where city=`SF') for each statement begin update ACCOUNT set rate=rate+1 where balance>5000 and rate<3 and name in (select name from CUSTOMER where city=`SF'); end

[r.sub.4]: Rule start-bad corresponds to two SQL:1999 triggers, start-bad-ins defined for the insert event and start-bad-upd defined for the update event.

create trigger start-bad-ins after insert on ACCOUNT referencing new as NA when (NA.balance<500 and not exists (select * from LOW-ACC where num=NA.num and end is NULL)) for each row begin insert into LOW-ACC(num, start,end) (select num, today() ,NULL from ACCOUNT A where balance<500 and not exists (select * from LOW-ACC where num=A.num and end is NULL)); end create trigger start-bad-upd after update of balance on ACCOUNT referencing new as NA when (NA.balance<500 and not exists (select * from LOW-ACC where num=NA.num and end is NULL)) for each row begin insert into LOW-ACC(num, start, end) (select num, today() ,NULL from ACCOUNT A where balance<500 and not exists (select * from LOW-ACC where num=A.num and end is NULL)); end

[r.sub.5]: Rule end-bad corresponds to the following SQL:1999 trigger.

create trigger end-bad after update of balance on ACCOUNT referencing new as NA when (NA.balance>=500 and exists (select * from LOW-ACC where num=NA.num and end is NULL)) for each row begin update LOW-ACC set end=today() where end is NULL and num IN (select num from ACCOUNT where balance>=500); end

[r.sub.6]: Rule decrease-bad corresponds to four SQL:1999 triggers, decrease-bad-ins-acc and decrease-bad-ins-low defined for the insert events on the two tables ACCOUNT and LOW-ACC, respectively, and decrease-bad-upd-acc and decrease-bad-upd-low defined for the update events.

create trigger decrease-bad-ins-acc after insert on ACCOUNT when (exists (select * from ACCOUNT where rate>1 and balance>=500 and balance<=1000 and num IN (select num from LOW-ACC group by num having sum(end-start)>50))) for each statement begin update ACCOUNT set rate=1 where rate>1 and balance>=500 and balance<=1000 and num IN (select num from LOW-ACC group by num having sum(end-start)>50))); end create trigger decrease-bad-upd-acc after update of balance, rate on ACCOUNT when (exists (select * from ACCOUNT where rate>1 and balance>=500 and balance<=1000 and num IN (select num from LOW-ACC group by num having sum(end-start)>50))) for each statement begin update ACCOUNT set rate=1 where rate>1 and balance>=500 and balance<=1000 and num IN (select num from LOW-ACC group by num having sum(end-start)>50))); end create trigger decrease-bad-ins-low after insert on LOW-ACC when (exists (select * from ACCOUNT where rate>1 and balance>=500 and balance<=1000 and num IN (select num from LOW-ACC group by num having sum(end-start)>50))) for each statement begin update ACCOUNT set rate=1 where rate>1 and balance>=500 and balance<=1000 and num IN (select num from LOW-ACC group by num having sum(end-start)>50))); end create trigger decrease-bad-upd-low after update of start, end on ACCOUNT when (exists (select * from ACCOUNT where rate>1 and balance>=500 and balance<=1000 and num IN (select num from LOW-ACC group by num having sum(end-start)>50))) for each statement begin update ACCOUNT set rate=1 where rate>1 and balance>=500 and balance<=1000 and num IN (select num from LOW-ACC group by num having sum(end-start)>50))); end

4.3.2 Mapping SQL:1999 Triggers to ECA Rules. The algebraic language presented in Section 2 can express complex SQL data manipulation (DML) statements. SQL:1999 triggers, whose actions contain a sequence of SQL/DML statements, can usually be represented by our ECA rule language in a straightforward way, as shown below. However, when the trigger's action includes stored procedure calls or other SQL/PSM procedural constructs, the trigger cannot be translated directly to our algebraic language.(7)

Note that our algebraic ECA language does not incorporate the notion of before-triggers as specified in SQL:1999. Before-triggers introduce certain subtleties in semantics and behavior, and therefore are often restricted in practice. The incorporation of general before-triggers into our framework is a subject for future work.

Consider the examples of SQL:1999 triggers in Section 4.3.1. All of the triggers can be represented in our ECA rule language as follows:

[r.sub.1]: The two triggers bad-account-ins ([r.sub.1a]) and bad-account-upd ([r.sub.1b]) are expressed in our language as {[T.sub.1a]} : [C.sub.1a] [right arrow] [E.sub.upd] and {[T.sub.1b]}: [C.sub.1b] [right arrow] [E.sub.upd], respectively, where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[r.sub.2]: The two triggers raise-rate-ins ([r.sub.2a]) and raise-rate-upd ([r.sub.2b]) are expressed in our language as {[T.sub.2a]}: [C.sub.2a] [right arrow] [E.sub.upd] and {[T.sub.2b]}: [C.sub.2b] [right arrow] [E.sub.upd], respectively, where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[r.sub.3]: The two triggers SF-bonus-ins ([r.sub.3a]) and SF-bonus-upd ([r.sub.3b]) are expressed in our language as {[T.sub.3a]} : C [right arrow] [E.sub.upd] and {[T.sub.3b]} : C [right arrow] [E.sub.upd], respectively, where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[r.sub.4]: The two triggers start-bad-ins ([r.sub.4a]) and start-bad-upd ([r.sub.4b]) are expressed in our language as {[T.sub.4a]} : [C.sub.4a] [right arrow] [E.sub.ins] and {[T.sub.4b]} : [C.sub.4b] [right arrow] [E.sub.ins], respectively, where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[r.sub.5]: Trigger end-bad is expressed in our language as {T} : C [right arrow] [E.sub.upd], where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[r.sub.6]: The four triggers decrease-bad-ins-acc ([r.sub.6a]) , decrease-bad-ins-low ([r.sub.6b]), decrease-bad-upd-acc ([r.sub.6c]), and decrease-bad-upd-low ([r.sub.6d]) are expressed in our language as {[T.sub.6a]} : C [right arrow] [E.sub.upd], {[T.sub.6b]} : C [right arrow] [E.sub.upd], {[T.sub.6c]} : C [right arrow] [E.sub.upd], and {[T.sub.6d]} : C [right arrow] [E.sub.upd], respectively, where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

5. TERMINATION ANALYSIS

Termination for a rule set is guaranteed if rule processing always reaches a state in which no rule is eligible for execution. In the case of CA rules, termination is guaranteed if rule processing always reaches a state in which no rule has a true condition. Notice that, according to the semantics in Section 4.2, after the first execution of each rule r, r's condition is true again if and only if new data satisfy its condition. Hence, informally, rule processing does not terminate if and only if rules provide new data to each other indefinitely. This observation leads to our new technique for termination analysis presented in Section 5.1, which is based on detecting when a rule's action may cause another rule's condition to become true.

By contrast, ECA rules are characterized by the explicit definition of triggering events and by the evaluation of the rule's condition on the current database state (recall Section 4.1), which can yield substantially different rule behavior. Consider, for example, the CA rule [r.sub.3] defined in Section 4.2.1. After the rule has been considered once, its condition cannot become true again until either a new customer living in San Francisco is inserted into the database, or an existing customer moves to San Francisco. Consider now its ECA version defined in Section 4.1.1. The rule is triggered when any new customer is inserted or any existing customer moves to a different city. If its condition is true on the current database state (e.g., if more than 1000 customers already live in San Francisco), then its action is executed. Hence, although [r.sub.3]'s condition is syntactically identical in both rule versions, the ECA version of [r.sub.3] may be executed when its CA version would not. (On the contrary, when the CA version of [r.sub.3] is executed, the ECA version is always executed as well.) Hence, it is not possible to apply the analysis techniques for CA rules directly to ECA rules. However, it is possible to identify a common class of ECA rules, which we call quasi-CA rules, to which the analysis techniques for CA rules can be applied.

Quasi-CA rules are characterized in Section 5.2, where we also determine whether each rule in the ECA rule set presented in Section 4.1.1 is quasi-CA. Then, a general technique for analyzing termination of ECA rules is presented. This technique exploits our CA analysis technique for quasi-CA rules, and degenerates to a weaker analysis technique similar to Aiken et al. [1995] otherwise.

5.1 Termination Analysis for CA Rules

Termination analysis for CA rules is based on the notion of rule activation. We say that a rule [r.sub.i] may activate a rule [r.sub.j] if executing [r.sub.i]'s action may cause new data to satisfy [r.sub.j]'s condition. More precisely:

Definition 7. Consider two CA rules [r.sub.i]: [C.sub.i] [right arrow] [A.sub.i] and [r.sub.j]: [C.sub.j] [right arrow] [A.sub.j]. [r.sub.i] may activate [r.sub.j] if the execution of action [A.sub.i] can change the database from a state in which [Delta][C.sub.j] = 0 to a state in which [Delta][C.sub.j]. [is not equal to] 0.

We analyze termination by building an activation graph. In the graph, nodes represent rules, and directed edges indicate that one rule may activate the other. If there are no cycles in the graph, then rule processing is guaranteed to terminate [Baralis et al. 1993]. Hence, the core of termination analysis is determining when an edge should be included in the graph, that is, when one rule may activate another rule. The more accurately we can make this decision, the more accurately we can analyze termination.

We use our propagation algorithm to decide when an edge [r.sub.i] [right arrow] [r.sub.j] belongs in the activation graph. Note that rules may activate themselves, so [r.sub.i] = [r.sub.j] is included in the analysis. To determine if [r.sub.i] may activate [r.sub.j], we apply the propagation algorithm to [r.sub.j]'s condition C and [r.sub.i]'s action A. If the propagation algorithm yields insert or update operations, then the execution of [r.sub.i] may result in new data satisfying [r.sub.j]'s condition. Thus, [r.sub.i] may activate [r.sub.j], and the edge [r.sub.i] [right arrow] [r.sub.j] belongs in the graph. If only delete operations or no operations are produced by the propagation algorithm, then the execution of [r.sub.i] cannot result in new data for [r.sub.j]'s condition, and the edge is not included in the graph.

5.1.1 Examples. In this section we describe in detail the analysis of two rule pairs from the rule set in Section 4.2.1. We defer analysis of the complete rule set to Section 5.3.1.

Example 5. Consider rules bad-account ([r.sub.i]) and raise-rate ([r.sub.2]). Both rule conditions reference attribute rate and both rule actions update rate. Hence, intuitively, the two rules might activate each other indefinitely. We have shown in Example 4 that [r.sub.2]'s action may provide data to [r.sub.1]'s condition (since insert and update operations are produced by the propagation algorithm), thus the edge [r.sub.2] [right arrow] [r.sub.1] belongs in the activation graph. Now we use the propagation algorithm to determine if [r.sub.1] may activate [r.sub.2]. The input to the algorithm is

C = [[Pi].sub.rate] ([[Sigma].sub.rate [is greater than] 1 [conjunction] rate [is less than] 2]ACCOUNT) A = [E.sub.upd] = E[rate' = 0] [[Sigma].sub.balance [is less than] 500 [conjunction] rate [is greater than] 0]ACCOUNT).

The propagation of [E.sub.upd] through the selection operation in C yields

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Since predicates rate' [is greater than] 1 and (assignment) rate' = 0 are contradictory,(8) expressions [E'.sub.ins] and [E'.sub.upd] are not satisfiable and hence are discarded. The propagation of [E'.sub.del] through the projection operation in C yields

[E".sub.del] = [[Pi].sub.rate][E'.sub.del],

which is satisfiable. Thus, [r.sub.1]'s action may result in a deletion of tuples from [r.sub.2]'s condition. However, since neither an insert nor an update action is produced, [r.sub.1] cannot activate [r.sub.2], and the edge [r.sub.1] [right arrow] [r.sub.2] is not included in the activation graph. Hence, our analysis correctly concludes that [r.sub.1] and [r.sub.2] cannot activate each other indefinitely.

Example 6. Consider rules start-bad ([r.sub.4]) and end-bad ([r.sub.5]). Here again, the two rules might activate each other indefinitely. We use the propagation algorithm to determine if [r.sub.4] may activate [r.sub.5]. The input to the algorithm is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The propagation of [E.sub.ins] through the semijoin operation in C yields:

[E'.sub.ins] = ([Epsilon][start = today()]E[end = null]([[Pi].sub.num](low-bal [bar]?? ([[Sigma].sub.end=null]LOW-ACC)))) ?? high-bal

where low-bal is an abbreviation for [[Sigma].sub.balance [is less than] 500]ACCOUNT and high-bal is an abbreviation for [[Sigma].sub.balance [is greater than or equal to] 500]ACCOUNT. This expression is not satisfiable, since it requires a tuple with a given num value to satisfy both predicates balance [is less than] 500 and balance [is greater than or equal to] 500. Hence, [r.sub.4] cannot activate [r.sub.5], and the edge [r.sub.4] [right arrow] [r.sub.5] is not included in the activation graph. We conclude that [r.sub.4] and [r.sub.5] cannot activate each other indefinitely.

5.2 Quasi-CA Rules

Recall from Section 4.2 that a CA rule is activated when its condition becomes true as a result of modifications occurring during the rule's transition. We say that an ECA rule is activated when (a) it is triggered by an event produced during the rule's transition, and (b) its condition is true on the current database state. In order to apply to ECA rules the analysis technique for CA rules described in Section 5.1, ECA rules must behave identically to CA rules with respect to rule activation. We identify a class of ECA rules, which we call quasi-CA rules, that, despite having ECA execution semantics, are activated identically to CA rules. We feel that many (if not most) ECA rules are quasi-CA in practice, justified by active database case studies in Ceri and Widom [1990] and Baralis et al. [1995a].

Definition 8. An ECA rule r is quasi-CA if the following conditions hold for all possible rule transitions and the associated database states.

(1) If [Delta]C [is not equal to] 0, then r is triggered by the transition. (That is, rule r is always triggered when a corresponding CA rule would be activated.)

(2) If r is triggered by the transition and C [is no equal to] 0, then [Delta]C [is not equal to] 0. (That is, if rule r is triggered and its condition C is true on the current state, then a corresponding CA rule's condition would be true on the transition.)

We now identify classes of ECA rules that are guaranteed to be quasi-CA. We consider separately rules that do or do not reference delta relations in their condition. First consider an ECA rule r: {T}: C [right arrow] A that does not reference delta relations. For (1) to hold, the set of triggering events in r must include all modifications that may cause condition C to become true, that is, that may cause [Delta]C to be nonempty. We call these events the relevant events e(C) of condition C. Given a condition C, it is possible to derive the set e(C) from C; a technique to do so is described, for example, in Ceri and Widom [1990]. To guarantee (2), C must be nonempty only when it produces new tuples with respect to its previous evaluation during rule processing. Recall from Section 4.1 that C is evaluated on the current database state, independent of the previous (pretransition) state. Thus, it is not guaranteed that when r is triggered, [Delta]C [is not equal to] 0.(9) For (2) to hold, we must ensure that all data selected by C are modified or deleted by the transition. This would guarantee that if C [is not equal to] 0, then C is satisfied by new data, that is, [Delta]C [is not equal to] 0. The only rule we can be sure is executed during the transition is r. Hence, (2) holds if r's action A modifies or deletes all data selected by C.

To verify that A modifies or deletes all data selected by C, we apply the propagation algorithm to C and A: if the negative modifications [E.sup.-] (recall Section 3.1) produced by the propagation algorithm contain C (again, using the standard notion of query containment [Ullman 1989]), then we are guaranteed that all data in the result of C's query are deleted or modified by A. Recall from Section 3.5 that the propagation algorithm applied to a query Q and a modification M can produce a superset of the modifications on the result of Q due to the execution of M. However, the following theorem proves that if [E.sup.-] [contains or equal to] Q, then all data in the result of query Q are actually deleted or modified by M. This says that the "conservativeness" of the algorithm is not a problem when verifying (2) above.

THEOREM 3. Let M be a modification on a relation R, Q a query, and [E.sup.-] the negative modifications obtained by applying the propagation algorithm to Q and M. For all database states d, if M(d) [subset or equal to] R(d) (more precisely, if M = [E.sub.del], then [E.sub.del](d) [subset or equal to] R(d), while if M = [E.sub.upd], then [[Rho].sub.old]([E.sub.upd])(d) [subset or equal to] R(d)) and [E.sup.-](d) [contains or equal to] Q(d), then [E.sup.-](d) = Q(d).

PROOF. The proof is given by induction on the structure of Q.

Base Case: Q = R. Let M = [E.sub.del]. Then, [E.sup.-] = [E.sub.del]. If [E.sup.-](d) [contains or equal to] Q(d), then [E.sub.del](d) [contains or equal to] Q(d). By hypothesis, [E.sub.del](d) [subset or equal to] R(d). Thus [E.sub.del](d) = R(d). The case of M = [E.sub.upd] is proved analogously.

Induction Step: Let the root of Q be a unary (resp., binary) relational operator op over a subtree S with incoming negative modification [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (resp., over two subtrees S and S', of which S has incoming negative modifications [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]). By the induction hypothesis, we assume that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. We must show that if [E.sup.-](d) [contains or equal to] Q(d), then [E.sup.-](d) = Q(d). As an example, consider Q = [[Sigma].sub.p]S. Then, Q(d) = [[Sigma].sub.p](S(d)). Applying the propagation rule from the first line in Table IV, we obtain [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Hence, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. By the induction hypothesis, [E.sup.-](d) = [[Sigma].sub.p](S(d)) = Q(d). The other relational operators are verified similarly. []

The following theorem formally states when an ECA rule that does not reference delta relations is guaranteed to be quasi-CA.

THEOREM 4. Consider an ECA rule r : {T} : C [right arrow] A that does not reference delta relations. Let e(C) be C's relevant events, and let the application of the propagation algorithm to C and A produce negative modifications [E.sup.-]. If (a) e(C) [subset or equal to] T, and (b) C [subset or equal to] [E.sup.-], then rule r is quasi-CA.

PROOF. We must prove that when (a) and (b) hold, then both (1) and (2) in Definition 8 hold as well. Consider an arbitrary pair of database states and the transition between them. When [Delta]C [is not equal to] 0, by definition an event in e(C) is generated. By (a) above, e(C) [subset or equal to] T, hence r is always triggered. This guarantees (1). Consider (2). If r has not yet been executed, [C.sup.old] = 0. Hence, if r is triggered and C [is not equal to] 0, then [Delta]C [is not equal to] 0. If r has been executed, let r's transition be represented as [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], where d and d' are the database states immediately before and after r's previous execution, d" is the state on which r is evaluated, [t.sub.1] is the transition due to the execution of r's action A, and [t.sub.2] contains all database changes leading from d' to d". Then, by (b), all data selected by C were either deleted or updated by A in transition [t.sub.1]. Let C' be the result of evaluating r's condition on d'. Either (i) all data in [C.sup.old] have been deleted and C' = 0, or (ii) the changed data do not satisfy r's condition, so C' = 0, or (iii) the changed data still satisfy r's condition, and C' [is not equal to] 0, but also [Delta]C' = C'- [C.sup.old] [is not equal to] 0, since the data satisfying [C.sup.old] have been modified. In Cases (i) and (ii), if r is triggered and C [is not equal to] 0, then C is satisfied by tuples produced by [t.sup.2], and [Delta]C [is not equal to] 0. In Case (iii), by Condition (1) in Definition 8 (proved above), r must be triggered by A and Point (2) in Definition 8 is guaranteed. []

We now discuss when ECA rules containing references to delta relations are quasi-CA. We say that a condition C is incremental if it references delta relations and the following conditions hold (a) If C contains union operators, then all operands of unions are delta relations. (b) If C contains antisemijoin operators, then the first operand is a delta relation. These restrictions guarantee that, after C's evaluation, since its delta relations effectively become empty, C = 0. Let the set of relevant delta events [e.sub.[Delta]](C) for condition C be all events corresponding to delta relations that appear "positively" in C, that is, that do not occur within the second operand of an antisemijoin. If C is incremental, then C [is not equal to] 0 only if some delta relation positive in C is nonempty. Thus, to guarantee Point (1) in Definition 8, the set of triggering events in r must include [e.sub.[Delta]](C). If Condition C is incremental, then after C's evaluation C = 0; furthermore, only changes occurring after C is evaluated may cause C [is not equal to] 0 again. Thus (2) holds.

The following theorem states formally when an ECA rule that references delta relations is guaranteed to be quasi-CA.

THEOREM 5. Consider an ECA rule r: {T}: C [right arrow] A that references delta relations in its condition C. Let [e.sub.[Delta]](C) be C's relevant delta events. If (a) C is incremental, and (b) [e.sub.[Delta]](C) [subset or equal to] T, then rule r is quasi-CA.

PROOF. We must prove that when (a) and (b) hold, then both (1) and (2) in Definition 8 hold as well. After C is evaluated, all delta relations referenced by it effectively become empty. Since C is incremental, [Delta]C [is not equal to] 0 only if some tuple is produced in a delta relation positive in C. By (b) above, [e.sub.[Delta]](C) [subset or equal to] T, hence r is triggered. This guarantees (1) in Definition 8. Since C is incremental, Point (2) is also guaranteed: since [C.sup.old] = 0 at the beginning of the transition, when C [is not equal to] 0 we have [Delta]C [is not equal to] 0. []

5.2.1 Examples of Quasi-CA Rules. We now determine which of the ECA rules presented in Section 4.1.1 are quasi-CA.

[r.sub.1]: Rule bad-account is quasi-CA, since C is incremental and T = [e.sub.[Delta]](C).

[r.sub.2]: Rule raise-rate is quasi-CA, since C is incremental and T = [e.sub.[Delta]](C).

[r.sub.3]: Rule SF-bonus is not quasi-CA. Although T = e(C), action [E.sub.upd] only modifies data in relation ACCOUNT, which is not referenced in C. (Thus, the result of the propagation algorithm on C and [E.sub.upd] is trivially empty.)

[r.sub.4]: Rule start-bad is quasi-CA, since C is incremental and T = [e.sub.[Delta]](C).

[r.sub.5]: Rule end-bad is quasi-CA, since C is incremental and T = [e.sub.[Delta]](C).

[r.sub.6]: Rule decrease-bad is quasi-CA, since (1) C does not reference delta relations, (2) T = e(C),(10) and (3) [E.sub.upd] propagated on C produces an action [E'.sub.del] that contains C.

5.3 Termination Analysis for ECA Rules

Identical to CA rules, a quasi-CA rule is activated by changes to the database during the rule's transition. Hence, the propagation algorithm can be applied to verify activation of quasi-CA rules exactly as it was applied for CA rules (recall Section 5.1). For ECA rules that are not quasi-CA, the rule's condition may be true independent of the modification that caused the rule to become triggered. Hence, when verifying rule activation for these rules, it is necessary to conservatively assume that when a rule is triggered its condition is true. Thus, the activation of ECA rules in the general case can be verified only by considering triggering events and types of modifications. We call this analysis technique, which is described in Aiken et al. [1995] and Ceri and Widom [1990], event-action analysis. Our analysis approach--when applicable--is far less conservative than event-action analysis, since it exploits the algebraic structure of conditions and actions to accurately determine when edges belong in the graph. Hence, we propose a "mixed" termination analysis technique that exploits the stronger method provided by the propagation algorithm for quasi-CA rules and degenerates to the weaker event-action technique for ECA rules that are not quasi-CA.

Assume we have determined which rules in the rule set are quasi-CA. The activation graph is built similarly to Section 5.1, by considering each rule pair in the rule set and analyzing if the first rule activates the second rule. Consider an arbitrary potential edge [r.sub.i] [right arrow] [r.sub.j].

(1) If [r.sub.j] is quasi-CA, then the propagation algorithm can be used to detect activation, as justified in Section 5.2. If the propagation algorithm applied to [r.sub.i]'s action and [r.sub.j]'s condition returns insert or update actions, the edge is included in the activation graph.

(2) If [r.sub.j] is not quasi-CA, the edge is included in the activation graph if [r.sub.i] performs a modification type included in [r.sub.j]'s set of triggering events.

Note that in Case (1) we may need to apply the propagation algorithm to expressions containing delta relations. Incorporating the actual semantics of delta relations into the propagation algorithm appears to be very complex and not particularly effective. Thus, before applying the propagation algorithm, we substitute all references to delta relations by references to the corresponding relations. Because our analysis is performed at compile-time and does not consider actual database states, this substitution, although conservative, allows us to perform a correct analysis of quasi-CA rules.

5.3.1 Examples. Consider the rule set in Section 4.1.1. In Section 5.2.1, we have shown that rules [r.sub.1], [r.sub.2], [r.sub.4], [r.sub.5], and [r.sub.6] are quasi-CA, while rule [r.sub.3] is not. However, since rule [r.sub.3] has no incoming edges, the entire activation graph can be built using our analysis technique. We thus obtain the activation graph shown in Figure 5(a). By contrast, Figure 5(b) shows the corresponding triggering graph that would have been obtained by applying the event-action analysis technique. Observe that the activation graph generated with our technique does not contain any cycles; we correctly determine that the rules cannot activate each other indefinitely. However, in Figure 5(b) there are numerous cycles. Thus, by applying our technique, it is possible to conclude that rule execution terminates for this rule set, while the technique in Aiken et al. [1995] is unable to determine the same property.

[Figure 5 ILLUSTRATION OMITTED]

The same result presented in Figure 5(a) would have been obtained by applying our analysis technique to the complete CA rule set presented in Section 4.2.1.

6. CONFLUENCE ANALYSIS

At each iteration of the rule processing loop, there may be multiple rules eligible for execution: in the case of ECA rules more than one rule may be triggered, while in the case of CA rules more than one rule may have a true condition. A rule set is confluent if the final state of the database does not depend on which eligible rule is chosen for execution at any iteration.

To formally describe confluence and confluence analysis, we introduce the notion of a rule execution state and a rule execution sequence. Let R be the set of rules under consideration.

Definition 9. A rule execution state S is a pair (d, [R.sub.e]), where d is a database state and [R.sub.e] [subset or equal to] R is a set of eligible rules.

Definition 10. A rule execution sequence is a sequence [Sigma] of rule execution states linked by (executed) rules.

Definition 11. A rule execution sequence is complete if the last state is (d, 0); that is, the last state has no eligible rules.

Definition 12. A rule execution sequence is valid if it represents a correct execution sequence: only eligible rules are executed, and pairs of adjacent states properly represent the effect of executing the corresponding rule.

We now define confluence in terms of execution sequences.

Definition 13. A rule set is confluent if, for every initial rule execution state S, every valid and complete rule execution sequence beginning with S has the same final state.

We cannot use this definition directly to analyze confluence, since it requires the exhaustive verification of all possible execution sequences for all possible initial states. We give sufficient conditions for confluence based on the commutativity of rule pairs. Two rules [r.sub.i] and [r.sub.j] commute if, starting with any rule execution state S, executing [r.sub.i] followed by [r.sub.j] produces the same rule execution state as executing [r.sub.j] followed by [r.sub.i]. We formalize the concepts of rule "deactivation" and commutativity of rule actions, then we give conditions for rule commutativity.

We say that a rule [r.sub.i] may deactivate a rule [r.sub.j] when [r.sub.i]'s action may delete all data that satisfies [r.sub.j]'s condition.

Definition 14. Consider two CA rules [r.sub.i] : [C.sub.i] [right arrow] [A.sub.i] and [r.sub.j] : [C.sub.j] [right arrow] [A.sub.j]. [r.sub.i] may deactivate [r.sub.j] if the execution of action [A.sub.i] can change the database from a state in which [Delta][C.sub.j] [is not equal to] 0 to a state in which [Delta][C.sub.j] = 0. Now consider two ECA rules [r'.sub.i] : {[T.sub.i]} : [C.sub.i] [right arrow] [A.sub.i] and [r'.sub.j] : {[T.sub.j]} : [C.sub.j] [right arrow] [A.sub.j]. [r'.sub.i] may deactivate [r'.sub.j] if the execution of action [A.sub.i] can change the database from a state in which [C.sub.j] [is not equal to] 0 to a state in which [C.sub.j] = 0.

We also define when two rule actions commute.

Definition 15. Let [A.sub.i] and [A.sub.j] be two data modification operations (i.e., rule actions). [A.sub.i] and [A.sub.j] commute iff, for all database states, the execution of [A.sub.i] followed by [A.sub.j] and the execution of [A.sub.j] on followed by [A.sub.i] produce the same final database state.

We state sufficient conditions for rule commutativity in the following Lemma, whose proof is straightforward and only sketched here. For a complete proof, see Baralis [1994].

LEMMA 1. Distinct rules [r.sub.i] and [r.sub.j] commute if (1) [r.sub.i] cannot activate [r.sub.j]; (2) [r.sub.i] cannot deactivate [r.sub.j]; (3) Conditions (1) and (2) with i and j reversed; (4) [r.sub.i]'s action and [r.sub.j]'s action commute.

PROOF. (Sketch) Let [S.sub.1] = ([d.sub.1], [R.sub.e1]) be an arbitrary rule execution state in which rules [r.sub.i], [r.sub.j] [element of] [R.sub.e1]. We must prove that if Conditions (1) through (4) hold, then rule execution sequences [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] yield the same final execution state; that is, [S.sub.2] = [[bar]S.sub.2]. Let [S.sub.2] = ([d.sub.2], [R.sub.e2]) and [[bar]S.sub.2] = ([[bar]d.sub.2], [[bar]R.sub.e2]). [S.sub.2] = [[bar]S.sub.2] if [R.sub.e2] = [[bar]R.sub.e2] and [d.sub.2] = [[bar]d.sub.2]. [R.sub.e2] = [[bar]R.sub.e2] is guaranteed if (a) execution of rule [r.sub.i] does not reactivate [r.sub.j] after [r.sub.j] has already executed (guaranteed by Condition (1)); (b) execution of rule [r.sub.i] does not prevent [r.sub.j] from executing (guaranteed by Condition (2)); (c) same reversing [r.sub.i] and [r.sub.j] (guaranteed by Condition (3)); and (d) the set of rules activated and deactivated by the execution of [r.sub.i] and [r.sub.j] does not depend on their execution order (Condition (4) guarantees this by guaranteeing that the rule actions commute). Finally, [d.sub.2] = [[bar]d.sub.2] is guaranteed by Condition (4), which states that the order in which the two rules' actions are executed does not affect the final database state. []

Note that even though Conditions (1) through (4) are not necessarily satisfied when [r.sub.i] = [r.sub.j], it is the case that a rule always commutes with itself.

We now prove two lemmas, followed by the main theorem on confluence. The first lemma states, under the assumption of commutative rules, that two execution sequences with the same initial state and executed rules have the same final state. The second lemma states, again under the assumption of commutativity, that two sequences with the same initial state must have the same executed rules.

LEMMA 2. Let all pairs of rules in R commute. Let [[Sigma].sub.1] and [[Sigma].sub.2] be two valid and complete rule execution sequences with the same initial state, such that the same rules are executed in [[Sigma].sub.1] and [[Sigma].sub.2], although not necessarily in the same order. Then [[Sigma].sub.1] and [[Sigma].sub.2] have the same final state.

PROOF. Since [[Sigma].sub.1] and [[Sigma].sub.2] have the same executed rules, we can "permute" [[Sigma].sub.1] so that its rules are considered in the same order as [[Sigma].sub.2]. We exchange adjacent rules in [[Sigma].sub.1] one pair at a time; with each exchange of a pair [r.sub.i], [r.sub.j], there is no change to the outer two execution states (the states immediately preceding the execution of [r.sub.i] and following the execution of [r.sub.j]) due to commutativity. Hence, since [[Sigma].sub.1] and [[Sigma].sub.2] have the same initial state, they must have the same final state. []

LEMMA 3. Let all pairs of rules in R commute. Let [[Sigma].sub.1] and [[Sigma].sub.2] be two valid and complete rule execution sequences with the same initial state. Then the same rules are executed in [[Sigma].sub.1] and [[Sigma].sub.2].

PROOF. We again use commutativity to permute sequences without affecting outer execution states. In each sequence, we exchange rules one pair at a time until the rules appear in "sorted" order according to some criterion (the criterion is irrelevant as long as the same criterion is used for [[Sigma].sub.1] and [[Sigma].sub.2]). Suppose, for the sake of contradiction, that [[Sigma].sub.1] and [[Sigma].sub.2] have different executed rules, and consider the first point of divergence, that is, where a rule r appears in [[Sigma].sub.1] but a different rule r' appears in [[Sigma].sub.2]. Let S be the execution state preceding these rules; S is the same in [[Sigma].sub.1] and [[Sigma].sub.2], so r and r' are both activated in S. Without loss of generality, assume that r precedes r' in the sorted order. Then r cannot appear in [[Sigma].sub.2] beyond S. Consequently, execution of some rule other than r in [[Sigma].sub.2] must deactivate r. But this contradicts Condition (1) on commutativity. []

Based on these lemmas, the following theorem presents a sufficient condition to guarantee confluence of a rule set.

THEOREM 6. A rule set R is confluent if all pairs of rules in R commute.

PROOF. Suppose all rule pairs commute, and consider two valid and complete execution sequences [[Sigma].sub.1] and [[Sigma].sub.2] with the same initial state. By Lemma 6, [[Sigma].sub.1] and [[Sigma].sub.2] have the same set of executed rules. Then, by Lemma 6, [[Sigma].sub.1] and [[Sigma].sub.2] have the same final state. []

The requirement for confluence in Theorem 6 may seem rather strong, but there is no way to weaken this requirement without a more sophisticated conflict resolution policy or priorities among rules. We believe this argues for the importance of rule priorities. Notice also that, in the case where no rule can activate itself, the confluence requirement as stated in Theorem 6 trivially implies termination, since the pairwise commutativity of all rules includes the requirement that no rule activates another rule. However, if one or more rules can activate themselves, then confluence does not imply termination.

Commutativity of rule pairs forms the basis of most methods for analyzing confluence of database rules; see, for example, Aiken et al. [1995] and van der Voort and Siebes [1993]. The remainder of this section describes our technique for determining commutativity of rule pairs: CA rules are considered in Section 6.1 and ECA rules in Section 6.2. Needless to say, we use our propagation algorithm to analyze commutativity, exploiting the algebraic description of rule conditions and actions to yield a much more accurate analysis technique than, for example, Aiken [1995].

6.1 Commutativity Analysis for CA Rules

To guarantee commutativity of two rules [r.sub.i] and [r.sub.j], we must verify Conditions (1) through (4) in Lemma 1. For (1), we determine that [r.sub.i] cannot activate [r.sub.j] exactly as we have done for termination; recall Section 5.1. To verify Condition (2), which requires that [r.sub.i] cannot deactivate [r.sub.j], we must show that [r.sub.i]'s action [A.sub.i] cannot "take away" data from [r.sub.j]'s condition [C.sub.j]. It is easy to see that action [A.sub.i] can take away data from Condition [C.sub.j] only if the propagation algorithm applied to [A.sub.i] and [C.sub.j] produces a delete operation. Hence, one application of the propagation algorithm is sufficient for verifying (1) and (2). For (3), we reverse the roles of [r.sub.i] and [r.sub.j] in the analysis of (1) and (2).

For (4), we must determine if [r.sub.i]'s action [A.sub.i] can change the effect of [r.sub.j]'s action [A.sub.j] and vice versa. We first transform action [A.sub.j] into a query [C.sub.Aj] such that if the result of query [C.sub.Aj] cannot be affected by the execution of [A.sub.i], then [A.sub.i] cannot change the effect of action [A.sub.j]. We then apply the propagation algorithm to analyze [A.sub.i] and [C.sub.Aj]: if the algorithm produces 0, then [A.sub.i] cannot change the effect of [A.sub.j]; if the algorithm produces one or more insert, delete, or update operations, then [A.sub.i] may change the effect of [A.sub.j]. Then we reverse the roles of [A.sub.i] and [A.sub.j]. If the algorithm produces 0 again, then [A.sub.i] and [A.sub.j] commute.

Consider how query [C.sub.Aj] is derived from action [A.sub.j]. If [A.sub.j] is an insert operation, then [A.sub.j] = [E.sub.ins] is a query representing the inserted data, hence we let [C.sub.Aj] = [E.sub.ins]. Similarly, if [A.sub.j] is a delete operation, then [A.sub.j] = [E.sub.del] is a query representing the deleted data, and we let [C.sub.Aj] = [E.sub.del]. Suppose [A.sub.j] is an update operation on attribute A, defined by [E.sub.upd] = E[A' = expr][E.sub.c].(11) We start with the "selection condition" [E.sub.c]. [C.sub.Aj] is the projection of [E.sub.c] onto all attributes referenced within [E.sub.c], together with all attributes referenced in the E operation (both A and the attributes referenced in expr). If any of these attributes can be affected by the execution of [A.sub.i], then [A.sub.i] may change the effect of [A.sub.j]'s update; if not, then [A.sub.i] cannot change the effect of [A.sub.j]. By using the projection here, rather than the entire expression [E.sub.c], we ignore modifications to attributes that do not affect the evaluation of [E.sub.c] or the assignment of the new values to the updated attribute.

6.1.1 Examples. We apply our technique to analyze rule commutativity for the rule set defined in Section 4.2.1. In this section, we describe in detail the analysis of two rule pairs. We defer analysis of the complete rule set to Section 6.2.1.

Example 7. Consider rules bad-account ([r.sub.1]) and SF-bonus ([r.sub.3]). We first analyze the effect of [r.sub.1]'s action on [r.sub.3]. Since [r.sub.3]'s condition does not reference the relation updated by [r.sub.1], [r.sub.1]'s action trivially cannot affect [r.sub.3]'s condition. We use the propagation algorithm to analyze the effect of [r.sub.1]'s action on the query corresponding to [r.sub.3]'s action: [[Pi].sub.balance, rate, name, city] [E.sub.c]. The input to the algorithm is

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The propagation of [E.sub.upd] through the semijoin operation in C yields

[E'.sub.upd] = [E.sub.upd] ?? ([[Sigma].sub.city='SF']CUSTOMER).

The propagation of [E'.sub.upd] through the selection operation in C yields

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

In all three expressions, predicates balance [is greater than] 5000 and balance [is less than] 500 (the latter from [E.sub.upd]) are contradictory, so the expressions are unsatisfiable. Hence, the propagation algorithm produces no actions, and we conclude that executing [r.sub.1]'s action cannot change the effect of [r.sub.2]'s action.

A similar analysis reveals that [r.sub.3]'s action cannot affect [r.sub.1]'s action, and we have already shown in Example 3 that [r.sub.3]'s action cannot affect [r.sub.1]'s condition. Hence, we conclude that rules [r.sub.1] and [r.sub.3] commute.

Example 8. Consider rules start-bad ([r.sub.4]) and end-bad ([r.sub.5]). We have already shown in Example 6 that [r.sub.4]'s action cannot affect [r.sub.5]'s condition. An analogous analysis shows that [r.sub.4]'s action cannot affect the query corresponding to [r.sub.5]'s action: [[Pi].sub.num, end][E.sub.c]. Consider the effect of rule [r.sub.5] on rule [r.sub.4]. We first apply the propagation algorithm to [r.sub.5]'s action and [r.sub.4]'s condition. The input to the algorithm is

C = [[Pi].sub.num, balance (([[Sigma].sub.balance[is less than]500]ACCOUNT) [bar]?? ([[Sigma].sub.end=null] LOW-ACC))

A = [E.sub.upd]=E[end' = today()](([[Sigma].sub.end=null]LOW-ACC) ?? ([[Sigma].sub.balance[is greater than or equal to]500]ACCOUNT)).

The propagation of [E.sub.upd] through the selection operation yields

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

[E'.sub.ins] and [E'.sub.upd] are unsatisfiable, due to the contradiction of predicates end' = null and (assignment) end' = today(). The propagation of [E'.sub.del] through the ?? operation in C yields

[E'.sub.ins] = low-bal ?? (([[Sigma].sub.end=null]LOW-ACC) ?? high-bal),

where low-bal is an abbreviation for [[Sigma].sub.balance[is less than]500]ACCOUNT and high-bal is an abbreviation for [[Sigma].sub.balance[is greater than or equal to]500]ACCOUNT. This expression is not satisfiable, since it requires a tuple with a given num value to satisfy both predicates balance [is less than] 500 and balance [is greater than or equal to] 500. Thus, [r.sub.5]'s action cannot affect [r.sub.4]'s condition. An analogous analysis shows that [r.sub.5]'s action cannot affect the query corresponding to [r.sub.4]'s action, [E.sub.ins]. Hence, we conclude that rules [r.sub.4] and [r.sub.5] commute.

6.2 Commutativity Analysis for EGA Rules

Similar to termination analysis, we propose a "mixed" analysis technique to analyze commutativity of ECA rules, based on the propagation algorithm for quasi-CA rules and on a weaker event-action analysis for ECA rules that are not quasi-CA. Assume we have determined which rules in the rule set are quasi-CA. Commutativity is verified for a rule pair [r.sub.i], [r.sub.j] by checking the four conditions in Lemma 1. To verify Condition (1), we determine whether [r.sub.i] may activate [r.sub.j] exactly as we have done for termination analysis. For Condition (2), which requires that [r.sub.i] cannot deactivate [r.sub.j], we consider the type of rule [r.sub.j]: if [r.sub.j] is quasi-CA, we can apply the propagation algorithm to [r.sub.j]'s condition and [r.sub.i]'s action; if the algorithm does not produce a delete operation, then Condition (2) is satisfied. If [r.sub.j] is not quasi-CA, then similar to Section 5.3, it is necessary to conservatively assume that when [r.sub.j] is triggered its condition is true. Thus, a simple syntactic analysis is used to check if [r.sub.i] deletes from any relation for which insert or update modifications are included in [r.sub.j]'s triggering events. For Condition (3), we reverse the roles of [r.sub.i] and [r.sub.j] in the analysis of (1) and (2). For Condition (4), we must determine if [r.sub.i]'s action may change the effect of [r.sub.j]'s action and vice versa. Since action execution is identical in CA and ECA rules, the analysis here is identical as well.

6.2.1 Examples. Consider the rule set described in Section 4.1.1. Only rule [r.sub.3] is not quasi-CA, and for this rule event-action analysis is applied for the commutativity test. The result of the pairwise commutativity analysis of the entire rule set is presented in Figure 6. Figure 6(a) shows commutative (Y) and noncommutative (N) rule pairs obtained by applying our mixed analysis technique, while Figure 6(b) shows the result obtained by applying the event-action analysis technique in Aiken et al. [1995]. Note that by using our technique it is possible to determine the commutativity of several rule pairs (e.g., ([r.sub.1], [r.sub.3]), ([r.sub.4], [r.sub.5])) whose commutativity could not be determined with the method in Aiken et al. [1995]. Nevertheless, it is the case that some rule pairs do not commute and the rule set is not confluent.

[Figure 6 ILLUSTRATION OMITTED]

The same result presented in Figure 6(a) would have been obtained by applying our analysis technique to the complete CA rule set presented in Section 4.2.1.

7. IMPROVING THE ANALYSIS

In this section we describe improvements to the termination and confluence analysis techniques presented in the previous sections. In particular, in Section 7.1, we describe a technique to improve the application of the propagation algorithm, while in Section 7.2 we discuss how to improve the analysis of rule action commutativity when the propagation algorithm fails to detect commutativity.

7.1 Improving the Application of the Propagation Algorithm

Recall that a basic step in analyzing both termination and confluence is determining whether one rule can activate or deactivate another rule. This test is performed by applying the propagation algorithm to a rule [r.sub.1]'s action and a rule [r.sub.2]'s condition. The test can be made more accurate by taking into account the fact that--during rule processing--when [r.sub.1]'s action is executed, [r.sub.1]'s condition must be true. This improvement is described in Section 7.1.1. In addition, recall that analyzing confluence of a rule set requires checking commutativity for all pairs of rules (Section 6). Note, however, that if two rules [r.sub.1] and [r.sub.2] can never be eligible for execution at the same time (because their conditions are mutually exclusive), or if one action is guaranteed to have no effect when both rules are eligible, then we need not guarantee commutativity for these rules. This improvement is described in Section 7.1.2. We discuss the improvements in the context of CA rule analysis, but the same improvements apply to quasi-CA rules directly.

7.1.1 Activation and Deactivation. Recall from Section 3 that the propagation algorithm determines the effect on a query Q of executing a modification M. In performing termination and confluence analysis, we have used the propagation algorithm to determine the effect of executing a rule's action on a rule's condition. From the rule processing algorithm in Figure 3, we know that when a rule's action is executed its condition must be true, and we can exploit this knowledge in our analysis. More specifically, consider two CA rules [r.sub.1] : [C.sub.1] [right arrow] [A.sub.1] and [r.sub.2] : [C.sub.2] [right arrow] [A.sub.2]. We extend our application of the propagation algorithm to determine the effect of performing action [A.sub.1] on condition [C.sub.2] when [A.sub.1] is executed on a database state in which condition [C.sub.1] is true (and similarly for [A.sub.2] and [C.sub.1]). The knowledge that [C.sub.1] is true when [A.sub.1] executes may be used to determine that some of the actions obtained by the propagation of [A.sub.1] on [C.sub.2] are not satisfiable. Hence, it is sometimes possible to determine that, in the context of rule processing, [r.sub.1] cannot activate (resp., deactivate) [r.sub.2], even though action [A.sub.1] may cause condition [C.sub.2] to become true (resp., false) in the general case.

To incorporate this improvement, we first observe a fact about rule condition evaluation (recall Section 4.2). Informally, a rule's condition C is true whenever its transition [Delta]C [is not equal to] 0; that is, it produces tuples that were not produced at its last evaluation during rule processing. Since [Delta]C = C - [C.sup.old] by Definition 5, the tuples in [Delta]C are a subset of the tuples produced by the evaluation of C on the database state after the rule's transition.(12)

The improved application of the propagation algorithm is presented in Figure 7. This algorithm takes a query (condition) [C.sub.2] as input, a modification [A.sub.1], and a query [C.sub.1] known to be nonempty when [A.sub.1] is executed. Figure 7 uses the propagation algorithm described in Section 3 as a subroutine, denoted PA. Step (1) performs the propagation analysis described in Section 3, applying the propagation algorithm to query [C.sub.2] and modification [A.sub.1] to produce a set of modifications A'. The set of modifications A' can now be reduced by discarding modifications that are unsatisfiable when [C.sub.1] is nonempty. Before discarding modifications, the following applicability conditions are verified.

Fig. 7. Improved activation and deactivation analysis.. (1) A'= PA (C2,A1) (2) if not contains(CA1, C1) then (3) if PA (C1, A1) does not include delete modifications (4) for each Ai' in A' do (5) if contains (not (C1), CAi) then A' = A' - Ai'

--If modification [A.sub.1] modifies all data in condition [C.sub.1], then the set A' obtained by applying the propagation algorithm cannot be reduced. (This case occurs when, e.g., a rule's action modifies all the data selected by the rule's condition, a relatively common scenario.) We can verify this condition by testing if modification [A.sub.1] contains query [C.sub.1], using the classical notion of query containment [Ullman 1989]. This case is checked in Step (2), where [CA.sub.1] is the query obtained from modification [A.sub.1] as described in Section 6.1. Known algorithms [Levy and Sagiv 1993] are applied for the containment test, which is denoted as subroutine contains.(13)

--We cannot discard modifications if the execution of [A.sub.1] may cause the result of query [C.sub.1] to become empty. This case is checked in Step (3) by applying the propagation algorithm to [C.sub.1] and [A.sub.1] and verifying that the output does not include delete modifications.

The reduction, performed by Steps (4) and (5) in Figure 7, is based on the fact that a modification [A.sub.i] [element of] A' has no effect if the condition [CA.sub.i] obtained from [A.sub.i] (see Section 6.1) is not satisfiable. Thus, if [C.sub.1] [is not equal to] 0 ?? [CA.sub.i] = 0 holds, then [A.sub.i] will have no effect and it is eliminated from the output set A'. Checking if [C.sub.1] [is not equal to] 0 ?? [CA.sub.i] = 0 holds can again use query containment: [C.sub.1] [is not equal to] 0 ?? [CA.sub.i] = 0 when the negation of query [C.sub.1] contains query [CA.sub.i]. Step (5) in Figure 7 performs the containment test, which is repeated for all modifications in A'.

Example 9. Consider the following queries and modification that use our example relations from Section 2.2.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

where {1} is a constant relation containing only the element 1. Query [C.sub.1] is satisfied if there are no tuples with num [is greater than] 10 in relation ACCOUNT. (The relational expression for this query illustrates how negation at the outermost level can be represented in our algebraic language using Cartesian product and antisemijoin.) Modification [A.sub.1] deletes from relation LOW-ACC all tuples with num [is greater than] 10. Query [C.sub.2] selects all accounts in relation ACCOUNT for which there are no corresponding tuples in relation LOW-ACC. We show how the improved application of the propagation algorithm exploits the knowledge that [C.sub.1] is nonempty to determine that [A.sub.1] cannot affect [C.sub.2]; refer to Figure 7. The propagation algorithm applied to [C.sub.2] and [A.sub.1] (Step (1)) yields

[A'.sub.1] = [E'.sub.ins] = ACCOUNT ?? ([[Sigma].sub.num [is greater than] 10] LOW-ACC)

The condition associated with modification [E.sub.del] is the expression [E.sub.del] itself. It is trivial to see that [C.sub.1] ?? [E.sub.del] (Step (2)). Furthermore, the application of the propagation algorithm to [C.sub.1] and [A.sub.1] yields 0 (Step (3)). The condition associated with [E'.sub.ins] is [E'.sub.ins] itself. The negated version of [C.sub.1] contains [E'.sub.ins]. Hence, action [E'.sub.ins] is unsatisfiable (Steps (4) and (5)) and is discarded. Thus, modification [A.sub.1], executed from a state in which [C.sub.1] is nonempty, cannot affect query [C.sub.2].

7.1.2 Rule Commutativity. To analyze confluence of a rule set, each pair of rules in the set is tested for commutativity by verifying the conditions described in Section 6. However, it is not necessary for commutativity to hold if two rules cannot be eligible for execution together, that is, if their conditions cannot be true at the same time. Furthermore, even if both rule conditions are true, it may be that one rule's action is guaranteed to have no effect when the other rule's condition is true. Here again it is not necessary to test commutativity of the rule pair.

Consider two CA rules [r.sub.1] : [C.sub.1] [right arrow] [A.sub.1] and [r.sub.2] : [C.sub.2] [right arrow] [A.sub.2]. It is not necessary to perform the commutativity test for [r.sub.1] and [r.sub.2] when either (1) [C.sub.1] and [C.sub.2] cannot be true at the same time, or (2) if [C.sub.1] and [C.sub.2] are both true, then one of [A.sub.1] or [A.sub.2] has no effect. Condition (1) can be verified by checking if [C.sub.1] [conjunction] [C.sub.2] is satisfiable. If not, then [C.sub.1] and [C.sub.2] cannot be true at the same time. Checking Condition (2) is similar to the improvement described in Section 7.1.1: we verify whether either [C.sub.1] [is not equal to] 0 ?? [C.sub.A2] = 0 or [C.sub.2] [is not equal to] 0 ?? [C.sub.A1] = 0, where [C.sub.A1] and [C.sub.A2] are the conditions corresponding to [A.sub.1] and [A.sub.2], respectively. Consider [C.sub.1] [is not equal to] 0 ?? [C.sub.A2] = 0. As in Section 7.1.1, we must first check that the execution of [A.sub.1] cannot cause query [C.sub.1] to become empty. We do this by applying the propagation algorithm to [C.sub.1] and [A.sub.1] and verifying that its output does not include delete actions. Then, [C.sub.1] [is not equal to 0 ?? [C.sub.A2] = 0 holds if the negation of query [C.sub.1] contains query [C.sub.A2.]. [C.sub.2] [is not equal to] 0 ?? [C.sub.A1] = 0 is similarly verified.

Example 10. Consider two rules [r.sub.1] and [r.sub.2] with the following conditions, using our example relations from Section 2.2.

[C.sub.1] = [[Sigma].sub.rate [is greater than] 3]ACCOUNT

[C.sub.2] = {1} [bar]?? [[Pi].sub.1]({1} x ([[Sigma].sub.rate [is greater than] 4]ACCOUNT)),

where {1} is a constant relation containing only the element 1. Query [C.sub.1] selects all accounts in relation ACCOUNT with rate [is greater than] 3. Query [C.sub.2] is nonempty if there are no tuples with rate [is greater than] 4 in relation ACCOUNT. Clearly, the negation of [C.sub.1] contains [C.sub.2]. Thus, in analyzing confluence of a rule set containing [r.sub.1] and [r.sub.2], commutativity is not required for this rule pair.

7.2 Improving the Analysis of Action Commutativity

One important component in determining whether two rules are commutative is to test whether the rule's actions commute. Recall from Section 6 that we can use the propagation algorithm to check commutativity of rule actions, which are database modifications, as follows. To determine if two modifications [M.sub.1] and [M.sub.2] commute, we must check if they may operate on the same data. We first rewrite [M.sub.1] as a query selecting data to be modified (recall Section 6.1) and we propagate data modification [M.sub.1] on it. Then we reverse the roles of [M.sub.1] and [M.sub.2]. If the propagation algorithm returns an empty result in both cases, modifications [M.sub.1] and [M.sub.2] are guaranteed to commute. Our propagation algorithm approach is sufficient for guaranteeing commutativity of modifications. However, there are several special cases in which, although the modifications [M.sub.1] and [M.sub.2] may modify the same data, and thus applying the propagation algorithm produces a "may not commute" answer, [M.sub.1] and [M.sub.2] actually do commute. We consider cases when (1) [M.sub.1] and [M.sub.2] are both delete operations; (2) [M.sub.1] is an update operation and [M.sub.2] is a delete operation; and (3) [M.sub.1] and [M.sub.2] are both update operations.

Intuitively, an important requisite for two modifications [M.sub.1] and [M.sub.2] performed on the same relation R to commute is that [M.sub.1] will not read tuples modified by [M.sub.2] to select the tuples on which it operates (although it may modify these tuples), and vice versa. To guarantee commutativity of two delete operations, it is sufficient that neither modification evaluates aggregates on R, while more stringent conditions are required in the case of the other operation pairs. The conditions are formalized below.

The remaining data modification pairs, which always involve an insert operation, do not commute whenever the modifications may operate on overlapping data. Thus, there are no opportunities for improvement over the commutativity analysis techniques already presented. In the remainder of this section, we describe how commutativity analysis can be refined in the other cases mentioned above, and we provide some examples. Proofs are given in Appendix A.

7.2.1 Pair of Delete Modifications. Consider two delete operations [E.sub.del1] and [E.sub.del2], both defined on the same relation R. If [E.sub.del1] and [E.sub.del2] do not contain aggregates on attributes of R, then [E.sub.del1] and [E.sub.del2] commute, even though in some cases the outcome of applying the propagation algorithm is "may not commute." The following example provides intuition.

Example 11. Consider relation ACCOUNT from Section 2.2 and the following delete operations:

[E.sub.del1] = [[Sigma].sub.balance [is less than]0]ACCOUNT

[E.sub.del2] = [[Sigma].sub.rate=0]ACCOUNT.

The application of the propagation algorithm to the query obtained from [E.sub.del1], which is [E.sub.del1], and the delete operation [E.sub.del2] yields

[E'.sub.del2] = [[Sigma].sub.balance [is less than] 0][[Sigma].sub.rate=0]ACCOUNT,

which is satisfiable. An analogous result is produced by reversing the roles of [E.sub.del1] and [E.sub.del2]. This leads to the conclusion that the two delete operations may not commute. However, the operations do commute. Suppose [E.sub.del1] is performed first. It deletes from ACCOUNT all tuples with balance [is less than] 0. Then [E.sub.del2] is executed. [E.sub.del2] selects for deletion all original tuples with rate = 0 (which it would have selected if it executed first) minus the tuples in [E.sub.del1] [intersection] [E.sub.del2] (tuples with both balance [is less than] 0 and rate = 0), which have already been deleted by [E.sub.del1]. Hence, after [E.sub.del2] is executed, the tuples in [E.sub.del1] [union] [E.sub.del2] (i.e., the tuples that either have balance [is less than] 0 or rate = 0) are deleted from ACCOUNT. The situation is analogous if we reverse the execution order of [E.sub.del1] and [E.sub.del2]. Thus, the two modifications commute.

The following theorem is proved in Appendix A.

THEOREM 7. Let [E.sub.del1] and [E.sub.del2] be two delete operations on the same relation R. If neither operation contains aggregations over attributes of R, then [E.sub.del1] and [E.sub.del2] commute.

7.2.2 Pair of Update and Delete Modifications. Consider a delete operation [E.sub.del] and an update operation [E.sub.upd] on the same relation R, neither including any aggregates on attributes of R. If the attributes updated by [E.sub.upd] are not referenced in any predicate in [E.sub.del], then [E.sub.del] and [E.sub.upd] commute, even though in some cases the outcome of applying the propagation algorithm is "may not commute." The following example provides intuition.

Example 12. Consider relation ACCOUNT from Section 2.2 and the following delete and update operations:

[E.sub.del] = [[Sigma].sub.balance [is less than] 0]ACCOUNT

[E.sub.upd] = E[rate' = rate + 1][[Sigma].sub.num [is less than]100]ACCOUNT.

The application of the propagation algorithm to the query obtained from [E.sub.upd] ([[Pi].sub.num, rate][[Sigma].sub.num [is less than] 100]100]ACCOUNT) and to modification [E.sub.del] yields

[E'.sub.del] = ([[Pi].sub.num, rate][[Sigma].sub.num [is less than] 100][[Sigma].sub.balance [is less than] 100]ACCOUNT),

which is satisfiable. This leads to the conclusion that the two operations may not commute, while they actually do commute. Let us analyze the effect of applying the two modifications in either order. Let [E.sub.del] be performed first and delete from ACCOUNT all tuples with balance [is less than] 0. [E.sub.upd] then selects for the update operation the original tuples it would have selected if it executed first (with num [is less than] 0), minus the tuples with num [is less than] 0 and balance [is less than] 0, which have already been deleted by [E.sub.del]. Thus, after both modifications have been performed, the tuples with balance [is less than] 0 are deleted from the database, while the tuples with num [is less than] 0 but balance [is greater than or equal to] 0 have been updated. If [E.sub.upd] is executed first, the tuples with num [is less than] 0 are updated. [E.sub.del] then selects the tuples to be deleted without reading attribute rate, which has been updated by [E.sub.upd]. Thus, it selects the same tuples (with balance [is less than] 0) it would have selected if it executed first, and it deletes them from relation ACCOUNT. Hence the two modifications commute.

The following theorem is proved in Appendix A.

THEOREM 8. Let [E.sub.del] be a delete operation and [E.sub.upd] = E[[A'.sub.u], = e][E.sub.c] an update operation expressed on the same relation R. [E.sub.del] and [E.sub.upd] commute if(a) [A.sub.u] [is not an element of] [A.sub.p], where [A.sub.p] is the list of attributes referenced by [E.sub.del] and [A.sub.u] is the updated attribute,(14) and (b) [E.sub.del] and [E.sub.upd] contain no aggregates on attributes of R.

7.2.3 Pair of Update Modifications. Consider two update operations [E.sub.upd1] and [E.sub.upd2] on the same attribute A in the same relation R such that neither operation includes aggregates on attributes of R. If neither condition [E.sub.c1] or [E.sub.c2] reads attribute A and the two modifications to A can be applied in either order, then [E.sub.upd1] and [E.sub.upd2] commute, even though in some cases the outcome of applying the propagation algorithm is "may not commute." The following example provides intuition.

Example 13. Consider relation ACCOUNT from Section 2.2 and the following update operations:

[E.sub.upd1] = E[rate' = rate + 2][E.sub.c1], where [E.sub.c1] = [[Sigma].sub.balnce [is less than] 2000]ACCOUNT

[E.sub.upd2] = E[rate' = rate - 1][E.sub.c2], where [E.sub.c2] = [[Sigma].sub.num [is less than] 0]ACCOUNT.

The application of the propagation algorithm to the query obtained from [E.sub.upd2] ([[Pi].sub.num, rate][E.sub.c2]) and [E.sub.upd1] yields

[E'.sub.upd] = [[Pi].sub.num, rate][[Sigma].sub.num [is less than]0][E.sub.upd1],

which is satisfiable. (A similar result is obtained reversing the roles of [E.sub.upd1] and [E.sub.upd2].) This leads to the conclusion that the two operations may not commute, while they actually do commute. Consider the effect of performing the two updates. Let [E.sub.upd1] be performed first. It updates all tuples with balance [is greater than] 2000 (selected by its condition [E.sub.c1]), increasing their rate by 2. Then [E.sub.upd2] is performed. Its condition [E.sub.c2] neither reads the updated attribute rate nor performs aggregates on attributes of ACCOUNT. Thus, it selects for update the same tuples with num [is less than] 10 it would have selected if it executed first. [E.sub.upd2] decrements the rate by 1 for all selected tuples. After the two updates have been performed, the tuples with num [is less than] 10 and balance [is greater than] 2000 have been incremented by 1, the tuples with num [is greater than or equal to] 10 and balance [is greater than] 2000 have been incremented by 2, and the tuples with num [is less than] 10 and balance [is less than or equal to] 2000 have been decremented by 1. Reversing the roles of [E.sub.upd1] and [E.sub.upd2] yields the same final result, thus the two update operations commute.

The following theorem is proved in Appendix A.

THEOREM 9. Let [E.sub.upd1] and [E.sub.upd2] be two update operations on relation R, where

[E.sub.upd1] = E[A' = [f.sub.1](A, [B.sub.1], ..., [B.sub.n])][E.sub.c1]

[E.sub.upd2] = E[A' = [f.sub.2](A, [B.sub.1], ..., [B.sub.m])][E.sub.c2]

and where [E.sub.c1] and [E.sub.c2] are relational expressions selecting the data to be updated in R. [E.sub.upd1] and [E.sub.upd2] commute if (a) [f.sub.1]([f.sub.2](A, [B.sub.1], ..., [B.sub.m]), [B.sub.1], ..., [B.sub.n]) = [f.sub.2]([f.sub.1](A, [B.sub.1], ..., [B.sub.n]), [B.sub.1] ..., [B.sub.m]); (b) the updated attribute [A [is not an element of] [A.sub.c1] and A [is not an element of] [A.sub.c2], where [A.sub.c1] and [A.sub.c2] are lists of attributes referenced by [E.sub.c1] and [E.sub.c2], respectively; and (c) [E.sub.c1] and [E.sub.c2] do not contain aggregates on attributes of R.

8. PREVIOUS RELATED WORK

Most previous work on rule analysis in active database systems has focused either on CA rules or on ECA rules, and much of the work has been tailored to specific rule languages. Our work covers both ECA and CA rules, and it relies on relational algebra rather than a particular language.

In Hellerstein and Hsu [1991] and Zhou and Hsu [1990] methods are given for analyzing condition-action rules. The goal of their work is to impose restrictions on rule sets so that confluence (a "unique fixed point" in their model) is guaranteed. By contrast, we provide techniques for analyzing the behavior of arbitrary rule sets. In addition, in Aiken et al. [1995] it has been shown that the methods in Hellerstein and Hsu [1991] and Zhou and Hsu [1990] are subsumed by their techniques, which in turn are subsumed by the methods presented here. The methods in Aiken et al. [1995] are developed in the context of the Starburst rule system, which uses an ECA rule model. As discussed in Sections 5 and 6, the technique in Aiken et al. [1995] for analyzing rule interaction relies on a shallow comparison of the type of modification performed by one rule's action and the triggering events, condition, and action of another rule (event-action analysis). Our analysis technique is computationally more expensive, since it requires detecting the (un)satisfiability of relational expressions, while the event-action analysis only requires comparing sets of events. However, the methods presented in this article allow us to significantly improve the accuracy of the result (see Sections 5 and 6) by using a formal algebraic model to analyze the interaction between rules.

In an initial report [Baralis et al. 1993] we applied our approach only to termination analysis of CA rules. Subsequently, in Baralis and Widom [1994] we refined the techniques in Baralis et al. [1993]--still limited to CA rules--and we considered both termination and confluence. In this article we have presented a complete framework for the analysis of both CA and ECA rules. Furthermore, we have extended the analysis techniques from Baralis et al. [1993] to explicitly take into account the knowledge that a rule's condition is satisfied when the rule's action executes, and we have extended our algorithm for determining when rule actions commute. We have also provided a proof of the correctness of the propagation algorithm.

In other related work, van der Voort and Siebes [1993] analyze rule behavior in the context of object-oriented active database systems. Their work focuses on differences between instance-oriented and set-oriented rules and on decidability properties for rule analysis. Their rule model is fairly restrictive, in that rule actions (methods in their model) can only modify data selected by the corresponding rule condition, and deletions and insertions appear to be disallowed. The properties of confluence and of termination within some fixed number of steps are shown to be decidable using an approach based on "typical databases"; a typical database contains all possible data instances that could affect the outcome of rule processing. The rule set is "run" over the typical database and the outcome is checked for the desired properties. This approach is clearly infeasible in practical applications, so lower complexity algorithms are proposed, but the details and applicability of these algorithms are not clarified.

A rather different approach to active rule analysis is taken in Karadimce and Urban [1994], where ECA rules are reduced to term rewriting systems, and known analysis techniques for termination and confluence of term rewriting systems are applied. The analysis in Karadimce and Urban [1994] is based on an object-oriented data model and an instance-oriented rule execution model. Their approach is powerful, since it exploits the body of work on conditional term rewriting systems (CTRS), but its implementation appears to be complex even for small rule applications. Because the details of the rule language are not presented, it is unclear whether a general relational rule model such as ours can be expressed as a term rewriting system. Termination of a CTRS is proven by finding a well-founded ordering on terms in the CTRS, which can be a difficult task. In fact, in Karadimce and Urban [1994] it was not possible to prove termination for a rule set that enforces referential integrity constraints; termination of the same rule set (rewritten in relational algebra) can be proven with our technique. Confluence analysis is based on finding contextual critical pairs (terms that can be rewritten in two different ways, equivalent to rules triggered by the same events on the same classes) and analyzing their feasibility. Infeasibility is determined by detecting that two rule conditions cannot be true at the same time, which in most cases is harder to prove than rule commutativity.

Work in Baralis et al. [1996] is focused on termination analysis. Active rules are grouped into modules; termination of rule execution within each module is assumed, and intermodule termination is analyzed. The termination analysis techniques in this article are complementary to those presented in Baralis et al. [1996], since the techniques presented here would be applied to the analysis of rules within a module. In the degenerate case of a single module, the techniques in Baralis et al. [1996] are strictly weaker than those presented in this article.

In Baralis et al. [1998] a technique is proposed to exploit the complementary information provided by triggering graphs and activation graphs to analyze termination of ECA rule sets. Techniques for constructing the graphs are assumed. Although building a triggering graph is straightforward, building an accurate activation graph is not. The techniques presented in this article for analyzing rule activation can be used to build the activation graph needed in Baralis et al. [1998].

Work in Weik and Heuer [1995] presents a technique for termination analysis of ECA rules in the context of OSCAR, an object-oriented active database system. Although the data and rule model in OSCAR are quite different from ours, the analyzed rules correspond roughly to our quasi-CA rules with delta relations (see Section 5.2). The analysis technique in Weik and Heuer [1995] consists of two stages. The results obtained in the first stage are slightly less accurate than ours (e.g., the technique in Weik and Heuer [1995] fails to detect nonactivation of delete actions on select-project-join conditions). The second stage does better than our technique by detecting nonactivation due to incremental (resp., decremental) updates with an upper (resp., lower) bound. A simple modification to our algorithms could detect this type of condition. Note, however, that going beyond analyzing pairs of rules using the extended technique is rather complex. In fact, there are cases in which the technique described in Weik and Heuer [1995] fails to detect unbounded execution of a rule r when two other rules loop infinitely and periodically reactivate r.

A refinement technique for triggering graphs is proposed in Karadimce and Urban [1996] and Tschudi et al. [1997] in the context of the CDOL active, deductive, and object-oriented database language. Given two rules [r.sub.1] and [r.sub.2], a triggering formula is defined by considering both [r.sub.1] and [r.sub.2]'s conditions, the type of action performed by [r.sub.1], and the event of [r.sub.2]. Then, if this formula is not satisfiable, the graph edge from [r.sub.1] to [r.sub.2] is removed. This approach is subject to the strong limitation that variables in the formula when connecting [r.sub.1] and [r.sub.2]'s conditions must be database-independent, since other rule executions may take place between [r.sub.1] and [r.sub.2]. This limitation is overcome in some very specific cases. Furthermore, in Karadimce and Urban [1996] and Tschudi et al. [1997], the algebraic structure of the executed action is not taken into account. Hence, as suggested in Karadimce and Urban [1996] and Tschudi et al. [1997], this approach can be exploited as a preparatory step, before computing our activation graph to further reduce the set of edges on which to perform the computation.

A further extension in the same direction is provided in Lee and Ling [1998], where a path-removing technique for reducing triggering graphs is proposed. The method considers together the conditions of long triggering sequences, called activation formulas. Similarly to Karadimce and Urban [1996] and Tschudi et al. [1997], it is necessary to guarantee that the execution of rules outside the triggering sequence cannot unpredictably change the database state. Hence, only nonupdatable predicates (predicates that cannot be updated by any trigger execution) can be included in the activation formula. Since this condition severely limits the applicability of the technique, the selection of "finitely-updatable predicates" (i.e., predicates that can be updated only a finite number of times by trigger processing) is proposed. This approach, although powerful, is applied to a rule language that is less powerful than ours, since rule actions cannot be generic SQL DML statements.

In Comai and Tanca [1997], active rules are seen as a derivative of deductive rules. ECA rules are translated into logical clauses, taking into account their execution semantics, and known results for termination and confluence of extended Datalog programs are applied. With this approach, it is possible to show the effect of a specific execution semantics in the analysis of termination and confluence. To allow the translation of active rules into logical clauses, some limitations on the expressive power of the rule language being considered are imposed (e.g., a rule's action cannot contain a condition).

In traditional expert systems, that is, production rule systems such as OPS5 [Brownston et al. 1985], predicting properties such as termination and confluence appears to be of less importance than in the database environment; consequently, there has been less work on rule analysis in traditional expert systems than in active database systems. Work in Tsai and Cheng [1994] addresses termination analysis in the context of realtime rule-based systems, with the goal of guaranteeing bounded response time. In this work, a method based on an enable-rule graph is proposed to analyze OPS5 rule sets. The enable-rule graph is analogous to our activation graph for condition-action rules [Baralis et al. 1993; Baralis and Widom 1994]. In Tsai and Cheng [1994], a further step is performed: when a cycle is detected, the "enabling condition" for the cycle is used to automatically generate extra rules that break execution cycles. Cycles are broken either by rolling back or deleting from the working memory all instances causing the loop. While the basic analysis technique in Tsai and Cheng [1994] is similar to ours, it applies to a more restricted rule language.

Our propagation algorithm is closely related to the problem of independence of queries and updates, addressed in, for example, Elkan [1990] and Levy and Sagiv [1993]. Levy and Sagiv [1993], which subsumes Elkan [1990], gives an algorithm for determining if the outcome of a query, expressed as a Datalog program, can be affected by a given insertion or deletion. The algorithm described in Levy and Sagiv [1993] applies to more general queries than we consider here (e.g., recursive queries), but it returns a Boolean "affects/does-not-affect" answer, while our propagation algorithm returns a relational expression describing the actual effect. For analyzing active database rules, when a query and update are not independent, we need to know whether the update adds to, removes from, or modifies the result of the query. With appropriate extensions, this information could be returned by the algorithm in Levy and Sagiv [1993] as well, in which case their algorithm could be used in our rule analysis techniques in place of our propagation algorithm.

Finally, our propagation algorithm is somewhat related to incremental evaluation, as in, for example, Baralis and Widom [1995]; Libkin and Trickey [1997]; Qian and Wiederhold [1991]; and Rosenthal et al. [1989]. Both problems address the effect of a data modification on a relational expression. However, incremental evaluation techniques are designed for runtime, when the actual modifications are known, while our techniques apply at compile-time, when the modifications are represented as relational expressions.

9. CONCLUSIONS

We have developed a general framework for static analysis of both condition-action (CA) and event-condition-action (ECA) active database rules. We have provided a representation of active rules based on a generic extended relational algebra, which allows us to encode rules from most relational active database rule languages and the SQL:1999 trigger language. We have presented our propagation algorithm for analyzing the interactions between queries and modifications and discussed its correctness. We have shown how the propagation algorithm can be applied to check termination and confluence for sets of CA and ECA rules, including a number of refinements to its most straightforward application. Our techniques improve considerably upon previous methods because our formal approach allows us to exploit the semantics of conditions and actions to analyze the interaction between rules. Note that the methods we present are also applicable to rule languages that "pass data" from the condition to the action (e.g., Gordin and Pasik [1991] and Hanson [1992]), since our algorithm detects the actual modifications to rule conditions (inserts, deletes, and updates), not simply the transition between true and false. As in Aiken et al. [1995], our analysis techniques identify the responsible rules when termination or confluence is not guaranteed. Hence, our techniques can be used as the kernel of an interactive development tool that helps rule designers develop sets of rules that are guaranteed to have the desired properties[Baralis et al. 1995a].

Possible future topics include the following:

--Extending our rule model and analysis techniques to incorporate conflict resolution based on rule priorities [Widom and Ceri 1996]. Priorities restrict the possible execution sequences of rules, making the analysis more complex but sometimes more precise. Coupling our accurate analysis of rule interactions with the priority-based methods in Aiken et al. [1995] should immediately produce a quite powerful analysis method for prioritized rules.

--Using the propagation algorithm to determine when it is guaranteed that a rule [r.sub.1] always activates another rule [r.sub.2] (i.e., when executing [r.sub.1]'s action always inserts into or updates the result of [r.sub.2]'s condition). This analysis is a straightforward modification of the algorithm for improving rule analysis presented in Section 7.1. The results of this analysis would be useful, for example, in performing compile-time and runtime optimizations of rule processing.

--Extending our technique to incorporate the semantics of before-triggers as in SQL:1999.

The techniques presented in this article have been used as the basis of an initial prototype rule analysis tool, implemented in the context of the Chimera project [Widom and Ceri 1996]. While the tool does not include all of the analysis techniques covered in this article, it does use a number of our methods and will be extended in the future; see Baralis et al. [1995a] for details.

ACKNOWLEDGMENTS

We are grateful to members of the Stanford Database Group, especially Ashish Gupta, Dallan Quass, and Jeff Ullman, for lively and useful discussions about this work, and to Stefano Ceri for providing the technical impetus and enabling the collaboration.

REFERENCES

AIKEN, A., HELLERSTEIN, J. M., AND WIDOM, J. 1995. Static analysis techniques for predicting the behavior of active database rules. ACM Trans. Database Syst. 20, 1 (Mar.), 3-41.

BARALIS, E. 1994. An algebraic approach to the analysis and optimization of active database rules. Ph.D. Dissertation.

BARALIS, E., CERI, S., AND PARABOSCHI, S. 1995a. ARACHNE: A tool for the analysis of active rules. In Proceedings of the Second International Conference on Applications of Databases (San Jose, CA, Dec.).

BARALIS, E., CERI, S., AND PARABOSCHI, S. 1995b. Run-time detection of non-terminating active rule systems. In Proceedings of the 4th International Conference on Deductive and Object-Oriented Databases (Singapore, Dec.). Springer-Verlag, New York, NY, 38-54.

BARALIS, E., CERI, S., AND PARABOSCHI, S. 1996. Modularization techniques for active rules design. ACM Trans. Database Syst. 21, 1, 1-29.

BARALIS, E., CERI, S., AND PARABOSCHI, S. 1998. Compile-time and run-time analysis of active behaviors. IEEE Trans. Knowl. Data Eng. 10, 3 (May-June), 353-370.

BARALIS, E., CERI, S., AND WIDOM, J. 1993. Better termination analysis for active databases. In Proceedings of the First International Workshop on Rules in Database Systems (Edinburgh, Scotland, Aug.). 163-179.

BARALIS, E. AND WIDOM, J. 1994. An algebraic approach to rule analysis in expert database systems. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'94, Santiago, Chile, Sept.). VLDB Endowment, Berkeley, CA, 475-486.

BARALIS, E. AND WIDOM, J. 1995. Using delta relations to optimize condition evaluation in active databases. In Proceedings of the Second International Workshop on Rules in Database Systems (Athens, Greece, Sept.). 292-308.

BROWNSTON, L., FARRELL, R., KANT, E., AND MARTIN, N. 1985. Programming Expert Systems in OPS5: An Introduction to Rule-Based Programming. Addison-Wesley series in artificial intelligence. Addison-Wesley Longman Publ. Co., Inc., Reading, MA.

CERI, S., CRESPI-REGHIZZI, S., LAMPERTI, L., LAVAZZA, L., AND ZICARI, R. 1990. Algres: An advanced database for complex applications. IEEE Software.

CERI, S., FRATERNALI, P., PARABOSCHI, S., AND TANCA, L. 1994. Automatic generation of production rules for integrity maintenance. ACM Trans. Database Syst. 19, 3 (Sept.), 367-422.

CERI, S. AND GOTTLOB, G. 1985. Translating SQL into relational algebra: optimization, semantics, and equivalence of SQL queries. IEEE Trans. Softw. Eng. SE-11, 4 (Apr.), 324-345.

CERI, S. AND WIDOM, J. 1991. Deriving production rules for incremental view maintenance. In Proceedings of the 17th Conference on Very Large Data Bases (Barcelona, Spain, Sept.). VLDB Endowment, Berkeley, CA, 577-589.

CERI, S. AND WIDOM, J. 1990. Deriving production rules for constraint maintenance. In Proceedings of the 16th International Conference on Very Large Data Bases (VLDB, Brisbane, Australia, Aug. 13-16), D. McLeod, R. Sacks-Davis, and H. Schek, Eds. Morgan Kaufmann Publishers Inc., San Francisco, CA, 566-577.

COMAI, S. AND TANCA, L. 1997. Using the properties of datalog to prove termination and confluence in active databases. In Proceedings of the Third International Workshop on Rules in Database Systems (Skovde, Sweden, June). 100-117.

DAYAL, U., HSU, M., AND LADIN, R. 1990. Organizing long-running activities with triggers and transactions. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD '90, Atlantic City, NJ, May 23-25), H. Garcia-Molina, Chair. ACM Press, New York, NY, 204-214.

DELCAMBRE, L. AND ETHEREDGE, J. 1989. The relational production language: A production language for relational databases. In Proceedings from the Second International Conference on Expert Database Systems (Redwood City, CA), L. Kerschberg, Ed. Benjamin/Cummings, Redwood City, CA, 333-351.

EISENBERG, A. AND MELTON, J. 1999. SQL: 1999, formerly known as SQL3. SIGMOD Rec. 28, 1 (Mar.).

ELAN, C. 1990. Independence of logic database queries and update. In Proceedings of the Ninth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (PODS '90, Nashville, TN, Apr. 2-4), D. J. Rosenkrantz and Y. Sagiv, Chairs. ACM Press, New York, NY, 154-160.

GORDIN, D. AND PASIK, A. 1991. Set-oriented constructs: From Rete rule bases to database systems. In Proceedings of the 1991 ACM SIGMOD International Conference on Management of Data (SIGMOD '91, Denver, CO, May 29-31), J. Clifford and R. King, Eds. ACM Press, New York, NY, 60-67.

HANSON, E. 1992. Rule condition testing and action execution in Ariel. In Proceedings of the ACM-SIGMOD Conference on Management of Data (ACM SIGMOD, San Diego, CA, June). ACM, New York, NY, 49-58.

HELLERSTEIN, J. AND HSU, M. 1991. Determinism in partially ordered production systems. Report RJ 8009 (March).

KARADIMCE, A. AND URBAN, S. 1996. Refined triggering graphs: A logic-based approach to termination analysis in an active object-oriented database. In Proceedings of the 12th IEEE International Conference on Data Engineering (ICDE '97, New Orleans, LA, Feb.). IEEE Press, Piscataway, NJ, 384-391.

KARADIMCE, A. AND URBAN, S. 1994. Conditional term rewriting as a formal basis for analysis of active database rules. In Proceedings of the Fourth International Workshop on Research Issues in Data Engineering (RIDE-ADS '94, Houston, TX, Feb.). 156-162.

KLUG, A. 1982. Equivalence of relational algebra and relational calculus query languages having aggregate functions. J. ACM 29, 3 (July), 699-727.

KULKARNI, K., MATTOS, N., AND COCHRANE, R. 1998. Active database features in SQL3. In Active Database Systems, N. Paton, Ed. Springer-Verlag, Berlin, Germany.

LEE, S. AND LING T. 1998. A path removing technique for detecting trigger termination. In Proceedings of the Sixth International Conference on Extending Database Technology (Valencia, Spain, Mar.), H. -J. Schek, F. Saltor, I. Ramos, and G. Alonso, Eds. 341-355.

LEVY, A. AND SAGIV, Y. 1993. Queries independent of updates. In Proceedings of the Nineteenth International Conference on Very Large Data Bases (VLDB '93, Dublin, Ireland, Aug.). Morgan Kaufmann Publishers Inc., San Francisco, CA, 171-181.

LIBKIN, T. G. L. AND TRICKEY, H. 1997. An improved algorithm for the incremental recomputation of active relational expressions. IEEE Trans. Knowl. Data Eng. 9, 3 (May/June), 508-511.

ORACLE. 1999. ORACLE 8i Reference Manual. Oracle Corp., Redwood City, CA.

QIAN, X. AND WIEDERHOLD, G. 1991. Incremental recomputation of active relational expressions. IEEE Trans. Knowl. Data Eng. 3, 3 (Sept.), 337-341.

ROSENTHAL, A., CHAKRAVARTHY, U. S., BLAUSTEIN, B., AND BLAKELY, J. 1989. Situation monitoring for active databases. In Proceedings of the 15th International Conference on Very Large Data Bases (VLDB '89, Amsterdam, The Netherlands, Aug 22-25), P. G. Apers and G. Wiederhold, Eds. Morgan Kaufmann Publishers Inc., San Francisco, CA, 455-464.

SIMON, E. AND DE MAINDREVILLE, C. 1992. Implementing high level active rules on top of a relational DBMS. In Proceedings of the 18th International Conference on Very Large Data Bases (Vancouver, B.C., Aug.). VLDB Endowment, Berkeley, CA, 315-326.

SQL3. 1999. ISO/IEC 9075: 1999, Information Technology-Database Languages-SQL.

TSAI, H.-Y. AND CHENG, A. M. K. 1994. Termination analysis of OPS5 expert systems. In Proceedings of the 12th National Conference on Artificial Intelligence (vol. 1) (AAAI '94, Seattle, WA, July 31-Aug. 4), B. Hayes-Roth and R. E. Korf, Chairs. Amer. Assn. for Artificial Intelligence, Menlo Park, CA, 193-198.

TSCHUDI, M., URBAN, S., DIETRICH, S., AND KARADIMCE, A. 1997. An implementation and evaluation of the refined triggering graph method for active rule termination analysis. In Proceedings of the Third International Workshop on Rules in Database Systems (Skovde, Sweden, June). 133-147.

ULLMAN, J. D. 1988. Principles of Database and Knowledge-Base Systems. Computer Science Press, Inc., New York, NY.

VAN DER VOORT, L. AND SIEBES, A. 1993. Termination and confluence of rule execution. In Proceedings of the 2nd International Conference on Information and Knowledge Management (CIKM '93, Washington, DC, Nov. 1-5), B. Bhargava, T. Finin, and Y. Yesha, Eds. ACM Press, New York, NY, 245-255.

WEIK, T. AND HEUER, A. 1995. An algorithm for the analysis of termination of large trigger sets in an OODBMS. In Proceedings of the International Workshop on Active and Real-Time Database Systems (Skovde, Sweden, June).

WIDOM, J. AND CERI, S., EDS. 1996. Active Database Systems: Triggers and Rules for Advanced Database Processing. Morgan Kaufmann Publishers Inc., San Francisco, CA.

WIDOM, J. AND FINKELSTEIN, S. 1990. Set-oriented production rules in relational database systems. In Proceedings of the 1990 ACM SIGMOD International Conference on Management of Data (SIGMOD '90, Atlantic City, NJ, May 23-25), H. Garcia-Molina, Chair. ACM Press, New York, NY, 259-270.

ZHOU, Y. AND HSU, M. 1990. A theory for rule triggering systems. In Proceedings of the International Conference on Advances in Database Technology (EDBT '90, LNCS 416, Venice, Italy, Mar. 26-30), F. Bancilhon, C. Thanos, and D. Tsichritzis, Eds. Springer Lecture Notes in Computer Science. Springer-Verlag, New York, NY, 407-421.

Received: December 1995; revised: March 1998; accepted: November 1999

APPENDIX

A. COMMUTATIVITY PROOFS

We give proofs of commutativity for the pairs of modifications described in Section 7.2. We use Q([R.sub.1], ..., [R.sub.n]) to denote the result of evaluating query Q on relations [R.sub.1], ..., [R.sub.n]. For a modification M, M(R) denotes the result of evaluating the condition that selects the data on which modification M operates. For instance, [E.sub.del](R) denotes the tuples in relation R that are selected by the delete operation characterized by relational expression [E.sub.del].

In the following lemma, which is preliminary to the proof of the main theorem on commutativity for a pair of delete operations, we prove that relational difference is distributive with respect to a query Q when Q does not contain aggregates.

LEMMA 4. Let R' [subset or equal to] R be relations and Q(R, [S.sub.1], ..., [S.sub.n]) a query over relations R, [S.sub.1], ..., [S.sub.n] such that (a) schema (Q) = schema (R), (b) Q(R, [S.sub.1], ..., [S.sub.n]) [subset or equal to] R, (c) Q contains no aggregates on attributes of R, and (d) R occurs only once in Q. Then, Q(R, [S.sub.1], ..., [S.sub.n]) - Q(R', [S.sub.1], ..., [S.sub.n]) = Q(R - R', [S.sub.1], ..., [S.sub.n]).

PROOF. We use the abbreviation Q(R) for Q(R, [S.sub.1], ..., [S.sub.n]), and analogously for Q(R') and Q(R"). The proof proceeds by induction on the cardinality of R'.

Base Case: Let card (R') = 0. Then Q(R) - Q(R') = Q(R) - Q(0) = Q(R). Also Q(R - R') = Q(R - 0) = Q(R).

Induction Step: Let R" = R' [union] {t} for some tuple t [element of] R such that R' [intersection] t = 0 and R" [subset or equal to] R. We want to prove that Q(R) - Q(R") = Q(R - R").

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Now two cases are given: (a) Q({t}) = 0, which is trivial, and (b) Q({t}) = {t}. In Case (b), since t [is not an element of] (R - R"), we have Q(R - R") [intersection] Q({t}) = 0. Thus, Q(R) - Q(R") = Q(R - R"). []

THEOREM 7. Let [E.sub.del1] and [E.sub.del2] be two delete operations on the same relation R. If neither operation contains aggregations over attributes of R, then [E.sub.del1] and [E.sub.del2] commute.

PROOF. Let [E.sub.del1] execute first. Its execution on R will produce a new relation R' = R - [E.sub.del1](R). Then [E.sub.del2] executes on R' and produces a new relation R".

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Now let [E.sub.del2] execute first, producing [bar]R' = R - [E.sub.del2](R), and [E.sub.del1] execute next, producing [bar]R".

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. []

The following two lemmas prove properties of update and delete modifications that are needed in the proofs of the subsequent commutativity theorems.

LEMMA 5. Let [E.sub.del] be a delete operation and [E.sub.upd] - E[[A'.sub.u] = e][E.sub.c] an update operation expressed on the same relation R such that [E.sub.del] and [E.sub.upd] contain no aggregates on attributes of R. Then, [E.sub.upd]([E.sub.del](R)) = E[[A'.sub.u] = e]([E.sub.c](R) [intersection] [E.sub.del](R)).

PROOF.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. []

LEMMA 6. Let [E.sub.del] be a delete operation and [E.sub.upd] E-[[A'.sub.u] = e][E.sub.c] an update operation expressed on the same relation R such that: (a) [A.sub.u] [is not an element of] [A.sub.p], where [A.sub.p] is the list of attributes referenced by [E.sub.del] and [A.sub.u] is the attribute updated by [E.sub.upd], and (b) [E.sub.del] and [E.sub.upd] contain no aggregates on attributes of R. Then, [E.sub.del]([[Rho].sub.old]([E.sub.upd](R))) = [E.sub.del]([[Rho].sub.new]([E.sub.upd](R))).

PROOF. Follows directly from hypothesis. []

THEOREM 8. Let [E.sub.del] be a delete operation and [E.sub.upd] = E[[A'.sub.u] = e][E.sub.c] an update operation expressed on the same relation R. [E.sub.del] and [E.sub.upd] commute if (a) [A.sub.u] [is not an element of] [A.sub.p], where [A.sub.p] is the list of attributes referenced by [E.sub.del] and [A.sub.u] is the updated attribute; and (b) [E.sub.del] and [E.sub.upd] contain no aggregates on attributes of R.

PROOF. Let [E.sub.del] execute first. Its execution on R produces a new relation R' = R - [E.sub.del](R). Then, [E.sub.upd] executes on R' and produces a new relation R" = (R' - [[Rho].sub.old]([E.sub.upd](R'))) [union] [[Rho].sub.new]([E.sub.udp](R')), where

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Then,

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Now let [E.sub.upd] execute first, producing [bar]R' = (R - [[Rho].sub.old]([E.sub.upd](R))) [union] [[Rho].sub.new]([E.sub.upd](R)) and [E.sub.del] execute next, producing [bar]R".

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

THEOREM 9. Let [E.sub.upd1] and [E.sub.upd2] be two update operations on relation R, where [E.sub.upd1] = E[A' = [f.sub.1](A, [B.sub.1], ..., [B.sub.n])][E.sub.c1] and [E.sub.upd2] = E[A' = [f.sub.2](A, [B.sub.1], ..., [B.sub.m])][E.sub.c2], and where [E.sub.c1] and [E.sub.c2] are relational expressions selecting the data to be updated on R. [E.sub.upd1] and [E.sub.upd2] commute if: (a) [f.sub.1]([f.sub.2](A, [B.sub.1], ..., [B.sub.m]), [B.sub.1], ..., [B.sub.n]) = [f.sub.2]([f.sub.1](A, [B.sub.1], ..., [B.sub.m]), [B.sub.1], ..., [B.sub.m]); (b) the updated attribute A [is not an element of] [A.sub.c1] and A [is not an element of] [A.sub.c2], where [A.sub.c1] and [A.sub.c2] are lists of attributes referenced by [E.sub.c1] and [E.sub.c2], respectively; and (c) [E.sub.c1] and [E.sub.c2] do not contain aggregates on attributes of R.

PROOF. To prove commutativity, we must prove that

[E.sub.upd1] ([E.sub.upd2](R)) = [E.sub.upd2]([E.sub.upd1](R)).

Let [f.sub.1](A) be shorthand for [f.sub.1](A, [B.sub.1], ..., [B.sub.n]) and analogously for [f.sub.2](A).

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

With analogous steps [E.sub.upd2]([E.sub.upd1](R)) = E[A" = [f.sub.2]([f.sub.1](A))]([E.sub.c1](R) [intersection] [E.sub.c2](R)). Then, if [f.sub.1]([f.sub.2](A)) = [f.sub.2]([f.sub.1](A)), [E.sub.upd1]([E.sub.upd2](R)) = [E.sub.upd2]([E.sub.upd1](R)), and [E.sub.upd1] and [E.sub.upd2] commute. []

(1) If there are multiple references to relation R, we would need to treat each reference independently and take the union of the results. Furthermore, it is necessary, for example, in the case of insert modifications, to perform the propagation process considering the new state of relation R, given by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Due to the significant added complexity, we do not consider multiple occurrences of a relation in this article, but leave the extension to future work.

(2) Note that a conservative test for satisfiability is not really a limitation here, since our entire approach is conservative; see Section 3.5.

(3) Actually, any expression involving nonupdated attributes and constants can be considered as a constant in this context.

(4) A sophisticated satisfiability check would determine that the expressions [[Sigma].sub.balance [is greater than] 500 [conjunction] rate [is greater than] 0][E.sub.upd] and [[Sigma].sub.balance [is less than] 500 [conjunction] rate' [is greater than] 0][E.sub.upd] select the same data, thus both [E'.sub.ins] and [E'.sub.del] are empty.

(5) For simplicity, we consider rules with a single action here, although many active database rule languages allow rules with a sequence of actions. Our methods easily extend to multiple actions by applying the method once for each action [Baralis 1994].

(6) We do not consider complex or time-related events [Widom and Ceri 1996], which would significantly complicate the analysis.

(7) We are currently exploring the possibility of extracting from stored procedure bodies in trigger actions enough information about the SQL/DML statements to perform rule analysis.

(8) It is trivial to recognize syntactically that these predicates should be considered together in the satisfiability check.

(9) For instance, if r is the only rule and r's action does not operate on the data selected by C, then C = [C.sup.old] and [Delta]C = 0.

(10) Recall that the update of attribute num is not allowed in this example.

(11) The extension to multiple updated attributes is straightforward.

(12) This property follows from the fact that, in our algebraic language, negation is not allowed at the outermost level of a relational expression (see Example 9).

(13) Although containment tests can have exponential complexity [Ullman 1989], we expect to consider relatively small expressions here.

(14) If update [E.sub.upd] modifies more than one attribute, this property must hold for all updated attributes.

This work was partially performed while the authors were at the IBM Almaden Research Center in San Jose, CA. At Stanford, this work was supported by the Anderson Faculty Scholar Fund and by equipment grants from Digital Equipment Corporation and the IBM Corporation.

Authors' addresses: E. Baralis, Dipartimento di Automatica e Informatica, Politecnico di Torino, Corso Duca degli Abruzzi, 24, Torino, I-10129, Italy; email: baralis@polito.it; J. Widom, Department of Computer Science, Stanford University, Stanford, CA 94305; email: widom@cs.stanford.edu.

Printer friendly Cite/link Email Feedback | |

Author: | BARALIS, ELENA; WIDOM, JENNIFER |
---|---|

Publication: | ACM Transactions on Database Systems |

Geographic Code: | 1USA |

Date: | Sep 1, 2000 |

Words: | 26888 |

Previous Article: | Emancipating Instances from the Tyranny of Classes in Information Modeling. |

Next Article: | A New Approach to Developing and Implementing Eager Database Replication Protocols. |

Topics: |