# Disjunctive Datalog.

We consider disjunctive Datalog, a powerful database query language based on disjunctive gic programming. Briefly, disjunctive Datalog is a variant of Datalog where disjunctions may appear in the rule heads; advanced versions also allow for negation in the bodies, which can be handled according to a semantics for negation in disjunctive logic programming. In particular, we investigate three different semantics for disjunctive Datalog: the minimal model semantics, the perfect model semantics, and the stable model semantics. For each of these semantics, the expressive power and complexity are studied. We show that the possibility variants of these semantics express the same set of queries. In fact, they precisely capture the complexity class [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Thus, unless the Polynomial Hierarchy collapses, disjunctive Datalog is more expressive than normal logic programming with negation. These results are not only of theoretical interest; we demonstrate that problems relevant in practice such as computing the optimal tour value in the Traveling Salesman Problem and eigenvector computations can be handled in disjunctive Datalog, but not Datalog with negation (unless the Polynomial Hierarchy collapses). In addition, we study modularity properties of disjunctive Datalog and investigate syntactic restrictions of the formalisms.Categories and Subject Descriptors: H.2.3 [Database Management]: Languages--query languages; D.1.6 [Programming Techniques]: Logic Programming; 1.2.3 [Artificial Intelligence]: Deduction and Theorem Proving; 1.2.4 [Artificial Intelligence]: Knowledge Representation Formalisms and Methods; F.2.2 [Analysis of Algorithms and Problem Complexity]: Nonnumerical Algorithms and Problems; F.4.1 [Mathematical Logic and Formal Languages]: Mathematical Logic

General Terms: Languages, Theory

Additional Key Words and Phrases: Datalog, deductive systems, descriptive complexity theory, disjunctive logic programming, expressive power, finite model theory, finite structures

1. INTRODUCTION

Strong interest in enhancing logic programs by the capability of disjunction emerged in the logic programming, artificial intelligence, and database communities. This interest is documented by a number of publications (cf. [Lobo et al. 1992; Minker 1994]) and even workshops dedicated to this subject [Wolfinger 1994; Dix et al. 1996]. In this article, we study disjunctive logic programming in the context of databases.

Datalog is a well-known database query language that uses disjunction-free logic programs, where no functions are allowed [Ullman 1989; Ceri et al. 1990]. We deal here with extensions of such programs by disjunction (and, as well as for Datalog, by negation); the corresponding query languages thus amount to disjunctive Datalog.

Briefly, a disjunctive logic program (DLP) is a logic program where disjunction may occur in the rule heads. For example, a DLP may contain rules such as

Male (x) [disjunction] Female (x) [left arrow] Person(x).

The usefulness of disjunctive rules for knowledge representation, database querying, and for representing incomplete information, is generally acknowledged (see Gelfond and Lifschitz [1991] and Baral and Gelfond [1994]).

Various semantics of DLPs have been proposed; they are commonly based on the paradigm of minimal models, which underlies Minker's [1982] Generalized Closed World Assumption. For an overview of various semantics for DLPs, the reader may consult Lobo et al. [1992] and Minker's more recent article [Minker 1994].

In this article, we limit our attention to three important semantics for disjunctive logic programming (and thus for disjunctive Datalog), whose precise definition is given in Section 3:

--Minimal model semantics. This semantics, also termed Extended Generalized Closed World Assumption (EGCWA) [Yahya and Henschen 1985], is closely related to McCarthy's [1986] circumscription.

--Perfect model semantics [Przymusinski 1988]. This semantics has been defined for general DLPs, but is particularly suited for stratified DLPs. Note that the perfect model approach is successfully used by Kifer et al. [1995] for defining a semantics of object-oriented languages.

--Disjunctive stable model semantics [Przymusinski 1991; Gelfond and Lifschitz 1991]. This semantics generalizes the well acknowledged stable semantics for normal logic programs [Gelfond and Lifschitz 1988].

Much work has been spent on the suitability of these and other DLP semantics in various contexts, as well as on mathematical and structural properties.(1) However, until recently, little was known of the complexity and the expressive power of disjunctive logic programming.

Noticeable recursion-theoretic results on disjunctive inference formalisms over infinite structures where obtained by Schlipf [1987], who showed that general circumscription over the integers is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-complete, and by Chomicki and Subrahmanian [1990], who showed that Minker's Generalized Closed World Assumption is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-complete. These results show that on infinite structures, disjunctive closed-world reasoning is harder than both classical reasoning and normal logic programming [Chomicki and Subrahmanian 1990] (see Schlipf [1995a] for an overview). However, most computer science applications deal with finite structures, in particular, with database relations. In this context, the important question of whether each DLP can be replaced by an equivalent normal logic program was open.

Recently, Eiter and Gottlob [1995b] analyzed the complexity of propositional disjunctive logic programming (see Eiter and Gottlob [1995b] for a detailed treatment). They show that in this setting, most forms of DLP are computationally harder than normal logic programming with negation (unless the Polynomial Hierarchy collapses). The present article tackles the more difficult and involved task of analyzing both the complexity and the expressive power of disjunctive Datalog over relational databases. In particular, we tackle the problem of characterizing the queries that a disjunctive Datalog program may compute on relational databases that are supplied for input. More precisely, this means that we investigate the expressive power of a query language where the database queries are syntactically given by disjunctive Datalog programs and a database consists of a finite collection of finite relations.

The main questions we address in this article are the following.

--How to define the precise semantics of disjunctive Datalog based on different well-known semantics for DLPs (in particular, the three preceding semantics).

--What is the expressive power of these variants of disjunctive Datalog? In particular, how is the expressive power affected by allowing disjunction in combination with other constructs such as inequality ([is not equal to]) and negation (??)?

--What is the complexity of disjunctive Datalog, measured by data and expression complexity for the various formalisms?

--Is disjunctive Datalog practically relevant aside from its theoretical interest, that is, do there exist queries arising in practice for which the power of disjunction in Datalog is needed?

Our answers to these questions and the main contributions of this article are summarized as follows.

--We give a precise definition of the syntax and the semantics of disjunctive Datalog based on the minimal model semantics (MM), the disjunctive stable model semantics (SM), and the perfect model semantics (PM). We consider both the possibility and the certainty version of these semantics, for Boolean queries and for queries computing output relations.

--We prove some new modularity properties for disjunctive logic programs, which generalize the concept of stratification and are a useful tool for the formal analysis of DLPs.

--Plain Datalog augmented with disjunction ([DATALOG.sup.[disjunction]]) allows under all considered semantics (MM, SM, PM) expressing some [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-hard queries, but not all queries in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. In fact, it does not allow expressing all queries computable in polynomial time. (This is shown by proving a weak form of preservation under supermodels.)

--We show that disjunctive Datalog with [is not equal to] captures precisely the [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] queries under the possibility version of all considered semantics. Thus, adding [is not equal to] (inequality) to [DATALOG.sup.[disjunction]] (giving [DATALOG.sup.[disjunction], [is not equal to]]) suffices to gain full expressive power. This result is proved exploiting a strengthened form of Fagin's Theorem generalized to the Polynomial Hierarchy ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]) [Fagin 1974; Stockmeyer 19771, which exists for finite structures only and classes [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] where k [is greater than or equal to] 2 [Eiter et al. 1996]. The same expressive power ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]) is gained by adding negation (??) in case of both the SM and the PM semantics. For these semantics, symmetric results ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-expressiveness) holds for certainty reasoning. In particular, the results show that for disjunctive Datalog the SM-semantics and the PM-semantics express precisely the same set of queries; in the light of results in Eiter and Gottlob [1995b], this is probably not true for [DATALOG.sup.??], that is, nondisjunctive Datalog with negation.

--We show that disjunctive Datalog queries are [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-complete in the data size and [NEXPTIME.sup.NP]-complete in the program size under the possibility version of all considered semantics. In fact, there are [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-hard (resp., [NEXPTIME.sup.NP]-hard) queries defined by plain disjunctive Datalog programs without [is not equal to] and ??. Notice that NP [subset] [NEXPTIME.sup.NP], and hence [NEXPTIME.sup.NP]-complete problems are provably intractable; few natural among them are known. We establish the [NEXPTIME.sup.NP]-completeness results using complexity upgrade techniques that extend results in Balcazar et al. [1992], Papadimitriou [1994], and Papadimitriou and Yannakakis [1985], applied to results in Stewart [1991].

--We demonstrate the practical relevance of disjunctive Datalog by exhibiting queries that can be expressed in disjunctive Datalog but not in standard Datalog with negation ([DATALOG.sup.??]) unless the Polynomial Hierarchy collapses. In particular, we show how computing the optimal tour value in the Traveling Salesman Problem and eigenvector computations can be handled in disjunctive Datalog, but not in [DATALOG.sup.??] (unless the Polynomial Hierarchy collapses).

--We compare disjunctive Datalog with other variants of Datalog and well-known database query languages.

Our results provide a clear picture of the expressiveness of disjunctive Datalog in the various settings. Moreover, they complement and extend previous results of important extensions of Datalog and related query languages (see Abiteboul et al. [1995] and Kanellakis [1990]). Note that the major versions of [DATALOG.sup.??] [Kolaitis and Papadimitriou 1991; Schlipf 1995b; van Gelder et al. 1991] express database queries in NP (resp., co-NP) under possibility (resp., certainty) inference. Thus disjunction adds expressive power (unless [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]). A consequence of the high expressive power of disjunctive Datalog is a high worst-case complexity. This tradeoff between expressivity and complexity, which is not necessarily a drawback, is discussed in the conclusion (Section 9).

The article is organized as follows: Section 2 gives some basic concepts and notation. In Section 3 we present syntax and semantics of disjunctive Datalog, based on different semantics for disjunctive logic programs. Applications and examples are given in Section 4, where programs for several queries are given that cannot be implemented in [DATALOG.sup.??]. Section 5 investigates the structure of disjunctive Datalog programs, and presents a result about decomposability of a program into hierarchically organized modules. Section 6 analyzes the expressive power of disjunctive Datalog, which is continued in Section 7 for plain disjunctive Datalog. The complexity of disjunctive Datalog is determined in Section 8. Finally, Section 9 compares disjunctive Datalog to common and novel database query languages, addresses the issue of why its precise expressiveness characterization is relevant, and outlines open issues for future work.

2. PRELIMINARIES

We assume familiarity with the basic concepts of complexity theory and logical query languages for databases.(2)

A relational scheme over a domain Dom is a (possibly empty) list R = [R.sub.1], . . . , [R.sub.k] of relation symbols [R.sub.i], i = 1, . . . , k of arity a([R.sub.i]) [is greater than or equal to] 0. We assume that Dom is fixed and countable. A relational database over R (or instance of R) is a finite structure D = (U, [r.sub.1], . . ., [r.sub.k]) of a finite universe U [subset or equal to] Dom and relations [r.sub.i] [subset or equal to] [U.sup.a([R.sub.i])]. Notice that for each tuple t [element of] [U.sup.a([R.sub.i])], t [element of] [r.sub.i] iff D | = [R.sub.i](t). By U(D) (or simply U, if clear from the context) we denote the universe of D and by D[R'] the projection of D to R' for any subscheme R' of R. The set of all databases over R is denoted by D(R). As commonly done, we use relation symbols also to denote relation instances, and say relation instead of relation symbol.

A database property over R is a predicate P on D(R) that associates with each D [element of] D(R) a truth value P(D) from (true, false) and which is generic, that is, invariant under permutations of Dom.

A database mapping or query is a (partial) recursive function q: D(R) [right arrow] D(S), where R and S are disjoint, which maps instances D of R (intuitively, the input relations) to instances q(D) of S (the output relations) so that U(D) = U(q(D)). Usually, attention is restricted to generic queries [Chandra and Harel 1982] (or, less restrictively, to C-generic queries, which allow generic constants), which are invariant under permutations of the elements of Dom. Genericity of queries asserts data independence; that is, data representation does not have any impact on the query result.

A query q is Boolean if S is a single 0-ary relation symbol (i.e., a propositional letter) S; technically, S is true if (as a relation) S = {( )}, and S is false if S = 0. A generic total Boolean query corresponds to a database property in the obvious way.

Recognizability of a query q is defined in terms of the following query output tuple (QOT) problem: Given D [element of] D(R), [S.sub.i] [element of] S, and a tuple t [element of] [U(D).sup.a([S.sub.i])], decide if q(D) | = [S.sub.i](t). Query q is C-recognizable, where C is a complexity class, iff the QOT-problem is in C (cf. Gurevich [1988]). Similarly, a database property P is C-recognizable (or, in C) iff deciding P(D) true is in C.

A query language is a set L of query expressions E and a function [Mu] defining the semantics of L, such that for each E [element of] Y, [Mu](E) is a query [Vardi 1982]. The expressive power E(L) of L is the set of all queries definable by query expressions from L; that is, E(L) = {q: q = [Mu](E), E [element of] L}. Of particular interest are the generic queries definable in L. As common with query languages, we focus in our analysis of the expressive power of disjunctive Datalog on generic queries.

t is well-known that second-order definability of properties on finite structures is closely connected to computability within the Polynomial Hierarchy (PH) [Stockmeyer 1977]. Recall that the classes of PH are defined as follows: [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and for k [is greater than or equal to] 0, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

For a first-order vocabulary [Sigma], denote by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] the second-order sentences of the form ([inverted]E[S.sub.1])([inverted]A[S.sub.2]) . . . ([QS.sub.k]) [Psi] ([S.sub.1], . . . , [S.sub.k]), where the quantifiers alternate and [Psi] is first order; each [S.sub.i] = [S.sub.i,1], . . . , [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], is a list of predicate variables of fixed arities.

Proposition 2.1 [Fagin 1974; Stockmeyer 1977] A database property P over a relational scheme R is in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], k [is greater than or equal to] 1, iff there exists a [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (R)-sentence [Psi] such that for each D [element of] D(R), D | = [Psi] iff P(D) = true.

3. DISJUNCTIVE DATALOG

We assume that the reader is familiar with the basic concepts of deductive databases and logic programming [Lloyd 1984; Ullman 1989; Ceri et al. 1990]. Recall that Datalog programs are function-free logic programs. We consider here the extension of Datalog to disjunctive logic programs; for a background on disjunctive logic programming, see Lobo et al. [1992] and Minker [1994].

3.1 Syntax

A disjunctive Datalog rule is a clause of the form

(1) [a.sub.1] [disjunction] . . . [disjunction] [a.sub.n] [left arrow] [b.sub.1], . . . , [b.sub.m], 1 [is less than or equal to] n, 0 [is less than or equal to] m,

over a function-free first-order language, where the [a.sub.i]s are atoms forming the head of the clause, and the [b.sub.j]s are literals (i.e., atoms or negated atoms) forming the body; a logical constant = for equality is available, whose use is restricted to the body. For literals ?? ([t.sub.1] = [t.sub.2]) we write as usual [t.sub.1] [is not equal to] [t.sub.2]. We use the terms predicates and relations (resp., relation symbols) as synonyms.

A disjunctive Datalog program (simply program) is a finite collection [Pi] of clauses of form (1); by RS([Pi]) we denote the relations occurring in [Pi]. A program [Pi] is stratified if it is possible to decompose RS([Pi]) into pairwise disjoint sets [P.sub.0], [P.sub.1], . . . , [P.sub.k], such that: if a predicate P [element of] [P.sub.i] occurs in the head of some rule of [Pi] and a predicate P' occurs in the same rule, then P' [element of] [P.sub.i] if P' occurs in the head, and P' [element of] [P.sub.j] if P' occurs in the body, where j [is less than or equal to] i if it occurs positively and j [is less than] i if it occurs negatively. Any such decomposition is a stratification of P; it induces a decomposition of [Pi] into subsets of rules [[Pi].sub.0], . . . , [[Pi].sub.k] where [[Pi].sub.i] contains all rules that contain some predicate P [element of] [P.sub.i] in the head.

A query program is a triple ([Pi], R, S) of a program [Pi] and relation schemes R, S. The pair (R, S) is the input/output scheme of the query program, where R are the extensional predicates (i.e., the database relations), whose occurrence is restricted to the rule body, and S is a subset of the intentional predicates (i.e., the derived relations). Intuitively, the relations in R are the input relations and the relations in S are the output relations; the relations in RS([Pi]) - RS (where RS are the predicates in R and S) are referred to as auxiliary relations. An input database for the query program is any database D [element of] D(R).

For convenience, we identify a query program with [Pi] if R and S are clear from the context, and refer to it as the program [Pi] with input/output scheme IO([Pi]) = (R, S).

Example 1. Consider [Pi] with IO([Pi]) = (R; S):

S (x) [disjunction] A (x, b) [left arrow] R (x).

S (x) [left arrow] ?? R (x), ??B (a, x).

S (x) [left arrow] R (x, b).

Notice that [Pi] is stratified (choose [P.sub.0] = {B, R), [P.sub.1] = {A, S}; then [[Pi].sub.0] = [Theta], [[Pi].sub.1] = [Pi].

We define a family of classes [DATALOG.sup.X,Y] of query programs, based on independent syntactical restrictions X and Y concerning disjunction and negation, respectively, as follows. If X is void, then no disjunction is allowed (i.e., n = 1 in the rule (1)); if X is [disjunction], then disjunction is allowed. Y is one of the following increasingly liberal restrictions.

Y is void: negation restricted to R.

Y is [not equal to]: negation restricted to R and = (i.e., inequality literals allowed).

Y is [??.sub.s]: stratified negation ([Pi] is stratified).

Y is ??: arbitrary negation.

For example, [DATALOG.sup.[disjunction]], which we refer to as plain disjunctive Datalog, allows disjunction in the head and negation of predicates from R;(3) [DATALOG.sup.[disjunction],??], disjunctive Datalog, is the language of all query programs.

3.2 Semantics

There are several proposals to capture the meaning of disjunctive logic programs, based on different concepts of the intended models of a program (cf. Lobo et al. [1992] and Minker [1994]). We consider minimal models [Minker 1982] (arising in the AI community in circumscription [McCarthy 1986], and in advanced forms of the Closed World Assumption, e.g., the Extended Generalized CWA [Yahya and Henschen 1985]), the perfect models [Przymusinski 1988], and the (disjunctive) stable models [Gelfond and Lifschitz 1988; Przymusinski 1991], which are among the most well-known such concepts. Using these concepts, we define different semantics [Mu] of disjunctive Datalog, which give rise to different query languages; the semantics of sublanguages is implicit.

We need some notation and concepts from (disjunctive) logic programming. For any program [Pi], we denote by [HU.sub.[Pi]] its Herbrand universe, that is, the set of constants occurring in [Pi], and by [HB.sub.[Pi]], its Herbrand base, that is, the set of all ground atoms of predicates in RS([Pi]) over [HU.sub.[Pi]]. The ground instantiation of [Pi] over universe U is denoted by ground([Pi], U), or, if U = [HU.sub.[Pi]], simply by ground([Pi]). An (Herbrand) interpretation of [Pi] is a subset I [subset or equal to] [HB.sub.[Pi]]; it satisfies a ground rule (1) if I | = [b.sub.i] for every literal [b.sub.i] in the body implies I | = [a.sub.j], for at least one [a.sub.j] in the head. An interpretation I of [Pi] is a model of [Pi], if it satisfies each rule in ground([Pi]).

Definition 3.1 A model M of [Pi] is minimal, if it does not contain any other model of [Pi] properly; by MM([Pi]) we denote the set of all minimal models of [Pi].

The disjunctive stable models of [Pi] are defined as follows. For any interpretation I, denote by [[Pi].sup.I] the Gelfond-Lifschitz transform of [Pi] with respect to I, which is obtained by removing from ground([Pi]) all clauses that contain a negative literal ??a in the body such that a [element of] I, and by removing all negative literals from the remaining clauses; notice that [[Pi].sup.I] is a ??-free grounded (i.e., variable-free) program.

Definition 3.2 [Przymusinski 1991] An interpretation M is a (disjunctive) stable model of [Pi] iff M [element of] MM([[Pi].sup.M]); by SM([Pi]) we denote the set of all stable models of [Pi].

Notice that each stable model of [Pi] is a minimal model; that is, SM([Pi]) C MM([Pi]) holds. However, the converse is not true in general.

Perfect models have been proposed earlier than disjunctive stable models in Przymusinski [1991]. The definition of perfect models for arbitrary programs is provided in the appendix; we characterize here only the perfect models of stratified programs, for which the perfect models amount to the commonly agreed models. The characterization is known by the equivalence of perfect models and models of prioritized circumscription for stratified programs [Przymusinski 1991], and can be easily proved from the following properties and results.

PROPOSITION 3.1 An interpretation M is a perfect model of a stratified program [Pi], iff for a stratification [[Pi].sub.0], . . ., [[Pi].sub.k] of [Pi], M [intersection] [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]), for i = 0, . . ., k, where [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], i = 1, . . ., k.

The set of all perfect models of [Pi] is denoted by PM([Pi]). Notice that like stable models, perfect models are always minimal models.

For convenience, we use in the sequel ?? as a generic variable for MM, PM, and SM. The following proposition states fundamental relationships between minimal, perfect, and stable models.

PROPOSITION 3.2 [Przymusinski 1991] Let [Pi] be a stratified program. Then PM([Pi]) = SM ([Pi]). Moreover, (i) if [Pi] is disjunction-free, then [Pi] has a unique perfect (resp., stable) model; (ii) if [Pi] is positive (i.e., ??,-free), then PM([Pi]) = SM([Pi]) = MM([Pi]).

We remark that in Bidoit and Froidevaux [19871 a semantics of stratified programs was proposed using default logic; it is equivalent to the stable (and also perfect) model semantics.

The evaluation of a query program over a database amounts to evaluating the ground program given by the facts in the database plus the instantiated rules. In order to circumvent a distinction between database predicates and intentional predicates (which is necessary for minimal models in the case of negative literals involving database predicates), we reduce this ground program by propagating the database facts to rule bodies.

The instantiation of a program [Pi] on a database D, denoted by [Pi][D], is the program obtained from ground([Pi], U) as follows. First remove every rule that contains a literal in the body which is false in D (in particular, also equality and inequality atoms). Then remove from bodies of the remaining rules all equality atoms, inequality atoms, and literals involving a relation from D.

Example 1 (ctd.). Consider the preceding program [Pi] again (IO([Pi]) (R; S)):

S (x) [disjunction] A (x, b) [left arrow] R (x).

S (x) [left arrow] ??R (x), ??B (a, x).

S (x) [left arrow] A (x, b).

Let D = ({a, b}, {a}) [element of] D(R). Then

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Models of a program on a database are now defined as follows.

Definition 3.3 Let [Pi] be a program and D be any database. The minimal (resp., stable, perfect) models of [Pi] on D, denoted by MM([Pi], D) (resp., SM([Pi], D), PM([Pi], D)), are the minimal models (resp., the stable models, perfect models) of the program [Pi][D].

Example 1 (ctd.). The minimal models of [Pi][D] are M = {S(a), S(b)} and M' = {S(a), B(a, b)}. From these, M is perfect. Indeed, [Pi][D] is stratified by [P.sub.0] B, [P.sub.1] = A, S, which induces [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Check that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Hence M is a perfect model of [Pi][D]. On the other hand, M' is not perfect. Since the program is stratified, M is thus also the unique stable model. Hence MM([Pi], D) = {{S(a), S(b)}, {S(a), B(a, b)}} and PM([Pi], D) = SM([Pi], D) {{S(a), S(b)}}.

The result of the query computed by a query program [Pi] (i.e., of [Mu]([Pi])) on an input database D is defined in terms of the models ??([Pi], D), by taking as usual either the union of all models or the intersection [Abiteboul et al. 1990; Schlipf 1995b]. The former approach (union) is known as possibility inference (also termed brave or credulous reasoning in the AI community), and the latter (intersection) as certainty inference (also termed cautious or skeptical inference) [Imielinski and Lipski 1984; Abiteboul et al. 1990].

Definition 3.4 Let inf be an inference modality from p(ossibility), c(ertainty). [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the query language of [DATALOG.sup.X,Y] query programs [Pi], IO([Pi]) = (R, S), with semantics [[Mu].sub.inf,??] such that [[Mu].sub.inf,??]([Pi]) is the partial query

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

such that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is defined if [HU.sub.[Pi] [subset or equal to] U (i.e., all constants in [Pi] belong to the universe of D), and depending on inf, for each [S.sub.i] [element of] S and tuple t [element of] [U.sup.a]([S.sub.i]),

(possibility inference) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(certainty inference) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

For example, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] denotes [DATALOG.sup.[disjunction],??] with possibility stable model semantics.

Technically, we have defined the semantics of any disjunctive Datalog query program for any kind ?? of model. The intended use of minimal models, however, is for [DATALOG.sup.[disjunction],[not equal to], and of perfect models for [DATALOG.sup.[disjunction],[??.sub.s]]. The relationships between minimal, perfect, and stable models of a program (Proposition 3.2) imply the coincidence of several query languages [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], which motivates a simplification of notation (inf = p, c):

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

The simplified notation is inherited to sublanguages (e.g., DATALOG is familiar Datalog with negation of input relations).

In this article, we disregard [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], that is, the certainty variant of the MM-semantics, and restrict our attention to [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. The reason is the well-known fact that a positive atom is in all minimal models iff it is in all models. Thus the certainty MM-semantics does not provide model selection with respect to positive atoms, and computes the same query result as classical (certainty) semantics. Therefore [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] falls back to query languages in co-NP which are well-understood (cf. FN-queries in Chandra and Harel [1985]). For the other semantics (SM, PM), the certainty variant is precisely symmetric to the possibility variant if (stratified) negation is allowed. For instance, the Boolean (constant-free) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] queries express the [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] database properties and the Boolean (constant-free) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] queries express the [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] database properties. Since results on the certainty variant of SM-semantics and PM-semantics are mostly straightforward from results on the possibility variant, we keep the treatment of the certainty variant short.

4. APPLICATIONS

The extension of Datalog with disjunction increases the expressive capability. It is possible to formulate many queries that cannot be expressed in standard Datalog. Among these queries are classical optimization problems such as Traveling Salesman, Maximum Clique, Minimal Vertex Cover, and so on [Papadimitriou 1994] which are NP-hard and NP-easy, that is, solvable in polynomial time with an oracle for NP.

In this section, we show several practical queries that can be formulated in disjunctive Datalog, but not in standard Datalog unless PH collapses, which is generally believed to be false (see Papadimitriou [1994]). In our examples involving optimization problems, we use integers 0, 1, . . ., and the standard arithmetic operations (resp., predicates) of addition and multiplication. These operations can be easily defined from the standard successor relation; notice that we show in Section 6 how an enumeration of the database universe, which corresponds to the usual successor function on an interval of the integers, can be generated through a [DATALOG.sup.[disjunction]v,[not equal to]] program.

For the sake of brevity, we omit in the examples detailed formal proofs for the correctness of query programs. Such proofs are not difficult to obtain, however. For this concern, we also provide a formal result on a useful modularity property of disjunctive Datalog in the next section, which generalizes the concept of modularity embodied by stratified programs.

In our examples, we use the following notational convention. Rules of the form

[perpendicular to] [left arrow] [l.sub.1], . . ., [l.sub.m].

are a shorthand notation for

False [left arrow] [l.sub.1], . . ., [l.sub.m].

A [left arrow] False, ??A.

where False and A are new propositional letters. This proviso emulates an integrity constraint [left arrow][l.sub.1], . . . ., [l.sub.m], as False must be false in every stable model.(4)

Example 2 (Graph Coloring). Graph colorings have important applications in many areas. We show here that 3-colorability of a graph G = (V, E), as well as 3-uncolorability, can be decided in [DATALOG.sup.[disjunction], without any use of negation. Since these queries are NP- and co-NP-complete, respectively, both are not expressible in the same variant of the stable model semantics (possibility or certainty) of [DATALOG.sup.??] programs unless NP = co-NP.

A simple program for deciding 3-colorability is the following one.

Red(x)[disjunction]Green(x)[disjunction]Blue(x) [left arrow].

NotColored [left arrow] E(x,y),Red(x),Red(y).

NotColored [left arrow] E(x,y),Green(x),Green(y).

NotColored [left arrow] E(x,y),Blue(x),Blue(y).

NotColored[disjunction]Colored [left arrow].

The first rule assigns each vertex of G one of three colors. If the assignment is not a legal coloring, then NotColored is derived by one of the three subsequent rules. In this case, Colored takes the value false in the minimal model. On the other hand, if NotColored is not derivable from the assignment, a minimal model is obtained by including Colored. Thus, on any graph G, the program computes Colored true under the possibility variant of the minimal model semantics iff G is 3-colorable.

Notice that 3-colorability can be expressed in [DATALOG.sup.??] under the possibility variant of the stable model semantics. A respective program can be obtained from the preceding one by replacing the clause Red(x) [disjunction] Green(x) [disjunction] Blue(x) [left arrow] with the three clauses

Red(x) [left arrow] ??Green(x)??Blue(x).

Green(x) [left arrow] ??Blue(x)??Red(x).

Blue(x) [left arrow] ??Red(x)??Green(x).

and the clause NotColored [disjunction] Colored [left arrow] with

Colored [left arrow] ?NotColored.

A disjunctive program for deciding 3-uncolorability can be obtained from the disjunctive program for 3-colorability by the following modification: drop the rule NotColored [disjunction] Colored [left arrow]. and add instead the rules

Red(x) [left arrow] NotColored.

Green(x) [left arrow] NotColored.

Blue(x) [left arrow] NotColored.

These rules generate the maximal possible extension for the intentional predicates Red, Green, and Blue if NotColored is true, and effect that the (trivial) model consisting of all ground atoms is the only model if each possible assignment of colors to the vertices does not yield a 3-coloring. In fact, if an assignment is a 3-coloring, then the program has a model in which NotColored is not contained; consequently, NotColored is not contained in any minimal model. Thus the graph G is not 3-colorable iff the program has on G a minimal model containing NotColored. Consequently, the program computes NotColored true on input G under the possibility (as well the certainty) variant of the MM-semantics iff G is not 3-colorable.

Notice that no simple modification of the previous program yields a [DATALOG.sup.??] program that decides 3-uncolorability under the possibility variant of the stable model semantics; no such program exists unless NP = co-NP.

In Section 8, we encounter a generalized 3-coloring problem that can be expressed by a simple, intuitive [DATALOG.sup.[disjunction] program; however, that query is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-complete, and can thus not be expressed in [DATALOG.sup.??] under either possibility or certainty semantics.

Example 3 (Traveling Salesman). It is well known that finding an optimal solution to the Traveling Salesman Problem (TSP) is intractable. Recall that the problem is, given cities [c.sub.1], . . ., [c.sub.n], find a round trip that visits all cities in sequence and has minimal traveling cost, that is, a permutation [Tau] of 1, . . ., n such that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

is minimum, where c(i, j) is the cost of traveling from [c.sub.i] to [c.sub.j], given by an integer.

Computing an optimal tour is both NP-hard and co-NP-hard. In fact, in Papadimitriou [1984] it was shown that deciding whether the cost of an optimal tour is even is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-complete, and in Krentel [1988] it was shown that computing this cost (or even a single bit) is F [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-complete under so-called metric reductions. Hence this is not possible in [DATALOG.sup.??] (unless PH collapses). Here, we describe a disjunctive Datalog program [Pi] that can do this under the stable semantics.

Suppose that the cities are encoded by 1, n and that the intercity traveling costs are stored in a relation C(i, j, v) where v = c(i, j). We assume that a built-in predicate is available which tells whether x = y + z. In abuse of notation, we simply refer to the number n of cities (which is provided by an input relation) by itself.

The following program [[Pi].sub.1] computes legal tours and their costs in its stable models:

(1) T(i, j)[disjunction]T(i, j) [left arrow] [is less than or equal to] n, j [is less than or equal to] n.

(2) [perpendicular to] [left arrow] T(i, j), T(i, k), j [not equal to] k.

(3) [perpendicular to] [left arrow] T(i, k), T(j, k), i [not equal to] j.

(4) Chosen_Stop(i) [left arrow] T(i, j).

(5) [perpendicular to] [left arrow] ??Chosen_Stop (i), 1 [is less than or equal to] i [is less than or equal to] n.

(6) P_Value(n, x) [left arrow] T(n, i), T(1, j), C(i, j, x).

(7) P_Value(k - 1, x) [left arrow] P_Value(k, y), T(k - 1, i), T(k,i), T(k, j, z), x = y + z.

(8) Cost(x) [left arrow] P_Value(1, x).

The first clause (1) guesses a tour, where T(i, j) intuitively means that the ith stop of the tour is city j and T(i, j) that it is not.(5) By the minimality of a stable model, exactly one of T(i, j) and T(i, j) is true in it, for each i and j such that 1 [is less than or equal to] i, j [is less than or equal to] n; in all other cases, both are false.

The subsequent clauses (2) through (5) check that the guess is proper: each stop has attached at most one city, each city can be attached to at most one stop, and every stop must have attached some city. The rules (6) through (8) compute top-down the cost of the chosen tour, which is given by the (unique) atom Cost(x) contained in the model.

It holds that the stable models of [[Pi].sub.1] correspond one-to-one to the legal tours.

To reach our goal, we have to eliminate from them those that do not correspond to optimal tours. That is, we have to eliminate all tours T such that there exists a tour T' which has lower cost. This is performed by a [DATALOG.sup.[disjunction] program, which basically tests all choices for a tour T' and rules out each choice that is not a cheaper tour, which is indicated by a propositional atom NotCheaper. The following program, which is similar to [[Pi].sub.1], generates all possible choices for T'.

(1') T'(i, j)[disjunction]T'(i, j) [left arrow] 1[is less than or equal to] i [is less than or equal to] n, 1 [is less than or equal to] n.

(2') NotCheaper [left arrow] T(i, j), T'(i, k), j [not equal to] k.

(3') NotCheaper [left arrow] T'(i, k), T'(j, k), i [not equal to] j.

(4') NotChosen_Stop(i, 1) [left arrow] T'(j, 1).

(5') NotChosen_Stop(i, j) [left arrow] T'(i, j), NotChosen_Stop(i, j - 1).

(6') NotCheaper [left arrow] NotChosen_Stop(i, n).

(7') P_Value'(n, x) [left arrow] T'(n, i), T'(1, J), C(i, j, x).

(8') P_Value'(k - 1, x) [left arrow] P_Value'(k, y), T'(k - 1, i), T'(k, j),

(9') Cost'(x) [left arrow] P_Value'(1, x).

The predicates T', T', P_Value', and Cost' have the role of the predicates T, T, P_Value, and Cost in [[Pi].sub.1]. Since we do not allow negation, the test that for each stop a city has been chosen (rules (4) and (5) in [[Pi].sub.1]) has to be implemented differently (rules (4') through (6')). NotChosen_Stop(i, j) tells whether no city [is less than or equal to] j has been chosen for stop i. Thus if NotChosen_Stop(i, n) is true, then no city has been chosen for stop i, and the choice for T' does not correspond to a legal tour.

The minimal models of (1') through (9') that do not contain NotCheaper correspond one-to-one to all legal tours. By adding the following rule, each of them is eliminated which does not have smaller cost than the tour given by T.

(10') NotCheaper [left arrow] Cost(x), Cost'(y), x [is less than or equal to] y.

Thus if for a legal tour T, each choice for T' leads to the derivation of NotCheaper, then T is an optimal tour.

For the desired program, we add the following rules, applying a similar technique as in the previous program for graph 3-uncolorability.

(11') [perpendicular to] [left arrow] ?? NotCheaper.

(12') P([x.sub.1], . . . , [x.sub.n] [left arrow] NotCheaper.,

for any predicate P that occurs in a rule head of (1') through (9') except NotCheaper. The first rule enforces that NotCheaper must be contained in the stable model; consequently, it must be derivable. The other rules derive the maximal extension for each predicate P if NotCheaper is true, which is a trivial model for (1') through (9'). In fact, it is for some given tour T the only model if no choice for T' leads to a tour with cost smaller than the cost of T; otherwise, there exists another model, which does not contain NotCheaper.

Let [[Pi].sub.2] be the program consisting of the rules (1') through (12'). Then it holds that the stable models of [Pi] = [[Pi].sub.1] [union] [[Pi].sub.2] on any instance of TSP correspond to the optimal tours.(6) In particular, the optimal cost value, described by Cost(x), is contained in each stable model. Thus the program [Pi] computes on any instance of TSP under the possibility (as well as certainty) stable model semantics in Cost the cost of an optimal tour.

Notice that the preceding program can be extended for further discrimination among different optimal tours by using an order (e.g., the lexicographical order on the vectors ([Pi](1), . . . , [Pi](n)) corresponding to permutations [Pi]; see Example 4 for an implementation of lexicographic ordering) and output the first optimal tour with respect to the order.

Example 4 (Eigenvectors). The need for doing matrix computations in databases has been recently emphasized [Maier and Vance 1993]. In this context, we consider queries connected with eigenvector computations. Recall that an eigenvalue of an n X n matrix M = ([m.sub.i,j]) is a number A such that the system

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

has nontrivial solutions, where [x.sup.T] is the transpose of the vector x = ([x.sub.1], . . . , [x.sub.n]). Every such nontrivial solution x is called an eigenvector (with respect to [Lambda]). Eigenvalues/eigenvectors have important applications in many areas, for example, structural analysis, quantum chemistry, power system analysis, stability analysis, VLSI design, and geophysics [Cullum and Willoughby 1985].

It is well known that the eigenvectors of a real matrix M form with the zero vector a vector space and thus, in general, infinitely many eigenvectors exist. We consider here the problem of computing integer eigenvectors e = ([e.sub.1], . . . , [e.sub.n]) such that |[e.sub.i]| [is less than or equal to] c, for every i = 1, . . . , n, where c is an integer constant.,(7) We exhibit natural queries on such eigenvectors that cannot be expressed in DATALOG ?? but in disjunctive Datalog (unless PH collapses).

Suppose that an integer n X n matrix M is stored in a database using a 3-ary relation M such that (i, j, v) [element of] M iff [m.sub.i,j] = v. We use numbers 1, . . . , n with the usual successor relation to index the entries of M. As before, we refer in abuse of notation to the predecessor of i simply as i - 1, and to the dimension n of M by itself.

In the sequel, we tacitly restrict to bounded eigenvectors. For arithmetic, we use built-in predicates for telling whether x = y + z and x = y [multiplied by] z, respectively.

Consider the following program 7T1, which computes the eigenvectors of M in its stable models.

(1) E(i, x)[disjunction]E(i, x) [left arrow] [is less than or equal to] 1 [is less than or equal to] i [is less than or equal to] n, -c [is less than or equal to] x [is less than or equal to] c.

(2) [perpendicular to] [left arrow] E(i, x), E(i, y), x [not equal to] y.

(3) Chosen(i) [left arrow] E(i, x).

(4) [perpendicular to] [left arrow] ?? Chosen(i), 1 [is less than or equal to] i [is less than or equal to] n.

(5) S(i, 1, x) [left arrow] M(i, 1, y), E(1, z), x = y [multiplied by] z.

(6) S(i, k, x) [left arrow] S(i, k - 1, y), M(i, k, v), E(k, w), z = v [multiplied by] w, x = y + z.

(7) [perpendicular to] [left arrow] E(i, y), S(i, n, x), x [not equal to] y.

The first rules (1) implement a choice for an eigenvector e from all bounded n-vectors.(8) Here, E(i, x) represents that the ith component of the vector has value x. By the minimality of a stable model, exactly one of the atoms E(i, x) and R(i, x) is true for each pair of i and x such that 1 [is less than or equal to] i [is less than or equal to] n and -c [is less than or equal to] x [is less than or equal to] c, and both are false in the other case. The rules (2) through (4) check whether the guess is proper; that is, for each component of e precisely one integer has been chosen. The eigenvector condition is checked by rule (7); to this end, the value of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is Computed bottom-up in S(i, k, x) by the rules (5) and (6).

It holds that, on any input matrix M, the stable models of program [[Pi].sub.1] correspond to the eigenvectors of M in the obvious way.

Now assume that we are interested in eigenvectors e that have maximum norm; that is, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is maximum. In particular, we pose the query whether such an e exists where a particular component, say [e.sub.1], has a specific value; for example, [e.sub.1] = 0.

In the previous program [[Pi].sub.1], we essentially have to restrict the stable models to those that contain E(1, 0) and whose corresponding eigenvectors have maximal norm. The former is easily achieved, by simply adding to [[Pi].sub.1] the rule

E(1, 0) [left arrow].

Implementing the latter is more cumbersome. Given a stable model corresponding to an eigenvector e, testing whether e is maximal amounts to deciding that every eigenvector e' satisfies [parallel] e [parallel] [is less than or equal to] [parallel] e [parallel]. This test is implemented by a [DATALOG.sup.[disjunction] program, which basically considers all possible choices for a bounded n-vector, and rules out each choice that does not correspond to an eigenvector e' with greater norm than e by deriving a propositional atom NotGreater. If all choices are ruled out, then e is in fact maximal.

Consider the following program, which is similar to the program [[Pi].sub.1].

(1') E'(i, x)[disjunction]E(i, x) [left arrow] 1 [is less than or equal to] i [is less than or equal to] n, -c [is less than or equal to] x [is less than or equal to] c.

(2') NotGreater [left arrow] E'(i, x), E'(i, y), x [not equal to] y.

(3') NotChosen(i, -c) [left arrow] E(i, -c).

(4') NotChosen(i, j) [left arrow] E'(i, j), NotChosen(i, j - 1).

(5') NotGreater [left arrow] NotChosen (i, c).

(6') S'(i, 1, x) [left arrow] M(i, 1, y), E(1, z), Prod(x, y, z).

(7') S'(i, k, x) [left arrow] S'(i, k - 1, y), M(i, k, v), E(k, w), z = v [multiplied by] w, x = y + z.

(8') NotGreater [left arrow] E(i, y), S'(i, n, x), x [not equal to] y.

Here, E', P', and S' have the role of E, E, and S in [Pi]. The first rule (1') guesses a bounded vector; the validity of the guess is checked by the rules (2') through (5'). Since we may not use negation, the test that some value has been chosen for component i is implemented differently (rules (3') through (5')). Here NoWhosen(i,j) tells that no number [is less than or equal to] j has been chosen for component i. Thus, if NoWhosen(i, c) is true, no number has been chosen for component i, and NotGreater is derived.

The stable models of this program (which coincide with the minimal ones) that do not contain NotGreater correspond one-to-one to the eigenvectors of M. By adding the following rules, we can rule out all eigenvectors e' with [parallel] e [parallel] [is less than or equal to] [parallel] e [parallel]

(9') N(1, x) [left arrow] E(1, y), x = y [not equal to] y.

(10') N(i, x) [left arrow] N(i - 1, v), E(i, y), z = y [not equal to] y, x = v + z.

(11') N'(1, x) [left arrow] E'(1, y), x = y [not equal to] y.

(12') N'(i, x) [left arrow] N'(i - 1, v), E'(i, y), z = y [not equal to] y, x = v + z.

(13') NotGreater [left arrow] N(n, x), N'(n, y), y [is less than or equal to] x.

The first two rules compute bottom-up [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] in N(n, x), and analogously the subsequent two rules [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] in N'(n, x). The last rule derives NotGreater iff [parallel] e' [[parallel].sup.2] [is less than or equal to] [parallel] e' [[parallel].sup.2] which is equivalent to [parallel] e' [parallel] [is less than or equal to] [parallel] e' [parallel].

Thus, whenever for the eigenvector e given by a stable model of [Pi], NotGreater is derived upon all possible choices for an eigenvector e', then e is maximal. The checking part is added, together with the following linking rules, where P ranges over all predicates in rule heads of the checking part except NotGreater.

(14') [perpendicular to] [left arrow] ?? NotGreater.

(15') P([x.sub.1]1, . . . , [x.sub.n]) [left arrow] NotGreater.

The first rule states that NotGreater must be derivable; the other rules generate, applying a similar technique as in the program for 3-uncolorability in Example 2, the maximal extension for every predicate P defined in the checking part if NotGreater is true. This is trivially a model, and in fact the only one if NotGreater is in each model of the checking part; this, however, is true precisely if the choice for E corresponds to a maximal eigenvector e.

Let [[Pi].sub.2] be the program comprising the rules (1') through (15'). Then the stable models of the overall program [Pi] = [[Pi].sub.1] [union] [[Pi].sub.2] correspond one-to-one to the maximal eigenvectors e of M that have [e.sub.1] = 0. Since NotGreater is contained in every stable model, it follows that [Pi] computes NotGreater true on input M and I under the possibility stable model semantics iff such an eigenvector e exists.

Notice that the use of disjunction in the checking part of [Pi] is crucial as opposed to the part generating all eigenvectors e, and can most likely not be eliminated. One can show that the computed query is both NP- and co-NP-hard; precisely, the query is complete for the class [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], that is, the class of problems solvable in polynomial time with logarithmically many NP oracle queries) [Eiter and Gottlob 1995a]. Thus the query cannot be formulated in DATALOG ?? (unless PH collapses).

Computing a single maximal eigenvector does not specify a generic query, unless this vector is uniquely determined. By discrimination among the maximal eigenvectors, it is possible to single out a unique such vector. A natural discrimination is by lexicographical ordering, so that the task is to compute the first maximal eigenvector in this order. This can be accomplished by the following clauses.

(20) LexEqual(1) [left arrow] E(1, x), E'(1, x).

(21) LexEqual(i) [left arrow] LexEqual(i - 1), E(i, x), E'(i, x).

(22) NotBefore [left arrow] E(1, x), E'(1, y), x [is less than] y.

(23) NotBefore [left arrow] LexEqual(i - 1), E(i, x), E'(i, y), x [is less than] y.

(24) NotBefore [left arrow] LexEqual(n).

(25) NotGreater [left arrow] NotBefore.

The propositional letter NotBefore is derived if the vector corresponding to E' is not listed before the one corresponding to E in the lexicographic ordering, and rules out the choice of E' for a proof that E is not the first maximal eigenvector. Note that computing the lexicographically first maximal eigenvector is an [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-complete query [Eiter and Gottlob 1995a], and is thus not expressible in DATALOG ?? (unless PH collapses).

Even harder queries on eigenvectors can be expressed. For example, suppose eigenvectors that are maximal modulo a specified subset I of the components are considered, that is, the maximal eigenvectors with respect to the partial order [[is less than or equal to].sub.I], where x [is less than or equal to].sub.I] y iff x and y coincide on the components in I and [parallel] x [parallel] [is less than or equal to] [parallel] y [parallel]. The query whether a maximal such eigenvector e exists where [e.sub.1] = 0 can be expressed by adding to the preceding program [[Pi].sub.1] the clause

(30) NotGreater [left arrow] I(i), E(i, x), E'(i, y), x [not equal to] y.

(assuming that I is provided by a unary input relation). This query is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-complete [Eiter and Gottlob 1995a], and thus much harder than the previous queries; it is among the hardest queries that can be expressed in disjunctive Datalog.

In the preceding examples, candidate solutions are generated by one component of the program and tested by another. This is reminiscent of Datalog extended with a choice construct, by which such candidates can be nondeterministically generated. As shown by Sacca and Zaniolo [1990], choice can be formulated using stable models of a normal program that has unstratified rules. However, in this setting, the testing phase where the validity of a choice is checked is limited to a polynomial time computation. On the other hand, as shown by Examples (3) and (4), disjunctive Datalog is capable of expressing problems whose testing phase after the choice is in co-NP and might even bear an co-NP-complete problem. Therefore, unless the Polynomial Hierarchy collapses, those examples can not be formulated in Datalog with choice. Note that choice can be implemented using head-cycle-free disjunction [Ben-Eliyahu and Palopoli 1994]. Loosely speaking, a program is headcycle-free if no two distinct atoms that occur together in the head of a rule mutually depend on each other. Since, by the results in [Ben-Eliyahu and Palopoli 1994; Eiter et al. 1996], headcycle-free programs capture NP (resp., co-NP) under the possibility (resp., certainty) variant of the stable model semantics, it is not surprising that the programs exhibited in Examples (3) and (4) are not headcycle-free. In fact, for the respective queries no such programs exist (unless the Polynomial Hierarchy collapses).

5. MODULARITY OF DISJUNCTIVE DATALOG

In the examples of the previous section, we have designed query programs in a hierarchical fashion. Following the guess and check paradigm, we designed a basic module [[Pi].sub.1] that generates a number of possible solution candidates, and a module [[Pi].sub.2] that takes the output of [[Pi].sub.1] (i.e., the possible solution candidates) and eliminates from them those which do not fulfill certain criteria. Such a methodology, which is quite natural, should undoubtedly be possible in a reasonable query language. In fact, if a query program grows more complex, it must be possible to structure the program into modules in order not to lose the general view and to understand more easily how the program works. This requirement is well acknowledged, and led to the formulation of desirable modularity properties for normal and disjunctive logic programs.(9) The notion of modularity of programs is related to invariance under unfolding (cf. [Brass and Dix 1995]) which was independently established for the static semantics [Przymusinski 1994, 1995] and for the D-WFS semantics [Brass and Dix 1995].

Notice that stratified programs enjoy such a modularity property. If [[Pi].sub.0], [[Pi].sub.1], . . . , [[Pi].sub.k] is a stratification of a program [Pi], then each stratum [[Pi].sub.i] can be seen as a module which is on top of all lower strata, that is, [[union].sub.j] [is less than] i] [[Pi].sub.j]. We consider here a more general concept, in which modules can be even unstratified programs (cf Ullman [1989]); it shows that hierarchical structured query programming is possible in disjunctive Datalog using stable models. This makes the language, besides its declarative nature, appealing as a powerful extension of familiar datalog.

The following definition states a syntactic condition for the intuition that a module [[Pi].sub.2] is placed on top of a module [[Pi].sub.1].

Definition 5.1 Let [[Pi].sub.1] and [[Pi].sub.2] be programs. We say that [[Pi].sub.2] potentially uses [[Pi].sub.1], or [[Pi].sub.1] is independent of [[Pi].sub.2] (denoted [[Pi].sub.2] ?? [[Pi].sub.1]), iff each predicate that occurs in some rule head of [[Pi].sub.2] does not occur (positively or negatively) in [[Pi].sub.1].

Example 5. Consider the following ground programs [[Pi].sub.1] and [[Pi].sub.2].

[[Pi].sub.1] : D(a) [disjunction] B(c) [left arrow] A(c, a), ?? A(c, b). [[Pi].sub.2] : E(a, a) [disjunction] F(a, b) [left arrow] ?? C(b), D(a). A(c, a) [left arrow] ?? C(c), ?? B(a).

It holds that [[Pi].sub.2] ?? [[Pi].sub.1]. Notice that for the split of [[Pi].sub.1] into the subprograms [[Pi].sub.1,1] and [[Pi].sub.1,2] consisting of the first and the second rule, respectively, [[Pi].sub.1,1] [is less than or equal to] [[Pi].sub.1,2] and [[Pi].sub.1,2] [is less than or equal to] [[Pi].sub.1,1]. (In fact, 7T, is unstratifiable.)

Further examples are the programs for the Traveling Salesman Problem and the eigenvector problems in Section 4. There we have that [[Pi].sub.2] ?? [[Pi].sub.1].(10) Notice that, if [Pi] = [[Pi].sub.1] [union] [[Pi].sub.2] such that [[Pi].sub.2] ?? [[Pi].sub.1], then ground ([[Pi].sub.2], C) ?? ground ([[Pi].sub.1], C) for every set of constants C; in particular, [[Pi].sub.2][D] ?? [[Pi].sub.1][D] for every database D.

Intuitively, if [[Pi].sub.2] ?? [[Pi].sub.1] and [Pi] = [[Pi].sub.1] [union] [[Pi].sub.2] is a ground, then the stable models of [[Pi].sub.1] [union] [[Pi].sub.2] are obtained by computing the stable models of [[Pi].sub.2], augmented with all facts a [left arrow]. such that atom a is true in a particular stable model of [[Pi].sub.1]. Indeed, the rules of [[Pi].sub.2] cannot have any effect on the rules of [[Pi].sub.1]. This intuition is formally justified by the next lemma. For any program [Pi] and interpretation M, let M [union] [Pi] denote the program [Pi] [union] {a [left arrow]. | [element of] M}.

LEMMA 5.1(11) Let [Pi] = [[Pi].sub.1] [union] [[Pi].sub.2] be a ground program such that [[Pi].sub.2] ?? [[Pi].sub.1]. Then,

(i) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(ii) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

PROOF

(i): Let M [element of] SM([Pi]) and let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Assume [M.sub.1] [is not an element of] SM ([[Pi].sub.1]). Clearly, [M.sub.1] is a model of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; hence there exists an [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] such that M' [subset] [M.sub.1]. Let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; note that [M.sup.*] [subset] M. We show that [M.sup.*] is a model of [[Pi].sup.M], which raises a contradiction to the hypothesis M [element of] SM([Pi]). Clearly, [M.sup.*] satisfies [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Since [M.sub.1] [subset or equal to] M, we have that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; hence [M.sup.*] also satisfies [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Moreover, [M.sup.*] also satisfies each clause of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. In fact, consider any such clause C = [r.sub.1] [disjunction] . . . [disjunction] [r.sub.n] [left arrow] [q.sub.1], . . . , [q.sub.m]. Assume C is not satisfied by [M.sup.*]; that is, each [q.sub.j] is satisfied by [M.sup.*] but none of the [r.sub.i]. Since [[Pi].sub.2] ?? [[Pi].sub.1] and [[Pi].sub.1], [[Pi].sub.2] are ground, each [r.sub.i] belongs to [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; by definition, [M.sup.*] and M coincide on [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], thus M does not satisfy [r.sub.i]. On the other hand, since each [q.sub.j] is a positive literal and [M.sup.*] [subset] M, M satisfies each [q.sub.j]. Consequently, M does not satisfy C, which implies M [is not an element of] MM([[Pi].sup.M]), and hence M [is not an element of] SM ([[Pi].sub.1] [union] [[Pi].sub.2]); thus we arrived at the desired contradiction. It follows M, [element of] SM([[Pi].sub.1]).

(ii): ([subset or equal to]. Let [M.sub.0] [element of] SM([Pi]) and let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. By (i), M [element of] SM([[Pi].sub.1]); it thus suffices to show that [M.sub.0] [element of] SM(M [union] [[Pi].sub.2]). Clearly, [M.sub.0] is a model of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Assume there exists a smaller model [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Clearly, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; hence, it follows that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], which implies [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. It follows that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is a model of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; since [[Pi].sub.1], [[Pi].sub.2] are ground, it follows that [M.sup.*] is a model of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. This is in contradiction to the hypothesis [M.sub.0] [element of] SM([Pi]). Therefore [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], which means [M.sub.0] [element of] SM(M [union] [[Pi].sub.2]). This proves the [subset or equal to]-direction.

([contains or equal to]). Let [M.sub.0] [element of] SM(M [union] [[Pi].sub.2]) for some M [element of] SM([[Pi].sub.1]). We have to show that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Clearly M contains only atoms from [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and therefore [[Pi].sub.2] ?? M. Thus, by (i), [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Since M [subset or equal to] [M.sub.0], it follows [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. From this it follows [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; since [M.sub.0] satisfies [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [[Pi].sub.1], [[Pi].sub.2] are ground, it follows that [M.sub.0] is a model of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. It remains to show that [M.sub.0] is minimal. Towards a contradiction, assume there exists a smaller model [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is obviously a model of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. If [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] would be a smaller model of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], than M, which would contradict the hypothesis that M is a stable model of [[Pi].sub.1]. Thus [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Consequently, [M.sup.*] is a model of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. This is a contradiction to the hypothesis that [M.sub.0] is a minimal model of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. It follows that [M.sub.0] is a minimal model of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Consequently, [M.sub.0] [element of] SM([[Pi].sub.1] [union] [[Pi].sub.2]). This completes the proof.

Example 5 (ctd.). [[Pi].sub.1] has the unique stable model {A(c, a), D(a)}. Thus [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

As a consequence of Lemma 5.1, we can indeed write query programs that use stable models in a modular fashion, such that modules [[Pi].sub.2] use the output of modules [[Pi].sub.1]. On the other hand, it is possible to break up any given query program [Pi] efficiently into components and then compute the stable models bottom-up, according to a dependency relation between components. This property is important for implementation, and has been exploited already in systems based on Datalog [Ullman 1989].

Say that a predicate P directly depends in [Pi] on predicate Q, if P occurs in the head of some rule of [Pi] and Q occurs (positively or negatively) in the same rule; P depends on Q (in [Pi]) if the pair (P, Q) is in the transitive closure of the direct dependency relation (cf. Dix [1992]). P and Q are mutually recursiue if P depends on Q and vice versa. Intuitively, if P depends on Q, the extension of Q may influence derivability of ground atoms on P; if P and Q are mutually recursive, they ought to be computed in the same module. Guided by this intuition, we introduce a formal notion of program decomposition.

A subset C of the rules of [Pi] is called a complete component of [Pi] (cf. Ross [1994]), if for every predicate P that occurs in the head of any rule from C, every rule from [Pi] belongs to C that has P in the head or some predicate Q mutually recursive with P. A predicate Q is defined by a complete component C, if Q occurs in the head of some rule of C. The following proposition, which is easily proved from the definitions, relates complete components with relation ??.

PROPOSITION 5.2 Let [Pi] = [[Pi].sub.1] [union] [[Pi].sub.1] such that [[Pi].sub.1] ?? [[Pi].sub.1]. Then [[Pi].sub.1] and [[Pi].sub.1] are complete components of [Pi].

Call a collection ?? = {[[Pi].sub.1], . . . , [[Pi].sub.n]) of complete components [[Pi].sub.i] of program [Pi] a decomposition of [Pi] if the [[Pi].sub.i] are pairwise disjoint and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Notice that each stratification of [Pi] constitutes a decomposition (but not vice versa); thus the concept of decomposition is more general. The complete component [[Pi].sub.j] depends on [[Pi].sub.i] (denoted [[Pi].sub.i] [is less than or equal to] [[Pi].sub.j]) iff i = j or some predicate defined in [[Pi].sub.i] depends on some predicate defined in [[Pi].sub.i]. Notice that [is less than or equal to] defines a partial order on ??. Intuitively, the relations computed by [[Pi].sub.i] serve as input to [[Pi].sub.j]; the stable models of [Pi], restricted to the predicates defined in [[Pi].sub.j], are computable as the stable models of [[Pi].sub.j] using the output (i.e., the stable models) of all components on which [[Pi].sub.i] depends as input. The following theorem expresses this for components on top of a decomposition (i.e., those on which no other components depend).

Theorem 5.3 Let ?? be a decomposition of a program [Pi], let [[Pi].sub.t], {[[Pi].sub.t], ...,[[Pi].sub.k]) [subset or equal to] ?? be a subset of those complete components maximal under [is less than or equal to], (i.e., on which no other component depends), and let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] = [Pi] - [[Pi].sub.t]. Then, given any database D such that [HU.sub.[Pi]] [subset or equal to] U (D),

(i) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(ii)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

PROOF. Note that for any such database D, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Moreover, the different complete components [[Pi].sub.i], [[Pi].sub.j] define different predicates. Thus, the theorem follows from Lemma 5.1. []

Corollary 5.4 For ?? and [Pi] as in Theorem 5.3,

(a) if [[Pi].sub.t] = [Pi], then for every database D s.t. [HU.sub.[Pi] [subset or equal to] U (D),

SM([Pi], D) = {[M.sub.1] [union] ... [union] [M.sub.k]/[M.sub.i] [element or] SM([[Pi].sub.i], D), i = 1, ... k};

(b) if D [element of] D (P) such that [HU.sub.[Pi] [subset or equal to] U(D) and P [element of] P for each predicate P where ??P occurs in the body of some rule in [Pi], then

(i) SM([Pi], D) = MM([Pi], D) and for every M [element of] MM([Pi], D), [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

(ii) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

As a consequence, the stable models of [Pi] on D can be computed bottom-up according to the ranking [r.sub.??] of the [[Pi].sub.i]s in ?? induced by [is less than or equal to]; that is, [r.sub.??]([[Pi].sub.j]) = 0 for each [[Pi].sub.j] which is minimal under [is less than or equal to], otherwise. The stable models of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] are computed from the stable models M of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], by computing for each such M all unions M' = [M.sub.1] [union] ... [union] [M.sub.k] of stable models [M.sub.i] of (M [union] [[Pi].sub.i])[D], where [[Pi].sub.1] . . . , [[Pi].sub.k] are the complete components with rank r; for r = 0, the [M.sub,i] are the stable models of [[Pi].sub.i][D]. Notice that the [M.sub.i] can be computed in parallel. The stable models of [[Pi][D] are given by [[Pi].sup.[is less than or equal to]r][D] for the highest rank r.

With regard to computation, decompositions that break up [Pi] into small components are preferable. They lead to smaller programs for which stable models have to be computed; however, a larger degree of parallelism is possible. Say that decomposition ?? is a refinement of another decomposition ?? iff for every component [Pi] in ??, there is some component [Pi]' in ??, such that C' is a subset of C. It is easily checked that the collection of all complete components that are minimal with respect to inclusion (called strongly connected components (SCCs) in Ullman [1989] and Ross [1994]), is a decomposition and a refinement of any other decomposition, and hence the unique most refined decomposition of [Pi].

Example 5 (ctd.). Since [[Pi].sub.2] ?? [[Pi].sub.2], by Proposition 5.2 [[Pi].sub.1], and [[Pi].sub.2] are complete components of [Pi] = [[Pi].sub.1] [union] [Pi].sub.2]. In fact, they are the SCCs of [Pi]. (For a more involved example, see the program [[Pi].sub.1], computing an enumeration of the universe in the proof of Theorem 6.3). []

Notice that the SCCs of [Pi] can be determined efficiently. Thus one can decompose [Pi] into its SCCs (even at compile time), and then compute the stable models bottom-up. In particular, the decomposition into the SCCs supports parallelization of model computation to the highest degree among all decompositions, and is thus appealing for an implementation of Datalog that exploits parallel processing for query evaluation.

We remark that in Lifschitz and Turner [1994], independent of Eiter et al. [1994] and at the same time, a modularity property for the answer set semantics to extended disjunctive logic programs (which contain two kinds of negation) has been stated which is very similar (and in fact equivalent) to Lemma 5.1, based on the notion of the splitting set. With respect to our framework, a set of atoms S is a splitting set of a ground program [Pi] if every rule that has an atom from S in the head has all its atoms in S; denote by split([Pi], S) the collection of all such rules. Clearly, if [Pi] = [[Pi].sub.1] [union] [[Pi].sub.2] and [[Pi].sub.2] ?? [[Pi].sub.1], then the set of atoms in [[Pi].sub.1], is a splitting set of [Pi]; conversely, if S is a splitting set of [Pi], then ([Pi] - split([Pi], S)) ?? split([Pi], S) if the ground atoms are seen as propositional letters. Under this view, the main result of Lifschitz and Turner [1994] (Splitting Set Theorem) amounts to programs with only one kind of negation precisely as Lemma 5.1 (Lemma 6.1 in Eiter et al. [1994]). It is no surprise that the same modularity property has been found independently in slightly different settings, since it is quite natural. Moreover, Lifschitz and Turner extend their result to iterated splittings of a program (Splitting Sequence Theorem), which decomposes the program into a chain of modules. This result corresponds to the previous observation about bottom-up computability of the stable models, but does not exploit parallelism in this process which is possible by the partial ordering of modules through dependency.

6. EXPRESSIVE POWER OF DISJUNCTIVE DATALOG

In this section, we address the issue of the expressive power of disjunctive Datalog. The main result is that disjunctive Datalog captures [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] under possibility stable model semantics; that is, from the generic total queries, it can express precisely those that are [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-recognizable. Thus the expressive power of disjunctive Datalog is one level above the expressive power of normal Datalog under stable model semantics [Schlipf 1995b] or fixpoint semantics [Kolaitis and Papadimitriou 1991], which is NP.

An interesting result is that the full expressive power is already available with [DATALOG.sup.[disjunction], [not equal to]] programs; hence the [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], queries also capture [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], provided that inequality literals are allowed in the body. Without inequality literals, this is not the case; we discuss this in detail in the next section. This result also shows that the expressive power of circumscription [McCarthy 19861, applied to function-free first-order theories, is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] under brave (possibility) inference.

Our plan for this section is as follows. We consider first Boolean (generic) queries and then general (generic) queries. In order to assure genericity of queries, we focus on disjunctive Datalog programs that are constant-free. Constants in queries are not an issue, since they can be provided by designated input relations (see Abiteboul et al. [1995, Section 9]).

The following proposition on the complexity of minimal, stable, and perfect models of propositional programs is well known.

Proposition 6.1 [Eiter and Gottlob 1995b] Let [Pi] be a propositional (i.e., ground) program and let p be a propositional atom. Deciding if p [element of] M for some M [element of] ?? ([Pi]) is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-complete for ?? = MM, SM, PM.

Proof. (Membership) A guess for a model M [element of] ??([Pi]) such that p [element of] M, which is polynomial in the size of [Pi], can be checked in polynomial time with an NP oracle. This proves membership in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. []

We obtain from this easily the following complexity bound on the recognizability of disjunctive Datalog queries.

Lemma 6.2 The [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] queries are [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-recognizable, for ?? = MM, SM, PM.

Proof. The QOT problem (cf. Section 2) for query [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] defined by the [DATALOG.sup.[disjunction], ??] program [Pi], trivially reduces to deciding, given a database D, if a grounded atom a belongs to some M [element of] ??([Pi], D) = ??([Pi][D]). For fixed [Pi], the size of ground([Pi], U) is polynomial in the size of D; hence, [Pi][D] can be clearly constructed in polynomial time. Thus, the lemma follows from Proposition 6.1. []

On the other hand, we show that every [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-recognizable query can be expressed in disjunctive Datalog under possibility inference. In fact, we show that this holds already for the fragment of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], queries, that is, under possibility minimal model inference.

We start by exploring the expressiveness of Boolean [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], queries, which is useful for deriving the expressiveness of general [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] queries with output relations.

Theorem 6.3 The Boolean constant-free [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] queries coincide with the class of all database properties in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Proof. By Lemma 6.2, it remains to show that every database property ?? in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] can be expressed by a Boolean [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], query.

For any such ??, we construct a constant-free Boolean [DATALOG.sup.[disjunction], [not equal to]] query program [Pi] with IO([Pi]) = (R, W), such that for D [element of] D(R), [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (D) | = W iff ??(D) = true.

The construction uses a lemma, whose proof can be found in Eiter et al. [1994, 1996]. We need some concepts first. Assume that S and F, L are distinguished relation symbols that do not occur in R. The class of enumerated databases over R is the collection of all databases D over R, S, F, L such that the binary relation S describes an enumeration of all elements in U, where (x, y) [element of] S means that y is the successor of x, and the unary relations F and L contain the first and the last element in the enumeration, respectively.

Lemma 6.4 [Eiter et al. [1994, 19961 A database property ?? on R is in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] iff ?? is definable on the enumerated databases over R by a [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] formula T [Psi] over vocabulary R, S, F, L of the form

(2) ([inverted] E S)([inverted] A T)([inverted] E x) [Psi] (x),

where [Psi](x) = [??.sub.1] (x) [disjunction]... [disjunction] [??.sub.k](x) and the [??].sub.i] are conjunctions of literals.

We compose the program [Pi] of two modules [[Pi].sub.1] and [[Pi].sub.2] such that [[Pi].sub.2] ?? [[Pi].sub.1]. The lower module, [[Pi].sub.1] computes an enumeration of the universe of the given database in a minimal model, such that the minimal models containing enumerations are those in which a designated propositional atom Order is true; every enumeration is in some minimal model. The upper module, [[Pi].sub.2], evaluates the formula [Psi] for ?? on the database, by using the enumeration output by [[Pi].sub.1].

Program [[Pi].sub.1] is as follows.

(1) x [is less than] y [disjunction] y [is less than] x [left arrow] x [not equal to] y.

(2) x [is less than] z [left arrow] x [is less than] y, y [is less than] z.

(3) S (x, z) [left arrow] z [is less than] x.

(4) S (x, z) [left arrow] x [is less than] y, y [is less than] z.

(5) S(x, x) [left arrow.

(6) S (x, y) [disjunction] S(x, y) [left arrow].

(7) F(x) [left arrow] y [is less than] x.

(8) F(x) [disjunction] F(x) [left arrow].

(9) L(x) [left arrow] x [is less than] y.

(10) L(x) [disjunction] L(x) [left arrow].

(11) G(x) [left arrow] F(x).

(12) (G(y) [left arrow] G(x), S(x, y).

(13) Order [left arrow] G(x), L(x).

The predicates defined in the strongly connected components of [[Pi].sub.1], ordered by dependency, are shown in Figure 1.

Informally, a minimal model of [Pi].sub.1] on U is built bottom-up as follows: Denote for any linear order [is less than] on U by [S.sub.[is less than]], [F.sub.[is less than]], and [L.sub.[is less than]] the relations that give the successor, the first, and the last element of U with respect to [is greater than]. The lowest component (rules 1) and (2)) computes in [is less than] a linear order on U. The relations S, P, and L are computed such that they are supersets of the complements of F, S, and L, and S, F, and L are computed as their complements. Dependent on that, G is computed, such that G(x) means that F gives [F.sub.[is less than]] and S gives [S.sub.[is less than]] from the first element in [is less than] up to x. Finally, the rule deriving Order is evaluated, which fires precisely if F, S, and L give [F.sub.[is less than]], [S.sub.[is less than]], and [L.sub.[is less than]], respectively.

More formally, let for any linear order [is less than] on U be [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] the interpretation of ground([[Pi].sub.1], U) in which Order is true, G = U, [is less than] gives itself, and P (resp., P) for P = S, F, L is [P.sub.[is less than]] (resp., the complement of [P.sub.[is less than]]). Then we have the following.

Lemma 6.5 Let U be a finite universe, and let M be an interpretation of ground([[Pi].sub.1], U) such that Order [element of] M. Then, M [element of] MM(ground([[Pi].sub.1], U)) iff [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] for some [is less than].

Proof. Let P denote ground([[Pi].sub.1], U).

(??) It is easy to see that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is a model of P, and that no atoms can be removed from [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] without losing the model property. Hence [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

(??) From Corollary 5.4, it follows by the rules (1) and (2) that [is less than] is in M a linear order on U; moreover, by the rules (6), (8), and (10) the relations S, F, and L must be in M the complements of S, F, and L, respectively. Consequently, from the rules (3) through (5), (7), and (9), it follows that F, S, and L must be in M subsets of [F.sub.[is less than]], [S.sub.[is less than]], and [L.sub.[is less than]], respectively. Since Order [element of] M, it follows that G(a), L(a) [element of] M for some a [element of] U; it follows that F is in M the relation [F.sub.[is less than]]. By induction downwards, the order [is less than], we obtain for each b [element of] U whose predecessor with respect to [is less than] is a, that G(a) [element of] M and S(a, b) [element of] M and that F(a) [element of] M where a is the first element of U with respect to [is less than]. Consequently, S and F are in M the relations [S.sub.[is less than]] and [F.sub.[is less than]], respectively, and G = U. Thus, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. []

The program [[Pi].sub.2], which evaluates the formula [Psi] on an enumerated database over R, is as follows (arities of tuple variables are understood).

(1) [S.sub.i](x) [disjunction] [S.sub.i](x) [left arrow]. for each [S.sub.i] [element of S;

(2) [T.sub.j](x) [disjunction] [T.sub.j](x) [left arrow]. } for each [T.sub.j] [element of T;

(3) [T.sub.j](x) [left arrow] W. } for each [T.sub.j] [element of T;

(4) [T.sub.j](x) [left arrow] W. } for each [T.sub.j] [element of T;

(5) W [left arrow] [[Eta].sub.i] (x). for i = 1, ...., k;

In this program, the, [S.sub.i] and [T.sub.j] are new predicates of the arity of [S.sub.i] and [T.sub.j], respectively, and W is a new propositional letter. The [[Eta].sub.i] are constructed from the [??].sub.j] in [Psi] as follows. Suppose that

[[??].sub.i](x) = [l.sub.i1] [conjunction] ... [conjunction] [l.sub.i,ni];

then,

[[Eta].sub.i] (x) = [l'.sub.i,1], ..., [l'.sub.i,ni], Order,

where [l'.sub.i,j] = P(x), if [l.sub.i,j] = ??P(x) and P [is not an element of] R, =; otherwise, [l'.sub.i,j] = [l.sub.i,j].

Intuitively, program [[Pi].sub.2] works as follows. The clauses (1) choose an extension for the predicates in S, and assign their complements to S. For every such extension S, choosing W true and for all predicates in T, T' the maximal extensions (i.e., all tuples) yields an interpretation [M.sub.s] which is clearly a model of [[Pi].sub.2]. This interpretation is in fact a minimal model, if there exists no choice for the predicates in T (implemented by the clauses (2)), such that all instances of the clauses (5) have a false body. Thus, on an enumerated database D (extended by the complements. F, S, L, of F, S, and L, resp., and by the true letter Order), [M.sub.s] is a minimal model precisely if (D, S) | = ([inverted]T) ([exists]x) [Psi] (x). This means that [[Pi].sub.2] has a minimal model containing W iff D | = [Psi].

More formally, we have the following. Let R' = R, (P, P\P = F, S, L), Order and adopt IO([[Pi].sub.2]) = (R', W).

Lemma 6.6 For every D' [element of] D(R'), such that D' is an enumerated database over R, D' | ([inverted] Ax)(P(x) [equivalence] ??P(x)), P = F, S, L and D' |= Order, it holds that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

PROOF

(??) If [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], then some M [element of] MM([[Pi].sub.2], D') exists with W [element of] M. From Corollary 5.4 and clauses (1), (3), and (4) it is clear that M = [M.sub.s] for the predicates S as interpreted in M. Suppose (D', S) | [not equal to] ([inverted]AT)([exists]x)[Psi](x). Then there exist relations for T such that (D', S, T) |= ([inverted] AT)?? [Psi](x). Let M' be the interpretation of [[Pi].sub.2][D'] in which W is false, the predicates S, T are as in (D', S, T), and the predicates S, T are their complements. Then M' is a model of [[Pi].sub.2][D']. However, M' [subset] M; this is a contradiction of M [element of] MM([[Pi].sub.2], D'). It follows W, S) |= ([inverted] AT)([exists]x)[Psi](x), and hence D' |= [Psi].

(??) Assume D' |= [Psi]. Hence there are relations S such that (D', S) |= ([inverted] AT)([exists]x)[Psi](x). Consider the interpretation [M.sub.s]. Clearly, [M.sub.s] is a model of [[Pi].sub.2][D]. Suppose that [M.sub.s] is not minimal. Hence there exists a model M' [subset] [M.sub.s]. Clearly, by the clauses (1), M' coincides on S with M, and hence by the clauses (3) and (4) W [is not an element of] M'. By the clauses (2), either [T.sub.j](t) [element of] M' or T(t) [element of] M' for each [T.sub.j] [element of] T and tuple t [element of] [U.sup.a]([T.sub.j]), and since M' is minimal, both are not true. Thus, by the clauses (5), it follows for the relations S, T as in M that (D', S, T) |= ([inverted] Ax)??[Psi](x); the latter implies (D', S) | [not equal to] ([inverted]AT)([exists]x)[Psi](x). Thus we arrive at a contradiction. Hence [M.sub.s] [element of] MM([[Pi].sub.2], D'). Since W [element of] M, it follows [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. This proves the lemma. []

Now let [Pi] = [[Pi].sub.1] [union] [[Pi].sub.2]; notice that [[Pi].sub.2] ?? [[Pi].sub.1]. We show that, given any database D [element of] D(R), [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] iff ?? is true on D.

(??) If [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], then there exists M [element of] MM([Pi], D) such that W [element of] M. It is easily seen that Order [element of] M. Thus, by Corollary 5.4 and Lemma 6.5, it follows that M [intersection] [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] for some linear order [is less than] on U. By Corollary 5.4, M [element of] [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Thus, for the database D' over R' such that D'[R] = D and D' coincides on all predicates in R' - R with [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] clearly M [intersection] [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Thus by Lemma 6.6, D' |= [Psi]. It follows from Lemma 6.4 that ?? is true on D.

(??) Assume that ?? is true on D. Then, from Lemmata 6.4 and 6.6, it follows that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] for every D' [element of] D(R') as described such that D'[R] = D. Fix any such D', and let M [element of] MM([[Pi].sub.2], D') such that W [element of] M For the linear order [is less than] on U such that S in D' is [S.sub.[is less than]], it follows from Corollary 5.4 and Lemma 6.5 that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Thus [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

This completes the proof of the theorem. []

A similar result can be derived for [DATALOG.sup.[disjunction],??] queries under the p,SM (resp., PM) -semantics. Notice that the expressive power of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] queries without inequality literals is the same as for [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] queries, since negative literals involving noninput predicates in the body of a clause can be moved to the head.

Using negation, the equality and inequality can be computed easily, for example, by the program [[Pi].sub.Eq,Neq]:

Neq(x, y) [left arrow] ?? Eq (x, y).

Eq(x, x) [left arrow].

Notice that [[Pi].sub.Eq,Neq] is stratified. Hence for every universe U, ground([[Pi].sub.Eq,Neq], U) has a unique perfect model [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] which coincides Eq with the unique stable model, in which Eq and Neq are equality and inequality on U, respectively. We obtain the following.

Lemma 6.7 For every [DATALOG.sup.[disjunction],??] query program [Pi] there exists a [DATALOG.sup.[disjunction],??] query program [Pi]' such that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], inf = p, c and that [Pi]' is stratified if [Pi] is stratified.

Proof. Let IO([Pi] = (R, S), and assume that Neq and Eq do not occur in RS([Pi]), R, S (otherwise use fresh predicates). Let [[Pi].sub.2] be the program resulting from [Pi] by replacing each equality atom [x.sub.i] = [x.sub.j] with [x.sub.i], [x.sub.j] and each inequality atom [x.sub.i] [not equal to] [x.sub.j] with Neq([x.sub.i], [x.sub.j]). Let [Pi] = [[Pi].sub.2] [union] [[Pi].sub.Eq,Neq] and IO([Pi]) = IO([Pi]). Notice that [[Pi].sub.2] ?? [[Pi].sub.1], and that for every D [element of] D(R), [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is the unique stable model of [[Pi].sub.Eq,Neq][D]. Thus from Corollary 5.3, it follows that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] for every D [element of] D(R), which means [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] Clearly [Pi]' is stratified if [Pi] is. []

Theorem 6.8 The Boolean constant-free [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] queries precisely capture the class of all database properties in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], even if no equality and inequality literals are allowed.

Proof. Follows from Theorem 6.3 and Lemma 6.7. []

6.1 I/O Queries

Since [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is closed under conjunction, it is not difficult to extend our results for Boolean queries to the case of general I/O-queries. However, proofs of such generalizations are rarely found in the literature. Therefore, we give a proof here.

Theorem 6.9 A generic total query q is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-recognizable iff it is expressible in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] through a constant-free query-program.

Proof. By Lemma 6.2, it remains to show the if-direction.

Let q: D(R) [right arrow] D(S) be a [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-recognizable generic total query. Let ?? be the database property over R' = RS defined as follows: for every D' [element of] D(R'), ??(D') = true iff D'[S] [subset or equal to] q(D'[R]), where [subset or equal to] is understood componentwise. Notice that ?? is in fact a database property. For any instance D' of R', ??(D) is equivalent to a conjunction of QOT-problems for q as follows:

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Since q is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-recognizable, it follows that ?? is in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Thus, by Theorem 6.3, there exists a constant-free [DATALOG.sup.[disjunction],[not equal to]] query program [Pi]' where IO([Pi]) = (R', W) for some 0-ary predicate W, such that for every D' [element of] D(R'), [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] = true. We assume that [Pi]' has exactly the form of the [DATALOG.sup.[disjunction],[not equal to] query program in the proof of Theorem 6.3. Now let [Pi] be the [DATALOG.sup.[disjunction],[not equal to]] query program with IO([Pi]) = (R, S) obtained by modifying [Pi]' as follows. For each [S.sub.i] [element of] S, replace each positive (resp., negative) occurrence of [S.sub.i] in [Pi]' with [S'.sub.i](resp.,[S'.sub.i]), where [S'.sub.i] and [S'.sub.i] are new predicates of the same arity as [S.sub.i], and add the rules

[S.sub.i](x) [left arrow] [S'.sub.i](x), W.

[S'.sub.i] (x) [disjunction] [S'.sub.i](-) [left arrow].

Intuitively, the first rule allows derivation of [S.sub.i] only if W is in the interpretation, and the second rule simulates all possible instances of [S'.sub.i].

For every D [element of] D(R), [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] essentially computes the union of all M [element of] MM([Pi], D) with W [element of] M. By our construction, for each such M the database D' [element of] D(R') such that D'[R] = D and D'[S] coincides with M, the program [Pi]' has a minimal model on D' that contains W, and hence ??(D') = true (i.e., D'[S] [subset or equal to9] q(D'[R])). On the other hand, given that M' [element of] MM([Pi]', D') such that W [element of] M', where D' [element of] D(R') and D'[R] = D, the interpretation N = M' [union] {[S.sub.i](t), [S'.sub.i](t) |D'|= [S.sub.i](t)} [union] {[S".sub.i](t) | D' |[not equal to]: [S.sub.i](t)} is clearly a minimal model of [Pi] on D. Consequently, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; it follows [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].[]

Theorem 6.10 A generic total query q is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-recognizable iff it is expressible in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], ?? = PM, SM, even without equality and inequality literals, through a constant-free query program.

Proof. Follows from Theorem 6.9 and Lemma 6.7. []

Symmetric results hold for the certainty semantics, if negation is allowed.

THEOREM 6.11 A generic query total q is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-recognizable iff it is expressible in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] through a constant-free query program.

PROOF. That each query [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (resp., [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]) is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-recognizable follows from the known fact that, given a propositional program [Pi] and an atom p, deciding if p [element of] M for all M [element of] SM([Pi]) (resp., M [element of] PM([Pi]) is in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (cf. Eiter and Gottlob [1995b]; the proof is dual to the membership part of Proposition 6.1).

It remains to show that each generic total query q: D(R) [right arrow] D(S) that is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-recognizable can be expressed in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Let [bar]q be the query that computes the complement of q; that is, for each [S.sub.i] [element of] S and suitable tuple t, it holds [bar]q(D)[S.sub.i](t) iff q(D) |[not equal to] [S.sub.i](t). Since q is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-recognizable, by Theorem 6.10, q is expressible by a [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] query program [Pi]' without equality and inequality literals.

Let [Pi] be the query program with IO([Pi]) [right arrow] (R, S) obtained from [Pi]' by first replacing in the clauses each occurrence of every [S.sub.i] [element of] S with [S.sub.i], where each [S.sub.i] is a new predicate letter with arity a([S.sub.i]), and by afterwards adding the clauses

[S.sub.i](x) [left arrow] ??[S.sub.i](x), [S.sub.i] [element of] S.

Then, for any D [element of] D(R), [S.sub.i] [element of] S, and t [element of] [U.sup.a([S.sub.i]), we have that

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Consequently, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is identical to q. Since [Pi]' is stratified, it follows by Proposition 3.2 that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] = q. This proves the theorem.

7. PLAIN DISJUNCTIVE DATALOG

In the previous section, we have seen that adding disjunction to DATALOG is enough to gain the full expressive power of the formalism, provided that inequality literals [t.sub.1] [not equal to] [t.sub.2] may be used. It is interesting to see what happens if we disallow for such literals, and look at plain disjunctive Datalog ([DATALOG.sup.[disjunction]). On a first guess, the expressive power remains unaffected. However, this intuition is wrong, as we show that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] cannot express simple P-time computable queries, and hence not all [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-recognizable queries. This means that extending plain disjunctive Datalog by inequality literals results in the maximal increase of the expressive power, as opposed to nondisjunctive Datalog, where the increase is minor [Kolaitis and Vardi 1995].

On the other hand, compared to (plain) DATALOG, which expresses a strict subclass of the P-time queries, the expressive power of [DATALOG.sup.[disjunction] increases a lot, since [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] can express some [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-hard queries (cf. Section 8). Analogous to DATALOG, which expresses all P-time queries on enumerated databases (i.e., databases with a successor relation on the universe) [Vardi 1982; Immerman 19861, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] expresses all [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-recognizable queries on enumerated databases, since the inequality predicate can be easily defined from the enumeration of the universe. Notice that if we have a linear order [is less than] rather than an enumeration, then [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] also expresses all of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], whereas DATALOG then expresses only a strict subset of P.

The role of inequality literals in disjunctive Datalog is complemented by an interesting phenomenon: the expressive power of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is strictly higher than the expressive power of [DATALOG.sup.[not equal to]] (which is limited to P-time queries). In particular, the inequality predicate is computable by a query in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. This sounds paradoxical, and is explained as follows: although it is possible to compute the inequality predicate in a minimal model of a program [[Pi].sub.1], it is not possible to recognize this model and pass it to a program [[Pi].sub.2] that uses [[Pi].sub.1]. This contrasts with computing an enumeration of the database universe (see Section 6).

The first result previously mentioned is that the [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] queries do not include all P-time queries. In particular, we show that the Boolean Even-query

[q.sub.Even]: D(R) [right arrow] S,

which tells if the relation R has an even number of elements, cannot be expressed in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] without inequality literals. This is established by exploiting a weak form of monotonicity of the query programs.

Notice that for normal (i.e., nondisjunctive) Datalog, nonexpressiveness results have been shown using the tool of infinitary logic with finitely many variables ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]) [Kolaitis and Vardi 1992, 1995]. It is known that many prominent query languages (e.g., stratified Datalog, fixpoint queries, while queries, etc.) can only express queries that are definable in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] [Kolaitis and Vardi 1992, 1995; Abiteboul et al. 1995]. Moreover, it is known that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] has a 0-1 law; that is, every query from this language is either almost surely true or almost surely false, if the size of the universe grows to infinity [Kolaitis and Vardi 1995]. Since the Even-query does not have this property, it cannot be expressed in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and hence also not in any of the query languages subsumed by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Unfortunately, the powerful tool of infinitary logic (prove inexpressibility in a query language embedded in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] by inexpressibility in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]) cannot be applied to disjunctive Datalog. In fact, as shown in Section 4, graph 3-colorability can be expressed in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; on the other hand, it has been shown that graph 3-colorability cannot be expressed in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] [Dawar 1995]. Thus we have the following.

PROPOSITION 7.1 (Constant-free) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is not equivalent to any fragment of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

As shown in the following, the inequality relation cannot be recognized by a [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] query. Thus [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is even incomparable with any fragment from [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] that can recognize inequality (e.g., stratified Datalog, the fixpoint queries, and the while queries).

The following lemma implies a weak form of monotonicity for [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] in the sense that minimal models of programs without input relations and inequality literals grow monotonically.

LEMMA 7.2 Let [Pi] be a [DATALOG.sup.[disjunction],??] program without inequality literals, and let [U.sub.1] [subset or equal to] [U.sub.2] such that |[U.sub.1]| [is greater than or equal to] 1. Then for every [M.sub.1] [element of] MM(ground([Pi], [U.sub.1])) there exists an [M.sub.2] [element of] MM(ground([Pi], [U.sub.2])) such that [M.sub.1] [intersection] [M.sub.2] = [M.sub.1].

PROOF. We prove the claim for [U.sub.2] = [U.sub.1] [union] {a} where a [is not an element of] [U.sub.1]; the general result follows by a simple induction argument.

In what follows, let [P.sub.1] = ground([Pi], [U.sub.1]) and [P.sub.2] = ground([Pi], [U.sub.2]. Consider [M.sub.1] [element of] MM([P.sub.1]). Let c be an arbitrary constant from [U.sub.1], and denote for any formal object X (formula, rule, term, etc.) by X[a/c] the object resulting from X by uniformly replacing a with c. Let M = [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Of course, M is an interpretation of [P.sub.2]; notice that [M.sub.1] [subset or equal to] M.

We show that M is a model of [P.sub.2] Assume to the contrary that M is not a model of [P.sub.2] Then there exists a clause C = H [left arrow] B in [P.sub.2] - [P.sub.1] such that C is not satisfied by M. This means that B is satisfied by M but no disjunct of H is in M. Clearly C[a/c] [element of] [P.sub.1]. By definition of M, since B is satisfied by M, B[a/c] must be satisfied by [M.sub.1]. Since [M.sub.1] is a model of [P.sub.1] H[a/c] must also be satisfied by [M.sub.1]. Therefore there exists an atom G [element of] [M.sub.1] such that G occurs as disjunct in H[a/c]. Thus there exists an atom [G.sup.*] in H with [G.sup.*][a/c] = G and [G.sup.*] [element of] M (by definition of M). This shows that H is satisfied by M. We arrive at a contradiction.

We thus have shown that M is a model of [P.sub.2]. Consequently, there must exist a minimal model [M.sub.2] [subset or equal to] M of [P.sub.2]. Clearly [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is a model of [P.sub.1]. [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], but [M.sub.1] is already minimal, it holds that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; thus [M.sub.2] [intersection] [M.sub.1] = [M.sub.1]. This proves the lemma. []

THEOREM 7.3 The Even-query is not expressible in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] without inequality literals.

PROOF. Assume that [q.sub.Even]: R [right arrow] S is computed by a [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] query program [Pi] without inequality literals. Let [Pi]' be the program resulting from [Pi] by removing all rules that contain negative literals involving R and all positive literals involving R from the remaining rules. For every D = (U, U) the following condition (*) holds: [Pi][D] = ground([Pi], U).

Let [D.sub.1] = ([U.sub.1], [U.sub.1]) such that |[U.sub.1]| [is greater than or equal to] 1 is even. Then [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] = true; from (*) it follows that M [element of] MM(ground([Pi]', [U.sub.1])) exists with S [element of] M. Now choose [D.sub.2] = [U.sub.2] [U.sub.2] such that [U.sub.1] [subset or equal to] [U.sub.2] and |[U.sub.2]| is odd. From Lemma 7.2, it follows that M [element of] ground([Pi]', [U.sub.2]) exists with S [element of] M; thus, by (*) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] = true. However, this means that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is different from [q.sub.Even]. We reach a contradiction. []

COROLLARY 7.4 The Even-query is not expressible in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

Using Theorem 7.3, it is easily shown that the complement of the Even-query also cannot be expressed in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Analogous results can be shown for certainty inference; in particular, the Even-query (as well as its complement) cannot be expressed in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

As mentioned, the inequality predicate can be easily computed by a [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] query. Consider the following program [[Pi].sub.[not equal to]:

(1) A(x, y)[disjunction]Neq(x,y) [left arrow]

(2) A (y, z) [left arrow] Neq(x, x).

(3) Neq(y, z) [left arrow] Neq(x, x).

For any universe U, the minimal models of ground([[Pi].sub.[not equal to], U) are the interpretations M in which Neq is a subset of the inequality relation on U, and A is the complement of Neq. In particular, there is a unique minimal model in which Neq coincides with the inequality relation, which we denote by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Thus, for IO([[Pi].sub.[not equal to]) = (0, Neq), [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] computes in Neq the inequality relation on U.

An interesting consequence is that each query in DATALOG' can be expressed in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

THEOREM 7.5 The expressive power of DATALOG' is strictly included in the expressive power of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

PROOF. Let q: R [right arrow] S be a query expressed by a [DATALOG.sup.[not equal to] query program [Pi]'. Assume that Neq, A are fresh predicates. Denote by [[Pi].sub.2] the program obtained from [Pi]' by replacing each occurrence of inequality literals [t.sub.1] [not equal to] [t.sub.2] by Neq([t.sub.1], [t.sub.2]). Let [Pi] = [[Pi].sub.2] [union] [[Pi].sub.[not equal to] and IO([Pi]) = (R, S). Since [[Pi].sub.2] ?? [[Pi].sub.[not equal to] by Corollary 5.4 it follows [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], D) for every D [element of] D(R). Each program M [union] [[Pi].sub.2] is a [DATALOG.sup.[not equal to] program, and thus has a unique minimal model N on D. It is easily seen that the predicate A has in N the same extension as in M, and that N increases monotonically with M. As A does not occur in [[Pi].sub.2] and MM([[Pi].sub.[not equal to], D) = MM(ground([[Pi].sub.[not equal to], M), it follows that for [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is maximal with respect to inclusion. Consequently, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and hence [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. This shows that every query in [DATALOG.sup.[not equal to] is expressible in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. The result thus follows from Proposition 7.1. []

On the other hand, the inequality relation cannot be recognized in plain disjunctive Datalog. Let [q.sub.[not equal to]: Neq [right arrow] W be the Boolean query that tells, given D = (U, Neq), whether Neq is the inequality relation on U.

THEOREM 7.6 q, is not expressible in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

PROOF. We show that expressibility of [q.sub.[not equal to] in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] would entail that every query in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (in particular, the Even-query) can be expressed in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. This raises a contradiction to Theorem 7.3.

Suppose that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] for some [DATALOG.sup.[disjunction] query program [[Pi].sub.0]. Let [[Pi].sub.1] = [[Pi].sub.0] [union] [[Pi].sub.[not equal to], where [[Pi].sub.[not equal to] is the preceding program. Notice that [[Pi].sub.0] ?? [[Pi].sub.[not equal to]. Corollary 5.4 implies the following property (*): for every D [element of] D(R) such that R is disjoint with RS([[Pi].sub.1]), some M [element of] MM([[Pi].sub.1], D) exists with W [element of] M, and Neq is the inequality relation on U in each M [element of] MM([[Pi].sub.1], D) with W [element of] M.

Consider now any query q: R [right arrow] S, defined by some [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] query program [Pi]'. Assume that the predicates in [[Pi].sub.1] are distinct from those in RS([Pi]'), R, S (otherwise, use fresh predicates). Let [[Pi].sub.2] be the program obtained from [Pi] be replacing each inequality atom [t.sub.1] [not equal to] [t.sub.2] with Neq([t.sub.1], [t.sub.2]), and by adding the atom W to the body of each rule. Let [Pi] = [[Pi].sub.2] [union] [[Pi].sub.1] and OI([Pi]) = (R, S). Since [[Pi].sub.2] ?? [[Pi].sub.1], it follows from Corollary 5.4 that for every D [element of] D(R),

{M - {W}|M [element of] MM([Pi], D), W [element of] M} [intersection] [HB.sub.[Pi]'[D] = MM([Pi]'[D]).

It is easy to see that, if W [is not an element of] M for M [element of] MM([Pi], D), then [S.sub.i](t) [is not an element of] M, for every [S.sub.i] [element of] S and tuple t. This implies [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; since D was arbitrary, it follows [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Since no inequality literals occur in [[Pi].sub.2], this means that q is expressible in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. However, for q = [q.sub.Even] this is not true (Theorem 7.3). This proves the theorem. []

8. COMPLEXITY OF DISJUNCTIVE DATALOG

Am important aspect of a query language is its computational complexity, which is traditionally measured by data and expression complexity (cf. Vardi [19821). The data (resp., expression) complexity of a query language L is defined as the complexity of the QOT-problem in Section 2 for the queries defined by expressions E [element of] L, where E is part of the input and E (resp., D) is fixed.

The data complexity is usually closely tied to the expressive power of a query language. In particular, data complexity provides an upper bound for the expressive power. On the other hand, the fact that hard queries can be formulated does not necessarily imply that all queries within the same complexity can be formulated. A well-known example is Datalog, which can express P-time hard queries, but not all P-time queries [Kanellakis 1990]. Such query languages are not balanced in the sense that the complexity dominates the expressive power. As shown in the following, plain disjunctive Datalog suffers this property, whereas [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is balanced. Notice that allowing stratified negation in nondisjunctive Datalog programs does not establish the balance of the language, since still only a strict subset of the P-time computable queries can be expressed, where each level of stratification adds expressive power [Kolaitis 1991]. However, inspection of the proofs in Section 6 yields that a single level of stratified negation in disjunctive programs gives the full expressive power.

Similar with respect to expressibility, we observe that constants in query programs are not an issue with respect to the complexity of disjunctive Datalog, since constants can be provided by designated input relations. Thus following the complexity results also apply to disjunctive Datalog without constants.

8.1 Data Complexity

From Lemma 6.2, we have the following.

THEOREM 8.1 The data complexity of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] queries is in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] for ?? = MM, SM, PM.

From Theorem 6.3 it follows that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] queries are [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-hard in the data size. In fact, this can be sharpened to plain disjunctive Datalog queries defined by programs without [not equal to] and ??. Thus adding disjunction to pure Horn clauses allows one to express [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-hard queries. The proof is via a program [Pi] for deciding whether an atom a occurs in some minimal model of a given propositional program (cf. Eiter and Gottlob [1995b]).

THEOREM 8.2 The Boolean [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] queries are [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-hard in the data size, even if no negation occurs in query programs.

PROOF. Let [Pi] be a program consisting of clauses of the form

[a.sub.1] [disjunction] [a.sub.2] [left arrow] [t.sub.1], [t.sub.2], [t.sub.3](*)

where all [a.sub.1], [a.sub.2] and [t.sub.1], [t.sub.2], [t.sub.3] are (not necessarily distinct) propositional letters (i.e., 0-ary predicates). By Corollary 6 in Eiter and Gottlob [1995b] it follows that, given [Pi], deciding if for a fixed propositional letter P, it holds P [element of] M for some M E MM([Pi]) is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-hard provided that a constant for true, given by ??, is available.

We define a fixed [DATALOG.sup.[disjunction] query program [[Pi].sub.1] with IO([[Pi].sub.1]) = (C; W), where C is a 5-ary relation and W is a 0-ary relation, such that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] solves this problem on the input database [D.sub.[Pi] in which [Pi] is encoded as follows.

The universe U of D is given by U = RS([Pi]) [union] {T, P}, that is, by the set of propositional letters occurring in [Pi] and T, P, and C contains tuples ([a.sub.1], [a.sub.2], [t.sub.1], [t.sub.2], [t.sub.3]) for the clauses (*) in [Pi]. Program [Pi] consists of the following three clauses.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Intuitively, V corresponds to a valuation of the propositional letters, where V(Q) means that Q is true.

It is easily checked that [Pi].sub.1][[D.sub.[Pi]] has a minimal model in which W occurs (i.e., [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] = true) iff [Pi] has a minimal model M with [P.sub.1] [element of] M. Since [D.sub.[Pi] is polynomial-time constructible from [Pi], the result follows. []

8.2 Expression Complexity

Intuitively, the expression complexity of disjunctive Datalog is exponentially higher than the data complexity, since the size of [Pi][D] can be exponential in the size of [Pi], as opposed to polynomial size in D. In fact, the expression complexity of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] turns out to be complete for the exponential analogue of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], which is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. This class consists of all problems that can be decided on a nondeterministic Turing machine in single exponential time (i.e., in time O([2.sup.p(n)]), where p(n) is polynomial), with the use of an NP-oracle. Notice that few natural [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-complete problems are known (see Kolaitis and Vardi [1995] for an example in the context of second-order logic).

THEOREM 8.3 The expression complexity of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] for ?? = MM, SM, PM.

PROOF. [Pi][D] is polytime constructible from D and ground([Pi], U), whose size is clearly bounded by [n.sup.n] = [2.sup.n log n], where n is the size of [Pi] plus D. By Proposition 6. 1, deciding if [S.sub.i](a) [element of] M for some M [element of] ?? ([Pi][D]) is nondeterministically possible with an NP oracle in time polynomial in the size [Pi][D]. Thus deciding whether [S.sub.i](a) [element of] M for some M [element of] ??([Pi], D) is possible by a nondeterministic Turing machine in single exponential time with respect to the size of [Pi] plus D by using an NP oracle. Thus the QOT-problem (Section 2) is in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].(12)

Analogous to the data complexity, hardness of the expression complexity for [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] already shows up in the fragment of [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] queries, even if no negation occurs in the query program. The proof of this result relies on the use of complexity upgrade techniques that extend the results in Balcazar et al. [1992] and Papadimitriou and Yannakakis [1985], applied to results in Stewart [1991], and the simulation of Boolean circuits by a [DATALOG.sup.[disjunction]] program without [not equal to] and negation under minimal model semantics. In Kolaitis and Papadimitriou [1991], a similar simulation was described in [DATALOG.sup.??,[not equal to]] using stratified negation and fixpoint semantics.

Briefly, the proof consists of a reduction of a problem on graphs, co-CERT3COL [Stewart 1991], in its succinct version [co-CERT3COL.sub.S] [Balcazar et al. 1992], that is, the problem input consists of a Boolean circuit [C.sub.w] by which the bits of the standard problem input w (a binary string encoding the graph) can be computed. Using the complexity upgrade techniques, we show that [co-CERT3COL.sub.S] is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] This problem is reduced to evaluating Boolean queries [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] over a fixed database [D.sub.0], defined by [DATALOG.sup.[disjunction] programs [Pi] that are constructed in polynomial time.

8.2.1 Succinct Representation. It is well known that the input representation (e.g., numbers in unary or binary notation) can have a drastic effect on the problem complexity. Such effects were investigated for the "succinct versions" of graph-theoretic problems in Papadimitriou and Yannakakis [1985], where a graph whose vertices are elements of [{0, 1}.sup.n] is represented by a Boolean circuit with 2n input bits; the circuit outputs 1 if and only if the input represents two vertices that are connected by an edge.

Formally, a Boolean circuit is a finite set of triples (gates) {[g.sub.i] = ([a.sub.i], j, k): 1 [is less than or equal to] i [is less than or equal to] t}, where [a.sub.i] [element of] {in, and, or, not} is the kind of the gate, and [g.sub.j], [g.sub.k], j, k [is less than] i are the inputs of the gate, unless [g.sub.i] is an in-gate, in which case, say, j = k = 0. For not-gates, j = k (cf. Kolaitis and Papadimitriou [1991]). As shown in Papadimitriou and Yannakakis [1985], the succinct problem setting often leads to an exponential complexity increase.

The results and techniques of Papadimitriou and Yannakakis [1985] (see also Papadimitriou [1994]) were generalized in Balcazar et al. [1992] to the succinct version of any decision problem A encoded as a language over {0, 1}. The succinct representation of a binary word w is a Boolean circuit that on input of a number i (in binary), outputs whether i [is less than or equal to] |w| and in that case, the ith bit of w (|w| denotes the length of w). The succinct version [A.sub.S] of A is as follows: given a Boolean circuit [C.sub.w], describing a work w, decide whether w [element of] A.

The following theorem shows how to obtain hardness results under polynomial-time transformability ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]) for the succinct problem version from hardness of the standard problem under polylogtime transformability ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]). This theorem is a generalization of Theorem 5 in Balcazar et al. [1992], where the theorem is formulated for logtime transformability ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]). Roughly, a [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-transformation] is a problem mapping f such that, given w and an integer i, the size of f(w) is bounded by a polynomial and the ith bit of f(w) is computable in polylogarithmic time; note that every such mapping defines a polynomial-time transformation. For problem A, let long(A) be defined as follows: w [element of] long(A) iff the binary expression of the length of w is in {1w: w [element of] A).

THEOREM 8.4 Let C1 and C2 be arbitrary complexity classes such that for every A [element of] C1, long(A) [element of] C2. Then, for every B, if B is hard for C2 under [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-transformation, then [B.sub.S] is hard for C1 under[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-transformation.

PROOF. (Sketch) The theorem can be easily shown along the lines of the proof of Theorem 5 in Balcazar et al. [1992] by using a generalization of the following Conversion Lemma in Balcazar et al. [1992]: if A [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] B, then [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. It can be seen, by carefully looking at the proof in Balcazar et al. [1992], that the Conversion Lemma remains valid if [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is replaced by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

To show the theorem, assume that B is hard for C2 under [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-transformation, and let A [element of] C1 be arbitrary. By hypothesis, long(A) [element of] C2 and therefore long(A) [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] B. By the generalized Conversion Lemma, long[(A).sub.s] [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] [B.sub.s]. Furthermore, A [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] long[(A).sub.s] (Balcazar et al. [1992, Lemma 4]); by transitivity, A [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] [B.sub.s], and thus [B.sub.s] is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-hard for C1.

A strengthening of this theorem from [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-transformation to logspace-transformation is shown in Vieth [1994] and Gottlob et al. [1995].

The proof of Theorem 8.8 uses this complexity upgrade technique. It gives a polynomial-time transformation of the succinct version [co-CERT3COL.sub.s] of the graph problem co-CERT3COL, to evaluating Boolean ??-free [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] query programs over a fixed database. The query program involves a subprogram that simulates Boolean circuits by which the relations in the input structure encoding the graph can be computed.

Problem co-CERT3COL is as follows (cf. Stewart [1991]):

Instance of size n: a graph G on the vertices {0, 1, . . . , n - 1} such that every edge is labeled with a disjunction of two literals, where each literal is over the Boolean variables {[X.sub.i,j]: i, j = 0, . . . , n - 1}.

Yes-instance of size n: an instance G of size n such that for some truth value assignment t to the Boolean variables, the graph t(G) obtained from G by including only those edges set true under t is not 3-colorable.

Stewart showed that the complementary problem CERT3COL (i.e., negated condition for Yes-instances) is complete for [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] under projection translations. It can be easily seen that projection translatability implies [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-transformability (cf. Vieth [1994]). Thus co-CERT3COL is complete for [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] under [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-transformation. Applying the complexity upgrade theorem, we thus obtain the following.

Theorem 8.5 [co-CERT3COL.sub.s] is complete for [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] under [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-transformation.

We next describe disjunctive Datalog programs for deciding co-CERT3COL and for simulating a Boolean circuit. The programs are used in the proof of Theorem 8.8.

8.2.2 Labeled Graph Coloring Program. Problem co-CERT3COL can be decided by a Boolean [DATALOG.sup.[disjunction]] query program [[Pi].sub.Col] in which no negation occurs using possibility stable model semantics. For an instance of co-CERT3COL of size n, the input database of [[Pi].sub.Col] consists of a structure with universe {0, . . . , n - 1} and certain relations encoding the labeled graph.

We may assume that co-CERT3COL is encoded using a binary relation E and 4-ary relations P and N, such that E(x, y) [disjunction] E(y, x) (resp., P(x, y, v) [disjunction] P(y, x, u, v), N(x, y, u, v) [disjunction] N(y, x, u, v)) encodes that an edge between x and y exists (resp., the edge between x and y is labeled by [X.sub.u,v] ??[X.sub.u,v]).

Thus the input relations of [[Pi].sub.Col] are E, N, and P; the output relation of [[Pi].sub.Col is the 0-ary relation W. The program [[Pi].sub.Col] uses in addition binary predicates T and T, where T(u, v) means that [X.sub.u,v] has value true; T is intended to be the complement of T. Furthermore, [[Pi].sub.Col] uses unary predicates R, B, and G, which encode the color of a node. [[Pi].sub.Col] consists of the following clauses (C = R, G, B).

(1) T(u, v) [disjunction] T(u, v) [left arrow].

(2) R (x) [disjunction] G (x) [disjunction] B (x) [left arrow].

(3) C(x) [left arrow] W.

(4) W [right arrow] P(x, y, u, v), T(u, v), C(x), C(y).

(5) W [right arrow] N(x, y, u, v), t(u, v), C (x), C (y).

Intuitively, clause (1) chooses a truth value for [X.sub.u,v]. Dependent on that choice, W is computed true iff the resulting graph t(G) is not 3-colorable.

Lemma 8.6 Let [D.sub.I] = ({0, . . . , n - 1}, P, N) be the input database for [[Pi].sub.Col] for an instance I of co-CERT3COL. Then W [element of] M for some M [element of] MM([[Pi].sub.Col], D) iff I is a Yes-instance.

PROOF

(??). Let t be a valuation of the [X.sub.u,v] such that t(G) is not 3-colorable. Let M be the interpretation of [[Pi].sub.Col][D] such that W is true, T(u, v) [element of] M iff t([X.sub.u,v]) = true, T(u, v) [element of] M iff t([X.sub.u,v]) = false, for all u, v [element of] U, and R = G = B = U. As easily checked, M is a minimal model of [[Pi].sub.Col][D].

(??). Assume that M is a minimal model of [[Pi].sub.Col][D] in which W is true. From Corollary 5.4, we can infer that for each u, v [element of] U, either T(u, v) [element of] M or P(u, v) [element of] M, but not both. Since M is a minimal model, we infer that for the valuation t of the [X.sub.u,v] defined by t([X.sub.u,v]) = true if T(u, v) [element of] M and t([X.sub.u,v]) = false if T(u, v) [is not an element of] M, for each assignment of one of the colors red, green, and blue to every node, two adjacent nodes of the graph t(G) are colored equally. Hence I is a Yes-instance. []

8.2.3 Boolean circuit program. Let C = {[g.sub.i] = ([a.sub.i], j, 1), 1 [is less than or equal to] i [is less than or equal to] t) be a Boolean circuit that decides a k-ary predicate R over [U.sub.0] = {0, 1}; that is, for any tuple [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] supplied to C as input, a designated output gate of C, which we assume is [g.sub.t], has value 1 iff t [element of] R.

We describe a [DATALOG.sup.??] program [[Pi].sub.C] that simulates C using the universe [U.sub.0]. For each gate [g.sub.i], [[Pi].sub.C] uses a k-ary predicate [G.sub.i], where [G.sub.i](x) informally states that on input of tuple x to C, the circuit computation sets the output of [g.sub.i] to 1. Moreover, it uses a propositional letter False, which is true in those models in which the [G.sub.i] do not have the intended interpretation; none of these models will be minimal.

The clauses of [[Pi].sub.C] are the following ones. For each gate [g.sub.i] = ([a.sub.i], j, l) of C, it contains the clause

(00) [G.sub.i] (x) [is less than] False.

Depending on the type [a.sub.i], [[Pi].sub.C] contains for [g.sub.i] additional clauses:

[a.sub.i] = in: (01) [G.sub.i]([x.sub.1], . . . , [x.sub.j-1], 1, [x.sub.j+1], . . ., [x.sub.k]) [left arrow].

(02) False [left arrow] [G.sub.i]([x.sub.1], . . . , [x.sub.j-1], 0, [x.sub.j+1], . . ., [x.sub.k])

[a.sub.i] = not: (03) [G.sub.i](x) [disjunction] [G.sub.j] (x) [left arrow].

(04) False [left arrow] [G.sub.j](x), [G.sub.i](x).

[a.sub.i] = and: (05) [G.sub.i](x) [left arrow] [G.sub.j](x), [G.sub.l](x).

(06) [G.sub.j](x) [left arrow] [G.sub.i](x).

(07) [G.sub.l](x) [left arrow] [G.sub.i](x).

[a.sub.i] = or: (08) [G.sub.i](x) [left arrow] [G.sub.j](x).

(09) [G.sub.i](x) [left arrow] [G.sub.l](x).

(10) [G.sub.j](x) [disjunction] [G.sub.i](x) [left arrow] [G.sub.i](x).

The clauses (00) ensure that if a model of ground ([[Pi].sub.C], [U.sub.0]) contains False, it is the maximal interpretation (which is trivially a model of [[Pi].sub.C]). In fact, this is the only model of [[Pi].sub.C] that contains False. Let [M.sub.C] denote the interpretation of ground ([[Pi].sub.C], [U.sub.0]) given by [M.sub.C] = {[G.sub.i](t)|[g.sub.i] [element of] C, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and [g.sub.i] takes value 1 on input t to C}.

Lemma 8.7 For any Boolean circuit C, [M.sub.C] is the unique minimal model of ground ([[Pi].sub.C], [U.sub.0]).

Proof. In what follows, let P = ground ([[Pi].sub.C], [U.sub.0]). It is easily checked that [M.sub.C] is a model of P. Since [HB.sub.P] is the only model M of P such that False [element of] M, it suffices to show that [M.sub.C] is the only model M of P such that False [element of] M.

We need additional concepts. The level [l.sub.C]([g.sub.i]) of gate [g.sub.i] = ([a.sub.i], j, l) in C is the length of the longest path from any input gate to [g.sub.i]. Formally,

[l.sub.C]([g.sub.i] = {0, if [a.sub.i] = in;

{max (l.sub.C]([g.sub.i]),

[l.sub.C]([g.sub.i]) + 1. otherwise

For each model M of P, let for each d [is greater than or equal to] 0,

[M.sup.(d)] = M [intersection] {[G.sub.i](t) : [l.sub.C]([g.sub.i]) [is less than or equal to] d,t [element of] [{0, 1}.sup.k]},

that is, M restricted to all predicates [G.sub.i] such that [g.sub.i] occurs in C at level r [is less than or equal to] d.

We show by induction on the levels d of C the following: for each model M of P with False [is not an element of] M, it holds [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; this clearly implies M = [M.sub.C].

(Base) Let d = 0. It is easy to see that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Assume that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Then, for some input gate [g.sub.i] = (in, j, l) of C and tuple t, we have [G.sub.i](t) [element of] [M.sup.(0)] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. From the definition of [M.sub.C] and clause (02), we infer that the jth component of t is 0; consequently, from clause (03) we infer that False [element of] M, which is a contradiction. It follows that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and hence [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII].

(Induction) Assume that the statement holds for d [is greater than or equal to] 0 and consider [g.sub.i] = ([a.sub.i], j, l) such that [l.sub.C]([g.sub.i]) = d + 1. Consider for t [element of] [{0, 1}.sup.k] the two possibilities for membership of [G.sub.i](t) in [M.sub.C].

Case 1: [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. We show that in this case, [G.sub.i](t) [element of] [M.sup.(d+1)]. Following are the possibilities for the type [a.sub.i] of gate [g.sub.i].

(1) [a.sub.i] = not: By definition, we have that [G.sub.j](t) [is not an element of] [M.sub.C]. Hence it follows from the induction hypothesis that [G.sub.j](t) [is not an element of] M; thus from clause (03), we infer that [G.sub.i](t) [element of] [M.sup.(d+1)].

(2) [a.sub.i] = and: By definition, [G.sub.j](t), [G.sub.l](t) [element of] [M.sub.C]; hence, by the induction hypothesis, it follows that [G.sub.j](t), [G.sub.j](t) [element of] M. Thus from clause (05), we infer that [G.sub.i](t) [element of] [M.sup.(d+1)].

(3) [a.sub.i] = or: By definition, we have that [G.sub.j](t) [element of] [M.sub.C] or [G.sub.l](t) [element of] [M.sub.C]. If [G.sub.j](t) [element of] [M.sub.C], it follows from the induction hypothesis that [G.sub.j](t) [element of] M. Hence from clause (08) we infer that [G.sub.i](t) [element of] [M.sup.(d+1)]. The case [G.sub.l](t) [element of] [M.sub.C] is analogous.

Case 2: [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. We show that in this case, [G.sub.i](t) [is not an element of] [M.sup.(d+1)].

(1) [a.sub.i] = not: By definition, we have that [G.sub.j](t) [element of] [M.sub.C], and thus by the induction hypothesis, it follows that [G.sub.j](t) [element of] M. From clause (04) we infer that [G.sub.i](t) [is not an element of] [M.sup.(d)].

(2) [a.sup.i] = and: By definition, we have that [G.sub.j](t) [is not an element of] [M.sub.C] or [G.sub.l](t) [is not an element of] [M.sub.C]. Assume that [G.sub.j](t) [is not an element of] [M.sub.C]. From the induction hypothesis, it follows that [G.sub.j](t) [is not an element of] M. From clause (06), we infer that [G.sub.i](t) [is not an element of] [M.sup.(d+1)]. The case [G.sub.l](t) [is not an element of] [M.sub.C] is analogous.

(3) [a.sub.l] = or: By definition, we have that [G.sub.j](t), [G.sub.l](t) [is not an element of] [M.sub.C], and, by the induction hypothesis, we infer that [G.sub.j](t), [G.sub.l](t) [is not an element of] M. From clause (10), we infer that [G.sub.i](t) [is not an element of] [M.sup.(d+1)]. QED

Thus the claimed statement holds for d + 1. This concludes the proof. []

Theorem 8.8 The Boolean [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] queries are [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-hard in the program size.

Proof. Let [C.sub.I] be the Boolean circuit of an instance I of co-CERT3COL of size n. We describe a [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-transformation of deciding I into evaluating a program [[Pi].sub.1] on the fixed database [D.sub.0] = ([U.sub.0]).

From [C.sub.I], Boolean circuits [C.sub.P] and [C.sub.N] are constructible in polynomial time that compute P and N, respectively, where each argument x, y, u, and v is encoded by a string of [log n] bits. The circuits output 1 if a tuple is contained in P (resp. N) and 0 otherwise; for any string representing m with n [is less than or equal to] m [is less than] [2.sup.[log n]], the output is 0. Semantically, this means that we have added isolated nodes n, n + 1, . . . , [2.sup.[log n] to the graph of I that are not labeled and hence never included in the graph t(G).

[C.sub.P] and [C.sub.N] can be simulated by polynomial-time constructible [DATALOG.sup.[disjunction]] programs [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], using [U.sub.0] as previously shown, which use predicates [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] [False.sup.P] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], [False.sup.N] respectively; assume that [MATHEMATICAL EXPRESSION NOT REPRODUCIBL [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] are the predicates corresponding to the output gates [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] of [C.sub.P] and [C.sub.N], respectively.

Program [[Pi].sub.Col] can be easily adapted to a variant [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] where the elements from {O, . . . , n - 1} are represented by binary strings of length [log n] and all occurrences of P, N are replaced by [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], respectively, in time polynomial in log n. For the database [D.sub.I] of I, let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] be the database such that U(D(*)) = [U.sub.0] and for each t' [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII],

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] iff t' encodes a tuple t, and D | = P(t),

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] iff t' encodes a tuple t, and D| = N(t).

Notice the following property (*): W [element of] M for some M [element of] MM([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]) iff W [element of] M for some M [element of] MM([[Pi].sub.Col], [D.sub.I]).

Now let [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and adopt IO([[Pi].sub.I]) = ([Theta]; W). Notice that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and that [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] do not share any predicates.

It holds that W [element of] M for some M [element of] MM([[Pi].sub.I], [D.sub.0]); that is, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] iff I is a Yes-instance. Indeed, from Corollary 5.4 and Lemma 8.7, we can infer that for every M [element of] MM([[Pi].sub.I], [D.sub.0]), [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], respectively. Hence, from Corollary 5.4 and (*), it follows that W [element of] M for some M [element of] MM([[Pi].sub.I], [D.sub.0]) iff W [element of] M for some [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] iff W [element of] M for some M [element of] MM([[Pi].sub.Col], [D.sub.I]). Thus, by Lemma 8.6, W [element of] M for some M [element of] MM([[Pi].sub.I], [D.sub.0]) iff I is a Yes-instance.

Since [[Pi].sub.I] can be constructed from [C.sub.I] in polynomial time, the result follows. []

Notice that evaluating a given [DATALOG.sup.[disjunction],??] query program under p, L semantics on a given database (termed combined complexity in Vardi [1982]), is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-complete as well.

Remarks. Without loss of generality, we could bypass or-gates in the Boolean circuit simulation, by simulating or-gates with and-gates and not-gates. Moreover, the transformation described in the proof of Theorem 8.8 can be performed in logspace. Thus, by using the strengthening of Theorem 8.4 from [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-transformation to logspace-transformation [Veith 1994; Gottlob et al. 1995], one can obtain a strengthening of Theorem 8.8 to logspace-transformation.

In fact, the function of a circuit can be computed by a program that uses neither disjunction nor negation, that is, a pure Datalog program; this improves on the proof approach in Kolaitis and Papadimitriou [1991], where negation was needed. The following observation is crucial for this improvement.

Fact. For every Boolean circuit C, there exists an equivalent Boolean circuit C' that has O(|C|) size, so that C' has negation only immediately after input gates; moreover, such a C' can be easily constructed from C in logspace.

Indeed, introduce for each gate [g.sub.i] a new gate [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] that computes the complement of [g.sub.i]; if [g.sub.i] is an in-gate, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is a not-gate which is fed by [g.sub.i]; if [g.sub.i] is an and-gate (resp., or-gate) with input gates [g.sub.j] and [g.sub.k], then [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] is an or-gate (resp., and-gate) with inputs [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]; finally, if [g.sub.i] is a not-gate fed by [g.sub.j], then it is changed to an or-gate whose both inputs are [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], and [MATHEMATICAL EXPRESSION NOT REPRODU an or-gate with [g.sub.j] for both inputs. Clearly, this yields the desired C', which can be constructed in logspace.

Negation on input gates can be realized quite easily in a program for circuit simulation. If [g.sub.i] is a not-gate fed by an in-gate [g.sub.k] = ([a.sub.k], j, l), then use the rule

(20) [G.sub.i]([x.sub.1], . . . , [x.sub.j-1]) 0) [x.sub.j+1], . . ., [x.sub.k]) [left arrow].

To simulate the circuit C' that has only input-negation, it is sufficient to use beside such rules the rules (01), (05), and (08) from the preceding circuit program for in-gates, and-gates, and or-gates, respectively. The resulting program [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], is a pure Datalog program; it is equivalent to the standard program [[Pi].sub.C], for C'; that is, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII], has [M.sub.C], as its unique minimal model on [U.sub.0].

Notice that this way we obtain, using complexity upgrading techniques, a simple proof of the folklore result that pure DATALOG has EXPTIME-complete expression complexity, which is implicit already in Vardi [1982] and Chandra and Harel [1982]. Indeed, the exponential upgrade of P is EXPTIME; problems that are complete for P under [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] reductions (or even stronger notions of reductions) can be expressed in pure Datalog. Am example is the CIRCUIT VALUE problem (compute the output of a Boolean circuit on a given input). This problem is complete for P under projection translations, even if negation is restricted to input gates [Vieth 1996] (essentially, this can be seen as an encoding of the MONOTONE CIRCUIT VALUE problem in Papadimitriou [1994]). A pure Datalog program solving this problem is straightforward. This implies EXPTIME-hardness; membership is obvious since the data complexity is well-known polynomial.

9. DISCUSSION AND CONCLUSION

The results of this article give a precise picture of the computational and expressive power of various semantics of disjunctive Datalog over relational databases (i.e., finite structures). Moreover, by our complexity-theoretic characterization of the expressiveness of disjunctive Datalog ([MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]), we are able to compare disjunctive Datalog with well-known database query languages. These relationships are depicted in Figure 2. Proofs and/or references for the other query languages appearing in this figure can found in Abiteboul et al. [1995]; here [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] denotes nondisjunctive Datalog with well-founded negation [Van Gelder et al. 1991]. Solid lines represent proper inclusion ([subset]), and dashed lines stand for possibly improper inclusion ([subset or equal to]). In the latter case, it is currently not known, but strongly believed, that the inclusion is proper. Note that recent results of SaccA [1997] show that normal logic programming using maximal or least undefined partial stable models can have the same expressive power as disjunctive logic programming.

[Figure 2 ILLUSTRATION OMITTED]

[DATALOG.sup.[disjunction], ??] appears to be a language of very high expressive power, and therefore the worst-case complexity of answering queries in this formalism is very high, which may be seen as a serious drawback. A similar comment applies to several other expressive query languages studied in the literature, for example, to the language of While-Queries studied by Chandra and Harel [1982] and the language of Partial Fixpoint Queries considered by Abiteboul et al. [1995]. The latter two languages both capture the highly intractable complexity-class PSPACE. This raises the question of whether it makes sense to study such expressive, and therefore complex, query languages. For the following reasons, we believe that it indeed does make sense, in particular, in the case of [DATALOG.sup.[disjunction], ??].

First, [DATALOG.sup.[disjunction], ??] is a natural and conservative extension of the tractable query language DATA-LOG. Thus whatever query can be posed in DATALOG (or even in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]) can be posed in exactly the same way in [DATALOG.sup.[disjunction],??] and can be answered in polynomial time. Therefore, with respect to plain DATALOG and [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] queries, [DATALOG.sup.[disjunction], ??] inherits all good properties.

Moreover, several problems of high practical relevance can be formulated in [DATALOG.sup.??], but presumably not in plain DATALOG (for examples, see Section 4). Clearly, we cannot expect a database engine to solve, say, the Traveling Salesman Problem, in polynomial time. However, such problems occur in practice and have to be solved. Disjunctive Datalog allows us to formulate such hard queries very elegantly, and gives us a chance to get an answer for small instances in reasonable time. In plain Datalog, unless the Polynomial Hierarchy collapses, it is not possible to formulate any NP-hard problem, even on enumerated databases unless proper restrictions are imposed on the problem description. It appears that also in this case, the plain Datalog formulation of a query can be rather cumbersome, for example, if the number of cities in the TSP is bounded by a constant.

We advocate that a modular approach to the expressiveness/complexity tradeoff should be taken. A solid kernel query language (in our case, [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]) guarantees polynomial response time at the cost of a limited expressiveness. Extending this kernel, a much more expressive language (in our case [DATALOG.sup.[disjunction], ??]) is made available, in which hard queries can be formulated. Clearly, even if this extended language is implemented by smart procedures that behave satisfactorily in many cases, no guarantee about response time can be given. The extended language should be used at the user's own risk. For the effectiveness of this approach, the core language must be easily distinguishable (at the syntactic level) from the extended language. This is indeed the case with [DATALOG.sup.[disjunction], ??]: it is easily (polynomially) recognizable whether a query does or does not contain disjunctions or unstratified negation. For a discussion of similar modular approaches in the field of terminological databases see Section 6 in Borgida [1995]; for a related discussion of the expressiveness versus complexity question, see Doyle and Patil [1991].

Finally, we believe that the exact determination of the expressiveness and complexity of [DATALOG.sup.[disjunction], ??] at the second level of the Polynomial Hierarchy is important for two reasons. The expressiveness results allow us to separate the queries that can be represented in our query language from those that (presumably) cannot. For example, important optimization problems such as computing optimal tours for the TSP or computing maximal eigenvectors are expressible, whereas playing games such as GO (which is PSPACE-complete, cf. Papadimitriou [1994]) is not possible in [DATALOG.sup.[disjunction], ??] unless the Polynomial Hierarchy collapses. The complexity results give us useful information about the algorithmic nature of the query-answering problem. As demonstrated in Section 4, a [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-complete problem can be solved by a guess-and-check algorithm that generates guesses and checks their appropriateness via a co-NP procedure. We can thus identify two independent and, in a sense, orthogonal sources of complexity: guessing suitable candidates for stable models and checking them. This insight is important for the design of practical algorithms and heuristics that attack these two sources of complexity as much as possible, and, in particular, for the identification and recognition of tractable subcases. Several current projects deal with the implementation of disjunctive Datalog or, more generally, of disjunctive logic programming, in which the design of such algorithms and methods is a primary goal.

For example, the group of Dix and Furbach [1993] in Koblenz (Germany), in an ambitious project, implements the disjunctive stable semantics and other semantics of disjunctive logic programming; in particular, they implement the recently proposed static semantics [Przymusinski 1994, 1995], which is very similar to the D-WFS semantics [Brass and Dix 1995; Brass et al. 1996a, b] and the well-founded circumscription semantics [You and Yuan 1993]; in fact, for disjunctive Datalog, static semantics and well-founded circumscription semantics coincide (cf. You and Yuan [1995]). The static semantics associates with each program a set of modal formulas as its meaning; it appears to be the best candidate for a natural extension of the well-founded semantics to disjunctive logic programs. Notice that, since the static semantics coincides on ??-free disjunctive logic programs with the minimal model semantics [Brass et al. 1996a], it is clear from our results that [DATALOG.sup.[disjunction],??] programs can express under this semantics every query in [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]. Moreover, preliminary results based on characterizations in Brass et al. [1996a] indicate that it can only express such queries, and thus has the same expressive power as [DATALOG.sup.[disjunction,??] under the certainty variant of the stable model semantics; we leave this for further study.

Seipel [1994] in Wurzburg (Germany) implements various semantics and methods developed by Minker and his students [Lobo et al. 1992; Minker 1994] at College Park (Maryland). There, V. S. Subrahmanian and his coworkers [Bell et al. 1994, 1996; Nerode et al. 1995], use integer programming methods for implementing normal and disjunctive logic programming. Ben-Eliyahu [Ben-Eliyahu and Dechter 1994; Ben-Eliyahu and Palopoli [1994] (Ben Gurion University and Technion, Israel) has recently started a project of implementing a subclass of disjunctive logic programs (so called headcycle-free programs) that is less complex (thus also less powerful) than the full class; see Eiter et al. [1996] for more details.

Last but not least, a project for implementing a prototype of disjunctive Datalog based on the stable model semantics (funded by the Austrian Science Foundation (FWF)) was most recently started at the information systems department of TU Vienna. Results and techniques from Leone et al. [1995] are a starting point for this work.

The current and future research of these groups is centered around the development of smart algorithms for disjunctive logic programming. Such algorithms should be able to recognize cases where the high worst-case complexity of disjunction can be avoided, possibly using efficient rewriting techniques.

On the more theoretical side, we are currently investigating the expressive power of nonmonotonic formalisms that are related to disjunctive logic programming.

One of the most popular nonmonotonic logics is Reiter's [1980] default logic. This logic was developed as a knowledge representation formalism and was originally not conceived as a database query language. However, in Cadoli et al. [1994] a suitable setting was defined in which default logic can be used as a query language for relational databases (Default Query Language, DQL). It was shown in Cadoli et al. [1994] that DQL has over relational databases the same expressive power as disjunctive Datalog. This naturally raises the question about how these formalisms are related, in particular with regard to possible translations.

Note that each disjunctive logic program can be translated into an equivalent default theory. A concrete translation, related to earlier work in Bidoit and Froidevaux [1991], was described by Sakama and Inoue [1993]. Their translation is rather intuitive and modular, which means that the translation of a disjunctive logic program is obtained by the union of the independent translations of the single rules. In this translation, each rule is rewritten to a default and further normal defaults enforcing the closed world assumption are added. A related proposal for handling disjunction by using normal defaults has been recently pointed out in Turner [1996].

Based on the translation mapping of Sakama and Inoue [1993], a translation from [DATALOG.sup.[disjunction],??] under the stable semantics into DQL can be obtained. Therefore, as already remarked in Cadoli et al. [1994], the fact that DQL expresses at least as much as [DATALOG.sup.[disjunction],??] under the stable semantics follows from Theorems 6.9 and 6.11 of this article in combination with the translation result by Sakama and Inoue. Notwithstanding, a direct expressiveness proof for DQL, which does not rely on disjunctive Datalog but exploits particular syntactic features of DQL was given in the appendix of Cadoli et al. [1994]. That proof is instructive to users of default logic, because it shows how database properties can be expressed in DQL via their semantics in second-order logic.

On the other hand, to the best of our knowledge, no translation from default logic to disjunctive logic programming is known in the literature. Therefore, the expressivity results in Section 6 of this article cannot be obtained from the results in Cadoli et al. [1994] by means of translations. Genuine--and more involved--proofs were in order to determine the expressive power of disjunctive Datalog (in particular, for ??,-free disjunctive Datalog).

By the results of this article it follows that it is possible to devise a translation from DQL to [DATALOG.sup.[disjunction],??] (even to [DATALOG.sup.[disjunction], [is not equal to]]), which also gives us a translation from default logic to disjunctive logic programming. In fact, each DQL query can be easily expressed by a [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] second-order formula, which, as shown in Eiter et al. [1996] can be brought into the normal form used in Theorem 6.3, and thus translated into [DATALOG.sup.[disjunction], [is not equal to]]. However, this translation has shortcomings: first, it is not a true embedding since it extends the language by new predicate symbols; second, it is not modular.

The study of modular embeddings between database languages appears to be important for understanding their relationship. If a modular embedding from query language [L.sub.1] to [L.sub.2] exists, then, in a sense, [L.sub.1] can be seen as a fragment or sublanguage of [L.sub.2].

Interestingly, there exists no modular embedding of DQL into disjunctive Datalog. This can be seen as follows. In Lifschitz and Turner [1993] and Chen [1993], modular embeddings of disjunctive logic programming into autoepistemic logic [Moore 1985] are exhibited. Thus, if there existed a modular embedding of default logic into disjunctive logic programming, by composition, there would also exist a modular embedding of default logic into autoepistemic logic. This would contradict a result in Gottlob [1995] which proves that there exists no modular embedding of default logic into autoepistemic logic.

The existence of a modular embedding of [DATALOG.sup.[disjunction],??] into DQL and the nonexistence of a modular embedding in the reverse direction can be interpreted as follows. Loosely speaking, [DATALOG.sup.[disjunction],??] amounts to a fragment of DQL which, as shown in this article, provides the full expressive power over relational databases. However, DQL provides more complicated syntactic features that cannot be mapped into [DATALOG.sup.[disjunction],??] by a simple translation.

From a practical point of view, in the context of deductive databases disjunctive Datalog seems to be the more suitable extension of [DATALOG.sup.??] than DQL. Due to its plain syntax, [DATALOG.sup.[disjunction],??] is amenable to automatic program analysis and optimization. In particular, modularity properties (as described in Section 5) and natural syntactic restrictions on [DATALOG.sup.[disjunction],??] programs (such as headcycle freeness) can be exploited to improve efficiency [Ben-Eliyahu and Dechter 1994; Leone et al. 1995]. On the other hand, due to its rich (and in some sense more succinct) syntax, DQL is far less accessible to such methods. For example, the concept of stratification for default theories is much more intricate than for logic programs [Cholewinski 1995]. The strength of DQL is that certain Al problems such as model-based diagnostic reasoning about complex devices can be more naturally expressed in that language [Cadoli et al. 1994]. It seems that DQL is a suitable language for bridging the impedance mismatch between advanced knowledge representation and relational databases.

Another aspect highlighting the difference between disjunctive Datalog and DQL is their expressive power over disjunctive (clausal) databases. As shown in Bonatti and Eiter [1995], the expressive powers of the two languages are different in that context.

Autoepistemic logic [Moore 1985] has the same expressive power as default logic [Bonatti and Eiter 1995]. Promising variants of default logic such as stable class default logic [Baral and Subrahmanian 1992] remain to be explored.

ACKNOWLEDGMENTS

The authors would like to thank several people for their comments on this work. In particular, we are indebted to P. A. Bonatti, J. Dix, K. Inoue, P. Kolaitis, J. S. Schlipf, H. Veith, and V. Vianu for important hints and comments. Furthermore, we are grateful to the anonymous referees for their constructive comments and helpful suggestions for improvement.

(1) Cf. Dix [1992], Lobo et al. [1992], Minker [1994], and Sakama and Inoue [1993].

(2) For a background, please see Papadimitriou [1994], Ullman [1989], Ceri et al. [1990], and Abiteboul et al. [1885].

(3) Notice that [DATALOG.sup.[disjunction]] programs allow the use of positive equality atoms in the body. Equality atoms can be easily eliminated and are thus not essential.

(4) Alternatively, we could extend the formalism by a designated atom [perpendicular] which is false in every interpretation. Although straightforward, for simplicity we do not consider such an extension.

(5) This choice can also be implemented by the two nondisjunctive clauses T(i, j), [left arrow] ??T(i, j), [is less than or equal to] i [is less than or equal to] n, j [is less than or equal to] n, and T'(i, j) [left arrow] ??T(i, j), i [is less than or equal to] n, i [is less than or equal to] n. Note that the resulting program is nondisjunctive.

(6) Here, we suppose that the provided universe U of the database storing the instance is sufficiently large for computing the tour values.

(7) Notice that in practice, integers are usually supported on an interval [-c, c - 1] for some (large) constant c, with arithmetic modulo c.

(8) Similar to the previous example, this choice can alternatively be implemented through nondisjunctive clauses E(i, j) [left arrow] ?? E(i, j), 1 [is less than or equal to] i [is less than or equal to] n, -c [is less than or equal to] j [is less than or equal to] c and E(i, j) [left arrow] ?? E(i, j), 1 [is less than or equal to] i [is less than or equal to] n, -c [is less than or equal to] j [is less than or equal to] c.

(9) Cf. Schlipf [1995b], Dix [1992], Ullman [1989], Ross [1994], and Lifschitz and Turner [1994].

(10) Because of the lack of a designated constant [perpendicular to] for falsity, we make here the implicit assumption that each denial [perpendicular to] [left arrow] [l.sub.1], . . . , [l.sub.m] is rewritten using local fresh propositional atoms A and False (cf. Section 4).

(11) The lemma can be generalized to nonground programs, provided that in general constants occurring only in [[Pi].sub.2] (resp., [[Pi].sub.1]) are accessible to variables in [[Pi].sub.1] (resp., [[Pi].sub.1]). This is, for example, trivially true if ground([Pi]) = ground([[Pi].sub.1]) [union] ground([[Pi].sub.2]).

(12) In fact, for fixed D, the size of ground([Pi], U) is bounded by [2.sup.cn]; thus the problem is even in [NE.sup.NP], the class of problems decidable on a nondeterministic Turing machine in linear exponential (i.e., O([2.sup.cn])) time with an NP oracle, and, as follows from Theorem 8.8, also complete for this class.

REFERENCES

ABITEBOUL, A., SIMON, E., and VIANU, V. 1990. Non-deterministic languages to express deterministic transformations. In Proceedings PODS '90 (Nashville, TN, April 2-4), 218-229.

ABITEBOUL, S., HULL, R., and VIANU, V. 1995. Foundations of Databases. Addison-Wesley, Reading, MA.

BALCAZAR, J., LOZANO, A., and TORAN, J. 1992. The complexity of algorithmic problems on succinct instances. In Computer Science, R. Baeta-Yates and U. Manber, Eds., Plenum Press, New York, 351-377.

BARAL, C. and GELFOND, M. 1994. Logic programming and knowledge representation. J. Logic Program. 19/20, 73-148.

BARAL, C. and SUBRAHMANIAN, V. S. 1992. Stable and extension class theory for logic programs and default logic. J. Autom. Reasoning, 8, 345-366.

BELL, C., NERODE, A., NG, R., and SUBRAHMANIAN, V. S. 1994. Mixed integer programming methods for computing non-monotonic deductive databases. JACM 41, 6, 1178-1215.

BELL, C., NERODE, A., NG, R., and SUBRAHMANIAN, V. S. 1996. Implementing deductive databases by mixed integer programming. ACM Trans. Database Syst., 21, 2, 238-269.

BEN-ELIYAHU, R. and DECHTER, R. 1994. Propositional semantics for disjunctive logic programs. Ann. Math. Artif. Intell. 12, 53-87.

BEN-ELIYAHU, R. and PALOPOLI, L. 1994. Reasoning with minimal models: Efficient algorithms and applications. In Proceedings Fourth International Conference on Principles of Knowledge Representation and Reasoning (KR-94) (Bonn, Germany, May 24-27), 39-50.

BIDOIT, N. and FROIDEVAUX, C. 1987. Minimalism subsumes default logic and circumscription in stratified logic programming. In Proceedings of LICS-87 (Ithaca, NY, June 22-25), IEEE CS Press, 89-97.

BIDOIT, N. and FROIDEVAUX, C. 1991. General logic databases and programs: Default semantics and stratification. Inf. Comput. 19, 15-54.

BONATTI, P. and EITER, T. 1995. Querying disjunctive databases through nonmonotonic logics. Theor. Comput. Sci. 160, 321-363.

BORGIDA, A. 1995. Description logics in data management. IEEE Trans. Knowl. Data Eng. 7, 5, 671-682.

BRASS, S. and DIX, J. 1995. Disjunctive semantics based upon partial and bottom-up evaluation. In Proceedings 12th International Conference on Logic Programming, Leon Sterling, Ed., (Tokyo, June 13-16), 199-213.

BRASS, S. and DIX, J. 1997. A general framework for semantics of disjunctive logic programs based on partial evaluation. J. Logic Program. (to appear).

BRASS, S., DIX, J., NIEMELA, I., and PRZYMUSINSKI, T. 1996b. A comparison of the static and disjunctive well-founded semantics. University of California at Riverside, (in preparation).

BRASS, S., DIX, J., and PRZYMUSINSKI, T. 1996a. Super logic programs. In Proceedings of the International Conference on Principles of Knowledge Representation and Reasoning (KR-96) (Vancouver, Canada), Morgan-Kaufmann, San Mateo, CA, 529-540.

CADOLI, M., EITER, T., and GOTTLOB, G. 1994. Using default logic as a query language. IEEE Trans. Knowl. Data Eng. 9, 3 (May-June 1997), 448-463.

CERI, S., GOTTLOB, G., and TANCA, L. 1990. Logical Programming and Databases. Springer, New York, NY.

CHandRA, A. and HAREL, D. 1982. Structure and complexity of relational queries. J. Comput. Syst. Sci. 25, 99-128.

CHandRA, A. and HAREL, D. 1985. Horn clause queries and generalizations. J. Logic Program. 2, 115.

CHEN, J. 1993. Minimal knowledge + negation as failure = only knowing (sometimes). In Proceedings of the Second International Workshop on Logic Programming and Nonmonotonic Reasoning (LPNMR '93) (Lisbon, July), L.-M. Pereira and A. Nerode, Eds., MIT Press, Cambridge, MA, 132-150.

CHOLEWINSKI, P. 1995. Reasoning with stratified default theories. In Proceedings of Third International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR '95), (Lexington KY, July), LNCS 928, W. Marek, A. Nerode, and M. Truszczynski, Eds., 273-286.

CHOMICKI, J. and SUBRAHMANIAN, V. S. 1990. Generalized closed world assumption is [MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]-complete. Inf. Process. Lett. 34, 289-291.

CULLUM, J. and WILLOUGHBY, R. 1985. Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. 1. Birkhauser, Boston.

DAWAR, A. 1995. A restricted second order logic for finite structures. In Proceedings of the International Workshop on Logic and Computational Complexity (LCC '94) (Bloomington IN), D. Leivant, Ed., LNCS 960, 393-413.

DIX, J. 1992. Classifying semantics of disjunctive logic programs. In Proceedings JIC-SLP-92 (Washington DC, Nov.), 798-812.

DIX, J. and MOLLER, M. 1993. Implementing semantics of disjunctive logic programs using fringes and abstract properties. In Proceedings of the Second International Workshop on Logic Programming and Nonmonotonic Reasoning (LPNMR '93), (Lisbon, July), L.-M. Pereira and A. Nerode, Eds. MIT Press, Cambridge, MA, 43-59.

DIX, J., LOVELAND, D., MINKER, J., and WARREN, D., EDS. 1996. Disjunctive Logic Programming and Databases: Nonmonotonic Aspects, Seminar Report 150, SchloB Dagstuhl, Germany.

DOYLE, J. and PATIL, R. 1991. Two theses of knowledge representation: Language restrictions, taxonomic classification, and the utility of representation services. Artif. Intell. 48, 261-297.

EITER, T. and GOTTLOB, G. 1995a. Note on the complexity of some eigenvector problems. Tech. Rep. CD-TR 95/89, Christian Doppler Laboratory for Expert Systems, TU Vienna, Dec.

EITER, T. and GOOTLOB, G. 1995b. On the computational cost of disjunctive logic programming: Propositional case. Ann. Math. Artif. Intell. 15, 3/4, 289-323.

EITER, T., GOTTLOB, G., and GUREVICH, Y. 1996. Normal forms for second-order logic over finite structures, and classification of NP optimization problems. Ann. Pure App. Logic 78, 111-125.

EITER, T., GOTTLOB, G., and MANNILA, H. 1994. Adding Disjunction to Datalog. In Proceedings of PODS '94 (Minneapolis, MN, May 24-26), 267-278.

EITER, T., LEONE, N., and SACCA, D. 1996. Expressive power of partial models for disjunctive deductive databases. In Proceedings International Workshop on Logic in Databases (LID '96) (San Mimiato, Italy, July 1-2), D. Pedreschi and C. Zaniolo, Eds., LNCS 1154, 245-264.

FAGIN, R. 1974. Generalized first-order spectra and polynomial-time recognizable sets. In Complexity of Computation, R. M. Karp, Ed., AMS, Providence, RI, 43-74.

GELFOND, M. and LIFSCHITZ, V. 1988. The stable model semantics for logic programming. In Proceedings of the Fifth Logic Programming Symposium, MIT Press. Cambridge, MA, 1070-1080.

GELFOND, M. and LIFSCHITZ, V. 1991. Classical negation in logic programs and disjunctive databases. New Gen. Comput. 9, 365-385.

GOTTLOB, G. 1995. Translating default logic into standard autoepistemic logic. JACM 42, 4, 711-740.

GOTTLOB, G., LEONE, N., and VEITH, H. 1995. Second-order logic and the weak exponential hierarchies. In Proceedings of the Conference on Mathematical Foundations of Computer Science (MFCS-95), (Prague), LNCS 969, 66-81. Full paper available as CD-TR 95/80, Christian Doppler Lab for Expert Systems, TU Vienna.

GUREVICH, Y. 1988. Logic and the Challenge of Computer Science. In Trends in Theoretical Computer Science, Chapter 1, E. Borger, Ed., Computer Science Press, New York.

IMIELINSKI, T. and LIPSKI, W. 1984. The relational model of data and cylindric algebras. J. Comput. Syst. Sci. 28, 80-102.

IMMERMAN, N. 1986. Relational queries computable in polynomial time. Inf. Contr. 68, 86-104.

KANELLAKIS, P. 1990. Elements of relational database theory. In Handbook of Theoretical Computer Science, Volume B, J. van Leeuwen, Ed., Elsevier, Amsterdam, The Netherlands, Chapter 17.

KIFER, M., LAUSEN, G., and WU, J. 1995. Logical foundations of object-oriented and frame-based languages. JACM 42, 4, 740-843.

KOLAITIS, P. 1991. The expressive power of stratified logic programs. Inf. Comput. 90, 50-66.

KOLAITIS, P. and PAPADIMITRIOU, C. H. 1991. Why not negation by fixpoint? J. Comput. Syst. Sci. 43, 125-144.

KOLAITIS, P. and VARDI, M. 1992. Fixpoint logic vs. infinitary logic in finite-model theory. In Proceedings LICS-92 (Santa Cruz, CA, June 22-25), IEEE Computer Science Press, Los Alamitos, CA, 61-71.

KOLAITIS, P. and VARDI, M. 1995. On the expressive power of Datalog: Tools and a case study. J. Comput. Syst. Sci. 51, 1, 110.

KRENTEL, M. 1988. The complexity of optimization problems. J. Comput. Syst. Sci. 36, 490-509.

LEONE, N., RULLO, P., and SCARCELLO, F. 1995. Disjunctive stable models: Unfounded sets, fixpoint semantics and computation. Abstract in Proceedings of International Logic Programming Symposium (ILPS-95) (Portland, OR), MIT Press, New York, 399-413.

LIFSCHITZ, V. and TURNER, H. 1993b. Extended logic programs as autoepistemic theories. In Proceeding Second International Workshop on Logic Programming and Nonmonotonic Reasoning (LPNMR '93), L.-M. Pereira and A. Nerode, Eds., MIT Press, Cambridge, MA, (Lisbon, July), 101-114.

LIFSCHITZ, V. and TURNER, H. 1994. Splitting a logic program. In Proceedings of ICLP-94, (Santa Margherita Ligure, Italy, June), MIT Press, Cambridge, MA, 23-38.

LLOYD, J. 1984. Foundations of Logic Programming. Springer, Berlin.

LOBO, J., MINKER, J., and RAJASEKAR, A. 1992. Foundations of Disjunctive Logic Programming. MIT Press, Cambridge, MA.

MAIER, D. and VANCE, B. 1993. A call to order. In Proceedings of PODS '93 (Washington, D.C., May 25-28), 1-16.

MCCARTHY, J. 1986. Applications of circumscription to formalizing common-sense knowledge. Artif. Intell. 28, 89-116.

MINKER, J. 1982. On indefinite data bases and the closed world assumption. In Proceedings of the Sixth Conference on Automated Deduction (CADE-82) (New York, NY, July 11-14), LNCS 138, 292-308.

MINKER, J. 1994. Overview of disjunctive logic programming. Ann. Math. Artif. Intell. 12, 1-24.

MOORE, R. 1985. Semantical considerations on nonmonotonic logics. Artif. Intell. 25, 75-94.

NERODE, A., NG, R., and SUBRAHMANIAN, V. S. 1995. Computing circumscriptive databases. I: Theory and algorithms. Inf. Comput. 116, 58-80.

PAPADIMITRIOU, C. 1984. The complexity of unique solutions. JACM 31, 492-500.

PAPADIMITRIOU, C. H. 1994. Computational Complexity. Addison-Wesley, Reading, MA.

PAPADIMITRIOU, C. and YANNAKAKIS, M. 1985. A note on succinct representations of graphs. Inf. Comput. 71, 181-185.

PRZYMUSINSKI, T. 1988. On the declarative and procedural semantics of stratified deductive databases. In Foundations of Deductive Databases and Logic Programming, J. Minker, Ed., Morgan-Kaufmann, San Mateo, CA, 193-216.

PRZYMUSINSKI, T. 1991. Stable semantics for disjunctive programs. New Gen. Comput. 9, 401-424.

PRZYMUSINSKI, T. C. 1994. A knowledge representation framework based on autoepistemic logic of minimal beliefs. In Proceedings of the Twelfth National Conference on Artificial Intelligence, AAAI-94 (Seattle, Aug.), AAAI, Morgan-Kaufmann, San Mateo, CA, 952-959.

PRZYMUSINSKI, T. C. 1995. Static semantics for normal and disjunctive logic programs. Ann. Math. Artif. Intell. 14, 323-357.

REITER, R. 1980. A logic for default reasoning. Artif. Intell. 13, 81-132.

Ross, K. 1994. Modular stratification and magic sets for Datalog programs with negation. JACM 41, 6, 1216-1267.

SACCA, D. 1997. The expressive powers of stable models for bound and unbound DATALOG queries. J. Comput. Syst. Sci. 54, 3 (June), 441-464.

SACCA, D. and ZANIOLO, C. 1990. Stable models and non-determinism in logic programs with negation. In Proceedings PODS '90 (Nashville, TN, April 2-4), 205-218.

SAKAMA, C. and INOUE, K. 1993. Relating disjunctive logic programs to default theories. In Proceedings of the Second International Workshop on Logic Programming and Nonmonotonic Reasoning (LPNMR '93) (Lisbon, July) L.-M. Pereira and A. Nerode, Eds., MIT Press, Cambridge, MA, 266-282.

SAKAMA, C. and INOUE, K. 1994. An alternative approach to the semantics of disjunctive logic programs and deductive databases. J. Autom. Reason. 13, 145-172.

SAKAMA, C. and INOUE, K. 1995. Paraconsistent stable semantics for extended disjunctive programs. J. Logic Comput. 5, 3, 265-285.

SCHLIPF, J. 1987. Decidability and definability with circumscription. Ann. Pure App. Logic 35, 173-191.

SCHLIPF, J. 1995a. A survey of complexity and undecidability results in logic programming. Ann. Math. Artif. Intell. 15, 3/4, 257-288.

SCHLIPF, J. 1995b. The expressive powers of logic programming semantics. J. Comput. Syst. Sci. 51, 1, 64-86.

SEIPEL, D. 1994. Non-monotonic reasoning based on minimal models and its efficient implementation. In Proceedings of the Thirteenth IFIP World Computer Congress, Workshop F62: Disjunctive Logic Programming and Disjunctive Databases (Hamburg, Aug.), 53-60.

STEWART, I. 1991. Complete problems involving Boolean labelled structures and projection transactions. J. Logic Comput. 1, 6, 861-882.

STOCKMEYER, L. J. 1977. The polynomial-time hierarchy. Theor. Comput. Sci. 3, 1-22.

TURNER, H. 1996. Representing actions in default logic: A situation calculus approach. In Proceedings of Common Sense '96.

ULLMAN, J. D. 1989. Principles of Database and Knowledge Base Systems. CS Press, Rockville, MD.

VAN GELDER, A., Ross, K., and SCHLIPF, J. 1991. The well-founded semantics for general logic programs. JACM 38, 3, 620-650.

VARDI, M. 1982. Complexity of relational query languages. In Proceedings of the 14th ACM STOC (San Francisco, CA), 137-146.

VEITH, H. 1994. Logical reducibilities in finite model theory. Master's Thesis, Information Systems Department, TU Vienna, Austria, Sept.

VEITH, H. 1996. Succinct representation and leaf languages. In Proceedings of the 11th Annual IEEE Conference on Computational Complexity, (Philadelphia, PA, May) 118-126.

WOLFINGER, B. ED. 1994. 13th IFIP World Computer Congress, Proceedings Workshop FG2: Disjunctive Logic Programming and Disjunctive Databases (Hamburg, Aug.), Springer, Berlin.

YAHYA, A. and HENSCHEN, L. 1985. Deduction in non-Horn databases. J. Autom. Reason. 1, 2, 141-160.

You, J.-H. and YUAN, L.-Y. 1993. Autoepistemic circumscription and logic programming. J. Autom. Reason. 10, 143-160.

You, J.-H. and YUAN, L.-Y. 1995. On the extension of logic programming with negation through uniform proofs. In Proceedings Third International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR '95), (Lexington KY, July), LNCS 928, W. Marek, A. Nerode, and M. Truszczynski, Eds., 231-244.

Received February 1996; revised December 1996; accepted December 1996

A. APPENDIX: PERFECT MODELS

The perfect models of a program [Pi] are defined as follows (cf. Przymusinski [1988]). A priority relation [is less than] is defined on [HB.sub.[Pi]] using an auxiliary relation [is less than or equal to] as follows. For each clause

[a.sub.1][disjunction] . . . [disjunction][a.sub.n] [is less than] [b.sub.1], . . ., [b.sub.k], ?? [c.sub.1], . . . , ?? [c.sub.m]

from ground([Pi]), it holds that

(i) [a.sub.i] [is less than] [c.sub.j], for all 1 [is less than or equal to] i [is less than or equal to] n, 1 [is less than or equal to] j [is less than or equal to] m,

(ii) [a.sub.i] [b.sub.j], for all 1 [is less than or equal to] i [is less than or equal to] n, 1 [is less than or equal to] j [is less than or equal to] k, and

(iii) [a.sub.i] [a.sub.j] for all 1 [is less than or equal to] i, j [is less than or equal to] n.

Now [is less than] and [is less than or equal to] are the smallest extensions to (i) through (iii) that satisfy a [is less than] b ?? a [is less than or equal to] b and are closed under transitivity; that is, a [is less than or equal to] b, b [is less than or equal to] c ?? a [is less than or equal to] c and a [is less than or equal to] b, b [is less than] c (resp., d [is less than] a) ?? a [is less than] c (resp., d [is less than] b). Intuitively, a [is less than] b means that b has higher priority than a.

Now define the preference order [much less than] on the models of [Pi] by [M.sub.1] [much less than] [M.sub.2] iff [M.sub.1] [is not equal to] [M.sub.2] and for each a [element of] [M.sub.1] - [M.sub.2] there exists b [element of [M.sub.2] - [M.sub.1] such that a [is less than] b.

Definition A.1 A model M of [Pi] is perfect iff there is no model M' of [Pi] such that M' [much less than] M; the collection of all perfect models of [Pi] is denoted by PM([Pi]).

Notice that [M.sub.1] [subset] [M.sub.2] implies [M.sub.1] [much less than] [M.sub.2]; thus, PM([Pi]) [subset or equal to] MM([Pi]).

This paper substantially augments and improves an extended abstract presented at PODS '94 [Eiter et al. 1994]. Part of H. Mannila's work was carried out while visiting the Technical University of Vienna.

Authors' addresses: T. Eiter, Computer Science Group, Faculty of Mathematics, University of Giessen, Arndtstrasse 2, D-35392 Germany; email: <Thomas.Eiter@informatik.uni-giessen.de>; G. Gottlob, Information Systems Department, Technical University of Vienna, Paniglgasse 16, A-1040 Wien, Austria; email: <<eiter|gottlob)@dbai.tuwien.ac.at>; H. Mannila, Department of Computer Science, University of Helsinki, P.O. Box 26, FIN-00014 Helsinki, Finland. email: (mannila@cs.helsinki.fi).

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.

Printer friendly Cite/link Email Feedback | |

Title Annotation: | database query language |
---|---|

Author: | Eiter, Thomas; Gottlob, Georg; Mannila, Heikki |

Publication: | ACM Transactions on Database Systems |

Date: | Sep 1, 1997 |

Words: | 29984 |

Previous Article: | Transactional client-serve cache consistency: alternatives and performance. |

Next Article: | ProbView: a flexible probabilistic database system. |

Topics: |