# Complex quantification in Structured Query Language (SQL): a tutorial using relational calculus.

The Structured Query Language (SQL) forms a substantial component of introductory database courses and is supported by almost every commercial database product. One disadvantage of SQL is that it does not provide a universal quantification construct. Queries that have twisted universal and existential quantifiers can be stunning for students, practitioners, or even instructors. Universal quantification exists in natural languages and proliferates in our daily logic. Experience shows that students can infer the rigorous logic, such as the tuple relational calculus, from natural language queries, but find it cumbersome to translate it to SQL. To bridge this gap, this article develops a systematic method to translate tuple relational calculus queries to SQL. This is accomplished by introducing the SQL-Normal-From of tuple relational calculus from which generating SQL code is straightforward. The approach is illustrated by a series of examples. This method was voluntarily adopted by a vast majority of students when it was introduced in a third-year introductory course on database systems.**********

SQL is the most commonly used language in commercial relational database management systems, such as Oracle, SQL Server, and Ingress, and forms a substantial component of undergraduate courses on database systems (Date, 2000; Elmasri & Navathe, 2000; Korth, Silberschatz, & Sudarshan, 1999). As noted by Date (1984), SQL has many shortcomings. One major disadvantage of SQL is its lack of a direct support to a universal quantification construct. The for all and every English phrases can be satisfied in SQL through negating its existential quantifier construct EXISTS. However, when these phrases become nested and twisted with each other and with existential phrases, translating the English query to SQL can be a very subtle process, even for the most experienced. Quantification in SQL is far more complex than joining and grouping. Yet, Resiner (1981) reported that users even have some difficulties using the SQL join and group constructs. Date (2000) proposed some work techniques to avoid the use of quantification in SQL. Nevertheless, these proposed techniques require other constructs that are not supported in many commercial implementations of SQL, as Date (2000) warned. Thus, the use of quantifiers in SQL is unavoidable.

The use of universal quantifiers to comprehend and express the for all and every phrases is more natural and intuitive than negating existential quantifiers. Consider the following statement:

Every company, which is destroying at least one forest, is savage, and every person who lives in Canada is concerned (1) about such companies.

This statement could be equivalently written in English, eliminating all universal quantifiers (every phrases) as follows:

It is not true that there is a company that is destroying at least one forest and this company is not savage, or there is a person who lives in Canada and who is not concerned about that company.

The simplicity of the use of universal versus existential quantifiers in this example is evident. This is still true even after resolving the ambiguity that results from determining the scope of "It is not true that." Moreover, only "bracketing" the rest of the statement, something that requires more than English, can eliminate this ambiguity.

The use of implications with universal quantifiers is also a natural and intuitive way to express the for all and every English phrases. Furthermore, programmers, including students in the Computing disciplines, are accustomed to the use of if-then constructs. Rewriting the example with implications (if-then), results in the following:

For every c, if c is a company that is destroying at least one forest f then c is savage and for every p, if p is a person living in Canada then p is concerned about c.

It can be easily confirmed that this statement is easier to express and understand than the following, which does not use implications, although it is logically equivalent to the previous one:

For every c, either c is not a company or there is no forest f that c is destroying or c is savage, and for every p, either p is not a person or p does not live in Canada or p is not concerned about c.

Experience shows that students find it easier to translate English phrases to first-order predicate logical expressions than to SQL. This is due to the fact that universal quantification is abundant in our natural language use, but expressing it in SQL requires the negation of existential quantification, which can be quite unintuitive. The tuple relational calculus theoretical query language (Codd, 1972) is among the concepts that students learn in the same course with SQL. It uses first-order predicate logic and can be used as an intermediate step between English queries and SQL (Date, 1992).

This tutorial presents a systematic, easy-to-follow method to translate relational calculus expressions to SQL queries. The approach is demonstrated by example, using a handful of English queries with complicated quantification structures. These are first rewritten in tuple relational calculus using both kinds of quantification and using implication with universal quantifiers. Then, a sequence of well-defined steps to translate these calculus expressions to a special form, which we call the SQL-Normal-Form, makes the generation of the SQL code automatic.

We found this method extremely helpful for undergraduate students in a third-year course on database systems, and we presume it is so for practitioners who are involved in this kind of logically twisted queries and for instructors as well.

Before introducing the translation method in the section. "Translation to SQL" some required definitions and preliminaries are reviewed in "Preliminaries." The section "Example Database," contains an example database, on which the example queries of "More Examples" are based. The final section concludes the article.

PRELIMINARIES

In a relational data model, a relation schema is a set of attributes [a.sub.i] denoted by [R.sub.s]([a.sub.1], [a.sub.2], ..., [a.sub.n]). The domain [D.sub.i] of attribute [a.sub.i] is a specific set of values that can be assigned to [a.sub.i]. An [R.sub.s] tuple t is a sequence of values <[v.sub.1], [v.sub.2], ..., [v.sub.n]> where the value [v.sub.i] is taken from [D.sub.i]. A relation R, an instance of relation schema [R.sub.s], is a set of [R.sub.s] tuples. If R = {[t.sub.1], [t.sub.2], ..., [t.sub.m]}, then [t.sub.1], [t.sub.2], ..., [t.sub.m] are said to range on R.

Relational Calculus

Relational calculus is a nonprocedural formal query language. The variant of relational calculus that is used in this tutorial is the tuple relational calculus. The first formal definition for tuple relational calculus is due to Codd (1972), who also developed an algorithm to translate from relational calculus to relational algebra, another theoretical query language. A tuple relational calculus expression is defined by the following series of definitions, starting with the definition of an atom (Elmasri & Navathe, 2000).

Definition. An atom is any of the following:

* R(t) where R is a relation and t is a tuple variable.

* [t.sub.i].a [omega] [t.sub.j].b where [omega] [member of] {=, [not equal to], <, [less than or equal to], >, [greater than or equal to]}, a and b are attributes of the relations on which [t.sub.i] and [t.sub.j] range, respectively.

* t.a [omega] c or c [omega] t.a where t, [omega], and a are as defined, and c is a constant value.

A formula is a condition that consists of n [greater than or equal to] 1 atoms combined by the logical operators: [conjunction] (conjunction), [disjunction] (disjunction), [right arrow] (implication), and [logical not] (negation). In addition, [for all], the universal quantifier, and [there exists], the existential quantifier, can occur in formulas. The following is a more precise definition.

Definition. If F, [F.sub.1], and [F.sub.2] are formulas and t a tuple variable, then each of the following is a formula:

* An atom

* (F)

* [F.sub.1] [conjunction] [F.sub.2]

* [F.sub.1] [disjunction] [F.sub.2]

* [F.sub.1] [right arrow] [F.sub.2]

* [logical not] F

* ([for all]t) (F(t))

* ([there exists]t) (F(t))

Next, the formal definition of a tuple relational calculus expression is stated.

Definition. A tuple relational calculus expression N is of the form

N = {[t.sub.1].[a.sub.1], [t.sub.2].[a.sub.2], ..., [t.sub.n].[a.sub.n] | F([t.sub.1], [t.sub.2], ..., [t.sub.n], [t.sub.n+1], [t.sub.n+2], ..., [t.sub.n+m])}

where [t.sub.i] are tuple variables, [a.sub.i] are attributes of the relation on which the tuples [t.sub.i] range, and F is a formula.

SQL

In general, an SQL query has the following form (Date & Darwen, 1993):

SELECT [a.sub.1], [a.sub.2], ..., [a.sub.n] FROM [R.sub.1], [R.sub.2], ..., [R.sub.m] WHERE P

where [a.sub.i], [R.sub.j], and P respectively are attributes, relations, and a predicate.

Predicates may consist of constructs that contain further (nested) SQL queries, called correlated queries. One such construct which is of special interest in this tutorial is the EXISTS construct. EXISTS takes an SQL correlated query as an argument and evaluates to false only if its argument returns no values (the query has an empty result). Otherwise, it evaluates to true.

An SQL query with an EXISTS construct is of the following general form:

SELECT [a.sub.1], [a.sub.2], ..., [a.sub.n] FROM [R.sub.1], [R.sub.2], ..., [R.sub.m] WHERE EXISTS (SELECT [a'.sub.1], [a'.sub.2], ..., [a'.sub.n] FROM [R'.sub.1], [R'.sub.2], ..., [R'.sub.m] WHERE P)

where [a'.sub.i], [R.sub.j], and P respectively are attributes, relations, and a predicate. The predicate P possibly consists of further SQL constructs and correlated queries, and henceforth. Since SQL does not have a universal quantification construct, programmers must negate its EXISTS construct to capture universal quantifiers.

EXAMPLE DATABASE

Figure 1 depicts an ER diagram of a simple forestry database (Bradley, 1982). The database consists of five entity types: Species, Forest, Tree, Employee, and Measurement. It also consists of four relationship types: of, on, in, and works. It is intended to maintain data on forests where trees of different species exist. Employees who work in a certain forest carry measurements on the trees in that forest.

Semantics of Entity Types

* Species attributes are the species name (sname), bark color (bcolor), and maximum height (maxht), sname is Species primary attribute.

* A Tree has the attributes tree number (tree#) and the year it was planted (planted). Its primary attribute is tree#.

* Forest is described by a name (fname), area (size), location (loc) and an altitude. Forest's primary attribute is fname.

* Each Employee has a number (emp#), year employed (year), and a salary. The attribute emp# is the Employee's primary attribute.

* Measurement is described by a measurement number (meas#), year and month performed, trunk size (trunk), height, and the number of branches (branch). The Measurement's primary attribute is meas#.

[FIGURE 1 OMITTED]

Semantics of Relationship Types

* in is a 1:N relationship type relating Forest to Tree. Several trees could be in a certain forest, but a tree can exist in one forest only.

* of is a 1:N relationship type relating Tree to Species. Many trees could be of a certain species, and a tree is of one species only.

* on is a 1:N relationship type relating Tree to Measurement. A tree can have many measurements performed on it, but a measurement is performed on one tree only.

* works is a 1:N relationship type relating Forest to Employee. Several employees may work in a particular forest, and an employee works in one forest only.

The translation of the ER diagram to relation schema is given in Figure 2.

[FIGURE 2 OMITTED]

Running Example

In the following sections, the following query (Query 1) is used as a running example to illustrate the method introduced in this article. The section "Translation to SQL" will provide more examples of varying complexities.

Query 1: Get the species name of each species with red bark were one and all trees of that species where planted before 1985.

The Relational calculus expression for Query 1, [N.sub.1], is:

[N.sub.1] = {s.sname|Species(s) [conjunction] s.bcolor = RED [conjunction]

([for all]t)(Tree(t) [conjunction] t.sname = s.sname) [right arrow] t.planted < 1985) [conjunction]

([there exists]t)(Tree(t) [conjunction] t.sname = s.sname [conjunction] t.planted < 1985)}

TRANSLATION TO SQL

This section introduces the systematic procedure to translate tuple relational calculus expressions to SQL queries. The key idea is to rewrite these expressions in a special form, which we call the SQL-Normal-Form, and abbreviate it SQLNF. Once in SQLNF, a relational calculus expression can be translated to SQL easily.

SQLNF

The main idea in SQLNF is to eliminate universal quantifiers from the relational calculus expression. first, the following equivalences must be recalled where F, [F.sub.1], and [F.sub.2] are formulas and t is a tuple variable.

Logical Equivalences

* Implication: ([F.sub.1] [right arrow] [F.sub.2]) [left and right arrow] ([logical not] [F.sub.1] [disjunction] [F.sub.2])

* Universal Quantification: ([for all]t)(F(t)) [left and right arrow] ([logical not][there exists]t) [logical not] (F(t))

* DeMorgan's: [logical not]([F.sub.1] [conjunction] [F.sub.2]) [left and right arrow] ([logical not][F.sub.1] [disjunction] [logical not][F.sub.2]) and [logical not]([F.sub.1] [disjunction] [F.sub.2]) [left and right arrow] ([logical not][F.sub.1] [conjunction] [logical not][F.sub.2])

* Double Negation: [logical not] [logical not] F [left and right arrow] F

The highlighted negation in the formula ([logical not][there exists]t)[logical not](F(t)) is referred by a sandwiched negation.

Definition. A formula F is in SQLNF if it does not contain each of the following:

* Double Negation

* Universal Quantification

* Implication

* Sandwiched Negation

Each one of these undesired forms can be eliminated by one of the equivalences previously stated. In particular, the sandwiched negation can be eliminated by applying DeMorgan's (i.e., replacing ([logical not][there exists]t)[logical not](F(t)) by ([logical not][there exists]t)([logical not]F(t))).

Definition. A tuple relational calculus expression

N = {[t.sub.1].[a.sub.1], [t.sub.2].[a.sub.2], ..., [t.sub.n].[a.sub.n]|F([t.sub.1], [t.sub.2], ..., [t.sub.n], [t.sub.n+1], [t.sub.n+2], ..., [t.sub.n+m])}

is in SQLNF if and only if F is in SQLNF.

The following normalization procedure puts an tuple relation calculus expression N in SQLNF.

Normalization Procedure

1. Eliminate all implications (by applying Implication).

2. Eliminate all universal quantifiers (by applying Universal Quantification).

3. Eliminate double negation (by applying Double Negation).

4. Eliminate sandwiched negation (by applying DeMorgan's).

This process is referred to as normalization and an expression that is in SQLNF is said to be normalized. This normalization procedure is illustrated through Query 1 given in the previous section.

Normalizing Query 1

[N.sub.1] = {s.sname|Species(s) [conjunction] s.bcolor = RED [conjunction]

([for all]t)((Tree(t) [conjunction] t.sname = s.sname) [right arrow] t.planted < 1985) [conjunction]

([there exists]t)(Tree(t) [conjunction] t.sname = s.sname [conjunction] t.planted < 1985)}

1. Eliminate implications:

[N.sub.1]'s only implication

(Tree(t) [conjunction] t.sname = s.sname) [right arrow] t.planted < 1985

is replaced by

[logical not] (Tree(t) [conjunction] t.sname = s.sname) [disjunction] t.planted < 1985

which yields the following

[N.sub.1] = {s.sname|Species(s) [conjunction] s.bcolor = RED [conjunction]

([for all]t)([logical not](Tree(t) [conjunction] t.sname = s.sname) [disjunction] t.planted < 1985) [conjunction]

([there exists]t)(Tree(t) [conjunction] t.sname = s.sname [conjunction] t.planted < 1985)}

2. Eliminate universal quantifiers:

The universal quantifier

([for all]t)([logical not](Tree(t) [conjunction] t.sname = s.sname) [disjunction] t.planted < 1985)

is replaced by

([logical not][there exists]t) [logical not] ([logical not](Tree(t) [conjunction] t.sname = s.sname) [disjunction] t.planted < 1985)

resulting in

[N.sub.1] = {s.sname|Species(s) [conjunction] s.bcolor = RED [conjunction]

([logical not][there exists]t) [logical not] ([logical not](Tree(t) [conjunction] t.sname = s.sname) [disjunction] t.planted < 1985) [conjunction]

([there exists]t)(Tree(t) [conjunction] t.sname = s.sname [conjunction] t.planted < 1985)}

There are no further universal quantifiers in [N.sub.1].

3. Eliminate sandwiched negation:

The sandwiched negation

([logical not][there exists]t) [logical not] ([logical not](Tree(t) [conjunction] t.sname = s.sname) [disjunction] t.planted < 1985)

is moved inside the brackets and DeMorgan's is applied, yielding

([logical not][there exists]t)([logical not] [logical not](Tree(t) [conjunction] t.sname = s.sname) [conjunction] t.planted [greater than or equal to] 1985)

The query becomes

[N.sub.1] = {s.sname|Species(s) [conjunction] s.bcolor = RED [conjunction]

([logical not][there exists]t) ([logical not] [logical not](Tree(t) [conjunction] t.sname = s.sname) [conjunction] t.planted [greater than or equal to] 1985) [conjunction]

([there exists]t) (Tree(t) [conjunction] t.sname = s.sname [conjunction] t.planted < 1985)}

4. Eliminate double negation:

Finally, the double negation [logical not] [logical not] (Tree(t)...) is simply dropped out, resulting in the following normalized query expression

[N.sub.1] = {s.sname|Species(s) [conjunction] s.bcolor = RED [conjunction]

([logical not][there exists]t) ((Tree(t) [conjunction] t.sname = s.sname) [conjunction] t.planted [greater than or equal to] 1985) [conjunction]

([there exists]t)(Tree(t) [conjunction] t.sname = s.sname [conjunction] t.planted < 1985)}

SQL Queries

Given an expression N in SQLNF, N = {[t.sub.1].[a.sub.1],[t.sub.2].[a.sub.2], ..., [t.sub.n].[a.sub.n]|F([t.sub.1],[t.sub.2], ...,[t.sub.n],[t.sub.n+1], [t.sub.n+2], ..., [t.sub.n+m])}, the translation to SQL is straightforward. The list [t.sub.1].[a.sub.1],[t.sub.2].[a.sub.2], ...,[t.sub.n].[a.sub.n] is called the left part of N and F([t.sub.1],[t.sub.2], ..., [t.sub.n],[t.sub.n+1],[t.sub.n+2], ...,[t.sub.n+m]) is called the right part of N. Let [R.sub.i](t) and [R.sub.j](t) denote relations with tuple variable t in F. Translating expressions in SQLNF is accomplished through the following procedure.

Translation Procedure

1. The SELECT clause consists of the left part.

2. The FROM clause consists of each [R.sub.i](t) in the right part, where [R.sub.i](t) is not within the scope of a quantifier.

3. The WHERE clause consists of F translated to SQL constructs, excluding each [R.sub.i](t) that is not within the scope of a quantifier. Since F is in SQLNF it must only consist of existential quantifiers and logical operators that exist in SQL. For each occurrence of an [R.sub.j](t), we should expect a correlated SQL query having [R.sub.j] in its FROM clause.

This procedure is best explained by example. Below is the translation of the normalized Query 1. Recall the normalized relational calculus expression of Query 1:

[N.sub.1] = {s.sname | Species(s) [conjunction] s.bcolor = RED [conjunction]

([logical not][there exists]t) ((Tree(t) [conjunction] t.sname = s.sname) [conjunction] t.planted [greater than or equal to] 1985) [conjunction]

([there exists]t)(Tree(t) [conjunction] t.sname = s.sname [conjunction] t.planted < 1985)}

The left part of [N.sub.1] is s.sname and the SQL query must start with SELECT s.sname. The FROM clause is FROM Species s since Species(s) is not bound by a quantifier. The WHERE clause consists of the rest of the right part of [N.sub.1], starting with the condition t.planted [greater than or equal to] 1985, and translating to WHERE t.planted [greater than or equal to] 1985. The following part of the query [conjunction] (([logical not][there exists]t) ((Tree(t) [conjunction] t.sname = s.sname) [conjunction] t.planted [greater than or equal to] 1985)) starts with ([logical not][there exists]t), where t is a Tree. This translates to AND NOT EXISTS (SELECT * FROM TREE t WHERE t.sname = s.sname AND t.planted [greater than or equal to] 1985). Similarly the last part translates to AND EXISTS (SELECT * FROM TREE t WHERE t.sname = s.sname AND t.planted < 1985). The complete query and its corresponding relational calculus expression is given next.

MORE EXAMPLES

This section presents more example queries, with an increasing level of complexity. Queries 2 to 5 are stated in plain English and tuple relational calculus. Each is normalized and then systematically translated to SQL, following the procedure outlined in the previous sections.

Query 2

Get the name and location of each forest under 1000 square meters in which every tree was planted before 1960 and has had at least one measurement showing a height over 20 meters.

Relational Calculus Expression for Query 2

[N.sub.2] = {f.fname, f.loc|Forest(f) [conjunction] f.size < 1000 [conjunction]

([for all]t)((Tree(t) [conjunction] t.fname = f.fname) [right arrow] (t.planted < 1960 [conjunction]

([there exists]m)(Measeurement(m) [conjunction] m.tree# = t.tree# [conjunction] m.height > 20)))}

Normalizing Query 2

1. Eliminate implications:

[N.sub.2] = {f.fname,f.loc|Forest(f) [conjunction] f.size < 1000 [conjunction]

([for all]t)([logical not](Tree(t) [conjunction] t.fname = f.fname) [disjunction] (t.planted < 1960 [conjunction]

([there exists]m)(Measeurement(m) [conjunction] m.tree# = t.tree# [conjunction] m.height > 20)))}

2. Eliminate universal quantifiers:

[N.sub.2] = {f.fname,f.loc|Forest(f) [conjunction] f.size < 1000 [conjunction]

([logical not][there exists]t) [logical not]([logical not](Tree(t) [conjunction] t.fname = f.fname) [disjunction] (t.planted < 1960 [conjunction]

([there exists]m)(Measeurement(m) [conjunction] m.tree# = t.tree# [conjunction] m.height > 20)))}

3. Eliminate sandwiched negation:

[N.sub.2] = {f.fname,f.loc|Forest(f) [conjunction] f.size < 1000 [conjunction]

([logical not][there exists]t)([logical not][logical not](Tree(t) [conjunction] t.fname = f.fname) [conjunction] [logical not] (t.planted < 1960 [conjunction]

([there exists]m)(Measeurement(m) [conjunction] m.tree# = t.tree# [conjunction] m.height > 20)))}

4. Eliminate double negation:

[N.sub.2] = {f.fname,f.loc|Forest(f) [conjunction] f.size < 1000 [conjunction]

([logical not][there exists]t)((Tree(t) [conjunction] t.fname = f.fname) [conjunction] [logical not] (t.planted < 1960 [conjunction]

([there exists]m)(Measeurement(m) [conjunction] m.tree# = t.tree# [conjunction] m.height > 20)))}

Note that [N.sub.2] is in SQLNF at this stage. It is safe to stop here or apply DeMorgan's one more time to get the following expression:

[N.sub.2] = {f.fname,f.loc|Forest(f) [conjunction] f.size < 1000 [conjunction]

([logical not] [there exists]t)((Tree(t) [conjunction] t.fname = f.fname) [conjunction] (t.planted [greater than or equal to] 1960 [disjunction]

([logical not][there exists]m)(Measeurement(m) [conjunction] m.tree# = t.tree# [conjunction] m.height > 20)))}

Both expressions are in SQLNF and both should translate to SQL systematically.

Query 3

Get the name and bark color for each species with a maximum height exceeding 30 meters such that every tree of that species that was planted before 1960 has had every measurement showing a height greater than 15 meters.

Relational Calculus Expression for Query 3

[N.sub.3] = {s.sname, s.bcolor | Species(s) [conjunction] s.maxht > 30 [conjunction]

([for all]t)((Tree(t) [conjunction] t.sname = s.sname [conjunction] t.planted < 1960) [right arrow]

([for all]m)((Measurement(m) [conjunction] m.tree# = t.tree#) [right arrow] m.height > 15))}

Normalizing Query 3

1. Eliminate implications:

[N.sub.3] = {s.sname, s.bcolor | Species(s) [conjunction] s.maxht > 30 [conjunction]

([for all]t)([logical not](Tree(t) [conjunction] t.sname = s.sname [conjunction] t.planted < 1960) [disjunction]

([for all]m)([logical not](Measurement(m) [conjunction] m.tree# = t.tree#) [disjunction] m.height > 15))}

2. Eliminate universal quantifiers:

[N.sub.3] = {s.sname, s.bcolor | Species(s) [conjunction] s.maxht > 30 [conjunction]

([logical not][there exists]t)[logical not]([logical not](Tree(t) [conjunction] t.sname = s.sname [conjunction] t.planted < 1960) [disjunction]

([logical not][there exists]m)[logical not]([logical not](Measurement(m) [conjunction] m.tree# = t.tree#) [disjunction] m.height > 15))}

3. Eliminate sandwiched and double negation:

[N.sub.3] = {s.sname, s.bcolor | Species(s) [conjunction] s.maxht > 30 [conjunction]

([logical not][there exists]t)((Tree(t) [conjunction] t.sname = s.sname [conjunction] t.planted < 1960) [conjunction]

([there exists]m)((Measurement(m) [conjunction] m.tree# = t.tree#) [conjunction] m.height [less than or equal to] 15))}

Query 4

Get the name and maximum height of each species that occurs in every Alberta forest over 1500 meters in altitude and in which there is at least one employee who has been hired in 1993.

Relational Calculus Expression for Query 4

[N.sub.4] = {s.sname, s.maxht | Species(s) [conjunction]

([for all]f)((Forest(f) [conjunction] f.altitude > 1500 [conjunction] f.loc = ALBERTA) [right arrow]

(([there exists]t)(Tree(t) [conjunction] t.fname = f.fname [conjunction] t.sname = s.sname) [conjunction]

([there exists]e)(Employee(e) [conjunction] e.fname = f.fname [conjunction] e.year = 1993)))}

Normalizing Query 4

1. Eliminate implications:

[N.sub.4] = {s.sname, s.maxht | Species(s) [conjunction]

([for all]f)([logical not](Forest(f) [conjunction] f.altitude > 1500 [conjunction] f.loc = ALBERTA)[disjunction]

(([there exists]t)(Tree(t) [conjunction] t.fname = f.fname [conjunction] t.sname = s.sname) [conjunction]

([there exists]e)(Employee(e) [conjunction] e.fname = f.fname [conjunction] e.year = 1993)))}

2. Eliminate universal quantifiers:

[N.sub.4] = {s.sname, s.maxht | Species(s) [conjunction]

([logical not][there exists]f)[logical not]([logical not](Forest(f) [conjunction] f.altitude > 1500 [conjunction] f.loc = ALBERTA) [disjunction]

(([there exists]t)(Tree(t) [conjunction] t.fname = f.fname [conjunction] t.sname = s.sname) [conjunction]

([there exists]e)(Employee(e) [conjunction] e.fname = f.fname [conjunction] e.year = 1993)))}

3. Eliminate sandwiched and double negation:

[N.sub.4] = {s.sname, s.maxht | Species(s) [conjunction]

([logical not][there exists]f) ((Forest(f) [conjunction] f.altitude > 1500 [conjunction] f.loc = ALBERTA) [conjunction]

(([logical not][there exists]t)(Tree(t) [conjunction] t.fname = f.fname [conjunction] t.sname = s.sname) [disjunction]

([logical not][there exists]e)(Employee(e) [conjunction] e.fname = f.fname [conjunction] e.year = 1993)))}

Query 5

Get the name and size of each forest in Alberta in which every tree was planted before 1960 and has had all but one measurements showing a height over 20 meters.

Relational Calculus Expression for Query 5

[N.sub.5] = {f.fname, f.size | Forest(f) [conjunction] f.loc = ALBERTA [conjunction]

([for all]t)((Tree(t) [conjunction] t.fname = f.fname) [right arrow] (t.planted < 1960 [conjunction]

([for all][m.sub.1])((Measurement([m.sub.1]) [conjunction] [m.sub.1].tree# = t.tree# [conjunction] [m.sub.1].height [less than or equal to] 20) [right arrow]

(([for all][m.sub.2])((Measurement([m.sub.2]) [conjunction] [m.sub.2].tree# = t.tree# [conjunction] [m.sub.1].meas# [not equal to] [m.sub.2].meas#) [right arrow]

([m.sub.2].height > 20))) [conjunction] [there exists]m)(Measurement(m) [conjunction] t.tree# = m.tree# [conjunction] m.height [less than or equal to] 20)))}

Normalizing Query 5

1. Eliminate implications:

[N.sub.5] = {f.fname, f.size | Forest(f) [conjunction] f.loc = ALBERTA [conjunction]

([for all]t) ([logical not](Tree(t) [conjunction] t.fname = f.fname) [disjunction] (t.planted < 1960 [conjunction]

([for all][m.sub.1])([logical not](Measurement([m.sub.1]) [conjunction] [m.sub.1].tree# = t.tree# [conjunction] [m.sub.1].height [less than or equal to] 20) [disjunction]

(([for all][m.sub.2])([logical not](Measurement([m.sub.2]) [conjunction] [m.sub.2].tree# = t.tree# [conjunction] [m.sub.1].meas# [not equal to] [m.sub.2].meas#) [disjunction]

([m.sub.2].height > 20))) [conjunction] ([there exists]m)(Measurement(m) [conjunction] t.tree# = m.tree# [conjunction] m.height [less than or equal to] 20)))}

2. Eliminate universal quantifiers:

[N.sub.5] = {f.fname, f.size | Forest(f) [conjunction] f.loc = ALBERTA [conjunction]

([logical not][there exists]t)[logical not]([logical not](Tree(t) [conjunction] t.fname = f.fname) [disjunction] (t.planted < 1960 [conjunction]

([logical not][there exists][m.sub.1])[logical not]([logical not](Measurement([m.sub.1]) [conjunction] [m.sub.1].tree# = t.tree# [conjunction] [m.sub.1].height [less than or equal to] 20) [disjunction]

(([logical not][there exists][m.sub.2])[logical not]([logical not](Measurement([m.sub.2]) [conjunction] [m.sub.2].tree# = t.tree# [conjunction] [m.sub.1].meas# [not equal to] [m.sub.2].meas#) [disjunction]

([m.sub.2].height > 20))) [conjunction] ([there exists]m)(Measurement(m) [conjunction] t.tree# = m.tree# [conjunction] m.height [less than or equal to] 20)))}

3. Eliminate sandwiched and double negation:

[N.sub.5] = {f.fname, f.size | Forest(f) [conjunction] f.loc = ALBERTA [conjunction]

([logical not][there exists]t) ((Tree(t) [conjunction] t.fname = f.fname) [conjunction] [logical not] (t.planted < 1960 [conjunction]

([logical not][there exists][m.sub.1]) ((Measurement([m.sub.1]) [conjunction] [m.sub.1].tree# = t.tree# [conjunction] [m.sub.1].height [less than or equal to] 20) [conjunction]

(([there exists][m.sub.2]) ((Measurement([m.sub.2]) [conjunction] [m.sub.2].tree# = t.tree# [conjunction] [m.sub.1].meas# [not equal to] [m.sub.2].meas#) [conjunction] [logical not]

([m.sub.2].height > 20))) [disjunction] ([logical not][there exists]m)(Measurement(m) [conjunction] t.tree# = m.tree# [conjunction] m.height [less than or equal to] 20)))}

Expression N5 is already in SQLNF and it is safe to stop here. However, it can be normalized (simplified) further by applying DeMorgan's and Double Negation again, yielding:

[N.sub.5] = {f.fname, f.size | Forest(f) [conjunction] f.loc = ALBERTA [conjunction]

([logical not][there exists]t) ((Tree(t) [conjunction] t.fname = f.fname) [conjunction] (t.planted < 1960 [disjunction]

([there exists][m.sub.1] ((Measurement([m.sub.1]) [conjunction] [m.sub.1].tree# = t.tree# [conjunction] [m.sub.1].height [less than or equal to] 20) [conjunction]

(([there exists][m.sub.2]) ((Measurement([m.sub.2]) [conjunction] [m.sub.2].tree# = t.tree# [conjunction] [m.sub.1].meas# [not equal to] [m.sub.2].meas#) [conjunction]

([m.sub.2].height [less than or equal to] 20))) [disjunction] ([logical not][there exists]m)(Measurement(m) [conjunction] t.tree# = m.tree# [conjunction] m.height [less than or equal to] 20)))}

CONCLUSION

This article presented a tutorial to simplify the writing of complex SQL queries. Precisely, it presented a method by which SQL queries that contain mixed and interrelated complex quantification constructs, universal and existential, can be substantially simplified. The method starts by writing the English query in tuple relational calculus, something that students have relatively less trouble coping with. Then, the calculus expression is transformed into a form, called SQL normal form. Calculus expressions that are in this form only contain constructs that exist in SQL. Therefore, the generation of the SQL query is simple. In particular, universal quantification and implication must be eliminated from the expression before it is translated to SQL. These constructs are intuitive and they can make the logical expression easier to comprehend and write. However, since they do not exist in SQL, the normalization procedure must be followed.

We have introduced the method for students in a first course in databases, typically at the third year level, during different academic years. They have been advised to voluntarily follow the approach, if they find it helpful. With the exception of a negligible minority, all students found the method very handy and used it systematically throughout the course, without being required to do so. This is an indication of the usefulness of the method discussed here.

Translating Query 1 Query 1 [N.sub.1] = { SELECT s.sname s.sname | FROM Species s Species (s) WHERE s.bcolor = 'RED' [conjunction] s.bcolor = RED AND NOT EXISTS [conjunction] ([logical not] [there exists]t) (SELECT * ( FROM TREE t Tree(t) WHERE t.sname = s.sname [conjunction] t.sname = s.sname) AND t.planted [conjunction] t.planted [greater than or equal to] [greater than or equal to] 1985) 1985) AND EXISTS [conjunction] ([there exists]t) (SELECT * ( FROM TREE t (Tree(t) WHERE t.sname = s.sname [conjunction] t.sname = s.sname) AND t.planted < 1985) [conjunction] t.planted [greater than or equal to] 1985)} Translating Query 2 Query 2 [N.sub.2] = { SELECT f.fname, f.loc f.fname, f.loc | FROM Forest f Forest (f) WHERE f.size < 1000 [conjunction] f.size < 1000 AND NOT EXISTS [conjunction] ([logical not] [there exists]t) (SELECT * ( FROM TREE t (Tree(t) WHERE t.fname = f.fname [conjunction] t.fname = f.fname) AND (t.planted [conjunction] (t.planted [greater than or equal to] [greater than or equal to] 1960 1960 OR NOT EXISTS [disjunction] ([logical not] [there exists]m) (SELECT * ( FROM Measurement m (Measurement (m) WHERE m.tree# = t.tree# [conjunction] m.tree# = t.tree#) AND m.height > 20))) [conjunction] m.height > 20)))} Translating Query 3 Query 3 N 3 = { SELECT s.sname, s.bcolor s.sname, s.bcolor | FROM Species s Species (s) WHERE s.maxht > 30 [conjunction] s.maxht > 30 AND NOT EXISTS [conjunction] ([logical not] [there exists]t) (SELECT * ( FROM TREE t (Tree(t) WHERE t.sname = s.sname [conjunction] t.sname = s.sname AND t.planted < 1960 [conjunction] t.planted < 1960) AND EXISTS [conjunction] ([there exists]m) (SELECT * ( FROM Measurement m (Measurement (m) WHERE t.tree# = m.tree# [conjunction] t.tree# = m.tree#) AND m.height [conjunction] m.height [less than or equal to] 15) [less than or equal to] 15))} Translating Query 4 Query 4 [N.sub.4] = { SELECT s.sname, s.maxht s.sname, s.maxht | FROM Species s Species(s) WHERE NOT EXISTS [conjunction] ([logical not][there exists]f) (SELECT * ( FROM Forest (Forest (f) WHERE f.altitude > 1500 [conjunction] f.altitude > 1500 AND f.loc = 'ALBERTA' [conjunction] f.loc = ALBERTA AND NOT EXISTS [conjunction] ([logical not] [there exists]t) (SELECT * ( FROM TREE t (Tree(t) WHERE t.fname = f.fname [conjunction] tfname = s.fname) AND t.sname = s.sname) [conjunction] t.sname = s.sname) OR NOT EXISTS [disjunction] ([logical not] [there exists]e) (SELECT * ( FROM EMPLOYEE e (Emplyee (e) WHERE e.fname = f.fname [conjunction] e.fname = f.fname) AND e.year = 1993))) [conjunction] e.year = 1993)))} Translating Query 5 Query 5 [N.sub.5] = { SELECT f.fname, f.size f.fname, f.size | FROM Forest f Forest(f) WHERE f.loc = 'ALBERTA' [conjunction] f.loc = ALBERTA AND NOT EXISTS [conjunction] ([logical not] [there exists]t) (SELECT * ( FROM TREE t ( Tree(t) WHERE t.fname = f.fname [conjunction] t.fname = f.fname) AND (t.planted [conjunction] ( t.planted [greater than or equal to] [greater than or equal to] 1960 1960 OR EXISTS [disjunction] ([there exists]m1) (SELECT * ( FROM Measurement m1 ( Measurement (m1) WHERE m1.tree# = t.tree# [conjunction] m1.tree# = t.tree# ) AND m1.height [conjunction] m1.height [less than or equal to] 20) [less than or equal to] 20 ) AND EXISTS [conjunction] ( ([there exists] m2) (SELECT * ( FROM Measurement m2 ( Measurement (m2) WHERE m2.tree# = t.tree# [conjunction] m2.tree# = t.tree# AND m1.meas# <> m2.meas# [conjunction] m1.meas# [not equal to] m2.meas# AND m2.height [conjunction] m2.height [less than or equal to] [less than or equal to] 20) ) 20) ) ) OR NOT EXISTS [disjunction] ([logical not] [there exists]m) (SELECT * ( FROM Measurement m Measurement(m) WHERE m.tree# = t.tree# [conjunction] t.tree# = m.tree# AND m.height [conjunction] m.height [less than or equal to] 20)) [less than or equal to] 20 ) ) ) }

Acknowledgements

Many thanks go to James Bradley for his comments on an earlier tutorial version of this work and for his incitement to turn the tutorial into an article. The example database is adapted from his work (Bradley, 1982). The example queries are inspired by the assignments of an undergraduate database systems course at The University of Calgary.

Note

This work was supported in part by the Natural Science and Engineering Research Council of Canada.

References

Bradley, J. (1982). File and database techniques. Austin, TX: Holt, Rinehart, and Winston.

Codd, E. (1972). Relational completeness of the data base sublanguages. In R. Rustin (Ed.), Data base systems, Courant computer science symposia series 6, (pp. 65-98). Englewood Cliffs, NJ: Prentice Hall.

Date, C. (1984). A critique of the SQL database language. ACM SIGMOD Record, 14(2), 8-54.

Date, C. (1992). Relational calculus as an aid to effective query formulation. In C. Date & H. Darwen (Eds.), Relational database writings 1989-1991. Boston: Addison-Wesley.

Date, C. (2000). An introduction to database systems (7th ed.). Boston: Addison-Wesley.

Date, C. & Darwen, H. (1993). A guide to the SQL standard (3rd ed.). Boston: Addison-Wesley.

Elmasri, R. & Navathe, S. (2000). Fundamentals of database systems (3rd ed.). San Francisco: Benjamin/Cummings.

Korth, H., Silberschatz, A., & Sudarshan, S. (1999). Database system concepts (3rd ed.). New York: McGraw-Hill.

Resiner, P. (1981). Human factor studies of database query languages: A survey and assessment. ACM Computing Surveys, 13(1), 13-31.

JALAL KAWASH

American University of Sharjah

UAE

jkawash@ausharjah.ah.edu

Printer friendly Cite/link Email Feedback | |

Title Annotation: | Structured Query Language |
---|---|

Author: | Kawash, Jalal |

Publication: | Journal of Computers in Mathematics and Science Teaching |

Geographic Code: | 1USA |

Date: | Jun 22, 2004 |

Words: | 6202 |

Previous Article: | Considering the efficacy of web-based worked examples in introductory chemistry. |

Next Article: | Working with accurate representations: the case of preconstructed dynamic geometry sketches. |

Topics: |