# Logical design for temporal databases with multiple granularities

The purpose of good database logical design is to eliminate data redundancy and insertion and deletion anomalies. In order to achieve this objective for temporal databases, the notions of temporal types, which formalize time granularities, and temporal functional dependencies (TFDs) are introduced. A temporal type is a monotonic mapping from ticks of time (represented by positive integers) to time sets (represented by subsets of reals) and is used to capture various standard and user-defined calendars. A TFD is a proper extension of the traditional functional dependency and takes the form X [right arrow.sub.Mu] Y, meaning that there is a unique value for Y during one tick of the temporal type [Mu] for one particular X value. An axiomatization for TFDs is given. Because a finite set of TFDs usually implies an infinite number of TFDs, we introduce the notion of a finite closure, and give an axiomatization for it, to effectively capture a finite set of implied TFDs that are essential to the logical design. Temporal normalization procedures with respect to TFDs are given. Specifically, temporal Boyce-Codd normal form (TBCNF), which avoids all data redundancies due to TFDs, and temporal third normal form (T3NF), which allows dependency preservation, are defined. Both normal forms are proper extensions of their traditional counterparts, BCNF and 3NF. Decomposition algorithms are presented that give lossless TBCNF decompositions and lossless, dependency-preserving, T3NF decompositions.

Categories and Subject Descriptors: H.2.1 [Database Management]: Logical Design -- normal forms

General Terms: Algorithms, Design, Theory

Additional Key Words and Phrases: Boyce-Codd normal form, granularity, normalization, temporal databases, temporal modules, temporal relations, third normal form

1. INTRODUCTION

In the database area a large body of knowledge has been developed for addressing the problem of logical design:

Given a body of data and constraints on the data to be represented in a database, how do we decide on a suitable logical structure for these data?

In the relational context, the problem of logical design can be restated as follows: how do we produce a database scheme (a collection of relation schemes) with certain desirable properties? Central to the design of database schemes is the idea of a data dependency, which is a constraint on the allowable relations corresponding to a relation scheme. Functional dependencies (FDs) are widely used dependencies in the framework of logical design (cf. [Ullman 1988]).(1) A collection of known functional and other dependencies serves as input to the database design.

The purpose of good database design is to avoid the problems of data redundancy, potential inconsistency, and insertion and deletion anomalies. Consider a relation with attributes AcctNo, Bank-branch, and Address, and FDs AcctNo [right arrow] Bank-branch and Bank-branch [right arrow] Address. Although each bank branch has a unique address (because Bank-branch [right arrow] Address holds), the repetition of the address of a bank branch with every account in that branch is redundant in the relation. If, by mistake, we update the address of a branch in one tuple, but forget to do so in another, inconsistency arises. Furthermore, because AcctNo is the primary key, we cannot insert an address for a new bank branch that does not currently have at least one account. Finally, if the last account in a bank branch is deleted, we unintentionally lose track of its address. To resolve these problems, the relation (AcctNo, Bank-branch, Address) must be decomposed into two relations: (AcctNo, Bank-branch) with the FD AcctNo [right arrow] Bank-branch, and (Bank-branch, Address) with the FD Bank-branch [right arrow] Address. In addition to being free of the redundancy-related problems cited, the decomposition has several very desirable properties: (1) it is lossless in the sense that the original relation can be reconstructed by taking the natural join of the two projections and, thus, no information is lost; (2) the decomposition preserves the FDs in the sense that the FDs associated with the schemes in the decomposition are equivalent (identical in our example) to those associated with the original scheme; and (3) the two schemes are in Boyce-Codd normal form (BCNF), the strongest normal form when FDs are the only dependencies.
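The lossless-join property in item (1) can be illustrated with a small sketch. All relation contents below are invented for the example; the point is only that the natural join of the two projections reconstructs the original relation exactly.

```python
# Toy bank relation with the FD Bank-branch -> Address; the repeated
# address for "Downtown" is the redundancy discussed in the text.
accounts = {
    ("a1", "Downtown", "1 Main St"),
    ("a2", "Downtown", "1 Main St"),   # Address repeated: redundancy
    ("a3", "Uptown", "9 Hill Rd"),
}

# Project onto the two decomposed schemes.
acct_branch = {(a, b) for (a, b, _) in accounts}
branch_addr = {(b, ad) for (_, b, ad) in accounts}

# Natural join on Bank-branch reconstructs the original relation.
rejoined = {(a, b, ad)
            for (a, b) in acct_branch
            for (b2, ad) in branch_addr if b == b2}

assert rejoined == accounts   # lossless: nothing gained, nothing lost
```

Note that each address is now stored once in `branch_addr`, so updating it in one place cannot create the inconsistency described above.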
We should note that it is not always possible to derive a lossless FD-preserving decomposition such that all its schemes are in BCNF; a lossless FD-preserving decomposition into schemes in the third normal form (3NF), which is slightly weaker than BCNF, can always be achieved (cf. [Ullman 1988]).

1.1 Temporal Dimension of Logical Design

The introduction of time adds a new dimension to the normalization problem. To illustrate, consider a temporal relation COURSES that is used to store, for each computer science course, the course number (Course#), the number of credits given for the course (Credits), the name of the laboratory assistant on duty (Lab-Assist), the amount of her pay (Lab-Assist-Pay), the number of students attending the given class (#Students-in-Class), and the timestamp that records the day(2) on which the class is being given (Day). A typical instance of COURSES is shown in Figure 1. Suppose for the COURSES relation, the following constraints are given: (1) the attributes Day and Course# together form a candidate key; (2) the FD Course# [right arrow] Credits must be satisfied; (3) the pay of a laboratory assistant does not change within a month; and (4) a laboratory assistant cannot be changed during any given week (to keep the responsibility for the homework assignments straight). Note that constraints 3 and 4 cannot be expressed as FDs involving just the attributes of the COURSES relation.

Relation COURSES contains not only nontemporal redundancy (e.g., Credits value for a course is repeated many times), but temporal redundancy as well (e.g., the same value of Lab-Assist-Pay is repeated several times during a month for Woo and Lee). Like nontemporal redundancy, temporal redundancy introduces potential inconsistency and insertion and deletion anomalies: we may update Woo's pay in one tuple, but leave it unchanged in another tuple for the same month, in which case we would have different pay for Woo during the same month, creating an inconsistency. Moreover, we cannot record a laboratory assistant for a course if we do not know the day on which the course is being given. Deletion creates a similar problem: if we delete all tuples for a particular course, we unintentionally lose track of the name and the pay of the laboratory assistant.

To remove data redundancy, if we use the FDs corresponding to constraints 1 and 2, we obtain the following BCNF decomposition for COURSES: (Course#, Credits) and (Course#, Lab-Assist, Lab-Assist-Pay, #Students-in-Class, Day). This decomposition is inadequate because the redundant values for Lab-Assist and Lab-Assist-Pay still remain, although the redundant values for Credits are eliminated.

In order to eliminate redundancy in COURSES, constraints 3 and 4 involving time granularities must be considered. Indeed, a natural decomposition for COURSES is shown in Figure 2, where schemes are in terms of different granularities. Note that a week is specified by the form "week of date" where "date" gives the first day (i.e., Sunday) of the week. It can be easily seen that each scheme in Figure 2 is free of the redundancies caused by constraints 1-4.

Several questions arise from this example:

(1) How do we accurately describe the constraints involving time? How do we derive implicit constraints? How do we define the avoidance of specific kinds of redundancy (i.e., normal forms)?

(2) How do we automatically obtain decompositions that do not have redundancy, permit reconstruction of the original data, and preserve the original constraints?

(3) What operators do we use to reconstruct the original relation from the decomposed relations (note that natural join cannot be used for this purpose)? How is the notion of lossless decomposition defined?

The purpose of this article is to answer these questions.

1.2 Contributions

We introduce a general notion of a temporal type that formalizes time granularity. Intuitively, a temporal type is a mapping from ticks of time (represented by positive integers) to time sets (represented by subsets of reals). By definition, this mapping is monotonic; that is, time sets corresponding to later ticks must have greater values. Thus various time granularities (days, weeks, months, years) of different calendars can be viewed as temporal types, as well as user-defined types such as library opening hours, class meetings, and so on.

We next present our notion of temporal functional dependencies (TFDs), each of which designates an underlying temporal type. An example of a TFD is Course# [right arrow.sub.week] Lab-Assist, which states that Lab-Assist for a specific Course# cannot have different values within a week. Note that a TFD may designate any temporal type (week in the preceding TFD), independent of the temporal type used to store the corresponding temporal relation (which could be stored, e.g., in days or in hours).

A sound and complete axiomatization for TFDs is developed. Unfortunately, unlike the case of traditional FDs, there are usually an infinite number of TFDs implied by a finite number of TFDs. To overcome this problem, we introduce the notion of a finite closure of TFDs, and develop a sound and complete axiomatization to effectively compute finite closures.

The property of lossless decomposition is similar to the traditional one in that it is possible to reconstruct the original temporal relation from its projections. However, the traditional natural join operation turns out to be insufficient for this purpose (e.g., it is not clear how we should join the relations in Figure 2 to recover the original COURSES relation in Figure 1), and we need to introduce new temporal join, projection, and union operators that incorporate temporal types, which are then used in the definition of temporal lossless decomposition.

The central part of the article gives the definitions of temporal BCNF (TBCNF) and temporal 3NF (T3NF) and algorithms for achieving TBCNF and T3NF decompositions. Intuitively, a scheme R with temporal type [Mu] is in TBCNF if for every nontrivial TFD X [right arrow.sub.v] Y, (1) X is a temporal superkey of the scheme (this is analogous to the requirement for traditional BCNF), and (2) there is no temporal redundancy due to the fact that two different time ticks of [Mu] belong to the same time tick of v.

To understand the motivation behind the second condition, consider two temporal tuples t and t' that agree on X and whose time ticks in terms of [Mu] belong to the same time tick of v. Because of X [right arrow.sub.v] Y we can conclude that the Y values of t and t' must also coincide. Inasmuch as this information can be implied, temporal relations have redundancy that we would like to avoid.

Regarding decomposition algorithms, it is natural to think of the following naive approach:(3) add additional attributes, such as Week and Month, to the original scheme and then apply standard decomposition techniques, such as BCNF decomposition. In fact, in our COURSES example, the decomposition in Figure 2 can be obtained(4) by the traditional BCNF decomposition algorithm if constraints 3 and 4 are expressed as the FDs Week, Course# [right arrow] Lab-Assist and Month, Lab-Assist [right arrow] Lab-Assist-Pay. Unfortunately, this naive approach, unlike the one proposed in this article, does not always produce decompositions that are free of redundancy, as we exemplify and discuss in Section 8.

In this article, a decomposition algorithm is given that renders lossless decomposition of any temporal scheme into schemes that are in TBCNF. We also discuss the preservation of TFDs in the decomposed temporal schemes. Analogous to the traditional relational theory, the decomposition of temporal schemes into TBCNF may not preserve TFDs. Therefore we give the definition of temporal 3NF (T3NF), and present an algorithm that provides lossless, dependency-preserving, T3NF decomposition.

1.3 Related Work

There has been some work on normal forms for temporal relations, including the first temporal normal form [Segev and Shoshani 1988], the time normal form [Navathe and Ahmed 1989], and the temporal extensions of traditional normal forms [Jensen et al. 1994; Wijsen 1995].(5) The first temporal normal form states that each time-varying attribute (i.e., an attribute that may take different values over time) of an object (represented by a surrogate) can have at most one value at any tick of time. This restriction avoids set-valued domains for attributes; however, it does not help eliminate temporal data redundancy. Inasmuch as the time normal form [Navathe and Ahmed 1989] and the temporal extensions in [Jensen et al. 1994; Wijsen 1995] aim at reducing temporal data redundancy, we compare each of them with our approach in turn.

The Time Normal Form. The time normal form in Navathe and Ahmed [1989] is based on the notion of temporally dependent attributes: Intuitively, two time-varying attributes are said to be temporally dependent in Navathe and Ahmed [1989] if their values do not always change at the same time. Because temporally dependent attributes contribute to redundancy in a temporal relation, the time normal form specifies that a temporal relation should not contain "temporally dependent" attributes. Although our work can be viewed as having the same aim, there are significant differences. First, the notion of temporal dependency in Navathe and Ahmed [1989] is based on observing how two time-varying attributes are synchronized, whereas our notion of TFDs naturally extends the traditional notion of FDs, taking into account multiple granularities. Second, Navathe and Ahmed [1989] require that temporal relations must have time-invariant keys, and we do not. Third, Navathe and Ahmed [1989] do not consider issues related to dependency preservation, and we do. Specifically, we introduce T3NF decompositions that always preserve TFDs. Finally, the dependencies considered in each work are subtly different and, therefore, forms of data redundancy and decomposition strategies are different as well.

To elaborate on the aforementioned difference, consider a temporal relation TA-Schedule with attributes Course#, TA, and Office, and a timestamp which is in days. Semantically, each tuple of TA-Schedule gives a course number, the name of the teaching assistant who is to be on duty on a particular day for that course, and the location of his or her office. We know that (1) for a specific day, a single TA with the corresponding office is associated with a certain course number (i.e., the FD Course#, Day [right arrow] TA, Office holds); (2) the office of a TA cannot be changed within a semester (i.e., the TFD TA [right arrow.sub.semester] Office holds, which implies that the FD TA, Day [right arrow] Office holds); and (3) a TA assists for at most one class per day (i.e., the FD TA, Day [right arrow] Course# holds). The relation is in snapshot BCNF; however, it would be decomposed by both approaches. The reason for the decomposition under the approach of Navathe and Ahmed [1989] is that the attributes TA and Office are temporally dependent, as can be observed in the following instance.

Course#   TA    Office   Day
CS15      Lee   R401     4/3/95
CS50      Woo   R401     4/4/95
CS50      Lee   R401     4/5/95
CS80      Woo   R401     4/11/95

Indeed, (i) the second and third tuples belong to two contiguous ticks, (ii) they have the same temporal key (Course#) value, and (iii) whereas the Office value remains the same, the TA value changes. Hence, by Navathe and Ahmed [1989], the relation is decomposed into (Course#, TA) and (Course#, Office), both with timestamps in (intervals of) days. The tuples in the first relation are obtained simply by dropping the Office attribute, whereas three tuples are in the second relation, obtained by dropping the TA attribute and collapsing the two tuples with the same values into <CS50, R401> with timestamp [4/4/95, 4/5/95].

The reason for the decomposition in our approach is the presence of the TFD in terms of the granularity semester. Under our approach, the relation is decomposed into (Course#, TA) with timestamps in days, and (TA, Office) with timestamps in semesters. The tuples in the first relation are the same as in the approach of Navathe and Ahmed [1989] as shown, whereas the second relation consists of <Woo, R401> and <Lee, R401>, both with the timestamp "1st-sem-95". Observe that at most one tuple will appear in the second relation for each combination of TA and semester.

Temporal Extensions of Traditional Dependencies and Normal Forms.

The work in Jensen et al. [1994] represents another aspect of reducing data redundancy in temporal information, namely, the data redundancy in the snapshots of temporal relations. Conceptually, Jensen et al. [1994] view each temporal relation as a collection of snapshot relations. Because each snapshot is a relation in the conventional (nontemporal) relational model, the usual notions of FDs, multivalued dependencies, and normal forms are applied. Considering the temporal relation, this results in defining a notion of temporal functional dependency that is a special case of the one proposed in this article. Namely, it is equivalent to a TFD that expresses a constraint in terms of the same temporal type used to store the temporal relation. Our work can be seen as an extension of Jensen et al. [1994] in that we consider not only the data redundancy in snapshots, but also the data redundancy across snapshots. For example, according to Jensen et al. [1994] a temporal relation satisfies a temporal functional dependency X [T over right arrow] Y (this is the notation used) if each snapshot satisfies X [right arrow] Y. Likewise, a temporal relation is in temporal Boyce-Codd normal form if each snapshot is in BCNF. Returning to the COURSES relation in Figure 1, we can observe that it is not in temporal Boyce-Codd normal form according to Jensen et al. [1994] because of the temporal functional dependency (Lab-Assist [T over right arrow] Lab-Assist-Pay) that is equivalent in our notation to (Lab-Assist [right arrow.sub.day] Lab-Assist-Pay),(6) because day is the granularity used to store the temporal relation. The resulting decomposition would give the two relations (Lab-Assist, Lab-Assist-Pay) and (Course#, Credits, Lab-Assist, #Students-in-Class), both timestamped in days. It is easy to see that these two relations are in temporal Boyce-Codd normal form according to Jensen et al. [1994], in spite of the evident redundancy and other related anomalies across snapshots.

Wijsen [1995] elaborates on the notion of temporal functional dependency proposed in Jensen et al. [1994], identifying two other kinds of dependencies and proposing a new definition of temporal third normal form. In addition to the snapshot dependencies described, Wijsen [1995] considers dependencies that must be satisfied regardless of time. For example, XGY is satisfied by a temporal relation if any two tuples having the same value for X also have the same value for Y, regardless of the time associated with the two tuples. The third kind of dependency proposed in Wijsen [1995] considers temporally adjacent snapshots: the dependency XNY is satisfied by a temporal relation if any two tuples in adjacent ticks of time having the same value for X also have the same value for Y. In Wijsen [1995] the three kinds of dependencies are called, respectively, FD, TFD (temporal functional dependency), and DFD (dynamic functional dependency). These dependencies can express practically useful constraints on temporal relations; however, they are only a very limited subset of the ones expressible in our framework. We already pointed out that snapshot dependencies are just a particular case of our temporal functional dependencies. It is also easy to show that we can express the TFDs and DFDs of Wijsen [1995]. Any dependency XGY can simply be translated into our TFD X [right arrow.sub.Top] Y, where Top identifies the temporal type covering the whole time line in a single tick. Any dependency XNY can be translated into the following pair of our TFDs: X [right arrow.sub.Mu.sub.1] Y and X [right arrow.sub.Mu.sub.2] Y, where each tick of [Mu.sub.1] and [Mu.sub.2] is the union of two consecutive ticks of [Mu], the temporal type of the considered temporal schema.
The only difference between [Mu.sub.1] and [Mu.sub.2] is that [Mu.sub.1] starts with the first tick of [Mu] ([Mu.sub.1](1) = [Mu](1) [union] [Mu](2)), whereas [Mu.sub.2] starts with the second tick of [Mu] ([Mu.sub.2](1) = [Mu](2) [union] [Mu](3)). This is necessary to enforce the dependency between X and Y on any two adjacent ticks of [Mu]. It is also interesting to observe that the sound and complete axiomatization for logical implication of temporal functional dependencies given in Wijsen [1995] is subsumed by ours using the translation previously given.
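The construction of [Mu.sub.1] and [Mu.sub.2] can be sketched concretely. This is a toy encoding invented here for illustration: absolute time is discretized to integers, `day` stands in for the schema's type [Mu], and the check at the end confirms that every adjacent pair of [Mu]-ticks is covered by a tick of one of the two derived types.

```python
def day(i):
    # Stand-in for the schema's temporal type mu: tick i = time point {i}.
    return {i}

def mu1(i):
    # mu1(i) = mu(2i-1) U mu(2i): starts with the first tick of mu.
    return day(2 * i - 1) | day(2 * i)

def mu2(i):
    # mu2(i) = mu(2i) U mu(2i+1): starts with the second tick of mu.
    return day(2 * i) | day(2 * i + 1)

# Every pair of adjacent ticks (k, k+1) of mu lies inside one tick of
# mu1 (k odd) or of mu2 (k even) -- which is why the two TFDs together
# enforce the adjacent-snapshot dependency XNY.
for k in range(1, 20):
    pair = day(k) | day(k + 1)
    assert any(pair <= mu1(j) or pair <= mu2(j) for j in range(1, 21))
```

The offset between the two types is exactly what makes the union of the two TFDs cover both odd-starting and even-starting adjacent pairs.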

Based on the three kinds of dependencies, a new notion of temporal third normal form is provided in Wijsen [1995]. The abstract data model adopted in Wijsen [1995] is quite different from ours because it only considers a single granularity. In contrast, our model is based on multiple time granularities. This motivates the difference between our notion of T3NF and associated decomposition strategy and those proposed in Wijsen [1995]. An interesting consequence is that our notion and algorithm always guarantee the existence of a lossless T3NF decomposition preserving all the dependencies, whereas the same cannot be obtained for the notion proposed in Wijsen [1995].

1.4 Organization of the Article

The rest of the article is organized as follows. In Section 2, temporal types are defined and temporal modules are reviewed. In Section 3, we introduce TFDs and develop a sound and complete axiomatization. Temporal normalization is discussed in Section 4, in which lossless decompositions and related notions are introduced. TBCNF and its decomposition algorithm are presented in Section 5. Dependency-preserving decompositions are studied in Section 6. T3NF and its decomposition algorithm are presented in Section 7. In Section 8 we discuss in more detail the motivation for the decomposition algorithms and we analyze their computational complexity. Section 9 concludes the article. Finally, the appendix contains the proofs of the results.

2. TEMPORAL TYPES AND MODULES

2.1 Temporal Types

Before we can incorporate multiple temporal types into the logical design of temporal databases, we need to formalize the notion of a temporal type. In the following we denote by R the set of all real numbers.

We assume that there is an underlying notion of absolute time, represented by the set of the real numbers.(7)

Definition (Temporal Type). A temporal type is a mapping [Mu] from the set of the positive integers (the time ticks) to [2.sup.R] (the set of absolute time sets) such that for all positive integers i and j with i [is less than] j, the following conditions are satisfied.

(1) [Mu](i) [is not equal to] 0 and [Mu](j) [is not equal to] 0 imply that each real number in [Mu](i) is less than all real numbers in [Mu](j), and

(2) [Mu](i) = 0 implies [Mu](j) = 0.

Property (1) states that the mapping must be monotonic. Property (2) disallows an empty set to be the value of a mapping for a certain time tick unless the empty set will be the value of the mapping for all subsequent time ticks.

Intuitive temporal types (e.g., day, month, week, and year) satisfy the preceding definition. For example, we can define a special temporal type year starting from year 1800 as follows: year(1) is the absolute time set (an interval of reals) corresponding to the year 1800, year(2) is the absolute time set corresponding to the year 1801, and so on. Note that the sets in the type year are consecutive intervals; however, this does not have to be the case for all types. Leap years, which are not consecutive intervals, also constitute a temporal type. If we take 1892 as the first leap year, then leap-year(2) corresponds to 1896, leap-year(3) corresponds to 1904,(8) leap-year(4) corresponds to 1908, and so on. We can also represent a finite collection of "ticks" as a temporal type. For example, to specify the year 1993, we can use the temporal type T such that T(1) is the set of absolute time corresponding to the year 1993, and T(i) = 0 for each i [is greater than] 1.

The definition also allows temporal types in which ticks are mapped to more than a single interval. This is a generalization of most previous definitions of temporal types. As an example, consider the type business-month, where every business month is a union of all business days in a month (i.e., excluding all Saturdays and Sundays). In this case, a single tick contains more than one interval.
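The two conditions of the definition can be mimicked in code. The following is a sketch under an invented toy model, not the paper's formalism: absolute time is discretized to integers, a type maps each tick 1, 2, ... to a set of time points, `set()` stands for the empty set, and all type names below are made up for illustration.

```python
def day(i):
    # Tick i of "day" is the single time point i.
    return {i}

def week(i):
    # Tick i of "week" groups seven consecutive days.
    return set(range(7 * (i - 1) + 1, 7 * i + 1))

def business_week(i):
    # A user-defined type: week i without its last two (weekend) days,
    # so the ticks do not cover the time line contiguously.
    return set(sorted(week(i))[:5])

def only_first_week(i):
    # A finite type: one nonempty tick, empty ever after.
    return week(1) if i == 1 else set()

def is_monotonic_prefix(mu, n):
    """Check conditions (1) and (2) of the definition on ticks 1..n."""
    for i in range(1, n):
        a, b = mu(i), mu(i + 1)
        if a and b and max(a) >= min(b):
            return False   # (1): later ticks must hold greater values
        if not a and b:
            return False   # (2): once empty, always empty
    return True

assert all(is_monotonic_prefix(t, 10)
           for t in (day, week, business_week, only_first_week))
```

A real system would of course need a finite (e.g., periodic) representation of such mappings rather than arbitrary Python functions, as noted later in this section.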

In a realistic system, we should distinguish names of temporal types, such as day, employed by users or system designers from the mapping they denote. It is possible that in a system different names can be used for the same mapping. For example, day and giorno (which is the Italian word for day) denote the same mapping. However, for simplicity, in this article we assume that different symbols used for temporal types denote different mappings.

It is important to emphasize that a real system can only treat infinite types that have finite representations. Various periodical descriptions, for example, Kabanza et al. [1990] and Niezette and Stevenne [1992], are possible but outside the scope of this article.

There is a natural relation among temporal types as given in the following.

Definition. Let [Mu.sub.1] and [Mu.sub.2] be temporal types. Then [Mu.sub.1] is said to be finer than [Mu.sub.2], denoted [Mu.sub.1] [is less than or equal to] [Mu.sub.2], if for each i, there exists j such that [Mu.sub.1](i) [subset or equal to] [Mu.sub.2](j).

This "finer-than" relation is essential for temporal FDs. In the COURSES relation, because a lab assistant does not change in a week, it does not change in any day during the week. (Note that day is finer than week.) In fact, the lab assistant will not change in terms of any temporal type that is finer than week. We discuss this issue further in Section 3. We only want to note here that there is an infinite number of temporal types that are finer than week.

In the rest of the article, conditions like [Mu.sub.1](i) [subset or equal to] [Mu.sub.2](j) are often expressed as "tick j of [Mu.sub.2] covers tick i of [Mu.sub.1]." The notation [Mu] [is less than] v is used for the strictly-finer-than relation: [Mu] [is less than] v if [Mu] [is less than or equal to] v and [Mu] [is not equal to] v.

Consider now the properties of the finer-than relation. By definition, [Mu] [is less than or equal to] [Mu] for each temporal type [Mu]; that is, the relation is reflexive. Also, if [Mu.sub.1] [is less than or equal to] [Mu.sub.2] and [Mu.sub.2] [is less than or equal to] [Mu.sub.1], then [Mu.sub.1] = [Mu.sub.2]. Furthermore, the finer-than relation is obviously transitive. Thus [is less than or equal to] is a partial order. The relation [is less than or equal to] is not a total order because, for example, week and month are incomparable (i.e., week is not finer than month, and month is not finer than week). There exists a unique least upper bound of the set of all temporal types, denoted by [Mu.sub.Top], and a unique greatest lower bound, denoted by [Mu.sub.Bottom]. These top and bottom elements are defined as follows: [Mu.sub.Top](1) = R and [Mu.sub.Top](i) = 0 for each i [is greater than] 1, and [Mu.sub.Bottom](i) = 0 for each positive integer i. Moreover, for each pair of temporal types [Mu.sub.1], [Mu.sub.2], there exist a unique least upper bound lub([Mu.sub.1], [Mu.sub.2]) and a unique greatest lower bound glb([Mu.sub.1], [Mu.sub.2]) of the two types, with respect to [is less than or equal to]. That is, the set of all temporal types forms a lattice with respect to [is less than or equal to].
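The finer-than relation can be checked directly on the toy discrete model invented earlier (integers for absolute time; `month` below uses fictitious 30-day months purely for illustration). Since the definition quantifies over infinitely many ticks, the sketch approximates it on a finite prefix, which is why real systems need finite type representations.

```python
def day(i):
    return {i}

def week(i):
    return set(range(7 * (i - 1) + 1, 7 * i + 1))

def month(i):
    # Toy 30-day months, invented so that weeks straddle month borders.
    return set(range(30 * (i - 1) + 1, 30 * i + 1))

def finer_prefix(mu1, mu2, n, search=200):
    """Approximate mu1 <= mu2: every tick 1..n of mu1 must be covered
    by some tick 1..search of mu2."""
    return all(any(mu1(i) <= mu2(j) for j in range(1, search + 1))
               for i in range(1, n + 1))

assert finer_prefix(day, week, 28)        # day is finer than week
assert finer_prefix(day, month, 28)       # day is finer than month
assert not finer_prefix(week, month, 10)  # week 5 straddles two months
assert not finer_prefix(month, week, 4)   # a month exceeds any week
```

The last two assertions show week and month to be incomparable in this model, matching the observation that the finer-than relation is only a partial order.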

Proposition 1. The set of all temporal types is a lattice with respect to the finer-than relation.

2.2 Temporal Modules

Our discussion of logical design for temporal databases is in terms of temporal modules, which were introduced in Wang et al. [1995] to provide a unified interface for accessing different underlying temporal information systems. Thus the temporal module concept is rather general; the concepts and the results of this article are readily translated into the terms of other temporal data models. Temporal modules defined in this article are simplified but equivalent versions of the extended temporal modules of Wang et al. [1995], as explained in the following.

We assume there is an infinite set of attributes. For each attribute A, there exists an infinite set of values called domain of A, denoted dom(A). Each finite set R of attributes is called a relation scheme. A relation scheme R = {[A.sub.1], . . . , [A.sub.n]} is usually written as <[A.sub.1], . . . , [A.sub.n]>. For relation scheme R, let Tup(R) denote the set of all mappings t, called tuples, from R to [union.sub.A[element of]R]dom(A) such that for each A in R, t(A), the value of t for attribute A, is in dom(A). A tuple t of relation scheme <[A.sub.1], . . . , [A.sub.n]> is usually written as <[a.sub.1], . . . , [a.sub.n]>, where [a.sub.i] is in dom([A.sub.i]) for each 1 [is less than or equal to] i [is less than or equal to] n, whereas its projection on a subset X of the attributes in the relation scheme is denoted by t[X]. We are now ready to define temporal module schemes and temporal modules.

Definition (Temporal Module Scheme and Temporal Module). A temporal module scheme is a pair (R, [Mu]), where R is a relation scheme and [Mu] a temporal type. A temporal module is a triple (R, [Mu], [Phi]), where (R, [Mu]) is a temporal module scheme and [Phi] is a function, called a time windowing function from N (i.e., the positive integers) to [2.sup.Tup(R)] such that, for each i [is greater than or equal to] 1, [Phi](i) = 0 if [Mu](i) = 0.

Intuitively, the time windowing function [Phi] in a temporal module (R, [Mu], [Phi]) gives the tuples (facts) that hold at time tick i in temporal type [Mu]. This is a generalization of many temporal models that have appeared in the literature. Note that, for practical reasons, we could add the restriction that [Phi](i) be finite for each i [is greater than or equal to] 1. The results of this article continue to hold with this restriction.

The temporal module (R, [Mu], [Phi]) is said to be on (R, [Mu]) and to be in terms of [Mu]. Temporal modules are also denoted by symbol M, possibly subscripted. For each temporal module M, we denote its relation scheme, temporal type, and windowing function by [R.sub.M], [Mu.sub.M], and [Phi.sub.M], respectively.

Example 1. We view the temporal relation COURSES given in the Introduction as a temporal module with (COURSES, day), where COURSES = <Course#, Credits, Lab-Assist, Lab-Assist-Pay, #Students-in-Class>, as its scheme. The relation in Figure 1 corresponds to the time windowing function [Phi] defined(9) as follows.

[Phi](3/3/93) = {<CS50, 3, Woo, 1K, 50>}
[Phi](3/8/93) = {<CS50, 3, Woo, 1K, 45>}
[Phi](4/5/93) = {<CS50, 3, Woo, 1.5K, 50>}
[Phi](10/3/94) = {<CS50, 3, Lee, 1K, 50>}
[Phi](10/7/94) = {<CS50, 3, Lee, 1K, 50>}
[Phi](10/17/94) = {<CS50, 3, Lee, 1K, 45>}
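Example 1's temporal module can be sketched in code. The encoding is invented for illustration only: the windowing function becomes a dict from a day tick (written as a date, following the example's convention) to the set of tuples holding on that day, with absent keys mapping to the empty set, as the definition requires for empty ticks.

```python
# Relation scheme of the module (COURSES, day).
COURSES = ("Course#", "Credits", "Lab-Assist", "Lab-Assist-Pay",
           "#Students-in-Class")

# Time windowing function Phi from Example 1, keyed by day.
phi = {
    "3/3/93":   {("CS50", 3, "Woo", "1K", 50)},
    "3/8/93":   {("CS50", 3, "Woo", "1K", 45)},
    "4/5/93":   {("CS50", 3, "Woo", "1.5K", 50)},
    "10/3/94":  {("CS50", 3, "Lee", "1K", 50)},
    "10/7/94":  {("CS50", 3, "Lee", "1K", 50)},
    "10/17/94": {("CS50", 3, "Lee", "1K", 45)},
}

def window(day):
    """Time windowing function: the tuples that hold on the given day."""
    return phi.get(day, set())

assert window("3/3/93") == {("CS50", 3, "Woo", "1K", 50)}
assert window("1/1/90") == set()   # days with no facts map to empty set
```

This also shows the practical restriction mentioned above: each [Phi](i) here is finite.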

Two equivalent query languages, a calculus [Wang et al. 1995] and an algebra [Wang 1995], have been proposed to frame queries on temporal modules.

3. TEMPORAL FUNCTIONAL DEPENDENCIES

Relations in relational databases are traditionally used to store "static" information, or only the "current" information. Let R = <[A.sub.1], . . . , [A.sub.n]> be a relation scheme, and X and Y subsets of {[A.sub.1], . . . , [A.sub.n]}. In the traditional dependency theory, X functionally determines Y, denoted X [right arrow] Y, if for each relation r for R and each pair of tuples [t.sub.1], [t.sub.2] in r, [t.sub.1][X] = [t.sub.2][X] implies [t.sub.1][Y] = [t.sub.2][Y]. As an example, consider a relation scheme, called FACULTY, that records for each faculty member the social security number (SSn), name (Name), rank (Rank), and department (Dept). FD SSn [right arrow] Rank states that each faculty member's rank is unique, even though he or she may serve in more than one department.

In a temporal database, information becomes dynamic. An FD that is valid in the "current" relation may no longer be valid in the corresponding temporal relation if the traditional definition of FDs is used without change. This would be the case if FACULTY were a temporal relation, because FACULTY would then likely contain two tuples stating that a particular faculty member is an Assistant Professor at one time, but an Associate Professor at a different time. We extend the traditional notion of FDs in a way that not only permits these possibilities, but also enables us to model additional constraints such as "a faculty member's rank does not change during an academic year."

Definition (Temporal Functional Dependency). Let X and Y be (finite) sets of attributes and [Mu] a temporal type such that [Mu](i) [not equal to] [empty set] for some i. Then X [[right arrow].sub.[Mu]] Y is called a temporal functional dependency (TFD).

Intuitively, a TFD X [[right arrow].sub.[Mu]] Y states that whenever two tuples, holding on time ticks covered by the same tick of [Mu], agree on X, they must also agree on Y. Thus the TFD SSn [[right arrow].sub.academic-year] Rank, where academic-year is the temporal type of academic years, expresses the fact that a faculty member's rank cannot change during an academic year. We now formally define the preceding notion of satisfaction. In order to simplify the notation throughout this article, we use [Mu]([i.sub.1], . . . , [i.sub.k]) to denote [[union].sub.1 [is less than or equal to] j [is less than or equal to] k] [Mu]([i.sub.j]).

Definition (Satisfaction of TFD). A TFD X [[right arrow].sub.v] Y is satisfied by a temporal module M = (R, [Mu], [Phi]) if for all tuples [t.sub.1] and [t.sub.2] and positive integers [i.sub.1] and [i.sub.2], the following conditions imply [t.sub.1][Y] = [t.sub.2][Y]:

(a) [t.sub.1][X] = [t.sub.2][X],
(b) [t.sub.1] is in [Phi]([i.sub.1]) and [t.sub.2] is in [Phi]([i.sub.2]), and
(c) there exists j such that [Mu]([i.sub.1], [i.sub.2]) [subset or equal to] v(j).

For example, the temporal module in Example 1 satisfies the TFD Course# [[right arrow].sub.week] Lab-Assist. The temporal module also satisfies Course# [[right arrow].sub.day] #Students-in-Class. However, it does not satisfy Course# [[right arrow].sub.month] #Students-in-Class.

The definition implies that the TFD X [[right arrow].sub.v] Y is always satisfied by the temporal module (R, [Mu], [Phi]) if [Mu] does not have two different ticks that are covered by a single tick of v [because condition (c) in the definition will not be satisfied].
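The satisfaction test above can be sketched in executable form. The following is a minimal sketch, not the paper's formalism: ticks of [Mu] are day indices, a coarser type v is given as a list of blocks of day indices (assuming each tick of v covers whole days, per the coverage condition above), and the data loosely mirrors Example 1; all names are illustrative.

```python
# A temporal module (R, mu, Phi) is modeled as a dict Phi mapping tick
# indices of mu (here, day numbers) to lists of tuples; each tuple is a
# dict from attribute names to values.

def satisfies(Phi, nu_ticks, X, Y):
    """Check X ->_nu Y: any two tuples holding on mu-ticks covered by the
    same nu-tick that agree on X must also agree on Y."""
    proj = lambda t, attrs: tuple(t[a] for a in attrs)
    for block in nu_ticks:                       # one tick of nu
        facts = [t for i in block for t in Phi.get(i, [])]
        for t1 in facts:
            for t2 in facts:
                if proj(t1, X) == proj(t2, X) and proj(t1, Y) != proj(t2, Y):
                    return False
    return True

# Toy data shaped like Example 1 (attribute names are illustrative only).
Phi = {
    0: [{'Course#': 'CS50', 'Lab-Assist': 'Woo', '#Students': 50}],
    2: [{'Course#': 'CS50', 'Lab-Assist': 'Woo', '#Students': 45}],
    7: [{'Course#': 'CS50', 'Lab-Assist': 'Lee', '#Students': 50}],
}
week = [set(range(0, 7)), set(range(7, 14))]
day = [{i} for i in range(14)]

print(satisfies(Phi, week, ['Course#'], ['Lab-Assist']))             # True
print(satisfies(Phi, day, ['Course#'], ['#Students']))               # True
print(satisfies(Phi, [set(range(14))], ['Course#'], ['#Students']))  # False
```

The last call fails because days 0 and 2 fall in the same two-week tick but report different class sizes, mirroring why Course# [[right arrow].sub.month] #Students-in-Class fails above.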

Let (R, [Mu]) be a temporal module scheme with a set F of TFDs involving only attributes in R. We only consider temporal modules on (R, [Mu]) that satisfy each TFD in F. Thus the set F determines the set of allowed modules.

Example 2. The temporal module reported in Example 1 satisfies the TFD

Course# [[right arrow].sub.day] Credits, Lab-Assist, Lab-Assist-Pay, #Students-in-Class.

In this case both the temporal module and the TFD are defined in terms of the same temporal type (day). However, the same module satisfies the TFD Course# [[right arrow].sub.week] Lab-Assist that is defined in terms of a different temporal type. The notion of temporal FDs introduced in Jensen et al. [1994] allows only TFDs in terms of the same temporal type of the module.

Our notion of TFDs can also be used to express FDs that always hold, independent of time. For example, we can express the FD "each person has only one biological father (regardless of time)" by using the TFD Name [[right arrow].sub.[[Mu].sub.Top]] B_Father. This TFD says that whenever two facts [t.sub.1] and [t.sub.2] with [t.sub.1][X] = [t.sub.2][X] are valid in one temporal module, then [t.sub.1][Y] = [t.sub.2][Y], independent of the temporal type.

3.1 Inference Axioms for TFDs

As in the case of traditional FDs, inference axioms to derive all TFDs that logically follow from a set of TFDs are important. The following inference axioms include not only the temporal analogues of Armstrong's axioms [Ullman 1988], but also axioms that reflect the relationships among temporal types. In Example 2, because a course does not have more than one lab assistant in a week, it cannot have more than one lab assistant in a day. In fact, it cannot have more than one lab assistant in any tick of a temporal type finer than week. This is captured by the inference rule: if X [[right arrow].sub.[Mu]] Y and v [is less than or equal to] [Mu], then X [[right arrow].sub.v] Y.

An intuitive conjecture would be that the preceding rule, along with the temporal analogues of Armstrong's axioms,(10) constitutes a complete axiomatization. This turns out to be false. Consider, for example, two TFDs X [[right arrow].sub.[y.sub.1]] Y and X [[right arrow].sub.[y.sub.2]] Y, where [y.sub.1] is the temporal type corresponding to years before 1990, and [y.sub.2] is the temporal type corresponding to years 1990 and beyond. Taken together, these two TFDs imply X [[right arrow].sub.year] Y. However, year is not finer than either [y.sub.1] or [y.sub.2].

In order to capture such inferences, we define the notion of a temporal type being collectively finer than a set of temporal types. The temporal type year is collectively finer than the set {[y.sub.1], [y.sub.2]} because each tick of year is covered by (i.e., contained in) a tick of either [y.sub.1] or [y.sub.2]. This is defined formally in the following.

Definition. We say that a temporal type v is collectively finer than a set {[[Mu].sub.1], . . . , [[Mu].sub.n]} of temporal types, denoted v [[is less than or equal to].sub.c] {[[Mu].sub.1], . . . , [[Mu].sub.n]}, if for each positive integer i, there exist 1 [is less than or equal to] k [is less than or equal to] n and a positive integer j such that v(i) [subset or equal to] [[Mu].sub.k](j).
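The collectively-finer relation is easy to check when types are modeled over a finite universe of time points. A minimal sketch, assuming a temporal type is just a set of disjoint ticks (each a set of year numbers), using the year/[y.sub.1]/[y.sub.2] example from the preceding discussion; the concrete year ranges are illustrative.

```python
# A temporal type is modeled (assumption of this sketch) as a set of
# pairwise-disjoint ticks, each tick a frozenset of primitive time points
# (here, year numbers).

def collectively_finer(nu, mus):
    """nu <=_c {mu_1, ..., mu_n}: every tick of nu is contained in a tick
    of at least one type in mus (which type may differ from tick to tick)."""
    return all(any(any(t <= s for s in mu) for mu in mus) for t in nu)

# y1: years before 1990; y2: years 1990 and beyond; year: every year.
y1 = {frozenset({y}) for y in range(1985, 1990)}
y2 = {frozenset({y}) for y in range(1990, 1995)}
year = {frozenset({y}) for y in range(1985, 1995)}

print(collectively_finer(year, [y1, y2]))  # True: year <=_c {y1, y2}
print(collectively_finer(year, [y1]))      # False: year is not finer than y1 alone
```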

Note that [Mu] [is less than or equal to] [[Mu].sub.1] implies [Mu] [[is less than or equal to].sub.c] {[[Mu].sub.1], [[Mu].sub.2]} for any [[Mu].sub.2]. Inference axioms for TFDs are given next.

Four Inference Axioms for TFDs.

(1) Reflexivity: If Y [subset or equal to] X, then X [[right arrow].sub.[Mu]] Y for each temporal type [Mu].
(2) Augmentation: If X [[right arrow].sub.[Mu]] Y, then XZ [[right arrow].sub.[Mu]] YZ.
(3) Transitivity: If X [[right arrow].sub.[Mu]] Y and Y [[right arrow].sub.[Mu]] Z, then X [[right arrow].sub.[Mu]] Z.
(4) Descendability: If X [[right arrow].sub.[[Mu].sub.1]] Y, . . . , X [[right arrow].sub.[[Mu].sub.n]] Y with n [is greater than or equal to] 1, then X [[right arrow].sub.[Mu]] Y for each [Mu] with [Mu] [[is less than or equal to].sub.c] {[[Mu].sub.1], . . . , [[Mu].sub.n]}.

The first three axioms are temporal analogues of Armstrong's axioms. The descendability axiom states that if one or more TFDs (with the same left- and right-hand sides X and Y) in terms of different types are satisfied by a temporal module M, then a TFD (with the same X and Y) in terms of any temporal type collectively finer than the set of these temporal types is also satisfied by M. In particular, if we know that X [[right arrow].sub.v] Y is satisfied by M, descendability ensures that for each [Mu] finer than v, X [[right arrow].sub.[Mu]] Y is satisfied by M. This makes intuitive sense: if the rank of a faculty member cannot change within an academic year, it cannot change within a month or a day of an academic year.

Let F be a finite set of TFDs. The notion of derivation of TFDs is analogous to the one for traditional FDs, and is given formally in the following.

Definition. The TFD X [[right arrow].sub.[Mu]] Y is derived from F, denoted F |- X [[right arrow].sub.[Mu]] Y, if there exists a proof sequence [f.sub.1], . . . , [f.sub.k] such that (i) [f.sub.k] is X [[right arrow].sub.[Mu]] Y and (ii) each [f.sub.i] is a TFD either in F or obtained by applying one of the four axioms to TFDs among [f.sub.1], . . . , [f.sub.i-1].

The notion of logical implication of TFD is also standard. Formally, this is as follows.

Definition. The TFD X [[right arrow].sub.[Mu]] Y is logically implied by F, denoted F |= X [[right arrow].sub.[Mu]] Y, if every temporal module M that satisfies each TFD in F also satisfies X [[right arrow].sub.[Mu]] Y.

In the following we establish the fact that the four axioms are sound (i.e., they can be used to derive only logically implied TFDs) and complete (i.e., they can be used to derive all logically implied TFDs). First we have the soundness result.

Lemma 1. The four inference axioms are sound.

By using the preceding four inference axioms, we may derive other inference rules. For example, given X [[right arrow].sub.[[Mu].sub.1]] Y and Y [[right arrow].sub.[[Mu].sub.2]] Z, we may derive X [[right arrow].sub.glb([[Mu].sub.1], [[Mu].sub.2])] Z. (We call this rule the extended transitivity axiom.) Indeed, because glb([[Mu].sub.1], [[Mu].sub.2]) [is less than or equal to] [[Mu].sub.1] and glb([[Mu].sub.1], [[Mu].sub.2]) [is less than or equal to] [[Mu].sub.2], descendability gives X [[right arrow].sub.glb([[Mu].sub.1], [[Mu].sub.2])] Y and Y [[right arrow].sub.glb([[Mu].sub.1], [[Mu].sub.2])] Z. By transitivity, X [[right arrow].sub.glb([[Mu].sub.1], [[Mu].sub.2])] Z. Another rule we may derive is as follows: given X [[right arrow].sub.[[Mu].sub.1]] Y and X [[right arrow].sub.[[Mu].sub.2]] Z, we have X [[right arrow].sub.glb([[Mu].sub.1], [[Mu].sub.2])] YZ. (We call this the union axiom.) To see this, we use augmentation to get X [[right arrow].sub.[[Mu].sub.1]] XY and XY [[right arrow].sub.[[Mu].sub.2]] YZ from X [[right arrow].sub.[[Mu].sub.1]] Y and X [[right arrow].sub.[[Mu].sub.2]] Z, respectively. Then, by the preceding extended transitivity, we have X [[right arrow].sub.glb([[Mu].sub.1], [[Mu].sub.2])] YZ. In summary, we have the following two additional inference axioms.

Additional Inference Axioms for TFDs.

(1) Extended Transitivity: If X [[right arrow].sub.[[Mu].sub.1]] Y and Y [[right arrow].sub.[[Mu].sub.2]] Z, then X [[right arrow].sub.glb([[Mu].sub.1], [[Mu].sub.2])] Z.
(2) Union: If X [[right arrow].sub.[[Mu].sub.1]] Y and X [[right arrow].sub.[[Mu].sub.2]] Z, then X [[right arrow].sub.glb([[Mu].sub.1], [[Mu].sub.2])] YZ.

Unlike the case of traditional FDs, the application of the TFD inference axioms to a finite set F of TFDs may lead to an infinite number of temporal functional dependencies. This is due to the reflexivity and descendability axioms. It is obvious that the reflexivity axiom gives an infinite number of TFDs; however, these TFDs are trivial ones. The more serious problem is the descendability axiom. The reason the descendability axiom gives an infinite number of TFDs is that, given a type (or a set of types), there may be an infinite number of types that are (collectively) finer than the given type (or set of types). Consider, for example, Course# [[right arrow].sub.week] Lab-Assist. Let [week.sub.i] be the temporal type that covers only the ith week; that is, [week.sub.i](1) maps to the absolute time of week i, and [week.sub.i](j) = [empty set] for all j [greater than] 1. Then clearly, [week.sub.i] [is less than or equal to] week, and hence [week.sub.i] [[is less than or equal to].sub.c] {week}, for all i [is greater than or equal to] 1. Therefore, by the descendability axiom, we have Course# [[right arrow].sub.[week.sub.i]] Lab-Assist for all i [is greater than or equal to] 1. There are an infinite number of such TFDs.

To overcome the problem of an infinite number of logically implied TFDs, we ask the following questions: does there exist a finite set F' of TFDs which has the property that every TFD logically implied by F can be derived from F' by just one application of the descendability axiom? Can the set F' be effectively computed? We answer both questions positively by developing three finite inference axioms described in the following.

Three Finite Inference Axioms for TFDs.

(1) Restricted Reflexivity: If Y [subset or equal to] X, then X [[right arrow].sub.[[Mu].sub.Top]] Y.
(2) Augmentation: If X [[right arrow].sub.[Mu]] Y, then XZ [[right arrow].sub.[Mu]] YZ.
(3) Extended Transitivity: If X [[right arrow].sub.[[Mu].sub.1]] Y and Y [[right arrow].sub.[[Mu].sub.2]] Z, then X [[right arrow].sub.glb([[Mu].sub.1], [[Mu].sub.2])] Z.

If a TFD X [[right arrow].sub.[Mu]] Y is derived from a set F of TFDs by using the three finite inference axioms, we say that F finitely derives X [[right arrow].sub.[Mu]] Y, denoted F [|-.sub.f] X [[right arrow].sub.[Mu]] Y. It is easily seen that if F [|-.sub.f] X [[right arrow].sub.[Mu]] Y, then [Mu] is the glb of some temporal types appearing in F. Because F is finite, there are only a finite number of TFDs that can be derived from F by using the three finite inference axioms. We call the set of TFDs derivable from F by these finite inference axioms the "finite closure" of F. This is defined formally in the following.

Definition (Finite Closure). Let F be a set of TFDs. The finite closure of F, denoted [F.sup.+], is the set of all the TFDs derivable from F by the Three Finite Inference Axioms. More formally, [F.sup.+] = {X [[right arrow].sub.[Mu]] Y | F [|-.sub.f] X [[right arrow].sub.[Mu]] Y}.

Consider, for example, F = {A [[right arrow].sub.[Mu]] B, A [[right arrow].sub.v] B}. By the augmentation axiom, we obtain A [[right arrow].sub.[Mu]] AB and AB [[right arrow].sub.v] B from A [[right arrow].sub.[Mu]] B and A [[right arrow].sub.v] B, respectively. Then, by extended transitivity, we have A [[right arrow].sub.glb([Mu], v)] B. Thus A [[right arrow].sub.glb([Mu], v)] B is in [F.sup.+].

The three finite axioms are sound because each of the three axioms is derivable from the Four Inference Axioms and, by Lemma 1, the Four Inference Axioms are sound. We show that the Three Finite Axioms are complete up to descendability; that is, if F |= X [[right arrow].sub.[Mu]] Y, then there exist TFDs X [[right arrow].sub.[[Mu].sub.1]] Y, . . . , X [[right arrow].sub.[[Mu].sub.k]] Y in [F.sup.+] such that [Mu] [[is less than or equal to].sub.c] {[[Mu].sub.1], . . . , [[Mu].sub.k]}.

Theorem 1. The three finite axioms are sound, and complete up to descendability.

Hence, although there may be an infinite number of TFDs logically implied by a finite set F of TFDs, the only source of infiniteness is that there may be an infinite number of temporal types that are collectively finer than some temporal types appearing in the finite closure of F.

Before showing the completeness, we introduce the following auxiliary operation, which is used throughout the article. For each set F of TFDs and each set L of real numbers (i.e., of absolute time), let [[Pi].sub.L](F) be the set of nontemporal functional dependencies that have "effect" on the time in L. Formally, let

[[Pi].sub.L](F) = {X [right arrow] Y | X [[right arrow].sub.v] Y [element of] F for some v such that L [subset or equal to] v(j) for some j}.

Clearly, [[Pi].sub.[empty set]](F) gives the "nontemporal version" of all the TFDs in F.

To show completeness up to descendability, we need the following two lemmas. The first lemma formalizes an important relationship between temporal and corresponding nontemporal functional dependencies.

Lemma 2. Let F be a set of TFDs and X [[right arrow].sub.[Mu]] Y a TFD such that F |= X [[right arrow].sub.[Mu]] Y. Then for all i such that [Mu](i) [not equal to] [empty set], we have [[Pi].sub.[Mu](i)](F) |= X [right arrow] Y.

We now introduce the following notion. We call a set G of TFDs a support for X [right arrow] Y if [[Pi].sub.[empty set]](G) |= X [right arrow] Y. We say that it is minimal if no proper subset of G is a support for X [right arrow] Y. The second lemma states two properties that relate the temporal types of a TFD to its minimal supports. We use the following notation: for each set G of TFDs, define glb(G) to be the temporal type glb({v | V [[right arrow].sub.v] W [element of] G}). Note that glb(G) = [[Mu].sub.Top] if G = [empty set] (see Davey and Priestley [1990]).

Lemma 3. Let F be a set of TFDs. (a) If [F.sub.1] [subset or equal to] F is a minimal support for the FD X [right arrow] Y, then F [|-.sub.f] X [[right arrow].sub.glb([F.sub.1])] Y. (b) If [F.sub.1], . . . , [F.sub.n] with [F.sub.i] [subset or equal to] F for i = 1, . . . , n are all the minimal supports for X [right arrow] Y and F |= X [[right arrow].sub.[Mu]] Y, then [Mu] [[is less than or equal to].sub.c] {[[Mu].sub.1], . . . , [[Mu].sub.n]}, where [[Mu].sub.i] = glb([F.sub.i]) for i = 1, . . . , n.

The completeness up to descendability of the Three Finite Axioms (in Theorem 1) now follows easily.

Proof. Suppose F |= X [[right arrow].sub.[Mu]] Y. By the definition of TFD, there exists i such that [Mu](i) [not equal to] [empty set]. By Lemma 2, we have [[Pi].sub.[Mu](i)](F) |= X [right arrow] Y. Because [[Pi].sub.[Mu](i)](F) [subset or equal to] [[Pi].sub.[empty set]](F), we have [[Pi].sub.[empty set]](F) |= X [right arrow] Y, and hence there exists at least one minimal support for X [right arrow] Y. Let [F.sub.1], . . . , [F.sub.n] be all the minimal supports for X [right arrow] Y and, for each 1 [is less than or equal to] i [is less than or equal to] n, let [[Mu].sub.i] = glb([F.sub.i]). By Lemma 3(a), F [|-.sub.f] X [[right arrow].sub.[[Mu].sub.i]] Y, and thus X [[right arrow].sub.[[Mu].sub.i]] Y [element of] [F.sup.+], for each 1 [is less than or equal to] i [is less than or equal to] n. By Lemma 3(b), [Mu] [[is less than or equal to].sub.c] {[[Mu].sub.1], . . . , [[Mu].sub.n]}. Hence, if F |= X [[right arrow].sub.[Mu]] Y, then there exists a set {X [[right arrow].sub.[[Mu].sub.1]] Y, . . . , X [[right arrow].sub.[[Mu].sub.n]] Y} [subset or equal to] [F.sup.+] such that [Mu] [[is less than or equal to].sub.c] {[[Mu].sub.1], . . . , [[Mu].sub.n]}; that is, the Three Finite Axioms are complete up to descendability.

We now show how Theorem 1 implies the completeness of the Four Inference Axioms.

Theorem 2. The Four Inference Axioms for TFDs are both sound and complete.

Proof. Soundness is provided by Lemma 1. For completeness, let F be a set of TFDs and X [[right arrow].sub.[Mu]] Y a TFD logically implied by F. By Theorem 1 we know that there exist TFDs X [[right arrow].sub.[[Mu].sub.1]] Y, . . . , X [[right arrow].sub.[[Mu].sub.k]] Y in [F.sup.+] such that [Mu] [[is less than or equal to].sub.c] {[[Mu].sub.1], . . . , [[Mu].sub.k]}. By the definition of [F.sup.+] we know that, for each 1 [is less than or equal to] i [is less than or equal to] k, F [|-.sub.f] X [[right arrow].sub.[[Mu].sub.i]] Y and hence F |- X [[right arrow].sub.[[Mu].sub.i]] Y. Applying the descendability axiom, we obtain F |- X [[right arrow].sub.[Mu]] Y, which concludes the completeness proof.

3.2 Closure of Attributes

As in the traditional relational dependency theory, we wish to give a test to verify whether a TFD of the form X [[right arrow].sub.[Mu]] B is implied by a set of TFDs. For this purpose we introduce the (temporal) notion of finite closures of attributes and give an algorithm to compute them. First we have the definition.

Definition (Finite Closure of Attributes). Let F be a finite set of TFDs. For each finite set X of attributes, the finite closure of X with respect to F is defined as

[X.sup.+] = {(B, [Mu]) | X [[right arrow].sub.[Mu]] B [element of] [F.sup.+]

and there is no X [[right arrow].sub.v] B in [F.sup.+] such that [Mu] [is less than] v}.

Note that [X.sup.+] for each X is a finite set, because [F.sup.+] is finite.

Proposition 2. Let F be a finite set of TFDs and X a finite set of attributes. Then the following holds.

F |= X [[right arrow].sub.[Mu]] B iff there exists {(B, [[Mu].sub.1]), . . . , (B, [[Mu].sub.m])} [subset or equal to] [X.sup.+] such that [Mu] [[is less than or equal to].sub.c] {[[Mu].sub.1], . . . , [[Mu].sub.m]}.

From the previous proposition, we know that if we have an effective procedure for computing the finite closure of X and an effective procedure for testing the [[is less than or equal to].sub.c] relation, then we can effectively decide whether a TFD X [[right arrow].sub.[Mu]] B is logically implied by F, even though F may logically imply an infinite number of TFDs. In Figure 3, we provide an algorithm for computing [X.sup.+].

Theorem 3. The algorithm in Figure 3 correctly computes [X.sup.+] in a finite number of steps.

Fig. 3. Algorithm for computing [X.sup.+].

Algorithm for Computing [X.sup.+]

INPUT: A finite set of attributes U, a set of temporal functional dependencies F involving only attributes in U, and a set X [subset] U.
OUTPUT: [X.sup.+], the finite closure of X with respect to F.
METHOD: We compute a sequence of sets [X.sup.(0)], [X.sup.(1)], . . . whose elements are (attribute, temporal-type) pairs.

1. Let [X.sup.(0)] = {(A, [Mu.sub.Top]) | A [element of] X}.

2. For each TFD [A.sub.1] . . . [A.sub.k] [[right arrow].sub.[Mu]] [B.sub.1] . . . [B.sub.m] in F such that {([A.sub.1], [Mu.sub.1]), . . . , ([A.sub.k], [Mu.sub.k])} is a subset of [X.sup.(i)], we compute the set {([B.sub.l], [Mu]') | 1 [is less than or equal to] l [is less than or equal to] m, [Mu]' = glb([Mu.sub.1], . . . , [Mu.sub.k], [Mu])}. Let [f.sub.1], . . . , [f.sub.r] be all the TFDs in F that satisfy the above condition and [Y.sub.1], . . . , [Y.sub.r] the corresponding computed sets. Then let [X.sup.(i+1)] = [X.sup.(i)] [union] [Y.sub.1] [union] . . . [union] [Y.sub.r].

Step 2 is repeated until [X.sup.(i+1)] = [X.sup.(i)]. The algorithm returns the set

[X.sup.(i)] \ {(B, [Mu]') [element of] [X.sup.(i)] | [exists]v (B, v) [element of] [X.sup.(i)] with [Mu]' [is less than] v}.

Example 3. As an example, let [w.sub.r] be the temporal type of "recent weeks" defined as follows: [w.sub.r](1) maps to the week starting July 4, 1994, [w.sub.r](2) to the week after that, and so on. Now let F = {A [[right arrow].sub.[w.sub.r]] B, B [[right arrow].sub.month] A}. It is easily seen that [A.sup.+] = {(A, [Mu.sub.Top]), (B, [w.sub.r])}, [B.sup.+] = {(B, [Mu.sub.Top]), (A, month)}, and [(AB).sup.+] = [A.sup.+] [union] [B.sup.+].
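The algorithm of Figure 3 can be sketched over a toy finite model of time. Assumptions of this sketch: a temporal type is a finite set of disjoint ticks (each a frozenset of day numbers), glb is computed by intersecting ticks, and the TFDs mirror the shape of Example 3; the concrete types (top, month, w_r over a 28-day universe) are illustrative stand-ins, not the paper's calendars.

```python
from itertools import product

def glb(*types):
    """glb of finite types: ticks are the nonempty intersections of ticks."""
    ticks = list(types[0])
    for t in types[1:]:
        ticks = [a & b for a in ticks for b in t if a & b]
    return frozenset(ticks)

def finer(nu, mu):
    """nu <= mu: every tick of nu is contained in some tick of mu."""
    return all(any(t <= s for s in mu) for t in nu)

def closure(X, F, top):
    """Finite closure X+ of attribute set X under TFDs F,
    where F is a list of triples (lhs attrs, type, rhs attrs)."""
    Xc = {(A, top) for A in X}                        # step 1
    while True:                                       # step 2, to a fixpoint
        new = set()
        for lhs, mu, rhs in F:
            # one recorded (attribute, type) pair per left-hand attribute
            choices = [[(a, t) for (a, t) in Xc if a == A] for A in lhs]
            if any(not c for c in choices):
                continue
            for combo in product(*choices):
                mu1 = glb(*[t for _, t in combo], mu)
                new |= {(B, mu1) for B in rhs}
        if new <= Xc:
            break
        Xc |= new
    # final pruning: drop (B, m) if a strictly coarser (B, n) is recorded
    return {(B, m) for (B, m) in Xc
            if not any(b == B and finer(m, n) and not finer(n, m)
                       for (b, n) in Xc)}

days = range(28)
top = frozenset({frozenset(days)})
month = frozenset({frozenset(range(14)), frozenset(range(14, 28))})
w_r = frozenset({frozenset(range(14, 21)), frozenset(range(21, 28))})

F = [(['A'], w_r, ['B']), (['B'], month, ['A'])]
print(closure({'A'}, F, top) == {('A', top), ('B', w_r)})    # True
print(closure({'B'}, F, top) == {('B', top), ('A', month)})  # True
```

Within this finite model, the computed closures match the shape of Example 3: the pair (A, glb([w.sub.r], month)) is generated during the iteration but pruned at the end because the coarser pair (A, [Mu.sub.Top]) is already recorded.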

4. TEMPORAL NORMALIZATION

In this section, we extend the traditional normalization theory to temporal databases. Thus TFDs not only capture a certain type of constraints in temporal databases, but can also be used to eliminate data redundancy and other anomalies in temporal relations. We begin by reconsidering the temporal module (COURSES, day, [phi]) given in Example 1.

Example 4. As discussed in the Introduction, the temporal module scheme (COURSES, day) given in Example 1 should be decomposed into four temporal module schemes (CREDIT, [Mu.sub.Top]), (TA-ASSIGNMENT, week), (TA-PAY, month), and (CLASS-HISTORY, day), where CREDIT = <Course#, Credits>, TA-ASSIGNMENT = <Course#, Lab-Assist>, TA-PAY = <Lab-Assist, Lab-Assist-Pay>, and CLASS-HISTORY = <Course#, #Students-in-Class>. The time windowing function [Phi] given earlier will also be "decomposed" into four time windowing functions [Phi.sub.1], [Phi.sub.2], [Phi.sub.3], and [Phi.sub.4], defined as follows.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

In order to have meaningful decompositions of temporal modules, we need to define how we can join decomposed temporal modules to recover original temporal modules. To this end, we define temporal natural join and temporal projection operations.

Definition. Let [M.sub.1] = ([R.sub.1], [Mu], [Phi.sub.1]) and [M.sub.2] = ([R.sub.2], [Mu], [Phi.sub.2]) be two temporal modules in terms of the same temporal type [Mu]. Then [M.sub.1] [[bowtie].sub.T] [M.sub.2], called the temporal natural join of [M.sub.1] and [M.sub.2], is the temporal module M = ([R.sub.1] [union] [R.sub.2], [Mu], [Phi]), where [Phi] is defined as follows: for each i [is greater than or equal to] 1, [Phi](i) = [Phi.sub.1](i) [bowtie] [Phi.sub.2](i), where [bowtie] is the traditional natural join operation (cf. Ullman [1988]).

Thus temporal natural join of two temporal modules is obtained by taking the natural joins of their corresponding snapshots, namely, the natural join of the two nontemporal relations [Phi.sub.1](i) and [Phi.sub.2](i) for each positive integer i. Temporal projection of temporal modules is defined similarly. Basically, the projection of a temporal module is a collection of snapshot projections.

Definition. Let M = (R, [Mu], [Phi]) and [R.sub.1] [subset] R. Then [[Pi].sup.T.sub.[R.sub.1]](M), called the projection of M on [R.sub.1], is the temporal module ([R.sub.1], [Mu], [Phi.sub.1]), where [Phi.sub.1] is defined as follows: for each i [is greater than or equal to] 1, [Phi.sub.1](i) = [[Pi].sub.[R.sub.1]]([Phi](i)), where [Pi] is the traditional projection operation (cf. Ullman [1988]).
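The two tickwise operations can be sketched directly. A minimal model, not the paper's formalism: a windowing function is a dict from tick index to a list of tuples, each tuple a dict from attribute name to value; the sample data is illustrative.

```python
def temporal_join(Phi1, Phi2):
    """Tickwise natural join: for each tick i, join snapshots Phi1(i), Phi2(i)."""
    out = {}
    for i in set(Phi1) | set(Phi2):
        joined = []
        for t1 in Phi1.get(i, []):
            for t2 in Phi2.get(i, []):
                shared = set(t1) & set(t2)
                if all(t1[a] == t2[a] for a in shared):
                    joined.append({**t1, **t2})
        if joined:
            out[i] = joined
    return out

def temporal_project(Phi, attrs):
    """Tickwise projection on attrs; duplicates collapse within each snapshot."""
    out = {}
    for i, rel in Phi.items():
        seen = {tuple(sorted((a, t[a]) for a in attrs)) for t in rel}
        out[i] = [dict(p) for p in seen]
    return out

# Illustrative data: splitting a one-tick module and joining it back.
Phi = {1: [{'Course#': 'CS50', 'Credits': 3, 'Lab-Assist': 'Woo'}]}
p1 = temporal_project(Phi, ['Course#', 'Credits'])
p2 = temporal_project(Phi, ['Course#', 'Lab-Assist'])
print(temporal_join(p1, p2) == Phi)  # True: this split loses no information
```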

We define the projection of a set of TFDs analogously to the standard definition of projection of a set of functional dependencies.

Definition. Given a set of TFDs F, the projection of F onto a set of attributes Z, denoted [[Pi].sub.Z](F), is the set of TFDs X [[right arrow].sub.v] Y logically implied by F such that XY [subset] Z.

Note that the projection only takes into account the attributes, not the underlying temporal types.

As we have seen for F, [[Pi].sub.Z](F) can also be infinite. However, because we know how to compute the finite closure of attributes, we can compute a "finite cover" [[Pi].sup.f.sub.Z](F) of the projection of F on Z as follows. Let

[[Pi].sup.f.sub.Z](F) = {X [[right arrow].sub.v] [A.sub.1] . . . [A.sub.m] | X[A.sub.1] . . . [A.sub.m] [subset] Z and ([A.sub.i], v) [element of] [X.sup.+] for 1 [is less than or equal to] i [is less than or equal to] m}.

Clearly [[Pi].sup.f.sub.Z](F) is a finite set. By the completeness of the Three Finite Inference Axioms and the definition of [F.sup.+], it is easy to see that [[Pi].sup.f.sub.Z](F) is a finite cover of [[Pi].sub.Z](F). That is, we have the following.

Proposition 3. The following holds.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]

Before we define lossless decompositions, we need first to introduce three auxiliary operations, Down, Up, and [union.sub.T], on temporal modules.

Definition. For M = (R, [Mu], [Phi]) and temporal types [v.sub.1] and [v.sub.2], let Down(M, [v.sub.1]) and Up(M, [v.sub.2]) be the temporal modules (R, [v.sub.1], [Phi.sub.1]) and (R, [v.sub.2], [Phi.sub.2]), respectively, where [Phi.sub.1] and [Phi.sub.2] are defined as follows. For each i [is greater than or equal to] 1, let

[Phi.sub.1](i) = [union] {[Phi](j) | [v.sub.1](i) [subset or equal to] [Mu](j)} and [Phi.sub.2](i) = [union] {[Phi](j) | [Mu](j) [subset or equal to] [v.sub.2](i)}.

Note that if there is no j such that [Mu](j) [subset or equal to] [v.sub.2](i), then [Phi.sub.2](i) = [empty set].

Intuitively, the function Down maps a temporal module in terms of the temporal type [Mu] to a temporal module in terms of a finer temporal type such that each tuple that is valid at tick i of [Mu] is taken to be valid at each tick j of [v.sub.1] satisfying [v.sub.1](j) [subset or equal to] [Mu](i). For example, consider a temporal module [M.sub.1] that registers faculty ranks in terms of the temporal type academic-year. Then Down([M.sub.1], day) converts [M.sub.1] into a temporal module in terms of day, with a windowing function that gives for each day the rank of the faculty member during the corresponding academic year. Similarly, the function Up maps a temporal module in terms of the temporal type [Mu] to a temporal module in terms of a coarser temporal type such that each tuple that is valid at tick i of [Mu] is taken to be valid at each tick j of [v.sub.2] satisfying [Mu](i) [subset or equal to] [v.sub.2](j). Consider a temporal module [M.sub.2] that registers faculty ranks in terms of the temporal type day. Then Up([M.sub.2], academic-year) is a temporal module that registers faculty ranks in terms of academic-year, with a windowing function that gives for an academic year all the rank(s) of faculty members during the days of that academic year. Notice, however, that the definitions of Down and Up do not require the type argument to be, respectively, finer or coarser than the original type. Furthermore, the Up and Down operations can result in a module whose information is not necessarily implied by the module given as the argument. For example, if a faculty member has a certain rank on one day, he does not necessarily have the same rank over all days in the academic year. In this case, Down(Up([M.sub.2], academic-year), day) may contain more than one rank for the faculty member on a given day. In general, M is not necessarily equal to Up(Down(M, v), [Mu]) or Down(Up(M, v), [Mu]).
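Down and Up can be sketched over the same toy model of time. The example below reproduces the caveat just discussed: Down(Up(M, academic-year), day) can report two ranks on a single day even though the original module has one rank per day. Types, names, and data are illustrative assumptions of the sketch.

```python
# A temporal type is a list of ticks, each a frozenset of day numbers; a
# windowing function is a dict tick index -> list of tuples.

def down(Phi, mu, nu):
    """Down(M, nu): a tuple valid at tick i of mu becomes valid at every
    tick j of nu with nu(j) contained in mu(i)."""
    return {j: [t for i, tick in enumerate(mu) if nu[j] <= tick
                for t in Phi.get(i, [])]
            for j in range(len(nu))}

def up(Phi, mu, nu):
    """Up(M, nu): a tuple valid at tick i of mu becomes valid at every
    tick j of nu with mu(i) contained in nu(j)."""
    return {j: [t for i, tick in enumerate(mu) if tick <= nu[j]
                for t in Phi.get(i, [])]
            for j in range(len(nu))}

day = [frozenset({i}) for i in range(4)]   # four days
year = [frozenset(range(4))]               # one academic year
# One rank per day; the rank changes halfway through the "year".
Phi = {0: [('Smith', 'Asst')], 1: [('Smith', 'Asst')],
       2: [('Smith', 'Assoc')], 3: [('Smith', 'Assoc')]}

up_phi = up(Phi, day, year)                # per-year view: both ranks appear
round_trip = down(up_phi, year, day)       # back to days: both ranks, every day
print(set(round_trip[0]))  # {('Smith', 'Asst'), ('Smith', 'Assoc')}
```

The round trip yields information not implied by the original module, which is exactly why a lossless decomposition must rely on the temporal natural join to eliminate such spurious values.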

Definition. Given temporal modules [M.sub.1], . . . , [M.sub.n] over the same relation scheme R and temporal type [Mu], their union is the module M = (R, [Mu], [Phi]), where [Phi], for each tick, is simply the union of the values of the windowing functions of the given modules for the same tick. Formally, if [M.sub.j] = (R, [Mu], [Phi.sub.j]) for j = 1, . . . , n, then [M.sub.1] [union.sub.T] . . . [union.sub.T] [M.sub.n] = (R, [Mu], [Phi]) where, for each i [is greater than or equal to] 1, [Phi](i) = [[union].sub.1 [is less than or equal to] j [is less than or equal to] n] [Phi.sub.j](i).

We now introduce the notion of decomposition of a temporal module scheme.

Definition. A decomposition of a temporal module scheme (R, [Mu]) is a set of temporal module schemes [Rho] = {([R.sub.1], [Mu.sub.1]), . . . , ([R.sub.k], [Mu.sub.k])} such that (a) [R.sub.i] [subset or equal to] R for i = 1, . . . , k, (b) R = [R.sub.1] [union] . . . [union] [R.sub.k], and (c) for each type [Mu.sub.i], no tick of [Mu] partially overlaps a tick of [Mu.sub.i]; that is, for all l and j, either [Mu](l) [intersection] [Mu.sub.i](j) = [empty set] or [Mu](l) [subset or equal to] [Mu.sub.i](j).

Conditions (a) and (b) are the same as in the traditional definition of decomposition [Ullman 1988]. Condition (c) states a particular property of the temporal types: for each type in the decomposition, each of its ticks that intersects a tick of the original type must completely include that tick. This condition is motivated by the purpose of a decomposition, namely, reducing redundancy; any type not satisfying the condition is not useful for that purpose. If a tick of the original type contained two or more ticks of a type in the decomposition [contradicting condition (c)], some attribute values would be replicated over those ticks in the new type. As an example, consider the (COURSES, day) scheme. A decomposition including a scheme with type hour would obviously increase the redundancy in the decomposed module. It might seem natural to simplify the preceding definition by allowing only types in the decomposition that are coarser than the original type. However, this would be too restrictive. In the discussion following the lossless decomposition definition we show a decomposition example where the two types involved are actually both finer than the original type, but the decomposition still satisfies condition (c).

Although in a traditional decomposition the decomposed relations are simply obtained by projection from the original one, in our case the decomposed modules must also have values in terms of a new type. Of the two operations available for type conversion, the more appropriate is Up because, due to condition (c) in the definition of decomposition, all values that can be derived using Down on these types can also be derived with Up. Hence if M is the original module and ([R.sub.j], [Mu.sub.j]) is a scheme in the decomposition, the decomposed module [M.sub.j] is given by Up([[Pi].sup.T.sub.[R.sub.j]](M), [Mu.sub.j]). This transformation consists of taking the projection of M on [R.sub.j] and "raising" the resulting module to the type [Mu.sub.j].

We now define the general notion of a lossless decomposition of a temporal module scheme using the three auxiliary operations Down, Up, and [Union.sub.T].

Definition (Lossless Decomposition). Let (R, [Mu]) be a temporal module scheme and F a set of TFDs. A decomposition [Rho] of (R, [Mu]) with respect to F is said to be a lossless decomposition if there exist subsets [Rho.sub.1], . . . , [Rho.sub.m] of [Rho] such that for each temporal module M on (R, [Mu]) that satisfies all TFDs in F, we have

M = Join([Rho.sub.1]) [union.sub.T] . . . [union.sub.T] Join([Rho.sub.m]),

where

Join([Rho.sub.i]) = Down(Up([[Pi].sup.T.sub.[R.sup.i.sub.1]](M), [Mu.sup.i.sub.1]), [Mu]) [[bowtie].sub.T] . . . [[bowtie].sub.T] Down(Up([[Pi].sup.T.sub.[R.sup.i.sub.k]](M), [Mu.sup.i.sub.k]), [Mu])

for each [Rho.sub.i] = {([R.sup.i.sub.1], [Mu.sup.i.sub.1]), . . . , ([R.sup.i.sub.k], [Mu.sup.i.sub.k])}.

The reason for using subsets of the decomposition and the union of the corresponding joins in the preceding definition can be explained with a simple example. Suppose we have a scheme (R, day): we can decompose it into two schemes, such that in the type of the first ([Mu.sub.1]) we consider only days from Monday through Friday, and in the other ([Mu.sub.2]) we consider Saturdays and Sundays. Hence [Mu.sub.1] and [Mu.sub.2] form a partition of the original scheme type (day). It is clear that our modules can be represented in the original scheme or in the two new schemes without losing information. This is captured by the preceding definition by considering the two subsets [Rho.sub.1] = {(R, [Mu.sub.1])} and [Rho.sub.2] = {(R, [Mu.sub.2])}. Let M = (R, day, [Phi]) be a module. Because [Mu.sub.1] consists of just a subset of the ticks of day and the relation scheme is still R, the transformation Down(Up([[Pi].sup.T.sub.R](M), [Mu.sub.1]), day), corresponding to Join([Rho.sub.1]), gives a module [M.sub.1] = (R, day, [Phi.sub.1]), where [Phi.sub.1](i) = [Phi](i) for each i corresponding to a working day (Monday through Friday), whereas it is empty for other days. Analogously, Join([Rho.sub.2]) gives a module [M.sub.2] = (R, day, [Phi.sub.2]), where [Phi.sub.2](i) = [Phi](i) for each i corresponding to a Saturday or Sunday, whereas it is empty for other days. Because it is immediate that M = [M.sub.1] [union.sub.T] [M.sub.2], the decomposition is lossless.

Consider the case when m = 1 in the previous definition; that is, the only condition for a lossless decomposition is M = Join([Rho]). The Join operation intuitively consists of a "join" among the modules decomposed according to [Rho], that is, the modules resulting from Up([Pi.sup.T.sub.R.sub.j](M), [Mu.sub.j]) where ([R.sub.j], [Mu.sub.j]) is the corresponding scheme in [Rho]. However, the temporal natural join ([?.sub.T]) requires the modules to have the same type and the result of the join should be a module with type [Mu], because we have to check for equality with respect to the original module. This motivates the need to convert each module to type [Mu] before taking the join. By the definition of decomposition it is quite clear that the Down operation is the natural candidate for this purpose. As explained earlier, the composition of Down and Up operations could result in values not present in the original module. However, in a lossless decomposition, those values must be eliminated by the temporal natural join.

Example 5. Now we are in a position to describe how we can recover the COURSES temporal module from the four temporal modules in the decomposition. This is a case where we do not need to consider unions of joins. The [Rho] decomposition contains the four schemes corresponding to CREDIT, TA-PAY, TA-ASSIGNMENT, and CLASS-HISTORY, respectively with types [Mu.sub.Top], month, week, and day. The four terms in Join([Rho]) are:

Down(Up([Pi.sup.T.sub.Course#, Credits] (COURSES), [Mu.sub.Top]), day),

Down(Up([Pi.sup.T.sub.Lab-Assist, Lab-Assist-Pay] (COURSES), month), day),

Down(Up([Pi.sup.T.sub.Course#, Lab-Assist] (COURSES), week), day), and

Down(Up([Pi.sup.T.sub.Course#, #Students-in-Class] (COURSES), day), day).

It is easily seen that the first argument of the Down operation in each term is equal, respectively, to the modules CREDIT, TA-PAY, TA-ASSIGNMENT, and CLASS-HISTORY. The last term is exactly the CLASS-HISTORY module, because the Down and Up operations have no effect, COURSES being already in terms of days. Hence we take the temporal join of Down(CREDIT, day), Down(TA-PAY, day), Down(TA-ASSIGNMENT, day), and CLASS-HISTORY to recover COURSES.

We introduce an equivalent notion of lossless decomposition for technical reasons. Suppose [Rho] is a lossless decomposition of (R, [Mu]) and M = (R, [Mu], [Phi]) is a temporal module on (R, [Mu]). Then for each tick i of [Mu], the relation [Phi](i) can always be recovered from the projections of M over the decomposition. This leads to the notion of tickwise lossless decomposition. For simplicity of presentation, we introduce the symbol MaxSub; we use it as a function such that for each tick i of [Mu] and a set [Rho] of schemes, MaxSub([Mu](i), [Rho]) is the maximal subset of [Rho] such that for each temporal type associated with these schemes there exists a tick covering tick i of [Mu]. That is,

MaxSub([Mu](i), [Rho]) = {(R, v) [element of] [Rho] | [Mu](i) [subset or equal to] v(j) for some j}.

This allows us to consider only that part of the decomposition which contributes to the recovery of the original module at tick i of [Mu].
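MaxSub can be sketched directly from its definition. Here, purely as an illustrative encoding of ours, a temporal type is a list of sets of time points (one set per tick) and a decomposition is a list of (attribute-set, type) pairs.

```python
def max_sub(mu_tick, rho):
    """MaxSub(mu(i), rho): the schemes of rho whose type has some tick
    covering the given tick of mu (set inclusion of time points)."""
    return [(attrs, nu) for attrs, nu in rho
            if any(mu_tick <= w for w in nu)]

week = [set(range(1, 8)), set(range(8, 15))]   # ticks = whole weeks of days
weekend = [{6, 7}, {13, 14}]                   # ticks = weekend pairs
rho = [({"A", "B"}, week), ({"B"}, weekend)]

# Day 3 is covered by a week tick but by no weekend tick:
assert max_sub({3}, rho) == [({"A", "B"}, week)]
# Day 6 is covered by a tick of both types:
assert len(max_sub({6}, rho)) == 2
```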

Definition (Tickwise Lossless Decomposition). Let [Rho] be a decomposition of scheme (R, [Mu]) and F a set of TFDs. The decomposition [Rho] is said to be a tickwise lossless decomposition of (R, [Mu]) with respect to F if for each nonempty tick k of [Mu], the following holds: if MaxSub([Mu](k), [Rho]) = {([R.sub.1], [Mu.sub.1]), . . . , ([R.sub.m], [Mu.sub.m])} and, for each 1 [is less than or equal to] i [is less than or equal to] m, ([R.sub.i], [Mu], [Phi.sub.i]) = Down(Up([Pi.sup.T.sub.R.sub.i](M), [Mu.sub.i]), [Mu]) and [k.sub.i] is the integer such that [Mu](k) [subset or equal to] [Mu.sub.i]([k.sub.i]), then [Phi](k) = [Phi.sub.1]([k.sub.1]) [?] . . . [?] [Phi.sub.m]([k.sub.m]), where [?] denotes the natural join.

The preceding definition intuitively says that for each tick of [Mu] we can reconstruct the original module from its decomposition.

Proposition 4. Let (R, [Mu]) be a temporal module scheme, F a set of TFDs, and [Rho] a decomposition of (R, [Mu]). Then [Rho] is a lossless decomposition of (R, [Mu]) with respect to F iff it is a tickwise lossless decomposition of (R, [Mu]) with respect to F.

The following theorem gives a sufficient condition for a lossless decomposition. In particular, this condition ensures that the decomposition given in Example 4 is lossless.

Theorem 4. Let F be a set of TFDs. The decomposition {([R.sub.1], [Mu]), ([R.sub.2], v)} of ([R.sub.1] [union] [R.sub.2], [Mu]), where [Mu] [is less than or equal to] v, is lossless with respect to F if for all [i.sub.1] and [i.sub.2], [Mu]([i.sub.1], [i.sub.2]) [subset or equal to] v(j) for some j implies that [R.sub.1] [intersection] [R.sub.2] [right arrow] [R.sub.2] is in [Pi.sub.[Mu].([i.sub.1],[i.sub.2])](F).

5. TEMPORAL BOYCE-CODD NORMAL FORM

In this section we define the temporal analogue of Boyce-Codd Normal Form (BCNF). The temporal BCNF (TBCNF) retains the spirit of BCNF; that is, TBCNF does not allow any redundancy introduced by TFDs. We then give a decomposition algorithm that produces a lossless TBCNF decomposition for any given temporal module scheme.

In order to define TBCNF, we need the notion of keys.

Definition. Let (R, [Mu]) be a temporal module scheme and F a set of TFDs involving only attributes in R. A set of attributes X [subset or equal to] R is said to be a temporal superkey of (R, [Mu]) if X [right arrow.sub.Mu] R is logically implied by the TFDs in F.(11)

If M is a temporal module on (R, [Mu]) and X is a temporal superkey of (R, [Mu]), then whenever [t.sub.1] and [t.sub.2] are two tuples with [t.sub.1][X] = [t.sub.2][X] for the same tick in [Mu], then [t.sub.1] = [t.sub.2].

A temporal superkey X of (R, [Mu]) is called a temporal candidate key if for each A in X, X - {A} is not a temporal superkey of (R, [Mu]).
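The tickwise property that a temporal superkey guarantees can be checked mechanically. The sketch below assumes, for illustration only, that a module's windowing function is a dict from ticks to sets of tuples and that X is given by column positions; all names are ours.

```python
def is_superkey_on_module(phi, x_positions):
    """Check, tick by tick, that tuples agreeing on the X attributes are
    identical -- the property a temporal superkey guarantees.
    phi maps ticks to sets of tuples; x_positions are X's column indices."""
    for rel in phi.values():
        seen = {}
        for t in rel:
            key = tuple(t[p] for p in x_positions)
            if key in seen and seen[key] != t:
                return False   # two distinct tuples share an X value in one tick
            seen[key] = t
    return True

phi = {1: {("cs101", 4), ("cs102", 3)}, 2: {("cs101", 4)}}
assert is_superkey_on_module(phi, [0])   # column 0 determines the tuple
phi[1].add(("cs101", 5))                 # violate the property within tick 1
assert not is_superkey_on_module(phi, [0])
```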

We are now ready to define the concept of temporal BCNF.

Definition (Temporal BCNF). A temporal module scheme (R, [Mu]), with a set F of TFDs, is said to be in temporal BCNF (TBCNF) if for each TFD X [right arrow.sub.v] Y that is logically implied by F, where (a) XY [subset or equal to] R, (b) Y [is not a subset of] X, and (c) at least one tick of [Mu] is covered by some tick of v, the following conditions hold.

(i) X [right arrow.sub.Mu] R is logically implied by F; that is, X is a temporal superkey of (R, [Mu]), and (ii) for all nonempty ticks [i.sub.1] and [i.sub.2] of [Mu], with [i.sub.1] [is not equal to] [i.sub.2], X [right arrow] Y [is not in] [Pi.sub.[Mu].([i.sub.1],[i.sub.2])](F).

Although the first condition is the temporal analogue of the traditional definition of BCNF, the second condition disallows redundancy due to temporal functional dependencies. Indeed, we have redundancy whenever there exists a TFD with a temporal type v such that two ticks of the temporal type of the module are covered by the same tick of v. In this case, if we have two tuples separately on these two ticks, one of them may carry redundant information. Thus the two conditions for TBCNF eliminate all possible data redundancy that may arise from the presence of TFDs.

Example 6. The temporal module scheme (COURSES, day) in Example 1 is not in TBCNF because, for example, F logically implies Course# [right arrow.sub.week] Lab-Assist. However, the four schemes resulting from the decomposition and illustrated in Example 4 are in TBCNF.

As in the traditional relational theory, decomposition is used to convert temporal module schemes that are not in TBCNF into schemes that are in TBCNF. Similar to the traditional BCNF, the price we pay for such a decomposition is that some of the TFDs may not be preserved. We discuss dependency-preserving decompositions in Section 6.

5.1 Decomposing Temporal Module Schemes into TBCNF

In this subsection, we describe an algorithm that decomposes temporal schemes into TBCNF.

In the algorithm we use two operators to create a new temporal type from a given one. The first is called the collapsing operator which, given a temporal type [Mu] and a positive integer i, gives a type [Mu.sub.c] by combining ticks i and i + 1 of [Mu] into one tick and retaining all other ticks of [Mu]. Formally, [Mu.sub.c] is the temporal type defined as follows. For all j [greater than or equal to] 1, let

[Mu.sub.c](j) = [Mu](j) if j [is less than] i; [Mu.sub.c](j) = [Mu](i) [union] [Mu](i + 1) if j = i; and [Mu.sub.c](j) = [Mu](j + 1) if j [is greater than] i.

The second operator is called the pruning operator which, given a temporal type [Mu] and a positive integer i, produces a type [Mu.sub.d] by dropping tick i of [Mu] and keeping all other ticks of [Mu]. Formally, [Mu.sub.d] is the temporal type defined as follows. For all j [greater than or equal to] 1, let

[Mu.sub.d](j) = [Mu](j) if j [is less than] i, and [Mu.sub.d](j) = [Mu](j + 1) if j [greater than or equal to] i.
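Both operators have a direct rendering on a finite encoding of a temporal type as a list of tick sets (0-based indices here, whereas the text numbers ticks from 1); this is a sketch of ours, not the paper's formalism.

```python
def collapse(mu, i):
    """Collapsing operator: merge ticks i and i+1 of mu, keep the rest."""
    return mu[:i] + [mu[i] | mu[i + 1]] + mu[i + 2:]

def prune(mu, i):
    """Pruning operator: drop tick i of mu, keep the rest."""
    return mu[:i] + mu[i + 1:]

toy = [{1}, {2}, {3}, {4}]            # a toy type with four ticks
assert collapse(toy, 1) == [{1}, {2, 3}, {4}]
assert prune(toy, 2) == [{1}, {2}, {4}]
```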

Figure 4 presents the TBCNF decomposition algorithm. We note that if a scheme ([R.sub.i], [Mu.sub.i]) is not in TBCNF with respect to [Pi.sub.Ri](F), then there exists a TFD in [Pi.sub.Ri](F) such that one of the two conditions in the definition of TBCNF is violated. Thus the decomposition required at Step 2 can always be carried out. Indeed, because ([R.sub.i], [Mu.sub.i]) is not in TBCNF with respect to [Pi.sub.Ri](F), there exists X [right arrow.sub.v] A in [Pi.sub.Ri](F) such that one of the two conditions for TBCNF is violated. By Proposition 3, there exist [v.sub.1], . . ., [v.sub.n] such that v [less than or equal to.sub.c] {[v.sub.1], . . ., [v.sub.n]} and X [right arrow.sub.vj] A is in [Pi.sub.Ri](F) for each 1 [less than or equal to] j [less than or equal to] n. If X is not a temporal superkey of ([R.sub.i], [Mu.sub.i]) (i.e., the first condition for TBCNF is violated), then clearly each X [right arrow.sub.vj] A violates the first condition for TBCNF. Now suppose there exist integers [k.sub.1] and [k.sub.2], with [k.sub.1] [is not equal to] [k.sub.2], and k such that [Mu.sub.i]([k.sub.1], [k.sub.2]) [subset or equal to] v(k) (i.e., the second condition for TBCNF is violated). Because v [less than or equal to.sub.c] {[v.sub.1], . . ., [v.sub.n]}, there exist j and k' such that v(k) [subset or equal to] [v.sub.j](k'). Hence [Mu.sub.i]([k.sub.1], [k.sub.2]) [subset or equal to] v(k) [subset or equal to] [v.sub.j](k'); that is, X [right arrow.sub.vj] A violates the second condition for TBCNF. We conclude that if ([R.sub.i], [Mu.sub.i]) is not in TBCNF with respect to [Pi.sub.Ri](F), we can always find a TFD in [Pi.sub.Ri](F) that violates the conditions for TBCNF.

Algorithm for TBCNF Decomposition

INPUT: A temporal module scheme (R, [Mu]) and a set F of temporal functional dependencies.

OUTPUT: A lossless decomposition [Rho] of (R, [Mu]) such that each scheme in [Rho] is in TBCNF.

METHOD: We compute a sequence [Rho.sup.(0)], [Rho.sup.(1)],.... of decompositions of (R, [Mu]):

Step 1. Let [Rho.sup.(0)] = {(R, [Mu])}.

Step 2. Suppose [Rho.sup.(j)] is the last set we have computed. If at least one scheme in [Rho.sup.(j)] is not in TBCNF, we compute [Rho.sup.(j+1)] as follows. Take a scheme ([R.sub.i], [Mu.sub.i]) from [Rho.sup.(j)] that is not in TBCNF and a TFD X [right arrow.sub.v] A from [Pi.sub.Ri](F) that violates the TBCNF conditions for this scheme. Let

[Rho.sup.(j+1)] = ([Rho.sup.(j)] \ {([R.sub.i], [Mu.sub.i])}) [union] {([R.sub.i], [v.sub.1]), ([R.sub.i] - A, [v.sub.2]), (XA, [v.sub.3])},

where [v.sub.1], [v.sub.2], and [v.sub.3] are the new temporal types defined as follows:

* [v.sub.1] is obtained from [Mu.sub.i] by recursively applying the pruning operator to drop all the nonempty ticks of [Mu.sub.i] that are covered by some tick of v. If [v.sub.1] results in an empty type, then the corresponding scheme, i.e., ([R.sub.i], [v.sub.1]), is not added to [Rho.sup.(j+1)].

* [v.sub.2] is the complement of [v.sub.1], that is, its ticks are all the ticks of [Mu.sub.i] that are covered by some tick of v.

* [v.sub.3] is obtained from [v.sub.2] by recursively applying the collapsing operator to combine each pair of ticks of [v.sub.2] that are covered by some tick of v. That is, each nonempty tick of [v.sub.3] is a combination of one or more ticks of [v.sub.2]. Moreover, no two ticks of [v.sub.3] are covered by a single tick of v.

Step 2 is repeated until each scheme in [Rho.sup.(j)] is in TBCNF. The algorithm returns [Rho.sup.(j)].
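The construction of [v.sub.1], [v.sub.2], and [v.sub.3] in Step 2 can be sketched as follows, encoding a temporal type as a list of sets of time points (an illustrative assumption of ours, with helper names chosen for readability).

```python
def covered(tick, v):
    """True if some tick of v covers the given tick (set inclusion)."""
    return any(tick <= w for w in v)

def step2_types(mu_i, v):
    """Return (v1, v2, v3) as used at Step 2: v1 prunes away the ticks of
    mu_i covered by some tick of v, v2 keeps exactly those ticks, and v3
    collapses the ticks of v2 that fall under a single tick of v."""
    v1 = [t for t in mu_i if not covered(t, v)]
    v2 = [t for t in mu_i if covered(t, v)]
    v3 = []
    for w in v:                 # at most one collapsed tick per tick of v
        merged = set()
        for t in v2:
            if t <= w:
                merged |= t
        if merged:
            v3.append(merged)
    return v1, v2, v3

day = [{i} for i in range(1, 15)]
week = [set(range(1, 8)), set(range(8, 15))]
v1, v2, v3 = step2_types(day, week)
assert v1 == [] and v2 == day and v3 == week   # the day/week special case
```

The final assertion reproduces the simplification discussed below: when [Mu.sub.i] [less than or equal to] v and the two types cover the same periods, [v.sub.1] is empty, [v.sub.2] = [Mu.sub.i], and [v.sub.3] = v.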

Theorem 5. The algorithm in Figure 4 always terminates and gives a lossless TBCNF decomposition of the input temporal scheme with respect to the input set of TFDs.

In many cases, Step 2 of the preceding TBCNF decomposition algorithm takes a much simpler form. For example, if [Mu.sub.i] is day and v is month, then [Rho.sup.(j+1)] in Step 2 will be the set obtained by replacing (R, day) with (R \ {A}, day) and (XA, month). In general, at Step 2, if [Mu.sub.i] [less than or equal to] v and the two types [Mu.sub.i] and v cover the same periods of time,(12) then [Rho.sup.(j+1)] is the set obtained by replacing (R, [Mu.sub.i]) in [Rho.sup.(j)] by (R \ {A}, [Mu.sub.i]) and (XA, v); that is,

[Rho.sup.(j+1)] = ([Rho.sup.(j)] \ {([R.sub.i], [Mu.sub.i])}) [union] {([R.sub.i] - A, [Mu.sub.i]), (XA, v)}.

This is because in this case [v.sub.1] = [Mu.sub.Bottom], [v.sub.2] = [Mu.sub.i], and [v.sub.3] = v, and (R, [v.sub.1]) = (R, [Mu.sub.Bottom]) is dropped from the decomposition. It is clear that if [Mu.sub.i] is second (hour, day, month, resp.) and v is hour (day, month, year, resp.), then the preceding simplification can be applied.

It is important to note that in the TBCNF decomposition algorithm, if [Mu] is the temporal type of the input temporal module scheme, all data are in one tick of [Mu], and all TFDs are in terms of the same temporal type [Mu], then the algorithm in Figure 4 reduces to the decomposition algorithm of traditional (nontemporal) BCNF.

Example 7. In this example we illustrate the application of the TBCNF decomposition algorithm on the (COURSES, day) temporal module scheme. We start with the four constraints in COURSES. They can be naturally expressed as TFDs:

(1) Course# [right arrow.sub.[Mu]Top] Credits
(2) Course# [right arrow.sub.day] #Students-in-Class
(3) Course# [right arrow.sub.week] Lab-Assist
(4) Lab-Assist [right arrow.sub.month] Lab-Assist-Pay

The two temporal constraints that could not be expressed as FDs are captured by TFDs (3) and (4). Note also that the FD Day, Course# [right arrow] Lab-Assist, Lab-Assist-Pay is captured by the TFDs (3) and (4). Let F denote the set consisting of the TFDs (1)-(4). In addition to the TFDs in F, the finite closure of F includes Course# [right arrow.sub.glb(month,week)] Lab-Assist-Pay, obtained by the extended transitivity axiom. Because, typically, a month does not begin (resp., end) with the exact beginning (resp., end) of a week, their glb is not week, but a new type. We would not be able to recognize these kinds of dependencies if we ignored the temporal types in the decomposition or if we treated temporal types as attributes.

One possible output of our TBCNF decomposition algorithm is the set of temporal module schemes (cf. Figure 2):

(<Course#, Credits>, [Mu.sub.Top]),

(<Lab-Assist, Lab-Assist-Pay>, month),

(<Course#, Lab-Assist>, week), and

(<Course#, #Students-in-Class>, day).

This is probably the most intuitive and desirable decomposition for this example. A different order of the TFDs in the finite closure of F used in the decomposition would have given a different set of schemes, possibly involving the new type glb(month, week). Nevertheless, any application order would result in a lossless TBCNF decomposition.

We now show the step-by-step application of the TBCNF algorithm in decomposing the COURSES relation.

Step 1. [Rho.sup.(0)] = {(R, day)} where

R = <Course#, Credits, Lab-Assist, Lab-Assist-Pay, #Students-in-Class>.

Step 2, iteration 1. (R, day) is not in TBCNF because the TFD Course# [right arrow.sub.[Mu]Top] Credits is in [Pi.sub.R](F) and violates the TBCNF conditions for this scheme. To compute [Rho.sup.(1)] we first have to obtain the three types [v.sub.1], [v.sub.2], [v.sub.3]. Note that in this case [Mu.sub.i] = day and v = [Mu.sub.Top]. Because day [is less than or equal to] [Mu.sub.Top] and both types cover the whole absolute time, we know that [v.sub.1] = [Mu.sub.Bottom], [v.sub.2] = day, and [v.sub.3] = [Mu.sub.Top]. Hence [Rho.sup.(1)] = {(R \ {Credits}, day), (<Course#, Credits>, [Mu.sub.Top])}.

Step 2, iteration 2. The TFD Lab-Assist [right arrow.sub.month] Lab-Assist-Pay is in [Pi.sub.R'](F), where R' = R \ {Credits}, and violates the TBCNF conditions for the scheme (R \ {Credits}, day). Now [Mu.sub.i] = day and v = month. Also in this case [v.sub.1], [v.sub.2], [v.sub.3] are easily obtained, because day [is less than or equal to] month and both types cover the whole absolute time. Hence

[Rho.sup.(2)] = {(<Course#, Credits>, [Mu.sub.Top]), (<Lab-Assist, Lab-Assist-Pay>, month),

(<Course#, Lab-Assist, #Students-in-Class>, day)}.

Step 2, iteration 3. The TFD Course# [right arrow.sub.week] Lab-Assist is in [Pi.sub.R'](F), where R' = R \ {Credits, Lab-Assist-Pay}, and violates the TBCNF conditions for the last scheme in [Rho.sup.(2)]. Now [Mu.sub.i] = day and v = week. Also in this case [v.sub.1], [v.sub.2], [v.sub.3] are easily obtained, because day [is less than or equal to] week and both types cover the whole absolute time. Hence [Rho.sup.(3)] = {(<Course#, Credits>, [Mu.sub.Top]), (<Lab-Assist, Lab-Assist-Pay>, month), (<Course#, Lab-Assist>, week), (<Course#, #Students-in-Class>, day)}.

The algorithm terminates, because each scheme in [Rho.sup.(3)] is in TBCNF.

6. PRESERVATION OF DEPENDENCIES

It is well known that in order to eliminate all data redundancy due to (nontemporal) functional dependencies, we have to pay the price of not preserving some dependencies [Ullman 1988]. Because (nontemporal) BCNF is a special case of TBCNF, it is no surprise that TBCNF decomposition may not preserve all TFDs. What is surprising is that even if a module scheme has only two attributes, redundancy may not be totally eliminated without losing TFDs.(13) As an example, consider the temporal scheme (AB, day) with TFDs A [right arrow.sub.week] B and A [right arrow.sub.month] B. Clearly, (AB, day) is not in TBCNF. By using the TBCNF decomposition algorithm, {(A, day), (AB, month)} can be a lossless TBCNF decomposition. However, the TFD A [right arrow.sub.week] B cannot be enforced in either scheme. In fact, as we show in the following, there is no way of enforcing both TFDs without any data redundancy.

In order to formally capture the intuition of "enforcing TFDs," as in the traditional relational design theory, we define the notion of dependency preservation.

Definition (Dependency Preservation). Given a module scheme (R, [Mu]) and a set F of TFDs, we say that a decomposition [Rho] = {([R.sub.1], [Mu.sub.1]), . . . , ([R.sub.k], [Mu.sub.k])} preserves the dependencies in F if, for each module M on (R, [Mu]),

Up([Pi.sup.T.sub.Ri](M), [Mu.sub.i]) satisfies [Pi.sub.Ri](F) for each i = 1, . . ., k implies M satisfies F.

First, we deal with the simple case where each TFD has the same left- and right-hand sides (but in terms of a different temporal type). The example given at the beginning of this section is such a case. We show how these temporal functional dependencies can be preserved.

Given a temporal type [Mu] and a set of TFDs F = {X [right arrow.sub.[Mu]1] Y, . . ., X [right arrow.sub.[Mu]n] Y}, let the function Raise([Mu], F) return a new type [Lambda] obtained by starting with [Lambda] = [Mu] and recursively collapsing each pair of contiguous ticks [i.sub.1] and [i.sub.2] such that both of the following conditions are satisfied.

(1) [Lambda]([i.sub.1]) [union] [Lambda]([i.sub.2]) [subset or equal to] [Mu.sub.j](k) for some j and k, with 1 [is less than or equal to] j [is less than or equal to] n; and (2) for each j with 1 [is less than or equal to] j [is less than or equal to] n, if [Lambda]([i.sub.1]) [subset or equal to] [Mu.sub.j]([k.sub.1]) and [Lambda]([i.sub.2]) [subset or equal to] [Mu.sub.j]([k.sub.2]) for some [k.sub.1] and [k.sub.2], then [k.sub.1] = [k.sub.2].

The Raise function gives a temporal type that is the coarsest of all the temporal types v such that each tick of v is either a tick of [Mu] or a composition of a set of ticks of [Mu] covered by the same TFD in F, and such that {(R, v)} still preserves all TFDs in F.

For example, given the temporal type day and F = {A [right arrow.sub.week] B, A [right arrow.sub.month] B}, Figure 5 shows the temporal type Raise(day, F). In the figure, each interval of a temporal type corresponds to a tick of that type. We now show that this Raise function indeed has the required properties.
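The Raise function can be sketched under our reading of the collapsing conditions: two contiguous ticks merge when some TFD type covers their union with a single tick and no TFD type places them in different ticks. The encoding (a type as a list of tick sets) and all names are illustrative assumptions, not the paper's formalism.

```python
def tick_index(t, nu):
    """Index of the tick of nu covering the point set t, or None."""
    for k, w in enumerate(nu):
        if t <= w:
            return k
    return None

def raise_type(mu, tfd_types):
    """A sketch of Raise(mu, F): repeatedly collapse contiguous ticks that
    some TFD type covers together and that no TFD type separates."""
    lam = list(mu)
    i = 0
    while i < len(lam) - 1:
        pair = lam[i] | lam[i + 1]
        together = any(tick_index(pair, nu) is not None for nu in tfd_types)
        separated = any(
            tick_index(lam[i], nu) is not None
            and tick_index(lam[i + 1], nu) is not None
            and tick_index(lam[i], nu) != tick_index(lam[i + 1], nu)
            for nu in tfd_types)
        if together and not separated:
            lam[i:i + 2] = [pair]
            i = max(i - 1, 0)     # a new neighbor pair may now collapse
        else:
            i += 1
    return lam

day = [{i} for i in range(1, 11)]
week = [set(range(1, 8)), set(range(8, 15))]
month = [set(range(1, 6)), set(range(6, 11))]  # toy month boundary at day 5/6
# Days 5 and 6 share a week but not a month, so they stay separate:
assert raise_type(day, [week, month]) == [set(range(1, 6)), {6, 7}, {8, 9, 10}]
```

The toy month boundary plays the role of the March 31/April 1 boundary in the discussion below: ticks in the same week but different months are never combined.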

Proposition 5. Let (R, [Mu]) be a module scheme and F = {X [right arrow.sub.[Mu]1] Y, . . ., X [right arrow.sub.[Mu]n] Y} a set of TFDs, with XY [subset or equal to] R. Then for each module M = (R, [Mu], [Phi]), Up(M, Raise([Mu], F)) satisfies F iff M satisfies F.

Preservation of TFDs is in conflict with elimination of redundancy. Returning to our example at the beginning of this section, let [Lambda] = Raise(day, F), where F = {A [right arrow.sub.week] B, A [right arrow.sub.month] B}. Let [d.sub.1] be March 31, 1994 and [d.sub.2] April 1, 1994. Now suppose that the tuple t = (a, b) is in both [Phi]([d.sub.1]) and [Phi]([d.sub.2]). This is redundant. However, [d.sub.1] and [d.sub.2] belong to the same week but different months. Hence, by Figure 5, [d.sub.1] and [d.sub.2] are not combined in calculating [Lambda], and these two tuples remain separate in the new module Up(M, [Lambda]). If [d.sub.1] and [d.sub.2] were in one tick of [Lambda], then the decomposition {(A, day), (AB, [Lambda])} of (AB, day) would not preserve dependencies.

In the next section, we define the temporal analogue of (nontemporal) 3NF that relaxes, in a restricted way, the conditions of TBCNF to allow certain types of data redundancy in order to preserve TFDs.

7. TEMPORAL THIRD NORMAL FORM

Temporal third normal form (T3NF) is defined as follows.

Definition (Temporal 3NF). A temporal module scheme (R, [Mu]) with a set F of TFDs is in temporal third normal form (T3NF) if for each TFD X [right arrow.sub.v] A that is logically implied by F, where (a) XA [subset or equal to] R, (b) A [is not in] X, and (c) at least one tick of [Mu] is covered by a tick of v, one of the following conditions holds.

(i) A is part of a temporal candidate key of (R, [Mu]), or (ii) X is a temporal superkey of (R, [Mu]), and there do not exist [i.sub.1] and [i.sub.2], with [i.sub.1] [is not equal to] [i.sub.2], such that X [right arrow] A [element of] [Pi.sub.[Mu].([i.sub.1],[i.sub.2])](F), unless there exists [i.sub.3], with [i.sub.3] [is not equal to] [i.sub.1], such that X [right arrow] A [element of] [Pi.sub.[Mu].([i.sub.1],[i.sub.3])](F) but X [right arrow] A [is not in] [Pi.sub.[Mu].([i.sub.1],[i.sub.2],[i.sub.3])](F).

Turning back to the temporal module scheme (AB, day) with F = {A [right arrow.sub.week] B, A [right arrow.sub.month] B}, we can easily see that (A, day) and (AB, Raise(day, F)) form a lossless T3NF decomposition of (AB, day).

To see the difference between T3NF and TBCNF, consider the temporal module scheme (AB, day) with TFDs A [right arrow.sub.day] B and B [right arrow.sub.month] A. This scheme is in T3NF and thus does not require any further decomposition. In contrast, because B [right arrow.sub.month] A holds, (AB, day) is not in TBCNF. By the algorithm in Figure 4, it is decomposed into (AB, month) and (B, day). Although these schemes are in TBCNF, we can no longer enforce the TFD A [right arrow.sub.day] B. In this case, the original scheme, which is in T3NF, may be preferred over the decomposed schemes in TBCNF.

7.1 Decomposing Temporal Module Schemes into T3NF

In order to present the T3NF decomposition algorithm, we introduce the notions of subtype, partition of a type, and union of subtypes in a partition.

A subtype of a type [Mu] is intuitively a type having only a subset of the ticks of [Mu]. For example, Sunday, intended as a set of ticks corresponding to days that are Sundays, can be considered a subtype of day.

Definition. We say that a type [Mu.sub.1] is a subtype of a type [Mu] if for each positive integer i, [Mu.sub.1](i) = [Mu](j) for some positive integer j.

A partition of a type [Mu] is intuitively a set of disjoint subtypes such that the set of all their ticks is the set of ticks of [Mu]. Referring to the previous example, the set of the seven types {Monday, . . ., Sunday} is a partition of the type day.

Definition. We say that a set of types {[Mu.sub.1], . . ., [Mu.sub.n]} is a partition of type [Mu] if:

-- each [Mu.sub.i] with 1 [is less than or equal to] i [is less than or equal to] n is a subtype of [Mu]; and

-- for each positive integer l, [Mu](l) = [Mu.sub.k](j) for some k with 1 [is less than or equal to] k [is less than or equal to] n and positive integer j. Moreover, [Mu.sub.k](j) [is not equal to] [Mu.sub.i](r) for each i [is not equal to] k and positive integers j and r.

We can also take the union of two subtypes of the same type generating a new subtype. For example, Saturday [union] Sunday is the type whose ticks correspond to days that are only Saturdays and Sundays. This can be useful if we want to store information concerning only these two days, but still be able to relate this information to other information taken for different days.

Definition. Given types [Mu.sub.1] and [Mu.sub.2] that are both subtypes of [Mu], we define their union [Mu.sub.1] [union] [Mu.sub.2] such that for each tick of [Mu.sub.1] [union] [Mu.sub.2] there exists a tick of [Mu.sub.1] or of [Mu.sub.2] that is equal to it and, conversely, for each tick of [Mu.sub.1] and of [Mu.sub.2] there exists a tick of [Mu.sub.1] [union] [Mu.sub.2] that is equal to it. That is, for all i, ([Mu.sub.1] [union] [Mu.sub.2])(i) = [Mu.sub.1]([j.sub.1]) or ([Mu.sub.1] [union] [Mu.sub.2])(i) = [Mu.sub.2]([j.sub.2]) for some [j.sub.1] and [j.sub.2], and, for all [j.sub.1] and [j.sub.2], [Mu.sub.1]([j.sub.1]) = ([Mu.sub.1] [union] [Mu.sub.2])([i.sub.1]) and [Mu.sub.2]([j.sub.2]) = ([Mu.sub.1] [union] [Mu.sub.2])([i.sub.2]) for some [i.sub.1] and [i.sub.2].

Note that the union of two subtypes of [Mu] is still a subtype of [Mu] and the set obtained by taking a partition of [Mu] and replacing two subtypes with their union is still a partition of [Mu].
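The three notions can be sketched with a temporal type encoded as a list of tick sets (an assumption made purely for illustration); tick membership is tested by equality, matching the definitions.

```python
def is_subtype(mu1, mu):
    """Every tick of mu1 is (equal to) a tick of mu."""
    return all(t in mu for t in mu1)

def is_partition(parts, mu):
    """Each part is a subtype, and every tick of mu occurs in exactly
    one part (no tick is shared or missing)."""
    pooled = [t for p in parts for t in p]
    return (all(is_subtype(p, mu) for p in parts)
            and len(pooled) == len(mu)
            and all(t in pooled for t in mu))

def union_subtypes(mu1, mu2, mu):
    """Union of two subtypes, with ticks ordered as in the parent type mu."""
    return [t for t in mu if t in mu1 or t in mu2]

day = [{1}, {2}, {3}, {4}, {5}, {6}, {7}]     # one toy week of day ticks
weekdays, weekend = day[:5], day[5:]
assert is_partition([weekdays, weekend], day)
assert union_subtypes([{6}], [{7}], day) == weekend
```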

Similar to the traditional 3NF decomposition [Ullman 1988], we use temporal minimal cover for the T3NF decomposition algorithm.(14)

Definition (Temporal Minimal Cover). Let F be a set of TFDs and NTC(F) the traditional nontemporal minimal cover of [Pi.sub.Phi](F). We say that G is a (temporal) minimal cover of F if G = {X [right arrow.sub.Mu] A | (A, [Mu]) [element of] [X.sup.+] and X [right arrow] A [element of] NTC(F)}.

As an example, consider the temporal types and TFDs given in Example 3. Clearly NTC(F) = [Pi.sub.Phi](F) = {A [right arrow] B, B [right arrow] A}. By definition, the following set is a temporal minimal cover of F: {A [right arrow.sub.[w.sub.r]] B, B [right arrow.sub.month] A}, which is F itself, as expected.

We are now ready for the T3NF decomposition algorithm.

Theorem 6. The algorithm in Figure 6 always terminates and gives a lossless T3NF decomposition of the input temporal module scheme. Furthermore, the decomposition preserves dependencies.

We illustrate the T3NF decomposition algorithm by continuing the example given before Theorem 6. Assume (AB, day) is our temporal module scheme. The minimal cover of F is F itself. Therefore, [Pi.sub.Phi](MinCov(F)) = {A [right arrow] B, B [right arrow] A}. For Step 1, we use the two temporal types month and [w.sub.r] to partition the temporal type day. Here day is partitioned into two types, [d.sub.p] and [d.sub.r], that is, the past days (the days before July 4, 1994) and the recent days (the days on and after July 4, 1994). Thus [F.sub.d.sub.p] = {B [right arrow.sub.month] A} and [F.sub.d.sub.r] = F. Now for A [right arrow] B and B [right arrow] A in [Pi.sub.Phi](MinCov(F)), we create the module schemes (AB, Raise([d.sub.r], [F.sub.A [right arrow] B])) and (AB, Raise([d.sub.p] [union] [d.sub.r], [F.sub.B [right arrow] A])), respectively. Note that Raise([d.sub.r], [F.sub.A [right arrow] B]) = [w.sub.r] and Raise([d.sub.p] [union] [d.sub.r], [F.sub.B [right arrow] A]) = month. Hence we have created the module schemes (AB, [w.sub.r]) and (AB, month). As the final step, we create (B, [d.sub.p]) and (B, [d.sub.r]), which are combined into one temporal scheme (B, day). In summary, the T3NF decomposition of (AB, day) is (B, day), (AB, [w.sub.r]), and (AB, month). All three new schemes are in T3NF, and the decomposition is lossless and preserves dependencies.

As for the TBCNF decomposition algorithm, in many cases, the T3NF decomposition algorithm takes a much simpler form. In general, if [Mu] (the type given in the input) is finer than each type of the TFDs in the temporal minimal cover, then Step 1 only consists of computing the temporal minimal cover and no partitioning is needed. Under the same condition, Step 2 will become:

Let [Rho] = [empty set]. For each X [right arrow] A in [Pi.sub.Phi](MinCov(F)), add the following to [Rho]:

(XA, Raise([Mu], {X [right arrow.sub.vi] A | X [right arrow.sub.vi] A [element of] MinCov(F)}))

It is clear that if the type in the input is day and the types of the TFDs in the minimal cover are among {day, week, month, year}, then the preceding simplifying condition is met.

Example 8. In this example we demonstrate the application of the T3NF decomposition algorithm on the (COURSES, day) temporal module scheme. We assume the TFDs (1)-(4) described in Example 7 still hold. However, because the decomposition obtained by applying the TBCNF algorithm in that example preserves dependencies, we modify the assumptions on the COURSES relation slightly to illustrate when the T3NF algorithm should be used. Suppose that the constraint that "the pay of a lab assistant does not change in a month" (i.e., the TFD Lab-Assist [right arrow.sub.month] Lab-Assist-Pay) is decided by the university central administration. Independently, the course teachers decide to pay the assistants weekly. This results in the constraint that the pay should not change in a week, adding the TFD:

(5) Lab-Assist [right arrow.sub.week] Lab-Assist-Pay.

The presence of this TFD would not have changed the decomposition obtained from the TBCNF algorithm, because the same order in the evaluation of TFDs could have been chosen. Indeed, all the schemes in [Rho.sup.(3)] are in TBCNF with respect to the new set of TFDs; however, [Rho.sup.(3)] does not preserve all TFDs.

If we want to preserve dependencies, we should use the T3NF decomposition algorithm. We give its step-by-step application to this example.

Step 1. We first compute MinCov(F) where F is the set of TFDs (1)-(4) and (5). In this case, we have that MinCov(F) = F. No partition has to be done in this step because the type of the module, day, is finer than any type in the minimal cover. That is, P = {day} and [F.sub.day] consists of all five TFDs (1)-(5).

Step 2. The same reason can be used to substitute Step 2 in the algorithm with its simplified version discussed earlier. In this case, the set [Pi.sub.Phi](MinCov(F)) simply includes the four FDs obtained by dropping the types from the five TFDs in F. Considering the FDs corresponding to TFDs (1), (2), and (3), we add to the initially empty [Rho] the following schemes:

(<Course#, Credits>, [Mu.sub.Top]), (<Course#, Lab-Assist>, week), and

(<Course#, #Students-in-Class>, day).

Indeed, for these FDs, the Raise() function that determines the type of each new scheme has as arguments the type day and a single TFD whose type is coarser than (or equal to) day. The result of Raise in these cases is always the coarser type. The remaining FD, Lab-Assist [right arrow] Lab-Assist-Pay, is more interesting because it determines the addition to [Rho] of a scheme with a new type: (<Lab-Assist, Lab-Assist-Pay>, Raise(day, F')), where F' consists of the TFDs (4) and (5) that have, respectively, types month and week. The new type is the coarsest type that we can use if we want to preserve dependencies. A graphical representation is shown in Figure 5. Because Steps 3 and 4 of the algorithm do not introduce any change, the four schemes in [Rho] are the result of the T3NF decomposition.

8. DISCUSSION

In the preceding sections we introduced a number of concepts, including temporal functional dependencies, lossless decomposition, and temporal normal forms, that are useful in designing relational temporal databases. We also introduced two decomposition algorithms that produce, respectively, lossless TBCNF decompositions and lossless dependency-preserving T3NF decompositions. In this section, we focus on decomposition algorithms. We examine in more detail the naive approach mentioned in the Introduction, and we investigate the computational complexity of the algorithms.

As mentioned in the Introduction, temporal types can be added to relation schemes and functional dependencies to capture constraints related to temporal granularity. The resulting FDs can then be used to decompose the revised relation schemes with the traditional (nontemporal) decomposition algorithms. In effect, the decomposition shown in Figure 2 can be generated this way. Using the concepts and results of the previous sections, we can easily show that this decomposition is lossless and that each relation scheme is in TBCNF and thus free of redundancy.

However, this naive decomposition approach is problematic. Indeed, the resulting BCNF decomposition may not be in TBCNF, and thus may not be free of redundancy. To exemplify the problem, reconsider the COURSES relation (with additional attributes Week and Month) and the related constraints. Suppose the traditional BCNF decomposition algorithm uses the following FDs in the given order.

(Course# → Credits), (Week, Course# → Lab-Assist), (Day, Course# → #Students-in-Class), (Day → Month), and (Day → Week).

The following schemes are obtained.

<Course#, Credits>, <Course#, Lab-Assist, Week>, <Course#, #Students-in-Class, Day>, <Course#, Lab-Assist-Pay, Day>.

(Note that we dropped <Day, Month> and <Day, Week>, assuming that the relationships between these temporal types are encoded elsewhere in the database.) However, the scheme <Course#, Lab-Assist-Pay, Day> is not in TBCNF, and hence still contains redundancy: the pay of a laboratory assistant does not change for several days.

The previous example shows that the order of FDs used in this naive approach is important: not every order of FDs used in decomposition gives redundancy-free (i.e., TBCNF) decompositions. Our next example shows that in certain cases, no matter which order of FDs is used, the naive approach does not give TBCNF decompositions.

Consider the temporal relation scheme CT-SCAN = (Patient, Nurse, Doctor, Hour), which stores information about CT (computed tomography) scans performed on patients. A tuple (p, n, d, h) in the relation says that a scan was performed on patient p at hour h by doctor d with the help of nurse n. Suppose doctors and nurses work different shifts: doctors work 48-hour shifts during weekdays and 2-hour shifts during weekends, whereas nurses work 8-hour shifts every day. Clearly the doctor shifts and the nurse shifts can be viewed as two temporal types, Doctor-Shift and Nurse-Shift, with the two kinds of shifts starting at the same time. Note that several ticks of Nurse-Shift are covered by (i.e., contained in) one (weekday) tick of Doctor-Shift, and several (weekend) ticks of Doctor-Shift are covered by one tick of Nurse-Shift.

The hospital regulations say that (i) each nurse is assigned to assist only one doctor on CT scans, and the assignment cannot change until the end of the doctor's shift (the nurse may go off work and come back to be assigned to the same doctor if the doctor's shift has not yet ended), and (ii) a doctor can work with only one nurse, and this relationship cannot change until the end of the nurse's shift (again, on weekends a doctor may go off work and come back to work with the same nurse). These constraints are captured by the TFDs Doctor →_Nurse-Shift Nurse and Nurse →_Doctor-Shift Doctor. If Doctor-Shift and Nurse-Shift are added to the CT-SCAN scheme as attributes, the TFDs can be written as the FDs Doctor, Nurse-Shift → Nurse and Nurse, Doctor-Shift → Doctor. Consider now the naive decomposition approach. Depending on the order of the FDs used, the following decompositions are possible:(15) (i) (Doctor, Nurse, Nurse-Shift), (Patient, Doctor, Hour), and (ii) (Nurse, Doctor, Doctor-Shift), (Patient, Nurse, Hour).
However, (Doctor, Nurse, Nurse-Shift) contains redundant information on weekdays, and (Nurse, Doctor, Doctor-Shift) contains redundant information on weekends. This example shows that, unlike in the COURSES example, no matter which order of FDs is used in the traditional BCNF decomposition algorithm, the naive approach does not yield a redundancy-free decomposition.
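To make the incomparability of the two shift types concrete, here is a small sketch in Python. The finer-than test follows the article's containment definition (every tick of the finer type lies inside a tick of the coarser one); the exact shift boundaries (hour 0 = Monday 00:00, a truncated last weekday doctor shift) are our own illustrative assumptions, not taken from the article.

```python
# Toy model of the CT-SCAN example: one week rendered as hours 0..167
# (hour 0 = Monday 00:00). Shift boundaries are illustrative assumptions.

def intervals(start, end, width):
    """Half-open [a, a+width) ticks tiling [start, end)."""
    return [set(range(a, min(a + width, end))) for a in range(start, end, width)]

# Doctors: 48-hour shifts on weekdays (hours 0..120, last shift truncated),
# 2-hour shifts on the weekend (hours 120..168).
doctor_shift = intervals(0, 120, 48) + intervals(120, 168, 2)
# Nurses: 8-hour shifts all week.
nurse_shift = intervals(0, 168, 8)

def finer(mu, nu):
    """mu is finer than nu iff every tick of mu lies inside some tick of nu."""
    return all(any(t <= s for s in nu) for t in mu)

# Some nurse ticks sit inside a weekday doctor tick, and some weekend
# doctor ticks sit inside a nurse tick -- so neither type is finer than
# the other, which is why no single FD order removes all redundancy.
print(finer(nurse_shift, doctor_shift))   # False
print(finer(doctor_shift, nurse_shift))   # False
print(any(t <= doctor_shift[0] for t in nurse_shift))                # True (weekday)
print(any(t <= s for t in doctor_shift[3:] for s in nurse_shift))    # True (weekend)
```

Because containment runs in both directions depending on the day of the week, either choice of FD order leaves one of the two schemes storing a repeated doctor-nurse pairing.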

The approach of this article is to distinguish between temporal and nontemporal attributes, and to give special treatment to temporal attributes by introducing the notion of temporal functional dependencies. This notion allows us to incorporate the relationships among temporal types into the derivation of dependencies and thus provides decomposition algorithms with the desired properties. For example, under our approach the CT-SCAN relation produces the four-scheme lossless TBCNF decomposition (<Patient, Doctor>, Hour), (<Doctor, Nurse>, Weekend-nurse-shift), (<Doctor, Nurse>, Weekday-doctor-shift), and (<Nurse>, Weekday-nurse-shift), where the new temporal types are as suggested by their names (e.g., Weekday-doctor-shift is the temporal type with 48-hour ticks on weekdays and no ticks on weekends).

In general, the temporal types generated by the TBCNF and T3NF decomposition algorithms may be quite complex and unintuitive to users. This problem can be solved, however, by implementing a conceptual database level that allows users to view, update, and query the data as if they were stored in terms of the basic temporal types they intuitively understand. At the lower level, transparent to the users, the underlying implementation may use complex temporal types to facilitate the removal of data redundancy. This idea is similar to views in traditional relational databases. Referring to the COURSES example, the T3NF decomposition algorithm returns the temporal scheme (<Lab-Assist, Lab-Assist-Pay>, Raise(day, F′)), where F′ contains TFDs with types month and week. The type Raise(day, F′) could be used by the underlying implementation, whereas the user would see the scheme (<Lab-Assist, Lab-Assist-Pay>, day). She would add tuples for Lab-Assist-Pay only on the corresponding paydays; the implementation would take care of raising the information to the complex type in order to store it in the temporal module of the scheme (<Lab-Assist, Lab-Assist-Pay>, Raise(day, F′)). The exact understanding and implementation of this idea for the general case requires further investigation.

We now turn to the complexity of the two decomposition algorithms proposed in this article. To obtain effective procedures for these algorithms, certain operations on temporal types must themselves be effective. In the algorithm for X⁺, we need to compute the glb of a finite number of temporal types and to decide whether μ ⪯ ν for given temporal types μ and ν. In the TBCNF decomposition algorithm, given two temporal types μ and ν, we need temporal types ν₁, ν₂, and ν₃, obtained essentially by partitioning temporal types and by combining certain ticks of a given temporal type. Finally, in the T3NF algorithm, we need to compute the Raise function. Because our set of temporal types is uncountably infinite, it does not in general admit effective procedures for these operations. However, in realistic systems many (infinite) temporal types can be finitely described in such a way that these operations become effective. For example, for periodic temporal types,(16) the preceding operations do have effective procedures. Note that everyday temporal types such as week, month, and the like are usually representable as periodic.
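Over a finite horizon, the two basic operations just mentioned are easy to sketch. The following Python fragment is our own illustration, not the article's implementation: a type is a list of pairwise disjoint sets of instants, finer-than is tick containment, and the glb (the coarsest type finer than both arguments) is formed from the nonempty pairwise intersections of ticks.

```python
# Finite-horizon sketch (ours) of the finer-than test and the glb of
# temporal types. A type is a list of pairwise disjoint sets of instants.

def finer(mu, nu):
    """mu is finer than nu iff every tick of mu is contained in some tick of nu."""
    return all(any(t <= s for s in nu) for t in mu)

def glb(mu, nu):
    """Coarsest type finer than both: the nonempty pairwise intersections."""
    return [t & s for t in mu for s in nu if t & s]

# Two weeks of absolute time, measured in hours.
day   = [set(range(d * 24, d * 24 + 24)) for d in range(14)]
week  = [set(range(0, 168)), set(range(168, 336))]
shift = [set(range(h, h + 8)) for h in range(0, 336, 8)]   # 8-hour shifts

print(finer(day, week))        # True: every day lies inside a week
print(finer(shift, day))       # True: every shift lies inside a day
print(not finer(week, day))    # True: weeks are not finer than days
# When two types form a chain, glb creates nothing new:
print(glb(day, week) == day)   # True: the glb is just the finer type
```

The last line previews the observation made below about total orders of types: within a chain, the glb of any set of types is simply its finest member.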

If operations on temporal types count as atomic, the worst-case complexity of the presented algorithms is exponential. We believe, however, that for practical cases the complexity is manageable. There are essentially three sources of exponentiality: (i) the algorithm for the finite closure of attributes is exponential in the number of temporal types, because each subset of types may create, via glb, a new temporal type; (ii) the TBCNF algorithm is based on the nonoptimized standard version in Ullman [1988, page 404], which is known to be exponential; and (iii) the number of iterations in the TBCNF algorithm can grow exponentially owing to temporal type creation; similarly, the creation of the partition of the basic type in T3NF can take exponential time. This worst-case analysis, however, deserves some practical considerations. Regarding point (i), consider the 12 standard granularities of SQL-92: second(6) through second(0),(17) minute, hour, day, month, and year. They form a single total order of types, in which each type is finer than the next. In such cases, when a set of n types can be covered by a single total order, glb creates no new types, because the glb of any set of types is simply the finest one in the set. Hence the complexity of the attribute closure algorithm is bounded by the regular (nontemporal) complexity times n. In general, if k is the minimal number of total orders that together include all the types in the input TFDs, then the number of types created by glb, plus the original types, is bounded by

(n_1 + 1) × (n_2 + 1) × ... × (n_k + 1) − 1,

where n_i is the size of the total order T_i. Thus the complexity of the temporal attribute closure algorithm carries an O(n_1 × ... × n_k) multiplicative factor over the complexity of the regular (nontemporal) algorithm. Because we believe that the typical number of total orders is bounded by a small constant, the algorithm can be regarded as "practically polynomial." As an example, if one works with 17 granularities, namely, the 12 standard granularities of SQL-92 plus week, business-day, business-week, semester, and academic-year,(18) the maximum number of possible glb-derived types is 20. Note that this is lower than the bound given by the formula, (12 + 1) × (5 + 1) − 1 − 17 = 60, and far below the worst-case bound 2^17 for 17 types. Regarding the complexity in (ii), a polynomial version of the originally exponential BCNF decomposition algorithm is presented in Ullman [1988, page 406]. It seems quite straightforward to apply the same technique to our temporal version, thereby eliminating this source of exponentiality. To understand (iii), we introduce a parameter r_μ that represents, for each type μ, the cardinality of a partition of its ticks such that, for each type ν appearing in the closure of the TFDs, either all ticks in a subset are covered by ticks of ν or none of them is. For example, with the preceding set of granularities, r_week = 3, where the three subsets of the partition contain, respectively, the weeks fully contained in a month, the weeks spanning two months but contained in one year, and the weeks spanning two years. The overall complexity of the TBCNF and T3NF algorithms has an exponential factor of r, where r is the maximum of r_μ over the types μ appearing in the closure of the TFDs. In practice this parameter is typically very small: for the set of 17 granularities above, the greatest value of r is 3.
Note that this value does not necessarily increase if other types are added. The preceding analysis supports our intuition that the TBCNF and T3NF decomposition algorithms, although exponential in the general case, are practical for sufficiently rich sets of granularities. A detailed complexity analysis, as well as possible optimizations of the algorithms and practical experimentation, is beyond the scope of this article.
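The counting argument for point (i) can be checked mechanically. This small script (our own sketch) computes the bound on glb-derived types for types covered by k total orders, and reproduces the numbers quoted above for one chain of 12 granularities and for the two-chain case (12 + 1) × (5 + 1) − 1 − 17 = 60.

```python
# Check (ours) of the counting bound: if the input types are covered by
# k total orders of sizes n_1..n_k, then the glb-derived types plus the
# originals number at most (n_1 + 1) * ... * (n_k + 1) - 1.
from math import prod

def glb_bound(chain_sizes):
    """Bound on (original + glb-derived) types for the given chains."""
    return prod(n + 1 for n in chain_sizes) - 1

# One total order (the 12 SQL-92 granularities): glb creates no new types.
print(glb_bound([12]) - 12)     # 0 new types
# Two chains of sizes 12 and 5 covering all 17 granularities:
print(glb_bound([12, 5]) - 17)  # 60, the bound quoted in the text
```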

9. CONCLUSION

In many temporal applications, constraints involving granularities exist. This article introduced temporal functional dependencies, a natural kind of temporal constraint. To reduce the data redundancy arising from these dependencies, temporal normal forms and the concept of lossless decomposition were given, providing a theoretical basis for logical database design in such applications. Practical tools, namely decomposition algorithms, were also given that automatically derive lossless normal-form decompositions. These normal forms and algorithms are proper extensions of their traditional counterparts, in the sense that if all data fall in one tick of a temporal type, then the temporal normal forms and their algorithms reduce to the nontemporal ones.

A basic structure used in the article is the temporal module. It is important to emphasize that the temporal modules are rather general. We believe that the results of the article are readily applicable to other temporal relational models, especially all models based on tuple timestamping.

Our work is set in the framework of a particular temporal type system. Because a temporal type is a monotonic mapping from the positive integers to the power set of the reals, time in this type system is always positive, and every tick of a temporal type is an arbitrary set of real numbers. These choices, however, are not inherent to the results presented in the article; they are motivated by our desire for a simpler and intuitively appealing presentation. The results of this article also hold for a more general definition of temporal type that uses all the integers to represent time ticks. In fact, the only requirement on the temporal types we used for decomposition is the following: a temporal type is a set of pairwise disjoint sets of real numbers.(19) We believe that the results of the article hold for any type system with this property.

Finally, we note that data redundancy exists in spatial data as well, for example, in geographic information systems, owing to constraints similar to TFDs. To see the analogy, let a spatial type be a static collection of spatial objects, such as states, counties, cities, and the like. Consider a spatial relation (Company, Branch-Coordinates, Supplies-Contractor) with the spatial FD Company →_state Supplies-Contractor, which intuitively states that, within a state boundary, all branches of a company must use the same supplies contractor. Because the same supplies contractor is repeated for each branch in the same state, we have data redundancy. A technique very similar to the one we developed for the temporal case yields the decomposition (Company, Branch-Coordinates) and (Company, Supplies-Contractor, State).
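The spatial decomposition just sketched can be illustrated on a tiny relation. The data below are our own invented example: projecting out the contractor removes the per-branch repetition, and a natural join on (Company, State) recovers the original relation, so the decomposition is lossless.

```python
# Tiny illustration (our own data) of the spatial decomposition above.
rows = [
    ("Acme", (1.0, 2.0), "VA", "FastSupply"),
    ("Acme", (3.0, 4.0), "VA", "FastSupply"),   # contractor repeated per branch
    ("Acme", (9.0, 9.0), "MD", "EastSupply"),
]

# Decompose into (Company, Branch-Coordinates, State) and
# (Company, Supplies-Contractor, State).
branches    = {(c, xy, st) for c, xy, st, _ in rows}
contractors = {(c, st, k) for c, _, st, k in rows}   # one tuple per (company, state)

# Natural join on (Company, State) reconstructs the original relation.
joined = {(c, xy, st, k)
          for c, xy, st in branches
          for c2, st2, k in contractors
          if (c, st) == (c2, st2)}

print(len(contractors))      # 2: the per-branch repetition is gone
print(joined == set(rows))   # True: the decomposition is lossless
```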

REFERENCES

Davey, B. A. and Priestley, H. A. 1990. Introduction to Lattices and Order. Cambridge University Press, Cambridge, England.

Jensen, C. S., Snodgrass, R. T., and Soo, M. D. 1994. Extending existing dependency theory to temporal databases. Tech. Rep. R 94-2050, Dept. of Mathematics and Computer Science, Aalborg University, Denmark (Dec.). IEEE Trans. Knowl. Data Eng., (to appear).

Kabanza, F., Stevenne, J-M., and Wolper, P. 1990. Handling infinite temporal data. In Proceedings of the Ninth ACM Symposium on Principles of Database Systems (Nashville, TN, April).

Navathe, S. B., and Ahmed, R. 1989. A temporal relational model and a query language. Inf. Sci. 49, 147-175.

Niezette, M., and Stevenne, J. 1992. An efficient symbolic representation of periodic time. In Proceedings of the First International Conference on Information and Knowledge Management (Baltimore, MD, Nov.).

Segev, A., and Shoshani, A. 1988. The representation of a temporal data model in the relational environment. In Proceedings of the Fourth International Conference on Statistical and Scientific Database Management (Rome, Italy, June).

Ullman, J. D. 1988. Principles of Database and Knowledge-Base Systems. Computer Science Press, Rockville, MD.

Vianu, V. 1987. Dynamic functional dependencies and database aging. JACM 34, 1, 28-59.

Wang, X. S. 1995. Algebraic query languages on temporal databases with multiple time granularities. In Proceedings of the Fourth International Conference on Information and Knowledge Management (Baltimore, MD, Nov.-Dec.).

Wang, X. S., Jajodia, S., and Subrahmanian, V. S. 1995. Temporal modules: An approach toward federated temporal databases. Inf. Sci. 82, 103-128. A preliminary version appeared in Proceedings of the 1993 ACM SIGMOD International Conference on the Management of Data (Washington, D.C., June), 227-236.

Wiederhold, G., Jajodia, S., and Litwin, W. 1991. Dealing with granularity of time in temporal databases. In Proceedings of the Third Nordic Conference on Advanced Information Systems Engineering (Trondheim, Norway, May).

Wijsen, J. 1995. Design of temporal relational databases based on dynamic and temporal functional dependencies. In Proceedings of the International Workshop on Temporal Databases (Zurich, Switzerland, Sept.).

(1) Other dependencies such as multivalued dependencies, join dependencies and tuple-generating dependencies have also been considered [Ullman 1988], but they are outside the scope of our discussion here.

(2) To represent dates, we use the notation "month/day/year."

(3) This naive approach, to the best of our knowledge, has not been proposed in the literature.

(4) More precisely, we also have to add the FDs Day → Month and Day → Week. The relation scheme obtained by the BCNF algorithm also contains <Day, Month> and <Day, Week>, which can be dropped because we assume that the relationships between these temporal types are encoded elsewhere in the database.

(5) See Jensen et al. [1994] for a brief review and a comparison of the normal forms in Navathe and Ahmed [1989] and Segev and Shoshani [1988].

(6) This TFD is implicit in the fact that Lab-Assist-Pay for a fixed Lab-Assist does not change within a month.

(7) In fact, any infinite set with a total ordering can serve as the absolute time; reals, rationals, and integers are examples.

(8) Note 1900 is not a leap year.

(9) For readability, we use dates in the form "mm/dd/yy" as the argument of [Phi], instead of positive integers that identify the ticks (i.e., days) of day.

(10) For those who are not familiar with Armstrong's axioms, they are: (i) Reflexivity: X → Y if Y ⊆ X; (ii) Augmentation: XW → YW if X → Y; and (iii) Transitivity: X → Z if X → Y and Y → Z. Armstrong's axioms are sound and complete; that is, for each set of functional dependencies, a functional dependency is logically implied by the set iff it can be derived by a proof sequence using the three axioms. See Ullman [1988] for details.

(11) This notion of temporal superkey in terms of snapshots is closely related to the "proper temporal database" notion in Wiederhold et al. [1991], and is identical to the concept of keys in Jensen et al. [1994].

(12) Formally, μ_i and ν cover the same periods of time if ⋃_{j≥1} μ_i(j) = ⋃_{j≥1} ν(j).

(13) Note that if a (nontemporal) relation scheme has only two attributes, then the scheme is always in BCNF and, therefore, has no redundancy at all.

(14) For those not familiar with the nontemporal notion, a minimal cover of a set F of FDs is a set F_c of FDs that is equivalent to F, in which every right-hand side is a single attribute and from which no dependency and no left-hand-side attribute can be removed without losing equivalence.

(15) The second scheme in each decomposition is obtained by further using Hour → Doctor-Shift, Nurse-Shift and the fact that shift information is stored elsewhere in the database.

(16) A temporal type is periodic if, after a certain number of ticks, each tick is a shift (of a fixed length) of some earlier (not necessarily the previous) tick. Formally, a temporal type μ is periodic if there exist M and m such that, for each k > M, μ(k) = {b + m · (k DIV M) | b ∈ μ(k MOD M)}.

(17) second(6), ..., second(0) stand in SQL-92 for 10⁻⁶, ..., 10⁰ seconds, respectively.
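The periodicity condition in footnote (16) is easy to realize concretely. The following sketch is our own illustration: we build a periodic type from its first M base ticks and a period length m, using the equivalent convention that tick k is tick k − M shifted by m (with 1-indexed ticks).

```python
# Sketch (ours) of a periodic temporal type: beyond the first M ticks,
# tick k is tick k - M shifted by the period length m.

def make_periodic(base_ticks, m):
    """base_ticks: list of the first M ticks (sets of instants); m: period."""
    M = len(base_ticks)
    def tick(k):                      # ticks are 1-indexed
        shifts, i = divmod(k - 1, M)  # how many whole periods to shift
        return {b + m * shifts for b in base_ticks[i]}
    return tick

# 'day' over an absolute time measured in hours: one 24-hour base tick,
# repeating with period 24.
day = make_periodic([set(range(24))], 24)
print(min(day(3)), max(day(3)))   # 48 71: the third day covers hours 48..71
```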

(18) We consider semester as defined in terms of week, and academic-year in terms of semester.

(19) The partial order on temporal types defined this way is as follows: A ⪯ B iff for every S ∈ A there exists S′ ∈ B such that S ⊆ S′.

ACM Transactions on Database Systems, Vol. 22, No. 2, June 1997.

Authors: X. Sean Wang, Claudio Bettini, Alexander Brodsky, and Sushil Jajodia. Publication: ACM Transactions on Database Systems, June 1, 1997.