A logical model of conceptual integrity in data integration.


Conceptual integrity is required for the result of data integration to be cohesive and sensible. Compromised conceptual integrity results in "semantic faults," which are commonly blamed for latent integration bugs. A logical model of conceptual integrity in data integration and a simple example application are presented. Unlike constructive models that attempt to prevent semantic faults, this model allows both correct and incorrect integrations to be described. Imperfect legacy systems can therefore be modeled, allowing a more formal analysis of their flaws and the possible remedies.

Key words: abstraction; data; integration; logic; semantics.

[J. Res. Natl. Inst. Stand. Technol. 108, 395-402 (2003)]

1. Introduction

In the context of software, what is traditionally called "integration" is the engineering process that creates or improves information flows between information systems designed for different purposes. What actually flows between the systems is data, but what is critical to the business process is that all of the right data flows in the right form for the receiving system, and that the receiving system and the people who use it interpret the data correctly.

The term "conceptual integrity" was popularized in Ref. [1] to refer to a kind of consistency in system architecture that allows the system to become a cohesive, sensible whole. A similar kind of conceptual integrity is required for the result of data integration to be cohesive and sensible. Compromised conceptual integrity results in "semantic faults," which are commonly blamed for latent integration bugs.

Most technical approaches to data integration fall squarely into one of two categories. There is the "global schema" category, where every schema is mapped into a common reference schema, and there is the direct translation category, where schemata are mapped directly to one another in a point-to-point fashion. Each category has widely recognized advantages and disadvantages. Among these is the efficiency argument in favor of standardization (i.e., having a standard global schema): to link $n$ different systems directly requires $n^2 - n$ one-directional mappings, but to link them via a global schema requires only $2n$.
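The arithmetic behind the efficiency argument can be checked directly. The Python sketch below is illustrative only; the function names are hypothetical and not drawn from any integration tool.

```python
def direct_mappings(n: int) -> int:
    """Point-to-point: every ordered pair of distinct systems needs its own mapping."""
    return n * (n - 1)  # equals n**2 - n


def global_schema_mappings(n: int) -> int:
    """Via a global schema: each system needs one mapping into it and one out of it."""
    return 2 * n


for n in (2, 3, 5, 10):
    print(n, direct_mappings(n), global_schema_mappings(n))
```

For $n = 10$ this gives 90 direct mappings versus 20 through a global schema; the two counts coincide at $n = 3$, and direct translation needs fewer mappings only for $n < 3$.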

It is sometimes claimed that direct translation allows for better conceptual integrity on a technical level (ignoring the human factors of dealing with $n^2 - n$ different translations) because one can translate only what is necessary for communication and ignore anything that is conflicting but irrelevant. However, after discussion of this point in the Automated Methods for Integrating Systems [2] project, it was realized that such a translation implies a certain "integration schema" which, regardless of whether it is written down or only in the mind of the integrator, is nevertheless equivalent in its impact on conceptual integrity to having used a global schema.

With that perspective, it should be possible to create an abstract model of conceptual integrity that is independent of the technical approach that is chosen for data integration. This paper documents such a model. The goal is not to provide another method for maintaining conceptual integrity, but to provide a logical model of conceptual integrity itself, capable of describing both correct and incorrect integrations resulting from whichever methodology is employed.

2. Related Work

Reference [3] contains a model that is similar in approach yet foundationally different from the one in this paper. Modal logic and logical properties are used to build a detailed model of identity, and the ramifications for correct and incorrect subsumption relationships are examined. Possible analogies to Ref. [3] are discussed in Sec. 7.

An alternate view on the issues discussed in this paper can be found in work having to do with context logic, which traces back to McCarthy [4], [5]. A concise discussion of the application of context logic to information integration is given in Ref. [6]; see also the "Integrating Databases" example in Ref. [7], multicontext (MC) systems as described in Ref. [8], and the context-based schema analysis in Ref. [9]. The relationship between the view of this paper and the context logic view is explored in more detail in Sec. 8.

Logic-based approaches to schema integration, e.g., Refs. [10] and [11], are constructive methods intended to maintain conceptual integrity; i.e., they assume that one works within the method when integrating schemata and that the intensional definitions constructed by the modeler are complete and logically sufficient. A loss of conceptual integrity within the model would be indicated by the presence of a logical contradiction, which would render any subsequent logical inferences meaningless. Consequently, these logic-based approaches to schema integration are not ideal for describing and analyzing potentially imperfect integrations resulting from other methodologies.

The views of class and abstraction in this paper partially reflect ideas appearing in Ref. [12].

3. Logical Notation

Belief and time are critical to integration. Integration is performed at a point in time and from a point of view. Appropriately, this paper uses symbols from temporal modal logic as well as a "doxastic" modal (pertaining to belief).

The following descriptions are quoted from Ref. [13].

$\Box$ It is necessary that ...

$\Diamond$ It is possible that ...

G It will always be the case that ...

F It will be the case that ...

H It has always been the case that ...

P It was the case that ...

Bx x believes that ...

$\land$, $\lor$, and ${\sim}$ are the conjunction, disjunction, and negation operators of classical logic. $\equiv$ represents logical equivalence; i.e., the left hand side and the right hand side necessarily have the same truth values.

Let $p$, $q$, and $r$ represent arbitrary logical sentences. The modals $\Box$ and $\Diamond$ relate to each other as follows.

$\Diamond p \equiv {\sim}\Box{\sim}p$ (1)

$\Box p \equiv {\sim}\Diamond{\sim}p$ (2)

${\sim}\Diamond p \equiv \Box{\sim}p$ (3)

${\sim}\Box p \equiv \Diamond{\sim}p$ (4)

The temporal modals have similar relations.

$Fp \equiv {\sim}G{\sim}p$ (5)

$Pp \equiv {\sim}H{\sim}p$ (6)

A distinction is made between material implication, represented by the symbol $\supset$, and strict implication, represented by $\rightarrow$.

Material implication is the truth-functional connective of classical logic.

$p \supset q \equiv {\sim}p \lor q$ (7)

Strict implication expresses the stronger statement that the consequent necessarily follows from the antecedent (i.e., is logically entailed or true by definition) [14].

$p \rightarrow q \equiv \Box(p \supset q)$ (8)

Strict implication must not be confused with relevant implication as used in relevance logics [15] or otherwise conflated with "relevance." Relevance is not required. It is acceptable (albeit unhelpful) that a tautology (a necessarily true sentence) is strictly implied by any sentence whatsoever.

$\Box q \supset (p \rightarrow q)$ (9)

The need to distinguish between material implication and strict implication arises here because of time. As new individuals are created, strict implications that are true remain true, but the truth values of material implications can change. For example: over time, more people will be born; by definition, they will all be mortal (i.e., being a person strictly implies mortality); however, the fact that all people live on Earth might not remain true (even though "being a person materially implies living on Earth" holds at the present time). If the universe of discourse were static, the distinction would be moot: if an implication happened to be true for the universe as it was, then it would suffice for all discussions about that universe.
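The temporal distinction can be illustrated with a small Python sketch. It is an analogy under stated assumptions, not a logic engine. The hypothetical `Person` type enforces mortality in its constructor, standing in for a strict implication (true by definition, so it holds in every state the program can reach). By contrast, `materially_true` checks an implication only against the individuals that currently exist, so its answer can change as new individuals are created.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    planet: str
    mortal: bool = True

    def __post_init__(self) -> None:
        # Stand-in for a strict implication (person -> mortal): no admissible
        # Person can be constructed without the property, now or later.
        if not self.mortal:
            raise ValueError("a Person is mortal by definition")


def materially_true(universe, antecedent, consequent) -> bool:
    """Material implication, evaluated only over the individuals that exist now."""
    return all((not antecedent(x)) or consequent(x) for x in universe)


def is_person(x) -> bool:
    return isinstance(x, Person)


def lives_on_earth(x) -> bool:
    return x.planet == "Earth"


def is_mortal(x) -> bool:
    return x.mortal


now = [Person("Ada", "Earth"), Person("Alan", "Earth")]
later = now + [Person("Zed", "Mars")]  # hypothetical individual created later

print(materially_true(now, is_person, lives_on_earth))    # True
print(materially_true(later, is_person, lives_on_earth))  # False
print(materially_true(later, is_person, is_mortal))       # True
```

The "lives on Earth" implication is true of the present universe but becomes false once the universe grows. The mortality constraint cannot be falsified by adding individuals, because any individual that would falsify it is rejected at construction.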

The reader is encouraged to consult Refs. [13], [14], and [16] regarding the spectrum of modal and temporal logics that are distinguished by the axioms accepted. Reference [16] identifies a series of systems that build on the following axioms (paraphrased):

(SL = Sentential Logic) Every theorem of classical sentential logic is a theorem.

(MP = Modus Ponens) If $p$ is a theorem, and $p \supset q$ is a theorem, then $q$ is a theorem.

(Nec = "rule of necessitation") If $p$ is a theorem, then $\Box p$ is a theorem.

($\Diamond$) $\Diamond p$ is defined as ${\sim}\Box{\sim}p$.

The above axioms are accepted here. System T is formed by adding the following two axioms, which are also accepted here.

$\Box(p \supset q) \supset (\Box p \supset \Box q)$ (10)

$\Box p \supset p$ (11)

It follows that

$p \supset \Diamond p$ (12)

In addition, the following axioms are accepted. (N.B., since these are theorems, then by the rule of necessitation and the definition of strict implication, their counterparts using strict implication are also theorems.)

$Gp \supset Fp$ (13)

$Hp \supset Pp$ (14)

$p \supset GPp$ (15)

$p \supset HFp$ (16)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (17)

$G(p \supset q) \supset (Gp \supset Gq)$ (18)

$H(p \supset q) \supset (Hp \supset Hq)$ (19)
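As a side illustration not taken from the paper, the standard possible-worlds reading of System T can be checked by brute force. On a reflexive Kripke frame, $\Box p \supset p$ and $p \supset \Diamond p$ hold at every world under every valuation of $p$. The Python sketch below assumes a finite frame given as a set of (world, accessible-world) pairs. It is a sanity check of that textbook correspondence, not part of the model.

```python
from itertools import product


def box(worlds, access, val, w) -> bool:
    """Necessity: p holds at w iff p holds at every world accessible from w."""
    return all(val[v] for v in worlds if (w, v) in access)


def diamond(worlds, access, val, w) -> bool:
    """Possibility: p holds at w iff p holds at some world accessible from w."""
    return any(val[v] for v in worlds if (w, v) in access)


def t_axioms_hold(worlds, access) -> bool:
    """Check box(p) implies p, and p implies diamond(p), for every valuation of p."""
    for bits in product([False, True], repeat=len(worlds)):
        val = dict(zip(worlds, bits))
        for w in worlds:
            if box(worlds, access, val, w) and not val[w]:
                return False
            if val[w] and not diamond(worlds, access, val, w):
                return False
    return True


worlds = [0, 1, 2]
edges = {(0, 1), (1, 2)}
reflexive = edges | {(w, w) for w in worlds}

print(t_axioms_hold(worlds, reflexive))  # True
print(t_axioms_hold(worlds, edges))      # False
```

Without reflexivity, world 2 sees no worlds at all, so necessity holds there vacuously even when $p$ is false, and the check fails.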

4. Model

4.1 Foundation

A schema is a set of identified collections or groupings. Those collections would be called classes in an object-oriented system, tables in a relational system, concepts in a knowledge-based system, etc. For readability, the word "class" will be used for a collection or grouping, and the word "individual" will be used for that which is grouped (instance, tuple, etc.).

Let $\alpha$, $\beta$, $\gamma$, and $\delta$ range over classes, let $a$ range over individuals, and let $A$ range over properties. A Boolean model of properties is assumed. $Aa$ is true if and only if individual $a$ has the property $A$. Define $\bar{A}$ to be the negation of $A$.

$\bar{A}a \equiv {\sim}Aa$ (20)

$Aa \lor \bar{A}a$ (21)

${\sim}(Aa \land \bar{A}a)$ (22)

Define $N(\alpha)$ as the set of properties that are necessary for membership in $\alpha$.

$A \in N(\alpha) \equiv a \in \alpha \rightarrow Aa$ (23)

Define $O(\alpha)$ as the set of properties that are possible for (consistent with) membership in $\alpha$.

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (24)

It is assumed that one will abstain from defining classes that are necessarily empty (also known as "incoherent" classes).

These more intuitive theorems about N and O then follow:

$a \in \alpha \land A \in N(\alpha) \rightarrow Aa$ (25)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (26)

$a \in \alpha \land Aa \rightarrow A \in O(\alpha)$ (27)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (28)

[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII] (29)

$A \in N(\alpha) \rightarrow A \in O(\alpha)$ (30)

It is possible for both a property and its negation to appear in O, with neither appearing in N.

As with $\Box$ and $\Diamond$, it would suffice to have only $N$ or only $O$, but having both allows for more intuitive formulations.

Importantly, it is assumed that membership in classes is primitive. $N$ contains properties that are necessary for membership in a class [as stated in Eq. (25), membership in a class does strictly imply that the individual has the necessary properties], but they are not logically sufficient to determine the membership. Classes are not necessarily characterized by a set of properties: in general, $A \in N(\alpha) \supset Aa$ does not strictly imply $a \in \alpha$. Ideally, the "intent" of a class would be reflected by the properties in $N$, but the extent (its membership) is what is assumed to be known.
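For finite illustrations, $N$ and $O$ can be approximated extensionally. The Python sketch below assumes (this is not part of the paper's formalism) that the possible members of a class can be listed as a finite collection of hypothetical individuals, each represented by the set of properties it has. Negation follows Eqs. (20)-(22), with the prefix `~` standing for $\bar{A}$. Consistent with primitive membership, the class is given by its members, and $N$ and $O$ are derived from them rather than the other way round.

```python
from typing import FrozenSet, List, Sequence, Set


def holds(individual: FrozenSet[str], prop: str) -> bool:
    """Boolean property model: '~A' holds exactly when 'A' does not."""
    if prop.startswith("~"):
        return prop[1:] not in individual
    return prop in individual


def literals(base_props: Sequence[str]) -> List[str]:
    """Each base property A together with its negation ~A."""
    return [p for a in base_props for p in (a, "~" + a)]


def necessary(members: List[FrozenSet[str]], base_props: Sequence[str]) -> Set[str]:
    """Approximation of N(alpha): properties held by every possible member."""
    return {p for p in literals(base_props) if all(holds(m, p) for m in members)}


def possible(members: List[FrozenSet[str]], base_props: Sequence[str]) -> Set[str]:
    """Approximation of O(alpha): properties held by at least one possible member."""
    return {p for p in literals(base_props) if any(holds(m, p) for m in members)}


# Hypothetical possible members of a class Dog.
props = ["barks", "four_legs"]
dog = [frozenset({"barks", "four_legs"}), frozenset({"four_legs"})]

print(sorted(necessary(dog, props)))  # ['four_legs']
print(sorted(possible(dog, props)))   # ['barks', 'four_legs', '~barks']
```

Here "barks" and its negation both appear in $O$ while neither appears in $N$. With an empty member list, `necessary` returns every literal and `possible` returns none. This shows why necessarily empty (incoherent) classes are excluded.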

4.2 Subsumption

N and O display a symmetry with respect to subsumption.

$(a \in \gamma \rightarrow a \in \delta) \land A \in N(\delta) \rightarrow A \in N(\gamma)$ (31)

$(a \in \gamma \rightarrow a \in \delta) \land A \in O(\gamma) \rightarrow A \in O(\delta)$ (32)

It follows from Eqs. (30) and (32) that a property that is necessary in a subclass must be consistent with the superclass:

$(a \in \gamma \rightarrow a \in \delta) \land A \in N(\gamma) \rightarrow A \in O(\delta)$ (33)

It is not always obvious that defining a subclass has ramifications for the meaning of the superclass, but it is true nonetheless. If someone defines a subclass Six-Legged-Dog, and the subclass is not necessarily empty, it follows that having six legs is consistent with being a dog. This may greatly surprise the person who defined Dog originally, but such is the kind of detail that one needs to know in order to perform a correct integration.
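The Six-Legged-Dog example can be replayed with a deliberately tiny Python sketch. For illustration only, it assumes that each possible member is described solely by its leg-count property. It also assumes that the superclass's possible members are extended with those of the subclass, as subsumption requires. `possible_props` is a hypothetical helper that computes $O$ as the union of the members' properties.

```python
# Hypothetical possible members, each described only by a leg-count property.
dog_as_originally_defined = [{"four_legs"}, {"three_legs"}]
six_legged_dog = [{"six_legs"}]


def possible_props(members):
    """O(class): properties that at least one possible member has."""
    return set().union(*members) if members else set()


# Subsumption: every possible Six-Legged-Dog is also a possible Dog, so the
# possible members of Dog now include the subclass's members (cf. Eq. (32)).
dog_after_subclassing = dog_as_originally_defined + six_legged_dog

print("six_legs" in possible_props(dog_as_originally_defined))  # False
print("six_legs" in possible_props(dog_after_subclassing))      # True
```

Nothing about the Dog definition was edited, yet its $O$ changed. That is the sense in which a subclass carries ramifications for the superclass.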

4.3 Conceptual Integrity

Let S and T represent different schemata (e.g., the data models implemented in two separate software systems). S and T do not share individuals or classes; however, they are discussed in terms of the same properties, all of which are within the same logical context.

For $\alpha$ of S and $\beta$ of T, the simplest form of integration is a partial "instance map" from members of $\alpha$ to members of $\beta$. Let $M(a)$ for $a \in \alpha$ represent the analog of $a$ (its image under $M$, if such exists) in $\beta$. To maintain conceptual integrity, the following condition must hold for all $A$:

$a \in \alpha \land M(a) \in \beta \land Aa \supset A \in O(\beta)$ (34)

To paraphrase: if an individual with a given property is mapped to an analog in $\beta$, then that property must be consistent with membership in $\beta$. It is not necessary that the analog possess that property if the negation of that property is also in $O(\beta)$; nor is it necessary that every individual in $\alpha$ have an analog.
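Condition (34) can be checked mechanically over a finite snapshot. The Python sketch below is a minimal illustration under stated assumptions. The instance map $M$ is a function returning `None` where no analog exists, `props_of` lists the properties each individual has, and $O(\beta)$ is supplied as a set (for instance, computed from the possible members of $\beta$). All names are hypothetical.

```python
from typing import Callable, Hashable, Iterable, List, Optional, Set, Tuple


def integrity_violations(
    alpha: Iterable[Hashable],
    instance_map: Callable[[Hashable], Optional[Hashable]],
    props_of: Callable[[Hashable], Set[str]],
    o_beta: Set[str],
) -> List[Tuple[Hashable, str]]:
    """Pairs (a, A) violating Eq. (34): a has an analog in beta and has A, but A is not in O(beta)."""
    violations = []
    for a in alpha:
        if instance_map(a) is None:
            continue  # M is partial; individuals without an analog impose no condition
        for prop in sorted(props_of(a)):
            if prop not in o_beta:
                violations.append((a, prop))
    return violations


# Hypothetical snapshot: time-of-day values and the property each one has.
props = {"09:15:00": {"seconds_0_59"}, "23:59:60": {"~seconds_0_59"}}
o_beta = {"seconds_0_59"}  # beta's time-of-day class never admits a 60th second

print(integrity_violations(props, lambda a: a, props.get, o_beta))
# [('23:59:60', '~seconds_0_59')]
```

Leaving 23:59:60 unmapped (an instance map that returns `None` for it) yields no violations, reflecting that $M$ need not be total.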

5. On Abstraction

Data models as we know them are abstractions, and so are the mental models of the people who construct them. By definition, an abstraction of a thing or event is not identical to the thing or event itself and does not have all of its properties. Moreover, any documented model is at best an approximate expression of a mental model [17], and different data modelers think about different properties even when they believe that they are modeling the same thing. These differences can lead to a wide variety of conflicts [9], [18], [19].

Every thing or event has an unbounded set of properties. A data modeler tries to settle on a finite set of properties that suffices for a particular application. But when two applications are integrated, the properties that were captured in documented models may no longer suffice.

Consider the acquisition by a leading manufacturer of 100% recycled content corrugated boxes of a relatively obscure company that makes biodegradable bubble wrap. The box company has a technically superior customer database, but the bubble wrap company has some specialized applications integrated with its own database that would be expensive to change. So it is decided to use the box company's database as the primary one and just replicate the data in the other database for the sake of the specialized applications. This seems to work, and environmentally conscientious mail-order operations the world over rejoice that they can now obtain boxes and bubble wrap through the far-reaching distribution network of the former box company. Then disaster strikes. The box company's best customer, the John Q. Fictional Company of Hanover, calls to complain that the bubble wrap they ordered never arrived. Investigation reveals that the order in question was shipped to the John Q. Fictional Company of Anchorage, a new customer who had simply ordered a small number of corrugated boxes. It turns out that one of the bubble wrap applications was written to key by company name, so it retrieved the wrong John Q. Fictional Company from the merged database and propagated the error.

It is important to understand that the box company's database was not "wrong" to allow two companies to use the same name. Prior to the integration, it made no difference. The box company's applications did not rely on names being unique. Neither was it "wrong" for the bubble wrap application to key by company name. Prior to the integration, company names were unique within the bubble wrap customer base. The problem was created by the integration.
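The failure can be reproduced in a few lines of Python. The records below are hypothetical and follow the story. A dictionary keyed by name stands in for the bubble wrap application's name-keyed lookup, which silently keeps only one record per name once the merged data contains duplicates. The helper `non_unique_names` is likewise illustrative: it finds the names for which the property "associated with exactly one customer" no longer holds.

```python
from collections import defaultdict

# Hypothetical merged customer records after the integration.
merged_customers = [
    {"id": 101, "name": "John Q. Fictional Company", "city": "Hanover"},
    {"id": 202, "name": "John Q. Fictional Company", "city": "Anchorage"},
]

# The bubble wrap application's assumption: a name identifies one customer.
# In a dict comprehension, later records overwrite earlier ones with the same key.
by_name = {c["name"]: c for c in merged_customers}
print(by_name["John Q. Fictional Company"]["city"])  # Anchorage


def non_unique_names(customers):
    """Names associated with more than one customer id."""
    ids = defaultdict(set)
    for c in customers:
        ids[c["name"]].add(c["id"])
    return {name for name, s in ids.items() if len(s) > 1}


print(non_unique_names(merged_customers))  # {'John Q. Fictional Company'}
```

Neither piece of code is wrong on its own. The name-keyed lookup is only incorrect once it is fed data in which the uniqueness property fails.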

Abstractions themselves have abstractions, and these are not immune to integration faults. For example, a common abstraction of time-of-day (itself an abstraction) constrains seconds to range between 0 and 59. An artifact that embodied this assumption might integrate successfully with many applications and operate for years without failure. However, cognizant data modelers are aware that an extra second (a "leap second") is occasionally inserted into the Coordinated Universal Time (UTC) time scale to keep it within ±0.9 s of the Universal Time (UT1) astronomical time scale [20]. The time-of-day corresponding to the leap second is represented as 23:59:60. So if the artifact that constrains seconds to the range 0 to 59 is integrated with any that propagate leap seconds, it might fail all of a sudden one New Year's.
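Python's standard library happens to contain both abstractions, which makes this fault easy to demonstrate. `time.strptime` documents a seconds range of 00 to 61 to allow for leap seconds, whereas `datetime.time` accepts only 0 to 59. The sketch below treats the two as hypothetical integrated components; `to_time_of_day` is illustrative glue code.

```python
import datetime
import time

# Component that propagates leap seconds: time.strptime's %S accepts 60.
parsed = time.strptime("23:59:60", "%H:%M:%S")
print(parsed.tm_sec)  # 60


# Component whose time-of-day abstraction constrains seconds to 0..59.
def to_time_of_day(t: time.struct_time) -> datetime.time:
    return datetime.time(t.tm_hour, t.tm_min, t.tm_sec)


try:
    to_time_of_day(parsed)
except ValueError as err:
    print("integration fault:", err)
```

Every other input passes through both components without complaint. That is why such a fault can stay latent until a leap second actually occurs.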

For any given abstraction, it is possible to construct an integration scenario in which a failure will occur because of some property that was not explicitly modeled. The need for the abstractions of S (e.g., customer name according to the box company) to take an explicit stance with respect to properties that are relevant in T (e.g., uniquely identifying a customer) only arises when integration is attempted. Yet by virtue of numerous undocumented and/or un-thought-about implementation details, any realizations of these abstractions in engineered artifacts such as software implicitly take stances with respect to all properties. Simplistically, one could say that when confronted with new properties, either they work or they don't.

6. Semantic Faults

"Semantic fault" is an informal term that can now be understood formally to mean a violation of the condition expressed in Eq. (34).

This section demonstrates how the semantic fault stories of Sec. 5 can be formalized. However, it is not necessarily the case that all semantic faults would emerge in exactly the same way.

Logical statements below describe the behaviors of the engineered artifacts as built unless preceded by the doxastic qualifier $B_i$ (signifying a belief of the integrator, $i$).

Consider the following:

$A \in O(\beta)$ (35)

$a \in \alpha \supset Aa$ (36)

$B_i(a \in \alpha \rightarrow Aa) \lor B_iG(a \in \alpha \supset Aa) \lor B_i(\bar{A} \in O(\beta))$ (37)

The integrator builds a complete mapping from $\alpha$ to $\beta$,

$a \in \alpha \supset M(a) \in \beta$ (38)

and the integrated system functions normally. Now assume that at some future time, individual x will be born such that

$F(x \in \alpha \land \bar{A}x)$ (39)

Assuming that $a \in \alpha \supset M(a) \in \beta$ remains true, conceptual integrity, Eq. (34), will then demand

$F(\bar{A} \in O(\beta))$ (40)

which is not guaranteed. If the behavior of the engineered artifact is instead described by $A \in N(\beta)$, then $G(\bar{A} \notin O(\beta))$; the condition of Eq. (34) will be violated, and there will be a semantic fault.

With the bindings shown in Table 1, the above models the examples described in Sec. 5. In the first example, individual $x$ is the customer name "John Q. Fictional Company"; conceptual integrity fails because that name is associated with more than one customer, which is inconsistent with the customer name class as projected from the bubble wrap application. In the second example, individual $x$ is the time-of-day value 23:59:60; conceptual integrity fails because that time-of-day has seconds outside the range 0 to 59, which is inconsistent with the time-of-day class as represented in the failing application.
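The scenario of Eqs. (35)-(40) can be simulated over a short, hypothetical timeline. This Python sketch assumes a complete mapping as in Eq. (38), represents individuals by their property sets, and uses `~A` to stand for $\bar{A}$. Each time step lists the members of $\alpha$. An artifact that behaves as $A \in N(\beta)$ is modeled by an $O(\beta)$ that omits `~A` at all times, while a tolerant artifact includes it.

```python
# Hypothetical timeline: members of alpha at each step, as property sets.
timeline = [
    [{"A"}],                 # t = 0
    [{"A"}, {"A"}],          # t = 1: more individuals, all with A
    [{"A"}, {"A"}, {"~A"}],  # t = 2: individual x appears with ~A (Eq. (39))
]


def first_fault(timeline, o_beta):
    """Earliest step at which some mapped individual has a property outside O(beta).

    Assumes every member of alpha has an analog in beta (complete mapping).
    Returns None if Eq. (34) holds at every step.
    """
    for t, members in enumerate(timeline):
        if any(not props <= o_beta for props in members):
            return t
    return None


strict_beta = {"A"}          # artifact built as if A were in N(beta)
tolerant_beta = {"A", "~A"}  # artifact that also admits ~A

print(first_fault(timeline, strict_beta))    # 2
print(first_fault(timeline, tolerant_beta))  # None
```

The strictly built artifact functions normally until the individual with $\bar{A}$ appears at step 2. The tolerant one never violates Eq. (34) on this timeline.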

7. Analogies to Ref. [3]

Reference [3] defines essential, rigid, non-rigid, and anti-rigid as properties of properties. The definitions are made in terms of properties, individuals, and instances of properties (i.e., individuals that have that property). Classes as such are subsumed by properties that completely characterize them.

* A property is essential to an individual if and only if it necessarily holds for that individual at every possible time in every possible world.

* A property is rigid if and only if, necessarily, it is essential to all of its instances.

* A property is non-rigid if and only if it is not rigid.

* A property is anti-rigid if and only if it is not essential to any of its instances.

The concern whether a property is essential to an individual is different from the concern whether a property is necessary for membership in a class. These two concerns may become inextricable when classes are defined intensionally (when the possession of a given set of properties strictly implies class membership), but they do not when class membership is primitive. This divergence makes it difficult to construct valid analogies between the content of this paper and that of Ref. [3], despite apparent similarities.

Returning to the definitions of Sec. 4.1, one could draw limited, perhaps strained, analogies. Given a class $\alpha$ and a property $A$, one could say that $A$ is rigid within $\alpha$ if $A \in N(\alpha)$, and non-rigid if $\bar{A} \in O(\alpha)$. But class-centered analogs to essential and anti-rigid would require an intensional viewpoint.
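The two analogies are simple enough to state as predicates. The Python sketch below assumes $N(\alpha)$ and $O(\alpha)$ are given as sets of property names, with `~` marking negation; the function names are hypothetical. Under an extensional reading ($N$ = properties every possible member has, $O$ = properties some possible member has), $A \in N(\alpha)$ holds exactly when $\bar{A} \notin O(\alpha)$. The two predicates are then complements, mirroring the definition of non-rigid as "not rigid".

```python
def negate(prop: str) -> str:
    """Return the name of the negated property (A <-> ~A)."""
    return prop[1:] if prop.startswith("~") else "~" + prop


def rigid_within(prop: str, n_alpha: set) -> bool:
    """Analog of 'rigid within alpha': A is in N(alpha)."""
    return prop in n_alpha


def non_rigid_within(prop: str, o_alpha: set) -> bool:
    """Analog of 'non-rigid within alpha': the negation of A is in O(alpha)."""
    return negate(prop) in o_alpha


# Hypothetical class Dog, with N and O consistent with each other.
n_dog = {"animal"}
o_dog = {"animal", "barks", "~barks"}

for prop in ("animal", "barks"):
    print(prop, rigid_within(prop, n_dog), non_rigid_within(prop, o_dog))
# animal True False
# barks False True
```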

8. Relationship to Context Logic

In works about context logic it is common to use the notation ist(c, p) to signify that proposition p is true in the context c [5]. That convention is adopted here.

Context is broadly interpreted and can be used in lieu of many specialized modals. One can identify contexts corresponding to spans of time, a particular person's beliefs, etc.

In the case of data integration, it is natural to identify contexts corresponding to the schemata being integrated and then make assertions about what is true of various classes in those contexts. For example, if $C_1$ is the context of a leap-seconds-cognizant time service, $C_2$ is the context of some application, and $\tau$ is "the" time-of-day class, then one would write the following, or something equivalent:

$\mathit{ist}(C_1, \mathit{SecondsMayExceed59}(\tau))$ (41)

$\mathit{ist}(C_2, {\sim}\mathit{SecondsMayExceed59}(\tau))$ (42)

"The" time-of-day class is an abstraction inherited from a common context, such as a global schema. Its specializations in contexts $C_1$ and $C_2$ disagree with respect to the predicate SecondsMayExceed59. If the reference to a common context is eliminated, then there is no basis for discussion of whether the classes in $C_1$ and $C_2$ are compatible.
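A toy encoding makes the dependence on shared vocabulary concrete. In the Python sketch below (all names hypothetical), each context is a dictionary from ground propositions to truth values. `ist` looks a proposition up in a context, and `disagreements` compares two contexts only on the propositions both of them mention. A third context that expresses the same constraint as $C_2$ under a different predicate name yields no detected disagreement at all, because there is nothing in common to compare.

```python
from typing import Dict, Set, Tuple

Prop = Tuple[str, str]  # (predicate, argument)

contexts: Dict[str, Dict[Prop, bool]] = {
    "C1": {("SecondsMayExceed59", "tau"): True},    # leap-seconds-cognizant time service
    "C2": {("SecondsMayExceed59", "tau"): False},   # some application
    "C3": {("SecondsAlwaysBelow60", "tau"): True},  # like C2, but different vocabulary
}


def ist(context: str, prop: Prop) -> bool:
    """ist(c, p): whether p is true in context c; KeyError if c says nothing about p."""
    return contexts[context][prop]


def disagreements(c1: str, c2: str) -> Set[Prop]:
    """Propositions mentioned by both contexts but given different truth values."""
    shared = contexts[c1].keys() & contexts[c2].keys()
    return {p for p in shared if ist(c1, p) != ist(c2, p)}


print(disagreements("C1", "C2"))  # {('SecondsMayExceed59', 'tau')}
print(disagreements("C1", "C3"))  # set()
```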

The model presented in this paper does not require that classes from a common context be made explicit. It does rely on the assumption that properties have equivalent meanings in the contexts of the systems being integrated. However, this is analogous to the assumption that predicates such as SecondsMayExceed59 have the same meaning in multiple contexts.

Of course, there is nothing to prevent one from making logical statements about predicates in different contexts; e.g.,

$\mathit{ist}(C_2, \mathit{ValidTimestamp}(a) \supset \mathit{Seconds}(a) < 60)$ (43)

But the problem repeats itself. Unless Seconds has a common interpretation, nothing has been gained by contextualizing ValidTimestamp.

Ultimately, to make comparisons between two contexts, it is necessary to have some common vocabulary with which to conduct the discussion. The problem can be moved around but cannot be eliminated. As always, "there is no silver bullet" [1], but a change in viewpoint can sometimes help. The goal is to move the problem to where it causes the least amount of damage.

9. Conclusion

A logical model of conceptual integrity in data integration and a simple example application have been presented. Unlike constructive models that attempt to prevent semantic faults, this model allows both correct and incorrect integrations to be described. Imperfect legacy systems can therefore be modeled, allowing a more formal analysis of their flaws and the possible remedies.

Future work to extend the model could focus on better treatment of several issues that were glossed over or minimized.

* The important temporal dimension of conceptual integrity could be explored in more detail and modeled more precisely.

* The abstractions implicit in Adj. 1. implicit in - in the nature of something though not readily apparent; "shortcomings inherent in our approach"; "an underlying meaning"
underlying, inherent
 the act of integration (pieces of an implicit "integration schema") could be analyzed. A partial mapping from members of [alpha] to members of [beta] suggests an abstraction from [alpha] and [beta] that describes that part of the population that is "interesting" for the integration. A variant of formal concept analysis [21] may be applicable, as may currently evolving work on describing relations between ontologies [22].

* "Fuzzy" properties (i.e., where Aa is neither entirely true nor false, or is not known with certainty to be true; the two interpretations have different ramifications) could be explored. Additional analysis is needed to determine whether they add value. An infinite set of Boolean properties may render fuzzy properties redundant: if Aa is only "sort of" true, then it may be possible to derive a narrower "sub-property" that is fully true and another one that is fully false. On the other hand, it would be ill-advised to accept vague properties in the philosophical sense [23], [24], which defy objective evaluation.
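The suggestion that an infinite set of Boolean properties can stand in for a fuzzy one can be made concrete. The following Python sketch is one possible reading, not the paper's construction: it assumes a fuzzy property is given as a degree in [0, 1] and takes each Boolean "sub-property" to be a threshold cut of that degree (the alpha-cut of fuzzy set theory, unrelated to the population [alpha]). The property "tall", its 160 cm to 190 cm ramp, and all function names are hypothetical.

```python
from typing import Callable, Dict, List


# Hypothetical graded ("fuzzy") property: the degree, in [0, 1], to which
# "a is tall" holds for a member of the given height in centimetres.
# The linear ramp from 160 cm to 190 cm is an illustrative assumption.
def tall_degree(height_cm: float) -> float:
    return min(1.0, max(0.0, (height_cm - 160.0) / 30.0))


def sub_property(degree: Callable[[float], float],
                 threshold: float) -> Callable[[float], bool]:
    """Boolean sub-property: a member has it exactly when degree >= threshold.

    Every threshold in (0, 1] yields its own ordinary true/false property,
    so the family of sub-properties is infinite.
    """
    return lambda a: degree(a) >= threshold


def evaluate_cuts(degree: Callable[[float], float], a: float,
                  thresholds: List[float]) -> Dict[float, bool]:
    """Evaluate a finite sample of the sub-properties on one member a."""
    return {t: sub_property(degree, t)(a) for t in thresholds}


print(evaluate_cuts(tall_degree, 175.0, [0.25, 0.5, 0.75]))
```

For a 175 cm member the degree is 0.5, so the sub-property at threshold 0.25 is fully true and the one at 0.75 is fully false; neither requires a truth value between true and false.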
Table 1. Bindings for Sec. 5 examples

         | Boxes                                                   | Time
[alpha]  | Customer name as projected from box company's database  | Time-of-day as delivered by leap-second-cognizant time service
[beta]   | Customer name as projected from bubble wrap application | Time-of-day as represented in some application
A        | Associated with exactly one customer                    | Has seconds in range 0 ... 59
[bar.A]  | Not associated with exactly one customer                | Does not have seconds in range 0 ... 59
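The bindings in Table 1 can be exercised mechanically. The following is a minimal Python sketch, assuming that a semantic fault arises when a member of [alpha] for which [bar.A] holds is handed to a system that presumes A for every member of [beta]. The customer records, the TimeOfDay type, and all function names are hypothetical illustrations; only the meanings of A and [bar.A] in each column are taken from the table.

```python
from collections import Counter
from typing import Callable, Iterable, List, NamedTuple, TypeVar

T = TypeVar("T")


def members_with_bar_a(population: Iterable[T],
                       a: Callable[[T], bool]) -> List[T]:
    """Members of the population for which A is false, i.e. [bar.A] holds."""
    return [m for m in population if not a(m)]


# Boxes column (hypothetical data): (customer_id, customer_name) records in
# the box company's database; [alpha] is the set of names projected from them.
box_records = [(1, "Acme"), (2, "Zenith"), (3, "Acme")]
customers_per_name = Counter(name for _, name in box_records)
alpha_names = sorted(customers_per_name)


def associated_with_exactly_one_customer(name: str) -> bool:
    return customers_per_name[name] == 1


# Time column. Assumption: a leap-second-cognizant time service may deliver
# second == 60 (a positive leap second is labeled 23:59:60 UTC).
class TimeOfDay(NamedTuple):
    hour: int
    minute: int
    second: int


def has_seconds_in_0_to_59(t: TimeOfDay) -> bool:
    return 0 <= t.second <= 59


alpha_times = [TimeOfDay(12, 0, 0), TimeOfDay(23, 59, 59), TimeOfDay(23, 59, 60)]

print(members_with_bar_a(alpha_names, associated_with_exactly_one_customer))
print(members_with_bar_a(alpha_times, has_seconds_in_0_to_59))
```

The first call reports the name "Acme", which is shared by two customers; the second reports the leap-second value 23:59:60. In each column these are the members of [alpha] that a receiving system presuming A would misinterpret.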


Acknowledgments

The author thanks all those whose reviews and suggestions have improved this paper, including Edward Barkmeyer, Conrad Bock, Peter Denno, Allison Barnard Feeney, Simon Frechette, Michael Gruninger, Nenad Ivezic, Sharon Kemmerer, Donald Libes, Leo Obrst, Steve Ray, Michelle Steves, and Evan Wallace.

Accepted: November 4, 2003

Available online: http://www.nist.gov/jres

10. References

[1] F. P. Brooks, Jr., The Mythical Man-Month: Essays on Software Engineering, 20th anniversary edition, Addison-Wesley (1995).

[2] E. J. Barkmeyer, A. B. Feeney, P. Denno, D. W. Flater, D. E. Libes, M. P. Steves, and E. K. Wallace, Concepts for Automating Systems Integration, NISTIR 6928, National Institute of Standards and Technology (2003).

[3] N. Guarino and C. Welty, Identity and Subsumption, LADSEBCNR Internal Report 01/2001, August 7, 2001. Available at http://www.ladseb.pd.cnr.it/infor/Ontology/Papers/Identity&Subsumption.pdf.

[4] J. McCarthy, Generality in artificial intelligence, Commun. ACM 30 (12), 1030-1035 (1987). Also available at http://citeseer.nj.nec.com/mccarthy87generality.html.

[5] J. McCarthy and S. Buvac, Formalizing context (expanded notes), in Working Papers of the AAAI Fall Symposium on Context in Knowledge Representation and Natural Language (1997) pp. 99-135. Also available in A. Aliseda, R. J. van Glabbeek, and D. Westerstahl, eds., Computing Natural Language, CSLI Lecture Notes 81, Stanford University (1998), pp. 13-50, and at http://citeseer.nj.nec.com/mccarthy97formalizing.html.

[6] A. Farquhar, A. Dappert, R. Fikes, and W. Pratt, Integrating information sources using context logic, in Proceedings of the AAAI Spring Symposium on Information Gathering from Distributed Heterogeneous Environments (1995). Also available as technical report KSL-95-12, Knowledge Systems Laboratory, Stanford University (1995) and at http://citeseer.nj.nec.com/farquhar95integrating.html.

[7] R. V. Guha, Contexts: a formalization and some applications, doctoral dissertation, Stanford University (1991). Also available at http://www-formal.stanford.edu/buvac/guha-thesis.ps.

[8] C. Ghidini and F. Giunchiglia, Local models semantics, or contextual reasoning = locality + compatibility, Artificial Intelligence 127 (2), 221-259 (2001). Also available at http://dit.unitn.it/~fausto/ps/GG97b.ps.

[9] V. Kashyap and A. Sheth, Semantic and schematic similarities between database objects: a context-based approach, Very Large Databases J. 5 (4), 276-304 (1996). Also available at http://lsdis.cs.uga.edu/lib/download/KS95b.pdf.

[10] L. V. S. Lakshmanan, F. Sadri, and I. N. Subramanian, On the logical foundations of schema integration and evolution in heterogeneous database systems, in Lecture Notes in Computer Science #760, Proceedings of the 3rd International Conference on Deductive and Object-oriented Databases (DOOD '93), S. Ceri, K. Tanaka, and S. Tsur, eds., Springer-Verlag (1993) pp. 81-100. Also available at http://citeseer.nj.nec.com/lakshmanan93logical.html.

[11] P. Johannesson, A logic based approach to schema integration, in Proceedings of the 10th International Conference on Entity-Relationship Approach, T. Teorey, ed., North-Holland (1991) pp. 280-292. Also available at http://www.dsv.su.se/~pajo/abstracts/er91.html.

[12] A. Korzybski, Science and Sanity: An Introduction to Non-Aristotelian Systems and General Semantics, 5th edition, Institute of General Semantics (1994).

[13] E. N. Zalta, ed., Stanford Encyclopedia of Philosophy, http://plato.stanford.edu/ (2003).

[14] H. Kahane, Logic and Philosophy, 5th edition, Wadsworth Publishing Company, Belmont, California (1986).

[15] A. R. Anderson, N. D. Belnap, and J. M. Dunn, Entailment: The Logic of Relevance and Necessity, Princeton University Press, Princeton, New Jersey (1992).

[16] G. Hardegree, Introduction to Modal Logics, http://www-unix.oit.umass.edu/~gmhwww/511/text.htm (2003).

[17] N. Guarino and A. Persidis, Onto Web Deliverable 3.5: Evaluation Framework for Content Standards, http://ontoweb.aifb.uni-karlsruhe.de/Members/ruben/Deliverable%203.5 (2003).

[18] W. Kim, I. Choi, S. Gala, and M. Scheevel, On resolving schematic heterogeneity in multidatabase systems, Distributed Parallel Databases 1, 251-279 (1993).

[19] C. Naiman and A. Ouksel, A classification of semantic conflicts in heterogeneous database systems, J. Organizational Comput. 5 (2), 167-193 (1995).

[20] Leap second and UT1-UTC information, NIST Time Scale Data Archive, http://www.boulder.nist.gov/timefreq/pubs/bulletin/leapsecond.htm (2003).

[21] B. Ganter and R. Wille, Formal Concept Analysis: Mathematical Foundations, Springer-Verlag, Berlin, Heidelberg, and New York (1999).

[22] Standard Upper Ontology Information Flow Framework, starter document, IEEE P1600.1 Standard Upper Ontology Working Group, http://suo.ieee.org/IFF/ (2003).

[23] T. Williamson, Vagueness, Routledge, London and New York (1996).

[24] R. Keefe and P. Smith, eds., Vagueness: A Reader, MIT Press, Cambridge, Massachusetts (1999).

David Flater

National Institute of Standards and Technology, Gaithersburg, MD 20899-8264

david.flater@nist.gov

About the author: David Flater is a Computer Scientist in the Manufacturing Systems Integration Division of the NIST Manufacturing Engineering Laboratory. The National Institute of Standards and Technology is an agency of the Technology Administration, U.S. Department of Commerce.
COPYRIGHT 2003 National Institute of Standards and Technology
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2003, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Author:Flater, David
Publication:Journal of Research of the National Institute of Standards and Technology
Date:Sep 1, 2003
Words:4959