Information modeling and relational databases.

Chapter 1. Information Modeling

It's an unfortunate fact of life that names and numbers can sometimes be misinterpreted.

This can prove costly, as experienced by senior citizens who had their social security benefits cut off when government agencies incorrectly pronounced them dead because of misreading "DOD" on hospital forms as "date of death" rather than the intended "date of discharge".

A more costly incident occurred in 1999 when NASA's $125 million Mars Climate Orbiter burnt up in the Martian atmosphere. Apparently, errors in its course settings arose from a failure to make a simple unit conversion. One team worked in U.S. Customary units and sent its data to a second team working in metric, but no conversion was made. If a man weighs 180, does he need to go on a drastic diet? No, if his mass is 180 lb, but yes if it's 180 kg. Data by itself is not enough. What we really need is information, the meaning or semantics of the data. Since computers lack common sense, we need to pay special attention to semantics when we use computers to model some aspect of reality.
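The point that a bare number such as 180 carries no meaning until its unit is known can be sketched in a few lines of Python. This is only an illustration: the Mass class, its field names, and the unit labels are assumptions made for this example, not anything defined in the text.

```python
from dataclasses import dataclass

# Conversion factors to kilograms (1 lb is defined as 0.45359237 kg).
KG_PER_UNIT = {"kg": 1.0, "lb": 0.45359237}


@dataclass(frozen=True)
class Mass:
    """A number paired with its unit (hypothetical helper for this example)."""
    value: float
    unit: str  # "kg" or "lb"

    def in_kg(self) -> float:
        # Fails with KeyError on an unknown unit instead of guessing one.
        return self.value * KG_PER_UNIT[self.unit]


light = Mass(180, "lb")
heavy = Mass(180, "kg")

print(round(light.in_kg(), 1))  # 81.6
print(heavy.in_kg())            # 180.0
```

Both values are "180" as raw data, yet they describe very different facts once the unit is attached.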

This book provides a modern introduction to database systems, with the emphasis on information modeling. At its heart is a very high level semantic approach that is fact-oriented in nature. If you model databases using either traditional or object-oriented approaches, you'll find that fact orientation lifts your thinking to a higher level, illuminating your current way of doing things. Even if you're a programmer rather than a database modeler, this semantic approach provides a natural and powerful way to design your data structures.

A database is basically a collection of related data (e.g., a company's personnel records). When interpreted by humans, a database may be viewed as a set of related facts--an information base. In the context of our semantic approach, we'll often use the popular term "database" instead of the more technical "information base". Discovering the kinds of facts that underlie a business domain, and the rules that apply to the facts, is interesting and revealing. The quality of the database design used to capture these facts and rules is critical. Just as a house built from a good architectural plan is more likely to be safe and convenient for living, a well-designed database simplifies the task of ensuring that its facts are correct and easy to access. Let's review some basic ideas about database systems, and then see how things can go wrong if they are poorly designed.

Each database models a business domain--we use this term to describe any area of interest, typically a part of the real world. Consider a library database. As changes occur in the library (e.g., a book is borrowed) the database is updated to reflect these changes. This task could be performed manually using a card catalog, or be automated with an online catalog, or both. Our focus is on automated databases. Sometimes these are implemented by means of special-purpose computer programs, coded in a general-purpose programming language (e.g., C#). More often, database applications are developed using a database management system (DBMS). This is a software system for maintaining databases and answering queries about them (e.g., DB2, Oracle, SQL Server). The same DBMS may handle many different databases.

Typical applications use a database to house the persistent data, an in-memory object model to hold transient data, and a friendly user interface for users to enter and access data. All these structures deal with information and are best derived from an information model that clearly reveals the underlying semantics of the domain. Some tools can use information models to automatically generate not just databases, but also object models and user interfaces.

If an application requires maintenance and retrieval of lots of data, a DBMS offers many advantages over manual record keeping. Data may be conveniently captured via electronic interfaces (e.g., screen forms), then quickly processed and stored compactly on disk. Many data errors can be detected automatically, and access rights to data can be enforced by the system. People can spend more time on creative design rather than on routine tasks more suited to computers. Finally, developing and documenting the application software can be facilitated by use of computer-assisted software engineering (CASE) tool support.

In terms of the dominant employment group, the Agricultural Age was supplanted late in the 19th century by the Industrial Age, which is now replaced by the Information Age. With the ongoing information explosion and mechanization of industry, the proportion of information workers is steadily rising. Most businesses achieve significant productivity gains by exploiting information technology. Imagine how long a newspaper firm would last if it returned to the methods used before word processing and computerized typesetting. Apart from its enabling employment opportunities, the ability to interact efficiently with information systems empowers us to exploit their information content.

Although most employees need to be familiar with information technology, there are vast differences in the amount and complexity of information management tasks required of these workers. Originally, most technical computer work was performed by computer specialists such as programmers and systems analysts. However, the advent of user-friendly software and powerful, inexpensive personal computers led to a redistribution of computing power. End users now commonly perform many information management tasks, such as spreadsheeting, with minimal reliance on professional computer experts.

This trend toward more users "driving" their own computer systems rather than relying on expert "chauffeurs" does not eliminate the need for computer specialists. There is still a need for programming in languages such as C# and Java. However, there is an increasing demand for high level skills such as modeling complex information systems.

The area of information systems engineering includes subdisciplines such as requirements analysis, database design, user interface design, and report writing. In one way or another, all these subareas deal with information. Since the database design phase selects the underlying structures to capture the relevant information, it is of central importance.

To highlight the need for good database design, let's consider the task of designing a database to store movie details such as those shown in Table 1.1. The header of this table is shaded to help distinguish it from the rows of data. Even if the header is not shaded, we do not count it as a table row. The first row of data is fictitious.

Different movies may have the same title (e.g., The Secret Garden). Hence movie numbers are used to provide a simple identifier. We interpret the data in terms of facts.

For example, movie 5 has the title The DaVinci Code, was released in 2006, was directed by Ron Howard, and starred Tom Hanks, Ian McKellen, and Audrey Tautou. Movie 1, titled Cosmology, had no stars (it is a documentary). This table is an output report. It provides one way to view the data. This might not be the same as how the data is actually stored in a database.

In Table 1.1 each cell (row--column slot) may contain many values. For example, Movie 3 has two stars recorded in the row 3, column 5 cell. Some databases allow a cell to contain many values like this, but in a relational database each table cell may hold at most one value. Since relational database systems are dominant in the industry, our implementation discussion focuses on them. How can we design a relational database to store these facts?

Suppose we use the structure shown in Table 1.2. This has one entry in each cell. Here, "?" denotes a null (no star is recorded for Cosmology). Some DBMSs display nulls differently (e.g., "<NULL>" or a blank space). To help distinguish the rows, we've included lines between them. But from now on, we'll omit lines between rows. Each relational table must be named. Here we called the table "Movie". See if you can spot the problem with this design before reading on.

The table contains redundant information. For example, the facts that movie 5 is titled The DaVinci Code, was released in 2006, and was directed by Ron Howard are shown three times (once for each star). We might try to fix this by deleting the extra copies in the movieTitle, releaseYr, and director columns, but this artificially makes some rows special and introduces problems with nulls.
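The redundancy can be made visible with a small sketch using Python's built-in sqlite3 module. The column names movieNr, movieTitle, releaseYr, and director come from the text; the name of the fifth column (star) and the use of SQLite are assumptions made for this illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One flat table in the style of Table 1.2: one row per (movie, star) pair.
conn.execute("""
    CREATE TABLE Movie (
        movieNr    INTEGER,
        movieTitle TEXT,
        releaseYr  INTEGER,
        director   TEXT,
        star       TEXT      -- column name assumed for this sketch
    )""")

conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?, ?, ?)",
    [
        (5, "The DaVinci Code", 2006, "Ron Howard", "Tom Hanks"),
        (5, "The DaVinci Code", 2006, "Ron Howard", "Ian McKellen"),
        (5, "The DaVinci Code", 2006, "Ron Howard", "Audrey Tautou"),
    ],
)

# How many rows repeat the title/year/director facts for movie 5?
copies = conn.execute(
    "SELECT COUNT(*) FROM Movie WHERE movieNr = 5").fetchone()[0]

# How many distinct movies do those rows actually describe?
distinct_movies = conn.execute("""
    SELECT COUNT(*) FROM (
        SELECT DISTINCT movieNr, movieTitle, releaseYr, director FROM Movie
    )""").fetchone()[0]

print(copies, distinct_movies)  # 3 1
```

Three rows carry the same title, release year, and director, although only one movie is being described.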

In addition to wasting space, the Table 1.2 design can lead to errors. For example, there is nothing to stop us adding a row for movie 2 with a different title (e.g., Kung Fun), a different release year, a different director, and another star. Our database would then be inconsistent with the business domain, where a movie has only one title and release year, and only one director is to be recorded (1). The corrected design uses two relational tables, Movie and Starred (Figure 1.1). The table design is shown in schematic form above the populated tables. In this example, a movie may be identified either by its movie number or by the combination of its title, release year, and director. In database terminology, each of these identifiers provides a candidate key for the Movie table, shown here by underlining each identifier.

In this case, we chose movieNr as the primary way to identify movies throughout the database. This is shown here by doubly underlining the movieNr column to indicate that it is the primary key of the Movie table. If a table has only one candidate key, a single underline denotes the primary key.

The constraints that each movie has only one title, release year, and director are enforced by checking that each movie number occurs only once in the Movie table. The constraints that each movie must have a title, release year, and director are enforced by checking that all movies occur in the Movie table and excluding nulls from the title, release year, and director columns. In the schema, this is captured by the dotted arrow (indicating that if a movie is listed in the Starred table it must be listed in the Movie table) and by not marking any columns as optional. In relational database terms, this arrow depicts a foreign key constraint, where the movieNr column in the Starred table is a foreign key referencing the primary key of the Movie table. The primary key of the Starred table is the combination of its columns, indicated here by underlining.
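The two-table design and its constraints might be declared as follows. This is a minimal sketch using Python's sqlite3 module, not the book's own DDL: the column name star, the helper function rejected, and the sample rows for movies 6 and 99 are assumptions introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("""
    CREATE TABLE Movie (
        movieNr    INTEGER PRIMARY KEY,                -- primary key
        movieTitle TEXT    NOT NULL,
        releaseYr  INTEGER NOT NULL,
        director   TEXT    NOT NULL,
        UNIQUE (movieTitle, releaseYr, director)       -- second candidate key
    )""")
conn.execute("""
    CREATE TABLE Starred (
        movieNr INTEGER NOT NULL REFERENCES Movie (movieNr),  -- foreign key
        star    TEXT    NOT NULL,                              -- name assumed
        PRIMARY KEY (movieNr, star)
    )""")

conn.execute("INSERT INTO Movie VALUES (5, 'The DaVinci Code', 2006, 'Ron Howard')")
conn.executemany(
    "INSERT INTO Starred VALUES (?, ?)",
    [(5, "Tom Hanks"), (5, "Ian McKellen"), (5, "Audrey Tautou")],
)


def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A second title for movie 5 violates the primary key.
second_title = rejected("INSERT INTO Movie VALUES (5, 'Another Title', 2006, 'Ron Howard')")
# A movie without a title violates NOT NULL (movie 6 is hypothetical).
missing_title = rejected("INSERT INTO Movie VALUES (6, NULL, 2006, 'Ron Howard')")
# A star for a movie not listed in Movie violates the foreign key (99 is hypothetical).
unknown_movie = rejected("INSERT INTO Starred VALUES (99, 'Tom Hanks')")

print(second_title, missing_title, unknown_movie)  # True True True
```

Each fact about movie 5 is now stored once, and the DBMS itself blocks the inconsistent rows that the single-table design would have accepted.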

These concepts and notations are fully explained later in the book. Even with this simple example, care is needed for database design. With complex cases, the design problem is much more challenging. The rest of this book is largely concerned with helping you to meet such challenges. Designing databases is both a science and an art. When supported by a good method, this design process is a stimulating and intellectually satisfying activity, with tangible benefits gained from the quality of the database applications produced. The next section explains why Object-Role Modeling (ORM) is chosen as our first modeling method. Later sections provide historical background and highlight the essential communication skills. The chapter concludes with a summary and a supplementary note section, including references for further reading.

1.2 Modeling Approaches

When we design a database for a particular business domain, we create a model of it. Technically, the business domain being modeled is called the universe of discourse (UoD), since it is the universe (or world) that we are interested in discoursing (or talking) about. The UoD or business domain is typically "part" of the "real world". To build a good model requires a good understanding of the world we are modeling, and hence is a task ideally suited to people rather than machines. The main challenge is to describe the UoD clearly and precisely.

Great care is required, since errors introduced at this stage filter through to later stages in software development. The later the errors are detected, the more expensive they are to remove.

A person who models the UoD is called a modeler. If we are familiar with the business domain, we may do the modeling ourselves. If not, we should consult with others who, at least collectively, understand the business domain. These people are called domain experts or subject matter experts. Modeling is a collaborative activity between the modeler and the domain expert.

Since people naturally communicate (to themselves or others) with words, pictures, and examples, the best way to arrive at a clear description of the UoD is to use natural language, intuitive diagrams, and examples. To simplify the modeling task, we examine the information in the smallest units possible: one fact at a time.

The model should first be expressed at the conceptual level, in concepts that people find easy to work with. Figure 1.1 depicted a model in terms of relational database structures. This is too far removed from natural language to be called conceptual. Instead, relational database structures are at the level of a logical data model. Other logical data models exist (e.g., network, XML schema, and object-oriented approaches), and each DBMS is aligned with at least one of these. However, in specifying a draft conceptual design, the modeler should be free of implementation concerns. It is a hard enough job already to develop an accurate model of the UoD without having to worry at the same time about how to translate the model into data structures specific to a chosen DBMS.

Implementation concerns are of course important, but should be ignored in the early stages of modeling. Once an initial conceptual design is created, it can be mapped down to a logical design in any data model we like. This flexibility also makes it easier to implement and maintain the same application on more than one kind of DBMS.

Although most applications involve processes as well as data, we'll focus on the data, because this perspective is more stable, and processes depend on the underlying data. Three information modeling approaches are discussed: Entity-Relationship modeling (ER), fact-oriented modeling, and object-oriented modeling.

Any modeling method comprises a notation as well as a procedure for using the notation to construct models. To seed the data model in a scientific way, we need examples of the kinds of data that the system is expected to manage. We call these examples data use cases, since they are cases of data being used by the system. They can be output reports, input screens, or forms and can present information in many ways (tables, forms, graphs, etc.). Such examples may already exist as manual or computer records. Sometimes the application is brand new, or an improved solution or adaptation is required. If needed, the modeler constructs new examples by discussing the application area with the domain expert.

As an example, suppose our information system has to output room schedules like that shown in Table 1.3. Let's look at some different approaches to modeling this. It is not important that you understand details of the different approaches at this stage. The concepts are fully explained in later chapters.

Entity-Relationship modeling was introduced by Peter Chen in 1976 and is still the most widely used approach for data modeling. It pictures the world in terms of entities that have attributes and participate in relationships. Over time, many versions of ER arose. There is no single, standard ER notation.

Different versions of ER may support different concepts and may use different symbols for the same concept. Figure 1.2 uses a popular ER notation long supported by CASE tools from Oracle Corporation. Here, entity types are shown as named, soft rectangles (rounded corners). Attributes are listed below the entity type names. An octothorpe "#" indicates the attribute is a component of the primary identifier for the entity type, and an asterisk "*" means the attribute is mandatory. Here, an ellipsis "..." indicates other attributes exist but their display is suppressed.

[FIGURE 1.2 OMITTED]

Relationships are depicted as named lines connecting entity types. Only binary relationships are allowed, and each half of the relationship is shown either as a solid line (mandatory) or as a broken line (optional). For example, each RoomHourSlot must have a Room, but it is optional whether a Room is involved in a RoomHourSlot. A bar across one end of a relationship indicates that the relationship is a component of the primary identifier for the entity type at that end. For example, RoomHourSlot is identified by combining its hour and room. Room is identified by its room number, and Activity by its activity code.

A fork or "crow's foot" at one end of a relationship indicates that many instances of the entity type at that end may be associated (via that relationship) with the same entity instance at the other end of the relationship. The lack of a crow's foot indicates that at most one entity instance at that end is associated with any given entity instance at the other end. For example, an Activity may be allocated many RoomHourSlots, but each RoomHourSlot is booked for at most one Activity.

To its credit, this ER diagram portrays the domain in a way that is independent of the target software platform. For example, classifying a relationship end as mandatory is a conceptual issue. There is no attempt to specify here how this constraint is implemented (e.g., using mandatory columns, foreign key references, or object references). However, the ER diagram is incomplete (can you spot any missing constraints?).

Moreover, the move from the data use case to the model is not obvious. While an experienced ER modeler might immediately see that an entity type is required to model RoomHourSlot, this step might be challenging to a novice modeler.
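As a rough illustration of the rule read from the crow's foot notation (an Activity may be allocated many RoomHourSlots, but each RoomHourSlot is booked for at most one Activity), here is one possible in-memory sketch in Python. It is only one of many ways the constraint could be implemented; the function book, the variable names, and the sample values (including the activity code 'XYZ') are assumptions made for this example.

```python
from collections import defaultdict

# Each (room number, hour) slot maps to at most one activity code,
# because a dictionary key can hold only one value.
bookings = {}


def book(room_nr, hour, activity_code):
    """Record a booking, refusing a second activity for an occupied slot."""
    slot = (room_nr, hour)
    if bookings.get(slot, activity_code) != activity_code:
        raise ValueError(f"slot {slot} is already booked for {bookings[slot]}")
    bookings[slot] = activity_code


book("20", "Mon 9 a.m.", "ORC")
book("20", "Mon 10 a.m.", "ORC")  # same activity in a second slot: allowed

try:
    book("20", "Mon 9 a.m.", "XYZ")  # second activity for an occupied slot
    clash_rejected = False
except ValueError:
    clash_rejected = True

# Going the other way, one activity may own many slots.
slots_by_activity = defaultdict(list)
for slot, code in bookings.items():
    slots_by_activity[code].append(slot)

print(len(slots_by_activity["ORC"]), clash_rejected)  # 2 True
```

The "many" side shows up as a list of slots per activity, while the "at most one" side is enforced by keying on the slot.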

Let's see if fact-oriented modeling can provide some help. Our treatment of fact orientation focuses on Object-Role Modeling. ORM began in the early 1970s as a semantic modeling approach that views the world simply in terms of objects (things) playing roles (parts in relationships). For example, you are now playing the role of reading this book, and the book is playing the role of being read. ORM has appeared in a variety of forms such as Natural-language Information Analysis Method (NIAM).

The version discussed in this book is based on extensions to NIAM and is supported by industrial software tools. Regardless of how data use cases appear, a domain expert familiar with their meaning should be able to verbalize their information content in natural language sentences.

It is the modeler's responsibility to transform that informal verbalization into a formal yet natural verbalization that is clearly understood by the domain expert. These two verbalizations, one by the domain expert transformed into one by the modeler, comprise steps 1a and 1b of ORM's conceptual analysis procedure. Here we verbalize sample data as fact instances that are then abstracted to fact types. Constraints and perhaps derivation rules are then added, and themselves validated by verbalization and sample fact populations.

To get a feeling of how this works in ORM, suppose that our system is required to output reports like Table 1.3. We ask the domain expert to read off the information contained in the table, and then we rephrase this in formal English. For example, the subject matter expert might express the facts on the top row of the table as follows:

Room 20 at 9 a.m. Monday is booked for the activity 'ORC' which has the name 'ORM class'.

As modelers, we rephrase this into two elementary sentences, identifying each object by a definite description: the Room numbered '20' at the HourSlot with day-hourcode 'Mon 9 a.m.' is booked for the Activity coded 'ORC'; the Activity coded 'ORC' has the ActivityName 'ORM class'. Once the domain expert agrees with this verbalization, we abstract from the fact instances to the fact types (i.e., the types or kinds of fact). We might then depict this structure on an ORM diagram and populate it with sample data and counter data (explained shortly) as shown in Figure 1.3.

[FIGURE 1.3 OMITTED]
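The step from fact instances to fact types can be loosely mirrored in code. In the sketch below, each fact type becomes a small Python record type and each fact instance a record, with a function that reproduces the modeler's formal verbalization. The class names Booking and ActivityNaming, the field names, and the verbalize function are assumptions made for this illustration; they are not part of ORM or of any ORM tool.

```python
from typing import NamedTuple


class Booking(NamedTuple):
    """Fact type: Room at HourSlot is booked for Activity."""
    room_nr: str
    hour_slot: str
    activity_code: str


class ActivityNaming(NamedTuple):
    """Fact type: Activity has ActivityName."""
    activity_code: str
    activity_name: str


def verbalize(fact):
    """Render a fact instance as an elementary sentence with definite descriptions."""
    if isinstance(fact, Booking):
        return (f"the Room numbered '{fact.room_nr}' "
                f"at the HourSlot with day-hourcode '{fact.hour_slot}' "
                f"is booked for the Activity coded '{fact.activity_code}'")
    if isinstance(fact, ActivityNaming):
        return (f"the Activity coded '{fact.activity_code}' "
                f"has the ActivityName '{fact.activity_name}'")
    raise TypeError(f"unknown fact type: {type(fact).__name__}")


# The two elementary facts read off the top row of the room schedule.
facts = [
    Booking("20", "Mon 9 a.m.", "ORC"),
    ActivityNaming("ORC", "ORM class"),
]

for fact in facts:
    print(verbalize(fact))
```

Reading the generated sentences back to the domain expert is the same validation step the text describes, applied to one fact at a time.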

By default, entity types are shown in ORM as named, soft rectangles (rounded corners) and must have a reference scheme, i.e., a way for humans to refer to instances of that type. Simple reference schemes may be shown in parentheses (e.g., "(.nr)"), as an abbreviation of the relevant association, e.g., Room has RoomNr. Value types such as types of character strings need no reference scheme and are shown as named, dashed, soft rectangles (e.g., ActivityName).

This book uses the notation of ORM 2 (second generation ORM), as supported by the NORMA (Neumont ORM Architect) tool, an open source plug-in to Microsoft Visual Studio .NET. The previous version of ORM, as supported by Microsoft Visio for Enterprise Architects, depicts object types as ellipses, not soft rectangles. As a configuration option, NORMA allows object types to be displayed as ellipses or hard rectangles. Unless indicated otherwise, in this book the term "ORM" is understood to mean ORM 2. When specific reference is made to the previous version of ORM, the term "ORM 1" is used. The ORM glossary at the end of this book includes a side-by-side comparison of ORM 1 and ORM 2 notations.

In ORM, a role is a part played in a fact type (relationship or association). A relationship is shown as a named sequence of one or more role boxes, each connected to the object type whose instances play that role. Figure 1.3 includes a ternary (three-role) association, Room at HourSlot is booked for Activity, and a binary (two-role) association, Activity has ActivityName.

Unlike ER, ORM makes no use of attributes in its base models. All facts are represented in terms of objects (entities or values) playing roles. Although this often leads to larger diagrams, an attribute-free approach has advantages for conceptual analysis, including simplicity, stability, and ease of validation. If you are used to modeling in ER or the Unified Modeling Language (UML), this approach may seem strange at first, but please keep an open mind about it. ORM allows relationships of any arity (number of roles). Each fact type has at least one predicate reading, corresponding to one way of traversing its roles. Any number of readings may be provided for each role ordering. For a binary association, forward and inverse predicate readings may be shown separated by a slash "/". As in logic, a predicate is a sentence with object holes in it.

Mixfix notation enables the object terms to be mixed in with the predicate reading at various positions (as required in languages such as Japanese). An object placeholder is indicated by an ellipsis "..." (e.g., the ternary predicate "... at ... is booked for ...").

For unary postfix predicates (e.g., "... smokes") or binary infix predicates (e.g., "... has ...") the ellipses may be omitted.
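The placeholder convention can be made concrete with a few lines of code. The following is only a sketch of the idea, not part of any ORM tool: the function names verbalize and expand_shorthand are our own, and the reading "smokes" is paired with a hypothetical Person object type.

```python
def verbalize(reading: str, *terms: str) -> str:
    """Fill the "..." placeholders of a predicate reading, left to right."""
    parts = reading.split("...")
    holes = len(parts) - 1
    if holes != len(terms):
        raise ValueError(f"reading has {holes} placeholders, got {len(terms)} terms")
    pieces = []
    for part, term in zip(parts, terms):
        pieces.append(part)
        pieces.append(term)
    pieces.append(parts[-1])
    # Collapse any doubled spaces left around the filled holes.
    return " ".join("".join(pieces).split())


def expand_shorthand(reading: str, arity: int) -> str:
    """Restore omitted ellipses for unary postfix and binary infix readings."""
    if "..." in reading:
        return reading
    if arity == 1:
        return f"... {reading}"
    if arity == 2:
        return f"... {reading} ..."
    raise ValueError("ellipses may only be omitted for unary or binary readings")


print(verbalize("... at ... is booked for ...", "Room", "HourSlot", "Activity"))
print(verbalize(expand_shorthand("has", 2), "Activity", "ActivityName"))
print(verbalize(expand_shorthand("smokes", 1), "Person"))
```

Because the placeholders are filled in order, the ordering of the roles is what turns a predicate reading into a sentence.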

For each fact type, a fact table may be added with a sample population to help validate the constraints. Each column in a fact table is associated with one role. The lines beside the role boxes depict internal uniqueness constraints, indicating which roles or role combinations must have unique entries. ORM schemas may be represented in diagrammatic or textual form, and some ORM tools can automatically transform between the two representations. Models are validated with domain experts in two main ways: verbalization and population.

For example, the uniqueness constraints on the ternary association verbalize as: For each Room and HourSlot, that Room at that HourSlot is booked for at most one Activity; For each HourSlot and Activity, at most one Room at that HourSlot is booked for that Activity.

The ternary fact table shows a satisfying population (each Room-HourSlot combination is unique, and each HourSlot-Activity combination is unique). The uniqueness constraints on the binary verbalize as: Each Activity has at most one ActivityName; Each ActivityName refers to at most one Activity. The 1:1 nature of this association is illustrated by the population, where each column entry occurs only once in its column.

The solid dot on Activity is a mandatory role constraint, indicating that each instance in the population of Activity must play that role. This verbalizes as Each Activity has some ActivityName. A role that is not mandatory is optional. Since sample data are not always significant, additional data (such as STM in the binary fact type) may be needed to illustrate some rules. The optionality of the other role played by Activity is shown by the absence of STM in its population.
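The difference between a mandatory and an optional role can be checked mechanically against a sample population. The sketch below assumes made-up data: only the activity codes ORC, XQC, and STM come from the discussion, while the activity names, the bookings, and the helper name missing_players are placeholders of our own.

```python
def missing_players(object_population, fact_rows, role_index):
    """Instances of the object type that do not play the given role."""
    players = {row[role_index] for row in fact_rows}
    return set(object_population) - players


# Population of the object type Activity (identified here by its code).
activities = {"ORC", "XQC", "STM"}

# Activity has ActivityName (the names are placeholders).
activity_names = [
    ("ORC", "name of ORC"),
    ("XQC", "name of XQC"),
    ("STM", "name of STM"),
]

# Room at HourSlot is booked for Activity (illustrative rows).
bookings = [
    ("20", "Mon 9 a.m.", "ORC"),
    ("33", "Tue 2 p.m.", "XQC"),
]

# Empty set: every Activity has some ActivityName, so the role can be mandatory.
print(missing_players(activities, activity_names, 0))

# {'STM'}: STM is never booked, which shows that this role is optional.
print(missing_players(activities, bookings, 2))
```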

Since ORM schemas can be specified in unambiguous sentences backed up by illustrative examples, it is not necessary for domain experts to understand the diagram notation at all. Modelers, however, find diagrams very useful for thinking about the universe of discourse. To double check a constraint, a counterexample to that constraint may be presented.

The counterrows appended to the fact tables test the uniqueness constraints. For instance, the first row and counterrow of the ternary indicate that room 20 at 9 a.m. Monday is booked for both the ORC and XQC activities. This challenges the constraint "For each Room and HourSlot, that Room at that HourSlot is booked for at most one Activity". This constraint may be recast in negative form as: It is impossible that the same Room at the same HourSlot is booked for more than one Activity. The counterexample provides a test case to see if this situation is actually possible.

Concrete examples help domain experts to decide whether something really is a rule. This additional validation step is very useful in cases where the domain expert's command of language suffers from imprecise or even incorrect use of logical terms (e.g., "each", "at least", "at most", "exactly", "the same", "more than", "if").

To challenge the constraint that at most one room at the same time is booked for the same activity, the first row and second counterrow of the ternary fact table in Figure 1.3 indicate that both room 20 and room 33 are used at 9 a.m. Monday for the ORC activity. Is this kind of thing possible? If it is (and for some application domains it would be) then this constraint is not a rule, in which case the constraint should be dropped and the counterrow added to the sample data. However, if our business does not allow two rooms to be used at the same time for the same activity, then the constraint is validated and the counterexample is rejected (although it can be retained as an illustrative counterexample).
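Validation by population, including the use of counterrows, can be sketched in a few lines of code. The population below is illustrative rather than a copy of the fact table in Figure 1.3; the two counterrows follow the ones described above, and the helper names duplicates and broken_by are our own.

```python
from collections import Counter

ROOM, HOURSLOT, ACTIVITY = 0, 1, 2


def duplicates(rows, roles):
    """Return the value combinations that appear more than once over `roles`."""
    counts = Counter(tuple(row[i] for i in roles) for row in rows)
    return sorted(combo for combo, n in counts.items() if n > 1)


# Illustrative sample population that satisfies both constraints.
population = [
    ("20", "Mon 9 a.m.", "ORC"),
    ("20", "Tue 2 p.m.", "ORC"),
    ("33", "Tue 2 p.m.", "XQC"),
]

# The two uniqueness constraints on the ternary, each spanning two roles.
constraints = {
    "Room-HourSlot": (ROOM, HOURSLOT),
    "HourSlot-Activity": (HOURSLOT, ACTIVITY),
}


def broken_by(counterrow):
    """Names of the constraints violated once `counterrow` is added."""
    trial = population + [counterrow]
    return [name for name, roles in constraints.items() if duplicates(trial, roles)]


# Room 20 at 9 a.m. Monday booked for XQC as well as ORC.
print(broken_by(("20", "Mon 9 a.m.", "XQC")))
# Room 33 as well as room 20 used at 9 a.m. Monday for ORC.
print(broken_by(("33", "Mon 9 a.m.", "ORC")))
```

Each counterrow is built to break exactly one constraint, so the domain expert's verdict on that row settles whether that constraint is really a rule.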

Compare Figure 1.2 with Figure 1.3. ER is often better than ORM for displaying compact overviews. However, ER models are further removed from natural language and may be harder for the domain expert to conceptualize. In this case, it was more natural to verbalize the first schedule fact as a ternary, but all popular ER notations with industrial support are restricted to binary (two-role) relationships.

Being only binary does not make a language less expressive, since an n-ary association (n > 2) may always be transformed into binaries by co-referencing or nesting. However, such a transformation may introduce an object type that appears artificial to the domain expert, which can hinder communication. Wherever possible, we should try to formulate the model in a way that appears natural to the domain expert.
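To see why no information is lost, here is a rough sketch of the co-referencing transformation applied to the room schedule ternary. It assumes that each Room-HourSlot combination is unique, so that the pair can identify the introduced object type; we call that type RoomHourSlot, and the names of the three binary fact types in the comments are our own.

```python
def coreference(ternary_rows):
    """Replace Room-HourSlot-Activity facts by binaries about a RoomHourSlot.

    RoomHourSlot is the introduced object type. It is co-referenced by
    (identified through) the combination of its Room and its HourSlot.
    """
    involves_room, involves_slot, booked_for = [], [], []
    for room, hourslot, activity in ternary_rows:
        rhs = (room, hourslot)                  # compound reference
        involves_room.append((rhs, room))       # RoomHourSlot involves Room
        involves_slot.append((rhs, hourslot))   # RoomHourSlot involves HourSlot
        booked_for.append((rhs, activity))      # RoomHourSlot is booked for Activity
    return involves_room, involves_slot, booked_for


def rebuild(involves_room, involves_slot, booked_for):
    """Recover the ternary rows from the three binary populations."""
    rooms, slots = dict(involves_room), dict(involves_slot)
    return [(rooms[rhs], slots[rhs], activity) for rhs, activity in booked_for]


ternary = [
    ("20", "Mon 9 a.m.", "ORC"),
    ("33", "Tue 2 p.m.", "XQC"),
]
binaries = coreference(ternary)
print(rebuild(*binaries) == ternary)
```

The round trip succeeds, but the domain expert now has to talk about "room hour slots", which is exactly the kind of artificial object type that can hinder communication.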

ER notation is less expressive than ORM for capturing constraints or business rules. For example, the ER notation used for Figure 1.2 was unable to express the constraint that activity names are unique or the constraint that it is impossible that more than one room at the same hour slot is booked for the same activity.

ER encourages decisions about relative importance at the conceptual analysis stage.

Sometimes this may be seen as an advantage. For example, it is fairly natural to think of activity names as attributes of activities, and hence treat names as less important than activities themselves.

Sometimes, however, early distinctions on relative importance can be disadvantageous.

For example, instead of using RoomHourSlot in Figure 1.2, we could model the room schedule information using ActivityHourSlot. Which of these choices is preferable may depend on what other kind of information we might want to record. However, because we have been forced to make a decision about this without knowing what other facts need to be recorded, we may need to change this part of the model later.

In general, if you model a feature as an attribute and find out later that you need to record something about it, you are typically forced to remodel it as an entity type or relationship because attributes can't have attributes or participate in relationships.

For instance, suppose we record phone as an attribute of Room and then later discover that we want to know which phones support voice mail. Since you rarely know what all the future information requirements will be, an attribute-based model is inherently unstable. Moreover, applications using the model often need to be recoded when a model feature is changed. Since ORM is essentially immune to changes like this, it offers far greater semantic stability.

We have already seen that ORM models facilitate validation by both verbalization and population. Attributes make it awkward to use sample data populations. Moreover, populating optional attributes introduces null values, which may be a source of confusion to nontechnical people. In light of the aforementioned considerations, it appears that ORM's fact-oriented approach offers at least some advantages over ER modeling for conceptual analysis.

This doesn't mean that you should discard ER, since it has advantages too (e.g., compact diagrams). You can have your cake and eat it too by using ORM for the initial conceptual analysis and automatically generating an ER view from it when desired.

Even if you decide to use ER throughout, ignoring the ORM notation completely, you should find that applying or adapting the modeling steps in ORM's conceptual schema design procedure to the ER notation will help you design better ER models.

Now let's consider Object-Oriented (OO) modeling, an approach that encapsulates both data and behavior within objects. Although used mainly for designing object-oriented program code, it can also be used for database design. Many object-oriented approaches exist, but by far the most influential is the Unified Modeling Language, which has been adopted by the Object Management Group (OMG). Among its many diagram types, UML includes class diagrams to specify static data structures. Class diagrams may be used to specify operations as well as low level design decisions specific to object-oriented code (e.g., attribute visibility and association navigability). When stripped of such implementation detail, UML class diagrams may be regarded as an extended version of ER.

A UML class diagram for our example is shown in Figure 1.4. To overcome some of the problems mentioned for the ER solution, a ternary association is used for the schedule information. Because of its object-oriented focus, UML does not require conceptual identification schemes for its classes. Instead, entity instances are assumed to be identified by internal object identifiers (oids).

UML has no standard notation to signify that attribute values must be unique for their class. However, UML does allow user-defined constraints to be added in braces or notes in any language. We've added {P} to denote primary uniqueness and {U1} for an alternate uniqueness--these symbols are not standard and hence not portable. The uniqueness constraints on the ternary are captured by the 0..1 (at most one) multiplicity constraints. Here "*" is shorthand for "0..*", meaning "0 or more". Attributes are mandatory by default.

[FIGURE 1.4 OMITTED]

How well does this UML model support validation with the domain expert? Let's start with verbalization. Although often less than ideal, implicit use of "has" could be used to form binary sentences from the attributes, but what about the ternary? About the best we can do is something like "Booking involves Room and HourSlot and Activity"--which is pretty useless. What if we replaced the association name with a mixfix predicate, as we did in ORM, e.g., "... at ... is booked for ..."?

This is no use, because UML association roles (or association ends as they are now called) are not ordered. So formally we can't know if we should read the sentence type as "Room at HourSlot is booked for Activity", or "Activity at HourSlot is booked for Room" etc. This gets worse if the same class plays more than one role in the association (e.g., Person introduced Person to Person). UML requires association roles to have names (ORM allows role names, but does not require them), but role names don't form sentences, which are always ordered in natural language. UML's weakness with regard to verbalization of facts carries over into its verbalization of constraints and derivation rules.

The UML specification recommends the Object Constraint Language (OCL) for formal expression of such rules, but OCL is simply too mathematical in nature to be used for validation by nontechnical domain experts. In principle, a higher level language could be designed for UML that could be automatically transformed to OCL.

Since verbalization in UML has inadequate support, let's try validation with sample populations. Not much luck here either. To begin with, attribute-based notations are almost useless for multiple instantiation and they introduce nulls into base populations, along with all their confusing properties.

UML does provide object diagrams that enable you to talk about attributed single instances of classes, but that doesn't help with multiple instantiation. For example, the 1:1 nature of the association between activity codes and names is transparent in the ORM fact table in Figure 1.3, but is harder to see by scanning several activity objects.

In principle, we could introduce fact tables to instantiate binary associations in UML, but this wouldn't work for non-binary associations. Why not? In UML you can't specify a reading direction for an association unless it's a binary. So there is no obvious connection between an association role and a fact column as there is in ORM.

The best we can do is to name each role and then use role names as headers to the fact table. However, the visual connection of the fact columns to the class diagram would be weak because of the nonlinear layout of the association roles, and the higher the arity of the association, the worse it gets.

In its favor, UML is far richer than ORM or ER in its ability to capture other aspects of application design (e.g., operations, activities, component packaging, and deployment). UML includes diagramming techniques, such as state machine and activity diagrams, to capture business processes. Any full specification of a business domain needs to address these dynamic aspects. If the application is to be implemented in object-oriented code, UML enables more precise descriptions of the programming code structures to be specified (e.g., attribute visibility and association navigability).

[FIGURE 1.5 OMITTED]

If we restrict our attention to conceptual data modeling, however, the ORM notation is significantly richer than ER or UML in its capacity to express business constraints on the data, as well as being far more orthogonal and less impacted by change. As a simple example, consider the output report of Table 1.4. You might like to try modeling this yourself before reading on.

One way to model this report in UML is shown in Figure 1.5. Although the population of the sample report suggests that movie titles are unique and that a person can direct only one movie, let's assume that the domain expert confirms that this is not the case. We should adapt our sample population to illustrate this (e.g., add a new movie 4 with the same title 'Star Peace' directed by Ron Howard).

Assuming people are identified simply by their name, Movie and Person classes may be used as shown. The role names "director" and "reviewer" are used here to distinguish the two roles played by Person. Similarly, role names are provided to distinguish the roles played by Movie. In this example, all four role names are required. Association names may be used as well if desired.

Unlike Chen's original ER notation, UML binary associations are typically depicted by lines without a In contrast, ORM's depiction of relationships as a sequence of one or more roles, where each role is associated with a fact table column, provides a uniform, general notation that facilitates validation by both verbalization and sample populations.

The multiplicity constraints indicate that each movie has exactly one director but may have many reviewers and that a person may direct or review many movies. But there is still a missing business rule. Can you spot it? Figure 1.6 models the same domain in ORM. Here the "<" before "has" reverses the normal left-to-right reading direction. The rule missing from the UML model is captured graphically by the circled "X" constraint between the role-pairs comprising the "directed" and "reviewed" associations. This is called an exclusion constraint.

[FIGURE 1.6 OMITTED]

This exclusion constraint verbalizes as No Person directed and reviewed the same Movie or, reading it the other way, No Movie was directed by and was reviewed by the same Person. To validate this rule with the domain expert, you should verbalize the rule and also provide a counterexample. For example, in your model is it possible for Movie 1 to be directed by Ron Howard and also reviewed by Ron Howard? Figure 1.6 includes this counterexample. If the exclusion constraint really does apply, at least one of those two facts must be wrong.
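In set terms, the exclusion constraint says that the populations of the two role pairs have no Person-Movie pair in common. The following sketch checks this for a made-up population; apart from the counterexample involving Ron Howard and Movie 1, the rows and the function name exclusion_violations are hypothetical.

```python
def exclusion_violations(directed, reviewed):
    """Person-Movie pairs that populate both the directed and reviewed role pairs."""
    return sorted(set(directed) & set(reviewed))


# Person directed Movie / Person reviewed Movie (illustrative populations).
directed = {("Ron Howard", 1), ("Director B", 2)}
reviewed = {("Reviewer C", 1), ("Ron Howard", 2)}

# No pair occurs in both populations, so the constraint is satisfied.
print(exclusion_violations(directed, reviewed))

# Adding the counterexample: Movie 1 is also reviewed by Ron Howard.
with_counterexample = reviewed | {("Ron Howard", 1)}
print(exclusion_violations(directed, with_counterexample))
```

Note that a person may still direct one movie and review another; the constraint applies to the pair, not to the person alone.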

Some domain experts are happy to work with diagrams and some are not. Some are good at understanding rules in natural language and some are not. But all domain experts are good at working with concrete examples. Although it is not necessary for the domain expert to see the diagram, being able to instantiate any role directly on the diagram makes it easy for you as a modeler to think clearly about the rules.

Although UML has no graphic notation for general exclusion constraints, it does allow you to document constraints in a note attached to the relevant model elements. If a concept is already part of your modeling language, it's easier to think of it.

Since the exclusion constraint notation is not built in to the UML language, it is easy to miss the constraint when developing the model. The same thing goes for ER. In contrast, the ORM modeling procedure prompts you to consider such a constraint and allows you to visualize and capture the rule formally. An ORM tool can then map the constraint automatically into executable code to ensure that the rule is enforced in the implementation.

[FIGURE 1.7 OMITTED]

ORM diagrams always display semantic domains as object types. The ORM diagram in Figure 1.7(a) includes role names for birthdate and deathdate, shown in square brackets next to the relevant roles.

These roles are clearly compatible, as they are both played by the object type Date. In ORM, role names may be used like attribute names in automatically generated attribute-views, as well as in rules specified in attribute style (e.g., deathdate > birthdate).

ER diagrams typically hide attribute domains. For example, the birthdate and deathdate attributes in the Barker ER model shown in Figure 1.7(b) should be based on the domain Date, but this is not represented visually. In ER, attribute domains can be listed in another document.

In UML class diagrams, attribute domains may be listed after the attribute name and multiplicity (if shown), as in Figure 1.7(c). The "[0..1]" multiplicity indicates "at most one", so the attribute is optional and single-valued. All too often in practice, only syntactic, or value, domains are specified (e.g., String).

An ER diagram might show population and elevation as attributes of City, and an associated table might list the domains of these attributes simply as Integer, despite the fact that it is nonsense to equate a population with an elevation.
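The same distinction between syntactic and semantic domains shows up in program code. As a rough illustration, the sketch below wraps integers in two hypothetical classes, Population and Elevation (with units we have assumed), so that comparing values from different semantic domains is rejected while plain integers compare without complaint.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Population:
    people: int  # number of inhabitants


@dataclass(frozen=True, order=True)
class Elevation:
    metres: int  # height above sea level (unit is an assumption)


def comparable(a, b) -> bool:
    """True if `a < b` is a meaningful comparison for these two values."""
    try:
        a < b
        return True
    except TypeError:
        return False


# With only a syntactic domain (Integer), the nonsense comparison is allowed.
print(comparable(5000, 5000))

# With semantic domains, values are comparable only within the same domain.
print(comparable(Population(5000), Population(7000)))
print(comparable(Population(5000), Elevation(5000)))
```

The ordering methods generated by the dataclass decorator only compare instances of the same class, which is what makes the cross-domain comparison fail.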

Conceptual object types, or semantic domains, provide the conceptual "glue" that binds the various components in the application model into a coherent picture. Even at the lower level of the relational data model, E.F. Codd, the founder of the relational model, argues that "domains are the glue that holds a relational database together" (Codd 1990).

The object types in ORM diagrams are the semantic domains, so the connectedness of a model is transparent. This property of ORM also has significant advantages for conceptual queries, since a user can query the conceptual model directly by navigating through its object types to establish the relevant connections. This notion is elaborated further in later chapters.

ER and UML diagrams often fail to express relevant constraints on, or between, attributes. Figure 1.8 provides a simple example. Notice the circled dot over an "X" in the ORM model in Figure 1.8(a). This specifies two constraints: the dot is a mandatory constraint over the disjunction of the two roles (each truck is either bought or leased) and the "X" indicates the roles are exclusive (no truck is both bought and leased). The two constraints collectively provide an xor (exclusive-or) constraint (each truck plays exactly one of the roles).
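The two halves of the xor constraint can be checked separately, as the following sketch shows. The truck identifiers and the function name xor_violations are hypothetical, and each role is represented simply by the set of trucks that play it.

```python
def xor_violations(trucks, bought, leased):
    """Split constraint violations by which half of the xor they break."""
    trucks, bought, leased = set(trucks), set(bought), set(leased)
    return {
        # breaks the mandatory (inclusive-or) part: plays neither role
        "neither": sorted(trucks - (bought | leased)),
        # breaks the exclusion part: plays both roles
        "both": sorted(bought & leased),
    }


trucks = {"T1", "T2", "T3", "T4"}
bought = {"T1", "T2"}  # trucks playing the "bought" role
leased = {"T2", "T3"}  # trucks playing the "leased" role

print(xor_violations(trucks, bought, leased))
```

In this population T4 violates the mandatory part and T2 violates the exclusion part; a population satisfies the xor constraint only when both lists are empty.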

Unlike most versions of ER, UML does provide an xor constraint, but only between associations. Since the UML model in Figure 1.8(b) models these two fact types as attributes instead of associations, it cannot capture the constraint graphically (other than adding a note). Notice again how the ORM diagram reveals the semantic domains. For instance, tare may be meaningfully compared with maximum load (both are masses) but not with length. In UML this can be made explicit by appending domain names to the attributes. At various stages in the modeling process, it is helpful for the modeler to see all the relevant information in the one place.

Another ORM feature is its flexible support for subtyping, including multiple inheritance, based on formal subtype definitions. For example, the subtype LargeUSCity may be defined as a City that is in Country 'US' and has a Population > 1000000. As discussed in a later chapter, subtype definitions provide stronger constraints than declarations about whether subtypes are exclusive or exhaustive.
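A formal subtype definition is in effect a membership test on the supertype. The sketch below expresses the LargeUSCity definition as such a test; the City record, its field names, and the sample cities with their figures are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class City:
    name: str
    country: str      # country code, e.g. 'US'
    population: int


def is_large_us_city(city: City) -> bool:
    """LargeUSCity: a City that is in Country 'US' and has Population > 1000000."""
    return city.country == "US" and city.population > 1_000_000


# Hypothetical cities with made-up figures.
cities = [
    City("City A", "US", 2_500_000),
    City("City B", "US", 400_000),
    City("City C", "AU", 4_000_000),
    City("City D", "US", 1_000_000),  # exactly one million: not "> 1000000"
]

large_us_cities = [c.name for c in cities if is_large_us_city(c)]
print(large_us_cities)
```

Because membership is derived from the definition, the population of the subtype follows from the facts recorded about each City and does not have to be asserted separately.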

In principle, because there are infinitely many kinds of constraints, a textual constraint language is often required for completeness to supplement the diagram. This is true for ER, ORM, and UML models. However, the failure of ER and UML diagrams to include standard notations for many important ORM constraints makes it harder to develop a comprehensive model or to perform transformations on the model.

For example, suppose that in any movie an actor may have a starring role or a supporting role but not both. This can be modeled by two fact types: Actor has starring role in Movie; Actor has supporting role in Movie. The "but not both" condition is expressed in ORM as a pair-exclusion constraint between the fact types. Alternatively, these fact types may be replaced by a single longer fact type: Actor in Movie has role of RoleKind {star, support}.

Transformations are rigorously controlled in ORM to ensure that constraints in one representation are captured in an alternative representation. For instance, the pair-exclusion constraint is transformed into the constraint that each Actor-Movie pair has only one RoleKind. The formal theory behind such transformations is easier to apply when the relevant constraints can be visualized.
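The correspondence between the two constraints can be tried out on sample populations. The sketch below is not the formal transformation theory itself, only a small demonstration under our own assumptions: the actor and movie names are placeholders, each fact type is populated by a set of Actor-Movie pairs, and the function names are hypothetical.

```python
def merge(starring, supporting):
    """Map the two binary fact types to: Actor in Movie has role of RoleKind."""
    rows = [(actor, movie, "star") for actor, movie in starring]
    rows += [(actor, movie, "support") for actor, movie in supporting]
    return sorted(rows)


def pair_exclusion_ok(starring, supporting):
    """No Actor-Movie pair populates both binary fact types."""
    return not (set(starring) & set(supporting))


def one_rolekind_per_pair(rows):
    """Each Actor-Movie pair has only one RoleKind in the ternary."""
    pairs = [(actor, movie) for actor, movie, _ in rows]
    return len(pairs) == len(set(pairs))


starring = {("Actor A", "Movie 1"), ("Actor B", "Movie 2")}
supporting = {("Actor B", "Movie 1")}

# A population that satisfies pair-exclusion also satisfies the uniqueness
# constraint on the merged fact type.
print(pair_exclusion_ok(starring, supporting), one_rolekind_per_pair(merge(starring, supporting)))

# A population that breaks one constraint breaks the other as well.
bad_supporting = supporting | {("Actor A", "Movie 1")}
print(pair_exclusion_ok(starring, bad_supporting), one_rolekind_per_pair(merge(starring, bad_supporting)))
```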

Unlike UML and ER, ORM was built from a linguistic basis. To reap the benefits of verbalization and population for communication with and validation by domain experts, it's better to use a language that was designed with this in mind. The ORM notation is easy to learn and has been successfully taught even to high school students.

[FIGURE 1.8 OMITTED]

We are not arguing here that ER and UML have no value. They do. We are simply suggesting that you consider using ORM's modeling techniques, and possibly its graphic notation, to facilitate your original conceptual analysis before using an attribute-based notation such as that of ER, UML, or relational tables.

Once you have validated the conceptual model with the domain expert, you need to map it to a DBMS or program code for implementation. At this lower level, you will want to use an attribute-based model, so that you have a compact picture of how facts are grouped into implementation structures. For database applications, you will want to see the table structures, foreign key relationships, and so on. Here a relational or object-relational model offers a compact view, similar to an ER or UML model.
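As a rough indication of what such a mapping might look like for the room scheduling example, the sketch below creates two tables in an in-memory SQLite database and tries a few inserts. The table names, column names, and activity names are our own assumptions rather than the output of any ORM tool; the point is only that the uniqueness and mandatory constraints validated earlier reappear as keys and NOT NULL declarations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked to
conn.executescript("""
CREATE TABLE Activity (
    activityCode TEXT PRIMARY KEY,
    activityName TEXT NOT NULL UNIQUE      -- mandatory, and 1:1 with the code
);
CREATE TABLE Booking (
    roomNr       TEXT NOT NULL,
    hourSlot     TEXT NOT NULL,
    activityCode TEXT NOT NULL REFERENCES Activity (activityCode),
    PRIMARY KEY (roomNr, hourSlot),        -- at most one Activity per Room-HourSlot
    UNIQUE (hourSlot, activityCode)        -- at most one Room per HourSlot-Activity
);
""")


def attempt(sql, params):
    """Run one INSERT and report whether the DBMS accepted or rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


ADD_ACTIVITY = "INSERT INTO Activity VALUES (?, ?)"
ADD_BOOKING = "INSERT INTO Booking VALUES (?, ?, ?)"

results = [
    attempt(ADD_ACTIVITY, ("ORC", "placeholder name 1")),
    attempt(ADD_ACTIVITY, ("XQC", "placeholder name 2")),
    attempt(ADD_BOOKING, ("20", "Mon 9 a.m.", "ORC")),
    attempt(ADD_BOOKING, ("20", "Mon 9 a.m.", "XQC")),  # same Room and HourSlot
    attempt(ADD_BOOKING, ("33", "Mon 9 a.m.", "ORC")),  # same HourSlot and Activity
    attempt(ADD_BOOKING, ("33", "Tue 2 p.m.", "???")),  # no such Activity
]
print(results)
```

The two rejected bookings in the middle are the counterrows discussed for Figure 1.3, now refused by the database rather than by the modeler.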

ORM models often take up much more space than an attribute-based model, since they show each attribute as a relationship. This is ideal for conceptual analysis, where we should validate one fact type at a time. However, for logical design, we typically group facts into attribute-based structures such as tables or classes. At the logical design stage, attribute-based models are more useful than ORM models. For example, relational schema diagrams provide a simple, compact picture of the underlying tables and foreign key constraints between them. Also, UML is well suited for the logical and physical design of object-oriented code, since it allows implementation detail on the data model (e.g., attribute visibility and association navigation) and can be used to model behavior and deployment.

Having used ER, ORM, and UML in practice, we've found that ORM often makes it easier to get the model right in the first place and to change the model as the business domain evolves. We believe in the method so strongly that we've made it the basis for much of the modeling discussion in this book. Once you understand ORM's principles, you'll find it much easier to gain a proper understanding of data modeling in ER and UML.

Arguments about modeling approaches can become heated, and not everyone is as convinced of the virtues of ORM as we are. All we ask is that you look objectively at the ideas presented in this book and consider using whatever you find helpful.

Although the book focuses on ORM, it also covers data modeling in other popular notations (e.g., ER, IDEF1X, UML, and relational). These other notations have value too. Even if you decide to stay with ER or UML as your conceptual analysis approach, an insight into ORM should make you a better modeler regardless.

1.3 Some Historical Background

This section gives a brief overview of the evolution of computing languages for information systems and then outlines the historical development of the main kinds of logical data structures used in database systems. We begin with a simple example to illustrate how the level of a language affects how easy it is to formulate questions. Table 1.5 summarizes how five generations of computing languages might be used to request a computer to list the name, mass, and moons (if any) of each planet, assuming the information is stored in an astronomical database. The higher the generation, the closer to natural language, and usually the less you have to say. Nowadays nobody uses machine code or assembler to access databases. Most database applications are coded using fourth generation languages (4GLs), perhaps in combination with third generation languages (3GLs).

Third generation languages, such as C# and Java, are procedural, emphasizing the procedures used to carry out the task. With 3GLs we typically need to specify how to access data one record at a time. Fourth generation languages, such as SQL, are primarily declarative in nature: one declares what has to be done rather than how to do it.

[FIGURE 1.9 OMITTED]

With a 4GL, a single statement can be used to perform operations on whole tables, or sets of rows, at once. Hence 4GLs are set oriented rather than record oriented.
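The contrast between record-oriented and set-oriented access can be sketched as follows. This is only an illustration under stated assumptions: it uses Python with its built-in sqlite3 module, a Planet table whose column names follow Table 1.5, and rough sample masses (in Earth masses) chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Planet (planetName TEXT PRIMARY KEY, mass REAL)")
conn.executemany(
    "INSERT INTO Planet VALUES (?, ?)",
    [("Mercury", 0.06), ("Venus", 0.82), ("Earth", 1.0)],  # illustrative values
)

# Record oriented (3GL style): visit one record at a time and test each one.
heavy_loop = []
for name, mass in conn.execute("SELECT planetName, mass FROM Planet"):
    if mass > 0.5:
        heavy_loop.append(name)

# Set oriented (4GL style): one statement declares the wanted set of rows.
heavy_set = [
    row[0]
    for row in conn.execute(
        "SELECT planetName FROM Planet WHERE mass > 0.5 ORDER BY planetName"
    )
]
```

Both approaches find the same planets, but in the second the condition is stated once for the whole table and the system decides how to visit the rows.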

Fifth generation languages (5GLs), such as ConQuer (an ORM query language), allow you to specify queries naturally, without knowing the underlying data structures used to store the information. The widespread use of fifth generation languages is still in the future.

The first database management systems were developed in the early 1960s, starting with simple file managers. Various logical data architectures have been proposed as a basis for specifying the structure of databases. In the hierarchic data model, the database schema is basically a tree of linked record types, where each record type has a different structure (unlike many trees where each node is of the same type). Records may include one or more fields, each of which can hold only a single value.

Record types are related by parent-child links (e.g., using pointers), where a parent may have many children but each child has only one parent. Hence the type structure is that of a tree, or hierarchy.

For example, in Figure 1.9 the parent record type Department has two child record types: Product and Employee. Each record type contains a sequence of named fields, shown here as boxes, and the parent--child links are shown as connecting lines. For discussion purposes, one record instance has been added below each record type. As an exercise, try reading off all the facts that are contained in this database before reading on.

To begin with, there are five facts stored in the record instances. To make these facts more obvious, Figure 1.9 includes arcs connecting the relevant fields, one arc for each fact. Although these arcs are a useful annotation, they are not part of the schema notation.

If we are familiar with the business domain, we can verbalize these arcs into relationships.

For example, we might verbalize the five facts as follows.

Department 10 is located in Building 69.

Department 10 has Budget 200000 USD.

Product 'IS2' has ProductName 'InfoStar 2'.

Employee 357 has EmployeeName 'Jones E'.

Employee 357 is of Sex 'F'.

Are there more facts? Yes! The parent--child links encode the following two facts:

Department 10 develops Product 'IS2'.

Department 10 employs Employee 357.
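The split between facts held in fields and facts held in links can be sketched in code. The following Python fragment is an illustration only: Figure 1.9 is not reproduced here, so the class and field names are assumptions, and nested lists stand in for the parent--child pointers of a real hierarchic DBMS.

```python
from dataclasses import dataclass, field


@dataclass
class Product:
    product_code: str
    product_name: str


@dataclass
class Employee:
    emp_nr: int
    emp_name: str
    sex: str


@dataclass
class Department:
    dept_nr: int
    building: int
    budget_usd: int
    # Parent-child links: each child record hangs off exactly one parent.
    products: list = field(default_factory=list)
    employees: list = field(default_factory=list)


dept = Department(10, 69, 200000)
dept.products.append(Product("IS2", "InfoStar 2"))
dept.employees.append(Employee(357, "Jones E", "F"))

# Five facts are read directly from fields inside the record instances.
record_facts = [
    f"Department {dept.dept_nr} is located in Building {dept.building}.",
    f"Department {dept.dept_nr} has Budget {dept.budget_usd} USD.",
]
for p in dept.products:
    record_facts.append(f"Product '{p.product_code}' has ProductName '{p.product_name}'.")
for e in dept.employees:
    record_facts.append(f"Employee {e.emp_nr} has EmployeeName '{e.emp_name}'.")
    record_facts.append(f"Employee {e.emp_nr} is of Sex '{e.sex}'.")

# Two more facts exist only as links and are found by navigating them.
link_facts = [
    f"Department {dept.dept_nr} develops Product '{p.product_code}'."
    for p in dept.products
] + [
    f"Department {dept.dept_nr} employs Employee {e.emp_nr}."
    for e in dept.employees
]
```

Note that neither the Product nor the Employee record has a department field: the development and employment facts can be recovered only by following the links from the parent record.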

Hierarchic DBMSs such as IBM's Information Management System can efficiently manage hierarchic structures (e.g., a file directory system). However, having to explicitly navigate over predefined record links to get at the facts can be somewhat challenging. The complexity rapidly rises if the application is not hierarchic in nature.

Suppose that the same product may be developed by more than one department. Conceptually, the Department develops Product association is now many:many. Since parent--child links are always 1:many, a workaround is needed to handle this situation.

For example, to record the facts that departments 10 and 20 both develop product 'IS2' we could have the two department record instances point to separate copies of the record instance for product 'IS2'.

The most significant feature of the relational model is that all the facts are stored in tables, which are treated as mathematical relations. For example, Figure 1.10 shows the relational database for our sample application. Again, the database is annotated with arcs corresponding to the facts stored. Notice the extra deptNr columns in the Employee and Product tables. The facts that Department 10 employs Employee 357 and develops product 'IS2' are stored in the table rows themselves. Access paths between tables are not used to specify facts (as allowed in hierarchic or network models).

To specify queries and constraints, table columns may be associated by name. This allows ad hoc queries to be specified at will and simplifies management of the application.

Note that constraints specified between tables are not the same as access paths.

For example, in Figure 1.11, arrows "link" the deptNr column of the Employee and Product tables to the deptNr column of the Department table. However, these "links" merely express the constraints that any value in the deptNr column of the Employee and Product tables must also occur as a value in the deptNr column of the Department table. These constraints do not express employment and product development facts.
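The difference between a stored fact and an inter-table constraint can be sketched with a small runnable example. This is an illustration under stated assumptions: it uses Python's built-in sqlite3 module, only the deptNr column name comes from the text, and the remaining column names and the rejected sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Department (
    deptNr INTEGER PRIMARY KEY, building INTEGER, budget INTEGER);
CREATE TABLE Employee (
    empNr INTEGER PRIMARY KEY, empName TEXT, sex TEXT,
    deptNr INTEGER NOT NULL REFERENCES Department (deptNr));
CREATE TABLE Product (
    productCode TEXT PRIMARY KEY, productName TEXT,
    deptNr INTEGER NOT NULL REFERENCES Department (deptNr));
INSERT INTO Department VALUES (10, 69, 200000);
INSERT INTO Employee VALUES (357, 'Jones E', 'F', 10);
INSERT INTO Product VALUES ('IS2', 'InfoStar 2', 10);
""")

# The employment fact is the row itself: the row for employee 357 holds deptNr 10.
employs = conn.execute("SELECT deptNr, empNr FROM Employee").fetchall()

# The constraint only rejects deptNr values that are absent from Department.
try:
    conn.execute("INSERT INTO Employee VALUES (358, 'Smith J', 'M', 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Dropping the REFERENCES clauses would lose the integrity check but not the employment or development facts, which remain readable from the deptNr values in the rows.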

Although the type and instance link structures are still trees, the fact that product 'IS2' is named 'InfoStar 2' now appears twice in the database. Hence, we need to control this redundancy. Moreover, while retrieving products developed by a given department is easy, retrieving all the departments that developed a product is not so easy.

The network data model was developed by the Conference on Data Systems Languages (CODASYL) Database Task Group. This model is more complex than the hierarchic model. Most of the data is stored in records, a single field of which may contain a single value, a set of values, or even a set of value groups. Record types are related by owner-member links, and the graph of these connections may be a network: a record type may have many owners and also own many record types.

As in the hierarchic model, facts are stored either in records or as record links. An owner-member link between record types is restricted to a 1:many association. To handle a many:many association, such as the case discussed earlier, we might introduce a new record type (e.g., Development) with many:1 associations to the other record types (in this case, Department and Product).
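The intermediate record type can be sketched as follows. This Python fragment is illustrative only: the class and field names are assumptions, and plain object references stand in for the owner-member links of a real network DBMS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Department:
    dept_nr: int


@dataclass(frozen=True)
class Product:
    product_code: str
    product_name: str


@dataclass(frozen=True)
class Development:
    # Each Development record has exactly one owner of each type (many:1),
    # so a set of Development records encodes a many:many association.
    department: Department
    product: Product


d10, d20 = Department(10), Department(20)
is2 = Product("IS2", "InfoStar 2")  # the product name is stored only once
developments = [Development(d10, is2), Development(d20, is2)]


def departments_developing(product):
    """Return the numbers of all departments that develop the given product."""
    return sorted(
        dev.department.dept_nr for dev in developments if dev.product == product
    )
```

Because both departments are linked to the same product record, the naming fact for 'IS2' is no longer duplicated, and the query can be answered in either direction by scanning the Development records.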

In general, encoding of facts in access paths such as interrecord links complicates the management of the application and makes it less flexible. For example, some new queries will have to wait until access paths have been added for them, and internal optimization efforts can be undone as the application structure evolves.

Partly to address such problems, Dr. Edgar ("Ted") Codd introduced a simpler model: the relational data model. A year after his original 1969 IBM research report on the subject, Codd published a revised version for a wider audience (Codd, 1970) where he first argued that relations should be normalized so that each data entry would be atomic; we now call this first normal form. Other normal forms were defined later.

The relational model is logically cleaner than the network and hierarchic models, but it initially had poor performance, which led to its slow acceptance. However, by the late 1980s, efficient relational systems had become commonplace. Although network and hierarchic database systems are still in use today, relational DBMSs are the preferred choice for developing most new database applications. A DBMS should ideally provide an integrated data dictionary, dynamic optimization, data security, automatic recovery, and a user-friendly interface. The main query languages used with relational databases are SQL (informally known as "Structured Query Language") and QBE (Query By Example). Many systems support both of these. SQL has long been accepted as an international standard and is commonly used for communication of queries between different database systems. For this reason, SQL is the main query language discussed in this book.

[FIGURE 1.10 OMITTED]

[FIGURE 1.11 OMITTED]

Recently the eXtensible Markup Language (XML) has also become widely used for communication between different systems, but this is currently focused on sharing data for purposes such as electronic commerce and web publication. Many SQL-based DBMSs now support storing of XML data, as well as querying XML data directly using a query language such as XQuery.

Data architectures exist for object-oriented databases and deductive databases, but these have a long way to go in terms of standardization and maturity before they have a chance of widespread acceptance. Although relational systems give adequate performance for most applications, they are inefficient for some applications involving complex data structures (e.g., VLSI design).

To overcome such difficulties, many relational systems are being enhanced with object-oriented features, leading to object-relational database systems. It appears that such extended relational systems will ensure the dominance of relational databases for the near future, with XML databases and object databases being their main competitors.

1.4 The Relevant Skills

Since relational database systems are dominant, they are the main focus of our implementation discussion, with some attention also being given to XML. Although conceptual interfaces to databases are still in their infancy, developing models at this higher level is more productive and avoids wasting time acquiring knowledge and skills that will rapidly become obsolete. Consider the impact of the electronic calculator on school mathematics curricula (e.g., removal of the general square root algorithm).

Fundamentally, there are two skills that will always be relevant to interacting with an information system. Both of these skills relate to communicating with the system about our particular application area. Recall that the application domain is technically known as the universe of discourse.

The two requirements are to:

* describe the universe of discourse

* query the system about the universe of discourse

The first skill entails describing the structure or design of the UoD and describing its content or population: the structural aspect is the only challenging part of this. Obviously, the ability to clearly describe the application domain is critical if you wish to add a model to the system. Complex models should normally be prepared by expert modelers working with domain experts. The main aim of this text is to introduce you to the fundamentals of information modeling. If you master the methods discussed, you will be well on your way to becoming an expert in modeling information systems.

Issuing queries is often easy using a 4GL, but the formulation of complex queries can still be difficult. Occasionally, the ability to understand some answers given by the system requires knowledge about how the system works, especially its limitations.

This book provides a conceptual basis for understanding relational structures and queries, explains the algebra behind relational query languages, and includes a solid introduction to SQL.

No matter how sophisticated the information system, if we give it the wrong picture of our UoD, we can't expect to get much sense out of it. This is one aspect of the GIGO (Garbage In Garbage Out) principle. Most problems with database applications result from bad database design. This book shows how to model information at a very high level using natural concepts. While providing due attention to popular data modeling approaches such as ER and UML, it also provides an in-depth treatment of the higher level ORM approach.

For immediate use, conceptual schemas can be mapped onto the lower level structures used by today's database systems. This mapping can be performed automatically using an appropriate CASE tool or manually using an appropriate procedure. This book discusses how to map a conceptual model to a relational database system, as well as to XML schema, and how query languages may be used to retrieve data from such systems. It also provides an overview of other related methods and modern trends.

1.5 Summary

This chapter provided a motivation for studying conceptual modeling and presented a brief historical and structural overview of information systems. Database management systems are widely used and are a major productivity tool for businesses that are information oriented. For a database to be used effectively, its data should be correct, complete, and efficiently accessed. This requires that the database be well designed.

Designing a database involves building a formal model of the business domain or universe of discourse (UoD). To do this properly requires a good understanding of the UoD and a means of specifying this understanding in a clear, unambiguous way.

Object-Role Modeling (ORM) simplifies the analysis and design process by using natural language, intuitive diagrams, and examples, and by examining the information in terms of simple, elementary facts. By expressing the model in terms of natural concepts, such as objects and roles, this fact-oriented method provides a truly conceptual approach to modeling.

Other valuable modeling approaches include Entity-Relationship (ER) modeling and object-oriented modeling.

In practice, ER is still the most popular high-level approach to the design of databases. While many popular versions of ER exist, the Unified Modeling Language (UML) is by far the most influential object-oriented approach.

Although ER and UML models are typically more compact than ORM models, they are arguably less suitable than ORM for formulating, transforming, or evolving a conceptual information model. ER models and UML class diagrams are further removed from natural language, lack the expressibility and simplicity of a role-based constraint notation, are less stable in the face of domain evolution, are harder to populate with fact instances, and may hide information about the semantic domains that glue the model together. However, ER and UML models better highlight the major features of the domain being modeled by representing currently less important features as attributes.

In this book, ORM is used as our basic conceptual modeling method. ER models and UML class diagrams are useful as well, especially for providing compact summaries, and are best developed as views of ORM models. For database applications, conceptual models typically need to be mapped to attribute-based logical and physical models. ER models provide designs that are closer to relational database structures.

For object-oriented applications, UML models can incorporate implementation details as well as behavior and deployment aspects not covered by the ORM and ER approaches.

Programming tasks are typically coded in third generation languages such as C# and Java. Fourth generation database languages such as SQL are declarative in nature, enabling users to declare what has to be done without the fine detail of how to do it, and are set oriented rather than record oriented.

Fifth generation languages such as ConQuer enable users to query conceptual models directly. Hierarchic and network database systems store some facts in record types and some facts in links between record types. Relational database systems store all facts in tables. No matter how "intelligent" software systems become, people are needed to describe the universe of discourse and to ask the relevant questions about it.

Chapter Notes

Full bibliographic entries for references are included in the bibliography at the back of the book. Codd (1969, 1970) introduced the relational model of data. For a historical discussion of these two papers, see Date (1998). Codd (1990) suggests future directions for the relational model.

The classic paper that introduced Entity-Relationship modeling is Chen (1976). Kent (2000) is a reprint of a classic book that provides many insights into the nature of information and data.

Many papers on Object-Role Modeling are accessible at http://www.orm.net and at www.ormfoundation.org. Simsion and Witt (2005) provide a readable coverage of various data modeling topics. An overview of UML can be found in Booch et al. (1999). Muller (1999) discusses use of UML for database design. Halpin and Bloesch (1999) compare data modeling in ORM and UML. Date (2000) provides a clear introduction to most aspects of database systems.

Printed with permission from Morgan Kaufmann, a division of Elsevier. Copyright 2008. "Information Modeling and Relational Databases, 2e" by Terry Halpin and Tony Morgan. For more information about this title and other similar books, please visit www.elsevier.direct.com.

(1) In rare cases, a movie may have multiple directors, but in this business domain we are interested in only one director per movie.
Table 1.1 An output report about some motion pictures.

Movie#   Movie Title         Released   Director       Stars

1        Cosmology             2006     Lee Lafferty
2        Kung Fu Hustle        2004     Stephen Chow   Stephen Chow
3        The Secret Garden     1987     Alan Grint     Gennie James
                                                       Barret Oliver
4        The Secret Garden     1993     Agnieszka      Kate Maberly
                                          Holland      Heydon Prowse
5        The DaVinci Code      2006     Ron Howard     Tom Hanks
                                                       Ian McKellen
                                                       Audrey Tautou

Table 1.2 A badly-designed relational database table.

Movie:
                              release
movieNr   movieTitle            Yr      director       star

1         Cosmology            2006     Lee Lafferty   ?
2         Kung Fu Hustle       2004     Stephen Chow   Stephen Chow
3         The Secret Garden    1987     Alan Grint     Gennie James
3         The Secret Garden    1987     Alan Grint     Barret Oliver
4         The Secret Garden    1993     Agnieszka      Kate Maberly
                                          Holland
4         The Secret Garden    1993     Agnieszka      Heydon Prowse
                                          Holland
5         The DaVinci Code     2006     Ron Howard     Tom Hanks
5         The DaVinci Code     2006     Ron Howard     Ian McKellen
5         The DaVinci Code     2006     Ron Howard     Audrey Tautou

Table 1.3 A simple data use case for room scheduling.

Room   Time         Activity Code   Activity Name

20     Mon 9 a.m.   ORC             ORM class
20     Tue 2 p.m.   ORC             ORM class
33     Mon 9 a.m.   XQC             XQuery class
33     Fri 5 p.m.   STP             Staff party
...    ...          ...             ...

Table 1.4 Another sample output report about Movies.

Movie                   Director              Reviewers
Nr   Title              Name           Born   Name         Born

1    The DaVinci Code   Ron Howard     US     Fred Blogs   US
                                              Ann Green    US
2    Crocodile Dundee   Peter Faiman   AU     Ann Green    US
                                              Ima Viewer   GB
                                              Tom Sawme    AU
3    Star Peace         Ann Green      US     ?            ?

Table 1.5 Five generations of computer languages.

Generation   Language example    Sample code for same task

5            ConQuer             ✓ Planet that has ✓ Mass
                                   and possibly is orbited by ✓ Moon
4            SQL                 select X1.planetName, X1.mass,
                                   X2.moonName from Planet as X1 left
                                   outer join Moon as X2 on
                                   X1.planetName = X2.planetName
3            Pascal              Two pages of instructions like:
                                   for i := 1 to n do begin
                                   write (planetName[i], mass[i]);
2            8086 Assembler      Many pages of instructions like:
                                   ADDI AX, 1
1            8086 machine code   Many pages of instructions like:
                                   00000101 00000001 00000000

Figure 1.1 A relational database representation of Table 1.1.

Movie (movieNr, movieTitle, releaseYear, director)

Starred (movieNr, star)

                                         release
Movie:     movieNr   movieTitle           Year     director

           1         Cosmology            2006     Lee Lafferty
           2         Kung Fu Hustle       2004     Stephen Chow
           3         The Secret Garden    1987     Alan Grint
           4         The Secret Garden    1993     Agnieszka Holland
           5         The DaVinci Code     2006     Ron Howard

Starred:   movieNr   star

           2         Stephen Chow
           3         Gennie James
           3         Barret Oliver
           4         Kate Maberly
           4         Heydon Prowse
           5         Tom Hanks
           5         Ian McKellen
           5         Audrey Tautou
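
The two-table design of Figure 1.1 can be tried out directly. The sketch below is an illustration under stated assumptions: it uses Python's built-in sqlite3 module, the table and column names come from Figure 1.1, and the key and NOT NULL declarations are assumptions, since the extracted figure does not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Movie (
    movieNr INTEGER PRIMARY KEY, movieTitle TEXT NOT NULL,
    releaseYear INTEGER NOT NULL, director TEXT NOT NULL);
CREATE TABLE Starred (
    movieNr INTEGER NOT NULL REFERENCES Movie (movieNr),
    star TEXT NOT NULL,
    PRIMARY KEY (movieNr, star));
""")
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?, ?)",
    [
        (1, "Cosmology", 2006, "Lee Lafferty"),
        (2, "Kung Fu Hustle", 2004, "Stephen Chow"),
        (3, "The Secret Garden", 1987, "Alan Grint"),
        (4, "The Secret Garden", 1993, "Agnieszka Holland"),
        (5, "The DaVinci Code", 2006, "Ron Howard"),
    ],
)
conn.executemany(
    "INSERT INTO Starred VALUES (?, ?)",
    [
        (2, "Stephen Chow"), (3, "Gennie James"), (3, "Barret Oliver"),
        (4, "Kate Maberly"), (4, "Heydon Prowse"), (5, "Tom Hanks"),
        (5, "Ian McKellen"), (5, "Audrey Tautou"),
    ],
)

# A left outer join rebuilds the report of Table 1.1, keeping the movie
# that has no stars (its star column comes back as NULL, i.e. None).
rows = conn.execute("""
    SELECT M.movieNr, M.movieTitle, S.star
    FROM Movie AS M LEFT OUTER JOIN Starred AS S ON M.movieNr = S.movieNr
    ORDER BY M.movieNr, S.star
""").fetchall()
```

Each movie's title, release year, and director are now stored once, in contrast to the single table of Table 1.2, where they are repeated for every star.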
COPYRIGHT 2008 A.P. Publications Ltd.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2008 Gale, Cengage Learning. All rights reserved.

Article Details
Title Annotation:DATABASE AND NETWORK INTELLIGENCE: NEW BOOK EXCERPT
Author:Halpin, Terry; Morgan, Tony
Publication:Database and Network Journal
Geographic Code:1USA
Date:Jun 1, 2008
Words:10912