KEY FEATURES OF THE ASSOCIATIVE MODEL.
Abandoning the Record
The associative model does not use records. From punched cards through to the object and object/relational models, the basic unit of data storage has been a record that comprises all of the individual pieces of information about an object or an entity, stored contiguously. The chief argument in favour of the record has been efficiency: given that visiting the disk is a slow, mechanical process, the more data that can be retrieved during each visit the better.
Efficiency has been at the forefront of concerns about the binary model, and hence the associative model also, because both models abandon the record-based approach used by all the other data models in favour of storing data items individually. But as the power of hardware continues to increase, absolute efficiency is progressively sacrificed to gain other benefits, as happened in the evolution of programming languages from machine code through assembler to third and fourth generation languages. In this light, the benefits of adopting a more granular approach to data storage and retrieval - that is, storing data in smaller units - should now be considered.
A record comprises all of an entity's data items, stored contiguously. The concept of the record originates with and is best exemplified by the punched card. On a card, columns 1 through 20 might have held the customer's name, columns 21 through 30 their outstanding balance, 31 through 40 their credit limit and so on. The record is the basis of the hierarchical and network data models, and closely corresponds to the tuple in the relational model. Abandoning the record is rather like cutting up each punched card into vertical sections, 1 through 20, 21 through 30 and so on, and maintaining an index of where to find each section. This means that an entity's data items are no longer necessarily stored contiguously (either conceptually or physically), and so retrieving all of them usually requires more than one visit to the disk, whereas a record comprising all of an entity's data items can usually be retrieved in a single visit, just as a punched card could be read in one operation.
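The contrast can be sketched in a few lines of Python. The column positions follow the punched-card example; the card contents, the function names and the dictionary used as an index are illustrative assumptions, not a description of any real implementation.

```python
# Fixed-column layout from the punched-card example (1-based, inclusive).
LAYOUT = {
    "name": (1, 20),
    "balance": (21, 30),
    "credit_limit": (31, 40),
}


def read_record(card: str) -> dict:
    """Record-based access: one read yields every field of the entity."""
    return {field: card[start - 1:end].strip()
            for field, (start, end) in LAYOUT.items()}


def cut_up(card_id: int, card: str, store: dict) -> None:
    """Granular storage: keep each section separately, indexed by (card, field)."""
    for field, (start, end) in LAYOUT.items():
        store[(card_id, field)] = card[start - 1:end].strip()


def read_granular(card_id: int, store: dict) -> tuple:
    """Reassemble an entity; each field costs one separate lookup."""
    fields = {field: store[(card_id, field)] for field in LAYOUT}
    return fields, len(LAYOUT)  # second element counts the lookups made


# A hypothetical 40-column card.
card = "Acme Ltd".ljust(20) + "1500.00".rjust(10) + "5000.00".rjust(10)
store = {}
cut_up(1, card, store)
```

Here `read_record(card)` recovers all three fields from a single read, whereas `read_granular(1, store)` returns the same fields at the cost of three lookups, one per section.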
To this extent non-record-based models (I shall call them granular models) are inherently less efficient than record-based models, including the relational model. However, the margin of difference is not so great as might be thought.
In a well-normalised relational database, most relations contain a fairly high proportion of foreign keys - in certain types of complex transactions such as sales orders, it is not unusual to find foreign keys in more than half the columns. Working interactively, good user interface design dictates that some meaningful data is presented from each tuple whose primary key appears as a foreign key, so that the user can have visual confirmation that the application has got it right. For example, if customers are identified by account numbers, and an order carries an account number as a foreign key, it would be usual to present the customer's name alongside the account number. Similarly, working in batch mode, it is often necessary to retrieve the tuples identified by foreign keys in order to get the full picture about an entity: in extending a sales order to create an invoice, prices, product descriptions, discount rates, sales tax rates and so on must all be retrieved by means of foreign keys. The bottom line is that at least one additional tuple is likely to be retrieved for every foreign key.
In a modern, well-normalised sales order processing application, it is not unusual to find that tuples must be retrieved from a dozen or more different relations in order to present a single sales order on the screen. Suppose that such an order comprises one header tuple with twenty columns, plus ten detail line tuples each with eight columns, where half of the columns in each relation are foreign keys. Under the relational model, the number of tuples that need to be retrieved to assemble the whole order is not the number of tuples in the order - 11 - but this number plus one for each of the 50 foreign keys, giving a total of 61. Under the granular model the number of items and links to be retrieved approximates to (depending on the exact design) the original number of columns - 100 - plus one for the target of each column, giving 200 in total.
So although in practice granular models are indeed less efficient in minimising disk accesses than record-based ones, the margin of difference is not nearly so great as it might appear to be: in this case, just over three to one. Anyone who uses the relational model has already accepted a substantial trade-off in efficiency; if minimising disk access were the sole consideration, sales orders would be stored in un-normalised form. Each could then be retrieved in a single visit to the disk, yielding a margin of efficiency over the relational model of more than sixty to one.
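The arithmetic of this example can be checked with a short Python sketch. The variable names are illustrative; the figures are those of the sales order described above, including the assumption that exactly half of the columns are foreign keys and that every retrieval costs one visit to the disk.

```python
# The sales order from the worked example.
header_tuples, header_columns = 1, 20
detail_tuples, detail_columns = 10, 8

tuples_in_order = header_tuples + detail_tuples                      # 11
total_columns = (header_tuples * header_columns
                 + detail_tuples * detail_columns)                   # 100
foreign_keys = total_columns // 2                                    # 50

# Relational: every tuple of the order, plus one tuple per foreign key.
relational_reads = tuples_in_order + foreign_keys                    # 61

# Granular: one item per column, plus one for the target of each column.
granular_reads = total_columns + total_columns                       # 200

# Un-normalised: the whole order held as a single record.
unnormalised_reads = 1

granular_vs_relational = granular_reads / relational_reads           # about 3.28
relational_vs_unnormalised = relational_reads / unnormalised_reads   # 61.0
```

The first ratio is the "just over three to one" margin between the granular and relational models; the second is the "more than sixty to one" margin that the relational model has already conceded to un-normalised storage.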
Most software innovators agree that it is important not to underestimate by how much the power of hardware will increase during the lifetime of their product, and consequently how the trade-off between functionality and performance will alter. In terms solely of the amount of work that a computer has to do to present a screen-full of information to a user, the relational model is more efficient than the associative model. But the same can be said of second generation programming languages compared to third generation. As computer power becomes ever cheaper, the right question to ask is not "Is A more efficient than B?", but rather "How much benefit does B offer in return for the cost of some of A's efficiency, and is the trade worth it?". From this more enlightened standpoint, the associative model wins.
Distinguishing Entities and Associations
The associative model divides things into two sorts, entities and associations: entities are things that have discrete, independent existence, whilst associations are things whose existence depends on one or more other things. Previous data models have made no useful distinction between the two, or, to be more precise, have demanded that associations be modelled as entities if their properties are to be recorded. The associative model acknowledges the distinction as one that occurs in the real world, and thus one that allows the creation of more accurate models of the real world. As we discussed in Chapter 5, a series of benefits flow from this.
One of Codd's principal objections to the binary model is that, in his words, one person's entity is another person's relationship, and there is no general and precisely defined distinction between the two concepts. I disagree. Firstly, I believe that most sensible people, once the distinction is pointed out to them, are readily able to decide whether something is an entity or an association.
Secondly, it is possible to define the distinction between entities and associations in a simple, reasonably intuitive but nevertheless rigorous way. Such design decisions are almost trivial compared to some of the decisions a relational practitioner is called upon to make when designing base relations.
Codd also objects to the entity-relationship model on the grounds that it does not allow associations to have properties. He is quite right to do so, and the associative model rectifies this, without requiring that they be modelled as entities.
However, much of Codd's ammunition is wasted because the version of the binary model at which he aims his criticisms is not the one contemplated by most researchers. Codd assumes that there is one distinct two-column table per entity per association type, and the two columns of each table are the two associated entities. In fact, most interpretations of the binary model assume that the association type forms the third column, and as we shall see, when this is the case a relational schema comprising any number of relations can be represented in the binary model by just two relations in total: one for entities, one for associations.
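A minimal Python sketch may make the two-relation representation concrete. It assumes, purely for illustration, that every attribute value is treated as an entity named by its string form, that entities receive sequential integer identifiers, and that the association type is held as a plain string in the third column. The customer and order data are hypothetical.

```python
from itertools import count

entities = {}        # relation 1: entity id -> name
associations = []    # relation 2: (source id, target id, association type)
_ids = count(1)
_by_name = {}


def entity(name: str) -> int:
    """Return the id of the entity called `name`, creating it on first use."""
    if name not in _by_name:
        _by_name[name] = next(_ids)
        entities[_by_name[name]] = name
    return _by_name[name]


def associate(source: str, target: str, assoc_type: str) -> None:
    """Record one association; its type forms the third column."""
    associations.append((entity(source), entity(target), assoc_type))


def load_relation(key_column: str, rows: list) -> None:
    """Turn each non-key column of each row into one association."""
    for row in rows:
        for column, value in row.items():
            if column != key_column:
                associate(row[key_column], value, column)


# Two hypothetical relations from a conventional schema.
customers = [{"customer": "C1", "name": "Acme Ltd", "town": "London"}]
orders = [{"order": "O1", "customer": "C1", "total": "100.00"}]

load_relation("customer", customers)
load_relation("order", orders)
```

However many relations are loaded in this way, the result still occupies only `entities` and `associations`; the foreign key from the order to its customer becomes an ordinary association between two entities.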
Using References, Not Values
The associative model recognises scalar values and strings as things in their own right, with independent existence and identity, instead of as isolated values that represent objects. This approach substantially reduces the amount of work needed to execute queries, and has other benefits - if today's databases had incorporated this capability, the Millennium bug would have been resolved with a fraction of the resources actually consumed. However, the approach depends at a fundamental level on the use of references or pointers to values instead of values themselves. Both Codd and Date have issued stern injunctions against the use of pointers in the relational model. Date has taken the argument furthest in his book "Relational Database Writings 1994-1997", which contains two chapters on pointers and a third on object identifiers, and he describes the introduction of pointers into relations as the Second Great Blunder.
The question at the heart of the issue is whether pieces of data should be represented in a database solely by values, in accordance with Codd's information feature for the relational model, or by references to variables that contain values, in accordance with the associative model, or either, at the user's election.
There is also a secondary question of whether things whose properties are recorded in a database should be identified by keys or by surrogate keys. A key is some unique combination of a thing's existing properties, whilst a surrogate key is a new property assigned as the thing enters the database, solely for the purpose of identifying it and for no other purpose.
Surrogate keys look like object identifiers (as they are commonly used in the object model) in many respects, but Date makes a distinction between surrogate keys and object identifiers and rightly concludes that, whilst object identifiers perform some of the same functions as surrogate keys, they carry a lot of additional baggage with them, and thus are not the same thing as pointers.
In , Codd excludes pointers from the relational model because he believes that both programmers and end-users find them difficult to understand. He cautions us that "the manipulation of pointers is more bug-prone than is the act of comparing values, even if the user happens to understand the complexities of pointers." However, Codd also makes it clear that his prohibition extends only to pointers that are visible to users: "It is a basic rule in relational databases that there should be no pointers at all in the user's or programmer's perception." (My italics.) He goes on to concede that "For implementation purposes, however, pointers can be used in a relational database management system 'under the covers', which may in some cases allow the DBMS vendor to offer improved performance."
In , Date and Darwen issue a specific proscription, namely "No value shall possess any kind of ID (identifier) that is somehow distinct from the value per se", and consequently reject the notions that other objects might make use of such IDs to share values and that users might have to de-reference such IDs, either explicitly or implicitly, in order to obtain values. ("De-reference" means to retrieve whatever it is that a pointer points to.)
Regarding the use of pointers in the relational model, I agree with Codd and Date, with the exception of a single caveat which I shall describe in a moment. The relational model has no need of visible pointers to achieve its goals and was explicitly designed to dispense with them. Moreover, the relational model relies extensively on the use of predicate logic to compare values directly, and this function is undermined and rendered more complex by the use of pointers. Certainly you can add pointers to the relational model, but to do so would be a significant departure from it, and its clear and sturdy conceptual basis would be degraded. If a modification adds value without undesirable side-effects, well and good. Here, however, the case for the added value is not clearly made and the side-effects have not been explored. At some point the custodians of a conceptual model must defend it from further degradation.
Now for the caveat. The relational model's use of primary and foreign keys has sufficient similarities to a pointer mechanism (albeit one entirely exposed to the user) to cause me to wonder whether Codd and Date protest too much. Moreover, as a pointer mechanism it is fragile: unless the prohibition of duplicate tuples in relations is rigorously enforced, which it is not in many commercial implementations of the relational model, one cannot guarantee always to be able to unambiguously de-reference a foreign key.
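The fragility is easy to demonstrate. The following Python sketch treats a relation as a list of rows and de-references a foreign key by searching for matching key values; the relation, the column names and the `dereference` helper are hypothetical, and the duplicate tuple stands for what an implementation that does not enforce uniqueness would permit.

```python
def dereference(relation: list, key_column: str, key_value: str) -> dict:
    """Follow a foreign key; fail unless exactly one tuple matches."""
    matches = [row for row in relation if row[key_column] == key_value]
    if len(matches) != 1:
        raise LookupError(
            f"{key_value!r} matches {len(matches)} tuples, expected exactly 1")
    return matches[0]


# A hypothetical Customers relation in which a duplicate tuple has crept in.
customers = [
    {"account": "A100", "name": "Acme Ltd"},
    {"account": "A200", "name": "Bolt plc"},
    {"account": "A200", "name": "Bolt plc"},   # duplicate tuple
]

good_order = {"order_no": 1, "account": "A100"}
bad_order = {"order_no": 2, "account": "A200"}

customer = dereference(customers, "account", good_order["account"])

try:
    dereference(customers, "account", bad_order["account"])
    ambiguous = False
except LookupError:
    ambiguous = True   # the foreign key cannot be resolved to a single tuple
```

The first order resolves to a single customer; the second cannot be resolved unambiguously, which is the failure described above.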
Date's aversion to pointers does not extend to surrogate keys. In the context of the relational model, a surrogate key is a key like any other and identifies a single row, but it is not composite, it serves no other purpose and is never reused, even after the thing that it identifies is removed from the database. In  he says "Surrogate keys are a good idea (frequently, if not invariably ...). More specifically surrogate keys can help avoid many of the problems that occur with ordinary undisciplined user keys." So the associative model's use of surrogate keys that are invisible to both the programmer and the user, and are not object identifiers, does not of itself violate the principles that Codd and Date have articulated.
(Date doesn't say explicitly whether a row with a surrogate key would be identified within a database solely by its surrogate key, or by the name of its relation together with surrogate key. He perhaps implies the former by saying that surrogate keys would never be reused, but this further implies that there must be a way to infer from a surrogate key the name of the relation in which it can be found.)
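The properties of a surrogate key described here can be sketched in Python. The class below is an illustration under stated assumptions, not anyone's actual design: keys are drawn from a single database-wide counter, so that a key alone identifies a row, and the store records which relation each key belongs to so that the relation can be inferred from the key.

```python
from itertools import count


class SurrogateStore:
    """Rows identified solely by database-wide surrogate keys (illustrative)."""

    def __init__(self) -> None:
        self._next_key = count(1)   # only ever moves forward, so no key is reused
        self._rows = {}             # surrogate key -> (relation name, row)

    def insert(self, relation: str, row: dict) -> int:
        key = next(self._next_key)
        self._rows[key] = (relation, row)
        return key

    def delete(self, key: int) -> None:
        del self._rows[key]

    def relation_of(self, key: int) -> str:
        """Infer the relation in which a surrogate key can be found."""
        return self._rows[key][0]


db = SurrogateStore()
first = db.insert("Customers", {"name": "Acme Ltd"})
db.delete(first)
second = db.insert("Orders", {"total": "100.00"})
```

After the first row is deleted its key is retired: `second` receives a fresh key, and `db.relation_of(second)` reports the relation without the caller having to name it.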
Where the associative model is most fundamentally at variance with the relational model is in the second question: should data be represented by values, or pointers to variables, or either? The relational model, in accordance with Codd's information feature, does only the former. The associative model does only the latter. There are two cases to consider: where the database is representing relationships between one entity and another (which the relational model implements using foreign keys), and where the database is storing a scalar value or a string. Before you pass judgement, I shall examine the associative model's behaviour more closely.
Within any reasonable problem domain, the integer 12, the monetary value $12.00 or the string "QWERTY" all have unequivocal identity. They also qualify as entities according to our test: there is nothing in the real world which, if it ceased to exist immediately, would render the thing in question nonexistent or meaningless. They also each have an obvious identifier, which is their own value.
Most modelling systems and programming languages (except Smalltalk) do not treat scalars and strings as objects or entities: instead they use a value that represents the object. But there is a crucial difference between the entity that represents the decimal integer 100, and the different values that may also be used to represent it, such as 100, or 100.00, or 000000100.0000000, or 1.00E+002. To illustrate the point, we simply have to alter the number system that we are using from decimal to hexadecimal, and the values then refer to a different integer entirely.
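The point can be verified with Python's standard library. The sketch below parses four written forms of the same number with `decimal.Decimal`, and then reads the single string "100" under two different number systems; the variable names are illustrative.

```python
from decimal import Decimal

# Four different values that all represent the same integer entity.
representations = ["100", "100.00", "000000100.0000000", "1.00E+002"]
parsed = [Decimal(text) for text in representations]
all_same_entity = all(value == 100 for value in parsed)

# The same string of digits read under two number systems.
as_decimal = int("100", 10)   # one hundred
as_hexadecimal = int("100", 16)   # two hundred and fifty-six
```

`all_same_entity` is true although no two of the strings are equal as strings, while `as_decimal` and `as_hexadecimal` differ although the string is identical: the value and the entity it represents are not the same thing.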
Suppose we are building a database that stores addresses. If we put the string "London" in several different columns of several different relations, each time we enter the string again we create an entirely new representation of it, and the database makes no attempt to see if it has already stored the string "London" before, or to try to reuse it. So we may end up with the string "London" stored, say, 1,000 times in the database.
There is nothing to say whether all these values refer to one town, or to more than one - our database may refer to any number of towns called London between 1 and 1,000. If one of these Londons were to change its name, first we would have to locate each one, and then decide whether it was the one which had changed its name or not.
The mechanism that the relational model provides to address this is to allow us to create a relation called Towns, and within it a tuple for each different London. The primary key of each tuple can then be used as a foreign key in various tuples of other relations to refer back to the appropriate London. However, as the issue arises every time for every scalar and every string, it is fair to say that whilst the relational model does not prohibit this approach, if it had wished to endorse it, it would have made it much simpler to implement. Thus in practice if not in theory, it prohibits it.
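The difference between storing the string and storing a reference to the town can be sketched in Python. The addresses, the integer town identifiers and the helper functions are hypothetical; the `towns` dictionary plays the part of the Towns relation, with one entry for each different London.

```python
# Value-based storage: every address carries its own copy of the string.
addresses_by_value = [
    {"street": "1 High Street", "town": "London"},
    {"street": "2 Mill Lane", "town": "London"},
]

# Reference-based storage: each town is stored once and addresses refer to it.
towns = {1: "London", 2: "London"}   # two different towns that share a name
addresses_by_reference = [
    {"street": "1 High Street", "town_id": 1},
    {"street": "2 Mill Lane", "town_id": 2},
]


def town_of(address: dict) -> str:
    """De-reference an address's town."""
    return towns[address["town_id"]]


def rename_town(town_id: int, new_name: str) -> None:
    """Change a town's name once, for every address that refers to it."""
    towns[town_id] = new_name


# By value, the database cannot tell how many towns are involved.
distinct_by_value = len({a["town"] for a in addresses_by_value})
distinct_by_reference = len({a["town_id"] for a in addresses_by_reference})

rename_town(2, "New London")
```

By value the two addresses appear to share one town; by reference they are seen to lie in two different towns, and renaming one of them is a single update that leaves the other untouched.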
These observations are equally relevant when we are dealing with, say, an amount of money or a date; however, there is usually less scope for ambiguity with scalar values. "01-Jan-2000" or "$100" are pretty unambiguous whether they occur as the identity of instances or as values. But there is still a world of difference between a value that represents an instance and the instance itself. If our database had stored identities of dates instead of dates as values, the Millennium bug would have had a fraction of the impact that it is currently having.
Moving Away From Object Orientation
The associative model is intentionally not object oriented and is not compatible with the object model of data. Object orientation is a powerful and important programming technique. But the guiding principle behind its invention was to restrict or prohibit access to data in main memory in order to ensure its integrity. In fact, to borrow Date's words from , "The 'object model' is a storage model, not a data model." Date puts the phrase "object model" in quotes because, as he points out, there is no universally agreed, abstract, formally defined "object model". This is simply not an adequate starting point for tools whose primary function is to provide, in Codd's elegantly simple phrase, shared access to large data banks.
It should not be inferred from this that the associative model is not compatible with object-oriented programming languages: nothing could be further from the truth. To use an object-oriented programming language in conjunction with a database based on the associative model (or, indeed, on the relational model) is simply to acknowledge that relatively small amounts of transient data in a computer's memory should not necessarily be organised, managed or protected in the same way as significantly larger volumes of persistent data in a shared database.
Our own implementation of the associative model is written in Java, and its APIs are delivered as Java packages.
Re-asserting the Nature of the Problem Domain
The associative model reasserts the nature of the problem domain that database management systems should be addressing. Over the past decade, object-oriented database technology has failed to find a commercially sustainable market either as a repository for multimedia files or as persistent storage for object-oriented programming languages.
The opportunity for the next generation of database management systems lies not with objects or universal servers, but in using vastly increased hardware resources to improve on the way that we store and query our core mission-critical enterprise and transactional data, on which the financial and sometimes physical well-being of enterprises and individuals depends.
REFERENCES AND BIBLIOGRAPHY
[1.] Charles W. Bachman: "The Programmer as Navigator", Communications of the ACM, Vol 16, No 11, November 1973.
[2.] Ramez Elmasri and Shamkant B. Navathe: "Fundamentals of Database Systems, Second Edition", The Benjamin/Cummings Publishing Company, Inc, 1994.
[3.] C. J. Date: "An Introduction to Database Systems, Sixth Edition", Addison-Wesley, 1994.
[4.] E.F. Codd: "A Relational Model of Data for Large Shared Data Banks", Communications of the ACM, Vol 13, No 6, June 1970.
[5.] E.F. Codd: "Extending the Database Relational Model to Capture More Meaning", ACM Transactions on Database Systems, Vol 4, No 4, December 1979.
[6.] E. F. Codd: "The Relational Model for Database Management: Version 2", Addison-Wesley, 1990.
[7.] C.J. Date with Hugh Darwen: "Relational Database Writings 1989 - 1991", Addison-Wesley, 1992.
[8.] Judith Jeffcoate and Christine Guilfoyle: "Databases for Objects: the Market Opportunity", Ovum Ltd, 1991.
[9.] Michael Stonebraker with Dorothy Moore: "Object-Relational DBMSs: The Next Great Wave", Morgan Kaufmann Publishers, Inc, 1996.
[10.] C. J. Date and Hugh Darwen: "Foundation for Object/Relational Databases, The Third Manifesto", Addison-Wesley, 1998.
[11.] Frederick P. Brooks, Jr: "The Mythical Man-Month, Essays on Software Engineering, Anniversary Edition", Addison-Wesley, 1995.
[12.] Geoffrey A. Moore: "The Gorilla Game", HarperBusiness, 1998.
[13.] George Koch and Kevin Loney: "Oracle8: The Complete Reference", Osborne/McGraw-Hill, 1997.
[14.] International Organization for Standardization (ISO): Database Language SQL. Document ISO/IEC 9075:1992.
[15.] Kraig Brockschmidt: "Inside OLE, Second Edition", Microsoft Press, 1995.
[16.] J. A. Feldman: "Aspects of Associative Processing", Technical note 1965-13, MIT Lincoln Laboratory, 1965.
[17.] Levien and Maron: "A Computer System for Inference and Retrieval", CACM Vol 10, 1967.
[18.] Feldman and Rovner: "An ALGOL-based Associative Language", CACM Vol 12, 1969.
[19.] Sharman and Winterbottom: "The Universal Triple Machine: a Reduced Instruction Set Repository Manager", Proceedings of BNCOD, 1981.
[20.] R. A. Frost: "Binary-Relational Storage Structures", Computer Journal Vol. 25, No 3, 1982.
[21.] J. Nievergelt, H. Hinterberger and K. C. Sevcik: "The Grid File: An Adaptable, Symmetric Multikey File Structure", ACM Transactions on Database Systems, Vol 9, No 1, March 1984.
[22.] Antonin Guttman: "R-Trees: A Dynamic Index Structure for Spatial Searching", ACM SIGMOD, 1984.
[23.] P. P. Chen: "The Entity-Relationship Model - Towards a Unified View of Data", ACM TODS Vol 1, No 1, March 1976.
[24.] R. E. Levien and M. E. Maron: "A Computer System for Inference Execution and Retrieval", CACM Vol 10, No 11, 1967.
[25.] W. L. Ash and E. H. Sibley: "TRAMP: an Interpretive Associative Processor with Deductive Capabilities", Proc ACM 23rd National Conference, Brandon/Systems Press, 1968.
[26.] J. A. Feldman and P. D. Rovner: "An ALGOL-based Associative Language", CACM Vol 12 No 8, 1969.
[27.] P. Titman: "An Experimental Database System Using Binary Relations", Database Management, Proceedings of the IFIP-TC-2 Working Conference, ed. Klimbie and Koffeman, North-Holland, 1974.
[28.] G. Bracchi, P. Paolini and G. Pelagatti: "Binary Logical Associations in Data Modelling", reproduced in "Modelling in Data Base Management Systems", ed. G. M. Nijssen, North-Holland, 1976.
[29.] G. C. H. Sharman and N. Winterbottom: "The Universal Triple Machine: a Reduced Instruction Set Repository Manager", Proc 6th BNCOD, 1988.
[30.] D. R. McGregor and J. R. Malone, "The FACT Database System", Proc of Symposium on R&D in Information Retrieval, Cambridge, 1980.
[31.] R. A. Frost: "ASDAS - A Simple Database Management System", Proc of 6th ACM European Regional Conference, 1981.
[32.] J. A. Mariani: "Oggetto: An Object Oriented Database Layered on a Triple Store", The Computer Journal, Vol 35, No 2, 1992.
[33.] P. King, M. Derakhshan, A. Poulovassilis and C. Small: "TriStarp - An Investigation into the Implementation and Exploitation of Binary Relational Storage Structures", Proc 8th BNCOD, York, 1990.
[34.] Nicholas Roussopoulos and John Mylopoulos: "Using Semantic Networks for Data Base Management", Proc of 1st Very Large Database Conference, Framingham, 1975.
[35.] C. J. Date with special contributions by Hugh Darwen and David McGovern: "Relational Database Writings 1994-1997", Addison-Wesley.
[36.] Barry Devlin: "Data Warehouse: from Architecture to Implementation", Addison-Wesley, 1997.
Lazy Software Literature
1.0 The associative model of data, Simon Williams, Lazy Software
2.0 Sentences VI Users Guide
3.0 Sentences Evaluation Guide and Tutorial
Technology Audit: www.butlergroup.com Fax: 01628 642301. This contains a review of the whole Sentences concept, offering a Business Viewpoint, Market Strategy, Product Description, Platforms and Future Development.
Publication: Database and Network Journal
Date: Oct 1, 2000