Databases as Information Management Tools.
As any records manager knows, the regulatory, business, and legal requirements for organizational recordkeeping must be met regardless of the media used to store the information -- paper, microfilm, or computer disks. When called upon to retrieve business records five or 10 years from now -- records likely to exist only in electronic format -- information managers will have to attest to their accuracy, completeness, and authenticity, possibly without much assistance from computer professionals. Compounding the issue, new versions of software -- though faster, more powerful, and more capable than their predecessors -- are not necessarily compatible with previous versions and are much more complicated to operate. Faced with a growing volume of corporate records created and stored in repositories that are difficult to access and maintain, information managers will need a thorough knowledge of database management system (DBMS) software technology.
According to software authority Chris Date, "A database system is basically a computerized recordkeeping system -- that is, a system whose overall purpose is to maintain information and to make that information available on demand." (Date 1986) The potential complexity of computer database software used to store electronic business records can add new challenges for records managers, computer system auditors, or individuals assessing the adequacy of an organization's quality assurance programs. It may not be easy to view, inspect, or assess business records stored as electronic data or digitized images, without a firm understanding of the inner workings of computer database software. For these reasons, occasional reviews of how contemporary computer software technology works will help information managers understand the implications of this technology for the creation and storage of business records.
Terminology: Business Records vs. Database Records
When an individual uses computer software to create a business record, such as a letter created with a word processor, that letter's content is usually stored in a single file on a computer's hard disk. Contained within that single file are the actual text characters, font designations for the characters, paragraph formatting, page formatting, tables, columns, or any other information that is needed to display or print the letter by the appropriate word processing software. It is possible to find that single file on a computer disk and re-create the original business record, if the software used to create that business record is available. When a search for information within the file is executed, the computer software searches in a linear manner from the beginning of the file to the end of the file, looking for the text string of interest. The word processor file stored on the computer's hard disk can be the complete authoritative record of a business communication.
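The linear search described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular word processor implements its search; the function name `linear_search` and the sample letter text are assumptions made for the example.

```python
def linear_search(text, needle):
    """Scan text from beginning to end and return every offset where needle starts."""
    positions = []
    last_start = len(text) - len(needle)
    # Examine each possible starting offset in order, first to last.
    for start in range(last_start + 1):
        if text[start:start + len(needle)] == needle:
            positions.append(start)
    return positions


# Hypothetical letter content, standing in for the text of a word processor file.
letter = "Dear Ms. Phillips: Your budget report for fiscal year 1998 is enclosed."
print(linear_search(letter, "budget"))
```

Because every character position is examined in turn, the time needed grows with the size of the file, which is acceptable for one letter but not for a large collection of records.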
In contrast, databases can be one file or a set of related files. A single database file contains an accumulation of "records" within the database. As an example, a mailing list database file could contain records of individuals, including such distinct data elements as a person's name, address, and phone number. When considered together, a single instance of these data elements (name, address, and phone number) within the database is a database record. This difference in use of the term "records" is often a source of confusion when information managers and computer systems personnel begin discussions about managing electronic records. It is important to keep in mind that a database file contains data records and may be considered a business record itself. For example, a business database that contains accounting information is a dynamically changing business record of the financial status of a company, even as the individual data records within that database change each minute, hour, or day.
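The distinction between a database record and the database as a whole can be made concrete with a small Python sketch. The class name `MailingListRecord`, the helper `phones_for`, and all of the sample values are hypothetical and chosen only to mirror the mailing list example above.

```python
from dataclasses import dataclass


@dataclass
class MailingListRecord:
    """One database record: a single instance of the related data elements."""
    name: str
    address: str
    phone: str


# The collection as a whole plays the role of the database file.
mailing_list = [
    MailingListRecord("A. Example", "1 Main Street", "555-0100"),
    MailingListRecord("B. Sample", "2 Oak Avenue", "555-0101"),
]


def phones_for(records, name):
    """Return the phone numbers of every record whose name matches exactly."""
    return [record.phone for record in records if record.name == name]


print(phones_for(mailing_list, "B. Sample"))
```

Each `MailingListRecord` is a "record" in the computing sense, while `mailing_list` taken together is closer to what a records manager would treat as the business record.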
The importance of understanding database software technology becomes readily apparent when one realizes that computer-based records tracking software, electronic mail systems, corporate financial applications, electronic document management systems (EDMS), and many other organizational information resources rely on database software.
Indexes to records within relational databases may be stored separately from the actual database files containing data records. Long text fields, such as memo fields, may also be stored in separate files within the database. Large digital images may be stored in separate database files and compressed with special software to reduce the space needed for image storage. All of these files -- the data file, the index file, the memo field file, and the image file -- can be managed through a single DBMS. When information must be retrieved from the DBMS, the software searches all related files to present the relevant information. The DBMS software itself is often simply referred to as a "database." Common features of a DBMS are (1) a set of related data files, (2) a query language used for retrieval, (3) a report generator for creating formatted reports, and (4) a set of utilities for importing and exporting data and performing other maintenance activities.
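The idea of an index kept separately from the data records can be sketched as follows. This is a simplified in-memory illustration, not the storage format of any real DBMS; the names `data_records`, `build_index`, and `lookup`, and the sample values, are assumptions made for the example.

```python
# Stand-in for the data file: records are stored in arrival order.
data_records = [
    {"last_name": "Phillips", "document": "Budget proposal"},
    {"last_name": "Jones", "document": "Travel voucher"},
    {"last_name": "Phillips", "document": "Budget report"},
]


def build_index(records, field):
    """Build a separate structure mapping each field value to record positions."""
    index = {}
    for position, record in enumerate(records):
        index.setdefault(record[field], []).append(position)
    return index


def lookup(records, index, value):
    """Use the index to fetch matching records without scanning the data."""
    return [records[position] for position in index.get(value, [])]


# Stand-in for the index file, held apart from the data records.
name_index = build_index(data_records, "last_name")
print(lookup(data_records, name_index, "Phillips"))
```

The practical consequence for recordkeeping is that the index and the data are two distinct artifacts: if the index is lost or out of date, the records still exist but become much harder to find.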
Inside The DBMS
All computer systems must store data in some form of repository that enables efficient information retrieval. Due to the growing variety of information types that must be stored today -- such as text, images, graphics, audio, and video -- the variety of database software appropriate for information storage is also growing. The simple "flat file" formats used by word processors are inefficient methods of storing complex data and records. Relational database software, however, easily stores and retrieves sets of related records within complex data sets, such as separate (but related) files about people, bank accounts, and their financial status. A relational database, for example, could answer, "What are the bank account numbers of people named `Phillips,' and how much money is in each of their accounts?"
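The "Phillips" question above can be expressed as a relational query. The sketch below uses Python's built-in `sqlite3` module as a convenient stand-in for a relational DBMS; the table names, column names, and sample rows are hypothetical and exist only to show two related tables being joined.

```python
import sqlite3

# An in-memory database with two separate but related tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE people (
    person_id INTEGER PRIMARY KEY,
    last_name TEXT NOT NULL
);
CREATE TABLE accounts (
    account_number TEXT PRIMARY KEY,
    person_id INTEGER NOT NULL REFERENCES people(person_id),
    balance_cents INTEGER NOT NULL
);
INSERT INTO people VALUES (1, 'Phillips'), (2, 'Jones');
INSERT INTO accounts VALUES ('1001', 1, 25000), ('1002', 1, 4050), ('2001', 2, 990);
""")


def balances_for(connection, last_name):
    """Return (account_number, balance_cents) for everyone with the given last name."""
    query = """
        SELECT a.account_number, a.balance_cents
        FROM people AS p
        JOIN accounts AS a ON a.person_id = p.person_id
        WHERE p.last_name = ?
        ORDER BY a.account_number
    """
    return connection.execute(query, (last_name,)).fetchall()


print(balances_for(conn, "Phillips"))
```

The join on `person_id` is what lets the software relate a name held in one table to balances held in another, which a flat file cannot do without custom programming.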
The variety of relational DBMS software available today also continues to grow as businesses create new computing applications to meet special needs. Relational database software performs well for storing and retrieving tabular types of data -- data that can be visualized in rows and columns of related small data elements. A good example of such data is the information contained in accounting systems -- names, account numbers, checking account balances, and other financial data best displayed in columns and rows.
Well-known companies producing DBMS software include Microsoft, IBM, Oracle, Sybase, and Informix. Each company produces software to meet what it perceives as its customers' requirements, and as those requirements change, so does the nature of the software being produced. Many companies are now producing more than one kind of relational DBMS to meet differing customer needs. Microsoft, for instance, produces the relational DBMS Access for personal or departmental level database applications. It has a good user interface, a rich variety of data management capabilities, and reasonably easy-to-learn report generation features. However, it is not intended for the creation of large multi-user databases, as it does not have the more robust database capabilities needed for large, complex computer applications. Important capabilities of this kind include sophisticated user access controls, the ability to "roll back" database record changes to the state that existed before a system crash, and strong record-locking capabilities that prevent one user from overwriting the records entered by another user. These features assure data integrity in multi-user computing environments with large numbers of simultaneous users. Microsoft offers SQL Server as its recommended relational DBMS for enterprise-level databases where access by many users with different security levels is required.
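The rollback capability mentioned above can be illustrated in miniature. The sketch below uses Python's built-in `sqlite3` module, not Access or SQL Server, and it simulates a failed update rather than a real system crash; the table, the `transfer` function, and the sample balances are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " account_number TEXT PRIMARY KEY,"
    " balance_cents INTEGER NOT NULL CHECK (balance_cents >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('1001', 5000), ('1002', 0)")
conn.commit()


def transfer(connection, source, target, amount_cents):
    """Move money between accounts; undo every change if any step fails."""
    try:
        # Using the connection as a context manager commits on success
        # and rolls back the whole transaction if an exception is raised.
        with connection:
            connection.execute(
                "UPDATE accounts SET balance_cents = balance_cents + ?"
                " WHERE account_number = ?",
                (amount_cents, target),
            )
            connection.execute(
                "UPDATE accounts SET balance_cents = balance_cents - ?"
                " WHERE account_number = ?",
                (amount_cents, source),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


def balances(connection):
    """Return all accounts and balances in account-number order."""
    return connection.execute(
        "SELECT account_number, balance_cents FROM accounts ORDER BY account_number"
    ).fetchall()
```

In this sketch an attempted transfer larger than the source balance fails on the second update, and the rollback also removes the credit already applied by the first update, so the stored records never show a half-completed change.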
Object-oriented database software stores and retrieves even more complex data, such as graphics, audio, and video clips. These databases can be used to display and play simultaneously the text, graphics, photographic images, and sound needed to show an annotated video clip from a movie such as "Gone with the Wind" that is stored on a CD-ROM.
Object-oriented computing applications work somewhat differently from relational DBMS-based systems. (Slater 1997) The database software uses some data types that are similar to traditional attributes found in relational models, but also includes data types with built-in functions and methods for object access or manipulation. Several software manufacturers are taking advantage of object-oriented programming capabilities to create more complex and powerful business applications. Microsoft will use OLE DB (object linking and embedding database), for instance, as a method to enable database objects on the front end of an application to map to a SQL Server database on the back end of an application through DCOM, Microsoft's Distributed Component Object Model. This approach to integrating object-oriented software technology into computing applications can allow the use of existing SQL Server databases and data, and thus reduce the need for completely migrating business systems to new software.
Many organizations support the development of SQL3, an enhanced structured query language (SQL) that addresses some object-oriented database concepts, although efforts to define a specification for this standard are moving rather slowly. (Melton 1998) This new database query language will allow traditional relational database management system products to model and access objects in an "object-oriented manner." The older SQL-92 has no facilities for creating complex data types. However, SQL3 offers the ability to declare abstract data types (ADTs) that let programmers capture application-specific behavior as part of the database. Software that supports SQL3 will be able to store database records that use these new data types. By combining a more traditional approach to constructing a DBMS with some of the advantages of object-oriented database technology, software vendors hope to provide the best technology for creating databases. Doing so can potentially reduce the requirements for retraining existing computer programming staff and postpone the need to migrate existing applications to a completely new software environment.
Tracking Electronic Documents
Today, software for managing records ranges from simple databases that use bar codes to track paper documents, to complex, multi-user databases handling electronic documents in distributed computing architectures. Surveys of records management software commonly include information about the application database used for record or image storage. (Phillips 1998) Such information can be used to establish the product's compatibility with other existing organizational databases and to determine if it can handle large volumes of data. Detailed studies of records management software indicate that "lower-end databases are unlikely to be able to handle the volumes and large numbers of users associated with enterprise records management." (Medina 1998) For these reasons, the database software that a records management application uses is very important to information managers.
All records management software requires some form of database software to store data about records retention schedules, indexes to record locations, and digitized (bit-map) images of electronic documents. Software packages such as Foremost (Provenance Systems Inc.), TRIM (Tower Software), and Extempore (Select Technologies Inc.) require database management software to operate. Although these packages are written in programming languages such as Visual Basic or C++, they all rely on database management systems to store the data they process.
This is also true for electronic document management systems like Documentum and Docs Open, and for groupware products like Lotus Notes or Microsoft Exchange. Documentum, for example, stores document indexes and metadata in Oracle database tables. However, it uses a proprietary database and file system to store actual electronic records, including word processing documents, spreadsheets, and graphics files. In order to understand how business records are stored within these kinds of software, one must understand the operation of the underlying database software.
When information managers reproduce electronic business records for operational, reference, or legal purposes, they may be called upon to explain how those records were first stored and then retrieved using a particular software package. The information manager must know how the recordkeeping software stores index data and database records in order to fully explain why a particular set of electronically recalled business records should be considered authentic and therefore admissible in court.
Managing Records in Electronic Systems
A key concept for records managers is the records series (i.e., groups of records used or filed together). As an example, a records series titled "Budget Records," numeric designation 501.00, is described as "Budget proposals and reports for specific fiscal years." However, when records are captured in an EDMS, either by scanning paper or directly importing an electronic data file, it is necessary to identify and categorize them, functions that are entirely dependent on properly functioning computer software. Although misfiled documents can be a problem in paper-based systems, it is theoretically possible to browse through shelves or folders and eventually locate them. But the volume of records associated with computer systems and the more complex requirements for finding and displaying the records make browsing serially through electronic records a prohibitively expensive proposition. In addition, the searcher must understand the software well enough to know whether indexes to the records are available and how to use those indexes.
Information managers will be in a much better position to retrieve business records held in database repositories if they have been involved in the original design of the system used to hold those records and are familiar with the software. As an example, if certain electronic mail messages are considered official records of business communications, those messages should be identified within the electronic mail system or be copied to a formal electronic recordkeeping system. Even if they have been identified as official records, anyone trying to find the messages will want to have some knowledge of how they were stored and how to best access them. Issues to investigate might include the following:
* Were the official communications stored on a local PC hard disk or a network e-mail system hard disk? (Some e-mail systems allow this to be a user configurable option.)
* What database software does the e-mail system use for records storage? (Microsoft's Outlook, Qualcomm's Eudora, and Lotus's cc:Mail all use different databases for storing electronic records.)
* Can the e-mail system database software search for records by any text string, or are all records simply filed in folders? (It may not be easy to understand the index terms someone else used to file e-mail messages.)
* Did the e-mail system preserve all relevant data about the communication -- the time, date, sender, receiver, etc.? (This may determine admissibility for legal purposes.)
* How were official records identified within the system and is the identifying information consistent with that used for records in other systems? (Is a records series title common to both paper-based communications and e-mail messages?)
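Several of the questions above concern whether the time, date, sender, and receiver of a message were preserved and how the message was identified as a record. The sketch below shows one way such data could be read from a message stored in the standard Internet mail text format, using Python's built-in `email` package. The sample message, the function name `extract_record_metadata`, and especially the `X-Records-Series` header are hypothetical; no e-mail product named in this article is known to use that header.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical message; X-Records-Series is an invented header for illustration.
RAW_MESSAGE = """\
From: Sender Example <sender@example.com>
To: Receiver Example <receiver@example.com>
Date: Mon, 04 Jan 1999 09:30:00 -0500
Subject: FY1999 budget proposal
X-Records-Series: 501.00 Budget Records

The proposal is attached.
"""


def extract_record_metadata(raw):
    """Collect the header data needed to describe a message as a business record."""
    message = message_from_string(raw)
    date_header = message["Date"]
    return {
        "senders": [addr for _, addr in getaddresses(message.get_all("From", []))],
        "receivers": [addr for _, addr in getaddresses(message.get_all("To", []))],
        # Convert the date header to an ISO 8601 string, or None if it is missing.
        "sent": parsedate_to_datetime(date_header).isoformat() if date_header else None,
        "subject": message["Subject"],
        "records_series": message["X-Records-Series"],
    }


metadata = extract_record_metadata(RAW_MESSAGE)
print(metadata)
```

A message for which any of these fields comes back empty is one whose context may be difficult to establish later, which is the concern raised in the questions above.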
Excellent technical reference books are available to explain database management systems, but it is often better to consult business periodicals that present contemporary market changes in software technology. (Date 1994; Date and Darwen 1998) Many periodical publishers have publicly accessible Web sites where the full text of back issues is available. Sites to consult are:
CIO Magazine <http://www.cio.com>
DBMS Magazine <http://www.dbmsmag.com/>
PC Magazine <http://www.zdnet.com/pcmag/pcmindex.htm>
E-Business Advisor <http://www.advisor.com/>
Business records must be managed responsibly, despite being stored as electronic files or data. They must be tracked, managed, and archived, even though some operations may be performed within a computer environment. At some point, database software that is either proprietary or publicly available will be used by a records management application, an EDMS, an electronic mail system or another computer application to store an electronic record and the associated index files. To find, retrieve, manage, and archive electronic business records will require an increasing understanding of how database software works and stores information.
Date, Chris J. An Introduction to Database Systems. 4th ed. 1986.
--. An Introduction to Database Systems. 6th ed. 1994.
-- and Darwen, Hugh. Foundation for Object/Relational Databases: The Third Manifesto. 1998.
Medina, Richard. "Doculab's Records Management Systems Benchmark," Inform. September 1998.
Melton, Jim. "SQL3 Final Committee Draft Editing Meeting," Database. September 1998.
Phillips, John T. Software Directory of Automated Records Management Systems. 1998.
Slater, Derek. "Your Database is About to Get More Complicated." CIO Magazine. November 1, 1997.
John T. Phillips, CRM, is the owner of Information Technology Decisions, a management consulting firm. He has more than 20 years' experience in information resources management, specializing in automated records management systems and other technology-related areas. Phillips has authored two publications related to information management and technology, and has presented at international conferences on related subjects.
|Author:||PHILLIPS, JOHN T.|
|Publication:||Information Management Journal|
|Date:||Jan 1, 1999|