Database/MARC.

While the emergence of nearly unanimous support for the MARC format on PC-based library computing products is to be praised and welcomed, it brings with it some frustrations as well. Chief among them is the difficulty of viewing and modifying MARC records in their native state.

For many users this is not a burning issue. If you have a local online system (we'll assume a micro-based system here), most of your dealings with MARC records will likely consist of importing them into that system in the most convenient and quick manner possible. It is probably enough that viewing and editing can take place inside the local system.

Hopefully, your system observes the niceties of MARC format conventions and offers you the eventual option of exporting full MARC format records for use in some other system.

Those in processing centers, those creating and maintaining union databases, and those accumulating MARC records prior to installing a local online system have additional needs, however. Database/MARC, or DBM, is oriented to meeting their needs. It is a structure, a couple of programs, and some associated techniques that make it relatively easy to view and modify MARC data in a generic fashion while maintaining the integrity of content and content designators (all those pesky tags, subfield tags, indicators, etc.) without imposing artificial limits on the data.

Although DBM depends on dBASE IV (or a compatible package like FoxPro or Arago), it transcends the limits of dBASE usually encountered in working with textual information. MARC fields of any length can be accommodated. Fields and subfields can repeat with no practical limit. While the particular implementation of the structure described below has some limits, simply adding one character to the width of a sequence field will expand the upper limit of repeatability by a factor of ten (for example, from 99 to 999 occurrences).

While I am still developing new techniques for working with DBM data, the approach has proven itself on a number of occasions, making possible global changes that would otherwise have been impossible given the software tools at hand.

The software components of DBM consist of MARC2DBM, a QuickBASIC program that converts MARC files into the comma-delimited preload format required for DBM data, and OUTMARC.PRG, a dBASE IV program that converts DBM records back to standard MARC records once changes are made. After a MARC data file has been run through MARC2DBM, the result is loaded with the dBASE command:
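MARC2DBM and OUTMARC.PRG themselves are not reproduced here. As a rough illustration of what the two directions involve, the following Python sketch parses a record in the standard MARC exchange structure (ISO 2709: a 24-character leader, a directory of 12-character entries giving tag, length and starting offset, then the fields) into (tag, indicators, subfields) tuples, and rebuilds a record from such tuples, recomputing the lengths and offsets. The function names are hypothetical, and decoding bytes as Latin-1 is a simplification; MARC records of this period normally used the MARC-8 character set.

```python
from __future__ import annotations

FT = b"\x1e"  # field terminator
SD = b"\x1f"  # subfield delimiter
RT = b"\x1d"  # record terminator
ENC = "latin-1"  # simplification: real records of the period used MARC-8


def parse_marc(record: bytes) -> list[tuple[str, str, list[tuple[str, str]]]]:
    """Split one ISO 2709 record into (tag, indicators, [(code, data), ...])."""
    base = int(record[12:17])          # base address of data, from the leader
    directory = record[24:base - 1]    # 12-byte entries; a field terminator follows
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        tag = entry[:3].decode("ascii")
        length, start = int(entry[3:7]), int(entry[7:12])
        raw = record[base + start:base + start + length]
        if raw.endswith(FT):
            raw = raw[:-1]
        if tag.startswith("00"):       # control fields: no indicators or subfields
            fields.append((tag, "", [("", raw.decode(ENC))]))
        else:
            subfields = [(p[:1].decode(ENC), p[1:].decode(ENC))
                         for p in raw[2:].split(SD)[1:]]
            fields.append((tag, raw[:2].decode(ENC), subfields))
    return fields


def build_marc(fields, leader: bytes = b"00000nam a2200000   4500") -> bytes:
    """Reassemble fields into an ISO 2709 record, recomputing lengths and offsets."""
    directory, data = b"", b""
    for tag, indicators, subfields in fields:
        if tag.startswith("00"):
            body = subfields[0][1].encode(ENC)
        else:
            body = indicators.encode(ENC) + b"".join(
                SD + code.encode(ENC) + text.encode(ENC) for code, text in subfields)
        body += FT
        directory += tag.encode("ascii") + b"%04d%05d" % (len(body), len(data))
        data += body
    base = 24 + len(directory) + 1     # leader + directory + its terminator
    total = base + len(data) + 1       # + record terminator
    new_leader = b"%05d" % total + leader[5:12] + b"%05d" % base + leader[17:24]
    return new_leader + directory + FT + data + RT
```

A record built by `build_marc` parses back to the same tuples, which is the kind of round trip a converter pair like MARC2DBM and OUTMARC.PRG has to preserve.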

APPEND FROM <filename> DELIMITED
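In the usual dBASE delimited format, each line is one record, fields are separated by commas, and character fields are enclosed in double quotes while numeric fields are written bare. The Python sketch below writes preload lines in that style. The field order (RECSEQ, TAGSEQ, TAG, INDICATORS, SUBTAGSEQ, SUBTAG, DATASEQ, TAGDATA) is a hypothetical assumption and would have to match the actual DBM structure; the sketch also simply rejects values containing a double quote rather than guessing how the loader would treat them.

```python
def to_delimited(row) -> str:
    """Format one DBM row: strings in double quotes, numbers bare, comma-separated."""
    parts = []
    for value in row:
        if isinstance(value, str):
            if '"' in value:
                # Assumption: no escaping scheme is attempted for embedded quotes.
                raise ValueError(f"embedded double quote in {value!r}")
            parts.append(f'"{value}"')
        else:
            parts.append(str(value))
    return ",".join(parts)


def write_preload(rows, path: str) -> None:
    """Write all rows to a preload file with DOS-style CR/LF line endings."""
    with open(path, "w", encoding="latin-1", newline="") as f:
        for row in rows:
            f.write(to_delimited(row) + "\r\n")
```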

The key to the elasticity of DBM is the database structure. In DBM, the subfield is the basic unit of information. The data for a given subfield goes in the TAGDATA field. The subfield tag specific to that element is stored in SUBTAG. TAG holds the MARC tag of which the subfield is a part. The indicators assigned to a given field are stored in INDICATORS, alongside the data from each of the field's subfields.

If a subfield requires more than 45 characters, the data is broken at a blank space and the rest of the subfield is continued into the next DBM record(s). The ability to handle unlimited amounts of text derives from the practice of parsing subfield contents into 45-character packages.
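The splitting rule can be sketched in Python as follows. The function name is hypothetical, and where the blank ends up is an assumption: here it is carried to the start of the next segment, so that concatenating the segments restores the original text exactly. If a 45-character stretch contains no blank at all, the sketch falls back to a hard break.

```python
from __future__ import annotations


def split_subfield(text: str, width: int = 45) -> list[str]:
    """Break subfield text into segments of at most `width` characters,
    breaking just before a blank where possible so words stay intact."""
    segments = []
    while len(text) > width:
        # Last blank in positions 1..width; breaking there leaves text[:cut]
        # (at most `width` chars) and carries the blank into the next segment.
        cut = text.rfind(" ", 1, width + 1)
        if cut == -1:
            cut = width  # no blank available: hard break mid-word
        segments.append(text[:cut])
        text = text[cut:]
    segments.append(text)
    return segments
```

Each returned segment would become the TAGDATA of one DBM record; the `width` parameter corresponds to the length of TAGDATA.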

It should be noted that TAGDATA could easily be longer or shorter without compromising the DBM approach. Forty-five was the first value tried, and it worked well enough that I never got around to determining the optimum value for keeping DBM data files to a manageable size. Any change would have to be reflected in the preload utility, MARC2DBM.

In order to keep track of MARC subfields that are continued from one DBM record to the next, the field DATASEQ is included. The value in that field indicates whether a given TAGDATA element is the first portion of a subfield or a specified subsequent segment of that subfield.
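Reassembling a subfield is then a matter of collecting its segments and putting them in DATASEQ order. The Python sketch below keys each subfield by the sequencing fields described next (RECSEQ, TAGSEQ, SUBTAGSEQ). It assumes rows arrive as plain tuples, that the blank at each break was kept at the start of the continuation segment, and that the blank padding dBASE adds to fixed-length fields can safely be trimmed; the function name is hypothetical.

```python
from collections import defaultdict


def reassemble(rows):
    """rows: (recseq, tagseq, subtagseq, dataseq, tagdata) tuples in any order.
    Returns {(recseq, tagseq, subtagseq): full subfield text}."""
    pieces = defaultdict(list)
    for recseq, tagseq, subtagseq, dataseq, tagdata in rows:
        pieces[(recseq, tagseq, subtagseq)].append((dataseq, tagdata))
    result = {}
    for key, segs in pieces.items():
        segs.sort()  # DATASEQ order: first segment, then continuations
        # rstrip removes fixed-length padding; this assumes no segment
        # ends in a blank that belongs to the data.
        result[key] = "".join(text.rstrip(" ") for _, text in segs)
    return result
```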

Other sequencing fields assist in resolving additional potential ambiguities. RECSEQ indicates which record in the current file a particular DBM record belongs to. TAGSEQ uniquely identifies each MARC tag. Since many tags are repeatable, TAGSEQ is necessary to determine beyond doubt which data goes with a given tag. Since some subfield tags can repeat within a given MARC field, SUBTAGSEQ is required to pin down which of several possible MARC subfield tags lays claim to the current subfield data.
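To make the role of each sequencing field concrete, here is a Python sketch that flattens one parsed MARC record into DBM-style rows. Several details are assumptions rather than documented DBM behavior: TAGSEQ is numbered within each record, SUBTAGSEQ within each field, the row layout is hypothetical, and long subfields are cut into fixed-width slices for brevity, whereas MARC2DBM breaks at blanks.

```python
def to_dbm_rows(recseq, fields, width=45):
    """fields: [(tag, indicators, [(code, data), ...]), ...] for one MARC record.
    Yields (RECSEQ, TAGSEQ, TAG, INDICATORS, SUBTAGSEQ, SUBTAG, DATASEQ, TAGDATA)."""
    for tagseq, (tag, indicators, subfields) in enumerate(fields, start=1):
        for subtagseq, (code, data) in enumerate(subfields, start=1):
            # Simplification: fixed-width slices instead of breaking at blanks.
            segments = [data[i:i + width] for i in range(0, len(data), width)] or [""]
            for dataseq, segment in enumerate(segments, start=1):
                # Indicators are repeated on every row, as in DBM.
                yield (recseq, tagseq, tag, indicators,
                       subtagseq, code, dataseq, segment)
```

In a record with two 969 fields, the two $a subfields are told apart by TAGSEQ; two $a subfields within one field would be told apart by SUBTAGSEQ.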

Figure 2 is a sample record in DBM format.

To view (and, if necessary, edit) records in the file by title, one might type:

SET FILTER TO tag="245" .AND. subtag="a"

BROWSE

Conditional indexes make viewing even easier, as well as much faster. They allow dBASE to instantly skip over DBM records that don't meet one's criteria. One might create a series of indexes once, then let dBASE take care of keeping them up to date so that one can BROWSE by whatever index one wishes:

INDEX ON tagdata TAG author FOR tag="100" .AND. subtag="a"

INDEX ON tagdata TAG title FOR tag="245" .AND. subtag="a"

INDEX ON tagdata TAG notes FOR tag="5" .AND. subtag="a"

To BROWSE through all the notes fields (any MARC field from 500 through 599), type:

SET ORDER TO notes

BROWSE
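The notes index works because, with SET EXACT OFF (the dBASE default), a string comparison only checks as many characters as the right-hand string contains, so "500" = "5" is true. The Python sketch below approximates that comparison and the selection a conditional index performs. The helper names are hypothetical, and unlike dBASE, which keeps a conditional index current as records change, the sketch simply recomputes the selection each time.

```python
def exact_off_equal(left: str, right: str) -> bool:
    """Approximates dBASE string '=' with SET EXACT OFF: compare only as many
    characters as the right-hand string has, so "500" = "5" is true."""
    return left[:len(right)] == right


def conditional_index(rows, condition, key):
    """Rough analogue of a conditional index: keep only rows that meet
    `condition`, ordered by `key`."""
    return sorted((r for r in rows if condition(r)), key=key)


def is_note(row):
    # Equivalent of FOR tag="5" .AND. subtag="a"
    return exact_off_equal(row["tag"], "5") and row["subtag"] == "a"
```

Calling `conditional_index(rows, is_note, key=lambda r: r["tagdata"])` would give the notes in the order a BROWSE on the notes tag presents them.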

Global changes can sometimes be very easy. If a given holding symbol must be changed, as happened recently, it is easy enough to type:

REPLACE ALL tagdata WITH "MOUB" FOR tagdata="BTSM"

Deletions are sometimes just as easy. Here, we delete all holdings except those for a given library:

DELETE FOR tag="049" .AND. tagdata<>"cr.hs"

PACK

Unfortunately, one frequently must delete or make changes to two or more subfields, depending on the contents of a given field. This requires a few lines of program code.

Let's assume that a series of MARC records have repeating 969 fields with the library symbol in subfield $a and the call number in subfield $n. For simplicity's sake, we assume $n always follows $a. We would like to move through the DBM-format file and delete all holdings information that does not pertain to a particular library. STRIP.PRG does the job (Figure 3).
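STRIP.PRG itself is not reproduced here. The following Python sketch is only an illustration of the same logic under the stated assumptions, not the author's program: rows are hypothetical dictionaries named after the DBM fields, and each 969 $a that does not match the library to keep is removed together with the $n immediately following it in the same field (identified by SUBTAGSEQ + 1).

```python
def strip_holdings(rows, keep_symbol, tag="969"):
    """rows: dicts with recseq, tagseq, tag, subtagseq, subtag, dataseq, tagdata.
    Drop each $a (library symbol) that is not keep_symbol, together with the
    $n (call number) that follows it in the same field."""
    doomed = set()  # (recseq, tagseq, subtagseq, subtag) of subfields to delete
    for r in rows:
        if (r["tag"] == tag and r["subtag"] == "a" and r["dataseq"] == 1
                and r["tagdata"].rstrip() != keep_symbol):
            doomed.add((r["recseq"], r["tagseq"], r["subtagseq"], "a"))
            # Assumption from the text: $n always immediately follows $a.
            doomed.add((r["recseq"], r["tagseq"], r["subtagseq"] + 1, "n"))
    return [r for r in rows
            if not (r["tag"] == tag and
                    (r["recseq"], r["tagseq"], r["subtagseq"], r["subtag"]) in doomed)]
```

A field left with no subfields after stripping would still need to be skipped when the records are converted back to MARC.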

While this may not look particularly simple, I have found no other approach that offers the flexibility for performing ad hoc modifications on large numbers of MARC records. If you are relatively familiar with dBASE IV and have occasion to deal with inaccessible MARC records, you may wish to give this approach a try.

Given the efficiency of dBASE conditional indexes and the flexibility of the FILTER command, it is not totally out of the question to imagine creating a "rough-hewn" bibliographic database of local library holdings upon the foundation of the DBM format.

DBM is itself pretty rough in some respects. Inefficient use of storage capacity is a major trade-off. MARC records will increase in size by as much as 300 percent when converted to DBM format. Such is the nature of software based on fixed-length data fields. One can fill a 40-megabyte hard drive pretty fast at the rate of 1,800 bytes per MARC record. But then anyone doing this kind of work is going to need, one way or another, a 100MB+ drive.
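To put these figures in perspective, here is a quick calculation that uses decimal megabytes and ignores the space taken by index files and by the software itself:

```latex
\frac{40{,}000{,}000\ \text{bytes}}{1{,}800\ \text{bytes/record}} \approx 22{,}000\ \text{records},
\qquad
\frac{100{,}000{,}000\ \text{bytes}}{1{,}800\ \text{bytes/record}} \approx 55{,}000\ \text{records}
```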

If you have access to dBASE III+, dBXL, dBASE IV, or similar programs, you may wish to acquire a copy of the shareware preload and DBM-to-MARC utilities, some sample files, and additional information on disk. If so, send me a formatted floppy and be sure to enclose a return mailing label and postage.

Karl Beiser is the library systems coordinator at the Maine State Library and the software editor of Computers in Libraries. He may be reached at 18 Elizabeth Ave., Bangor, Maine 04401.
COPYRIGHT 1992 Information Today, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 1992 Gale, Cengage Learning. All rights reserved.

Article Details
Title Annotation:Hands-On Library Computing
Author:Beiser, Karl
Publication:Computers in Libraries
Article Type:Column
Date:May 1, 1992
Words:1265