Systems librarian and automation review.
Although the words have a holy ring to them, the fact is even IBM isn't IBM compatible, not since they stomped out of the industry-standard theater thinking everyone would follow them to purchase "Micro-Channel Architecture" machines. The only ones who did are those locked into massive state purchasing agreements. Everyone else still buys clones.
Today the query usually revolves not around hardware, but software. The questions are, "Is it Novell compatible?" or the now more common, "Is it Windows compatible?" Companies advertise this compatibility as a selling point: it's certified Novell compatible. That's all you need to know. The rest follows suit. And it gives you an indication of what's really important: software. Who cares what box you run it on?
Compatibility is an issue with printers as well. The standards are Epson for dot-matrix printers of all sorts and Hewlett-Packard for laser printers. Yes, there's PostScript, too, but I maintain this is not as important unless you are heavily involved in desktop publishing and/or graphics efforts. Then PostScript counts in a big way. If not, it's Epson or HP.
Recently I managed to plop myself in the middle of one of these controversies. Here's the background. I wrote the accounting program we use at the library. It does payroll, accounts receivable, and accounts payable. It adds all these numbers up in various ways and prints them out on neat sheets that Boards of Trustees love and grow to expect. We've had it in use in one form or another for almost a decade. We use a system called BARS: Budget Accounting and Reporting System. It's like the Dewey Decimal system for numbers: complex and largely indecipherable. But you don't always need to know what it means to make it work. The numbers just have to come out in the right places.
One of the modules of this program governs printer controls. As you might expect, I support Epson or Hewlett-Packard printers. You just choose which one you have and the program takes it from there. To print out the Auditor's Summary reports, you absolutely must use an HP laser printer. It uses condensed print in landscape format, just flips the numbers on their sides and prints out an entire year, month by month, by BARS Code. You can't do that with an Epson.
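To make the idea of a printer-control module concrete, here is a minimal sketch of how such a choice could be table-driven. It is not the code of the accounting program described here; the dictionary, the printer keys, and the function name are all hypothetical. The escape sequences themselves are the standard ones: in HP's PCL, ESC E resets the printer, ESC &l1O selects landscape, and ESC (s16.66H selects a 16.66-pitch (condensed) font; on Epson dot-matrix printers, ESC @ initializes the printer and the SI character (0x0F) turns on condensed print.

```python
# Illustrative setup strings only; the table and helper name are hypothetical.
ESC = b"\x1b"

PRINTER_SETUP = {
    # HP PCL: reset, landscape orientation, 16.66 characters per inch.
    "hp_laserjet": ESC + b"E" + ESC + b"&l1O" + ESC + b"(s16.66H",
    # Epson: initialize (ESC @), then SI (0x0F) selects condensed print.
    "epson": ESC + b"@" + b"\x0f",
}

def setup_bytes(printer: str) -> bytes:
    """Return the bytes to send before printing a condensed report."""
    try:
        return PRINTER_SETUP[printer]
    except KeyError:
        raise ValueError(f"unsupported printer: {printer}") from None
```

A printer that speaks neither language simply has no entry in the table, which is the situation described next.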
A couple of years ago I offered the entire program free to other districts in the county that use the same system. It's now in use at a couple of Fire Districts, a Park District, and a couple of other library districts in the state as well. We gave it to the other districts partly for public relations reasons. We compete in the tax base around here. We wanted to build some bridges of communication before the Fire District put another ad in the paper that said, "Next time your house is on fire, call up the public library."
The last fire district to join up with "The B.A.R.S. Tender" asked what kind of printer to get. I told them, "Hewlett-Packard LaserJet II." They were on sale at the time in preparation for the introduction of the LaserJet III. You could get one around here for about $900 brand new. Well, one thing led to another and that person quit and jumped on a sailboat, then another person took over, so a year later they called up and said, "We're ready."
It turned out they had purchased a Canon laser printer, not a LaserJet, because the salesperson at the local computer store told them the Canon was HP compatible. After all, it uses the same "printer engine" used by HP. HP uses a Canon, so it has to be compatible.
One look at the manual and I knew I was in trouble. The Canon doesn't use the HP printer control language (PCL) at all, but has its own very esoteric coding sequences to get the printer into its various modes, including condensed landscape mode. The manual was terribly cryptic. I figured it would take a couple days of my time to puzzle it out, through trial and error, all the time sitting at the fire district headquarters under the watchful eye of the chief, who had been told this program worked just fine.
So I gulped and told them it was their problem. I said HP. I meant HP. I didn't mean Canon. The computer store had misled them to the point where they had a machine that was not compatible with an HP and one that my program did not directly support.
Repeated calls to the computer store and a couple of site visits later, the store itself (which has a large repair and maintenance staff) couldn't make the Canon emulate the HP. They finally found a third-party software program that was supposed to perform this trick. But by that time it was too late. The Fire District moved the Canon over to another machine to print out word processing documents. Then they purchased a new and faster LaserJet III, a truly beautiful machine for the price. Soon they had the auditor's reports they needed for their board, and everyone was happy (phew!).
The bottom line? Watch out for that term "compatible." Make them prove it first.
We all know about "The Internet," right? It's a collective noun used for all those academic, military, and industrial computers that are hooked together and pass messages back and forth. Although many Internet folks will sneer at the analogy, it's like FidoNet, only bigger, and the nodes are Unix-based minicomputers at least, not mere PCs sitting in the den. Compared to a river, FidoNet is a tiny stream leading to a small bay. The Internet is the Mississippi headed for the Gulf of Mexico. The Internet's a big deal. Just ask anyone who is hooked up.
Michael Schuyler is the systems librarian for the Kitsap Regional Library System, Bremerton, Washington.
A few public libraries have access to The Internet (Cleveland is one, for example), but I think it fair to say this is largely an academician's game. Public libraries can get on, but I don't get the impression that it is encouraged. NREN (the National Research and Education Network) may change this, but we'll just have to wait and see.
One thing the Internet has is hundreds (yes, hundreds) of online library catalogs. All you need is the Internet address and some patience. Then you, too, can search the catalogs of some of the biggest libraries in this country, Great Britain, or Canada. Just pick one. Maybe you'll find it on the Internet.
My book Dial In caused quite a controversy last year when we sent questionnaires in a blanket mailing all across the country. Many folks on the Internet took offense that I would dare compile a book with dial-in telephone numbers, then attempt to actually sell the book.
There were many reasons why a few folks were upset, and maybe I'll go into that sometime. But basically, when you peeled away the chaff of the arguments, some folks were upset that someone might actually find their telephone number, dial in, and use the catalog. And these people might be unauthorized users. Lord knows we don't need any more unauthorized users.
This was all sort of ironic in view of the many library catalogs available as easy pickings for people who had access to The Internet. It was also ironic in view of the fact that several different library lists are available for file request on the Internet itself. "File Request" (known as "freq" on FidoNet or "ftp" on the Internet) is a method by which a local user can request that a file be sent from one computer to another and delivered to the requestor, sort of like interlibrary loan. Except you don't have to return the file.
Those same lists are available on selected boards on FidoNet, including Quicksilver (206-780-2011). They are likely to be on any board that carries the Info Power! echo moderated by Janet Murray (mentioned in last month's column and also in a recent article by Janet Balas [vol. 11 no. 5, May 1991]).
My favorite of these lists was compiled by Peter Scott of the University of Saskatchewan Libraries (Quicksilver filename: HYTELN30.ZIP). It appears to closely follow the list compiled by Bill Barron at the University of North Texas. Both lists show hundreds of libraries including many in Canada and the United States, of course, but also Germany, Australia, Israel, and the United Kingdom, among others.
Peter sent me a copy of his list and encouraged me to give it away. The advantage of Peter's approach is that it comes in the form of a HyperCard stack for MS-DOS. It uses HyperCard functions to jump you, for example, from a section detailing a specific library to directions on how to use that specific type of catalog. Therefore, for a Dynix library you can move to the word "Dynix" and be taken directly to an instructions page that tells you how to navigate Dynix catalogs. There are sections on other catalogs, including Geac and DRA among others. Complete instructions to access these libraries with telnet are given.
Another advantage of Peter's approach is that the program uses a TSR (terminate-and-stay-resident) approach and can be invoked on top of another program with a "hot key." In theory you could have this running while you were hooked online, then just pop it up to get the directions for the particular library you wanted to access.
I caution you that this list does not provide dial-up telephone numbers. (In fact, it specifically excludes them.) It provides access instructions for telnet use only. As a useful tool showing off the concept of HyperCard stacks, it is a superb real-world example.
If you already have access to the Internet, you probably don't need this because you can use the Internet to get it from Peter Scott at SCOTT@SKLIB.USASK.CA, but just in case, it's available as a Free Disk Offer. Just send a stamped, self-addressed disk mailer to the address in the sidebar, and it's yours.
MARC Record Redux
Our C3PO Committees are moving along just fine. We're about to start our fourth in a series and discuss the holds situation, which is up 800 percent since we automated in 1983. We'll consider what needs to be done to streamline the process. We just finished discussing the query system. From that I have come away with the conclusion that from a programming standpoint we have a major problem on our hands: the MARC record.
Now I appreciate the MARC record. Really. It's the closest thing we've got to a standard. It has allowed us to transfer records with relative ease. If all the tags and subfields, delimiters, and whatnot are in the right place, we all ought to be able to handle a MARC record on our local systems or our networks. Just ask OCLC, WLN, or any vendor. They have, by and large, accommodated a standard. We have all benefitted. Indeed, it's a serious problem if any library does not have records in the MARC format.
Therefore, to simply attack MARC as archaic doesn't benefit anyone. No one to my knowledge has come up with a replacement that meets the needs of libraries, given the fact that we already have MARC in place.
But that's not to say MARC doesn't have its little peculiarities, or that those peculiarities don't have a massive effect on our local systems. A discussion of said peculiarities might help us understand why our local systems sometimes act the way they do. Our system probably doesn't act like your system, but somewhere in your system, no matter which kind it is, you will likely see the same sorts of peculiarities, even if the manifestation is totally different.
This matter came to the fore when discussing the way our system allows you to query the database for subjects. The question was, "Why can't you always see the subject you searched for in the index display? You ought to be able to."
You can't? Well, no. That's because the subject is embedded somewhere in the MARC subject string well beyond the width of the allotted space. Thus, when you search for the subject "Aeronautics -- accidents -- 1972" you come up with a hit on "Ghosts" and "Spiritualism." There may not be room to show that the hit on "Aeronautics" was the third-named subject far beyond the screen. But since "The Ghost of Flight 401" has all three subjects, you get the ones you didn't search for first because those other two happened to be first in the MARC record.
What's Our Subject?
In our system, then, the subject search displays the subject string (the 650 field) in its entirety, starting from the beginning. If the beginning doesn't happen to be the subject you're searching for, that's too bad.
That's also why a subject extract on our system will only yield about one-third of the items with that subject attached (assuming an average of three subjects per record). The extract program only searches at the beginning of the subject field. If it doesn't find a match immediately, it moves on to the next record. Yet embedded in the field is the subject somewhere else, two times out of three. Therefore, a printed extract by subject does not work on our system, period. An extract on "Aeronautics" would miss the example listed above.
This problem manifests itself with the subject extract program, and also when you query subjects out of the circulation program. (The OPAC in our system is an entirely different program that even runs off an entirely different database.)
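The extract behavior described above can be sketched in a few lines. This is an illustration under assumptions, not the vendor's code: the record layout, the second title, and both function names are hypothetical. The first function tests only the start of the subject field, as our extract program appears to do; the second tests every heading in the field.

```python
# Hypothetical records: each subject field holds several headings, in the cataloger's order.
RECORDS = [
    {"title": "The Ghost of Flight 401",
     "subjects": ["Ghosts", "Spiritualism", "Aeronautics -- accidents -- 1972"]},
    {"title": "A hypothetical aviation history",
     "subjects": ["Aeronautics -- history"]},
]

def extract_first_only(records, term):
    """Mimic the described behavior: test only the start of the subject field."""
    return [r["title"] for r in records if r["subjects"][0].startswith(term)]

def extract_all_headings(records, term):
    """Test every heading in the field, embedded or not."""
    return [r["title"] for r in records
            if any(s.startswith(term) for s in r["subjects"])]
```

With these two records, an extract on "Aeronautics" finds only the second title under the first function and both titles under the second.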
The question is, "How could this possibly happen?" and the answer is: "Because of the MARC record." The MARC record was designed originally to transfer records via tape, and tape is a serial device. One record comes after another in a very long row that is as long as the tape.
Fixed vs. Variable Lengths
There are two ways to make up a database record. The first is by so-called "fixed length" records. This means that you must establish the length of a field in the record at a fixed length that can accommodate the largest possible amount of data in that field, even if the data in a given record will be shorter than this maximum length.
For example, in an address field you might need to establish a length of thirty characters. That means "9160 Fox Cove Lane" would fit nicely in eighteen characters, but twelve characters would be left over and wasted. However, "27921 Lindermann Parkway Northwest" is a few characters over the thirty-character limit. To fit, it must be truncated. Obviously, you don't want to truncate a MARC record to fit into an arbitrary field limit. You never know what is going to happen.
Fixed-length records work well for a number of reasons, particularly when space is not a consideration. If every record is a fixed length, you automatically know where every record begins because it is "X" number of characters from the beginning of the last one. That also means you know where every field inside a record is. The record keeping here is relatively simple. This is the way dBASE and a number of other microcomputer database systems work.
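The arithmetic behind fixed-length records is simple enough to show directly. The sketch below assumes a made-up layout (a twenty-character name and a thirty-character address) and uses Python's struct module; it is not the dBASE file format, only the same idea. Record n begins at n times the record size, and a value that is too long gets truncated.

```python
import struct

# Hypothetical layout: 20-byte name, 30-byte address, space padded.
RECORD = struct.Struct("20s30s")

def pack_record(name: str, address: str) -> bytes:
    # ljust pads short values with spaces; slicing truncates long ones.
    return RECORD.pack(name.ljust(20)[:20].encode("ascii"),
                       address.ljust(30)[:30].encode("ascii"))

def read_record(data: bytes, n: int) -> tuple[str, str]:
    # Record n starts at n * record size: no index or pointer file is needed.
    name, address = RECORD.unpack_from(data, n * RECORD.size)
    return name.decode("ascii").rstrip(), address.decode("ascii").rstrip()
```

Storing the two sample addresses this way keeps "9160 Fox Cove Lane" whole but cuts the longer one off at thirty characters.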
The second way to establish a database is via variable-length records where the length of each field is indeterminate. With this method you cut off the field at the very end of the data. If the data takes three characters, the length of that field is three. If the data takes twenty-three, then for that record, the length of the field is twenty-three as well. You keep track of where one field ends and the other begins by means of "pointers" that "point" to the precise character for the beginning of a new field.
Under this system life is much more complex because the pointers must be carefully determined and you are totally dependent upon them. Otherwise there is no way of determining where a given record begins or ends, and no way of determining where the fields are, either. Where with fixed-length records all this was determined mathematically, with variable length you have extra files to handle.
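A minimal sketch of the pointer idea follows. The function names are hypothetical and real systems keep the pointers in separate files, but the principle is the same: the data is packed end to end with no padding, and a table of starting offsets is the only thing that says where each field begins and ends.

```python
def build_store(fields):
    """Pack fields end to end and record where each one starts (the "pointers")."""
    text = bytearray()
    pointers = []
    for value in fields:
        pointers.append(len(text))   # offset of this field's first byte
        text += value.encode("ascii")
    pointers.append(len(text))       # sentinel: end of the last field
    return bytes(text), pointers

def get_field(text, pointers, i):
    # Without the pointer table there is no way to tell where field i begins or ends.
    return text[pointers[i]:pointers[i + 1]].decode("ascii")
```

Lose or corrupt the pointer list and the packed text becomes one undifferentiated run of characters.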
But this method has several advantages as well. First, it can save a tremendous amount of space. It would be terribly inefficient to transfer records on tape if much of the data were spaces instead of "meat." It's much more efficient to cram all the data together as tight as it will fit, both for each field in a record and for each record in a file. For serial devices, including tape, variable-length records make a lot of sense.
The space saved is not trivial. Take the dBASE record structure as an example. I use file cruncher programs such as PKZIP for many reasons, mostly to save space while transmitting data online. When I crunch a dBASE file, it is not unusual for the file to reduce so that the crunched file is only 30 percent of the size of the original. That means it was 70 percent air.
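You can get a feel for this with a small experiment. The sketch below builds an invented file of 1,000 short addresses padded out to sixty-character fields, then compresses it with Python's zlib module as a stand-in for PKZIP (an assumption: the two are not the same program and will not give the same numbers). Note that a compressor also squeezes out repeated text, not just padding, so the compressed size is only a rough indicator of how much "air" a file holds.

```python
import zlib

# Hypothetical data: 1,000 short addresses stored in 60-character fixed-length fields.
addresses = [f"{1000 + 7 * i} Fox Cove Lane" for i in range(1000)]
fixed = "".join(a.ljust(60) for a in addresses).encode("ascii")
packed = zlib.compress(fixed)

padding = sum(60 - len(a) for a in addresses)
print(f"padding: {padding / len(fixed):.0%} of the fixed-length file")
print(f"compressed size: {len(packed) / len(fixed):.0%} of the original")
```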
Look at this in terms of subjects. A given MARC record may have one subject or half a dozen. The number depends on the cataloger. Plus, a given subject heading could be a few characters, or it could stretch across the page.
It would be foolish to establish a fixed length for a subject heading because it would have to be long enough to accommodate the longest subject heading possible. Not only that, you would also have to establish enough separate subject fields to accommodate the largest number of subjects ever to be assigned. The result would be that most MARC records would be mostly full of air -- a lot of empty, unused subject fields and the subject fields that were filled, only partially so.
So no wonder MARC records have variable-length fields. There really was not much choice. Besides, when the MARC system was established, memory was precious. Every byte saved was worth serious money. A variable-length record with variable-length subfields is a very efficient way to store variable length data. It makes sense.
There is only one major caveat to this, and that is that subject headings are continually repeated. If you have a thousand records with the same subject heading, that heading gets written a thousand times into the file. Of course, that's not efficient, but all things being equal, the approach is the easiest to implement.
It's easy to fall into the trap of saying "Memory is cheap, so why be careful?" Although today you can buy a gigabyte of disk storage out of a relatively small budget, this is only a recent phenomenon. Just a few years ago, every byte counted, quite literally.
The subject field itself is rather a special case because it accentuates the problem. How can you keep a subject field (singular) and accommodate more than one subject? Basically by establishing subfields within the larger field. And that's exactly how subjects are accommodated. The field itself is of indeterminate length. You fill it up with the appropriately correct subject heading set off by a subfield delimiter. A second subfield delimiter denotes the beginning of another subject in the same field. It can appear anywhere in the larger subject field itself, and it can also appear more than once.
But a given subject can appear in any order within the subject field. Aeronautics may be the first-named field or the fourth-named field. To the MARC record, it doesn't matter.
Now what happens when such a record is loaded into a local system? It is taken apart (parsed) and pointers are established for every field in the record. The fields are stored in specific locations in a large text file, and the pointers point to the exact byte in this large file where the given field begins. Thus the reference to a field is governed by its placement in the overall text file and does not reference a particular MARC record in total.
What Happens When You Edit?
Furthermore, if you edit such a record, very likely the entire record is not moved. Instead, the pointers are changed. The new edited version of a field gets written to the end of the large text file, but the rest of the record that remains unedited stays where it was originally, including the original data.
This results in a segmented record. Every time you search for a record, the program goes to the pointer file and reassembles the MARC record from its constituent parts scattered throughout the text file. (This method, incidentally, is exactly the same method used to store a file in MS-DOS.)
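The edit-by-appending scheme can be sketched as follows. The class and method names are hypothetical, and a real system would keep the text and the pointers in files rather than in memory. The point is that an edit never touches the old bytes: the new version goes on the end, the pointer moves, and a record is rebuilt by following its pointers wherever they lead.

```python
class SegmentedStore:
    """Toy append-only text file plus a pointer table (names are hypothetical)."""

    def __init__(self):
        self.text = bytearray()      # the "large text file"
        self.pointers = {}           # (record_id, tag) -> (offset, length)

    def put(self, record_id, tag, value):
        data = value.encode("ascii")
        # New or edited data always goes at the end; old bytes are left behind.
        self.pointers[(record_id, tag)] = (len(self.text), len(data))
        self.text += data

    def reassemble(self, record_id):
        """Rebuild a record by following its pointers, in tag order."""
        out = {}
        for (rid, tag), (offset, length) in sorted(self.pointers.items()):
            if rid == record_id:
                out[tag] = self.text[offset:offset + length].decode("ascii")
        return out
```

After one edit to a subject field, that record's title and subject no longer sit next to each other in the text, and the superseded subject is still there, unreferenced.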
In our system, subjects are indexed based on the subfields. In other words, the subject index program runs through the records and looks for subfield indicators. When it finds one, it indexes whatever comes next. This way all the subjects, including the embedded ones, are represented by the subject index.
But now when you search for a subject, the index points to the beginning of the subject field, and that is what is displayed. It doesn't start at the subfield indicator on which the index originally indexed. That's why when we search for a given subject, it often doesn't display outright, but appears further on in the subject field. The system is displaying the subject field, not the subject itself. It is displaying the MARC record.
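Here is a small sketch of that mismatch. Everything in it is an assumption made for illustration: the "|" delimiter stands in for the real subfield delimiter, the twenty-character display width is invented, and the function names are hypothetical. The first index function finds every heading but points each entry at the start of the field; the second remembers where each heading actually begins.

```python
DELIM = "|"          # stand-in subfield delimiter, chosen for readability
WIDTH = 20           # hypothetical width of the index display column

field = "|Ghosts|Spiritualism|Aeronautics -- accidents -- 1972"

def index_entries(field):
    """Index every heading, but point each entry at the start of the field."""
    return {h: 0 for h in field.split(DELIM) if h}

def index_entries_fixed(field):
    """Alternative: remember the offset where each heading actually begins."""
    entries, pos = {}, 0
    for part in field.split(DELIM):
        if part:
            entries[part] = pos
        pos += len(part) + len(DELIM)
    return entries

def display(field, offset):
    """Show WIDTH characters of the field, starting at the indexed offset."""
    return field[offset:offset + WIDTH].replace(DELIM, " ").strip()
```

Searching for the aeronautics heading succeeds either way. With the first index the screen shows the ghosts and spiritualism headings; with the second it shows the start of the heading that was actually searched.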
So Where's MARC?
True as far as it goes, but what has happened to the original MARC record? It has been taken apart permanently. The data in your local system no longer resembles the pure MARC record that arrived on tape, or the one you downloaded off the OCLC terminal. It is not the case that the MARC record is simply stored "in toto" on your system.
So do you still even have a MARC record? Yes and no. Most of the parts are there, but they must be reassembled every time a so-called MARC record is displayed, or exported to a new system.
In other words, the MARC record in your local system depends entirely on local pointers unique to your system and no other! Your MARC record is not exactly the same as the MARC record for the library next door. Without your vendor's software for reassembly, that original downloaded MARC record is effectively gone.
When you downloaded the MARC record into your system, there was every reason to use the variable-length records. It really was the only way to keep the parts of a MARC record together during the transfer process. But now that the MARC record is on your local system intact, other possibilities for storage present themselves.
You Want Authority?
Okay, would you like to have authority control over your entire database? Sure you would. But how do you do it when you've got the same subject listed in subfields inside subject fields dispersed throughout your database? Look what would happen if you attempted to change the subject term for just one subject.
First the system would have to search through every subject record for a match, then write the new subject heading in place of the old subject heading, probably at the end of the file while zeroing out the old heading. Needless to say, this is not good. It's convoluted and messy.
The solution is to move to a relational approach, much more in line with modern database systems than the MARC record. Here the subject is stored only once in a separate file. Pointers still show to what bibliographic record each subject points, but it is no longer the case that the subjects are stored repeatedly throughout the MARC record. In this manner, a subject heading can be changed by changing it one time in the authority file. The change is reflected in each relevant record of the database because there really is only one record for each subject to begin with. The duplications are gone.
The relational approach has a couple of additional advantages. First, it saves space. By separating the subjects, you need only list them one time, not every time a subject is assigned. The result can be a much tighter file. Second, indexing is greatly simplified. You just run through the file of subjects and index them. You needn't run through pointer files and large text files to extract the subjects first. Theoretically, at least, this ought to speed up subject indexing considerably.
But, once again, the MARC record is nowhere in sight. To get a MARC record out of the system, the pointer files must be used as the data is written out onto tape (or another form of report). The pointer file is read to find out which subjects currently belong to which bibliographic record. The MARC record must be precisely reassembled from its now relational parts back into a variable-length, serial record.
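The relational layout and the reassembly step can both be sketched briefly. The tables, ID numbers, second title, and replacement heading below are invented for illustration, and the exported string is a simplified flat record with a "|" delimiter, not real MARC. Each heading is stored once; a link table says which records carry it; changing the heading is a single write; and export means gathering the parts back into one serial record.

```python
headings = {1: "Ghosts", 2: "Spiritualism", 3: "Aeronautics -- accidents"}  # authority file
titles = {101: "The Ghost of Flight 401", 102: "A hypothetical second title"}
links = [(101, 1), (101, 2), (101, 3), (102, 3)]   # (record id, heading id) pointers

def rename_heading(heading_id, new_text):
    # One write in the authority file; every linked record sees the change.
    headings[heading_id] = new_text

def export_record(record_id, delim="|"):
    """Reassemble a flat, variable-length record from the relational parts."""
    subjects = [headings[h] for r, h in links if r == record_id]
    return titles[record_id] + delim + delim.join(subjects)
```

Renaming heading 3 once changes what both records export, without searching or rewriting either record.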
This is why an authority file is a big deal. You can't just dump records once controlled by an authority file into a local system and expect to retain the authority control, unless you also have an authority system that traps the record as it is entered. Using an authority file is a fundamentally different approach to bibliographic record maintenance. It permeates everything.
What is MARC?
That's a long explanation for a small irritant, but it does show how a fundamental structural decision made in the sixties when the MARC format was designed affects the display of data in our local systems today. Our system really shouldn't display subjects the way it does. But it does for a reason.
This also shows why asking vendors questions about how their systems operate is terribly difficult. You have to ask the right question and also know what answer you want to hear before you even attempt it. I can imagine our asking the questions like this:
"Is your system MARC compatible?"
"Does your system display the full MARC record?"
"Does your system display subjects?"
"How does it display subjects?"
Multiply this by about a thousand and you'll have an RFP. But what are you getting, really?
The Bit Bucket: Then, Do You Sell Advil?
Salesmen in suits always show up around here the day I've been crawling around on the floor troubleshooting a wiring problem. Whenever I dress up, no one shows up. My staff just makes snide comments to me, wondering aloud whom I'm trying to impress. One once told me she liked it when her boss dressed up. Hint.
I spend a lot of time in wiring closets, poking around those white, plastic blocks (M-blocks) the telephone company uses to connect wires together. They're all color-coded, but they're tiny, and trying to actually see one of the connections with over-forty eyes is a near impossibility. There are fifty wires on one side of an M-block, each maybe a quarter of an inch apart. The wires are colored with a base color and a stripe of a contrasting color to form pairs: blue/white, white/blue, green/white, white/green, and so on.
I usually wear a baseball cap with "John Deere" written on the front to keep my hair away from the connections (okay, so I'm a sixties leftover), but I turn the bill around back so I can get closer to the wires. This way I am also able to exhibit the adjusto-strap to advantage. It's either that or try to find the wires with a pair of binoculars and a spotlight connected to a long stick. Either way works. I never have understood optics.
This day I had found a tiny wire that had been inadvertently disconnected from an M-block. All this tiny wire did was take down the administrative leg of our local area network, complete with three PCs, including one of the servers with a laser printer. One tiny wire! (It was yellow with a red stripe.) It connected to an M-block, which connected to another M-block, which connected to another M-block, which connected to a Hub, which...well, let's just stop there.
I yanked at the wire with a pair of needle nose pliers to make it a little longer (didn't know copper would stretch, did you?), then kerchunked the wire back in place. I know that's not the right word. I call this tool I use a kerchunker because that's what it sounds like. You place it over the metal piece on the M-block and push. The wire is pushed in between the metal. A part of the plastic coating is automatically stripped off and the wire makes a supposedly solid connection (they're notorious for being flaky). The tool "kerchunks" as it makes the connection and cuts any extra off the wire -- a Kerchunker.
Actually, I was quite pleased with myself for having found this wire, so I must have had a smile on my face as I emerged from the dust of the telephone equipment room. It's right next to the receptionist's desk, and that's where the salesman was waiting for me. I didn't even have time to turn my cap around.
He was here to see the data processing manager, so Tammy, our receptionist, smirked in my direction, catching my eye with one of her "looks." I sniffled and brushed my hands on my jeans before shaking his hand. He was a real pretty boy with a perfect suit and perfect tie and perfect shoes, and freeze-dried hair. He sort of reminded me of that guy in the Subaru commercials. Except this time he was selling modems. (Well, he called it data communications gear.)
He took a quick look at his hand as he pulled it back. I decided I shouldn't be affronted, but it wasn't like I was greasy or anything. It was just dust and spider webs. I mean, what do you expect in a place like that?
"Well, I see you have at least one computer," he smiled (big teeth) in the direction of Tammy's PC.
"This thing is locked up," she muttered.
"Control, Alt, Delete," I said. Turnabout is fair play.
I took the suit over to see Deep Thought. He kept his head down as he followed me through the maze of desks, probably making sure he didn't step in the tracks I left on the carpet. The computer room is my old office. They just put up a wall and insulated it, then closed off the space and put in one of those operating room floors that absorb static electricity. It's not exactly fancy.
When he walked in the room he looked up and his mouth made a little "O" as he stared.
"What's this?" he said.
"Uh, I think they call it a minicomputer," I responded. "This is Deep Thought."
"Deep Thought? It has an Apple sticker?"
"Yeah, that's sort of a joke. It's really a Geac 8000, made in Canada. Well, actually, they don't make them anymore, but they do have a more powerful replacement." So it has a nose and a pair of eyes on it, too. That doesn't make it human.
I explained a few things about the computer and its design with four processors. I had been through this drill so many times that it kinda spurted out.
"One hundred terminals about, with fifteen lines out, mostly muxed on TDMs. You can stick about three terminals on one port, so that's twelve per mux at our busiest site. Some of the lines are multi-drop. And we have three dial-ins."
I talked quite a bit about our plans to upgrade in the near future, which would probably include data communications, since a new processor would probably not be a polled environment. Even Geac is now selling Advance: one terminal, one port. It would be a whole new situation either way. That's why I consent to talk to salespeople like this, even on cold calls. We need to be up to speed on the newer technology when it's time to go out to bid. I can often learn quite a lot just by listening.
I explained all this to him, but he seemed unfamiliar with some of the situations I was discussing. He didn't know about state contract, the enviable status of having won a competitive bid through the state, thus allowing us to dispense with bid bonds and quotes and all of the bureaucratic hassle inherent in most procurements.
He seemed entranced by the idea that you could use a modem to dial into a catalog from home.
"Wow! How long have you had that?" he asked.
"Five years," I said.
But it's sort of klutzy. Listen, I never said this was a sophisticated operation. ("I know," you sigh. I heard that.)
He only had a single sheet which just listed the various brands his company carried. I wanted to see blinking lights, not paper. It wasn't even a spec sheet, just a list of names.
"You know, I really am not as familiar with the new technology as I should be," I said. "I need to investigate Digicom service, statistical multiplexers, and even voice over data. And if I could integrate our fax network on the same circuits, I'd be just delighted."
"Lots of things are possible," he said. "I just started with the company myself. I'm learning, too."
"Oh, what's your background?" I asked.
"Pharmaceuticals," he said.
Publication: Computers in Libraries
Date: Nov 1, 1991