SGML and the New Yorker magazine.
A primary benefit of SGML is that once information is established in this machine-readable form, transformations for various delivery systems, as well as views or cuts of the data for specific products, can be made easily, quickly, and at little cost. The information in a publication becomes the asset, rather than the publication itself.
What follows is a report on a speculative exercise that asked: "What if SGML came to The New Yorker?" (Those unfamiliar with the basic terms used in SGML should see the Gilmore article in this issue. Concepts relating directly to my article are described in the accompanying sidebar.)
For SGML '92, a conference sponsored by the Graphic Communications Association (GCA) and held this year in Danvers, MA, five SGML experts were invited to prepare five distinct Document Type Definitions (DTDs) for The New Yorker magazine. The literary magazine is famous for its cartoons, commentary, reviews, new fiction and poetry, and long articles on interesting but offbeat subjects. The magazine's trademark is the set of titles it uses for its commentary and nonfiction articles, such as "The Talk of the Town," "The Sporting Scene," and "The Sky Line." Each weekly issue also provides New Yorkers with an extensive list of cultural events and entertainment called "Goings On About Town."
The expert DTD-writers volunteered their time for this exercise. The instructions were simple: ignore advertising, model as much of the magazine as needed to illustrate your application, and tackle at least part of "Goings On About Town." Each DTD-writer had a different set of goals in mind, and each DTD illustrated one of the many ways the content of the magazine could be viewed.
Each DTD developer was asked to provide a point of view or focus for discussion and comparison, and to concentrate on a different aspect of SGML. Their separate charges were as follows:
1. Use the Association of American Publishers (AAP) model as a basis for development.
2. Produce a content-specific tag set.
3. Print the publication directly from the SGML markup.
4. Create a hypermedia product from the SGML markup.
5. Selectively populate a database from the SGML markup.
Each provided a printed copy of his or her DTD, well-commented or with an accompanying "tag library," a written explanation for each element named in the DTD. These were distributed to conference participants.
CASE 1: USE THE AAP MODEL
In the first case presented, Deborah Lapeyre of Atlis Consulting Group attempted to fit the magazine's document model to one of the set of DTDs created for the Association of American Publishers. She found that the distinctive magazine was difficult to squeeze into a predefined model and that the AAP naming conventions were usable but not necessarily suitable.
The AAP DTD set has served as a well-respected base for defining many SGML documents. Even so, Lapeyre left the impression that its naming conventions, use of mixed-content models, and difficult-to-follow parameter entities may have become outmoded. DTD writers today tend to use longer names for both elements and parameter entities, so that human readers have an easier time both talking about and reading SGML data. As Lapeyre pointed out, it's a little difficult to pronounce "p.zz." It is even more difficult to imagine what group of elements this parameter entity name might stand for.
CASE 2: PRODUCE A CONTENT-SPECIFIC TAG SET
In contrast to the AAP model, Yuri Rubinsky of SoftQuad designed his DTD to be content-specific. In such a DTD, the element article might be replaced by a set of elements whose names provide more specific categorization: review, report, brief. Or, even more specifically, the review element might become movie-review, art-review, book-review.
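As a rough sketch of the difference, the declarations below contrast a generic article element with content-specific alternatives. The element names review, report, brief, and the three review types come from the description above; the content models (title, byline, para) are hypothetical and are not taken from Rubinsky's DTD. Names longer than eight characters, such as movie-review, also assume an SGML declaration that raises the default name length limit.

```sgml
<!-- Generic approach: one element covers every kind of piece. -->
<!ELEMENT article - - (title, byline?, para+)>

<!-- Content-specific approach: the element name says what the piece is. -->
<!ELEMENT (review | report | brief) - - (title, byline?, para+)>

<!-- More specific still: one element per kind of review.
     Assumes NAMELEN has been raised above the default of 8. -->
<!ELEMENT (movie-review | art-review | book-review)
                  - - (title, byline?, para+)>
```

The content models are identical in this sketch on purpose: the only thing that changes is how much the element name itself tells a program about the piece.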
Rubinsky called The New Yorker offices to learn more about how the magazine is constructed and what certain elements are called. For example, the boxed commentaries that appear in "Goings On About Town" are called jewels by magazine insiders. Rubinsky thought of his content-specific markup as being used for an online weekly distribution system for the magazine. He also produced an interesting tag library for his DTD that included a short description of each element for quick reference, and a how-to-use paragraph for each element.
CASE 3: PRINT DIRECTLY FROM SGML
Dennis O'Connor of the Bureau of National Affairs, a legal publisher, produced a DTD that met the requirements for printing by BNA's publishing system. Thus the attributes for elements included entries for columns, just(ification), and type size. Much work has been done in the SGML world to create a descriptive set of "Formatted Output Specifications" in the SGML medium in order to standardize how presentation characteristics are specified.
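To make the idea concrete, here is a minimal sketch of how presentation details might be declared as attributes. Only the three kinds of information (columns, justification, type size) come from the description above; the attribute names, the allowed values, and the article element itself are assumptions for illustration and do not reproduce O'Connor's DTD.

```sgml
<!-- Hypothetical fragment: formatting details carried as attributes. -->
<!ELEMENT article - - (title, para+)>
<!ATTLIST article
          columns   NUMBER                          #IMPLIED
          just      (left | right | center | full)  #IMPLIED
          typesize  NUMBER                          #IMPLIED>
```

In a document instance, a start tag might then read `<article columns="3" just="full" typesize="9">`, with the publishing system interpreting the values as it sees fit.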
Generally speaking, however, SGML applications do not carry information about the appearance of a document because the intent is to delineate elements as information, not necessarily related to their display characteristics in any given publication. Nevertheless, O'Connor's DTD showed that attributes can be used to store virtually any auxiliary information considered germane to the use that will be made of the data.
CASE 4: CREATE A HYPERMEDIA PRODUCT
Steve DeRose and David Durand of Electronic Book Technologies produced a DTD to support use of the data by hypertext and multimedia delivery systems. In SGML, each element maintains a position in the structure of the publication being defined. Inside one element, other elements may be nested. For example, the doctype or top-level element in this DTD was called nyhyper. The nyhyper element consists of the element header followed by the element body, followed by the element authlist. The element authlist, or "authority list," contains one or more object elements. The object element is used to hold a unique or authoritative name for a picture, a video clip, or another document--all viewed as external objects in a hypermedia application. A cross reference (the element xref) to a video clip would carry the attribute magic, whose value would store information about what frames of the video to play.
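The nesting described above can be sketched in a few declarations. The element names nyhyper, header, body, authlist, object, and xref, and the attribute name magic, are from the description; the content of object and xref, the id and target attributes, and the declared value of magic are assumptions, and the declarations for header and body are left out.

```sgml
<!-- Top-level structure as described; header and body not shown. -->
<!ELEMENT nyhyper  - - (header, body, authlist)>
<!ELEMENT authlist - - (object+)>

<!-- Assumption: each object holds its authoritative name as text
     and carries an ID so that cross references can point to it. -->
<!ELEMENT object   - - (#PCDATA)>
<!ATTLIST object   id      ID     #REQUIRED>

<!-- Assumption: xref names its object by IDREF; magic is free text
     that the delivery system interprets, for example a frame range. -->
<!ELEMENT xref     - - (#PCDATA)>
<!ATTLIST xref     target  IDREF  #REQUIRED
                   magic   CDATA  #IMPLIED>
```

Under these assumptions a reference to a video clip might be written `<xref target="clip1" magic="frames 120 to 480">watch the clip</xref>`; the format of the magic value shown here is invented for the example.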
CASE 5: POPULATE A DATABASE FROM SGML
My assignment was to model The New Yorker to produce a historical database of authors and artists whose work has appeared in the magazine. I used the magazine's trademark names for nonfiction articles, separating these content-specific elements into two categories--review or report.
Using short story, poem, cartoon, illustration, and drawing as elements too, I was able to infer the nature of the work and place it, together with its title, its creator, and the issue identifier, into a fixed set of classifications that provide a framework for building an indexed historical database. In addition, this DTD used SGML's LINK feature to process the SGML file to produce a transaction file for loading such a database.
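A hypothetical fragment of marked-up data suggests how this works. The element names poem and cartoon are from the description above; issue, issueid, title, and creator are names assumed for this sketch, the content is placeholder text, and the LINK declarations themselves are not shown.

```sgml
<issue issueid="ISSUE-ID">
<poem>
<title>Title of the poem</title>
<creator>Name of the poet</creator>
<!-- text of the poem omitted -->
</poem>
<cartoon>
<creator>Name of the artist</creator>
</cartoon>
</issue>
```

Because the element name states the kind of work, a loading process needs no further clues: the first entry above could yield a record classified as a poem, with its title, its creator, and the issue identifier inherited from the enclosing element.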
ANALYSIS AND CONCLUSIONS
This session of the conference helped attendees to see the relationship between the design of an SGML document structure and the use or uses projected for the information. For example, to create the historical database application, I found content-specific naming of elements to be critical. For the hypermedia application, however, a shorter, simpler, and more generic set of element names sufficed. If the attributes needed for hypermedia were added to content-specific elements, these applications--and an online distribution system--might all be served.
None of the developers devoted less than three full days to the project. In a real application, a complete decision-making process called "document analysis" would be carried out. In this process, The New Yorker staff would apply their expert knowledge of the information itself and of the techniques currently used to publish the magazine, their vision for new uses of the information, and perhaps their requirements for a new editorial system, to the reengineering of their information asset. From this formal analysis, often lasting many months, the DTD would be written. Then, as real data from the magazine was marked up and processed, additional revisions to the document model would undoubtedly occur.
To illustrate the difficulty of the real-world process, conference co-chair Tommie Usdin noted that right in the middle of the project, The New Yorker acquired a new editor-in-chief. Some dramatic changes occurred in the magazine whose design, features, and format had been stable for years! Usdin observed that none of the developers had any experience with magazines, most of the developers found the assignment far more difficult than they had anticipated, and all of the DTDs were delivered after the agreed-upon date.
It is easy to see how the process of designing and implementing an SGML database quickly becomes expensive and all-consuming for an organization. Therefore, decisions about the real uses for the information must balance the business opportunity against real time and money constraints. Simpler tag sets can often be applied economically by a machine process. Content-specific tagging may involve much editorial work on the data.
By illustrating the SGML design techniques for different applications, the presentation clearly demonstrated the importance of beginning the information reengineering process with a specific set of goals. From any of these starting points, more complex iterations of the data can evolve. Participants also learned that when an organization is designing an SGML information database it must relate its goals for the use and reuse of its information to the new and old product applications it plans.
The relationships among elements are described by providing a content model for each element. So, for example, the model for the element book might consist of an element named front.matter, which is followed by the element body, which is followed by the element end.matter. Now that the content model for the element book is established, the subelements contained in book are described. For example, body might contain one or more chapter elements. Chapter might contain a title, followed by an optional subtitle, followed by one or more paragraph elements. The model for the paragraph element would probably say that paragraph simply contains character data, thus ending the hierarchical nesting.
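Written as element declarations, the book example above might look like the following sketch. The element names and their order come from the description; the declarations for front.matter, end.matter, title, and subtitle are omitted, and the longer names assume an SGML declaration that permits more than the default eight characters.

```sgml
<!-- "," means "followed by", "?" means optional,
     "+" means one or more. -->
<!ELEMENT book       - - (front.matter, body, end.matter)>
<!ELEMENT body       - - (chapter+)>
<!ELEMENT chapter    - - (title, subtitle?, paragraph+)>

<!-- Character data only: the nesting stops here. -->
<!ELEMENT paragraph  - - (#PCDATA)>
```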
Mixed Content Model
When a content model for a given element allows both character data and other elements, it is called a mixed content model. For example, the element paragraph could contain both plain old character data (whatever the content of the paragraph is) and a cross-reference element.
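A minimal sketch of such a declaration follows. The name xref for the cross-reference element and its content are assumptions, and the nine-character name paragraph again assumes a raised name length limit.

```sgml
<!-- Mixed content: character data and xref elements,
     in any order and any number. -->
<!ELEMENT paragraph - - (#PCDATA | xref)*>
<!ELEMENT xref      - - (#PCDATA)>
```

An instance might read `<paragraph>Details appear in <xref>Chapter 3</xref>.</paragraph>`.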
Markup is a familiar concept in the publishing world. The difference between SGML markup and the markup used for typesetting is that SGML tags surround "text objects" with beginning and (nested) ending tags while typesetting tags simply provide insertion points for proprietary typesetting instructions. SGML markup has the effect of fielding the content so that it can be manipulated easily by a computer program or process.
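For illustration, here is a small fragment of a document instance with placeholder content, using the chapter, title, and paragraph elements described earlier in this sidebar. Each text object sits between a start tag and its matching end tag, and the objects nest inside the chapter that contains them.

```sgml
<chapter>
<title>Title of the chapter</title>
<paragraph>Text of the first paragraph.</paragraph>
<paragraph>Text of the second paragraph.</paragraph>
</chapter>
```

A program can pick out every title, or every second paragraph, without knowing anything about how the text will eventually be set in type.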
Another SGML construct is called an entity. The most common use for entities is to provide generic names for special characters. In SGML, the entity reference stands in for the character until it is resolved during processing for a typesetting or retrieval system. Thus the entity reference &beta; might stand--unambiguously--for the Greek beta character in the SGML data.
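The sketch below shows both halves: a declaration written in the style of the ISO public entity sets, and a reference in the data. The bracketed replacement text is a system-specific placeholder that each processing system maps to its own character code; the paragraph element and the sentence are invented for the example.

```sgml
<!-- Declaration, normally supplied by a shared entity set. -->
<!ENTITY beta SDATA "[beta  ]">

<!-- Reference in the document instance. -->
<paragraph>The &beta; coefficient appears in the last column.</paragraph>
```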
Another kind of entity, called a parameter entity, stands in for any often-used or repeatable string used in the Document Type Definition itself. In programmer's terms, a parameter entity is equivalent to a macro, where the name of the macro stands in for a data string or set of instructions.
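A short sketch shows the macro-like behavior. The entity name phrase and the element names are hypothetical; the point is only that the group of elements is written once and reused wherever the reference %phrase; appears.

```sgml
<!-- Define the often-used group once. -->
<!ENTITY % phrase "emphasis | quote | xref">

<!-- Reuse it in any content model that needs it.
     The name paragraph assumes NAMELEN is raised above 8. -->
<!ELEMENT paragraph - - (#PCDATA | %phrase;)*>
<!ELEMENT footnote  - - (#PCDATA | %phrase;)*>
```

Adding a new phrase-level element later means changing one line, the entity declaration, rather than every content model that uses the group. A descriptive name such as phrase is also easier to read and discuss than a terse one.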
Documents marked up in SGML are "parsed" by a document analysis program called a parser that determines whether the markup and document structure meet the description provided in the DTD. To implement SGML, such a parser is necessary.
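As a minimal sketch of what a parser checks, here is a complete small document: a DTD in the declaration subset, followed by an instance that conforms to it. All element names and content are hypothetical.

```sgml
<!DOCTYPE note [
<!ELEMENT note  - - (title, para+)>
<!ELEMENT title - - (#PCDATA)>
<!ELEMENT para  - - (#PCDATA)>
]>
<note>
<title>Title of the note</title>
<para>This instance conforms: one title, then one or more paragraphs.</para>
</note>
```

A validating parser would accept this instance. If the title were missing, or if a para came before it, the parser would report that the document does not meet the description in the DTD.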
Title Annotation: Case History; Standard Generalized Markup Language, document processing system
Date: May 1, 1993