Searching for Relevance in the Virtual Files of the Future

While Charlemagne prayed in a Roman church on Christmas Day, 800 AD, Pope Leo III set a crown upon the Frankish king's head, an event that inaugurated the Holy Roman Empire. Despite the significance of an event that initiated a thousand-year empire, the information recounting it is extremely meager: a total of three paragraphs in the chronicles of three monks. Yet no one questions the relevance of these three paragraphs to an understanding of this episode. Indeed, historians have expended thousands of words analyzing every conceivable bit of insight in them.

The sparse information concerning Charlemagne's coronation stands in marked contrast to the wealth of words reporting even the most mundane events involving political leaders, celebrities, and others. Although usually much less significant than Charlemagne's dramatic coronation, these events are reported at length in hundreds of news publications throughout the world. Yet most of these superabundant reports are irrelevant to an understanding of the events they recount because they are either duplicates or unreliable propaganda. In short, information a thousand years ago was scarce but relevant. Today it is abundant but most often irrelevant.

Two interrelated, long-term trends are responsible for this transformation. The first is the steady increase, especially since the 19th century, in the number of people who are literate and therefore able to contribute to the store of information. The second is a series of technological innovations (the printing press, camera, typewriter, photocopier, personal computer, radio, television, and Internet) that have geometrically increased the dissemination of that information. At a certain point, arguably in this century, the volume of information became first sufficient and then overabundant, while the information relevant to our needs became increasingly hard to find. Now we face the challenge of finding what we want in a cornucopia of data. We are reminded of the Ancient Mariner, who mused, "Water, water everywhere, nor any drop to drink." Today the lament has become "Megs, gigs, and teras everywhere, but not a byte we need."

What Happened to Relevant Information?

As implied above, the scarcity of relevant information is directly related to the ever-increasing amount of total information.
To borrow another metaphor, the bigger the information haystack, the harder it is to find the "needle" of relevant facts. This inverse relationship between total and relevant information mirrors the seesaw relationship between the two scales we use to evaluate information searches. The first scale is recall, the ability to find all information pertaining to the question at hand (the share of all relevant items that a search retrieves). The second is precision, the ability of a search to produce only the information really required (the share of retrieved items that are relevant). Unfortunately, a search broad enough to produce good recall will tend to select too many irrelevant items to be considered precise, and vice versa. The challenge is to tailor a search strategy to the environment so that it provides both. So long as information is relatively scarce, our strategy stresses recall, since the total number of possible "hits" (found items), both relevant and irrelevant, is limited. Once information is abundant, the task of sorting out the relevant hits becomes onerous, and precision becomes the more consequential consideration.

It is tempting to remedy imprecision by simply narrowing the search's scope. This approach enables us to concentrate on a small sector of information, such as a single discipline or department, in order to produce a more precise result. This strategy worked in the past, when compartmentalized information enabled us to (a) be fairly sure where the information we desired was located, and (b) choose search terms that would still provide sufficient recall without overburdening us with irrelevant hits. Unfortunately, limiting a search's scope is not as effective today as it once was, for the boundaries between disciplines are breaking down in every realm.

Three major trends have accelerated the dissolution of barriers that once quarantined business information in departmental isolation wards. First, business reorganizations and downsizings have combined previously autonomous departments and linked once-independent functions with streamlined process workflows. Second, the popularity of ad hoc project teams recruited from diverse departments has complemented such business process reorganization. Finally, the growing use of the local area computer network by every part of the organization makes it technically simple to move information between offices that formerly exchanged only pleasantries at the water cooler (Naisbitt 1982). As the boundaries break down, we can no longer predict with assurance the area in which to find what we seek. We are consequently unable to gain sufficient recall without widening the search's scope to a degree that seriously compromises precision.

The rampant proliferation of multiple copies and versions, made possible by copy machines as well as by networked word processing, spreadsheet, and database tools, also hampers finding relevant information. Today's technology enables, even encourages, us to generate and circulate multiple versions of the same electronic files, all slightly different copies of the same thing. The same set of search terms selects all these versions, so we have more to look through, and it is hard to know which version contains the information we really want. Who among us has not searched for an executed agreement only to find multiple drafts, transmittals, amendments, and correspondence concerning the agreement, along with (if we are lucky) the agreement itself? Unless we have carefully distinguished these documents by type, the same search terms will find them all.

Finally, the variety of media documenting a certain event or subject worsens the problem because the indices of different media (whether audiotape, videotape, photograph, or electronic data file) are usually separate from each other. This means that any search requiring total recall of all records concerning a certain subject necessitates a series of separate searches across all of these indices.

Restoring Relevance: Traditional Records Management Techniques

Information managers have long sought to improve searches for relevant information by developing indexing schemes and records retention schedules. Consistent indexing and file-organization schemes have categorized records according to type, provenance, or subject. Such schemes enable us to narrow our searches to a limited group of categories.
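The two scales discussed in this section can be made concrete: recall is the fraction of all relevant records a search finds, and precision is the fraction of found records that are relevant. The following Python sketch compares a search across the whole organization with one narrowed to a single category. The records, categories, and relevance judgments are invented for illustration, and the plain keyword match is an assumption standing in for a real search engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    record_id: str
    category: str  # the department or records series under which it is indexed
    text: str


# Invented records; relevance is judged against the hypothetical question
# "everything about the purchase of new buses".
RECORDS = [
    Record("R1", "Procurement", "bid documents for new bus purchase"),
    Record("R2", "Budget", "capital funds for new bus purchase"),
    Record("R3", "Planning", "ridership data justifying bus purchase"),
    Record("R4", "Operations", "bus driver overtime schedule"),
    Record("R5", "Operations", "bus route detour notice"),
    Record("R6", "Public Relations", "bus fare survey"),
]
RELEVANT = {"R1", "R2", "R3"}


def search(records, terms, category=None):
    """Return IDs of records containing every search term, optionally
    restricted to one category (the "narrowed scope" strategy)."""
    hits = set()
    for r in records:
        if category is not None and r.category != category:
            continue
        words = set(r.text.lower().split())
        if all(t.lower() in words for t in terms):
            hits.add(r.record_id)
    return hits


def recall(hits, relevant):
    """Share of all relevant records that the search found.
    Treated as 1.0 by convention when nothing is relevant."""
    return len(hits & relevant) / len(relevant) if relevant else 1.0


def precision(hits, relevant):
    """Share of found records that are actually relevant.
    Treated as 1.0 by convention when nothing is found (no irrelevant hits)."""
    return len(hits & relevant) / len(hits) if hits else 1.0


if __name__ == "__main__":
    broad = search(RECORDS, ["bus"])
    narrow = search(RECORDS, ["bus"], category="Procurement")
    for label, hits in (("whole organization", broad), ("Procurement only", narrow)):
        print(f"{label}: recall={recall(hits, RELEVANT):.2f} "
              f"precision={precision(hits, RELEVANT):.2f}")
```

Searching the whole organization finds every relevant record (recall 1.00), but half the hits are irrelevant (precision 0.50). Restricting the search to Procurement returns only relevant material (precision 1.00) but misses the budget and planning records (recall 0.33). That is the seesaw described above once relevant material no longer sits in one predictable category.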
By identifying records that can be purged, retention schedules reduce the total volume of documents we must search through in each category. Together, these indexing and scheduling techniques have proven valuable for traditional retrievals that seek particular documents from an identified category or records series, no matter what its format.

As we enter the era of continually realigned departmental and functional areas, our searches evolve from simple retrieval into investigation. Often we seek not a record in a defined records series, but information in ad hoc, "virtual" files containing all documents pertaining to an issue involving several functional areas. (The virtual file is similar to the traditional case file except that it is not kept in a single location, series, or version. We also have "virtual documents," pieces and versions of which are found in multiple locations and media.) Such virtual files do not even come into existence until we formulate our search. Of course, they still include records from traditional records series; however, their more common constituents are compound documents composed of multiple versions and attachments, existing in diverse media and located in various places (Sutton 1996).

A recent search for information regarding the procurement of new buses at a metropolitan transportation authority provides a typical example of a virtual file. In the Planning Department, we found ridership data that justified the purchase; in Budget, the provision of capital funds; in Public Relations, tapes of public meetings promising improved transit service; in Government Relations, correspondence regarding a federal grant for the purchase; in Procurement, the actual bid documents; in Accounting, the disbursement of funds to pay for the vehicles; and in Bus Operations, a strategy for deploying them. The records relevant to this procurement constitute a virtual file that belongs to no single department. Nor is it located in one place or explicitly identified with any index category or scheduled records series.

Experienced information managers will recognize that a virtual file resembles the result of any complex information retrieval search. However, three aspects make virtual files different. First, virtual files are more heterogeneous, composed of a broad spectrum of media and file formats from a wide range of locations. Second, whereas we have traditionally replicated our search results into a new physical file, there will be no reason to do this when everyone has electronic access to the same databases. Thus, the virtual file becomes a collection of electronic pointers with no separate physical existence. Third, a complex search for information in different locations and media was the exception in the past, whereas today the virtual file is becoming the norm.

Searches for information in virtual files do not greatly benefit from retention schedules (Sutton 1996).
For one thing, we cannot simply consult the retention schedule to find the "office of record" for the virtual file we seek. By definition, virtual files have no "official copy." Less obvious, but ultimately more important, is that purging and inactivation do little to thin out what a virtual-file search must sift through. This is because the contents of virtual files are located in personal computers, desk files, and shared network drives. They are listed on the records retention schedule, if at all, in ambiguous terms such as "general correspondence," "working files," and "administrative files."

When confronted with the alternative of warehouses filled with business paper, we agreed to inactivate or purge files on a regular basis. However, as it becomes possible to store a warehouse's equivalent on a single PC, purging loses much of its urgency. Futurist W. Warren Wagar (1989) has predicted that in the next century all recorded human information will fit into a single 21 cm cube. No longer compelled by the insufficiency of space to store records, we inevitably begin to find reasons to keep them. Do these records possess "evidentiary value" in detailing the history of a business or industry? Would it be valuable to maintain all earlier versions and review comments to an agreement so that we will know what its makers really had in mind? As we convince ourselves to delay purging our files, we also forfeit the assistance this activity provided to searches for relevant information.
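Because a virtual file is a collection of electronic pointers assembled across departmental indices only when a search is formulated, it can be modeled as a query that returns references rather than copies. The Python sketch below is a hypothetical illustration: the departments follow the bus procurement example earlier in this section, but the `Pointer` type, the locators, the index terms, and the rule that an item joins the file when its terms include every query term are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    """A reference to an item that stays where it is filed; nothing is copied."""
    repository: str  # department or system holding the item
    locator: str     # hypothetical path, tape number, or record ID in that repository
    media: str       # e.g. "spreadsheet", "audiotape", "correspondence"


# Hypothetical departmental indices, each maintained separately.
# Every entry pairs a pointer with the index terms assigned to the item.
INDICES = {
    "Planning": [
        (Pointer("Planning", "studies/ridership.xlsx", "spreadsheet"),
         {"bus", "procurement", "ridership"}),
        (Pointer("Planning", "studies/traffic-counts.xlsx", "spreadsheet"), {"traffic"}),
    ],
    "Budget": [
        (Pointer("Budget", "capital/plan.doc", "word processing"),
         {"bus", "procurement", "capital"}),
    ],
    "Public Relations": [
        (Pointer("Public Relations", "tape 17", "audiotape"),
         {"bus", "procurement", "meeting"}),
    ],
    "Government Relations": [
        (Pointer("Government Relations", "grants/federal.pdf", "correspondence"),
         {"bus", "procurement", "grant"}),
    ],
    "Procurement": [
        (Pointer("Procurement", "bids/bus-rfp.doc", "word processing"),
         {"bus", "procurement", "bid"}),
    ],
    "Accounting": [
        (Pointer("Accounting", "ap/disbursement-0042", "database record"),
         {"bus", "procurement", "disbursement"}),
        (Pointer("Accounting", "payroll/q3", "database record"), {"payroll"}),
    ],
    "Bus Operations": [
        (Pointer("Bus Operations", "plans/deployment.doc", "word processing"),
         {"bus", "procurement", "deployment"}),
        (Pointer("Bus Operations", "schedules/weekday.pdf", "image"), {"bus", "schedule"}),
    ],
}


def assemble_virtual_file(indices, query_terms):
    """Search every departmental index and return pointers to the items whose
    index terms include all query terms. The resulting "virtual file" exists
    only as this list; no physical copy of any item is made."""
    wanted = set(query_terms)
    return [
        pointer
        for entries in indices.values()
        for pointer, terms in entries
        if wanted <= terms
    ]


if __name__ == "__main__":
    for p in assemble_virtual_file(INDICES, ["bus", "procurement"]):
        print(f"{p.repository:<22}{p.media:<17}{p.locator}")
```

Run as a script, it lists one pointer from each of the seven departments and leaves out the unrelated traffic counts, payroll, and weekday schedule items. Returning pointers rather than copies mirrors the point that the virtual file has no separate physical existence; a real system would also need each repository's own access method to resolve a locator.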
Restoring Recall: Intranets

If the record copy concept and disposition process become insufficient to ensure search recall and precision, what will take their place? In the virtual files environment, achieving adequate recall will demand tools that can search across multiple indices, text databases, and media. Some new groupware products (e.g., Lotus Notes 4.5) permit such multi-database and multimedia searches. However, the most promising method of stitching together information from all applications, media, and platforms is an intranet, a technology that corporate networks borrow from the Internet.

Intranets introduce browsers and search engines to transform each desktop PC into an "ATM of corporate information" (Koulopoulos 1997). Intranet browsers convert files from corporate applications into hypertext markup language (HTML) or other standard formats readable by every PC with the browser installed. (In actuality, the common language may be the simple but relatively weak HTML, the powerful but complex standard generalized markup language [SGML], extensible markup language [XML], or some combination of all three.) Alternatively, the browser may use abbreviated versions of native applications (e.g., "plug-ins" or Java applets) that allow users to access information in its original format (Benett 1998). Connecting key words with HTML hyperlinks will further enhance the browser's search recall. Intranet technology potentially makes all corporate information, no matter the media or application, available to every PC. It may well provide the ultimate corporate recall (Motz 1998).

Restoring Precision: The Document Life Cycle

Despite an intranet's impressive promise, it solves only half the relevance problem. All of us who have received 275,000 hits on what we thought was a very focused Internet search know that precision requires something more than an Internet browser. An obvious first step is to choose or design a search engine that provides precise searches. Wide variation exists among search engines. Most offer excellent opportunities to use Boolean operators (i.e., and, or, not) for narrowing the search, and provide functions for specifying the proximity and frequency of the search terms we choose. Many now also provide a relevance ranking feature that displays the most relevant hits first, based upon such criteria as the frequency of the most distinctive (least common) search terms. A few search engines offer relevance feedback, which enables the searcher to narrow the search further by indicating which hits most closely approximate the objective (Low and Chua 1998). Of course, we can make these engines more precise by adding tools that are specific to our corporate intranet.
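The Boolean narrowing and frequency-based relevance ranking described above can be sketched in Python. This is an illustration under stated assumptions rather than the behavior of any particular search engine: the documents are invented, the tokenizer is simplistic, and tf-idf weighting (term frequency times the logarithm of inverse document frequency) is swapped in as one common way to reward frequent occurrences of rare search terms. Proximity operators and relevance feedback are not shown.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a deliberately simple stand-in for a real analyzer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def boolean_filter(docs, all_of=(), any_of=(), none_of=()):
    """Keep documents that contain every AND term, at least one OR term
    (when any are given), and none of the NOT terms."""
    kept = {}
    for doc_id, text in docs.items():
        words = set(tokenize(text))
        if not all(t in words for t in all_of):
            continue
        if any_of and not any(t in words for t in any_of):
            continue
        if any(t in words for t in none_of):
            continue
        kept[doc_id] = text
    return kept


def rank(hits, corpus, terms):
    """Order hits by a tf-idf score: a term adds more weight the more often it
    occurs in a document and the rarer it is across the whole corpus."""
    n = len(corpus)
    doc_freq = Counter()
    for text in corpus.values():
        doc_freq.update(set(tokenize(text)))
    scores = {}
    for doc_id, text in hits.items():
        term_freq = Counter(tokenize(text))
        scores[doc_id] = sum(
            term_freq[t] * math.log(n / doc_freq[t]) for t in terms if doc_freq[t]
        )
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Invented documents echoing the search for an executed agreement.
CORPUS = {
    "D1": "Executed agreement for the overpass contract; agreement signed and executed",
    "D2": "Draft agreement for the overpass contract",
    "D3": "Transmittal memo forwarding the overpass agreement for signature",
    "D4": "Overpass traffic study",
    "D5": "Agreement amendment extending the overpass contract",
}

if __name__ == "__main__":
    hits = boolean_filter(CORPUS, all_of=("agreement", "overpass"), none_of=("draft",))
    print(rank(hits, CORPUS, ("agreement", "executed", "overpass")))  # ['D1', 'D3', 'D5']
```

Excluding "draft" drops D2 before ranking, and the executed agreement rises to the top because "executed" occurs in only one document. "Overpass" appears in every document, so its inverse document frequency is log(5/5) = 0 and it contributes nothing to the order, which is the sense in which ranking rewards distinctive terms.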
Perhaps the most powerful, but also the most labor-intensive to develop, are controlled languages (thesauri) of keywords and subject codes that can be entered into each document profile. Such controlled languages make searches more precise, but they also increase recall by locating documents with the relevant codes even in locations and databases where such material would not be expected. Despite their promise, without an overall strategy for relevance, these tools will remain just useful gadgets. Such a strategy must be based upon an architecture that integrates all the text, image bitmaps, profiles, and other compound document elements of corporate virtual files into a document life cycle. This life cycle differs from its records management cousin, which concerns physical records, as well as from information technology's life cycle of hardware, systems, and applications. The document life cycle details a document's life through all of its phases: the first conceptual draft; subsequent drafts and e-mailed reviews; supporting material in other media; correspondence regarding it; and its appearance in various formats (e.g., word processing file, image file, and text file) (Sutton 1996).

As an example of such a document life cycle, consider a transportation consultant's technical specification and drawings package produced for a proposed freeway overpass. This document will go through a variety of versions and media, from serving as the core of the Environmental Impact Report required by the state to its use in the handouts, videos, and audiotapes of public meetings. The document's further use includes a board report for the approving agency's agenda, bid documents sent to potential contractors, and the resulting contract between the successful bidder and the agency. Finally, the document becomes part of the as-built project record submitted at the project's conclusion, and an entry in the agency's contract accounting files to justify funds disbursed.
It may also serve as a reference in the bid package for subsequent freeway improvement projects. A document management system must link and organize all of these versions so that each is distinctly identified without destroying the document's organic identity. Recognizing that all of these versions are manifestations of the same basic document will make it much easier to find what is relevant in the future information glut. For example, linking document components increases the recall of a search that requires everything regarding a particular document or series of documents. Similarly, by showing that the current surfeit of information is really just a proliferation of document versions and copies, the document life cycle concept can inspire new techniques for improving search precision. It can enable us to orient our searches toward only the latest version, while still maintaining other versions for reference in a historical file (Sutton 1996). Correctly identifying, linking, and organizing these "extended, compound documents" and their multiple components is perhaps the one step that will most enhance our search for relevance in the future.

BIBLIOGRAPHY

Benett, Gordon. "Docudramas." LAN Times. Available from http://idm.Internet.com/features/docudramas.shtml.

Koulopoulos, T. M. "The Magna Carta of the Intranet." Smart Companies, Smart Tools. 1997. Extracted in http://idm.Internet.com/features/idm0997-magna.shtml.

Low, W. C., and Chua, T. S. "Relevance Feedback Techniques for Content-Based Image Retrieval." Presentation at the 1998 IEEE International Workshop on Multimedia Database Management Systems, Dayton/Fairborn, Ohio, August 1998. Available from http://www.iscs.nus.edu.sg/~lowwaich/mscproject.html.

Motz, Arlene A. "Intranets: An Opportunity for Records Managers." Records Management Quarterly. July 1998.

Naisbitt, John. Megatrends. 1982.

"Relevance Ranking." Facts on File World News. 1999. Available from http://www.facts.com/cd/relrank.htm.

Sutton, Michael J. D. Document Management for the Enterprise. 1996.

Wagar, W. Warren. A Short History of the Future. 1989.

Robert Sanders, Ph.D., CRM, is the Manager of Records and Mail for the Los Angeles Metropolitan Transportation Authority. He was previously an Associate Professor of History, Director of Records and Registration, Registrar, and Archivist at Pepperdine University. He may be reached at skeelgorli@aol.com.