Finding needles in database haystacks.

Computers have become the repositories of vast amounts of information, ranging from electronic messages and bulletins to newspaper articles, research papers, textbook materials, documents and dictionaries. Whereas storing large masses of information is relatively easy, retrieving particular items from such enormous stocks can prove both time-consuming and frustrating. Text retrieval is especially difficult when a database contains material covering an unlimited range of subjects expressed in widely varying vocabularies. Because some words may mean different things in different contexts (plasma, for example), conventional search and retrieval methods, which rely on indexes consisting of sets of key words or phrases, are unreliable and difficult to apply. Computer scientists Gerard Salton and Chris Buckley of Cornell University have now developed an alternative approach for extracting relevant information from a large, diverse database. Their scheme, described in the Aug. 30 SCIENCE, relies on automated techniques for evaluating the degree of similarity between different pieces of text. The method involves breaking down each piece of text into such units as sections, paragraphs and sentences, then assigning to each unit a set of terms used to represent its content.
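As a rough illustration of that first step, the Python sketch below breaks an article into paragraphs and sentences and attaches a bag of terms to each unit. It is a minimal sketch under stated assumptions, not the researchers' implementation: the function names `terms` and `split_units`, the small stop-word list, and the rule that blank lines separate paragraphs are all hypothetical choices made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the article does not say which words are ignored.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "on", "with"}


def terms(text):
    """Return a Counter of the lowercased content words found in text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def split_units(article):
    """Break an article into paragraph and sentence units, each with its terms.

    Assumes paragraphs are separated by blank lines and sentences end
    with '.', '!' or '?'. Returns a list of (kind, paragraph_index, terms).
    """
    units = []
    for p_idx, paragraph in enumerate(re.split(r"\n\s*\n", article.strip())):
        units.append(("paragraph", p_idx, terms(paragraph)))
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if sentence:
                units.append(("sentence", p_idx, terms(sentence)))
    return units
```

Each unit then carries its own term set, so a later comparison can work at the level of whole articles, single paragraphs, or individual sentences.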
Suppose that a user of a digitally stored encyclopedia wants to find all material related to astronomical instruments. The user selects a single article, perhaps on telescopes, as her starting point. She then asks the computer to look for all other articles containing material similar to that in the telescope article. The computer proceeds by evaluating the degree of similarity, expressed according to a set of special formulas, between the telescope article and the material in the rest of the database. On the basis of those calculations, the computer then selects other articles that appear relevant to the topic. Instead of starting with a text excerpt or article already in the database, a user can also write out a request for information, expressed in English-language sentences that provide a good description of the required material. The scheme's efficiency and convenience depend on how effectively it identifies related text passages. Preliminary tests have proved encouraging, the researchers say.
"No other text search and retrieval approach currently contemplated appears to offer equal promise for unrestricted text environments and arbitrary subject matter," they conclude.
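The telescope search described above can be sketched in a few lines of Python. The article does not spell out the "special formulas" used to score similarity, so this example assumes plain cosine similarity over raw term counts as a stand-in. The names `term_vector`, `cosine` and `rank_similar`, and the cutoff value of 0.2, are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Represent a piece of text as a Counter of lowercased words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_similar(query_text, articles, threshold=0.2):
    """Return (score, title) pairs at or above threshold, best match first.

    query_text may be an existing article or a free-form English request;
    articles maps each title to its body text. The threshold is an
    assumed cutoff chosen only for illustration.
    """
    query = term_vector(query_text)
    scored = [(cosine(query, term_vector(body)), title) for title, body in articles.items()]
    return sorted(((s, t) for s, t in scored if s >= threshold), reverse=True)
```

Because the query is just text, the same function covers both cases in the article: starting from an article already in the database, or typing a request in ordinary English sentences.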