Finding needles in database haystacks.


Computers have become the repositories of vast amounts of information, ranging from electronic messages and bulletins to newspaper articles, research papers, textbook materials, documents and dictionaries. Whereas storing large masses of information is relatively easy, retrieving particular items from such enormous stocks can prove both time-consuming and frustrating. Text retrieval is especially difficult when a database contains material covering an unlimited range of subjects expressed in widely varying vocabularies. Because some words may mean different things in different contexts -- plasma, for example -- conventional search and retrieval methods, which rely on indexes consisting of sets of key words or phrases, are unreliable and difficult to apply.

Computer scientists Gerard Salton and Chris Buckley of Cornell University have now developed an alternative approach for extracting relevant information from a large, diverse database. Their scheme, described in the Aug. 30 SCIENCE, relies on automated techniques for evaluating the degree of similarity between different pieces of text. The method involves breaking down each piece of text into such units as sections, paragraphs and sentences, then assigning to each unit a set of terms used to represent its content.
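
The breakdown into units can be pictured with a minimal Python sketch. It is only an illustration of the idea described above, not the researchers' actual procedure: the function names (terms, split_units), the word-matching rule and the small stop list are all assumptions, and sections are left out for brevity.

```python
import re
from collections import Counter

# Hypothetical stop list; the article does not say which words are ignored.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "for", "with", "that"}


def terms(text):
    """Return counts of lowercase word terms in text, skipping stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def split_units(document):
    """Break a document into paragraphs and sentences, each paired with its terms.

    Paragraphs are assumed to be separated by blank lines, and sentences
    to end in '.', '!' or '?' followed by whitespace.
    """
    units = []
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", document) if p.strip()]
    for p_index, paragraph in enumerate(paragraphs):
        units.append(("paragraph", p_index, terms(paragraph)))
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", paragraph) if s.strip()]
        for s_index, sentence in enumerate(sentences):
            units.append(("sentence", (p_index, s_index), terms(sentence)))
    return units


sample = "Telescopes gather light. Mirrors focus the light.\n\nPlasma is a state of matter."
for kind, position, unit_terms in split_units(sample):
    print(kind, position, dict(unit_terms))
```

Keeping a separate set of terms for each paragraph and sentence, rather than one set per document, is what would let a later comparison work on short passages as well as on whole articles.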

Suppose that a user of a digitally stored encyclopedia wants to find all material related to astronomical instruments. The user selects a single article, perhaps on telescopes, as her starting point. She then asks the computer to look for all other articles containing material similar to that in the telescope article. The computer proceeds by evaluating the degree of similarity, expressed according to a set of special formulas, between the telescope article and the material in the rest of the database. On the basis of those calculations, the computer then selects other articles that appear relevant to the topic. Instead of starting with a text excerpt or article already in the database, a user can also write out a request for information, expressed in English-language sentences that provide a good description of the required material.
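
The article does not spell out the "special formulas." As a rough illustration only, the Python sketch below scores articles with cosine similarity between raw term counts, a common measure in text retrieval that is assumed here rather than taken from the researchers' report. The function names, the 0.2 cutoff and the three-article "encyclopedia" are all hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Represent text as counts of its lowercase words (no stop list, for simplicity)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors; 0.0 if either is empty."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_articles(query_text, articles, threshold=0.2):
    """Return (score, title) pairs at or above threshold, best match first.

    The threshold is an arbitrary illustrative cutoff, not a published value.
    """
    query = term_vector(query_text)
    scored = [(cosine_similarity(query, term_vector(body)), title)
              for title, body in articles.items()]
    return sorted((pair for pair in scored if pair[0] >= threshold), reverse=True)


# Hypothetical miniature encyclopedia, invented for this example.
articles = {
    "Binoculars": "binoculars use lenses to magnify distant objects",
    "Observatory": "an observatory houses a telescope with large mirrors and lenses",
    "Plasma (physics)": "plasma is an ionized gas found in stars",
}

# The query may be the text of a starting article or a request written in plain English.
query = "a telescope uses lenses and mirrors to magnify distant objects"
for score, title in rank_articles(query, articles):
    print(f"{score:.2f}  {title}")
```

Because the query is just text, the same function serves both cases described above: a starting article already in the database or a freshly written request. A practical system would presumably weight terms rather than count them equally, so that common words such as "a" and "and" contribute little to the score.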

The scheme's efficiency and convenience depend on how effectively it identifies related text passages. Preliminary tests have proved encouraging, the researchers say. "No other text search and retrieval approach currently contemplated appears to offer equal promise for unrestricted text environments and arbitrary subject matter," they conclude.
COPYRIGHT 1991 Science Service, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 1991, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation: new method for text retrieval
Publication: Science News
Date: Sep 7, 1991
Words: 366