Pigeonholing text.


The enormous task of categorizing and retrieving information from the vast quantities of text stored in digital form has spurred the development of a variety of strategies for finding the textual needle in the database haystack. Most of these automated techniques rely on the identification of specific words and phrases after sentences and paragraphs are stripped of extraneous material (SN: 9/7/91, p. 155). However, such methods often require some degree of expert human participation in their development and setup. They have trouble with misspellings and garbled text, and they are usually suitable only for specific topics or languages.

Now, Marc Damashek of the Department of Defense's National Computer Security Center at Fort George G. Meade, Md., has developed a text categorization and retrieval technique that works equally well in any language and requires practically no human preparation. His method, known as Acquaintance, is purely statistical. "No prior information about document content or language is required," Damashek says.

His software divides text samples into sequences made up of a given number of consecutive characters, then computes how often each distinct sequence appears in the document. To gauge similarity, Damashek assumes that two documents showing comparable patterns are likely to deal with related subjects.
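The counting step described above can be illustrated with a short Python sketch. It is not Damashek's implementation: the function names, the default sequence length of 5, and the use of cosine similarity to compare frequency patterns are all assumptions made for illustration, since the article does not say how the patterns are compared.

```python
from collections import Counter
from math import sqrt


def ngram_profile(text: str, n: int = 5) -> Counter:
    """Count every run of n consecutive characters in text.

    The sequence length n is a free parameter; 5 is an assumed default.
    """
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Compare two profiles (assumed measure, not taken from the article).

    Returns 0.0 when either profile is empty.
    """
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical sample texts; real use would compare whole documents.
    doc_a = "statistical methods for text retrieval and categorization"
    doc_b = "categorization and retrieval of text by statistical means"
    doc_c = "la recherche de documents dans une grande base de textes"
    profile_a = ngram_profile(doc_a)
    print("a vs b:", cosine_similarity(profile_a, ngram_profile(doc_b)))
    print("a vs c:", cosine_similarity(profile_a, ngram_profile(doc_c)))
```

Because the profile is built from raw character runs rather than dictionary words, a misspelling changes only the few sequences that overlap it, which is consistent with the claim that the approach needs no knowledge of the language or topic.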

Tests of the technique show that it performs well for grouping documents by language, topic, and subtopic, Damashek says. He describes the method in the Feb. 10 Science.
COPYRIGHT 1995 Science Service, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 1995, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation: Marc Damashek develops statistical text categorization and retrieval method called Acquaintance
Publication: Science News
Article Type: Brief Article
Date: Feb 25, 1995
Words: 232