Basis Technology Introduces the Rosette Arabic Language Analyzer; First Commercially Available Analyzer for Arabic Developed Entirely in the United States.

Business Editors/High-Tech Writers

CAMBRIDGE, Mass.--(BUSINESS WIRE)--March 4, 2003

Addresses the Needs of U.S. Government Agencies for Search and Retrieval of Arabic Documents

Basis Technology, the leading provider of globalization software and services, today introduced the Rosette(R) Arabic Language Analyzer (ARLA), the first commercially available analyzer for Arabic text developed entirely in the United States. ARLA is the latest addition to Basis Technology's suite of Rosette Language Analyzers, which also includes products for Chinese, Japanese, and Korean. Developed in response to the needs of the U.S. Intelligence Community, the new product is designed to plug into mainstream search engines and data mining products to facilitate search and retrieval of information written in Arabic.

"One of the most pressing issues facing the Intelligence Community today is the need to quickly and accurately identify, analyze, and extract information in foreign languages and scripts," said Glenn Nordin, Assistant Director Intelligence Policy (Language), Department of Defense. "Because U.S. Government computer systems are largely designed to work with the Latin alphabet and U.S. character sets, processing information in Arabic is a difficult undertaking. In the absence of universal transliteration standards, human transcription of foreign text into the Latin alphabet can result in significant corruption of the data and mismatches in searches. Finding solutions that enable intelligence analysts to extract and disseminate information in the original language and script could be of critical importance."

ARLA is a multi-platform, high-performance linguistic engine for analyzing Arabic documents. It performs orthographic and lexical normalization of text, including removal of grammatical affixes (such as conjunctions, prepositions, and pronouns) that complicate search and retrieval. ARLA utilizes advanced computational linguistics and specialized lexica to convert plural nouns, including broken plurals, to their singular forms.
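The normalization steps described above can be illustrated with a minimal Python sketch. It is not ARLA's implementation or API: the function names, the short prefix list, and the one-entry plural lexicon are hypothetical, chosen only to show how orthographic normalization, affix removal, and broken-plural lookup could combine so that "alkutub" ("the books") and "kitaab" ("book") reduce to the same index term.

```python
import re

# Toy data, all hypothetical: a real analyzer relies on large lexica.
# Short-vowel marks, shadda, sukun (U+064B..U+0652) plus tatweel (U+0640).
DIACRITICS = re.compile("[\u064B-\u0652\u0640]")
# Fold hamza/madda-bearing alef forms to bare alef (U+0627).
ALEF_VARIANTS = str.maketrans({"\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627"})
# Prefixes, longest first: wal-, bil-, al-, wa-.
PREFIXES = ["\u0648\u0627\u0644", "\u0628\u0627\u0644", "\u0627\u0644", "\u0648"]
# Broken plural -> singular: kutub -> kitaab.
BROKEN_PLURALS = {"\u0643\u062A\u0628": "\u0643\u062A\u0627\u0628"}


def normalize(token):
    """Orthographic normalization: drop diacritics, unify alef forms."""
    token = DIACRITICS.sub("", token)
    return token.translate(ALEF_VARIANTS)


def strip_prefix(token):
    """Remove one known prefix if at least three letters remain."""
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            return token[len(prefix):]
    return token


def analyze(token):
    """Return a search key: normalized, prefix-stripped, singular form."""
    stem = strip_prefix(normalize(token))
    return BROKEN_PLURALS.get(stem, stem)


al_kutub = "\u0627\u0644\u0643\u062A\u0628"  # "the books"
kitaab = "\u0643\u062A\u0627\u0628"          # "book"
print(analyze(al_kutub) == analyze(kitaab))
```

A search engine plug-in built along these lines would apply the same function to both indexed text and query terms, so that an exact-match index still finds grammatical variants.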

The new product is a component of the Rosette Globalization Platform, a comprehensive software suite which enables multilingual information processing. Other components include the Rosette Core Library for Unicode (RCLU), a portable framework for implementing Unicode, and the Rosette Language Identifier (RLI), which automatically identifies the language and encoding of incoming documents. RLI now supports over forty written languages, including Arabic, Farsi, transliterated Arabic, and transliterated Farsi.

"Linguistics technology is beginning to play an increasingly important role when it comes to ensuring national security," said Everette Jordan, Director of the National Virtual Translation Center, an organization jointly sponsored by the FBI and CIA under the USA Patriot Act. "Because of the enormous volume of multi-lingual intelligence information that must be analyzed with limited human resources, technologies that can assist in sifting, sorting, and finding critical information are essential in ensuring that threats are detected as quickly as possible. Whereas the U.S. Government cannot endorse any one product over another, we are pleased to see that companies are responding to the government's call for solutions to these difficult issues."

"Search and retrieval of information in Arabic documents is highly complex," explained Glenn Adams, Technical Director Emeritus of the Unicode Consortium, and co-author of the Unicode Standard. "For example, Arabic incorporates affixes and infixes indicating grammatical elements such as conjugation, prepositions, and pronouns. Searching through documents for an exact match to a particular search term will miss many relevant hits. Searching for 'book' ('kitaab') will not return the Arabic term for 'the books' ('alkutub'). ARLA solves this problem and many others like it, resulting in a more accurate and comprehensive search that doesn't miss relevant terms because of slight grammatical variations."

Together with the other language components of the Rosette Globalization Platform, ARLA enables Federal law enforcement and intelligence agencies to expand their ability to detect and monitor intelligence originating in a foreign language, even when searching documents with terms which have been transcribed into the English alphabet.

"A key issue when searching Arabic text is the fact that names may be transcribed into English with many varied spellings, even though there will be far fewer ways of writing the same name in Arabic," said Carl Hoffman, CEO of Basis Technology. "For example, there are over thirty different commonly-used English spellings for the name of Libya's ruler, all of which correspond to the unique spelling of his name in Arabic. Our software can be used to build applications that allow users to search and retrieve information in Arabic documents using 'phonetic approximation'--spelling the name the way it sounds--without having knowledge of the many varied transliteration schemes. This significantly increases the likelihood of non-Arabic speakers locating the critical information for which they are searching."
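The idea of phonetic approximation can be sketched in Python as follows. This is a hand-rolled, Soundex-style key under stated assumptions, not the algorithm used in Basis Technology's software: the `phonetic_key` function, its substitution rules, and the one-entry `NAME_INDEX` table are hypothetical and cover only a handful of common spellings.

```python
import re


def phonetic_key(name):
    """Collapse a Latin-alphabet spelling to a rough consonant skeleton."""
    s = re.sub(r"[^a-z]", "", name.lower())  # keep letters only
    s = re.sub(r"kh|gh|q|g", "k", s)         # treat k-like sounds alike
    s = re.sub(r"dh|th|t", "d", s)           # treat d-like sounds alike
    s = re.sub(r"[aeiouyh]", "", s)          # drop vowels and stray h
    return re.sub(r"(.)\1+", r"\1", s)       # collapse doubled letters


# Hypothetical lookup: phonetic key -> the single Arabic spelling to search for.
NAME_INDEX = {"kdf": "\u0627\u0644\u0642\u0630\u0627\u0641\u064A"}


def arabic_query(latin_name):
    """Return the Arabic search term for a name typed 'the way it sounds'."""
    return NAME_INDEX.get(phonetic_key(latin_name))


variants = ["Qaddafi", "Gaddafi", "Kadhafi", "Khadafy", "Gathafi", "Qadhdhafi"]
print({phonetic_key(v) for v in variants})
```

In this sketch every listed variant reduces to the same key, so a user who types any of them is routed to one Arabic query string. A production system would need a far richer model of transliteration schemes than these few substitutions.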

ARLA is available for immediate shipment with plug-ins either available or under development for Convera RetrievalWare(R), FAST Data Search(TM), Microsoft(R) SQL Server(TM), and Oracle(R) Text/interMedia.

About Basis Technology

Basis Technology is the leading provider of products and services for software globalization and multilingual information processing. The company provides high-performance, highly reliable software components through its Rosette(R) Globalization Platform, a suite of interoperable products designed for applications that analyze and process all the world's languages. The company also provides rapid deployment engineering services covering all aspects of globalization, including source code audits, project management, software re-engineering, and global quality assurance.

Top-tier software vendors, content providers, multinational enterprises, and government agencies rely on Basis Technology's solutions for Unicode compliance, language identification, multilingual search, normalization, and transliteration. Customers include industry leaders such as America Online, Convera, Fast Search & Transfer (FAST), Google, Hewlett-Packard, IBM, L.L. Bean, Overture Services, PeopleSoft, Siebel Systems, Software AG, and Verity.

Company headquarters are located in Cambridge, Massachusetts, with branch offices in San Francisco, California; Herndon, Virginia; and Tokyo, Japan. For more information, visit or call 800-697-2062.
COPYRIGHT 2003 Business Wire
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2003, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Publication: Business Wire
Date: Mar 4, 2003