Basis Technology Introduces the Rosette Arabic Language Analyzer; First Commercially Available Analyzer for Arabic Developed Entirely in the United States
Business Editors/High-Tech Writers
CAMBRIDGE, Mass.--(BUSINESS WIRE)--March 4, 2003
Addresses the Needs of U.S. Government Agencies for Search and Retrieval of Arabic Documents
Basis Technology, the leading provider of globalization software and services, today introduced the Rosette(R) Arabic Language Analyzer (ARLA), the first commercially available analyzer for Arabic text developed entirely in the United States. ARLA is the latest addition to Basis Technology's suite of Rosette Language Analyzers, which also includes products for Chinese, Japanese, and Korean. Developed in response to the needs of the U.S. Intelligence Community, the new product is designed to plug into mainstream search engines and data mining products to facilitate search and retrieval of information written in Arabic.
"One of the most pressing issues facing the Intelligence Community today is the need to quickly and accurately identify, analyze, and extract information in foreign languages and scripts," said Glenn Nordin, Assistant Director Intelligence Policy (Language), Department of Defense. "Because U.S. Government computer systems are largely designed to work with the Latin alphabet and US character sets, processing information in Arabic is a difficult undertaking. In the absence of universal transliteration standards, human transcription of foreign text into the Latin alphabet can result in significant corruption of the data and mismatches in searches. Finding solutions that enable intelligence analysts to extract and disseminate information in the original language and script could be of critical importance."
ARLA is a multi-platform, high-performance linguistic engine for analyzing Arabic documents. It performs orthographic and lexical normalization of text, including removal of grammatical affixes (such as conjunctions, prepositions, and pronouns) that complicate search and retrieval. ARLA utilizes advanced computational linguistics and specialized lexica to convert plural nouns, including broken plurals, to their singular forms.
The new product is a component of the Rosette Globalization Platform, a comprehensive software suite which enables multilingual information processing. Other components include the Rosette Core Library for Unicode (RCLU), a portable framework for implementing Unicode, and the Rosette Language Identifier (RLI), which automatically identifies the language and encoding of incoming documents. RLI now supports over forty written languages, including Arabic, Farsi, transliterated Arabic, and transliterated Farsi.
"Linguistics technology is beginning to play an increasingly important role when it comes to ensuring national security," said Everette Jordan, Director of the National Virtual Translation Center, an organization jointly sponsored by the FBI and CIA under the USA Patriot Act. "Because of the enormous volume of multi-lingual intelligence information that must be analyzed with limited human resources, technologies that can assist in sifting, sorting, and finding critical information are essential in ensuring that threats are detected as quickly as possible. Whereas the U.S. Government cannot endorse any one product over another, we are pleased to see that companies are responding to the government's call for solutions to these difficult issues."
"Search and retrieval of information in Arabic documents is highly complex," explained Glenn Adams, Technical Director Emeritus of the Unicode Consortium, and co-author of the Unicode Standard. "For example, Arabic incorporates affixes and infixes indicating grammatical elements such as conjugation, prepositions, and pronouns. Searching through documents for an exact match to a particular search term will miss many relevant hits. Searching for 'book' ('kitaab') will not return the Arabic term for 'the books' ('alkutub'). ARLA solves this problem and many others like it, resulting in a more accurate and comprehensive search that doesn't miss relevant terms because of slight grammatical variations."
Together with the other language components of the Rosette Globalization Platform, ARLA enables Federal law enforcement and intelligence agencies to expand their ability to detect and monitor intelligence originating in a foreign language, even when searching documents with terms which have been transcribed into the English alphabet.
"A key issue when searching Arabic text is the fact that names may be transcribed into English with many varied spellings, even though there will be far fewer ways of writing the same name in Arabic," said Carl Hoffman, CEO of Basis Technology. "For example, there are over thirty different commonly-used English spellings for the name of Libya's ruler, all of which correspond to the unique spelling of his name in Arabic. Our software can be used to build applications that allow users to search and retrieve information in Arabic documents using 'phonetic approximation'--spelling the name the way it sounds--without having knowledge of the many varied transliteration schemes. This significantly increases the likelihood of non-Arabic speakers locating the critical information for which they are searching."
ARLA is available for immediate shipment with plug-ins either available or under development for Convera RetrievalWare(R), FAST Data Search(TM), Microsoft(R) SQL Server(TM), and Oracle(R) Text/interMedia.
About Basis Technology
Basis Technology is the leading provider of products and services for software globalization and multilingual information processing. The company provides high-performance, highly reliable software components through its Rosette(R) Globalization Platform, a suite of interoperable products designed for applications that analyze and process all the world's languages. The company also provides rapid deployment engineering services covering all aspects of globalization, including source code audits, project management, software re-engineering, and global quality assurance. Top-tier software vendors, content providers, multinational enterprises, and government agencies rely on Basis Technology's solutions for Unicode compliance, language identification, multilingual search, normalization, and transliteration. Customers include industry leaders Amazon.com, America Online, Convera, Fast Search & Transfer (FAST), Google, Hewlett-Packard, IBM, L.L. Bean, Overture Services, PeopleSoft, Siebel Systems, Software AG, and Verity.
Company headquarters are located in Cambridge, Massachusetts, with branch offices in San Francisco, California; Herndon, Virginia; and Tokyo, Japan. For more information, visit www.basistech.com or call 800-697-2062.