Speech for export.


Automating the translation of spoken words

The caller at the other end of the telephone line speaks a language that's completely foreign to you, and you can't tell what she wants. For help, you could look for a co-worker fluent in the language, or you could turn to a commercial telephone service that connects you with an interpreter.

But Alex Waibel, a computer scientist at Carnegie Mellon University in Pittsburgh, has a more high-tech solution in mind. He envisions the development of a computer system that recognizes speech in one language, translates the spoken words into another language, and feeds the translated text into a speech synthesizer.

"Real-time translation of telephone conversations is an ambitious project," he admits. It requires the integration of three capabilities - speech recognition, machine translation, and speech synthesis - that by themselves present formidable difficulties.

In a demonstration staged last January, Waibel and his team, working with research groups in Germany and Japan, showed both the future promise of such technology and its present-day limitations. In a scripted, three-way conversation, independent systems - at Carnegie Mellon, at the Interpreting Telephony Research Laboratories of Advanced Telecommunications Research (ATR) in Kyoto, Japan, and at Siemens A.G. in Munich, Germany - went through the rigmarole of obtaining information and registering for an international conference.

Pronouncing his words distinctly and carefully, an English-speaking participant in the demonstration talked into a headset-mounted microphone connected to Carnegie Mellon's JANUS system. He said: "I would like to register for the conference." Seconds later, the voice synthesizer in Germany repeated: "Ich wuerde mich gerne zur Konferenz anmelden."

But if a speaker happened to stray from the script, going beyond the system's vocabulary of roughly 500 words, the computer would fail to produce a translation. "That's a problem," Waibel says. "We're now moving ahead, trying to break through these limitations."

Waibel seems just the right kind of person to be involved in this linguistic stew. He is fluent in both English and German, and his wife is Japanese. So he can readily check how well the machines are doing.

Beyond the intellectual challenge of a difficult research problem and his own interest in language understanding, Waibel also sees an unfulfilled need for technology that can aid communication among people speaking different languages. Although an increasing proportion of the world's population is learning English, these people are seldom really fluent in that language.

"Even in countries like Germany or Japan, people don't all speak English," Waibel notes. "There is actually a huge need for language translation."

As an example of the strong interest of users in having an interpreter available, Waibel cites the success of a relatively new enterprise known as AT&T Language Line Services. Since it started in the early 1980s, this telephone service - now offered 24 hours a day, seven days a week - has grown rapidly to encompass interpretation between English and 140 other languages. This requires a large staff of part-time interpreters, who work by telephone out of their homes at locations all over the United States.

Frequent customers of the service include hospitals, insurance companies, and all manner of government agencies - anyone regularly dealing with U.S. residents who do not speak English. The service is also used by large and small businesses interested in cracking international markets and even by individuals trying to communicate with foreign visitors. Spanish is the most requested language, followed by French, German, Italian, Chinese, Japanese, Korean, and Vietnamese.

At the same time, "human interpreters are very costly and may not be required for some routine things," Waibel notes. "If you want to talk poetry or do international peace talks, you would hire the best interpreter you can get. But if you want to register for a conference, reserve a room at a hotel, plan a trip to Japan, you don't necessarily want to go through an expensive interpreter. You want to have a box that helps you along."

Among the various projects at Carnegie Mellon devoted to speech recognition, natural language understanding, and machine translation, Waibel's group has the distinction of emphasizing the application of neural networks - computer systems intended to mimic the brain - to speech recognition. Programmed to modify itself according to whatever signals come into the system, the speech recognizer actually "learns" how to identify sounds and words.

"This allows a great deal of flexibility and robustness," Waibel says. "The technology has matured enough that we are in a position to produce a state-of-the-art speech recognizer comparable with the best based on any other technique."

In 1988, the Carnegie Mellon group teamed up with Japan's ATR - which was in the midst of a seven-year initiative devoted to speech translation - to provide the ATR system's English-language component. Meanwhile, Waibel started a speech translation laboratory at the University of Karlsruhe in Germany and then got the Munich-based Siemens company interested. The two together have developed a German-language counterpart.

Working largely independently but sharing ideas, the three groups used their own approaches to build somewhat different systems. Nonetheless, charged with the common task of facilitating conference registration, all three systems also had to work together.

Carnegie Mellon's component, JANUS, translates English-language speech into German or Japanese text. To begin with, a person talks into a microphone. The resulting signal is converted into digits and parceled into 10-millisecond segments. Each of these speech fragments is then converted into 16 numbers, representing the signal's strength in 16 frequency ranges.
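
The framing step described above can be illustrated with a minimal Python sketch. It is not the JANUS code: the 16 kHz sampling rate, the equal-width frequency bands, and the function names are all assumptions made for illustration, since the article states only that the signal is cut into 10-millisecond segments and that each segment becomes 16 numbers.

```python
import cmath
import math

SAMPLE_RATE = 16000  # assumed; the article does not give a sampling rate
FRAME_MS = 10        # segment length stated in the article
NUM_BANDS = 16       # numbers per segment stated in the article


def frame_signal(samples, sample_rate=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Split samples into consecutive, non-overlapping frames of frame_ms."""
    frame_len = sample_rate * frame_ms // 1000
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def band_energies(frame, num_bands=NUM_BANDS):
    """Return num_bands numbers: spectral power summed over equal-width bands.

    Equal-width bands are an assumption; a real recognizer may space its
    bands differently.
    """
    n = len(frame)
    half = n // 2
    # Naive discrete Fourier transform, kept simple rather than fast.
    power = []
    for k in range(half):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        power.append(abs(acc) ** 2)
    bins_per_band = half // num_bands
    return [sum(power[b * bins_per_band:(b + 1) * bins_per_band])
            for b in range(num_bands)]


def extract_features(samples):
    """Turn a digitized signal into one 16-number vector per 10 ms frame."""
    return [band_energies(frame) for frame in frame_signal(samples)]
```

With these assumed settings each frame holds 160 samples, so 50 milliseconds of audio yields five vectors of 16 numbers each, which is the kind of input the speech recognizer then analyzes.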

A speech recognizer analyzes the segments, identifying the particular language sounds, or phonemes, involved. Looking for patterns, it works out possible word combinations that seem to fit the identified sequence of phonemes and produces a list of candidate sentences, starting with the most likely possibility.
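
A toy Python sketch can show how one phoneme sequence leads to a ranked list of candidate sentences. Everything in it is hypothetical: the five-word lexicon, the phoneme symbols, and the word probabilities are invented for illustration, and a real recognizer would also weigh how well the sounds themselves match.

```python
import math

# Hypothetical toy lexicon: word -> (phoneme sequence, assumed probability).
LEXICON = {
    "I": (("AY",), 0.05),
    "eye": (("AY",), 0.001),
    "would": (("W", "UH", "D"), 0.02),
    "wood": (("W", "UH", "D"), 0.002),
    "like": (("L", "AY", "K"), 0.01),
}


def candidates(phonemes, lexicon=LEXICON):
    """Return (log_probability, words) pairs for every way of splitting the
    phoneme sequence into lexicon words, most likely candidate first."""
    phonemes = tuple(phonemes)
    results = []

    def extend(pos, words, logp):
        if pos == len(phonemes):
            results.append((logp, words))
            return
        for word, (pron, prob) in lexicon.items():
            if phonemes[pos:pos + len(pron)] == pron:
                extend(pos + len(pron), words + [word], logp + math.log(prob))

    extend(0, [], 0.0)
    results.sort(key=lambda item: item[0], reverse=True)
    return results
```

Because "I" and "eye" share a pronunciation, as do "would" and "wood", the phonemes AY W UH D L AY K produce four candidates, and the assumed word probabilities put "I would like" at the top of the list.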

The translation part of the system then parses the top candidate, or works out its grammar in detail. Using this information, it converts the sentence into a special, intermediate language. The appropriate language generator then translates this intermediate form into either Japanese or German. Finally, the text is transmitted to computers in Japan or Germany, where speech synthesizers complete the process.
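
The parse, intermediate-language, and generation steps can be sketched in a few lines of Python. This is an illustration only: the single sentence pattern, the frame fields, and the function names are assumptions, not the intermediate language JANUS actually uses. The example sentence and its German rendering are the ones quoted from the demonstration.

```python
import re


def parse_to_interlingua(sentence):
    """Map an English sentence to a language-neutral frame (one toy pattern)."""
    match = re.fullmatch(r"i would like to register for the (\w+)\.?",
                         sentence.strip().lower())
    if match is None:
        return None  # sentence falls outside the toy grammar
    return {"speech_act": "request", "action": "register",
            "event": match.group(1)}


# Hypothetical one-entry noun table for the German generator.
GERMAN_NOUNS = {"conference": "Konferenz"}


def generate_german(frame):
    """Produce German text from the intermediate frame, or None if unknown."""
    if frame["speech_act"] == "request" and frame["action"] == "register":
        noun = GERMAN_NOUNS.get(frame["event"])
        if noun is not None:
            return f"Ich wuerde mich gerne zur {noun} anmelden."
    return None


# One generator per target language; a Japanese generator would be added here.
GENERATORS = {"de": generate_german}


def translate(sentence, target):
    """Parse to the intermediate frame, then generate the target language."""
    frame = parse_to_interlingua(sentence)
    if frame is None:
        return None
    return GENERATORS[target](frame)
```

The design point is that the parser never needs to know the target language: adding Japanese means adding one generator that reads the same frame. A sentence outside the pattern returns None, which mirrors the failure the demonstration showed when a speaker strayed from the script.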

The Carnegie-ATR-Siemens/Karlsruhe collaboration is not the only speech translation effort under way. Last year, scientists at AT&T Bell Laboratories in Murray Hill, N.J., collaborated with researchers at Telefonica Investigacion y Desarrollo in Spain to create a translator that could handle a 450-word vocabulary in Spanish and English. The system determined which language was spoken, translated the sentence into the other language, and "spoke" the new sentence, typically taking less than two seconds to complete the process.

To achieve this speed, the researchers found a way to use the same language model for both speech recognition and grammatical analysis, saving a potentially time-consuming step. Moreover, the system - known as VEST for Voice English/Spanish Translator - handled sentences dealing only with currency exchange and routine banking transactions.

Indeed, the most successful systems now in use all have strictly limited vocabularies and topics of conversation. "If you have an expert system that knows all about currency exchange, then it's [easy] to translate sentences back and forth between languages - so long as they deal only with currency exchange," says David Roe of Bell Labs. "What is hard is if you say 'bank' and you don't mean financial institution, you mean 'snowbank.'"

"That is where text translation machines usually run into problems," he adds. "They see a word and they cannot tell from the context what the sense of the word is."

One particularly successful system used in Canada translates weather forecasts with better than 99 percent accuracy between French and English. "Its saving grace is that it always deals only with weather forecasts," Roe comments.

The VEST system, demonstrated at Expo '92 in Seville, Spain, is part of an ongoing research effort at Bell Labs and Telefonica. The Spanish company already offers customers a system that recognizes the spoken words "uno," "dos," "tres," and so on, allowing someone using a dial telephone to make the same kinds of choices possible on a push-button phone.

At Bell Labs, Roe and his colleagues are working to improve speech translation systems by going back to the basics - looking for a superior method of speech recognition and for a better mathematical way of telling whether a given sequence of words is a valid sentence. "We also want to have translation from English into any of eight languages," Roe says.

Improved speech recognition remains one of the keys to better translation. A number of groups have recently demonstrated systems that indicate how far this technology has progressed in recent years (SN: 4/3/93, p.222). In one impressive showing, John Makhoul and his co-workers at BBN Systems and Technologies in Cambridge, Mass., showed that a speech recognition system running on an ordinary workstation could readily handle a 20,000-word vocabulary, no matter who the speaker is and without unnatural pauses between the spoken words.

But it's still a giant leap from speech recognition to accurate, rapid translation of speech - especially as the vocabulary gets larger and speakers are no longer restricted to grammatical sentences. It would also be nice if the translation system could somehow provide feedback concerning what it doesn't understand about any particular utterance.

"A human interpreter will carry on a dialog with a speaker in one language until the concept is clear before generating a message in the other language," Waibel remarks. "That's one of the things we're attempting to do in the second phase of our project."

That could be a handy capability when the system encounters the ill-formed sentences typical of spontaneous speech. "You want to allow people to speak spontaneously, without having to make sure they are speaking grammatically correct sentences, using only certain words, or not coughing in the middle of a sentence," Waibel says.

"But we're biting off a big chunk in going to spontaneous speech," adds Arthur E. McNair, a research programmer with the JANUS project.

People in conversation naturally drift from topic to topic. Even in the seemingly benign realm of conference registration, a speaker may easily slip into subjects outside a system's expertise. When Waibel and his team recorded actual registration dialogs at a real conference, they found that some people stuck to the topic, while others wandered off on tangents. One woman went into a lengthy discussion of her recent divorce as a reason for asking the conference organizers to waive the registration fee.

Hence, most research groups will continue to concentrate on the translation of small vocabularies restricted to a certain domain. "The system isn't going to let you talk about anything under the sun," Waibel says. In its new effort, Waibel's group will focus on the task of scheduling a meeting as the topic of conversation.

Spoken language also has subtleties that seem almost impossible to capture by machine: the tone of a remark, the level of politeness, even the latest terms in an ever-changing body of slang expressions. "You need to get a lot more information out of the input data than what's required for a simple data-retrieval task," Waibel notes. "That makes speech translation much more challenging than speech recognition."

All this puts true "translating telephones" into the distant future. "A number of corporate managers have become very interested in the dream - and it really is a dream - of having telephone conversations between people speaking different languages," Roe says. "There's no doubt that this provides some of AT&T's corporate incentive for keeping the project going. But we have to work very hard to keep from overselling the technology."

At the same time, speech recognition shows enough promise that the German government has just launched a major initiative - an eight-year project dubbed Verbmobil - to develop a portable speech translator. And Japan's ATR is gearing up for the second phase of its effort.
COPYRIGHT 1993 Science Service, Inc.
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 1993, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:translating languages over the telephone
Author:Peterson, Ivars
Publication:Science News
Date:Oct 16, 1993
Words:1957