The Spoken Language Translation Research Laboratories at ATR recently demonstrated a system that shows considerable promise. Two people, one a native Japanese speaker and one a native English speaker, each carried a PDA connected to earphones and a microphone; each PDA used a wireless LAN card. The two speakers simulated a foreigner checking into a hotel. Each sentence a speaker uttered was automatically converted into text, translated and read aloud to the other person in the opposite language. The PDA also displayed a text record of the conversation.
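The flow just described can be sketched as a simple chain of three stages: speech recognition, machine translation and speech synthesis, with the text record kept along the way. This is only an illustrative sketch; the function names and canned responses are hypothetical stand-ins, not ATR's actual software.

```python
# Hypothetical sketch of the speech-to-speech translation pipeline:
# recognize speech, translate the text, then read it aloud.
# All component functions are illustrative stand-ins.

def recognize(audio, source_lang):
    """Stand-in for the automatic speech recognizer (audio -> text)."""
    return {"ja": "部屋を予約したいのですが。"}.get(source_lang, "I'd like to book a room.")

def translate(text, source_lang, target_lang):
    """Stand-in for the machine translation engine."""
    if (source_lang, target_lang) == ("ja", "en"):
        return "I would like to reserve a room."
    return "部屋を予約したいです。"

def synthesize(text, target_lang):
    """Stand-in for the text-to-speech engine (text -> audio)."""
    return f"<audio:{target_lang}:{text}>"

def speech_to_speech(audio, source_lang, target_lang, transcript):
    text = recognize(audio, source_lang)
    translated = translate(text, source_lang, target_lang)
    # The PDA also keeps a text record of the conversation.
    transcript.append((source_lang, text, translated))
    return synthesize(translated, target_lang)

transcript = []
audio_out = speech_to_speech(b"...", "ja", "en", transcript)
print(audio_out)
print(transcript)
```

In the real system, each of the three stages runs on the back-end cluster described below, which is where the few seconds of translation delay come from.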
There are still a few areas that need to be improved. First, getting the entire system out of a PDA/earphone/microphone set-up and into a mobile phone would greatly improve usability. Second, there is a slight delay in the translation time, which seems to vary based on the difficulty of the translation. In the recent demonstration, wait times averaged about 4 seconds for Japanese to English translations and 6 seconds for English to Japanese translations.
The back-end support required is also very large. To demonstrate a tourist-based conversation within a limited setting, the tech staff had to string together six high-end PCs, a server and a database of 23,000 words. A system that could translate nearly any conversation would need a database of around 300,000 words. (The group is also presently working on a similar Japanese-to-Chinese translation system.)
It's very easy to see the market potential of a mobile phone translation system. Once the system works effectively on a national scale, two people could hypothetically punch in a code on their mobile phones and communicate effectively across languages, regardless of whether they were in the same physical location or not.
ATR is taking more than one approach to getting computers and robots to speak directly to you. The Biological Speech Science Project in the Human Sciences Information Laboratories has created what might be called a plastic singing vocal chamber. They first took a Magnetic Resonance Image (MRI) of a person singing a vowel--"ahhhhhhhhh," for example. From the MRI, they created a stiff three-dimensional hollow plastic model of the mouth and throat. They hooked the model up to a frequency-vibration machine and created a sound similar to the one made by the original human. ("Similar" being defined here by a blue and a red line having similar patterns on a graph that goes far beyond the understanding of your Average Joe.)
Suggestions that they might next try to use pliable plastic connected to small motors to create human-like speech are shot down with a quick and pragmatic recitation of the project's goal--to find the source of "individuality" in the human voice. Both the shape of the vocal tract and the geometry of the lower pharynx contribute to the sound.
ATR's Media Information Science Research Laboratories' Senseweb could easily be mistaken for the Internet's equivalent of the Lava Lamp. Mention a word into the headset and a widescreen TV shows images gathered from Web sites that seem to bubble up out of the center of the screen. Touch the screen with your hands and you can move the images around, toss aside images that bore you and open those that interest you. Using the Senseweb is a bit like playing Tom Cruise in the movie Minority Report, only using two dimensions instead of three. The idea is to allow users not only to access a large amount of data intuitively, but also to have fun in controlling the flow and presentation of the data. It was designed in anticipation of future applications for entertainment, edu-tainment and art that will call for more playful and intuitive interactions.
Another project the Media Information Science Lab is working on is the collaborative capturing of interactions by multiple sensors. Ever wonder what would happen if a bunch of people each strapped on a head-mounted camera, a headset microphone, physiological sensors and a small personal computer, then walked into a room filled with stationary video cameras and microphones and looked at objects with LED sensors attached? No? Well, these folks have. The less than obvious aim here is to understand both verbal and non-verbal human interaction mechanisms, and have those interactions recognized by a computer.
ATR's Human Information Processing Research Laboratory has helped develop a wireless tongue-pressure sensing system that allows users to maneuver electric equipment using only the tips of their tongues. For example, a quadriplegic person could, ideally, control the movement of a wheelchair: the sensors in the mouth unit control direction, while the magnitude of pressure on the sensors determines the wheelchair's speed.
The system uses an onboard FM radio wave receiver and microprocessor to create drive signals to the wheelchair. Possible future applications of the technology include remotely controlling an electric bed, television, air conditioner, telephone and personal computer.
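The control logic described above, where which sensor is pressed picks the direction and how hard it is pressed sets the speed, can be sketched roughly as follows. The sensor layout, pressure threshold and top speed here are illustrative assumptions, not the actual system's specifications.

```python
# Hypothetical sketch of mapping tongue-pressure readings to a wheelchair
# drive signal: the most strongly pressed sensor selects the direction,
# and the pressure magnitude on that sensor scales the speed.

MAX_SPEED_KMH = 6.0       # assumed top speed of the wheelchair
PRESSURE_THRESHOLD = 0.1  # assumed minimum pressure to register intent

def drive_signal(pressures):
    """pressures: dict mapping sensor name -> normalized pressure in [0, 1]."""
    direction, pressure = max(pressures.items(), key=lambda kv: kv[1])
    if pressure < PRESSURE_THRESHOLD:
        return ("stop", 0.0)
    # Speed scales with how hard the winning sensor is pressed.
    speed = round(MAX_SPEED_KMH * pressure, 2)
    return (direction, speed)

print(drive_signal({"forward": 0.8, "left": 0.05, "right": 0.0, "back": 0.0}))
# firm forward press: move forward at 80% of top speed
print(drive_signal({"forward": 0.02, "left": 0.03, "right": 0.0, "back": 0.0}))
# all pressures below the threshold: stop
```

A signal like this would then be encoded over the FM radio link to the onboard microprocessor that actually drives the chair.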
Title Annotation: The Pulse 1
Date: Jan 1, 2004