ABC-TV's 'Prime Time' to Showcase Role of Sensory's Fluent Animated Speech Technology in Teaching Deaf Children to Speak; Software Transforms Diane Sawyer's Image & Voice into Animation Demo
Business Editors/High-Tech & Education Writers
NOTE TO MEDIA: Photo is available in a Smart News Release(TM) on Business Wire's Home Page at www.businesswire.com and at www.newstream.com
SANTA CLARA, Calif.--(BUSINESS WIRE)--March 5, 2001
At the Tucker-Maxon Oral School in Portland, Ore., deaf children ages 6 through 12 are improving their listening and speech-production skills with the help of Baldi, a dome-headed, talking, computer-generated face.
Tucker-Maxon's unusual talking tutor, along with the powerful software technology that combines 3D animation with speech recognition and audio-visual generation of speech, will be showcased on the ABC-TV news program "Prime Time" on Thursday, March 8 (10 p.m. Eastern). To demonstrate the power and accuracy of the software as a teaching tool for the profoundly deaf, the voice and face of "Prime Time's" co-anchor Diane Sawyer will be converted into a so-called conversational agent.
The software that allows the animated faces of Sawyer and Baldi to talk and be understood by Tucker-Maxon students is Sensory Inc.'s Fluent Animated Speech(TM) technology. Sensory, based in Santa Clara, is a leading provider of embedded speech technology.
Origins of the Software
Sensory's Fluent Animated Speech software had its beginnings in research and development efforts conducted primarily at the Oregon Graduate Institute Center for Spoken Language Understanding.
Technology Behind the Tucker-Maxon Story
With Sensory's Fluent Animated Speech technology, programmers and non-programmers alike can control the facial expressions, emotional expressions and lip synchronization of an animated 3D agent or avatar. At Tucker-Maxon, for example, educators with minimal computer skills can easily design programs that both speak and listen. The software incorporates the animated face, Baldi, whose articulators are aligned with the utterances produced in either synthesized or natural speech. The movements of Baldi's lips and eyes, together with his facial expressions, add meaning to the words "spoken" by the computer. Baldi can ask a question on a topic chosen by the teacher and then prompt the student to respond; the student's response determines the next turn of the dialogue.
"The ability to create realistic, talking characters is no longer of interest solely to professional animators or producers of motion pictures," said Todd Mozer, president and chief executive officer of Sensory. "Our Fluent Animated Speech technology will bring such capabilities within the reach of nearly everyone."
Applications in Education and Beyond
By achieving unprecedented accuracy in speech and facial animation, Sensory's Fluent Animated Speech technology will enable animated characters to play roles in Internet-based commerce, entertainment and customer support as well as in education. Possible applications include adding an animated agent to a text or voice message; automating an interactive web host or agent; adding personality and emotional expressions to a web character or message; and creating online games in which the players control the speech of the characters.
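The question-and-response loop described above, in which the student's answer selects the next turn, can be pictured as a small graph of dialogue turns. The Python sketch below is only an illustration under stated assumptions: the `Turn` class, the `next_turn` function and the sample lesson are hypothetical, the student's spoken answer is assumed to have already been converted to text by a speech recognizer, and nothing here reflects how Sensory's authoring tools actually represent a lesson.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Turn:
    """One dialogue turn: what the tutor says and where each answer leads."""
    prompt: str
    # Maps a recognized answer (lowercased) to the name of the next turn.
    branches: dict[str, str] = field(default_factory=dict)
    # Turn to use when the answer matches no branch (e.g. a retry prompt).
    # None means the dialogue ends here.
    fallback: str | None = None


def next_turn(lesson: dict[str, Turn], current: str, answer: str) -> str | None:
    """Return the name of the next turn given the student's recognized answer."""
    turn = lesson[current]
    return turn.branches.get(answer.strip().lower(), turn.fallback)


# A hypothetical lesson built around a topic the teacher chose (colors).
lesson = {
    "ask_color": Turn("What color is the apple?",
                      {"red": "praise", "green": "praise"}, fallback="retry"),
    "retry": Turn("Listen again. What color is the apple?",
                  {"red": "praise", "green": "praise"}, fallback="retry"),
    "praise": Turn("Great job!"),
}
```

Matching exact words keeps the sketch short; a real system would also have to cope with recognition errors and low-confidence results, which this example ignores.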
New Animation Technology Represents a Breakthrough
The Fluent Animated Speech technology employs a non-linear morphing technique that enables Sensory to take a few dozen static pictures and blend them to create a virtually unlimited assortment of expressions and articulations. The technology provides memorable, highly accurate real-time lip-synching, as well as the delivery of emotional content by a 3D animated agent, synchronized to a variety of speech and text sources. The 3D models can be created using off-the-shelf 3D graphics tools.
The speech output comes from Sensory's Fluent Speech(TM) Text-to-Speech engine, which can reside in either a client or a server environment. The Fluent Speech Text-to-Speech engine is an LPC (linear predictive coding), diphone-based speech synthesizer capable of expanding or contracting pitch periods and changing speech rates to produce a variety of sounds. The LPC approach makes it possible for the Fluent Animated Speech technology to synthesize high-quality speech using very little computer memory.
The 3D animation component of Fluent Animated Speech comprises a general-purpose OpenGL- or Direct3D-based real-time 3D rendering engine and a viseme generation engine. (A viseme is the visual counterpart of a phoneme, the smallest unit of speech sound that distinguishes one word from another.) The viseme generation engine is a coarticulation package that generates weighted morphing data (in the form of visemes) that drives the animated speech from either synthetic or natural speech. The coarticulation package is an important part of Sensory's speech software, enabling animated characters to speak with realistic facial and mouth movements.
In humans, coarticulation is the brain's coordination of the lips, tongue and jaw so that the movements for adjacent vowels and consonants overlap during normal speech. Coarticulation ensures that speech is produced smoothly, and it spreads acoustic information about a vowel or consonant across neighboring sounds, which helps a listener understand what is being said. With Sensory's coarticulation package, animated characters can communicate at five syllables per second, comparable to the rate of normal human speech.
The Fluent Animated Speech technology's 3D rendering engine can render arbitrary 3D models and uses a morphing-based approach to animation. Exporters for 3D authoring tools enable 3D models to be saved in a compatible format. Additionally, the Fluent Animated Speech technology can take advantage of other vendors' existing tools for scripting speech and facial content and for automatically generating expressions and facial gestures. Users can control lighting and background images as well as the characters being animated, and AVI (Audio Video Interleaved) output is available.
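To make the LPC description above concrete, here is a minimal Python sketch of textbook linear predictive coding, not of Sensory's engine, whose diphone handling and internals are not described in this release. It estimates predictor coefficients for one frame with the autocorrelation method and the Levinson-Durbin recursion, then resynthesizes sound by driving an all-pole filter with an impulse train whose spacing is the pitch period. All function names, and the idea of a single fixed frame, are illustrative assumptions.

```python
from __future__ import annotations


def autocorrelation(frame: list[float], max_lag: int) -> list[float]:
    """Autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r: list[float], order: int) -> tuple[list[float], float]:
    """Predictor coefficients a with x[n] ~= sum(a[k-1] * x[n-k], k=1..order).

    Returns the coefficients and the final prediction-error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused; a[j] is the lag-j coefficient
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for the order-i predictor.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k
    return a[1:], err


def pulse_train(num_samples: int, pitch_period: int) -> list[float]:
    """Voiced excitation: one unit impulse every pitch_period samples."""
    return [1.0 if n % pitch_period == 0 else 0.0 for n in range(num_samples)]


def lpc_synthesize(a: list[float], excitation: list[float]) -> list[float]:
    """All-pole synthesis filter: y[n] = e[n] + sum(a[k-1] * y[n-k])."""
    y: list[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for k, coeff in enumerate(a, start=1):
            if n - k >= 0:
                acc += coeff * y[n - k]
        y.append(acc)
    return y
```

Only a small set of coefficients per frame needs to be stored instead of every waveform sample, which is the general reason LPC synthesis is memory-efficient. Lengthening `pitch_period` spaces the impulses further apart and lowers the pitch of the voiced output; shortening it raises the pitch. A practical synthesizer would also need unvoiced excitation, gain control and frame-by-frame coefficient updates, all omitted here.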
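The weighted morphing data and coarticulation described above can be illustrated with a deliberately simplified model. In this Python sketch each viseme is a morph target (a set of face parameters), each viseme's influence rises and falls around its time in the utterance according to a negative-exponential "dominance" function, and the face at any moment is the neutral shape plus the dominance-weighted offsets of the targets. This is an assumption made for illustration only: Sensory describes its own morphing as non-linear and does not publish its coarticulation model, whereas this blend is linear, and the function names, the two-parameter face and the constants `theta` and `c` are hypothetical.

```python
from __future__ import annotations

import math


def dominance(t: float, center: float, alpha: float = 1.0,
              theta: float = 8.0, c: float = 1.0) -> float:
    """Influence of a viseme centred at `center` on the face at time t (s).

    Falls off as a negative exponential of the time distance, so a viseme
    also shapes the mouth shortly before and after its own sound.
    """
    return alpha * math.exp(-theta * abs(t - center) ** c)


def viseme_weights(t: float,
                   segments: list[tuple[str, float]]) -> dict[str, float]:
    """Normalized morph weights at time t for (viseme, center_time) pairs.

    Assumes `segments` is non-empty.
    """
    raw: dict[str, float] = {}
    for viseme, center in segments:
        raw[viseme] = raw.get(viseme, 0.0) + dominance(t, center)
    total = sum(raw.values())
    return {viseme: d / total for viseme, d in raw.items()}


def blend(neutral: list[float], targets: dict[str, list[float]],
          weights: dict[str, float]) -> list[float]:
    """Neutral shape plus the weighted offset of each viseme target."""
    out = list(neutral)
    for viseme, w in weights.items():
        for i, (base, target) in enumerate(zip(neutral, targets[viseme])):
            out[i] += w * (target - base)
    return out


# Hypothetical 2-parameter face: [jaw opening, lip rounding].
neutral = [0.0, 0.0]
targets = {"AA": [1.0, 0.0], "OO": [0.4, 1.0]}
# Two syllables 0.2 s apart, i.e. five syllables per second.
segments = [("AA", 0.0), ("OO", 0.2)]
```

At t = 0.1 s, halfway between the two syllable centres, both visemes receive equal weight and the mouth sits midway between the two targets. Because neighbouring dominance curves overlap, the lips start moving toward the next viseme before the current sound ends, which is the coarticulation effect described above.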
The Sensory technology comes with a selection of human and animal 3D models that include the mouth and facial targets required for animation (not every feature of a face or mouth needs to be animated, and thus modeled, to create realistic speech). As a result, users can quickly create realistic animated characters, along with background environments.
Price, Availability and System Requirements
Sensory's Fluent Animated Speech technology is available now. For networked applications, typical pricing is based on an Application Service Provider (ASP) model with an annual per-port fee. For embedded applications, pricing is under $2 per unit in volume. The technology currently runs under Windows 95/98/2000/ME on a minimum 266 MHz Pentium II processor with at least 64 MB of RAM.
About Sensory, Inc.
Founded in 1994, Sensory, Inc., is the leading provider of high-quality, low-cost speech recognition and speech synthesis technology.
Sensory's speech technology is embedded in consumer products such as personal electronics, Internet appliances, interactive toys, and high-end telephone and automotive applications. Sensory offers a complete line of integrated circuit (IC) and embedded software solutions, including the Interactive Speech(TM) line of low-cost ICs and the Fluent Speech(TM) large-vocabulary software engine. Sensory's customers include leading companies in the consumer electronics and embedded product markets, such as JVC, Hasbro, Mitsubishi, Mattel, Sega, Sharper Image, Fisher-Price, Sony, Tektronix, Toshiba, Uniden, VOS and Westclox. More information is available from Sensory's web site at www.sensoryinc.com.
Note to Editors: Interactive Speech, Fluent Speech and Fluent Animated Speech are trademarks of Sensory, Inc. All other trademarks are the property of their respective owners. Details about the Tucker-Maxon application are available at http://cslu.cse.ogi.edu/tm
Note: A photo is available at URL: http://www.businesswire.com/cgi-bin/photo.cgi?pw.030501/bb8