VoiceXML 1.0 Comes To Call

Will it be a wrong number? Will the line be busy?

Like videoconferencing, mainstream voice recognition and text-to-speech conversion have been "just around the corner" for the last ten years. Today, it appears that a viable standard for integrating voice with applications is finally on the horizon. And, to no one's great surprise, the Web is the impetus behind the innovation. In the spring, the World Wide Web Consortium announced its acceptance of VoiceXML 1.0 as a proposed standard. The W3C will very likely incorporate VoiceXML into its Speech Interface Framework; for copyright reasons, the standard will be known as the W3C Dialog markup language. A standard is expected by next summer. Already, a number of early adopters allow users to dial in to the 'Net and get the usual info slices (scores, quotes, weather) that have come to represent the first wave of any new data service on the Internet. But these sites, Tellme Networks, AudioPoint, and BeVocal among them, are a potpourri of differing commands, and their data is presented in formats unsuited to the telephones that access it. VoiceXML hopes to change all that through the creation of standard dialogs and data types.
You Make The Call

Retrieving information via a telephone is nothing new, of course: we're all familiar with the (complex and often frustrating) menu systems used by banks, airlines, and most corporate telephone systems. While some of these now use conversion technology for text-to-speech and vice versa, these services are closed and generally not dynamic: they use a set menu system and fixed (or predictable) data sets. On the voice Web, by contrast, information retrieval is an open-ended process: a vocal request is made, converted to text, and plugged into a search engine. The search results are then converted to voice and given to the user.

The problems with voice portals as they stand today, and the ones that VoiceXML hopes to solve, are sequencing and interactivity. As we all know, the beauty of the visual model of Web pages is that all the information on them is accessible at once, not sequentially. Data on a telephone, on the other hand, is available only through a sequential system of prompts. A good way to understand this issue is to compare your email screen with your voicemail system. With email, one glance at the screen gives you a tremendous amount of information at once: the number of messages, who they are from, what time they arrived, and even their subjects. Voicemail, conversely, is a sequential medium: you must listen to your messages one after the other to get any information at all. Linear technologies like voicemail are inherently slower because their data is highly structured and cannot easily be reordered. This rubric holds true for voice Web portals as well. When information is returned to the user, it is offered in list (sequential) form; hence, to hear a page of search results, the user must listen to the entire list before getting all of the data.

In addition, speech recognition technology has a historically poor performance record when used without first being trained on each user's speech patterns. While this doesn't present a problem for dictation software, which can be coached before use, it does cause difficulties for voice portals, which may not be able to distinguish the nuances of speech among thousands of callers.

There are additional problems related to the lack of a standardized markup language for voice applications. According to the VoiceXML Forum, an industry consortium, current content providers face significant hardware and integration costs in deploying a voice service. A provider needs telephony hardware and a proprietary application written to a proprietary API, and must integrate the new application with existing databases. With a voice markup language like VoiceXML, content providers will be able to develop voice applications using standard Web-development tools, publish applications on an existing Web server, and have a service provider handle VoiceXML interpretation. VoiceXML will also allow easier integration with back-end databases that can be shared with any HTML application, and application development and deployment can be separated.

"With VoiceXML, developers will be able to add voice access just like a browser 'skin'," says Jeff Kunins, manager of Tellme Studio, the company's development platform. Tellme's voice application testing site, at studio.tellme.com, allows any developer to create VoiceXML code and then dial in to the company's servers to test the application.

VoiceXML is based on standards work done by AT&T, IBM, Lucent, and Motorola in the mid-1990s. From these efforts came a number of voice markup languages.
But while these languages have been useful, they are usually targeted at specific application areas, in general the ones closest to each vendor's core competency: [...] and PhoneWeb), mobile productivity for Motorola (VoxML), and so on. VoiceXML attempts to unify these various voice efforts by integrating support for numerous communications mechanisms, including touch-tone input, automatic speech recognition, audio recording (voicemail), playback (WAV), speech synthesis, call transfer, and conferencing.

As with all standards efforts, some pre-existing proprietary languages will likely be lost in the shuffle to create a unified voice markup language. The W3C's Dave Raggett expects that VoiceXML will supplant earlier but related voice dialog markup languages. "The Speech Interface Framework [also] includes markup languages for speech synthesis, speech grammars and reusable dialog components," Raggett says.

Potentially, VoiceXML will give users the ability to interact with Web sites and information in ways similar to--though not exactly the same as--the methods available from PCs. For example, when presented with a long list of restaurants, a user might ask for only those within two miles and only those that serve alcohol. VoiceXML will allow developers to create standard search forms, post methods, and display objects that can be used for any number of different applications, from ordering groceries to requesting a stock quote.
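To make the restaurant example concrete, here is a hedged sketch of how such a search might look in VoiceXML 1.0, written as a small standard-library Python program that produces the documents a voice browser would fetch. It assumes the usual VoiceXML pattern: a `<form>` whose `<field>` elements collect the caller's answers, followed by a `<submit method="post" namelist="...">` that posts them back to an ordinary Web server, which replies with another VoiceXML page. The restaurant data, the URL, and the function names (`build_search_form`, `filter_restaurants`, `handle_post`) are hypothetical, and the assumption that the browser posts the yes/no answer as the string "true" or "false" should be checked against a particular platform.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

# Hypothetical sample data; a real service would query a back-end database.
RESTAURANTS = [
    {"name": "Luigi's", "miles": 1.2, "alcohol": True},
    {"name": "Green Leaf Cafe", "miles": 0.8, "alcohol": False},
    {"name": "Harbor Grill", "miles": 3.5, "alcohol": True},
    {"name": "Taqueria Sol", "miles": 1.9, "alcohol": True},
]


def build_search_form(action_url):
    """Return a VoiceXML 1.0 form that asks two questions, then POSTs the answers."""
    root = ET.Element("vxml", version="1.0")
    form = ET.SubElement(root, "form", id="search")

    # Built-in field types supply the grammars for the caller's answers.
    miles = ET.SubElement(form, "field", name="maxmiles", type="number")
    ET.SubElement(miles, "prompt").text = "Within how many miles?"
    alcohol = ET.SubElement(form, "field", name="alcohol", type="boolean")
    ET.SubElement(alcohol, "prompt").text = "Should it serve alcohol? Say yes or no."

    # Visited after both fields are filled: send them to the Web server.
    block = ET.SubElement(form, "block")
    ET.SubElement(block, "submit", next=action_url, method="post",
                  namelist="maxmiles alcohol")
    return ET.tostring(root, encoding="unicode")


def filter_restaurants(rows, max_miles, need_alcohol):
    """Keep rows within max_miles that also serve alcohol when that is required."""
    return [r for r in rows
            if r["miles"] <= max_miles and (r["alcohol"] or not need_alcohol)]


def handle_post(body):
    """Turn a URL-encoded form submission into a VoiceXML page of results.

    Assumes the boolean answer arrives as the string "true" or "false".
    """
    fields = parse_qs(body)
    max_miles = float(fields["maxmiles"][0])
    need_alcohol = fields["alcohol"][0] == "true"
    matches = filter_restaurants(RESTAURANTS, max_miles, need_alcohol)

    root = ET.Element("vxml", version="1.0")
    block = ET.SubElement(ET.SubElement(root, "form"), "block")
    prompt = ET.SubElement(block, "prompt")
    if matches:
        # Speech is sequential: the names are read out one after another.
        names = ", ".join(r["name"] for r in matches)
        prompt.text = f"Matching restaurants: {names}."
    else:
        prompt.text = "Sorry, no restaurants match."
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_search_form("http://example.com/restaurants"))
    print(handle_post("maxmiles=2&alcohol=true"))
```

With the sample data, `handle_post("maxmiles=2&alcohol=true")` returns a page whose prompt reads "Matching restaurants: Luigi's, Taqueria Sol." The results are still spoken as a single list, which is exactly the sequencing limitation described earlier; a production service would more likely split long result sets into pages the caller can step through.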
VoiceXML also allows developers to use multiple forms of user interaction, including voice commands and DTMF (touch-tone) menus. Figs. 1 and 2 illustrate two potential VoiceXML programming methods: DTMF menus and voice-response forms, respectively.

VoiceXML Application Development

Because VoiceXML is so new, few development packages exist. One, announced in June, is Nuance Communications' V-Builder 1.0, one of the earliest development tools for building VoiceXML-based speech recognition applications. Nuance also announced a Voice Site Staging Center that provides developers (within the Nuance Developer Network) with a free service to test applications running on a hosted voice browser. "Just as HTML fueled the growth of the World Wide Web, VoiceXML is the standard markup language that is enabling the rapid evolution of the voice Web," Steve Ehrlich, vice president of marketing at Nuance, said in a statement.

V-Builder enables developers to drag and drop VoiceXML objects and Nuance's own SpeechObjects into an application and customize them without writing code. A graphical grammar editor and prompt management system simplify defining both the valid input to a speech application and the prompts that are played to callers. The resulting voice sites can use Nuance's speech recognition and voice authentication software to enable telephone access to a broad range of applications. V-Builder runs on Windows NT and can be integrated or bundled with third-party Interactive Voice Response (IVR) tools and Web development tools.

Press Or Say "1"

New standards and development tools notwithstanding, one question that industry observers must ask, of course, is whether consumers are really screaming for IP-based voice services. The Web-over-phone technology model--much less the Web-over-phone business model--remains new and unproven. Are users clamoring for MovieFone-style access to Web services?
Indeed, the fastest-growing area of Web access, wireless, owes its growth in large part to PDA-based graphical user interfaces, not to mobile phone-based voice access. Can the pioneers in voice application development offer the average user content compelling enough to make this new Web access paradigm successful? More to the point, when everyone can browse the Web with a Palm or PocketPC, will they want or need to dial in using their phone? In fact, the companies themselves have yet to figure out the best way to connect handset users with online content. A few recent deals indicate that the new voice players are taking divergent roads as they try to determine which path will lead to acceptance for voice-to-Web access. Tellme Networks has gone the more traditional route, partnering with AT&T in a $60 million deal to help the phone giant integrate voice access with its other service offerings. Quack, another voice start-up, is taking a different route: it has partnered with Lycos to voice-enable that portal and offer Lycos users phone-based access to their personal Web page content. Still other companies are going the vertical route, seeking partnerships in the banking and financial industries. SpeechWorks, which went public in August, will be developing voice-access services for First Union's call centers, allowing callers to get account information and conduct transactions. In many ways, advances in speech-to-text and text-to-speech conversion and the wireless explosion have conspired to create a technology that exists because it's possible, not necessarily because it solves a problem, at least not today.

"For data entry, people are more comfortable with a GUI right now," concedes Peter Lefkin, executive director of the VoiceXML Forum. As voice access matures as a technology, this may change. "Voice technology needs to be smooth, quick, and efficient for it to be engaged," says Motorola's Carl Clouse, chairman of the VoiceXML Forum's marketing committee. Of course, voice-to-Web access may be one of those technologies that we didn't know we needed until we had it. Tellme's Jeff Kunins makes the point that there are one billion telephone users out there but only a few hundred million browser users; clearly there is a potential audience for the voice Web. But when voice application company execs rave about the endless possibilities that will result from having my messages read to me over the telephone, I generally just smile. After all, this technology has been around for some time. They call it voicemail.