IBM and speech technology: an interview with Bruce Morse.


To give readers a more comprehensive picture of where IBM stands today in the realm of speech technologies, Customer Interaction Solutions recently spoke with Bruce Morse, vice president of Contact Center Solutions for the IBM Software Group. Morse is responsible for establishing IBM as a major software provider for developing, deploying and managing contact center solutions. He has over 25 years of software and hardware experience in the information processing industry and has held executive positions in marketing, development, finance and business development. Prior to his current role, he was vice president, marketing, sales and business development for IBM's Pervasive Computing business. In that role, he built a number of strategic alliances that established industry software specifications and standards, and he significantly expanded IBM's software and services participation in the wireless/mobile and speech markets.

**********

CIS: Historically, from where did IBM's speech technologies grow in IBM's product family?

Morse: IBM's interest and investment in speech recognition began at IBM Research over 30 years ago. We anticipated that as the technology matured it would become the preferred method of accessing and interacting with information technology in a wide variety of scenarios. We're now at an inflection point in speech recognition where users find it to be a satisfactory and pleasant way to do personal and company business.

IBM was the first to use a purely statistical approach to voice technology while others attempted to teach a computer how to mimic human linguistics. The early 1990s featured IBM dictation software. A few years later, IBM's first speech recognition software family, VoiceType, was produced. IBM ViaVoice products were introduced in the late 1990s, and they continue to evolve today in offerings such as IBM Embedded ViaVoice, which speech-enables personal digital assistants (PDAs) and in-vehicle telematics.

IBM speech technologies are now an integral part of the WebSphere family of products. They leverage WebSphere process and application integration capabilities to model, simulate and optimize business processes, and to reliably and seamlessly exchange data between multiple applications.

As a technology company that has helped millions of customers make smart IT investments, IBM is uniquely positioned to help companies extend access to those systems to their employees, customers and business partners. Just as the personal computer and Web browser have opened up application access to millions of users, speech technology extends access to the two billion telephones in the world today, as well as to all kinds of mobile devices.

As the most natural way to interact, speech is at the beginning of a tidal wave in contact centers, devices and automobiles. Speech allows people to interact easily and cost-effectively; it improves customer service and lowers cost. The return on investment (ROI) for speech-enabled applications can be dramatic.

CIS: Why do you believe that speech is best delivered in an on-demand model?

Morse: In today's business environment, companies have to be flexible, responsive and able to take advantage of opportunities instantly. That is the essence of the on-demand model. As a primary interface to a company's customers, speech-enabled applications are at the forefront of the on-demand model. Contact centers worldwide are increasingly looking at integrating all methods of customer interaction, including Web and telephone, to ensure a consistent customer experience, reduce cost and drive revenue growth through cross-selling and upselling. Speech-enabled contact centers ensure that up-to-the-minute customer information is available and leveraged across multiple communications channels. For example, a retail bank may want to know when a customer calls requesting forms to apply for a home equity loan so [the bank] can immediately route [the customer] to a live agent, bypassing the speech application entirely in order to close the business quickly. When interest rates change, the bank may want to change its Web and speech-enabled applications immediately to cross-sell certain offerings over others. IBM provides highly flexible and customizable speech solutions built on the highly acclaimed WebSphere Application Server platform.

CIS: Why do you believe a company like IBM is better suited to offer speech than its many niche competitors?

Morse: Speech has evolved into a mature enabling technology that reaches far beyond turning spoken words into text. Speech extends access and interaction to an enterprise's data and business processes, improving customer service while reducing the total cost of completing a transaction. Integrating speech access to business processes in a cost-effective, flexible and secure way requires a deep understanding of the enterprise's IT infrastructure and business processes. IBM's position as the leading middleware provider and our expertise in business process transformation uniquely position us to help our clients leverage speech to improve customer service, reduce cost and drive incremental revenue.

IBM is recognized around the world as one of the pioneers in speech research and development, with deep expertise to analyze, design and deploy speech-enabled applications. IBM's research organization has over 30 years of experience in speech. It is highly skilled in voice user interface design, persona development and grammar, has more than 250 speech patents and over 100 researchers worldwide in speech labs, including China, Haifa, Tokyo, India and Almaden, working in more than 15 languages. Our work ranges from contact centers to mobile devices to automobiles. IBM is a leader in driving and incorporating speech standards such as VoiceXML, MRCP and W3C. We work with companies of all sizes. IBM was the first to deploy natural language understanding in an automated contact center. For two consecutive years, JD Power and Associates surveys rating customer satisfaction with in-car navigation systems found the top cars were from Honda and Acura, which use IBM's Embedded ViaVoice speech recognition technology. Our contact center customers have found our speech solutions improve call retention rates by six to 10 percent, cutting call times by 10 percent and decreasing costs by up to 90 percent compared to assisted services.

IBM is also helping the large community of developers, ISVs and customers deploy and manage speech enablement. We have made significant contributions to the speech industry, through open standards work on VoiceXML, CCXML and MRCP, as well as to the Eclipse Foundation, including our recently announced contributions of VoiceXML and CCXML editors. In addition, we recently announced our contribution to the Apache Foundation of the Reusable Dialog Components (RDC) Framework. A barrier to the adoption of speech capabilities has been the skills required for high-quality voice user interfaces. By moving the requirement into the building of RDCs that can be joined by application developers, IBM enables experienced application developers to concentrate on what they know best, while skilled voice user interface designers do their work up front, in the RDC.

IBM regularly participates in performance improvements and transformation efforts for the world's leading organizations through our management consulting group, IBM Business Consulting Services. Our ongoing involvement with all of the major industries gives us a deep understanding of industry business models. Our teams ensure that our solutions are relevant, practical and well thought out.

CIS: On what applications for speech is IBM focusing?

Morse: IBM is focused on developing and offering first-class speech capabilities and tools, while our business partners and customers provide targeted speech-enabled applications. We are focused on three primary areas:

* Contact center functionality, such as call routing and natural language understanding.

* Multimodal interaction, or the ability to combine multiple input/output methods in the same interaction or session. IBM's WebSphere software integrates different modes of data entry--speech, keyboard strokes, visual and handwriting-recognition technology. For example, one of our customers developed speech, keyboard and handwriting-enabled input and output applications on handhelds used by doctors in the pediatric intensive care unit of Miami Children's Hospital. Healthcare providers can give spoken commands to access and input patient information and can enter repetitive data using multiple modes of interaction.

* Embedded speech in telematics (e.g., vehicles), devices (e.g., cell phones, PDAs, etc.) and other consumer appliances (e.g., set top boxes, DVD players).

For example, IBM Embedded ViaVoice technology in OnStar provides, on some models, the basis for a hands-free, in-vehicle, safety, security and communication service, putting the company at the forefront of the automotive telematics industry.

CIS: Speech has historically been considered a "high-maintenance" technology. How is IBM carrying out its promise to lower development time and complexity?

Morse: There are two million to three million J2EE developers in the marketplace, and our tooling and open source strategy has been to enable this highly skilled group to expand its reach into speech enablement. By creating plug-ins to the Eclipse framework, we help developers leverage their existing skills in Web development to extend to speech. We are contributing to the speech industry's efforts in order to shorten development time and decrease complexity through our commitment to open standards such as VoiceXML, CCXML, MRCP, xHTML and X+V. In addition, we have donated approximately 20 VoiceXML Reusable Dialog Components (RDCs) to the open source community through IBM's Alphaworks.

As more and more of the speech ecosystem adopts and writes to the RDC framework, the time and the skills needed to deploy will come down considerably. By moving the voice user interface (VUI) skill from the application layer to the RDC layer, we leverage the skills up front that are most in demand, which allows the J2EE developer to take advantage of the best practices already deployed internally in the RDC. IBM donated the framework and example tags to the Apache Software Foundation last fall, and we made them available to interested members of the community through the Apache Taglibs sandbox project. The financial value of this contribution was approximately $10 million.

CIS: Many companies still don't understand why they need speech, or if they do, they don't understand what's involved in implementing it. How is IBM helping customers to understand the benefits?

Morse: We have worked with a variety of clients to successfully implement speech solutions. The best way to communicate the benefits of these solutions, and what's involved in implementing them, is to use case studies and to describe the dramatic return on investment that many companies achieve once the solutions are deployed. We share these stories on our Web site, in our press releases and in our advertising. We publish technical papers that describe the implementation effort. Most important, our worldwide sales, services and consulting teams show customers the benefits of speech in hundreds of one-on-one customer engagements every year, as well as at many industry trade shows and events.

CIS: What level of knowledge must a user possess in order to administer and make changes to call flows?

Morse: First, the adoption of the VoiceXML standard has changed the way we administer contact center applications. We have moved the business logic away from the proprietary interactive voice response (IVR) scripting language to the Web application server. This has been a game-changing event, as the administration and development of speech-enabled applications moves to the millions of J2EE developers, thereby opening up the ability to manage call flows to a much larger community of developers.

Second, IBM has Eclipse-based plug-ins, such as the Call Flow Builder. It allows for graphical drag-and-drop modifications to call flows, making call flow maintenance an intuitive administrative step that does not require the knowledge of a proprietary scripting language.

The implementation of the call flows is also greatly simplified with the advent of VoiceXML. Preexisting scripts used for a particular task can be reused by the speech application, so there is no need to redevelop scripts for existing tasks.

CIS: What's the average implementation time, using a midsized company as an example?

Morse: The length of a speech implementation project is dependent on many factors. It should be broken up into several distinct phases, which include: business and application objectives; usability and human factors; business process integration; call flow design; development; testing; and deployment and post-deployment tuning. The final area that can impact the schedule is the level of training the customer has (which is why most of our initial deployments are done in conjunction with a very skilled systems integrator). Assuming all of these phases are included, implementation of a simple speech-enabled application can range from one to six months. A project of medium complexity can take three to nine months, and a complex application takes six months to one year.

Using standards-based programming techniques such as VoiceXML, the development, testing and deployment elements can be done more efficiently by reusing applications and application components that the enterprise has already developed and deployed, thus reducing implementation time and ensuring a rapid return on investment.

CIS: Is speech technology feasible for smaller companies?

Morse: Speech is a technology that can offset contact center costs, which makes it a very good source of bottom-line return for small companies. Implemented correctly, it can also improve customer satisfaction and generate revenue through upselling and cross-selling. It allows a small company to establish a unique persona and to gain differentiation in the marketplace.

There are many IBM business partners that offer tailored speech-enabled application solutions to small to medium-sized businesses. Although some small companies have the in-house expertise to deploy speech in their own environment, others may find it more cost-effective to outsource the speech elements, and a number of solutions are now available for them to do so.

If you are interested in purchasing reprints of this article (in either print or HTML format), please visit Reprint Management Services online at www.reprintbuyer.com or contact a representative via e-mail at reprints@tmcnet.com or by phone at 800-290-5460.

For information and subscriptions, visit www.TMCnet.com or call 203-852-6800.
COPYRIGHT 2005 Technology Marketing Corporation
No portion of this article can be reproduced without the express written permission from the copyright holder.
Copyright 2005, Gale Group. All rights reserved. Gale Group is a Thomson Corporation Company.

Article Details
Title Annotation:SPEECH-WORLD[TM]
Publication:Customer Interaction Solutions
Article Type:Interview
Geographic Code:1USA
Date:Apr 1, 2005
Words:2242