Press Release - Speech understanding technology will drive technology industry growth
The non-profit Applied Voice Input Output Society (AVIOS) has promoted the commercialization of speech and natural language technology for more than three decades. The organization claims the technology has broken through the barrier of broad utility and will lead to a chain reaction of new applications, easing the use of ever-more-complex digital technology.
"Early applications were primitive," admits K.W. (Bill) Scholz, AVIOS president, "but pioneers laid the groundwork for today's technology. We've finally reached the point where the promise of the technology has been met. Computer speech understanding may never reach the full capabilities of humans, but it has clearly passed the level of high utility."
William Meisel, Executive Director of AVIOS, notes that the maturing of speech technology will trigger a "chain reaction" of applications that include a human-computer connection wherever we go. "In my book, The Software Society, I argue that the desire to have the power of computers always available to us as a constant companion that knows us well will change what it means to be human. Technology has always had a fundamental impact on the way we live, and being able to deal with technology using human language is a major development that will propagate throughout most products and services that deal directly with people."
Smartphones are driving the adoption of speech recognition and "natural language understanding," where personal assistants such as Apple's Siri, Google Now, and Samsung S-Voice try to do everything for us—all we have to do is ask. "Intelligent assistants will transform our interactions with technology by acting as go-betweens between us and the applications and devices in our environments," notes Deborah Dahl, Principal, Conversational Technologies and AVIOS Board member. "The uniform speech and natural language-based user interface provided by personal assistants means that we won't have to learn how to interact with every software application and consumer device individually. The intelligent assistant will interpret our naturally spoken requests and take care of the details."
Another important driver of the adoption of voice technology is in automobiles, with the need for safe hands- and eyes-free control of increasingly complex infotainment systems. "The solution to reducing driver distraction likely resides in combining a variety of driver interfaces to fit specific tasks," noted Thomas Schalk, AVIOS Board member and Vice-President of Voice Technology at Agero. "The most critical role that speech plays is text entry while driving. But holistically, complex infotainment systems require simple user interfaces that are multimodal. Based on several recent studies, we're not there yet."
Nava Shaked, Principal of Brit Business Technologies and AVIOS Board member, also noted that voice interaction doesn't stand alone in a user interface: "The interesting development with spoken language understanding technologies is that they are now being integrated with other multimodal technologies to create a holistic user experience where it is possible to maximize speech for the tasks it best fits. The rules of the game have changed, and the fusion of interface technologies led by voice and gesture interaction is now a must for mobile applications looking to create a natural and intuitive experience."
Is the technology ready for prime time? Matt Yuschik, Mobile Services Architect at CitiCorp R&D and AVIOS Board member, urges analysts to look at the accuracy of speech recognition in context: "Speech recognition critics view accuracy without comparison to the alternatives. Let's not forget that humans make errors, too. Dialing a phone number (on a keypad) has error rates up to 10%. Thank the mobile phone for the visual display and the back/erase button! For dictation, the human transcription error rate is about 2%. And the error rate for the mini-QWERTY keypad on smartphones is between 5% and 6%, even if you are looking at the buttons -- otherwise it jumps to 18-22%. We humans tend to forgive errors we seem to cause. Speech technology and its subsequent errors are transparently detected and corrected by syntax, semantics, and higher-level context-sensitive rules of Natural Language Processing. Multimodal interactions are making transaction throughput rates even faster and more successful. And the good news is that performance is continuing to improve!"
James Larson, Vice President, Larson Technical Services, and an AVIOS Board member, noted that traditional telephone Interactive Voice Response and voice search systems need not have the full functionality of intelligent agents to be effective. "Intelligent agents must apply knowledge about the real world, knowledge about the user, knowledge about the current context, knowledge about the meaning and structure of natural language, and be able to determine when they cannot respond to a request correctly. It will take much insight and user testing to build useful intelligent agents; they can't be built overnight."
Bruce Pollock, AVIOS Board member and Vice-President of West Interactive, noted the impact on customer service operations. "Speech recognition, particularly natural language speech, is helping more companies every day to improve their customer experience and lower their costs. A well-designed speech system helps to improve self-service resolution, and also helps to ensure that if a caller needs to get to an agent, they are transferred quickly and easily to the most suitable agent to get help. When deployed as part of an integrated, multi-channel customer communication strategy (along with the web, SMS, agent, etc.), speech can be an incredibly powerful tool."
TV is another area where the shift toward "watch what you want, when you want" viewing will drive the need for flexible voice requests, with apps on smartphones perhaps acting as the remote control. Meisel noted that all the recent announcements for Smart TVs include voice search to find and launch TV content.
Roberto Pieraccini, CEO of the International Computer Science Institute at Berkeley and AVIOS Board member, summarized, "The progress in computer speech understanding we have made during the past years is tremendous. After decades of activity in the field, I can see now how intelligent assistants and other applications based on voice recognition technology will become pervasive in the way we interact with machines. Science and technology need to address other robustness issues towards the achievement of truly human-like capabilities, but I am now confident we will see that happen in our lifetimes." Pieraccini authored The Voice in the Machine: Building computers that understand speech.
Sara Basson, AVIOS Board Member and Program Director, IBM Research, noted an additional dimension of personal assistant technology: "Users will expect Personal Assistants to be smart and intuitive, asking minimal clarification questions and quickly providing the service requested. Systems like IBM's Watson are being designed to provide immediate answers to specific queries, rather than burdening users to select from a lengthy set of possible answers. A speech recognition front end is the natural and logical interface for these smart systems -- with intelligent dialogue as the next frontier."
As technology grows more complex, with more devices and more features constantly appearing, using human language to deal with the complexity is a necessity, the AVIOS Board agreed. The breakthrough in speech and natural language technology can keep consumer-facing technology from hitting a wall where customers resist "digital overload."
AVIOS will continue to help both the public and developers understand how to best use speech technology. For example, the organization is holding the Mobile Voice Conference (www.mobilevoiceconference.com) March 3-5 in San Francisco. The conference examines the practical, business, and technical implications of the paradigm shift driven by the growth in numbers and sophistication of mobile phones and other mobile devices, with growing features made usable by voice interaction. A particular focus of the conference will be how individual companies and developers can create specialized personal assistants, including the business case for doing so.