Will it blend? A hands-on tutorial on the AT&T speech mashup
for mobile services
Presenter: Giuseppe (Pino) Di Fabbrizio, AT&T Labs
9 AM – 12 (with a break), Monday, January 24, 2011
Speech is a natural and efficient way to interact with mobile phones since it can overcome the input and output limitations of those devices. Moreover, speech is a direct, intuitive interface that requires no learning and is safer for multitasking users. However, mobile devices lack the computational capabilities to perform speech processing tasks required for speech interfaces, including automatic speech recognition and text-to-speech, especially when large vocabularies or high-quality synthesis is involved. One established solution is to move speech processing resources into the network by centralizing the heavy computation load in server farms, but it is unclear whether this can scale to accommodate the large deployments required by mobile traffic.
To address these real-world challenges, AT&T introduced speech mashups, an approach that leverages web services and cloud computing to blend (mash up) web content with a speech interface. This union of speech technologies, web mashups, and cloud computing has now taken hold and is changing the ecosystem of the speech technology industry, enabling innovators to create new and exciting voice-enabled services cheaply and at scale. AT&T is championing this approach by building an application-centric network through the AT&T Ecosystem Developer Program, which offers access to emerging technologies and technical support to create and test mobile applications.
The tutorial illustrates the main speech mashup concepts, a suite of developer tools, and the basic components needed to create and test voice-enabled mobile applications. It provides an overview of speech recognition and speech synthesis principles and introduces some elements of multimodal interaction. It also describes how to combine speech interfaces with web services and how to create and maintain speech recognizer grammars or stochastic language models. A hands-on session demonstrates real programming examples that capture and render speech on web browsers and on the most popular mobile operating systems, including iOS, Android, and BlackBerry OS. The tutorial concludes by illustrating a complete end-to-end voice search mobile application on the iPhone. The examples are based on AT&T's publicly available speech mashup portal and allow speech practitioners to experiment with a variety of existing grammars, code examples, and industrial-strength speech processing technology.
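To give a flavor of the grammar-authoring topic mentioned above, the sketch below builds a small W3C SRGS (Speech Recognition Grammar Specification) XML grammar programmatically. SRGS is the standard XML grammar format used by many network speech recognizers; the rule name (`city`) and the city list here are illustrative placeholders, not taken from the tutorial or the portal's actual API.

```python
# Minimal sketch: constructing an SRGS XML grammar with the Python
# standard library. The "city" rule and the vocabulary are hypothetical.
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"

def build_city_grammar(cities):
    """Return an SRGS XML string with one public root rule listing `cities`."""
    ET.register_namespace("", SRGS_NS)  # serialize with a default namespace
    grammar = ET.Element(
        f"{{{SRGS_NS}}}grammar",
        {"version": "1.0", "xml:lang": "en-US", "root": "city"},
    )
    rule = ET.SubElement(
        grammar, f"{{{SRGS_NS}}}rule", {"id": "city", "scope": "public"}
    )
    one_of = ET.SubElement(rule, f"{{{SRGS_NS}}}one-of")
    for city in cities:
        item = ET.SubElement(one_of, f"{{{SRGS_NS}}}item")
        item.text = city  # each <item> is one recognizable phrase
    return ET.tostring(grammar, encoding="unicode")

print(build_city_grammar(["New York", "Florham Park", "San Francisco"]))
```

A generated grammar like this could be uploaded to a network recognizer and later extended simply by regenerating it from an updated list, which is the maintenance pattern the tutorial's grammar discussion concerns.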
About the presenter
Giuseppe Di Fabbrizio is a Lead Member of Research Staff in the Network and Services Research Laboratory at AT&T Labs - Research in Florham Park, NJ. His research spans multimodal and spoken dialog systems, conversational agents, natural language generation, and multimodal and speech system architectures, platforms, and services, and he has published numerous conference and journal papers on these subjects. He was instrumental in the development and deployment of the AT&T VoiceTone® Dialog Automation product for AT&T business enterprise customers, and he received the 2008 AT&T Science and Technology Medal Award for outstanding technical innovation and leadership in the advancement of spoken language technologies, architectures, and services. Di Fabbrizio is a senior member of the Institute of Electrical and Electronics Engineers (IEEE) and an elected member (2009-2011) of the IEEE Signal Processing Society's Speech and Language Processing Technical Committee (SLTC) in the area of dialog systems. He serves as editor of the SLTC's quarterly newsletter and contributes as a program committee member and technical reviewer for numerous international conferences, journals, and workshops. Prior to joining AT&T, he worked as a Senior Researcher at Telecom Italia Lab (formerly CSELT, now mostly Loquendo).