Now You're Talking
Voice-recognition technology is no longer stuttering - and that means huge opportunities for established players and newcomers alike.
(Business 2.0 Magazine) -- As man-vs.-machine classics go, it had the crucial elements: The brash young champion. The new-and-improved computing powerhouse. That the champ was 17-year-old Ben Cook, anointed by the Guinness Book of World Records as the world's fastest text messager, and the machine was not a supercomputer but a cell phone, didn't detract from the drama - at least not to the crowd gathered at an Orlando voice-recognition software conference last fall.
Which would be faster at converting an elaborate sentence into text: Cook's flying thumbs or the elegant algorithms of new speech software from Nuance Communications? The harrowing test phrase - "The razor-toothed piranhas of the genera Serrasalmus and Pygocentrus are the most ferocious freshwater fish in the world. In reality they seldom attack a human" - flashed on a screen. Cook thumbed furiously. A Nuance staffer calmly dictated the phrase into a cell phone. It was a blowout: Nuance's software converted the phrase flawlessly in 16 seconds. Cook trudged home in 48 and was left mumbling in a dazed tone, "I don't know how you do that."
They did it with Nuance's recently launched Mobile Dictation software, which will be available through carriers as early as the first half of this year. There's also a broader explanation: Voice recognition, long ridiculed as one of those perpetually just-around-the-corner technologies like the personal jet pack or the Dick Tracy wristwatch, has finally arrived.
Advances in processing power, new software algorithms, and even better microphones have enabled established players like Nuance and a raft of startups to design systems that work - often at near 100 percent accuracy rates. And they're creating explosive potential for growth in markets for everything from handheld dictation devices to mobile phones to auto parts to battlefield translators.
The overall market for voice-recognition technology topped $1 billion for the first time in 2006, a 100 percent increase in just two years. Within that broad market, there are numerous subsectors that are likewise surging: The market for server-based voice-recognition technology to power call centers and the like reached nearly $600 million in 2006 and is expected to double by 2009, according to Opus Research.
The market for speech technology embedded in devices such as phones and auto dashboards - worth about $125 million in 2006, according to research firm Datamonitor - is expected to quadruple to $500 million by 2010, powered by the rapid spread of voice-command features on phones and cars with increasing levels of "talking electronics," from music players to navigational systems. Ultimately, some experts say, voice-recognition systems are likely to be built into almost every gadget, appliance and machine that people use.
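To put those forecasts on a common footing, the figures above can be turned into implied compound annual growth rates. The following is a rough sketch, not an analyst calculation: the function name is illustrative, the forecast windows are read off the years quoted above, and it assumes smooth year-over-year compounding, which neither Opus Research nor Datamonitor is quoted as claiming.

```python
def implied_annual_growth(start, end, years):
    """Compound annual growth rate implied by moving from start to end over `years` years."""
    return (end / start) ** (1 / years) - 1

# Overall market: doubled in two years (a 100 percent increase), per the article.
overall = implied_annual_growth(1.0, 2.0, 2)

# Server-based market: expected to double between 2006 and 2009 (three years).
server = implied_annual_growth(1.0, 2.0, 3)

# Embedded market: about $125 million in 2006 to $500 million by 2010 (four years).
embedded = implied_annual_growth(125, 500, 4)

for name, rate in [("overall", overall), ("server", server), ("embedded", embedded)]:
    print(f"{name}: about {rate:.0%} per year")
```

Under those assumptions, the overall and embedded figures both work out to roughly 41 percent a year, and the server-based forecast to roughly 26 percent.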
The surge in demand is already triggering investment from established voice players and newcomers alike. In 2006, Nuance bought Dictaphone to enhance its presence in the health-care industry, even as Nuance's sales grew 20 percent to more than $300 million.
Microsoft's new Vista operating system comes with voice technology that, after suffering embarrassing glitches, is now winning kudos from reviewers. Google has said that it's studying technology to enable search-by-voice. Venture capitalists, meanwhile, are lining up to fund entrepreneurs with voice-recognition ideas all over Silicon Valley and beyond. "Speech technology," says Datamonitor analyst Daniel Hong, "is finally transitioning from a cool technology to a business solution."
The next generation
Voice-recognition technology dates to 1952, when Bell Labs researchers cobbled together a primitive system that could recognize numbers spoken over a telephone. Progress since has been halting, but with the advent of far more powerful computing components and years of plain old trial and error, systems today have finally reached the point where they can cope with innumerable accents, dialects and quirks of speech.
VoiceBox Technologies, a startup in Bellevue, Wash., in 2004 unveiled a prototype whose components had to be carried in a steamer trunk. Today roughly the same system fits on a device the size of a credit card and could be the brains of Toyota's voice-command dashboard systems (see correction below).
VoiceBox systems are now so sophisticated that they can analyze context to, say, figure out if the command "traffic" refers to road congestion, tunes from Steve Winwood's old band or a dope-smuggling film starring Michael Douglas.
Today's systems also have powerful capacity to essentially teach themselves. Tellme Networks, a startup in Mountain View, Calif., makes voice-recognition software used for corporate call centers and telecoms' 411 information systems. Tellme's platform captures some 10 billion utterances annually and constantly analyzes them, improving the system's precision literally every day. "Voice recognition is all about pattern recognition," says Tellme executive Jeff Kunins. "The more data you have, the better the recognition gets."
And the more valuable voice recognition becomes as a customer tool. Call centers and customer service departments are notorious for the infuriating "Press or say 1" purgatories that older speech-recognition technologies created, but customer outrage isn't the only penalty: The average call-center call costs $5 if handled by an employee but 50 cents with a self-service, speech-enabled system, according to Datamonitor.
Online brokerage E-Trade Financial uses Tellme to field about 50,000 calls a day; half never go to an E-Trade employee. The company says Tellme's system is saving it at least $30 million annually.
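As a back-of-the-envelope check, the E-Trade call volume can be combined with Datamonitor's average per-call costs. This is only a sketch: applying industry-average costs to E-Trade is an assumption, as is the number of days per year the lines are busy, and the variable and function names are illustrative.

```python
COST_EMPLOYEE_CALL = 5.00      # average cost of an employee-handled call (Datamonitor)
COST_SELF_SERVICE_CALL = 0.50  # average cost of a speech-enabled self-service call
CALLS_PER_DAY = 50_000         # approximate E-Trade volume handled through Tellme
SELF_SERVICE_SHARE = 0.5       # half of the calls never reach an employee

def annual_savings(days_per_year):
    """Estimated yearly savings; days_per_year is an assumption, not a reported figure."""
    automated_calls = CALLS_PER_DAY * SELF_SERVICE_SHARE
    saving_per_call = COST_EMPLOYEE_CALL - COST_SELF_SERVICE_CALL
    return automated_calls * saving_per_call * days_per_year

print(f"250 business days: ${annual_savings(250):,.0f}")
print(f"365 calendar days: ${annual_savings(365):,.0f}")
```

Under these assumptions the estimate comes to about $28 million on a 250-day year and about $41 million on a 365-day year, the same order of magnitude as the savings E-Trade reports.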
Startup TuVox is also racking up customers in the call-center and corporate markets. Its VP for marketing, Azita Martin, has her team dial a call center and record the typically torturous, multistep efforts to, say, reach the billing department. Then they create an audio file that reveals what the interaction could sound like if Martin's target used TuVox's software for routing calls with advanced voice-recognition technology. She e-mails the two interchanges to the CEO of the company using the call center. The contrast has helped Martin sign up numerous clients during the past few months - one reason TuVox's annual revenue is growing at double-digit rates and its customer base has quadrupled in 12 months. Telecom New Zealand, one of its new customers, reports a tripling of call-center customer satisfaction since it installed a TuVox system.
While call centers and autos are expected to continue to be growth markets for voice recognition, the real bonanza will likely come in improved systems for all manner of mobile devices. Start with cell phones: Telecom companies think consumers will pay for a host of additional services such as dictating e-mail or searching for a restaurant if there's an easy-to-use voice interface on mobile phones. Indeed, Opus Research says telecoms expect to earn an additional $5 to $15 per month from every customer who opts for a voice-enabled phone.
Numerous startups are scrambling to provide that technology, including Promptu. Founded in 2000 by speech-technology veterans, the Menlo Park, Calif., startup has developed a package of voice-recognition features that will be offered through several carriers later this year. "The telecoms are calling us now," says Brady Bruce, a Promptu senior vice president. "I love that."
Other startups are developing voice features for everything from MP3 players to handheld GPS devices to laptops. Pluggd, founded last February by former Microsoft and Amazon engineer Alex Castro, has created a search engine that combines speech recognition with semantic analysis to, for instance, find the exact spot in a cooking podcast where soufflé techniques are discussed.
Vocera Communications, whose founders grew up on Star Trek reruns and named the conference rooms at their Silicon Valley headquarters after Capt. Kirk and other characters, got some attention two years ago when it unveiled a communicator badge inspired by the show that combines voice-recognition and wireless technologies. The device produced some snickers at the time but has found a growing following; among its customers are medical workers, who use it to search through a hospital directory by voice and find the right person to help with a patient problem or look up medical records. Vocera expects to turn profitable early next year.
VoxTec International's Phraselator, a handheld gadget about the size of a checkbook, listens to requests for a phrase and then spits out a translation in any of 41 specified languages; it's currently being used by U.S. troops in Iraq and Afghanistan to provide on-the-fly translations in Arabic, Pashto and other local tongues. The Annapolis, Md., company, whose technology was originally developed for the Department of Defense back in 1997, won't disclose specific figures but says sales are way up.
Many experts expect voice technology to become almost ubiquitous someday, as speech recognition supplants typing, tapping, texting and touching as the primary interface with our machines. Rob Chambers, head of Microsoft's voice-recognition efforts, even foresees a day when the technology becomes powerful enough to correct mistakes in word choice or grammar - a kind of spell check for voice.
That may be decades away, but the technological improvements are nonetheless coming fast and furious, as was driven home in Orlando last fall.
The Nuance software that dusted the champion texter is roughly 25 percent more accurate than the company's best versions from a year ago, and Nuance researchers say next-generation products due to hit the market in just one year could produce 20 percent fewer errors than today's best systems. "Ben Cook is pretty incredible in how quickly he can type on his phone," says Nuance VP for worldwide marketing Peter Mahoney. "But this technology is just going to continue to get better and better."
Jeanette Borzo is a writer in San Francisco.
Correction: An earlier version of this story incorrectly stated that the VoiceBox technology will be the brains in Toyota's new in-car navigation systems.