Lars Sjoberg talks speech recognition ahead of next week's Monetising Mobile conference.
Every time ME publishes a story about in-car apps, we are guaranteed lots of views and numerous re-tweets. This tells us two things: you're interested in voice recognition; you have nice cars.
That will please Nuance no end, given that the mobile and the automobile are crucial to its mission to make speech as natural an interface as touch and type.
This is no easy mission. Voice recognition is famously difficult to get right. Think about how hard it would be for you (if you are a native English speaker) to understand Sanskrit. And even if you learned the vocabulary, how tough would it be to glean the full meaning?
Well, that's what Nuance is trying to do with its Dragon Dictation and Search products. But for a machine, without your innate understanding of the concept of language.
And yet the rewards are so tantalisingly vast. Speech recognition can speed up mobile communications, open up the channel to the illiterate, put mobile into new environments, introduce new entertainment possibilities and more.
After many false dawns for the tech, you get the sense that technical breakthroughs are coming through now. With Google and Apple gently introducing voice functions into their UIs, users are experimenting at last.
Needless to say, apps have helped. That's why Nuance introduced its Nuance Mobile Developer Program earlier this year in response to the growing number of developers interested in adding a voice interface to their products.
Next week, ME hosts the Monetising Mobile conference on next-gen mobile discovery, and Nuance is an official partner.
We talked to Lars Sjoberg, head of mobility and enterprise marketing for Northern EMEA at Nuance, ahead of the event...
What's the idea behind the Nuance Mobile Developer Program and how is it progressing?
We're trying to make it easy for developers to add a speech interface. It's just a few lines of code, so it can be added in a few hours via a self-service web site. We've had a very good response, with 2,500 enrolled so far. New apps are being launched all the time.
Examples?
It's what you'd expect – translation, dictionaries and so on. A few examples are Siri, Price Check by Amazon, Ask for iPhone, Merriam-Webster, Dictionary.com, SpeechTrans, Yellow Pages and AirYell from Avantar, iTranslate from Sonico, Taskmind from Catalyst, CapnTrans from Keystone Technology, and many others. Salesforce has just launched an iPhone and iPad app, Dragon for Salesforce, that lets users speak updates and send them directly to their call log. And this week, Waze released a version of its SatNav apps for iPhone and Android powered by Nuance’s text-to-speech to provide turn-by-turn directions and traffic alerts.
How important is mobile to Nuance?
Hugely important. We divide our operation into four business units: healthcare, enterprise, automotive and mobile. Healthcare is the biggest, but mobile is the fastest growing. In a sense, though, mobile embraces all of them because it's reaching into everything we do. Dragon Dictation is number one or two on the iTunes store in every country launched, and has been downloaded by more than 13 million users worldwide.
What are the main challenges for the technology?
For all the improvements, it's still very difficult to achieve dependable voice recognition in some environments. The technology needs one person at a time to speak, and to adapt to the individual speaker. It doesn't have the ability of the human mind to filter information and understand context. Clearly, a journalist, for example, would love to have a speech recognition programme that could turn conversations into perfectly transcribed text. But we're not there yet. Obviously, there are situations in which the tech works very, very well – where you can make simple commands clearly and in a quiet environment. And that's what we're focused on.
What's the availability of Dragon Dictation now?
We just added Russian, which is the 18th language it's available in. There'll be 25 by the end of the year.
It's also a component in FlexT9 for Android, which lets you speak, trace, write or tap to interface with your phone. This is available as an app download, but we also work with OEMs to embed some or all of it into devices. That's our real focus because the tech works best when it's integrated closely with the phone functions like, for example, speaking a reply to a written text.
How much of Dragon's computing power is in the client, and how much in the cloud?
Dragon Dictation is a hosted platform on the cloud, but with 'command and control' functions handled locally on the client. Obviously, you need a lot of data to run the more advanced features, but there is the obvious disadvantage of needing network access. Longer term, we'll work to migrate to the client.
What are your revenue channels and how do your partners make money?
We work on various models with OEMs, such as licensing and fees per a number of installs. For developers that incorporate voice recognition, the most popular model is to charge per use. So the iTranslate app from Sonico, for example, lets users speak words into their iPhone, which are then translated to another language. Users would be charged for a certain number of translations.
Which new areas can be developed by voice recognition?
We're looking forward to seeing what game developers can do with the technology, as there must be opportunities to enhance gameplay using vocal commands. The camera is another phone feature that could support some kind of voice-recognition. It's all down to developer creativity, and it's very exciting to see what they can come up with.
* To hear more about Nuance's voice recognition plans, come to the Monetising Mobile conference on mobile discovery on September 28th.