Tuesday, August 3, 2010

Voice recognition

Text that is as good as your word

By Paul Taylor
Published: July 29 2010 22:08 | Last updated: July 29 2010 22:08
Dragon NaturallySpeaking 11 makes speech easier to transcribe to a PC


Voice recognition software, which converts live speech or digital recordings to text, usually ranks with desktop videoconferencing and optical character recognition as technologies that have never lived up to expectations.

Early packages were too clumsy or inaccurate, or needed too much “training” (in which the user gets the program accustomed to a particular voice), to be practical outside specialist niches such as law firms and healthcare. They also required more powerful PCs and better headsets than many people owned. Those failings, followed by the inclusion of basic speech recognition technology in Microsoft’s Windows operating system, led to a shake-out in the consumer sector and the eventual emergence of Nuance’s Dragon NaturallySpeaking program as the market leader for Windows-based PCs (Nuance’s technology is also available for Mac users, under the MacSpeech Dictate package).


Dragon NaturallySpeaking 1.0, launched in April 1997, was the first package able to cope with natural, or continuous, speech rather than a staccato version in which each word must be enunciated separately.

I have been testing the latest version, Dragon NaturallySpeaking 11 Premium, which went on sale yesterday. The program comes in three main editions: Home, Premium and Professional, starting at $99 (£80 in the UK) for the Home edition.

Nuance claims the latest version is more accurate, faster and easier to use than its predecessors. It also claims the program enables users to use speech to perform almost any task on the computer – create documents, send e-mails, surf the web, search Facebook and Twitter and interact with applications at speeds up to three times faster than if you had to type the commands.


In my tests, these claims were justified. Nevertheless, the product is just one more tool, alongside the keyboard, touchpad, mouse and other input devices.


That said, the latest version of Dragon NaturallySpeaking removes several of the remaining barriers to the adoption of voice recognition technology for controlling a PC and dictating text into office productivity applications, such as Microsoft Word, e-mail packages and internet browsers. It is easy to install, fast, reliable and accurate and, best of all, if you take just a little time on “training” the software to recognise your voice, it is much faster than typing.


For users who are familiar with earlier versions of Dragon NaturallySpeaking, the most noticeable differences are: the new user interface with the context-sensitive Dragon Sidebar, which helps users discover and remember commands and tips; a new Help system; and an updated toolbar that helps users discover and access important but often overlooked Dragon features quickly.


I set up Dragon NaturallySpeaking Premium, which costs $200 (£150), on a Toshiba Portégé R700-S1331 laptop – reviewed last week – and on an older Lenovo ThinkPad X300.

Installation on both was smooth and took less than 10 minutes, including setting up a personal profile and a short dictation session to train the software. You can skip this training session but it really is worth doing.


The set-up procedure also involves the automatic calibration of the Plantronics headset that came with my software. I was able to open a new Word document and immediately start dictating a letter that turned out surprisingly accurate. Of course, the software still struggles with proper names and obscure technical terms. In my case, the stumbling block was “eSata” (a type of hard drive interface), which came out as “E Satter” when I dictated it (see screen shot above). But errors are easy to correct and the software learns from its mistakes, so once I had corrected the spelling, it came out right after that.


Nuance says Dragon NaturallySpeaking 11 is 15 per cent more accurate than the previous version, thanks partly to technology from a new partnership with IBM, another speech recognition pioneer.


Dragon NaturallySpeaking 11 works with most versions of Microsoft Word, including Word 2010, as well as OpenOffice Writer, WordPerfect and WordPad, the basic word processing package included with Windows. It also supports natural language (ordinary speech) commands in the new Office 2010 suite.


Equally important, Dragon NaturallySpeaking 11 is faster than previous versions for controlling other common PC-based operations, such as sending e-mail or searching the web using either Internet Explorer or Firefox.


It collapses many of the tasks that usually take an annoying number of clicks and keystrokes into simple voice commands. For example, you can ask your PC to search Amazon for a particular book, send an e-mail to a friend, search maps for an address, or open a folder. I also used it to search Facebook and to use Gmail.


Most of the time, both my PCs responded quickly and accurately to commands, although I did notice the speed slowed a lot if another resource-hungry task, such as reformatting a video file, was running in the background.
The Sidebar was a particularly useful new feature.



Another interesting new feature is the ability to turn digital voice recordings into text. Once again, I achieved better results by taking time at the outset to create a new user profile and training it for use with a digital recorder by making corrections. Then, Dragon NaturallySpeaking 11 successfully and mostly accurately transcribed notes I had dictated into an Olympus recorder, which I then connected to the PC.


Unfortunately, I found the accuracy declined dramatically if I asked the software to transcribe an interview with another person – that is, with more than one voice. Nevertheless, this feature could be very useful for doctors, lawyers and others who regularly dictate notes.


Overall, Dragon NaturallySpeaking 11 is a welcome advance in both voice recognition and voice control of a PC. It is not for everyone: some places are too noisy or too public for voice commands, and in others silence is mandatory. But voice is gradually becoming a viable alternative to more traditional input devices.