Reed James

Interview with Rob Meulman, Speech Productivity Developer

1. Please describe speech recognition (SR).


The technique used by computers to transfer spoken words into text (dictation) and to interpret words as actions (voice commands).


2. What can you do with it?


You can use speech recognition for dictating text in applications (Word, Outlook, browsers, EMR, etc.). This is probably the most used feature.


The text appears on your screen while you dictate. This is much faster than typing. As long as you're working in speech-friendly applications (Select-and-Say enabled) you will have complete text control, and you can quickly capitalize words, correct words, format your text, delete words and sentences, etc.


The other important feature of speech recognition is voice commands to perform common tasks on your computer. Many people have physical limitations when it comes to using a standard keyboard and mouse (RSI). Others (quadriplegics) will need complete hands-free control. Standard voice commands for common tasks like opening Windows Explorer, running applications, scrolling in browsers and sending emails are usually included by default.

Speech recognition programs usually also have the option to create custom voice commands. This is done by using a (relatively simple) scripting language. Scripting your own voice commands can be very powerful because you can combine many actions into one command.
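
To illustrate the idea of combining many actions into one command, here is a minimal conceptual sketch. It is plain Python, not Dragon's own scripting language, and every name in it (VoiceCommand, open_app, type_text, press_keys, the phrase "start weekly report") is hypothetical. The actions only write to a log instead of really controlling the computer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Conceptual sketch only: NOT Dragon's scripting language or API.
# An "action" here just records what it would do in a log list.
Action = Callable[[List[str]], None]


@dataclass
class VoiceCommand:
    """One spoken phrase mapped to a sequence of actions."""
    phrase: str
    actions: List[Action] = field(default_factory=list)

    def run(self, log: List[str]) -> None:
        # Perform every action in order, so one phrase triggers them all.
        for action in self.actions:
            action(log)


def open_app(name: str) -> Action:
    return lambda log: log.append(f"open {name}")


def type_text(text: str) -> Action:
    return lambda log: log.append(f"type {text}")


def press_keys(keys: str) -> Action:
    return lambda log: log.append(f"press {keys}")


COMMANDS: Dict[str, VoiceCommand] = {}


def register(command: VoiceCommand) -> None:
    # Store phrases in lower case so matching ignores capitalization.
    COMMANDS[command.phrase.lower()] = command


def handle_utterance(utterance: str, log: List[str]) -> bool:
    """Run the command matching the recognized phrase, if there is one."""
    command = COMMANDS.get(utterance.strip().lower())
    if command is None:
        return False
    command.run(log)
    return True


# One hypothetical command that chains three actions.
register(VoiceCommand(
    "start weekly report",
    [open_app("Word"), type_text("Weekly report"), press_keys("Enter")],
))
```

Saying the phrase once would run all three steps in order; in a real speech recognition program the actions would send keystrokes or start applications instead of writing to a log.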


3. Why did you get involved with SR?


Although I had worked (superficially) with computers in previous jobs, going back to the very early WordPerfect DOS versions, and on my girlfriend's computer (she had been using very early Dragon NaturallySpeaking versions), I only really discovered the joy of having a PC when Windows XP came out. I had saved some money from gigs in a cover band I was playing in and I bought my first real computer: a multimedia Packard Bell system with a 19-inch monitor (one of those big old-fashioned CRT screens ;-)). I initially bought it to download music and to compose and arrange my own songs. I've always been interested in microphones, recording and audio in general.

Pretty soon I was spending countless hours behind my computer. Those hours quickly turned into days, weeks... I was so eager to try out all kinds of (audio, video, photo and web) programs that it kept me up all night :-(. Since I also did a lot of experimenting with programs and Windows itself, I messed up frequently. Very often I had no idea what I was doing wrong, but the frustration of it all motivated me to dig a lot deeper into the Windows operating system and computers in general.

This overgrown passion went on for a couple of years until I started experiencing discomfort in my elbows, particularly when clicking the mouse. Of course, I ignored this but not only did it come back, it went from bad to worse. Pretty soon typing started to hurt as well. The pain generalized to my shoulders and neck.

I tried nearly every possible ergonomic solution like a trackball mouse, touchpad, ergonomic keyboard, foot mouse, other foot pedals, headmouse, webcam-based headtrackers. In the end, using those devices intensively became painful as well.

I was already familiar with Dragon NaturallySpeaking but now I developed a serious interest in the program (voice commands in particular) because I had a hunch that this eventually might be my only ticket to still have a (relatively) productive and creative computer life.



4. What is the best SR software on the market?


There's very little doubt that Dragon NaturallySpeaking is the best, if not the only serious speech recognition solution out there. Dragon NaturallySpeaking was originally created by Dragon Systems, further developed after merging with Lernout & Hauspie (L&H Voice Xpress) and finally taken over by Nuance (previously known as ScanSoft).

The Dragon speech engine has always been superior to other solutions like Windows speech recognition (WSR). Dragon dictation accuracy comes close to 98-99% (although not for all users). Furthermore, the Dragon scripting engine (a Visual Basic for Applications, VBA, derivative) is very powerful when it comes to writing your own voice commands. When it comes to performance and ease of use, DPI 15 (especially the most recent 15.3 update) has made great strides. Mind you, I'm mostly referring to the Professional Dragon version, although general accuracy should be similar in the Premium or Home versions.



5. Do you need special equipment to use it?


There's no need to buy special equipment for using speech recognition, no expensive equipment anyway. It is important to use a good microphone, preferably a headset. Again, that does not have to cost you an arm and a leg. Particularly since DPI 15, which uses far-field algorithms (auto gain), a cheap headset will usually give decent results. If not, you can try a more expensive Sennheiser, Microsoft or Logitech headset.


6. What is the best computer for SR?


The best computer is always the fastest computer you can get :-). Preferably a PC, not a notebook. For smooth speech recognition performance, an Intel Core i5 processor (or higher) is advised. There is even reason to believe that the i5 may work better than the expensive i7 processors, because i5 processors usually have better single-core performance and Dragon is mostly a single-core program :-)

Recent Dragon versions, especially DPI 14/15, have a tendency to take up about 600 MB of RAM. Therefore it is advised to have at least 16 GB of RAM installed on your system.


7. Can SR benefit translators? Some translators have said that Dragon NaturallySpeaking is hard to use, is that true? What about Windows speech recognition?


To start with that last one: although freely included in Windows, WSR is not accurate and fast enough for serious translation work. Translating with Dragon NaturallySpeaking is a much better idea. Translators can enjoy many of the benefits that speech recognition (Dragon NaturallySpeaking particularly) offers. You can work much faster and more intuitively because (basically) you don't have to type. If using a keyboard and mouse in general has become painful, then you can additionally use voice commands to start your translation software and access its menus/buttons.


8. What precautions should translators use when dictating? I also heard that Dragon sometimes inserts strange words into your dictation?


About Dragon inserting strange words: I haven't come across this very often. Besides, accidental offensive or awkward words will usually be replaced with other words anyway.

Generally speaking, also when it comes to precautions, starting to use Dragon is a learning process. It's important to properly set up your first user profile using correct dictation techniques like speaking at a consistent volume level and only speaking the words you actually mean to get on your screen. No "eh", "uh" etc. Speech Recognition is all about control. Being aware of what you are saying, being aware of background noises, and putting the microphone to sleep or off when you don't need to dictate.

Also, be aware that dictating in speech-unfriendly applications usually gives unsatisfying recognition results. You will notice that recognition is far better using a fast dictation box like the one(s) featured in SP Pro (see section 14).


9. Is SR expensive?


You can use the built-in Windows speech recognition (WSR) or one or two other freeware solutions out there but believe me, you will get frustrated with it soon. Control of such programs (UI and such) as well as accuracy simply are not good enough. But it could be a good way to get acquainted with speech recognition in general, to see if it's something for you.

When it comes to Dragon, if you are serious about speech recognition then I can strongly recommend buying DPI 15. The price varies, usually anywhere between $200 and $300, but sometimes it comes as an offer for $150 or even less. If I'm not mistaken there is also still an attractive offer to upgrade from Dragon Premium.

Of course, you could buy the cheaper Home or Premium versions but they are very limited when it comes to creating your own voice commands which really is such an interesting option!


10. Can you use 2 languages with the same SR software?


When it comes to WSR (which doesn't support many languages) you would first have to change your Windows default language which is a hassle. You cannot use two languages at the same time.

When it comes to Dragon, it really depends on the version you purchase. If you buy a non-English Dragon Professional version (Dutch, German, French, Spanish etc.) the English language is usually supported as well (English Dragon Professional versions don't always support other languages). This probably also goes for Dragon Premium versions.

If your Dragon version supports several languages then you cannot use them at the same time. You will have to load each profile separately.


11. What are situations when it is best not to use SR?


In general it's best to dictate in a relatively quiet situation. If you're working in very noisy surroundings then you probably don't want to use speech recognition (unless the sound input of your headset has a very high threshold).

The same goes for situations where you have no privacy at all.


12. Is there a way to dictate a translation with Google's speech recognition?


You probably mean in Google Docs? I haven't really tried it much. It appears to have quite some correction options as well now. I do however frequently use the Google Home commands (OK/Hey Google) for hands-free Google search and that works great, especially since more languages are supported simultaneously.

However, DPI will probably be more accurate and you don't need to be online in order to use it. You can create as many convenient text edit commands as you wish using more ergonomic command names. You also have the benefit of utilizing Dragon's built-in AutoText commands (for quickly adding signatures or frequently used sentences).


13. Are there add-ons for SR software? If so, can you name some?


There are several Dragon add-ons on the market today focusing on various limitations of the program. Some of these add-ons however are either unnecessary (Dragon can already perform the actions), unstable, slow down Dragon, or won't work in some important applications.


In my 16 years of experience using Dragon, many hours each day, I have found that there are only two commercial add-ons and one free extension that are crucial for me:


VoiceComputer


The ultimate hands-free mouse. This is as good as it gets when it comes to controlling your computer completely hands-free. I'm so glad this program is around. It is crucial for me to remain active and productive on my computer.


Click by Voice


There are several Chrome extensions that enable you to browse (semi) hands-free. However, this is the best I've seen so far. It also works in Edge Developer and it's free!


SP Pro


Yes, I know, my own program ;-). I use it multiple times a day (that easily accumulates to a couple of hours a day!).


14. You developed Speech Productivity. How did you come up with the idea? Why should people buy it? http://www.speechproductivity.eu/


Although Dragon is a truly wonderful program, dictating in speech-unfriendly applications like Thunderbird, Chrome and LibreOffice, but also in translation software like Trados and memoQ, leaves much to be desired. Dictating only works properly in speech-friendly applications like Microsoft Word, Outlook and Internet Explorer. These applications support full Select-and-Say control, which means that you can directly approach words, sentences, even letters by voice and do all kinds of corrections to them. A new line will automatically be capitalized as soon as a period or exclamation/question mark is detected. If the target application supports rich text then you can also format your text by voice.


Sadly, translation software like SDL Trados Studio, memoQ and cloud-based translation solutions (run in browsers other than IE) are far from speech-friendly. They mostly just support "dumb dictating", which means the applications will accept text but there is no Select-and-Say control. This results in lots of manual corrections. Although Internet Explorer is the ultimate Dragon-compatible browser, not many people dare to use it anymore. Of course, Dragon provides web extensions for both Firefox and Chrome but those have a tendency to hang or crash on some systems. Programmers who work with code editors (syntax highlighting) will soon find out that these editors are extremely speech-unfriendly as well.


A note on speech-friendly office applications: MS Word and Apache OpenOffice (free) fully support Select-and-Say control. However, many users have encountered the annoying jumping cursor problem where the cursor has a tendency to shoot back up one line or even back to the top of the document upon speech input.


Some resort to the Dragon dictation box because it supports full Select-and-Say control and formatting. However, this can be a cumbersome process because it loads very slowly, has less-than-perfect visibility (especially on UHD systems), usually doesn't remember its size or location upon reboot and has limited additional options. The Dragon dictation box also has the downside that it makes your target application temporarily inaccessible when it's loaded. The biggest risk, however, is that you will lose all your dictation if you accidentally cancel the box (or press the Esc key!). This has led to frustration for many a Dragon user.


A much better option is to use the SP Pro Dictation Boxes, which effectively eliminate these limitations. The SP boxes are extremely fast, safe and versatile. However, SP Pro is much more than just dictation boxes. It's a suite of effective productivity add-ons that will make your work with Dragon Professional (DPI 14/15) much more enjoyable.


15. Lastly, what is your vision of SR in the next 10 years?


Let's start with WSR. Unless Microsoft is going to perform some long overdue maintenance on their speech recognition engine, they are not likely to play much of a role in this.

Google was mentioned before. I think it's going to be big when it comes to speech recognition. Their Google speech engine will likely be improved most (so will the Artificial Intelligence element of it). Amazon (Google's biggest competitor in this) may chime in as well. However, most of the developments will be focused on mobile devices (Android mostly).

Although it is a smaller market, Apple recently came up with macOS Catalina, which has some extensive speech support as well (Dragon support for Mac was dropped lately).

However, most hands-free Windows users will place their hopes in Nuance, and so will I. The NaturallySpeaking program will probably get a couple of interesting upgrades. Let's hope it will be around for many years to come!


Rob Meulman, Speech Productivity developer
