The Australian National University
Student Services
Disability Services Unit

A Quick Guide to the Training Process for Voice Recognition Software

CONTENTS:

Introduction
New User Wizard
General Training
Vocabulary Builder
Quick Tour
Dictation
Conclusion

 

1. Introduction

The successful use of Voice Recognition software requires an initial period of training to allow the program to gather and collate information about each user's voice characteristics, pronunciation, vocabulary, writing style, accent, etc. Whilst the latest versions of programs are able to produce high levels of accuracy with very short training times (about 10 to 15 minutes, with the right equipment), it is still necessary to do some training to use the software successfully.

This guide is designed to help people understand this training process, and hopefully to encourage them to persist through the initial phase of using Voice Recognition software. The information in this guide refers to Dragon NaturallySpeaking, Professional Version 8, which is the program used by students at The ANU.

TIP: It is important that, when trialling Voice Recognition software, sufficient time is allowed for creating a voice file and doing the general training, rather than using an existing voice file. Firstly, your recognition accuracy will be much higher with your own voice file, and secondly, the existing voice file will be corrupted by the new information created by your voice. The only way a clear and accurate assessment can be made of the technology's usefulness for each individual is through a proper training process. It is a good idea to allow 1 to 2 hours for the initial training.

TIP: Another valuable strategy is to bring along some computer files containing the type of user-specific vocabulary and writing style of the new user, so that the Vocabulary Builder may be run. (See Vocabulary Builder section below)

2. New User Wizard

When the program is first started, it automatically goes to the "New User Wizard," which will take you through the training process. After the first user is registered, start-up will take you directly to the program if there is only a single user, or to an "Open User" Dialogue Box if there is more than one user. New users can be created from either location:

o NaturallySpeaking Window (Single User)


Go to the "User" Menu, click "New" and the program will take you to the "New User Wizard"

Figure 1: Opening the New User Wizard from Dragon NaturallySpeaking Window

Figure 2: Opening the New User Wizard from the Open User Dialogue Box

o "Open User" Dialogue Box (Multiple Users)

When the program first opens, if multiple users are registered, the "Open User" Dialogue Box appears. Click "New" and the "New User Wizard" appears. Other selections in this dialogue box are "Open," "Delete," "Rename," "Cancel" and "Help."


The "New User Wizard" will then take you to the "Welcome" screen, which outlines the training process:

 

Figure 3: New User Wizard Welcome Screen

Like all Wizards, the process is simply a matter of following the on-screen instructions, then clicking "Next" to proceed to the next window.

o Create User

In this section, the user is asked to provide a name for the New User, and to select the Speech Model and Vocabulary for the New User. The program will analyse your system and come up with a recommended Speech Model and Vocabulary, but it is possible to override this to a certain extent. The preferable options are "Australian BestMatch Model" for the Speech Model, and "Australian English BestMatch Plus" for the Vocabulary. The BestMatch Plus Vocabulary is the only one which allows the very short training time, since it offers the option of training with a short piece called "Speaking to Your Computer," which takes about 10 minutes to read.

The next window simply provides the option of choosing whether to dictate "Directly into the Computer" or "Into a Recorder." Choose the "Directly into the Computer" option if you are using a microphone into the computer's sound card, which is the usual method. The other option refers to the Mobile option, where you create a speech file for transcribing material recorded onto a tape recorder. This option will not be dealt with in this Guide.

The next window explains the connection of the microphone - either through the "Mic In" socket on the sound card, through the "USB port" if you have a USB (Universal Serial Bus) microphone, or through the "Line In" socket if your microphone has a built-in amplifier, such as with the Electronic Speech Enhancer.

On clicking "Next" you are taken to a window which will allow you to choose the Microphone type - "Headset Microphone," "Handheld Microphone" or "Electronic Device using Line Input." If you are using the standard microphone which comes with the package, select "Headset Microphone."

The next window provides instructions on how to position the microphone for the best results. The recommended position is to the side of the mouth, about a thumb width away from the mouth. The microphone can be on either the left or right side of the mouth, but it is important to check that the face of the microphone (usually marked with a dot or "voice") is facing the mouth.

TIP: Microphone position is crucial to good recognition accuracy. If the mic is placed in front of the mouth, it will pick up breaths, and try to interpret them as words. You will get lots of "ins," "as," etc. that you didn't actually include in your dictation. If it's too far away from the mouth, the signal to noise ratio will be too low, again affecting recognition accuracy.

The "Adjust your volume" window appears next. This is where the program adjusts the volume settings on the sound card to suit each voice. When the window first appears, there will be some instructions, a volume slider control, a box with grey text and a "Start Adjusting" Button. When you click the button, the text goes black, and you simply read the text until the program beeps to tell you it has finished adjusting the volume settings.

TIP: Read the text at the volume you plan to dictate. If you adjust the Volume properties to a level which is different to that you will use normally, then recognition accuracy will suffer. Too loud and too soft are both detrimental to accuracy.

TIP: Similarly, if the computer you use is being shared by a number of users, it is a good idea to adjust the volume at the start of each session. This is done by accessing the "Adjust Audio Properties Wizard" from the "NaturallySpeaking - Advanced" Menu. If recognition is not up to the usual standard in any session, try adjusting the volume. It may be the problem.

The next stage is the "Audio Quality Check" where the program analyses the system and comes up with a Signal to Noise figure. The process is similar to the previous window, where text is read, a beep tells you when the process is complete, and a "Speech to Noise ratio" figure is displayed. Anything above 20 is good - the higher the figure the better. The program will tell you whether the results are "Acceptable" or "Not Acceptable" for good speech recognition.

TIP: If the figure is below 20, or is "Not Acceptable," try clicking the "Back" button and repeat the process, speaking at a louder volume, or adjusting the position of the microphone.
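
As a rough illustration of what a figure like this measures, the sketch below computes a conventional signal-to-noise ratio in decibels from two lists of audio samples. It is not Dragon NaturallySpeaking's own calculation, which this guide does not describe: the function names, the decibel formula and the assumption that the displayed figure is in decibels are all illustrative. Only the threshold of 20 is taken from the guide.

```python
import math


def snr_db(speech_samples, noise_samples):
    """Return 10 * log10(mean speech power / mean noise power).

    Hypothetical helper: a textbook signal-to-noise formula,
    not the program's documented method.
    """
    if not speech_samples or not noise_samples:
        raise ValueError("both sample lists must be non-empty")
    speech_power = sum(s * s for s in speech_samples) / len(speech_samples)
    noise_power = sum(n * n for n in noise_samples) / len(noise_samples)
    if speech_power == 0 or noise_power == 0:
        raise ValueError("speech and noise power must both be non-zero")
    return 10 * math.log10(speech_power / noise_power)


def is_acceptable(snr, threshold=20.0):
    """Apply the guide's rule of thumb: anything above 20 is good."""
    return snr > threshold
```

Under this formula, speaking louder or moving the microphone closer raises the speech power relative to the background noise, which is why the TIP above suggests those two adjustments.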

After this, you can click the "Finish" button, and the process moves on to the "General Training" stage.

3. General Training

General Training can be done at any time, but a minimum amount of General Training must be completed for each new user. Further training can be used to enhance recognition and vocabulary at later times, and can be accessed from the "NaturallySpeaking - Advanced" menu. The range of material for later training is greater than for the initial training, including some shorter passages.

The initial training necessary for a new user basically involves matching a person's speaking style with a known passage and a pre-existing speech model. The user reads material from a dialogue box, and when sufficient material has been read, the program analyses the information gathered, comparing it to the existing model, and develops a Voice File for that particular user. The user is then able to begin dictating, or continue on to the Vocabulary Builder and/or Quick Tour before beginning dictating.

When the first window appears, along with some explanatory instructions, the user is asked to read the material which comes up in blue in the dialogue box. As the material is recognised, the text turns black, and when a paragraph is completed, it moves on to the next window. You can pause, go back or skip words that are not recognised. A yellow arrow comes up to mark where you should begin reading, or if text is misrecognised. Punctuation does not have to be read in the Training section.

Figure 4: General Training Dialogue Box

TIP: If a person is unable to read the text due to a vision problem, such as a vision impairment or colour blindness, or if they have a Learning Disability or other impairment, this phase can be managed, with assistance, by having someone read the text, then having the person training repeat that text into the microphone. The speech of the person doing the initial reading will not be picked up by the microphone, even if the person is sitting close by and speaking normally. Screen Readers such as JAWS will not work with this section of the program.

Once five introductory paragraphs are successfully read and recognised, the user is then able to choose from a number of passages to do the training. Choices include extracts from "3001: The Final Odyssey," "Alice in Wonderland," "Dave Barry in Cyberspace" and "Dogbert's Top Secret Management Handbook." If your system allows it, you can also choose a shorter, explanatory passage, "Speaking to Your Computer." The Dogbert and Dave Barry passages are quite funny, and reading them (for the first few times, at least) is not particularly onerous.

After the required amount of material is read, the program will give you the option of finishing the training at that stage, or continuing. The initial training is usually enough to produce a level of recognition accuracy in the high 90s (per cent). If you choose "Finish" the program will then analyse and calibrate the information and create the voice file for that user. After this is done, you can either begin dictation, or take the option to run Vocabulary Builder and take the Quick Tour. Both of these functions can be deferred until a later date if required.

4. Vocabulary Builder

Running the Vocabulary Builder is a very worthwhile exercise. Recognition accuracy will be enhanced, and much time will be saved in training new words which are not in the program's standard vocabulary.

TIP: This step is particularly helpful when subject-specific and user-specific words and acronyms are likely to be used, such as specialist vocabulary from university subjects.

The process simply involves asking the program to analyse existing computer files, such as Word or RTF files, find the words it does not have in its existing vocabulary (it's surprising how many typographical errors are picked up!) and give you the opportunity to train them. The program then gives the user the option of having the writing style analysed as well, storing information about the type of writing preferred by the user in their voice file. Again, this enhances recognition accuracy, since the program analyses the sounds made, tries to match them with its vocabulary, then analyses the context and voice file for information about each user's writing history and style before finally deciding on the most likely word to transcribe.

The Vocabulary Builder operates like a Wizard, and takes the new user step by step through the training process. Vocabulary Builder can be run at any time, by accessing the "Tools" menu and selecting "Vocabulary Builder." If you choose to postpone this section, the program will give you the opportunity to run it each time the program is loaded, until it is completed, or the "Don't remind me again" box is clicked.

Vocabulary Builder follows three or four basic steps:
o Transfer words from an existing list into the vocabulary (optional),
o Analyse documents to find new words,
o Select words to train,
o Analyse and adapt to preferred writing style.
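
The "analyse documents to find new words" step can be pictured as a comparison between the words in a document and a list of known words. The sketch below is only an illustration of that idea, not a description of how Vocabulary Builder is implemented: the function name `find_new_words`, the simple word-matching pattern and the sample vocabulary are all assumptions made for the example.

```python
import re
from collections import Counter


def find_new_words(document_text, known_vocabulary):
    """Return (word, count) pairs for words not in the known vocabulary.

    Hypothetical helper: words are compared case-insensitively and
    returned most frequent first.
    """
    # A "word" here is a letter followed by letters, apostrophes or hyphens.
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", document_text)
    known = {w.lower() for w in known_vocabulary}
    counts = Counter(w for w in words if w.lower() not in known)
    return counts.most_common()


# Example: specialist terms (and any typing errors) surface as "new" words.
sample = "The mitochondria and the ribosome; mitochondria again."
print(find_new_words(sample, {"the", "and", "again"}))
```

A list like this also explains why typographical errors turn up during the analysis: a misspelt word is, as far as the comparison is concerned, simply another word that is not in the vocabulary.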

Figure 5: Vocabulary Builder Dialogue Box

Once this process has been completed, recognition accuracy should be improved, and the need for corrections and for training new words should be reduced.

TIP: It is important to correct new words that are misrecognised, using the Correction dialogue box, since this teaches the program that this is your preferred vocabulary. It may not choose the new word once or twice, but after a couple of corrections, it will be given a much higher priority for use, and will probably be recognised first time thereafter.

5. Quick Tour

The Quick Tour will take you through the main features and operations of the program. It uses separate video screens which require the use of a mouse to operate, so it will pose difficulties for people with manipulation issues. Assistance may be required for this phase.

However, it is a worthwhile process, since it does give a good overview of the main features of the program, and examples of how to use particular functions. When combined with the Quick Reference Card, which lists the major commands for formatting, navigation, correction, punctuation, etc., a new user can be successfully dictating text, with a high level of accuracy, quite quickly.

The Quick Tour in Version 5 has been significantly enhanced by incorporating an interactive element into the process. The program outlines information about various aspects of NaturallySpeaking, then gives you the opportunity to apply that knowledge in NaturallySpeaking, with a series of exercises. This speeds up the learning process, since the new user gets to actually use the knowledge being provided, instead of merely receiving information. You can quickly learn to use basic commands, and explore some of the functionality in the program.

Figures 6 & 7: Quick Tour Screens

6. Dictation

Once the New User Wizard has been completed, the process of dictation can begin. Technically, the user should have quite a high level of accuracy at this stage, but there are a number of adjustments to be made to develop a sufficiently high level of confidence and fluency in the use of the program.

The first step is to learn the commands for various functions such as:

  • Formatting
  • Correction
  • Navigation
  • Editing
  • Punctuation

To facilitate this process, NaturallySpeaking comes with a Quick Reference Card, which lists the commands for the major functions. With regular use, the commands quickly become familiar, but initially, it is desirable to have the card handy to check what you can say to achieve particular results. Once in the program, most functions and features can be readily accessed by voice, and it is simply a matter of learning how to say things to achieve the desired effect.

TIP: It is important to remember to pause briefly before speaking a command. If you do not pause, the program will interpret the command as dictation. If you pause in between words in a command phrase, the program will not recognise it as a command. Pressing the CTRL key while you speak forces NaturallySpeaking to recognise a command.

One of the major adjustments people have to make in using Voice Recognition software is the mental shift required to feel comfortable in dictating formal written English. It is a very different process to typing, and requires practice and persistence to develop fluency in this different process. Most experienced keyboard users "think out the end of their fingers." They compose the material and type it as they are thinking of what to write. The process of dictation requires a person to compose the material, dictate phrases and sentences, then stop, compose the next passage, dictate that, and so on. Initially, this can be very difficult and gives the feeling of lacking fluency.

TIP: To become more familiar with the process of dictation, it is a good idea to avoid composing new material for a while, and simply read material into the program. By reading, the new user becomes familiar with the process of dictation, learns to use commands and explores the capabilities of the program without having to deal with the adjustment to composing for dictation. Another advantage is that personalised vocabulary and writing style are being developed, so that when you move on to creating new material, your voice file is much more mature, and requires fewer corrections. This process should take several hours, until the user is comfortable with the requirements of dictation.

After this period of familiarisation through reading, the next step should be to try some straightforward dictation using the different commands. Again, it is important not to be too ambitious in the early stages. It is more important to develop the "Stop, Compose, Dictate" approach, rather than trying to complete your PhD Thesis by Friday! If you dictate in single words, you actually starve the Dragon of important information, and recognition accuracy suffers. Feed your Dragon plenty of information, and it will be very kind to you, by quickly and accurately transcribing your words.

This is the point where many people come unstuck, and is the cause of many failures to continue with the technology. By trying to leap ahead to full-scale composition and dictation, without the necessary period of training and familiarisation, the inherent difficulties of dictation are magnified.

Before beginning composition/dictation, the user should:

  • Be comfortable with the process of dictating to the computer;
  • Know the main commands necessary for punctuation, navigation, correction and formatting;
  • Have a reasonably mature voice file, with an adequate personal vocabulary, experience in transcribing your main writing styles and a high level of accuracy;
  • Have something to say (i.e. a plan of what you are going to write)

7. Conclusion

Even when the high motivation factor for people with disabilities is taken into consideration, we have had a high acceptance and success rate with this method of training. These factors apply equally to people with and without disabilities, and provide an effective and longer-lasting means of adapting to the new technology. The great danger with Voice Recognition is to fall into the trap of believing that "You can be up and running in less than an hour." Technically, for an experienced user, this is correct, but making the mental shift to an entirely different way of working is more of a long-term process. Intelligent, confident people can dig very large holes for themselves and become gibbering wrecks if they don't know appropriate commands for navigation, correction and punctuation. People can become very stressed if they try to compose new material without adequate experience in dictation.

It is important that new users are aware of the potential problems and adjustments necessary for effective use of this technology and appropriate methods of dealing with them.

Related Link: http://www.speechcontrol.com/articles/Keys%20to%20dictation.htm
