A Quick Guide to the Training Process for Voice Recognition Software
CONTENTS:
1. Introduction
The successful use of Voice Recognition software requires an initial
period of training to allow the program to gather and collate information
about each user's voice characteristics, pronunciation, vocabulary,
writing style, accent, etc. Whilst the latest versions of programs are
able to produce high levels of accuracy with very short training times
(about 10 to 15 minutes, with the right equipment), it is still necessary
to do some training to use the software successfully.
This guide is designed to help people understand this training process,
and hopefully encourage them to persist through the initial phase of
using Voice Recognition software. The information in this guide refers
to Dragon NaturallySpeaking, Professional Version 8, which is the program
used by students at The ANU.
TIP: It is
important that, when trialling Voice Recognition software, sufficient
time is allowed for creating a voice file and doing the general training,
rather than using an existing voice file. Firstly, your recognition
accuracy will be much higher with your own voice file, and secondly,
the existing voice file will be corrupted by the new information created
by your voice. The only way a clear and accurate assessment can be made
of the technology's usefulness for each individual is through a proper
training process. It is a good idea to allow 1 to 2 hours for the initial
training.
TIP: Another
valuable strategy is to bring along some computer files containing the
type of user-specific vocabulary and writing style of the new user,
so that the Vocabulary Builder may be run. (See Vocabulary Builder section
below)
2. New User Wizard
When the program is first started, it automatically goes to the "New
User Wizard" which will take you through the training process.
After the first user is registered, the initial start-up will take you
directly to the program if there is only a single user, or to an "Open
User" Dialogue Box if there is more than one user. New users can
be created from either location:
o NaturallySpeaking Window (Single User)
Go to the "User" Menu, click "New" and the program
will take you to the "New User Wizard"

Figure 1: Opening the New User Wizard from Dragon NaturallySpeaking
Window
o "Open User" Dialogue Box (Multiple Users)
When the program first opens, if multiple users are registered, the
"Open User" Dialogue Box appears. Click "New" and
the "New User Wizard" appears. Other selections in this dialogue
box are "Open," "Delete," "Rename," "Cancel" and
"Help."

Figure 2: Opening the New User Wizard from the Open User Dialogue Box
The "New User Wizard" will then take you to the "Welcome"
screen, which outlines the training process:

Figure 3: New User Wizard Welcome Screen
Like all Wizards, the process is simply a matter of following the on-screen
instructions, then clicking "Next" to proceed to the next
window.
o Create User
In this section, the user is asked to provide a name for the New User,
and select the Speech Model and Vocabulary for the New User. The program
will analyse your system and come up with a recommended Speech Model
and Vocabulary, but it is possible to over-ride this to a certain extent.
The preferable options are "Australian BestMatch Model" for
the Speech Model, and "Australian English BestMatch Plus"
for the Vocabulary. The BestMatch Plus Vocabulary is the only one which
allows the very short training time, with the option of choosing a short
piece called "Speaking to your Computer," which takes about 10
minutes to read, as the training text.
The next window simply provides the option of choosing whether to dictate
"Directly into the Computer" or "Into a Recorder."
Choose the "Directly into the Computer" option if you are
using a microphone into the computer's sound card, which is the usual
method. The other option refers to the Mobile option, where you create
a speech file for transcribing material recorded onto a tape recorder.
This option will not be dealt with in this Guide.
The next window explains the connection of the microphone - either
through the "Mic In" socket on the sound card, through the
"USB port" if you have a USB (Universal Serial Bus) microphone,
or through the "Line In" socket if your microphone has a built-in
amplifier, such as with the Electronic Speech Enhancer.
On clicking "Next" you are taken to a window which will allow
you to choose the Microphone type - "Headset Microphone,"
"Handheld Microphone" or "Electronic Device using Line
Input." If you are using the standard microphone which comes with
the package, select "Headset Microphone."
The next window provides instructions on how to position the microphone
for the best results. The recommended position is to the side of the
mouth, about a thumb width away from the mouth. The microphone can be
on either the left or right side of the mouth, but it is important to check
that the face of the microphone (usually marked with a dot or "voice")
is facing the mouth.
TIP: Microphone
position is crucial to good recognition accuracy. If the mic is placed
in front of the mouth, it will pick up breaths, and try to interpret
them as words. You will get lots of "ins," "as,"
etc. that you didn't actually include in your dictation. If it's too
far away from the mouth, the signal to noise ratio will be too low,
again affecting recognition accuracy.
The "Adjust your volume" window next appears. This is where
the program adjusts the volume settings on the sound card to suit each
voice. When the window first appears, there will be some instructions,
a volume slider control, a box with grey text and a "Start Adjusting"
Button. When you click the button, the text goes black, and you simply
read the text until the program beeps to tell you it is finished adjusting
the volume settings.
TIP: Read
the text at the volume you plan to dictate. If you adjust the Volume
properties to a level which is different to that you will use normally,
then recognition accuracy will suffer. Too loud and too soft are both
detrimental to accuracy.
TIP: Similarly,
if the computer you use is being shared by a number of users, it is
a good idea to adjust the volume at the start of each session. This
is done by accessing the "Adjust Audio Properties Wizard"
from the "NaturallySpeaking - Advanced" Menu. If recognition
is not up to the usual standard in any session, try adjusting the volume.
It may be the problem.
The next stage is the "Audio Quality Check" where the program
analyses the system and comes up with a Signal to Noise figure. The
process is similar to the previous window, where text is read, a beep
tells you when the process is complete, and a "Speech to Noise
ratio" figure is displayed. Anything above 20 is good - the higher
the figure the better. The program will tell you whether the results
are "Acceptable" or "Not Acceptable" for good speech
recognition.
TIP: If the
figure is below 20, or is "Not Acceptable," try clicking the
"Back" button and repeating the process, speaking at a louder
volume or adjusting the position of the microphone.
After this, you can click the "Finish" button, and the process
moves on to the "General Training" stage.
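For readers curious about what a "Speech to Noise ratio" figure measures, the idea can be sketched in a few lines of Python. This is an illustration only and is not how NaturallySpeaking itself performs the check: it assumes the figure is expressed in decibels and treats 20 or above as acceptable, and the function names (rms, speech_to_noise_db, check_audio_quality) are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of audio sample values."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def speech_to_noise_db(speech, noise):
    """Speech level relative to background-noise level, in decibels (assumed unit)."""
    noise_level = rms(noise)
    if noise_level == 0:
        raise ValueError("noise level is zero, so the ratio is undefined")
    return 20 * math.log10(rms(speech) / noise_level)


def check_audio_quality(speech, noise, threshold=20.0):
    """Return the ratio and a verdict, using the guide's rule of thumb of 20."""
    ratio = speech_to_noise_db(speech, noise)
    verdict = "Acceptable" if ratio >= threshold else "Not Acceptable"
    return ratio, verdict


# Made-up sample values: the same background noise with loud and quiet speech.
background = [0.01, -0.01] * 100
loud_speech = [0.5, -0.5] * 100
quiet_speech = [0.05, -0.05] * 100

print(check_audio_quality(loud_speech, background))
print(check_audio_quality(quiet_speech, background))
```

The sketch shows why speaking louder or moving the microphone closer helps: both raise the speech level relative to the background noise, which raises the figure.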
3. General Training
General Training can be done at any time, but a minimum amount of General
Training must be done for each new user. Further training can be used to enhance
recognition and vocabulary at later times, and can be accessed from
the "NaturallySpeaking - Advanced" menu. The range of material
for later training is greater than for the initial training, including
some shorter passages.
The initial training necessary for a new user basically involves matching
a person's speaking style with a known passage and a pre-existing speech
model. The user reads material from a dialogue box, and when sufficient
material has been read, the program analyses the information gathered,
comparing it to the existing model, and develops a Voice File for that
particular user. The user is then able to begin dictating, or continue
on to the Vocabulary Builder and/or Quick Tour before beginning dictating.
When the first window appears, along with some explanatory instructions,
the user is asked to read the material which comes up in blue in the
dialogue box. As the material is recognised, the text turns black, and
when a paragraph is completed, it moves on to the next window. You can
pause, go back or skip words that are not recognised. A yellow arrow
marks where you should begin reading, or where text has been misrecognised.
Punctuation does not have to be read in the Training section.

Figure 4: General Training Dialogue Box
TIP: If a
person is unable to read the text due to a vision problem, such as a
vision impairment or colour blindness, or if they have a Learning Disability
or other impairment, this phase can be managed, with assistance, by
having someone read the text, then having the person training repeat
that text into the microphone. The speech of the person doing the initial
reading will not be picked up by the microphone, even if the person
is sitting close by and speaking normally. Screen Readers such as JAWS
will not work with this section of the program.
Once five introductory paragraphs are successfully read and recognised,
the user is then able to choose from a number of passages to do the
training. Choices include extracts from "3001: The Final Odyssey,"
"Alice in Wonderland," "Dave Barry in Cyberspace"
and "Dogbert's Top Secret Management Handbook." If your system
allows it, you can also choose from a shorter, explanatory passage "Speaking
to Your Computer." The Dogbert and Dave Barry passages are quite
funny, and reading them (for the first few times, at least) is not particularly
onerous.
After the required amount of material is read, the program will give
you the option of finishing the training at that stage, or continuing.
The initial training is usually enough to produce a level of recognition
accuracy in the high-90% range. If you choose "Finish" the program
will then analyse and calibrate the information and create the voice
file for that user. After this is done, you can either begin dictation,
or take the option to run Vocabulary Builder and take the Quick Tour.
Both of these functions can be deferred until a later date if required.
4. Vocabulary Builder
Running the Vocabulary Builder is a very worthwhile exercise. Recognition
accuracy will be enhanced, and much time will be saved in training new
words which are not in the program's standard vocabulary.
TIP: This
step is particularly helpful when subject and user specific words and
acronyms are likely to be used, such as specialist vocabulary from university
subjects.
The process simply involves asking the program to analyse existing
computer files, such as Word or RTF files, find the words it does not
have in its existing vocabulary (it's surprising how many typographical
errors are picked up!) and give you the opportunity to train them. The
program then gives the user the option of having the writing style analysed
as well, storing information about the type of writing preferred by
the user in their voice file. Again, this enhances recognition accuracy,
since the program analyses the sounds made, tries to match them with
its vocabulary, then analyses the context and voice file for information
about each user's writing history and style before finally deciding
on the most likely word to transcribe.
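The first part of this process, finding words in your documents that are not already in the vocabulary, can be sketched in Python as follows. This is a simplified illustration, not the program's actual method: the function name find_new_words, the word pattern and the tiny sample vocabulary are all assumptions made for the example.

```python
import re
from collections import Counter

# Assumed definition of a "word": letters, with optional apostrophes or hyphens.
WORD_PATTERN = re.compile(r"[A-Za-z][A-Za-z'-]*")


def find_new_words(documents, known_vocabulary):
    """Count the words in the documents that are missing from the vocabulary."""
    known = {word.lower() for word in known_vocabulary}
    counts = Counter()
    for text in documents:
        for word in WORD_PATTERN.findall(text):
            if word.lower() not in known:
                counts[word.lower()] += 1
    # Most frequent unknown words first, as these are the most worth training.
    return counts.most_common()


# Made-up example: two short "documents" and a very small vocabulary.
documents = [
    "The mitochondria and the ribosome",
    "Mitochondria recieve signals",
]
vocabulary = ["the", "and", "receive", "signals"]

for word, count in find_new_words(documents, vocabulary):
    print(word, count)
```

Note that the misspelling "recieve" is reported alongside the genuine specialist terms, which is why typographical errors turn up in the list of new words.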
The Vocabulary Builder operates like a Wizard, and takes the new user
step by step through the training process. Vocabulary Builder can be
run at any time, by accessing the "Tools" menu and selecting
"Vocabulary Builder." If you choose to postpone this section,
the program will give you the opportunity to run it each time the program
is loaded, until it is completed, or the "Don't remind me again"
box is clicked.
Vocabulary Builder follows three or four basic steps:
o Transfer words from an existing list into the vocabulary (optional);
o Analyse documents to find new words;
o Select words to train;
o Analyse and adapt to preferred writing style.

Figure 5: Vocabulary Builder Dialogue Box
Once this process has been completed, recognition accuracy should be
improved, and the number of corrections and training of new words should
be reduced.
TIP: It is
important to correct new words that are misrecognised, using the Correction
dialogue box, since this teaches the program that this is your preferred
vocabulary. It may not choose the new word once or twice, but after
a couple of corrections, it will be given a much higher priority for
use, and will probably be recognised first time thereafter.
5. Quick Tour
The Quick Tour will take you through the main features and operations
of the program. It uses separate video screens which require the use
of a mouse to operate, so will pose difficulties for people with manipulation
issues. Assistance may be required for this phase.
However, it is a worthwhile process, since it does give a good overview
of the main features of the program, and examples of how to use particular
functions. When combined with the Quick Reference Card, which lists
the major commands for formatting, navigation, correction, punctuation,
etc., a new user can be successfully dictating text, with a high level
of accuracy, quite quickly.
The Quick Tour in Version 5 has been significantly enhanced by incorporating
an interactive element into the process. The program outlines information
about various aspects of NaturallySpeaking, then gives you the opportunity
to apply that knowledge in NaturallySpeaking, with a series of exercises.
This speeds up the learning process, since the new user gets to actually
use the knowledge being provided, instead of merely receiving information.
You can quickly learn to use basic commands, and explore some of the
functionality in the program.


Figures 6 & 7: Quick Tour Screens
6. Dictation
Once the New User Wizard has been completed, the process of dictation
can begin. Technically, the user should have quite a high level of accuracy
at this stage, but there are a number of adjustments to be made to develop
a sufficiently high level of confidence and fluency in the use of the
program.
The first step is to learn the commands for various functions such
as:
- Formatting
- Correction
- Navigation
- Editing
- Punctuation
To facilitate this process, NaturallySpeaking comes with a Quick Reference
Card, which lists the commands for the major functions. With regular
use, the commands quickly become familiar, but initially, it is desirable
to have the card handy to check what you can say to achieve particular
results. Once in the program, most functions and features can be readily
accessed by voice, and it is simply a matter of learning how to say
things to achieve the desired effect.
TIP: It is
important to remember to pause briefly before speaking a command. If
you do not pause, the program will interpret the command as dictation.
If you pause in between words in a command phrase, the program will
not recognise it as a command. Holding down the CTRL key while you speak
forces NaturallySpeaking to treat your words as a command.
One of the major adjustments people have to make in using Voice Recognition
software is the mental shift required to feel comfortable in dictating
formal written English. It is a very different process to typing, and
requires practice and persistence to develop fluency in this different
process. Most experienced keyboard users "think out the end of
their fingers." They compose the material and type it as they are
thinking of what to write. The process of dictation requires a person
to compose the material, dictate phrases and sentences, then stop, compose
the next passage, dictate that, and so on. Initially, this can be very
difficult and gives the feeling of lacking fluency.
TIP: To become
more familiar with the process of dictation, it is a good idea to avoid
composing new material for a while, and simply read material into the
program. By reading, the new user is becoming familiar with the process
of dictation, learning to use commands and exploring the capabilities
of the program without having to deal with the adjustment to composing
for dictation. Another advantage is that personalised vocabulary and
writing style are being developed, so that when you move on to creating
new material, your voice file is much more mature, and requires fewer
corrections. This process should take several hours, until the user
is comfortable with the requirements of dictation.
After this period of familiarisation through reading, the next step is to try
some straightforward dictation using the different commands. Again,
it is important not to be too ambitious in the early stages. It is
more important to develop the "Stop, Compose, Dictate" approach,
rather than trying to complete your PhD Thesis by Friday! If you dictate
in single words, you actually starve the Dragon of important information,
and recognition accuracy suffers. Feed your Dragon plenty of information,
and it will be very kind to you, by quickly and accurately transcribing
your words.
This is the point where many people come unstuck, and it is the cause
of many failures to continue with the technology. By trying to leap
ahead to full-scale composition and dictation, without the necessary
period of training and familiarisation, the inherent difficulties of
dictation are magnified.
Before beginning composition/dictation, the user should:
- Be comfortable with the process of dictating to the computer;
- Know the main commands necessary for punctuation, navigation,
correction and formatting;
- Have a reasonably mature voice file, with an adequate personal
vocabulary, experience in transcribing your main writing styles and
a high level of accuracy;
- Have something to say (i.e. a plan of what you are going to
write)
Conclusion
Even when the high motivation factor for people with disabilities is
taken into consideration, we have had a high acceptance and success
rate with this method of training. These factors apply equally to people
with and without disabilities, and provide an effective and longer-lasting
means of adapting to the new technology. The great danger with Voice
Recognition is to fall for the trap that "You can be up and running
in less than an hour." Technically, for an experienced user, this
is correct, but making the mental shift to an entirely different way
of working is more of a long-term process. Intelligent, confident people
can dig very large holes for themselves and become gibbering wrecks
if they don't know appropriate commands for navigation, correction and
punctuation. People can become very stressed if they try to compose
new material without adequate experience in dictation.
It is important that new users are aware of the potential problems
and adjustments necessary for effective use of this technology and appropriate
methods of dealing with them.
Related Link: http://www.speechcontrol.com/articles/Keys%20to%20dictation.htm