The Australian National University
Division of Registrar & Student Services
Disability Services Unit

Progress Report 1 on Voice Recognition Research Project

The Universities Disabilities Co-operative Project (NSW) approved the submission for funding a research project into Tertiary Education Applications for Students with a Disability, and the project began in February, 1998. The project involved the purchase of a Laptop Computer, related software and other necessary materials, the setting up and testing of the computer and software, and the evaluation of the programs for use with students with a disability.

The project has been progressing well, although it has been slowed by equipment delays and by a period when the Project Co-ordinator, Trevor Allan, was not employed as an RDLO. During this period, Trevor Allan was employed for one day a week as a consultant to the project, and Trevor Wilks, Manager of the Adtech Centre at the University of Newcastle, accepted responsibility for the co-ordination of the project. Trevor Allan returned to the position of Northern NSW RDLO on the 18th May, and has resumed co-ordination of the Project.

The computer was purchased and arrived in March, 1998, and the available software, Dragon NaturallySpeaking Deluxe 2.02 and IBM ViaVoice, was purchased and installed. The third program, Lernout & Hauspie Voice Xpress, became available in late June, and was also installed. We have also purchased a Tape Recorder and adaptor, and have ordered the most recent version of the JAWS 3.0 Screen Reader program to test its compatibility with the Voice Recognition Software. Due to compatibility problems between the supplied microphone and the Audio System in the Laptop computer, a new microphone with built-in adaptor has been ordered from the United States, through the Australian importers, Auscript. Both Trevor Wilks and Trevor Allan have subscribed to the Voice Users List, and regularly follow up information on the Internet. We have also purchased a Zip Drive to experiment with the transferring of Voice Files between computers (see Section 3, Transferring Voice Files).

Some time has been necessary to learn about the operation and characteristics of the programs, to train ourselves and the programs, and to learn how they may be effectively used with students with a disability. Also, because of continued development with the software, we are having to try to keep up to date with the latest versions and developments.

However, we have made a lot of progress in a number of areas.

1. Demonstrations/Consultations

Both Trevor Allan and Trevor Wilks have given a number of presentations and consultations on the Voice Recognition software at a variety of venues:

Trevor Allan:

  • The University of Wollongong (2 sessions)
  • Southern Cross University (Lismore)
  • University of New England (Armidale)
  • Central Sydney Universities (UTS)
  • Setting Directions Seminar (State Library) - brief
  • Booked to present at UWS in September

Trevor Wilks:

  • University of Newcastle (Callaghan 20 - 25)
  • University of Newcastle (Central Coast 3)
  • CRS Newcastle

Response to these presentations has been very positive and enthusiastic, with a high level of interest shown by participants. In discussion about the most effective means of presenting our findings, we felt that demonstrating the capabilities of the products was a very effective starting point, since many people would not be able to visualise how effective this new technology could be without seeing it in action. Once people are aware of the potential of the programs, and the possibilities for use with students with a disability, then written advice would take on more meaning. I have prepared a short PowerPoint presentation, which works in conjunction with the demonstration (see attachment).

2. Identified Potential Uses for VR Software

The use of Voice Recognition Software for Students with a Disability has a range of potential applications. Some disability groups to potentially benefit from this technology would be:

  • People with a keyboard or mouse impairment
  • People with RSI or related injuries
  • People with back or neck injuries
  • People with Hearing Impairment
  • People with Learning Disabilities (In some cases)
  • People with a Visual Impairment (In some cases, and with support initially)
  • People with a Mental Illness or Personality Disorder (In some cases)
  • People with a Brain Injury (In some cases)

All of these uses require some preparation and organisation to work effectively. All will require the Registration or Enrolment process to be completed before the programs can be used effectively (although the latest version of Voice Xpress will allow use without enrolment, but with less accurate recognition) and the more the programs are used, the better the recognition. This process takes approximately 30 minutes to an hour, plus about 15 minutes to process the Voice File. Because the program requires the user to read material on the screen, this poses problems for people with a visual impairment, or some people with a Learning Disability. TIP: This may be able to be addressed by having someone assist the registration process by dictating to the user what is on the screen, and for the user to then begin the recording process using their own voice.

As the programs have to work off a dictionary of potential words, if a specific vocabulary (say, vocabulary specific to a particular subject, such as Psychology) is to be used, the program needs to be trained to recognise those words. This can be done by using what is called a Topic Builder or Vocabulary Builder function in the program, or by simply training the words as you use the program. TIP: One suggestion is to simply read subject notes or textbooks to the program, training as you go. This allows the recognition of the subject-specific vocabulary, and for the revision and study of the subject matter by the student.

After Registration or enrolment, most Continuous Voice Recognition programs allow a dictation speed of about 140 to 160 words per minute, with a 95% plus accuracy. This can be affected by varying acoustics of the location (For example, an acoustically "live" room - i.e. lots of echoes from glass or reflective surfaces - can reduce accuracy), by a person's pronunciation and enunciation, by microphone position (It is important to position the microphone according to manufacturer's instructions) and by the speed of dictation. TIP: It is usually better to dictate faster with these programs, since they not only analyse the sounds, but also the context, and the more information you can give them, the better they can contextualise the words. The advice usually given is to dictate first, then edit and proofread later.
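To put these figures in perspective, the short calculation below estimates how long a piece of work takes to dictate and how many misrecognised words are left for proofreading. It is only a rough sketch: the 2,000-word essay is a hypothetical example, the function name is ours, and it treats the quoted speed and accuracy as constant, which real use will not match exactly.

```python
def dictation_estimate(words, words_per_minute, accuracy):
    """Return (minutes of dictation, expected misrecognised words)."""
    minutes = words / words_per_minute
    expected_errors = words * (1.0 - accuracy)
    return minutes, expected_errors


# Hypothetical 2,000-word essay at the lower quoted speed and accuracy.
minutes, errors = dictation_estimate(2000, 140, 0.95)
print(f"{minutes:.1f} minutes of dictation, about {errors:.0f} words to correct")
```

Even at 95% accuracy an essay of this length leaves roughly 100 words to correct, which is why the usual advice is to dictate first and proofread afterwards.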

Some potential uses for students with a disability are:

  • As a scribe in examination situations. (N.B. The student would need to have a Voice File Registered on that computer, with a subject-specific Vocabulary trained prior to the exam situation. Voice Files can be transferred between computers - see section below).
  • To enter information, do assignments, access Internet, research materials, etc. (Since the use of a computer has become fundamental to much of a student's work at university, having equal or near equal access and speed of use of a computer to enter and access information is a major means of redressing the disadvantage incurred by disability. N.B. if the student is unable to use a keyboard or mouse at all, they will need to use a Voice Control program such as Dragon Dictate, which comes bundled with NaturallySpeaking Deluxe (Ver. 2.02) or Professional (Ver. 3.0). Again, Voice registration will be necessary, and speed, accuracy and recognition will improve with use and training of specific vocabularies. For all students using these programs, early preparation by developing a Voice File will improve their capacity to use the program under the pressures of University Work).
  • Note Taking. (This is a bit more problematic, since it requires the Lecturer to have a Voice File Registered and subject-specific vocabulary developed. However, as this technology becomes more mainstream - its primary market - and widespread, many Lecturers may be using this software to prepare their Lectures, anyway. Also, the newer programs are incorporating technology to transcribe from audio disks and tapes, so the audio recordings of Lectures may be a potential source of these transcriptions. Again, because these programs require the user to define punctuation, punctuation marks will not be included in a transcription of a lecture. One possible technique for Hearing Impaired Students is the use of this technology, combined with a Data Projector to display the text of what the lecturer is saying on a screen as the lecture is being delivered, or through the use of an FM microphone to transmit to the students' computer containing the program and the Lecturer's Voice File. This area has potential, but needs further preparation and experimentation to determine the effectiveness and viability of this approach. The newer versions of programs coming out shortly should address some of the limitations of the current software, with features such as Best Match Technology and improved Audio taped transcription.)

3. Transferring Voice Files (In Dragon NaturallySpeaking)

The following section is an extract from a web page which provides some tips on using Dragon NaturallySpeaking. Since Voice Files occupy between 8 MB and 20 MB of Disk Space, which is too large for a floppy disk, a special Drive such as a Zip, Jaz or Super Drive needs to be used to handle files of this size.
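The arithmetic behind this recommendation can be sketched as follows. The two capacities are our assumptions and are not taken from the extract: 1.44 MB for a standard high-density floppy disk and 100 MB for a Zip disk. The function name is hypothetical.

```python
import math

FLOPPY_MB = 1.44   # assumed: nominal capacity of a 3.5-inch high-density floppy
ZIP_MB = 100       # assumed: capacity of a standard Zip disk


def disks_needed(file_mb, disk_mb):
    """Number of disks needed if the file has to be split across them."""
    return math.ceil(file_mb / disk_mb)


for size_mb in (8, 20):
    print(f"{size_mb} MB voice file: {disks_needed(size_mb, FLOPPY_MB)} floppies "
          f"or {disks_needed(size_mb, ZIP_MB)} Zip disk")
```

Under these assumptions a Voice File would have to be split over 6 to 14 floppy disks, while a single Zip disk holds it whole.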

Each user in Dragon NaturallySpeaking has a set of files which represents all of the information which Dragon NaturallySpeaking has learned about how you speak, and about how you write. This information is known collectively as speech files and consists of an acoustic file and a set of vocabulary files. (In the Deluxe Edition, there can be multiple sets of vocabulary files and also your macro file.)

The acoustic file, which contains all of the information which Dragon NaturallySpeaking has learned about how you talk, represents a significant investment of time. Not only does this file contain information which was learned during General Training, but any time you perform a correction using the correction dialog, or train one or more words using the Train Words Dialog, that information is also stored in your acoustic file.

The set of vocabulary files contains all of the words which are currently active for your user, as well as any statistical information computed when you ran the Vocabulary Builder program. In addition, any dictation shorthands which you created are also stored in your vocabulary files.

Because your speech files represent such an investment in time, it is important to protect these as valuable data. Nobody likes having to train Dragon NaturallySpeaking all over again when the system crashes, or when they have to re-install. This section will explain where your speech files are located, and give you instructions on how to move them around.

In the Personal Edition version 1.0, your speech files are stored in the following directory: c:\NatSpeak\Users\Customer\current. This directory contains one file which contains your acoustic information called dd10user.usr and five files which contain your vocabulary information called dd10voc1.voc, dd10voc2.voc, dd10voc3.voc, dd10voc4.voc, and general.voc.

[Image: Windows Explorer view of these file locations.]

In version 2.0 of Dragon NaturallySpeaking, the directory structure is slightly different. The current directory actually contains two or more subdirectories. Your acoustic file is stored in c:\NatSpeak\Users\USERNAME\current\voice and is called dd10user.usr. Note that "USERNAME" will be an abbreviation of the user name. Your vocabulary files are stored in c:\NatSpeak\Users\USERNAME\current\GeneralE. In the Deluxe Edition, it is also possible to have additional topics which are simply different sets of vocabulary files. In that case, the subdirectory name is formed from the topic name.

[Image: Windows Explorer view of these file locations.] This example is from the Deluxe Edition, and there are two topics -- General English (GeneralE) and Speech Recognition (SpeechRe).
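The two directory layouts described above can be summarised in a small helper. This is an illustrative sketch only: the function name and its parameters are ours, and the paths are simply the ones quoted in the extract, so they should be checked against the actual installation.

```python
from pathlib import PureWindowsPath

ROOT = PureWindowsPath(r"c:\NatSpeak\Users")


def speech_file_paths(version, user="Customer", topic="GeneralE"):
    """Return (acoustic file, vocabulary directory) for a user.

    version 1 follows the Personal Edition 1.0 layout; version 2 follows
    the version 2.0 layout, where `user` is the abbreviated user name and
    `topic` is the vocabulary subdirectory.
    """
    current = ROOT / user / "current"
    if version == 1:
        return current / "dd10user.usr", current
    if version == 2:
        return current / "voice" / "dd10user.usr", current / topic
    raise ValueError("only versions 1 and 2 are described in the text")


acoustic, vocab_dir = speech_file_paths(2, user="USERNAME", topic="SpeechRe")
print(acoustic)
print(vocab_dir)
```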

HOW TO BACKUP YOUR SPEECH FILES

Because your speech files represent an investment of time, you should back them up. In fact, you should be backing up all of the critical data on your computer, but this Web site will restrict itself to discussing your speech files.

I recommend that you back up your acoustic file. You can also back up your vocabulary files, but that is optional.

To back up your acoustic file, copy dd10user.usr to a safe place. This represents your training. In the unlikely event that your Dragon NaturallySpeaking configuration gets screwed up and you have to recover, you can copy back dd10user.usr so that you do not have to retrain from the beginning.

If you backup your vocabulary files, copy all five files with the extension of "voc". You must keep all five files together.
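The backup steps can be automated with a short script. The sketch below is ours and is not part of the quoted web page: the function name and its arguments are hypothetical, and the demonstration runs on empty stand-in files in a temporary folder rather than on a real Dragon NaturallySpeaking installation.

```python
import shutil
import tempfile
from pathlib import Path


def backup_speech_files(voice_dir, vocab_dir, backup_dir, include_vocab=False):
    """Copy dd10user.usr (and optionally every .voc file) to backup_dir."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = [shutil.copy2(Path(voice_dir) / "dd10user.usr", backup_dir)]
    if include_vocab:
        # The vocabulary files must be kept together, so copy all of them.
        for voc in sorted(Path(vocab_dir).glob("*.voc")):
            copied.append(shutil.copy2(voc, backup_dir))
    return [Path(p).name for p in copied]


# Demonstration on throwaway folders containing empty stand-in files.
with tempfile.TemporaryDirectory() as tmp:
    voice = Path(tmp, "voice")
    vocab = Path(tmp, "GeneralE")
    voice.mkdir()
    vocab.mkdir()
    (voice / "dd10user.usr").touch()
    for name in ("dd10voc1.voc", "dd10voc2.voc", "dd10voc3.voc",
                 "dd10voc4.voc", "general.voc"):
        (vocab / name).touch()
    saved = backup_speech_files(voice, vocab, Path(tmp, "backup"),
                                include_vocab=True)

print(saved)
```

For the version 1.0 layout, where the acoustic and vocabulary files share one directory, the same directory can be passed for both `voice_dir` and `vocab_dir`.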

HOW TO RESTORE YOUR SPEECH FILES

In this scenario, we will assume that something has happened to your hard disk and you were forced to re-install Dragon NaturallySpeaking.

Re-install Dragon NaturallySpeaking from the CD-ROM. Then, start Dragon NaturallySpeaking. Dragon NaturallySpeaking will ask you for the name of your user. Type in the name of the user you want to restore. (It does not actually require that you use the same username, but it is convenient.)

Then Dragon NaturallySpeaking will ask you to run the Audio Setup Wizard, and then General Training. Click cancel on the Audio Setup Wizard, and then click cancel when General Training starts. Dragon NaturallySpeaking will then shutdown.

It is necessary to start Dragon NaturallySpeaking so that Dragon NaturallySpeaking will create the appropriate directory structure on disk for your user. But it is not necessary to run General Training since you will be using the already trained speech files which you saved.

Once Dragon NaturallySpeaking has terminated, copy the file dd10user.usr which you stored in a safe place back to its normal location (i.e. c:\NatSpeak\Users\USERNAME\current\voice in version 2.0).

If you also saved your set of vocabulary files, you can copy them back as well. You are not required to back up your vocabulary files and I usually do not, since they can easily be re-created by running the Vocabulary Builder.

Assuming that you have done everything right, when you start Dragon NaturallySpeaking again it will ask you to run the Audio Setup Wizard, but it will not ask you to run General Training. You should complete the Audio Setup Wizard to set the volume properly but then you should be able to use Dragon NaturallySpeaking as before.
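The key point of the restore procedure is that the user's directory must already exist before the saved file is copied back. A minimal sketch of that check is given below. The function name is hypothetical, and the demonstration uses a temporary stand-in folder instead of the real user directory.

```python
import shutil
import tempfile
from pathlib import Path


def restore_acoustic_file(backup_file, voice_dir):
    """Copy a saved dd10user.usr back into an existing user directory."""
    voice_dir = Path(voice_dir)
    if not voice_dir.is_dir():
        # The directory only exists after Dragon NaturallySpeaking has been
        # started once for the user, with training cancelled.
        raise FileNotFoundError(f"{voice_dir} not found: create the user first")
    return Path(shutil.copy2(backup_file, voice_dir / "dd10user.usr"))


with tempfile.TemporaryDirectory() as tmp:
    saved = Path(tmp, "backup", "dd10user.usr")
    saved.parent.mkdir()
    saved.write_bytes(b"trained")
    target = Path(tmp, "user_voice_dir")   # stand-in for the real location
    try:
        restore_acoustic_file(saved, target)
        refused = False
    except FileNotFoundError:
        refused = True                     # user directory not created yet
    target.mkdir()                         # stands in for starting the program once
    restored = restore_acoustic_file(saved, target).read_bytes()

print(refused, restored)
```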

HOW TO SHARE SPEECH FILES BETWEEN SYSTEMS

You can follow a similar procedure to share your speech files between two systems. For example, if you have trained Dragon NaturallySpeaking on your computer at work, and you want to copy your speech files to home where you have another copy of Dragon NaturallySpeaking, then you can take the file dd10user.usr home and follow the previously described procedure to restore your speech files (by running Dragon NaturallySpeaking once and canceling before training), except that you will restore the speech file which you took home onto your other system.

That said, I recommend training again if your computer system at home has a different sound card than your system at work.

4. Recommended Programs

The 3 programs currently available for Continuous Voice Recognition are:

  • Dragon NaturallySpeaking (Personal, Preferred & Deluxe Editions; $300 to $1,300)
  • IBM ViaVoice ($300) (98 Version coming soon)
  • Lernout & Hauspie Voice Xpress ($300)

Among experienced Voice Users on the Voice Users List, and from experience with the programs, Dragon NaturallySpeaking seems to be the most popular and effective of the three. It is generally regarded as having a better recognition accuracy, and features such as multiple options in the Correction Dialogue section are very helpful. A new version 3.0 has been released in America, and has a number of improvements over the existing versions, notably Best Match technology, which reportedly increases accuracy substantially, improved pre-recorded transcription, and Natural Language Commands, which has a number of different ways of issuing the same commands, and a much more natural way of using the program to format and edit documents. It is also compatible with more programs. Version 3.0 is due to arrive in Australia in late July, and to be available for sale in early August.

Features of Version 3.0 are outlined below (From Dragon Press Release)

New York, N.Y. - June 16, 1998 - The speech recognition PC software rated by independent reviewers as the world's best in accuracy just became even more accurate with new BestMatch™ technology from Dragon Systems of Newton, Mass. Dragon NaturallySpeaking Version 3.0, announced today at PC Expo in New York, incorporates BestMatch technology, evaluated to be about 25% more accurate than Version 2.0. Last year, various major publications reported Version 2.0 accuracy rates of 95% to 98%.

Other major enhancements include new Natural Language Commands, which allow users to edit and format documents by speaking commands in a more natural way, an enlarged active vocabulary, and Dragon NaturallyMobile™ software, which makes it easy to create documents using a hand-held recorder.

"Accuracy and ease-of-use are the most important features in a continuous speech recognition product for dictation and it is what customers request the most. But, improvements in accuracy are the most challenging improvements to make," said Dr. Janet Baker, President and Co-Founder of Dragon Systems. "Our accomplishments in this area are the result of our extensive on-going investments in research and development, which keep Dragon NaturallySpeaking as the leading speech recognition product in the marketplace."

Natural Language Commands build on the revolutionary Select-and-Say™ editing and formatting first introduced by Dragon Systems. Instead of requiring users to memorize a specific command, such as "bold that", the new Natural Language Commands recognize a wide variety of ways in which a user may issue a command. "Make that bold", "Bold the last paragraph", "Set font bold" will all accomplish the same task, as will many more conversational commands.

Dragon NaturallySpeaking now recognizes hundreds of thousands of ways in which a person could issue a command; however, the user only needs to say what comes naturally.

For the first time, Dragon products are designed with custom features that make it easier for users to create accurate documents with a hand-held recorder. Users can simply speak into an approved device, even if they are miles away from their personal computer. When the user returns to their computer, the Dragon NaturallyMobile software with Dragon NaturallySpeaking automatically transcribes the recording with the same high level of accuracy that is found in Dragon NaturallySpeaking.

"Whether the user is an executive road warrior, a commuter, a lawyer, or a physician that moves from place to place, the new Dragon NaturallyMobile software adds a new level of convenience to Dragon's software," Dr. Baker said.

BENEFITS AND FEATURES:

  • TRUE CONTINUOUS SPEECH - Users speak naturally and at a normal pace without the need to pause between words. Dragon NaturallySpeaking was the first product on the market that recognized large vocabulary general purpose continuous speech.
  • HIGHLY ACCURATE - Industry leading BestMatch™ technology is designed to improve accuracy for most users by an additional 25%. The accuracy in previous versions of Dragon NaturallySpeaking was already reported to be from 95% to 98%.
  • ENLARGED ACTIVE VOCABULARY - The enlarged active vocabulary comes with more than 62,000 words that are ready to use. A total of 230,000 words is on disk and can be automatically retrieved by the system. Each word contains spelling, pronunciation and language usage information for high accuracy. Users can customize the vocabulary by inserting up to 54,000 new words, proper names, and specialized terms.
  • DRAGON NATURALLYMOBILE™ SUPPORT - Users can create documents by speaking into a portable recording device, such as the Sony® MZ-R30 palm-sized mini-disc recorder and the Norcom Model 2500 hand-held tape dictation machine. The Dragon NaturallyMobile software enhances Dragon NaturallySpeaking by making it easier to transcribe the recordings right into the user's document. (Dragon NaturallyMobile is available in the Preferred, Professional, Legal and Medical product versions).
  • WORKS IN VIRTUALLY ANY WINDOWS APPLICATION - Users can just point their mouse and click in an application. Their text automatically appears in the text window where the cursor is. The product works well with Microsoft Word, Corel WordPerfect, Lotus WordPro, e-mail packages, Internet chat packages and many more.
  • NATURAL LANGUAGE COMMANDS - Easy to use and easy to remember commands can be issued in a natural way to edit and format text within Microsoft Word 97. For example, a user can say "Create a table of five columns and two rows," "Make a five by two table," or many other variations.
  • COREL® WORDPERFECT® SUPPORT - Dragon NaturallySpeaking Version 3.0 is the only product that comes fully integrated into both of the two leading word processors, Microsoft Word 97 and Corel WordPerfect 8. Integration makes it easier to create documents right in the application that is most familiar to the user.
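The press release does not say how the 25% accuracy improvement is measured. It cannot be a 25% increase in the accuracy figure itself, since 95% multiplied by 1.25 would exceed 100%. One plausible reading, which is our assumption and is not confirmed by Dragon Systems, is a 25% reduction in the number of misrecognised words. The sketch below works through that reading for the accuracy range quoted for Version 2.0.

```python
def accuracy_after_error_reduction(accuracy, reduction):
    """Accuracy after cutting the error rate by the given fraction."""
    error_rate = 1.0 - accuracy
    return 1.0 - error_rate * (1.0 - reduction)


for old in (0.95, 0.98):
    new = accuracy_after_error_reduction(old, 0.25)
    print(f"{old:.0%} -> {new:.2%}")
```

Under this reading, 95% accuracy becomes 96.25% and 98% becomes 98.5%, so the practical gain is fewer corrections per page rather than a dramatic change in the headline figure.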

DRAGON NATURALLYSPEAKING PRODUCT FAMILY

  • Dragon NaturallySpeaking STANDARD - This value packed edition includes all of the major features that made Dragon NaturallySpeaking America's #1 selling continuous speech product, according to PC Data. It includes Dragon's BestMatch technology for superior accuracy, Natural Language Commands with Select-and-Say editing, dictation into virtually any Windows application and more.
  • Dragon NaturallySpeaking PREFERRED - Contains all of the features of Dragon NaturallySpeaking Standard and adds features designed for business and other users, including: Dragon NaturallyMobile for transcription of recorded speech with recorded speech playback and text-to-speech for easier editing.
  • Dragon NaturallySpeaking PROFESSIONAL - Contains all of the features of Dragon NaturallySpeaking Preferred and adds sophisticated customization features for the user who creates significant amounts of text. In addition, the Professional Edition adds advanced macro support which allows for total control of forms, the ability for users to add and customize vocabularies and DragonDictate 3.0 for complete hands-free use.
  • Dragon NaturallySpeaking Legal Suite - Dragon NaturallySpeaking Legal Suite is the first Dragon NaturallySpeaking continuous speech recognition software product designed specifically for the legal market. Dragon NaturallySpeaking Legal Suite allows attorneys, judges, paralegals, legal secretaries and others in the legal profession to improve their productivity by creating documents, briefs, notes and correspondence completely by voice.

The Legal Suite adds a comprehensive legal vocabulary, support for Corel WordPerfect and Microsoft Word, and a copy of Corel WordPerfect Legal Suite 8 to the already extensive list of Dragon NaturallySpeaking Professional features. These include multiple user and topic configurations, increased active vocabulary sizes, text-to-speech capabilities, recorded speech, and integration with DragonDictate® software which allows for complete hands-free operation of a PC.

  • Dragon NaturallySpeaking Medical Suite - Dragon NaturallySpeaking Medical Suite is a continuous speech recognition system specifically designed for medical professionals to create patient records, medical reports, notes, correspondence and other documents. Medical professionals can dictate directly into Microsoft® Word, Corel® WordPerfect®, as well as custom medical reporting applications.
  • Dragon NaturallySpeaking Mobile Suite - The new Dragon NaturallySpeaking Mobile Suite for Medical and Legal products use the latest version of Dragon NaturallySpeaking continuous speech recognition software, including Dragon NaturallySpeaking Legal Suite or Medical Suite Version 3.0, Dragon NaturallyMobile™ software and an innovative palm-sized recorder from Sony. Users create text simply by speaking naturally into the Sony MZ-R50 mini-disc recorder, even if they are miles away from a PC. Later, the recording can be transcribed by Dragon NaturallySpeaking software using new features designed to make audio transcription easier. The new product is aimed at medical, legal and other professionals who need the highest accuracy and the most flexibility in their work. They can create briefs, reports, correspondence, email or other documents, while away from the office or when moving from one location to another.

AVAILABILITY AND SYSTEM REQUIREMENTS

Dragon NaturallySpeaking is scheduled to start shipping by the end of June. It supports Windows 98, Windows 95 and Windows NT. It requires a 133 MHz Pentium Processor IBM compatible PC, 32 MB RAM for Windows 95, and 48 MB RAM for Windows NT. To take advantage of improved accuracy with Dragon's BestMatch Technology, users require an additional 16 MB RAM. Dragon NaturallySpeaking Version 3.0 supports a broad range of built-in audio and industry standard sound cards, including Creative Labs SoundBlaster 16 and compatibles, as well as notebooks with built in 16-bit audio. Proprietary speech cards are not required for either desktops or portables. Users should refer to the latest compatibility list on the Dragon web site before they install the program: www.dragonsys.com/techsupport/complist/nscmplst.html.
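The memory figures above combine as shown in the following sketch, which may help when checking whether an existing computer can run Version 3.0. The function and constant names are ours. No separate figure is quoted for Windows 98, so it is deliberately left out rather than guessed.

```python
BASE_RAM_MB = {"Windows 95": 32, "Windows NT": 48}   # figures from the press release
BESTMATCH_EXTRA_MB = 16                              # extra RAM quoted for BestMatch


def required_ram_mb(operating_system, bestmatch=False):
    """Minimum RAM in MB quoted for Version 3.0 on the given system."""
    ram = BASE_RAM_MB[operating_system]
    return ram + BESTMATCH_EXTRA_MB if bestmatch else ram


for system in BASE_RAM_MB:
    print(system, required_ram_mb(system), required_ram_mb(system, bestmatch=True))
```

With BestMatch enabled this gives 48 MB for Windows 95 and 64 MB for Windows NT.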

L & H Voice Xpress has made substantial advances over earlier products. The use of Natural Language Commands is a substantial advance in ease of use, and is designed to operate directly in Word 97. Outlined below is a press release of some of the features of Voice Xpress:

Key Features:

Create Documents Directly in Microsoft® Word - You can create text, format, and edit documents all by voice and all directly within Microsoft Word - no need to cut and paste. If you don't use Microsoft Word, you can use the L&H Voice Xpress word processor to create documents and then simply copy and paste the text into your favorite Windows application. You can create text for virtually any document -- small messages, chat room dialogues, and formal documents.

Natural Language Technology - Our unique Natural Language Technology lets you "Say It Your Way," enabling L&H Voice Xpress Plus to interpret your navigation, formatting and editing commands … making L&H Voice Xpress Plus easy to learn and more powerful than other voice programs.

Continuous Speech Technology - Lets you create text by dictating in a natural, conversational manner. No need to pause between words, so you can "type" up to 140 words per minute.

Create Entire Documents By Voice - When using L&H Voice Xpress Plus in Microsoft Word or in the L&H Voice Xpress word processor, you don't have to use your hands at all. Use your voice to quickly navigate the application menus and dialog boxes, or integrate keyboard and mouse with verbal control to maximize your efficiency.

Hundreds of Built-In Commands - Navigate, edit and format your documents with simple voice commands.

Outstanding Accuracy - L&H Voice Xpress Plus understands you without any training, and over time, L&H Voice Xpress Plus can automatically adapt to your voice, boosting ongoing accuracy up to 95% or higher. L&H Voice Xpress Plus even offers unique speech profiles developed to boost the accuracy for teen-agers and children.

Large, Customizable Vocabulary - L&H Voice Xpress Plus will understand you because it has a 30,000-word vocabulary that contains the words you use every day. Additionally you can add up to 30,000 words or phrases that are specific to your work, such as people's names, acronyms, and industry-specific terms for a total vocabulary of 60,000 words. You can even use L&H Voice Xpress Plus to scan documents on your PC for words you want to add to the L&H Voice Xpress Plus vocabulary. So easy!

Easy Correction Using Voice, Keyboard or Mouse - Use the method that makes you more productive!

Ability to Add Dictation SmartText - You can automate common tasks by creating a voice macro that inserts a complete block of often-used text.

Text-To-Speech - You can hear your documents read back to you, making them easier to edit.

Microphone Included - A high quality noise-canceling headset specifically designed for speech recognition is inside.

Network Support - Install L&H Voice Xpress Plus on a network server and you can use L&H Voice Xpress Plus to create documents on any network client. If you are a systems administrator and you need to backup files, you need backup only the server.

Support for Multiple Users - If several people share the same PC, they can all use L&H Voice Xpress Plus to improve their productivity.

Natural Speech for Numbers, Dates, Dollar Amounts - With L&H Voice Xpress Plus, you not only dictate words in a natural manner, but you can also enter numbers, dates, and dollar amounts in your natural speech. For example, you say, "three thousand and four dollars" and L&H Voice Xpress Plus types "$3,004."
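To illustrate the kind of formatting involved, the sketch below converts a spoken dollar amount into figures. It is a simplified illustration written for this report and is not how L&H Voice Xpress works internally. The function name and word tables are ours, and it handles only whole-dollar amounts up to the millions.

```python
UNITS = {word: i for i, word in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {word: 10 * i for i, word in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def spoken_dollars(phrase):
    """Convert e.g. 'three thousand and four dollars' to '$3,004'."""
    total = group = 0
    for word in phrase.lower().replace("-", " ").split():
        if word in ("and", "dollar", "dollars"):
            continue                      # filler words carry no value
        if word in UNITS:
            group += UNITS[word]
        elif word in TENS:
            group += TENS[word]
        elif word == "hundred":
            group *= 100
        elif word in SCALES:
            total += group * SCALES[word]
            group = 0
        else:
            raise ValueError(f"unrecognised word: {word!r}")
    return f"${total + group:,}"


print(spoken_dollars("three thousand and four dollars"))
```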

No Initial Training Required - You can boost your productivity immediately by using L&H Voice Xpress Plus right out of the box.

System Requirements

L&H Voice Xpress and L&H Voice Xpress Plus have the following minimum system requirements:

  • Pentium® 166 MHz Processor with MMX
  • Windows® 95 or Windows NT® 4.0 (with Service Pack 3)
  • A 16-bit sound card from Creative Labs® or other Sound Blaster®-16 compatible 16-bit sound cards
  • Approximately 130MB of hard disk space
  • 40MB of RAM if running on Windows 95 (additional 8MB RAM required for dictation directly to Microsoft Word with L&H Voice Xpress Plus)
  • 48MB of RAM if running on Windows NT
  • CD-ROM drive
  • Speakers (QuickTour tutorial, Help examples, and Text to Speech only)
  • Microsoft Word 95 or Word 97 (i.e., versions 7.0 and 8.0) for dictation directly to Microsoft Word - This requirement/capability applies to L&H Voice Xpress Plus only

Some sound cards and notebook computers may require an auxiliary power supply to work with the microphone supplied with L&H Voice Xpress. Consult the microphone information inside the package for purchase details and requirements.

Some sound cards and notebook computers may have internal electrical noise that can adversely affect recognition. Consult the L&H Voice Xpress Hardware Compatibility List for notebook computers and sound cards that we have found to be compatible with L&H Voice Xpress and L&H Voice Xpress Plus.

Some notebook computers exhibit slower performance and may require a higher processor speed.

Outlined below is information from IBM's Home Page on the new version of ViaVoice 98

ViaVoice 98 products are easy and natural to use and offer customers:

  • User Wizard to help get started quickly and easily
  • Multimedia Quick Tours and VoiceTips to augment the learning process
  • Largest active vocabulary in the industry
  • 64,000 word base
  • 64,000 word add word capability
  • 260,000 word backup dictionary
  • Voice correction
  • "Natural Language Commands" in Microsoft Word 97 for easy formatting and editing
  • Natural dictation of numbers, weights, prices, etc...that will properly format
  • Modeless operation or the ability to use commands during dictation
  • Incremental enrollment process with choice of training scripts
  • VoiceText dictation macros for frequently used text (e.g. addresses)
  • VoiceForm dictation templates for boilerplate text (e.g. stationery)
  • Deferred and delegated correction
  • Multiple user and multiple environment support
  • Specialized vocabulary Topic included with each package.
  • Support of the Microsoft Windows 95 and Windows 98 operating systems
  • ViaVoice 98 Executive and Office products support Microsoft Windows NT 4.0.
  • Significant advances to the ViaVoice 98 speech recognition engine. Sampling of speech input is now at 22 kHz versus 11 kHz. This means that a wider range of voices can be accommodated, including most teenage children. Additionally, we now offer a variety of enrollment, or training, scripts from which the user can select.
With ViaVoice 98 You Can...

  • Create finished, formatted documents fast and easy
  • Use intuitive new "natural language commands" in Microsoft Word 97 for easy formatting and editing
  • Dictate directly into Microsoft Word 97, ViaVoice Speak Pad and with the ViaVoice 98 Executive Edition, into the most popular Windows applications
  • Add up to 64,000 personal words and phrases to the 64,000 word base vocabulary.
  • Activate additional ViaVoice Topics to further expand your active vocabulary
  • Dictate numbers, dates, times and prices the way you usually speak them
  • Create VoiceText dictation macros and VoiceForm dictation templates to make shortcuts for frequently used text.
  • Have documents or e-mail read aloud to you using our text-to-speech feature, ViaVoice Outloud™
  • Find what commands you can say just by asking 'What Can I Say'.

ViaVoice 98 System Requirements

  • Microsoft Windows 95, Windows 98 or Windows NT 4.0*
  • Processor performance equivalent to Intel Pentium 166MHz with MMX with 256K L2 cache (these include: IBM 6X86MX PR166; Cyrix 6X86MX PR166; and AMD K6 200MHz the AMD K6 3D or AMD K6 166 MHz, each with at least 256K of L2 cache)
  • Memory Requirements: Microsoft Windows 95 / Microsoft Windows 98 RAM: 32MB (48MB if dictating into Microsoft Word 97); Microsoft Windows NT 4.0 RAM: 48MB (64MB if dictating into Microsoft Word 97)
  • 180MB of available space on the hard disk
  • Microsoft Windows 95 or Windows NT compatible 16 bit sound card (with a microphone input jack) with good recording quality
  • Double speed CD-ROM drive or faster

5. Conclusion

The potential uses for this technology by students with a disability are quite exciting. Developments are occurring rapidly, with a great deal of money and resources being dedicated to voice recognition research and development. (For example, Microsoft recently announced a 30% increase in its research and development budget, with 50% of the total budget to be devoted to voice research for the next 4 years. The aim is to make every Microsoft product completely voice compatible, in and out, by 2001.) Voice recognition is not a "cure-all" answer for all students with a disability, and it does require some preparation and lead time to be used to its potential. If a student could benefit from this technology, it is desirable to organise a trial of the program and begin developing a voice file as soon as possible. The situation is changing rapidly at the moment, with new versions of the major products just released or about to be released. The use of Natural Language Commands is a great advance, which allows a more natural interaction with the programs. The programs are also being designed to work directly in a wider range of applications.

Unfortunately, there are no continuous voice recognition programs currently available for the Macintosh platform, although the PC-compatible Macintoshes may be able to run the programs under certain circumstances. I am currently working with Geoff Muldoon from IT Support at Southern Cross University to test some aspects of the use of this software on Macintosh machines. One of the problems may be the sound card, which does not support the type of microphones supplied with the programs.

With all of these products, a relatively recent and powerful computer is necessary. Probably the minimum desirable platform is a Pentium 200 with 64 MB of RAM. It is also desirable to check the manufacturers' web pages for recommended Sound Cards and Laptop Computers.

At this stage, the Project is developing very effectively, and has recovered from the delays and disruptions caused by problems with the supply of equipment and the gap in Trevor Allan's employment as an RDLO. The next few months promise to be very exciting, with the new product developments and the opportunity to test some of these applications more extensively.

Trevor Allan

Attachment 1:

HOW TO TALK TO NATURALLY SPEAKING

The way you talk to Dragon NaturallySpeaking can have a big impact on how accurate the recognition results are. Here are some basic tips:

  • Talk at a normal pace. You do not have to talk slowly to NatSpeak even if you have a slow machine. Talking slower will actually degrade recognition accuracy.
  • Talk continuously. NatSpeak recognizes longer utterances much better than shorter ones. Avoid speaking one word at a time since this does not help accuracy (although you can speak a single word while correcting).
  • Verbalize every word. NatSpeak cannot read your mind. If you want NatSpeak to recognize a word, you must say the word. In conversations between people, speakers can sometimes drop words or slur words together. If you drop a word, do not expect NatSpeak to find it. That said, NatSpeak does include shortened pronunciations of small function words like "n" for "and".
  • Talk the way you trained. When you ran General Training, you spoke a certain way and Dragon NaturallySpeaking learnt the way you talk from those samples. If you speak like you are reading something, your accuracy may improve. Alternatively, you can rerun General Training, trying to train the way you dictate.
  • Think first, talk second. Composing while dictating is a new skill for many people. The trick is to think ahead. Try to think about what you are going to say before you say it. Then, when you say it, your dictation will be clearer because you already know what you are going to say. If you talk without thinking, you will find that your words get mumbled when you change your mind while speaking.
  • Dragon NaturallySpeaking will understand "natural" speech but not "conversational" speech. Pretend that you are a broadcast journalist reading the news from behind an anchor desk, not talking to a friend over coffee.