Division of Registrar & Student Services
Disability Services Unit
Progress Report 1 on Voice Recognition Research Project

The Universities Disabilities Co-operative Project (NSW) approved the submission for funding a research project into Tertiary Education Applications for Students with a Disability, and the project began in February 1998. The project involved the purchase of a laptop computer, related software and other necessary materials; the setting up and testing of the computer and software; and the evaluation of the programs for use with students with a disability.

The project has been progressing well, although there have been some delays due to problems with the supply of equipment and a period when the Project Co-ordinator, Trevor Allan, was not employed as an RDLO. During this period, Trevor Allan was employed for one day a week as a consultant to the project, and Trevor Wilks, Manager of the Adtech Centre at the University of Newcastle, accepted responsibility for the co-ordination of the project. Trevor Allan returned to the position of Northern NSW RDLO on 18 May, and has resumed co-ordination of the Project.

The computer was purchased and arrived in March 1998, and the available software, Dragon NaturallySpeaking Deluxe 2.02 and IBM ViaVoice, was purchased and installed. The third program, Lernout & Hauspie Voice Xpress, became available in late June, and was also installed. We have also purchased a Tape Recorder and adaptor, and have ordered the most recent version of the JAWS 3.0 Screen Reader program to test its compatibility with the Voice Recognition Software. Because of compatibility problems between the supplied microphone and the Audio System in the Laptop computer, a new microphone with a built-in adaptor has been ordered from the United States through the Australian importers, Auscript. Both Trevor Wilks and Trevor Allan have subscribed to the Voice Users List, and regularly follow up information on the Internet.
We have also purchased a Zip Drive to experiment with the transferring of Voice Files between computers (see Section 3, Transferring Voice Files). Some time has been necessary to learn about the operation and characteristics of the programs, to train ourselves and the programs, and to learn how they may be effectively used with students with a disability. Also, because the software is under continued development, we have had to work to keep up to date with the latest versions and developments. However, we have made a lot of progress in a number of areas.
1. Demonstrations/Consultations

Both Trevor Allan and Trevor Wilks have given a number of presentations and consultations on the Voice Recognition software at a variety of venues:

Trevor Allan:
Response to these presentations has been very positive and enthusiastic, with a high level of interest shown by participants. In discussing the most effective means of presenting our findings, we felt that demonstrating the capabilities of the products was a very effective starting point, since many people would not be able to visualise how effective this new technology could be without seeing it in action. Once people are aware of the potential of the programs, and the possibilities for use with students with a disability, written advice takes on more meaning. I have prepared a short PowerPoint presentation, which works in conjunction with the demonstration (see attachment).
2. Identified Potential Uses for VR Software

The use of Voice Recognition Software for Students with a Disability has a range of potential applications. Some disability groups that could benefit from this technology are:
All of these uses require some preparation and organisation to work effectively. All require the Registration or Enrolment process to be completed before the programs can be used effectively (the latest version of Voice Xpress allows use without enrolment, but with less accurate recognition), and the more the programs are used, the better the recognition. Enrolment takes approximately 30 minutes to an hour, plus about 15 minutes to process the Voice File.

Because enrolment requires the user to read material on the screen, it poses problems for people with a visual impairment, and for some people with a Learning Disability.

TIP: This may be addressed by having someone assist with the registration process by dictating to the user what is on the screen, so that the user can then record it in their own voice.

As the programs work from a dictionary of potential words, if a specific vocabulary is to be used (say, vocabulary specific to a particular subject, such as Psychology), the program needs to be trained to recognise those words. This can be done with what is called a Topic Builder or Vocabulary Builder function in the program, or by simply training the words as you use the program.

TIP: One suggestion is to simply read subject notes or textbooks to the program, training as you go. This trains the program to recognise the subject-specific vocabulary, and lets the student revise and study the subject matter at the same time.

After Registration or Enrolment, most Continuous Voice Recognition programs allow a dictation speed of about 140 to 160 words per minute, with accuracy of 95% or better. This can be affected by the acoustics of the location (for example, an acoustically "live" room, i.e. one with lots of echoes from glass or reflective surfaces, can reduce accuracy), by a person's pronunciation and enunciation, by microphone position (it is important to position the microphone according to the manufacturer's instructions) and by the speed of dictation.

TIP: It is usually better to dictate faster with these programs, since they analyse not only the sounds but also the context, and the more information you give them, the better they can contextualise the words. The advice usually given is to dictate first, then edit and proofread later.

Some potential uses for students with a disability are:
3. Transferring Voice Files (In Dragon NaturallySpeaking)

The following section is an extract from a web page which provides some tips on using Dragon NaturallySpeaking. Since Voice Files occupy between 8 MB and 20 MB of disk space, a special drive such as a Zip, Jaz or Super Drive needs to be used to handle files of this size.

Each user in Dragon NaturallySpeaking has a set of files which represents all of the information which Dragon NaturallySpeaking has learned about how you speak, and about how you write. This information is known collectively as speech files and consists of an acoustic file and a set of vocabulary files. (In the Deluxe Edition, there can be multiple sets of vocabulary files and also your macro file.)

The acoustic file, which contains all of the information which Dragon NaturallySpeaking has learned about how you talk, represents a significant investment of time. Not only does this file contain information which was learned during General Training, but any time you perform a correction using the correction dialog, or train one or more words using the Train Words dialog, that information is also stored in your acoustic file.

The set of vocabulary files contains all of the words which are currently active for your user, as well as any statistical information computed when you ran the Vocabulary Builder program. In addition, any dictation shorthands which you created are also stored in your vocabulary files.

Because your speech files represent such an investment in time, it is important to protect these as valuable data. Nobody likes having to train Dragon NaturallySpeaking all over again when the system crashes, or when they have to re-install. This section will explain where your speech files are located, and give you instructions on how to move them around.

In the Personal Edition version 1.0, your speech files are stored in the following directory: c:\NatSpeak\Users\Customer\current.
This directory contains one file which contains your acoustic information, called dd10user.usr, and five files which contain your vocabulary information, called dd10voc1.voc, dd10voc2.voc, dd10voc3.voc, dd10voc4.voc, and general.voc. These file locations are shown in the following image taken from the Windows Explorer.

In version 2.0 of Dragon NaturallySpeaking, the directory structure is slightly different. The current directory actually contains two or more subdirectories. Your acoustic file is stored in c:\NatSpeak\Users\USERNAME\current\voice and is called dd10user.usr. Note that "USERNAME" will be an abbreviation of the user name. Your vocabulary files are stored in c:\NatSpeak\Users\USERNAME\current\GeneralE.

In the Deluxe Edition, it is also possible to have additional topics, which are simply different sets of vocabulary files. In that case, the subdirectory name is formed from the topic name. These file locations are shown in the following image taken from the Windows Explorer. This example is from the Deluxe Edition, and there are two topics -- General English (GeneralE) and Speech Recognition (SpeechRe).
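As a rough illustration of the version 2.0 layout described above, the following Python sketch builds the expected locations for one user. The function name `speech_file_dirs` and its parameters are hypothetical helpers introduced only for this example; they are not part of Dragon NaturallySpeaking, and the sketch only assembles path names without touching the disk.

```python
from pathlib import PureWindowsPath

# File name of the acoustic file, as given in the text above.
ACOUSTIC_FILE = "dd10user.usr"


def speech_file_dirs(username, topics=("GeneralE",), root=r"c:\NatSpeak\Users"):
    """Return the version 2.0 speech file locations for one user.

    Hypothetical helper: it only joins the directory names quoted in the
    text and does not check that anything exists on disk.
    """
    current = PureWindowsPath(root) / username / "current"
    return {
        # The acoustic file lives in the "voice" subdirectory.
        "acoustic": current / "voice" / ACOUSTIC_FILE,
        # Each topic has its own subdirectory of vocabulary files.
        "vocabulary": [current / topic for topic in topics],
    }


# Deluxe Edition example from the text: two topics.
paths = speech_file_dirs("USERNAME", topics=("GeneralE", "SpeechRe"))
print(paths["acoustic"])
for directory in paths["vocabulary"]:
    print(directory)
```

"USERNAME" is left as the placeholder used in the text, since the real directory name is an abbreviation of the user name chosen by the program.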
HOW TO BACK UP YOUR SPEECH FILES

Because your speech files represent an investment of time, you should back them up. In fact, you should be backing up all of the critical data on your computer, but this Web site will restrict itself to discussing your speech files. I recommend that you back up your acoustic file. You can also back up your vocabulary files, but that is optional.

To back up your acoustic file, copy dd10user.usr to a safe place. This file represents your training. In the unlikely event that your Dragon NaturallySpeaking configuration gets screwed up and you have to recover, you can copy back dd10user.usr so that you do not have to retrain from the beginning. If you back up your vocabulary files, copy all five files with the extension "voc". You must keep all five files together.
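The copy step described above could be scripted along the following lines. This is a minimal sketch in Python, not a tool supplied with Dragon NaturallySpeaking: the function name `backup_speech_files` and its arguments are assumptions made for illustration, and it relies only on the file names given in the text (dd10user.usr and the files ending in "voc").

```python
import shutil
from pathlib import Path


def backup_speech_files(current_dir, backup_dir, include_vocabulary=False):
    """Copy dd10user.usr (and optionally every .voc file) into backup_dir.

    Hypothetical helper. The search is recursive, so it finds the files
    whether they sit directly in the "current" directory or in its
    subdirectories. The subdirectory layout is preserved in the backup so
    that each set of vocabulary files stays together.
    """
    current_dir = Path(current_dir)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # The acoustic file is always backed up.
    sources = list(current_dir.rglob("dd10user.usr"))
    # Vocabulary files are optional, as the text notes.
    if include_vocabulary:
        sources += sorted(current_dir.rglob("*.voc"))

    copied = []
    for src in sources:
        dest = backup_dir / src.relative_to(current_dir)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)  # copy2 also keeps the file timestamps
        copied.append(dest)
    return copied
```

For instance, calling `backup_speech_files(r"c:\NatSpeak\Users\USERNAME\current", r"e:\speech-backup")` would copy only the acoustic file, where `e:\speech-backup` is a made-up folder standing in for whatever safe place is used (such as a Zip disk).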
HOW TO RESTORE YOUR SPEECH FILES

In this scenario, we will assume that something has happened to your hard disk and you were forced to re-install Dragon NaturallySpeaking.

Re-install Dragon NaturallySpeaking from the CD-ROM. Then, start Dragon NaturallySpeaking. Dragon NaturallySpeaking will ask you for the name of your user. Type in the name of the user you want to restore. (It does not actually require that you use the same username, but it is convenient.) Then Dragon NaturallySpeaking will ask you to run the Audio Setup Wizard, and then General Training. Click cancel on the Audio Setup Wizard, and then click cancel when General Training starts. Dragon NaturallySpeaking will then shut down.

It is necessary to start Dragon NaturallySpeaking so that it will create the appropriate directory structure on disk for your user. But it is not necessary to run General Training, since you will be using the already trained speech files which you saved.

Once Dragon NaturallySpeaking has terminated, copy the file dd10user.usr which is stored in a safe place back to its normal location (i.e. c:\NatSpeak\Users\USERNAME\current\voice). If you also saved your set of vocabulary files, you can copy them back as well. You are not required to back up your vocabulary files, and I usually do not, since they can easily be re-created by running the Vocabulary Builder.

Assuming that you have done everything right, when you start Dragon NaturallySpeaking again it will ask you to run the Audio Setup Wizard, but it will not ask you to run General Training. You should complete the Audio Setup Wizard to set the volume properly, but then you should be able to use Dragon NaturallySpeaking as before.
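The final copy step of the restore procedure above might be sketched as follows. This is an illustrative Python helper under stated assumptions, not part of Dragon NaturallySpeaking: the name `restore_acoustic_file` is hypothetical, and the caller supplies the user's voice directory. The sketch deliberately refuses to create that directory itself, because the text explains that the program must be started once so that it creates the directory structure for the user.

```python
import shutil
from pathlib import Path


def restore_acoustic_file(saved_file, voice_dir):
    """Copy a saved dd10user.usr back into the user's voice directory.

    Hypothetical helper. It raises FileNotFoundError if the saved file is
    missing, or if voice_dir does not exist yet (which would suggest the
    program has not been started once to create the user's directories).
    """
    saved_file = Path(saved_file)
    voice_dir = Path(voice_dir)

    if not saved_file.is_file():
        raise FileNotFoundError(f"no saved acoustic file at {saved_file}")
    if not voice_dir.is_dir():
        raise FileNotFoundError(
            f"{voice_dir} does not exist; start the program once and "
            "create the user before restoring"
        )

    target = voice_dir / "dd10user.usr"
    shutil.copy2(saved_file, target)  # overwrites the untrained file
    return target
```

The program should be closed before the copy is made, as in the manual procedure, and the Audio Setup Wizard still needs to be completed afterwards.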
HOW TO SHARE SPEECH FILES BETWEEN SYSTEMS

You can follow a similar procedure to share your speech files between two systems. For example, if you have trained Dragon NaturallySpeaking on your computer at work, and you want to copy your speech files to your computer at home, where you have another copy of Dragon NaturallySpeaking, you can take the file dd10user.usr home. Then follow the previously described procedure to restore your speech files (by running Dragon NaturallySpeaking once and cancelling before training), except that you restore the speech file which you took home to your other system. That said, I recommend training again if your computer system at home has a different sound card than your system at work.

4. Recommended Programs

The 3 programs currently available for Continuous Voice Recognition are:
Among experienced Voice Users on the Voice Users List, and from experience with the programs, Dragon NaturallySpeaking seems to be the most popular and effective of the three. It is generally regarded as having better recognition accuracy, and features such as multiple options in the Correction Dialogue section are very helpful. A new version, 3.0, has been released in America, and has a number of improvements over the existing versions: notably Best Match technology, which reportedly increases accuracy substantially; improved pre-recorded transcription; and Natural Language Commands, which offer a number of different ways of issuing the same commands and a much more natural way of using the program to format and edit documents. It is also compatible with more programs. Version 3.0 is due to arrive in Australia in late July, and to be available for sale in early August.

Features of Version 3.0 are outlined below (from Dragon Press Release):

New York, N.Y. - June 16, 1998 - The speech recognition PC software rated by independent reviewers as the world's best in accuracy just became even more accurate with new BestMatch™ technology from Dragon Systems of Newton, Mass. Dragon NaturallySpeaking Version 3.0, announced today at PC Expo in New York, incorporates BestMatch technology, evaluated to be about 25% more accurate than Version 2.0. Last year, various major publications reported Version 2.0 accuracy rates of 95% to 98%.

Other major enhancements include new Natural Language Commands, which allow users to edit and format documents by speaking commands in a more natural way, an enlarged active vocabulary, and Dragon NaturallyMobile™ software, which makes it easy to create documents using a hand-held recorder.

"Accuracy and ease-of-use are the most important features in a continuous speech recognition product for dictation and it is what customers request the most. But, improvements in accuracy are the most challenging improvements to make," said Dr. Janet Baker, President and Co-Founder of Dragon Systems. "Our accomplishments in this area are the result of our extensive on-going investments in research and development, which keep Dragon NaturallySpeaking as the leading speech recognition product in the marketplace."

Natural Language Commands build on the revolutionary Select-and-Say™ editing and formatting first introduced by Dragon Systems. Instead of requiring users to memorize a specific command, such as "bold that", the new Natural Language Commands recognize a wide variety of ways in which a user may issue a command. "Make that bold", "Bold the last paragraph", "Set font bold" will all accomplish the same task, as will many more conversational commands. Dragon NaturallySpeaking now recognizes hundreds of thousands of ways in which a person could issue a command; however, the user only needs to say what comes naturally.

For the first time, Dragon products are designed with custom features that make it easier for users to create accurate documents with a hand-held recorder. Users can simply speak into an approved device, even if they are miles away from their personal computer. When the user returns to their computer, the Dragon NaturallyMobile software with Dragon NaturallySpeaking automatically transcribes the recording with the same high level of accuracy that is found in Dragon NaturallySpeaking.

"Whether the user is an executive road warrior, a commuter, a lawyer, or a physician that moves from place to place, the new Dragon NaturallyMobile software adds a new level of convenience to Dragon's software," Dr. Baker said.
BENEFITS AND FEATURES:
DRAGON NATURALLYSPEAKING PRODUCT FAMILY
The Legal Suite adds a comprehensive legal vocabulary, support for Corel WordPerfect and Microsoft Word, and a copy of Corel WordPerfect Legal Suite 8 to the already extensive list of Dragon NaturallySpeaking Professional features. These include multiple user and topic configurations, increased active vocabulary sizes, text-to-speech capabilities, recorded speech, and integration with DragonDictate® software which allows for complete hands-free operation of a PC.
AVAILABILITY AND SYSTEM REQUIREMENTS

Dragon NaturallySpeaking is scheduled to start shipping by the end of June. It supports Windows 98, Windows 95 and Windows NT. It requires a 133 MHz Pentium Processor IBM compatible PC, 32 MB RAM for Windows 95, and 48 MB RAM for Windows NT. To take advantage of improved accuracy with Dragon's BestMatch Technology, users require an additional 16 MB RAM. Dragon NaturallySpeaking Version 3.0 supports a broad range of built-in audio and industry standard sound cards, including Creative Labs SoundBlaster 16 and compatibles, as well as notebooks with built-in 16-bit audio. Proprietary speech cards are not required for either desktops or portables. Users should refer to the latest compatibility list on the Dragon web site before they install the program: www.dragonsys.com/techsupport/complist/nscmplst.html.

L&H Voice Xpress has made substantial advances over earlier products. The use of Natural Language Commands is a substantial advance in ease of use, and the program is designed to operate directly in Word 97. Outlined below is a press release describing some of the features of Voice Xpress:

Key Features:

Create Documents Directly in Microsoft® Word - You can create text, format, and edit documents all by voice and all directly within Microsoft Word - no need to cut and paste. If you don't use Microsoft Word, you can use the L&H Voice Xpress word processor to create documents and then simply copy and paste the text into your favorite Windows application. You can create text for virtually any document -- small messages, chat room dialogues, and formal documents.

Natural Language Technology - Our unique Natural Language Technology lets you "Say It Your Way," enabling L&H Voice Xpress Plus to interpret your navigation, formatting and editing commands … making L&H Voice Xpress Plus easy to learn and more powerful than other voice programs.

Continuous Speech Technology - Lets you create text by dictating in a natural, conversational manner.
No need to pause between words, so you can "type" up to 140 words per minute.

Create Entire Documents By Voice - When using L&H Voice Xpress Plus in Microsoft Word or in the L&H Voice Xpress word processor, you don't have to use your hands at all. Use your voice to quickly navigate the application menus and dialog boxes, or integrate keyboard and mouse with verbal control to maximize your efficiency.

Hundreds of Built-In Commands - Navigate, edit and format your documents with simple voice commands.

Outstanding Accuracy - L&H Voice Xpress Plus understands you without any training, and over time, L&H Voice Xpress Plus can automatically adapt to your voice, boosting ongoing accuracy up to 95% or higher. L&H Voice Xpress Plus even offers unique speech profiles developed to boost the accuracy of teen-agers and children.

Large, Customizable Vocabulary - L&H Voice Xpress Plus will understand you because it has a 30,000-word vocabulary that contains the words you use every day. Additionally you can add up to 30,000 words or phrases that are specific to your work, such as people's names, acronyms, and industry-specific terms, for a total vocabulary of 60,000 words. You can even use L&H Voice Xpress Plus to scan documents on your PC for words you want to add to the L&H Voice Xpress Plus vocabulary. So easy!

Easy Correction Using Voice, Keyboard or Mouse - Use the method that makes you more productive!

Ability to Add Dictation SmartText - You can automate common tasks by creating a voice macro that inserts a complete block of often-used text.

Text-To-Speech - You can hear your documents read back to you, making them easier to edit.

Microphone Included - A high quality noise-canceling headset specifically designed for speech recognition is inside.

Network Support - Install L&H Voice Xpress Plus on a network server and you can use L&H Voice Xpress Plus to create documents on any network client.
If you are a systems administrator and you need to back up files, you need to back up only the server.

Support for Multiple Users - If several people share the same PC, they can all use L&H Voice Xpress Plus to improve their productivity.

Natural Speech for Numbers, Dates, Dollar Amounts - With L&H Voice Xpress Plus, you not only dictate words in a natural manner, but you can also enter numbers, dates, and dollar amounts in your natural speech. For example, you say, "three thousand and four dollars" and L&H Voice Xpress Plus types "$3,004."

No Initial Training Required - You can boost your productivity immediately by using L&H Voice Xpress Plus right out of the box.

System Requirements

L&H Voice Xpress and L&H Voice Xpress Plus have the following minimum system requirements:

o Pentium® 166 MHz Processor with MMX
o Windows® 95 or Windows NT® 4.0 (with Service Pack 3)
o A 16-bit sound card from Creative Labs® or other Sound Blaster®-16 compatible 16-bit sound cards
o Approximately 130MB of hard disk space
o 40MB of RAM if running on Windows 95 (additional 8MB RAM required for dictation directly to Microsoft Word with L&H Voice Xpress Plus)
o 48MB of RAM if running on Windows NT
o CD-ROM drive
o Speakers (QuickTour tutorial, Help examples, and Text to Speech only)
o Microsoft Word 95 or Word 97 (i.e., versions 7.0 and 8.0) for dictation directly to Microsoft Word - This requirement/capability applies to L&H Voice Xpress Plus only

Some sound cards and notebook computers may require an auxiliary power supply to work with the microphone supplied with L&H Voice Xpress. Consult the microphone information inside the package for purchase details and requirements.

Some sound cards and notebook computers may have internal electrical noise that can adversely affect recognition. Consult the L&H Voice Xpress Hardware Compatibility List for notebook computers and sound cards that we have found to be compatible with L&H Voice Xpress and L&H Voice Xpress Plus.
Some notebook computers exhibit slower performance and may require a higher processor speed.

Outlined below is information from IBM's Home Page on the new version of ViaVoice, ViaVoice 98:

ViaVoice 98 products are easy and natural to use and offer customers:
ViaVoice 98 System Requirements

o Microsoft Windows 95, Windows 98 or Windows NT 4.0*
o Processor performance equivalent to Intel Pentium 166MHz with MMX with 256K L2 cache (these include: IBM 6X86MX PR166; Cyrix 6X86MX PR166; and AMD K6 200MHz the AMD K6 3D or AMD K6 166 MHz, each with at least 256K of L2 cache)
o Memory Requirements: Microsoft Windows 95/Microsoft Windows 98 RAM: 32MB (48MB if dictating into Microsoft Word 97); Microsoft Windows NT 4.0 RAM: 48MB (64MB if dictating into Microsoft Word 97)
o 180MB of available space on the hard disk
o Microsoft Windows 95 or Windows NT compatible 16 bit sound card (with a microphone input jack) with good recording quality
o Double speed CD-ROM drive or faster

5. Conclusion

The potential uses for this technology by students with a disability are quite exciting. Developments are occurring rapidly, with a great deal of money and resources being dedicated to VR Research & Development. (E.g. Microsoft recently announced a 30% increase in its Research & Development budget, with 50% of the total budget to be devoted to voice research for the next 4 years. The aim is to make every Microsoft product completely voice compatible in and out by 2001.)

It is not a "cure-all" answer for all students with a disability, and it does require some preparation and lead time to be used to its potential. If a student could benefit from this technology, it is desirable to organise a trial of the program, and the development of a voice file, as soon as possible.

The situation is changing rapidly at the moment, with new versions of the major products just released or about to be released. The use of Natural Language Commands is a great advance, which allows a more natural interaction with the programs. They are also being designed to work directly in a wider range of programs.
Unfortunately, there are no continuous voice recognition programs currently available for the Macintosh platform, although the PC Compatible Macintoshes may be able to run the programs under certain circumstances. I am currently working with Geoff Muldoon from IT Support at Southern Cross University to test some aspects of the use of this software on Macintosh machines. One of the problems may be the sound card, which does not support the type of microphones supplied with the programs.

With all of these products, a relatively recent and powerful computer is necessary. Probably the minimum desirable platform is a Pentium 200 with 64 MB of RAM. It is also desirable to check the web pages of the manufacturers for recommended Sound Cards and Laptop Computers.

At this stage, the Project is developing very effectively, and has recovered from the delays and disruptions caused by problems with the supply of equipment, and the gap in Trevor Allan's employment as an RDLO. The next few months promise to be very exciting, with the new product developments, and the opportunity to test some of these applications more extensively.

Trevor Allan

Attachment 1: HOW TO TALK TO NATURALLY SPEAKING

The way you talk to Dragon NaturallySpeaking can have a big impact on how accurate the recognition results are. Here are some basic tips:
Page last updated: 21 September 2005
Please direct all enquiries to: Student Business Solutions
Page authorised by: Registrar
The Australian National University, CRICOS Provider Number 00120C