The Australian National University
Division of Registrar & Student Services
Disability Services Unit

Progress Report 2 on Tertiary Education Applications of Voice Recognition Software for Students with Disabilities

The voice recognition research project has made considerable progress since the last report. Most of the DLOs in NSW have had the opportunity to attend a demonstration of voice recognition software, find out about hardware and software requirements, discuss options for use by themselves and the students, and establish a line of communication for further information, discussion and clarification. Along with the DLOs, many others, such as students, TAFE and school education staff (both government and private) and members of the community, have also taken the opportunity to investigate this technology further. Although the project was scheduled to run for 12 months, current developments in the field, feedback from staff and students and an apparent continuing need for this type of information indicate the desirability of continuing this project on a longer-term basis. Consequently, while I am able to continue as an RDLO, I am happy to continue with the present project.

After the demonstrations, and as the project became more widely known, many people have taken the opportunity to contact me in person or by phone or email to seek further advice, information and assistance. On most occasions, I have been able to provide the advice and assistance required. Since a brief presentation at the Pathways Conference in Perth in December, I have received inquiries from all over Australia. Most people have been appreciative of the opportunity to obtain independent information and advice about this rapidly developing technology. From comments and feedback received, this aspect of the project appears to have been very valuable and appreciated. On some occasions I have also been able to provide direct personal support to individuals and institutions who are implementing this technology.

Developments

Developments in the field of speech recognition technology have been rapid and extensive. New programs, multiple versions of programs, new hardware needs, microphone developments, specialized applications, the integration with other technology and computer programs and upgrades to existing software have all contributed to an involved, extensive, complicated and demanding examination of this technology. Simply keeping track of the information available from the Voice Recognition Users Group involves working through an average of 60 to 70 email messages per day. Many of these messages involve follow-up Internet searches, reading, and the tracking down and ordering of software and accessories. In the past year or so, we have gone from the situation where there was only one version of one continuous speech recognition program, to the present situation where we have three versions of three different programs, two versions of another, the specialized applications such as medical, legal, mathematical and science programs and a number of updates between versions. Speech recognition has become an integral part of the future plans of most of the major players in the computer industry. Corel has taken over Dragon, Microsoft has a substantial stake in Lernout & Hauspie, IBM has its own programs, IMSI has recently released its own programs using Dragon's speech engine and many minor players are developing niche market applications such as the Metroplex maths and science programs mentioned later in the report. The estimated world market for voice products in the year 2000 is 12 billion U.S. dollars annually. Following these developments and trying to remain well informed about the current situation with the technology has involved substantial effort.

As the software has developed and improved, so have the demands on hardware. When this project began, the minimum requirement for a computer to run voice recognition software was a 133 MHz Pentium with 32 MB RAM. Now, the minimum requirement for most programs is a 166 MHz Pentium with 64 MB RAM, and most would recommend a 300 MHz plus processor with 128 MB RAM. Currently I am able to run the latest programs on the laptop, but the demands of the software have resulted in a significant slowing of the computer's operation. We have probably reached the limits of the computer's capacity to handle extra hardware demands. To retain the capacity of the laptop to demonstrate voice recognition, future versions and developments would probably need to be installed and tested on other computers.

The good news with these developments is that relative prices of software have fallen dramatically over the period of the project.

Major Software Currently Available

Dragon NaturallySpeaking Version 3.0 (Personal, Preferred, Professional and Naturally Mobile) Web site: http://www.dragonsys.com/

Lernout & Hauspie Voice Xpress (Standard, Advanced & Professional) Web site: http://voicerecognition.com/1998/products/lernout_hauspie/voicexpressplus.html

IBM ViaVoice 98 (Standard, Home and Executive) Web site: http://www.compu-media.com/ibm.htm

IMSI Voice Direct (Personal and Executive)

Recommended Programs

Although we have not yet had the opportunity to examine the IMSI products, we now have the top end versions of the other three products. Dragon NaturallySpeaking Professional arrived some months ago, Voice Xpress Professional arrived just after Christmas and ViaVoice 98 Executive arrived this week. Apparently, Dragon is about to release NaturallySpeaking Version 3.5 in the United States. Although we have not had much time to evaluate the latest versions of Voice Xpress and ViaVoice, initial impressions tend to confirm the results of overseas testing, which indicate that although the IBM and Lernout & Hauspie products have some features and advantages over the Dragon product, such as flexible commands and better integration with other software, NaturallySpeaking scores a much better result for accuracy. It is the more accurate recognition, effective correction and training features, ease of use, ability to work with DragonDictate for full voice control and widespread acceptance of Dragon NaturallySpeaking as the leading edge of this technology which win its nomination as the recommended software for tertiary education use. For further information about the comparative performance of these three products, please refer to the PC Magazine article at http://www.zdnet.com/pcmag/features/speech98/index.html

If users do not require complete voice control of the computer, then NaturallySpeaking Preferred probably offers the best features per dollar. If full voice control of the computer is required, then it is necessary to use the Professional Version, or to obtain Dragon Dictate as well, which may in some cases be the cheaper option. Dragon Naturally Mobile is NaturallySpeaking Preferred, bundled with a digital hand-held recorder to enable dictation directly into the recorder, and then the transcription of the recorded material directly into the computer via a supplied connection. At the moment we are negotiating with the suppliers to obtain the recorder separately from the Naturally Mobile package, since we already have the mobile features as part of the NaturallySpeaking Professional version.

As yet, there are no Apple Macintosh versions of Continuous Speech programs, and Dragon has discontinued Power Secretary, the Apple version of Dragon Dictate. However, IBM has expressed some interest in adapting its programs for the Macintosh, Apple itself is planning on upgrading its PlainTalk voice navigation system, and the former developers of Power Secretary have formed a company called MacSpeech, which is working on developing a speech recognition program for the Macintosh.

Whichever way speech recognition for the Macintosh platform goes (Apple, IBM, MacSpeech or another developer), it is hard to see any Macintosh speech recognition products on the market in under 12 months, although stranger things have happened. For further information on Mac Speech developments, see articles at: http://macworld.zdnet.com/pages/november.98/Column.451/.html Or: http://www.zdnet.com/zdnn/stories/news/0,4586,2185230,00.html

Hardware Requirements

While the packages advertise a minimum system of a Pentium 166 MHz processor with 48 MB RAM, the reality is, while programs will work with this system, some features would not be available and the program would work very slowly. The programs also require between 130 and 250 MB hard disk space, plus between 10 and 20 MB of hard disk storage for each user who develops a voice file. To take advantage of all the features of the programs and have a program which operates reasonably quickly and effectively, a recommended computer would be at least a 300 MHz Pentium II with 64 MB (but preferably 128 MB) RAM.
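As a rough illustration of the disk space figures above, the following minimal sketch estimates the range of hard disk space needed on a shared computer, assuming each user develops one voice file. The size ranges come from this report; the function and variable names are hypothetical and are not part of any of the voice recognition packages.

```python
# Size ranges quoted in this report (megabytes). Names are illustrative only.
PROGRAM_MB = (130, 250)     # hard disk space for the program itself
VOICE_FILE_MB = (10, 20)    # extra space for each user's voice file


def disk_space_range_mb(num_users):
    """Return (low, high) estimated disk space in MB for num_users voice files."""
    if num_users < 0:
        raise ValueError("num_users must be non-negative")
    low = PROGRAM_MB[0] + num_users * VOICE_FILE_MB[0]
    high = PROGRAM_MB[1] + num_users * VOICE_FILE_MB[1]
    return low, high


if __name__ == "__main__":
    # Example: a single home user versus a shared lab machine.
    for users in (1, 5, 20):
        low, high = disk_space_range_mb(users)
        print(f"{users} user(s): {low} to {high} MB")
```

On these assumptions, a shared machine with 20 users would need roughly 330 to 650 MB, so the per-user voice files can outweigh the program itself.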

I have successfully run these new programs on a 233 MHz Pentium laptop with 98 MB of RAM and on a 266 MHz Pentium desktop with 64 MB of RAM, but they tend to be a little slow. Also, I have not installed the Best Match technology section of NaturallySpeaking, which improves the accuracy, but demands greater processor speed. A good quality Sound Blaster 16 or compatible sound card or better is required, and it is desirable to check the sound card in a particular laptop for compatibility, although it is now possible to improve recognition accuracy and performance by using microphone adapters such as the VXI Parrot Translator (see Accessories Section). However, it is better to maximize compatibility if possible.

Accessories

Microphones

All programs come supplied with their own headset microphone. These microphones vary in quality and features, and usually, the more expensive the program the better the microphone. Generally, the supplied microphones work quite well for the average user. Sometimes, to improve recognition accuracy and performance, or to meet particular requirements a person may have, purchasing another specialised microphone may be desirable. I will not attempt a detailed analysis of the available microphones here, but concentrate on the microphone we have purchased and successfully tested.

VXI Parrot Translator 10/3

Having had problems with the compatibility of the supplied microphones and the sound system of the Compaq laptop purchased for this project, I was interested to find out about a microphone from the VXI Corp. which featured the Parrot Translator, an electronic device which is designed to adjust the output characteristics of the microphone to match the input characteristics of the sound card. After much tracking down, negotiation and waiting, I obtained a VXI Parrot 10/3 Microphone with Translator from Auscript, the importers of Dragon products. This microphone produced about 30% better recognition than the standard microphones. On the desktop computer, which already had good recognition, a lesser, but still significant improvement was noted. If sound card compatibility is a problem, then this microphone may be one solution. Auscript is now able to supply these microphones plus bios boxes for older microphones.

For further information about VXI microphones, including pricing, contact David Horwitz at Auscript: phone (02) 9238 6575, email dhorwitz@auscript.com.au, or visit the VXI website at: http://www.vxicorp.com

Integration with Other Assistive Technology

We have tested Dragon NaturallySpeaking with two other Assistive Technology programs. As Trevor Wilks reports in his section of this report, NaturallySpeaking works well with ZoomText, the Screen Enlarger program for people with vision impairments, with no apparent clashes or problems in implementation.

We have also used NaturallySpeaking in conjunction with JAWS 3.0, a Screen Reading program. JAWS works very well with all aspects of NaturallySpeaking, except in the General Training Section, where it will not read the text on the screen to be read into the computer to develop an initial voice file. This is an essential stage in being able to use Voice recognition. Although NaturallySpeaking does not allow JAWS to read the text in the Training section, this can be overcome by having someone read the text to the person doing the training, then that person repeating the text into the microphone, which is then registered as part of the training. The microphone does not pick up the initial reading, and will wait for the text to be repeated into the microphone. I have tested this with a student who was colour blind to the blue text which initially appears on the screen, before it is registered, and the process works quite effectively. It would be better if JAWS worked directly in the Training section, but at least this is a method which can provide students with vision or reading difficulties with access to the technology, with only a little initial outside support.

Maths and Science Voice Programs

With a client who needed to use Voice Recognition for Maths & Computer Aided Design, Trevor Wilks, Manager of the Adtech Centre at the University of Newcastle, tracked down some specialised Mathematics & CAD programs produced by Metroplex in the United States, and we have arranged to obtain evaluation copies of the programs. Those programs have now arrived and we are in the process of evaluating them. The programs are:

Products available from Metroplex Voice Products

Web address: http://www.metroplexvoice.com/prod.htm
  • (DD - uses Dragon Dictate, NSD - uses Naturally Speaking Deluxe)
  • ArithmeticTalk© - allows one to do addition, subtraction, division, and multiplication by voice. Designed for grades 1-5. This program was developed when educators conveyed to us the real need for a voiced arithmetic program. A readback version is available. (DD)
  • See how ArithmeticTalk works: http://www.metroplexvoice.com/atalk.htm
  • MathTalk© - the only voiced math program in the world that allows one to voice mathematics from grade 6 through Ph.D and professional levels. Over 2,400 math commands make MathTalk a powerful and versatile voiced math program. It works in tandem with word processors MSWord™ and WordPerfect™ and toggles to MathType™ for technical math processing. (DD)
  • See how Mathtalk works: http://www.metroplexvoice.com/mtalk.htm
  • MathTalk All Versions© is the same as MathTalk plus ArithmeticTalk. (DD)
  • MathTalk Deluxe© is the same as MathTalk and works with both Dragon Dictate and Naturally Speaking Deluxe. (DD, NSD)
  • MathBrailleTalk© - allows teachers/transcribers of the visually impaired to voice mathematics using the new Duxbury Braille Translator™. The voiced text and mathematics emboss and/or print in Braille, thus saving time and hundreds of keystrokes per lesson, as well as relieving the teacher of remembering all of those keystrokes. (DD)
  • See how MathBraille Talk works: http://www.metroplexvoice.com/braille.htm
  • VoiceEZcalc© - operates the Windows 95™ calculator by voice for use with all of the programs. A readback version is available. (DD)
  • VoiceEZsci© - runs Scientific Notebook™ by voice. All levels math software program with graphing capabilities. There is a Nemeth Braille filter that prints math/text into Braille. A readback version is being developed. For more information, link to scinotebook.tcisoft.com or www.nmsu.edu/~mavis. (DD)
  • See how VoiceEZsci works: http://www.metroplexvoice.com/sci.htm
  • VoiceEZcad© - operates AutoCAD™ by voice. Works with version 13 or 14. AutoCAD lt.97 is also available. Forget the keyboard, the mouse and remembering the keystrokes; just (DD)
  • See how VoiceEZCad works: http://www.metroplexvoice.com/cad.htm

Once the process of evaluation is under way, and we have more information about the effectiveness & features of the programs, we will post the information on the relevant List Servers. These products are not yet available in Australia.

Electronic Speech Enhancer

The Electronic Speech Enhancer (Web site: http://www.speechenhancer.com/ESEAdln1.html) is a small electronic device which a person can either wear around their waist or strap to a wheelchair to process, enhance, clarify and amplify their speech. The system uses a Proximity Microphone to pick up and amplify the signal closest to it, not the loudest signal. This means that when a person speaks into the Speech Enhancer in a noisy environment, it amplifies their voice, and not the background noise. It is designed to be used by people with a wide range of speech difficulties, and the speech produced is a clarified version of their own voices. I have been following the development of this technology since late 1996, and until very recently, was unable to see one in operation, since they were not available in Australia.

Late last year, I became aware that there was now a Speech Enhancer and a trained evaluator and operator in Australia, at the Spastic Centre in Sydney. I made arrangements with Colin Slattery, the Spastic Centre Technical Officer, to come to Newcastle to evaluate two people with Cerebral Palsy on the machine, and to make some preliminary assessment on its effectiveness. The evaluation and demonstration was conducted on the 2nd February at the Adtech Centre at the University of Newcastle, and the results were most impressive.

Both people tested showed a marked improvement in clarity and intelligibility in their speech, and the proximity microphone worked very effectively. The unit was tested for its noise filtering abilities by whispering into the microphone beside a noise generator developing 75 dB of noise. The whispered voice was able to be heard clearly, with no apparent amplification of the background noise. We also tested its real life capabilities by having one of the people use it in the University Union Coffee Lounge. Once again, its ability to cancel background noise was impressive.

Both people showed a marked improvement in the clarity of their speech, so much so that one of them, who has quite severe dysarthria and whom I have known for about 6 years, was able to conduct her first fluent conversation with me, with few repetitions and misunderstandings. According to the manufacturers, most people who will benefit from this technology can expect about a 10 to 15% improvement in the intelligibility of their speech. This was confirmed by our testing. This can be sufficient for people to communicate effectively in person, on the telephone or to use Voice Recognition technology.

This technology is not for everyone with speech difficulties, and is not a "magic bullet" to solve all speech problems. A person has to learn to use the technology effectively, adjust their speaking style and allow the machine to assume some of the roles they previously used their voice for. For example, when one of the people being evaluated raised her voice to overcome a noisy environment, the recognition dropped off markedly.

The production models, to become available later this year, will have an output to plug directly into a computer, to use with Voice recognition. The unit is currently undergoing evaluation and government certification. It will not be cheap (about $8,000 to $10,000) but may become a very liberating and effective tool for many people. The two people we tested are very enthusiastic about the technology, and are very keen to obtain their own machines when they become available.

University Of Newcastle Adtech Centre

Included below is a report from Trevor Wilks, Manager of the University of Newcastle Adaptive Technology Centre on their involvement and experiences with Voice Recognition software:

VOICE RECOGNITION RESEARCH PROJECT UPDATE

University of Newcastle Adaptive Technology Centre

The Centre has adopted "Dragon Naturally Speaking" as its preferred voice recognition package. We have tested the competing products but have concluded that Dragon is currently the superior software for use by tertiary students with a disability. Some of the reasons for this include:

Correction Method

Dragon's method of correcting errors via the "correction dialogue box" is very disability friendly. This is a leftover from the Dragon Dictate platform and gives the user a choice list of possible replacement words and phrases. This reduces the amount of typing or voiced spelling a user must perform to make a correction. As the recognition increases, the probability of the correct word or phrase being one of the choices in the correction box dramatically increases. The user then only needs to "choose" the right word and the correction is made, and an adjustment is also made to the user's voice file.

Affordability

Students can purchase Dragon Naturally Speaking Ver 3.0 Standard edition through the University Software Sales office for $197.00. This is a special price for students and staff of Newcastle Uni.

Recognition Accuracy

Naturally Speaking still seems to have the edge on both of the competing products (ViaVoice and VoiceXpress). Dragon's "Best Match" technology has definitely increased its accuracy if installed on a PC with the appropriate specifications.

Voice Enrolment Process

Dragon's use of a literature based voice enrolment process is a lot more user friendly than its competitors. It is believed that it also provides a more realistic voice profile for continuous voice recognition.

Appropriateness of VR software for Specific types of Disabilities.

Voice recognition software has dramatically increased the education options for people with certain types of disability. It is a powerful tool and when combined with other support services can facilitate a positive academic outcome for many students. It is though, like most types of assistive software and hardware, context and person specific. Some students, regardless of the time spent in training and support, never get used to the idea of speaking to a computer and specifically verbalising, as opposed to writing in whatever form, their academic output. So for some students it is the door to a whole new world and for others it's just another piece of over-rated technology which they failed to master.

The challenge facing the Adaptive Technologist is to assess the context, the student's requirements and also their level of motivation before committing to the training and support process. If these three factors are not considered, then a lot of time and energy can be wasted both for the student and the trainer without a positive outcome.

Alternatives to voice recognition, such as screen-reading software or TEXTHELP software should be considered in many instances to increase a person's keyboard skills where possible. Some students demand to use only VR software rather than other forms of assistive technology as a way of avoiding dealing with the keyboard and their lack of typing skills. That is to say that there are many ways to get to the same point and voice recognition is only one of them.

Physical Disability

Keyboard Impairment

A number of students with a physical disability who have difficulty using a computer via a keyboard and mouse have benefited greatly from using VR software. The types of disability include arthritis, RSI, injury to the hands or arms and quadriplegia.

Back and Neck Injury

A number of students with a neck or back injury have purchased VR software for use on their home PCs after a demonstration at the Centre. Using VR software alleviates some of the ergonomic problems associated with using a computer. They are able to maintain a relaxed, correct posture and in some cases lie on the floor whilst dictating to their PC.

Vision Impairment

The Centre has demonstrated VR software for a number of vision impaired students. Dragon Naturally Speaking has been used successfully with Zoomtext Extra screen magnification software using 2× magnification. Using the two types of adaptive software may assist some VI students with the text entry process. This strategy has yet to be implemented but will be re-visited during 1999.

Learning Disabilities

Students with a learning disability currently constitute close to 20% of students registered with the Uni of Newcastle's disability service. Some of these students have recently begun using Voice Recognition Software (Naturally Speaking) to address some of the difficulties they experience producing print material using conventional methods. Many of these students have developed strong verbal communication skills and as such VR software offers them an effective strategy in the Tertiary environment. Once a student with a learning disability, such as dyslexia, is shown that VR software does work and enters the training phase, they experience a new sense of empowerment and control as well as an increase in confidence and self esteem. This can be a great platform from which to launch an attack on academia.

Trevor Wilks

Manager, Adtech Centre, University of Newcastle

CONCLUSION

This project has created a lot of interest, and from feedback received so far, has met a real need within the university disability support field in particular. The nature of the project involves ongoing monitoring of the technology. The extensive, complex and specialised nature of Assistive Technology information requires specialised skills, and the shortage of specialist Assistive Technology staff within the university sector means that this type of information is invaluable in making decisions about what technology to purchase, and which students may benefit from its use. I believe that we have established a very effective model for future technology support mechanisms within the university sector. This does not replace individual universities' obligations to supply appropriate staffing in this field, but is a supplementary mechanism which can enhance the appropriate use of technology to assist students with disabilities to achieve a higher level of independence and access to higher education.

Voice Recognition Technology is in the early stages of its very rapid development, and I believe that we will see even more developments in the near future. Even though this technology currently has its limitations, many students with disabilities are able to benefit significantly from the technology at the moment. Future developments should see even greater advantages and efficiencies associated with its incorporation into education.