Text-to-Speech Technology-Based Programming Tool
ABSTRACT
Introduction
Various scales have been developed to describe the extent of vision loss and
define blindness.[1] Total blindness is the complete lack of form and visual light
perception and is clinically recorded as NLP, an abbreviation for "no light
perception."[1] Blindness is frequently used to describe severe visual
impairment with residual vision. Those described as having only light perception
have no more sight than the ability to tell light from dark and the general direction
of a light source.
In order to determine which people may need special assistance because of their
visual disabilities, various governmental jurisdictions have formulated more
complex definitions referred to as legal blindness.[2] In North America and most
of Europe, legal blindness is defined as visual acuity (vision) of 20/200 (6/60) or
less in the better eye with best correction possible. This means that a legally
blind individual would have to stand 20 feet (6.1 m) from an object to see it—
with corrective lenses—with the same degree of clarity as a normally sighted
person could from 200 feet (61 m). In many areas, people with average acuity
who nonetheless have a visual field of less than 20 degrees (the norm being 180
degrees) are also classified as being legally blind. Approximately ten percent of
those deemed legally blind, by any measure, have no vision.
The rest have some vision, from light perception alone to relatively good
acuity. Low vision is sometimes used to describe visual acuities from 20/70 to
20/200.[3]
Blind people with undamaged eyes may still register light non-visually for the
purpose of circadian entrainment to the 24-hour light/dark cycle. Light signals for
this purpose travel through the retinohypothalamic tract, so a damaged optic
nerve beyond where the retinohypothalamic tract exits it is no hindrance.
Causes
Serious visual impairment has a variety of causes:
Diseases
Eye injuries, most often occurring in people under 30, are the leading cause of
monocular blindness (vision loss in one eye) throughout the United States.
Injuries and cataracts affect the eye itself, while abnormalities such as optic
nerve hypoplasia affect the nerve bundle that sends signals from the eye to the
back of the brain, which can lead to decreased visual acuity.
People with injuries to the occipital lobe of the brain can, despite having
undamaged eyes and optic nerves, still be legally or totally blind.
Genetic defects
People with albinism often have vision loss to the extent that many are legally
blind, though few of them actually cannot see. Leber's congenital amaurosis can
cause total blindness or severe sight loss from birth or early childhood.
Recent advances in mapping of the human genome have identified other genetic
causes of low vision or blindness. One such example is Bardet-Biedl syndrome.
Poisoning
Methanol breaks down in the body into the
substances formaldehyde and formic acid, which in turn can cause blindness, an
array of other health complications, and death. [15] Methanol is commonly found
in methylated spirits (denatured ethyl alcohol), where it is added so that the
product avoids the taxes on ethanol intended for human consumption. Methylated
spirits are sometimes used by alcoholics as a desperate and cheap substitute for
regular ethanol alcoholic beverages.
Willful actions
Comorbidities
Management
A 2008 study published in the New England Journal of Medicine [19] tested the
effect of using gene therapy to help restore the sight of patients with a rare form
of inherited blindness, known as Leber Congenital Amaurosis or LCA. Leber
Congenital Amaurosis damages the light receptors in the retina and usually
begins affecting sight in early childhood, with worsening vision until complete
blindness around the age of 30.
The study used a common cold virus to deliver a normal version of the gene
called RPE65 directly into the eyes of affected patients. Remarkably, all three
patients, aged 19, 22 and 25, responded well to the treatment and reported improved
vision following the procedure. Given the age of the patients and the
degenerative nature of LCA, the improvement in vision is
encouraging for researchers. It is hoped that gene therapy may be even more
effective in younger LCA patients who have experienced limited vision loss, as
well as in other blind or partially blind individuals.
Two experimental treatments for retinal problems include a cybernetic
replacement and transplant of fetal retinal cells. [20]
Mobility
However, techniques for cane travel can vary depending on the user and/or the
situation. Some visually impaired persons do not carry these kinds of canes,
opting instead for the shorter, lighter identification (ID) cane. Still others require a
support cane. The choice depends on the individual's vision, motivation, and
other factors.
A small number of people employ guide dogs to assist in mobility. These dogs
are trained to navigate around various obstacles, and to indicate when it
becomes necessary to go up or down a step. However, the helpfulness of guide
dogs is limited by the inability of dogs to understand complex directions. The
human half of the guide dog team does the directing, based upon skills acquired
through previous mobility training. In this sense, the handler might be likened to
an aircraft's navigator, who must know how to get from one place to another, and
the dog to the pilot, who gets them there safely.
Some people access these materials through agencies for the blind, such as
the National Library Service for the Blind and Physically Handicapped in the
United States, the National Library for the Blind or the RNIB in the United
Kingdom.
Closed-circuit televisions, equipment that enlarges and contrasts textual items,
are a more high-tech alternative to traditional magnification devices.
There are also over 100 radio reading services throughout the world that provide
people with vision impairments with readings from periodicals over the radio. The
International Association of Audio Information Services provides links to all of
these organizations.
Computers
On US coins, pennies and dimes, and nickels and quarters are
similar in size. The larger denominations (dimes and quarters) have ridges along
the sides (historically used to prevent the "shaving" of precious metals from the
coins), which can now be used for identification.
Epidemiology
The WHO estimates that in 2002 there were 161 million visually impaired people
in the world (about 2.6% of the total population). Of this number 124 million
(about 2%) had low vision and 37 million (about 0.6%) were blind. [22] In order of
frequency the leading causes were cataract, uncorrected refractive errors (near
sighted, far sighted, or an astigmatism), glaucoma, and age-related macular
degeneration.[23] In 1987, it was estimated that 598,000 people in the United
States met the legal definition of blindness.[24] Of this number, 58% were over the
age of 65.[24] In 1994-1995, 1.3 million Americans reported legal blindness. [25]
Speech synthesis
For specific usage domains, the storage of entire words or sentences allows for
high-quality output. Alternatively, a synthesizer can incorporate a model of
the vocal tract and other human voice characteristics to create a completely
"synthetic" voice output.[2]
The quality of a speech synthesizer is judged by its similarity to the human voice
and by its ability to be understood. An intelligible text-to-speech program allows
people with visual impairments or reading disabilities to listen to written works
on a home computer. Many computer operating systems have included speech
synthesizers since the early 1980s.
History
Long before electronic signal processing was invented, there were those who
tried to build machines to create human speech. Some early legends of the
existence of "speaking heads" involved Gerbert of Aurillac (d. 1003 AD), Albertus
Magnus (1198–1280), and Roger Bacon (1214–1294).
Dominant systems in the 1980s and 1990s were the MITalk system, based
largely on the work of Dennis Klatt at MIT, and the Bell Labs system; [8] the latter
was one of the first multilingual language-independent systems, making
extensive use of Natural Language Processing methods.
Early electronic speech synthesizers sounded robotic and were often barely
intelligible. The quality of synthesized speech has steadily improved, but output
from contemporary speech synthesis systems is still clearly distinguishable from
actual human speech.
Electronic devices
The first computer-based speech synthesis systems were created in the late
1950s, and the first complete text-to-speech system was completed in 1968. In
1961, physicist John Larry Kelly, Jr and colleague Louis Gerstman[10] used
an IBM 704 computer to synthesize speech, an event among the most prominent
in the history of Bell Labs. Kelly's voice recorder synthesizer (vocoder) recreated
the song "Daisy Bell", with musical accompaniment from Max Mathews.
Coincidentally, Arthur C. Clarke was visiting his friend and colleague John Pierce
at the Bell Labs Murray Hill facility. Clarke was so impressed by the
demonstration that he used it in the climactic scene of his screenplay for his
novel 2001: A Space Odyssey,[11] where the HAL 9000 computer sings the same
song as it is being put to sleep by astronaut Dave Bowman.[12] Despite the
success of purely electronic speech synthesis, research is still being conducted
into mechanical speech synthesizers.[13]
Synthesizer technologies
Concatenative synthesis
Unit selection provides the greatest naturalness, because it applies only a small
amount of digital signal processing (DSP) to the recorded speech. DSP often
makes recorded speech sound less natural, although some systems use a small
amount of signal processing at the point of concatenation to smooth the
waveform. Unit selection algorithms have also been known to select segments from a place that
results in less than ideal synthesis (e.g. minor words become unclear) even when
a better choice exists in the database.[19]
Diphone synthesis
Domain-specific synthesis
Formant synthesis
Formant synthesis does not use human speech samples at runtime. Instead, the
synthesized speech output is created using additive synthesis and an acoustic
model (physical modelling synthesis).[23] Parameters such as fundamental
frequency, voicing, and noise levels are varied over time to create a waveform of
artificial speech. This method is sometimes called rules-based synthesis;
however, many concatenative systems also have rules-based components. Many
systems based on formant synthesis technology generate artificial, robotic-
sounding speech that would never be mistaken for human speech. However,
maximum naturalness is not always the goal of a speech synthesis system, and
formant synthesis systems have advantages over concatenative systems.
Formant-synthesized speech can be reliably intelligible, even at very high
speeds, avoiding the acoustic glitches that commonly plague concatenative
systems. High-speed synthesized speech is used by the visually impaired to
quickly navigate computers using a screen reader. Formant synthesizers are
usually smaller programs than concatenative systems because they do not have
a database of speech samples. They can therefore be used in embedded
systems, where memory and microprocessor power are especially limited.
Because formant-based systems have complete control of all aspects of the
output speech, a wide variety of prosodies and intonations can be output,
conveying not just questions and statements, but a variety of emotions and tones
of voice.
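As a rough illustration of how formant synthesis builds a waveform without recorded samples, the sketch below sums sinusoids at multiples of a fundamental frequency and weights each one by how close it lies to a set of formant peaks. It is a minimal sketch only: the function names, the resonance-like weighting, and the formant values are assumptions made for this example, not part of any system described here.

```python
import math

def formant_gain(freq_hz, formants):
    """Weight a frequency by its closeness to each (center_hz, bandwidth_hz) formant."""
    return sum(1.0 / (1.0 + ((freq_hz - center) / bandwidth) ** 2)
               for center, bandwidth in formants)

def synthesize_vowel(f0_hz, formants, duration_s=0.1, sample_rate=8000):
    """Additive synthesis: sum harmonics of f0, shaped by the formant weighting."""
    n_samples = round(duration_s * sample_rate)
    nyquist = sample_rate / 2
    # Collect every harmonic of the fundamental below the Nyquist frequency.
    harmonics = []
    k = 1
    while k * f0_hz < nyquist:
        freq = k * f0_hz
        harmonics.append((freq, formant_gain(freq, formants)))
        k += 1
    total_gain = sum(gain for _, gain in harmonics)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = sum(gain * math.sin(2 * math.pi * freq * t)
                    for freq, gain in harmonics)
        samples.append(value / total_gain)  # keep the output within [-1, 1]
    return samples

# Illustrative (assumed) formant values for an "ah"-like vowel.
AH_FORMANTS = [(700.0, 100.0), (1200.0, 120.0), (2600.0, 160.0)]
samples = synthesize_vowel(f0_hz=120.0, formants=AH_FORMANTS)
```

Changing `f0_hz` over time would alter the perceived pitch, and moving the formant centers would change the vowel quality, which is the kind of parameter control the paragraph above refers to.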
Articulatory synthesis
Until recently, articulatory synthesis models have not been incorporated into
commercial speech synthesis systems. A notable exception is the NeXT-based
system originally developed and marketed by Trillium Sound Research, a spin-off
company of the University of Calgary, where much of the original research was
conducted.
HMM-based synthesis
Challenges
Text normalization challenges
There are many spellings in English which are pronounced differently based on
context. For example, "My latest project is to learn how to better project my
voice" contains two pronunciations of "project".
Recently TTS systems have begun to use HMMs (discussed above) to generate
"parts of speech" to aid in disambiguating homographs. This technique is quite
successful for many cases such as whether "read" should be pronounced as
"red" implying past tense, or as "reed" implying present tense. Typical error rates
when using HMMs in this fashion are usually below five percent. These
techniques also work well for most European languages, although access to
required training corpora is frequently difficult in these languages.
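Once a tagger has assigned a part of speech to each word, picking the pronunciation of a homograph reduces to a table lookup. The sketch below shows only that final step and assumes the tags are already available; the HMM tagger itself is not shown, and the tag names, the table, and the respellings for "project" are hypothetical choices for illustration.

```python
# Hypothetical pronunciation table keyed by (spelling, part-of-speech tag).
HOMOGRAPHS = {
    ("read", "VERB_PAST"): "red",
    ("read", "VERB_PRESENT"): "reed",
    ("project", "NOUN"): "PROJ-ect",
    ("project", "VERB"): "pro-JECT",
}

def pronounce(tagged_words):
    """Replace each homograph by the respelling that matches its tag.

    `tagged_words` is a list of (word, tag) pairs, assumed to come from
    some part-of-speech tagger. Words not in the table pass through unchanged.
    """
    return [HOMOGRAPHS.get((word.lower(), tag), word)
            for word, tag in tagged_words]

tagged = [("My", "DET"), ("latest", "ADJ"), ("project", "NOUN"),
          ("is", "VERB"), ("to", "PART"), ("learn", "VERB"),
          ("how", "ADV"), ("to", "PART"), ("better", "ADV"),
          ("project", "VERB"), ("my", "DET"), ("voice", "NOUN")]
spoken = pronounce(tagged)
```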
Deciding how to convert numbers is another problem that TTS systems have to
address. It is a simple programming challenge to convert a number into words (at
least in English), like "1325" becoming "one thousand three hundred twenty-five."
However, numbers occur in many different contexts; "1325" may also be read as
"one three two five", "thirteen twenty-five" or "thirteen hundred and twenty five".
A TTS system can often infer how to expand a number based on surrounding
words, numbers, and punctuation, and sometimes the system provides a way to
specify the context if it is ambiguous.[29] Roman numerals can also be read
differently depending on context. For example "Henry VIII" reads as "Henry the
Eighth", while "Chapter VIII" reads as "Chapter Eight".
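The different readings of "1325" can be sketched as three small expansion functions plus a selector. This is a minimal sketch limited to 0 through 9999; the context labels ("cardinal", "digits", "pairs") and all function names are assumptions for this example, and a real front end would infer the context from surrounding text.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

def two_digits(n):
    """Words for an integer in the range 0..99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"

def cardinal(n):
    """Words for an integer in the range 0..9999, read as a quantity."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only handles 0..9999")
    parts = []
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    if thousands:
        parts.append(f"{ONES[thousands]} thousand")
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if rest or not parts:
        parts.append(two_digits(rest))
    return " ".join(parts)

def digit_by_digit(text):
    """Read each digit separately, as for a code or phone number."""
    return " ".join(ONES[int(ch)] for ch in text)

def as_pairs(text):
    """Read a four-digit string as two two-digit numbers, as for many years."""
    if len(text) != 4 or not text.isdigit():
        raise ValueError("expected exactly four digits")
    return f"{two_digits(int(text[:2]))} {two_digits(int(text[2:]))}"

def expand_number(text, context):
    """Pick a reading based on a context label supplied by the caller."""
    if context == "cardinal":
        return cardinal(int(text))
    if context == "digits":
        return digit_by_digit(text)
    if context == "pairs":
        return as_pairs(text)
    raise ValueError(f"unknown context: {context}")
```

For example, `expand_number("1325", "cardinal")` gives "one thousand three hundred twenty-five", while the "digits" and "pairs" contexts give "one three two five" and "thirteen twenty-five". The pair reading is naive for strings such as "1905", which would need a special case.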
Similarly, abbreviations can be ambiguous. For example, the abbreviation "in" for
"inches" must be differentiated from the word "in", and the address "12 St John
St." uses the same abbreviation for both "Saint" and "Street". TTS systems with
intelligent front ends can make educated guesses about ambiguous
abbreviations, while others provide the same result in all cases, resulting in
nonsensical (and sometimes comical) outputs.
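One simple educated guess for the "St" example is to look at the following word: before a capitalized name it is read as "Saint", otherwise as "Street". The sketch below implements only that single heuristic, which is an assumption chosen for illustration and will misread inputs such as "Main St Apartments"; it is not a description of how any particular TTS front end works.

```python
def expand_st(text):
    """Expand the abbreviation "St" using the capitalization of the next word."""
    tokens = text.split()
    expanded = []
    for i, token in enumerate(tokens):
        core = token.rstrip(".,")
        if core != "St":
            expanded.append(token)
            continue
        # Keep trailing punctuation except the abbreviation's own period.
        suffix = token[len(core):].replace(".", "")
        next_token = tokens[i + 1] if i + 1 < len(tokens) else ""
        if next_token[:1].isupper():
            expanded.append("Saint" + suffix)   # "St John" -> "Saint John"
        else:
            expanded.append("Street" + suffix)  # "John St." -> "John Street"
    return " ".join(expanded)
```

Applied to the address from the text, `expand_st("12 St John St.")` yields "12 Saint John Street".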
Text-to-phoneme challenges
Determining the correct pronunciation of
each word is a matter of looking up each word in the dictionary and replacing the
spelling with the pronunciation specified in the dictionary.
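The dictionary lookup can be sketched in a few lines. The two entries below use ARPAbet-style phoneme symbols and are illustrative only; the table, the function name, and the choice to report unknown words separately are assumptions for this example.

```python
# Tiny illustrative pronunciation dictionary (ARPAbet-style symbols).
PRONUNCIATIONS = {
    "hello": ["HH", "AH0", "L", "OW1"],
    "world": ["W", "ER1", "L", "D"],
}

def text_to_phonemes(text):
    """Look up each word; return (phonemes, words missing from the dictionary)."""
    phonemes, unknown = [], []
    for raw in text.split():
        word = raw.strip(".,;:!?\"'").lower()
        if not word:
            continue
        entry = PRONUNCIATIONS.get(word)
        if entry is None:
            unknown.append(word)  # a real system would need a fallback here
        else:
            phonemes.extend(entry)
    return phonemes, unknown
```

The `unknown` list makes the main weakness of the approach visible: any word missing from the dictionary has no pronunciation at all unless some other method handles it.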
Evaluation challenges
Dedicated hardware
Votrax
TMS5110A
TMS5200
Oki Semiconductor
ML22825 (ADPCM)
ML22573 (HQADPCM)
Toshiba T6721A
Philips PCF8200
Atari
Arguably, the first speech system integrated into an operating system was the
1400XL/1450XL personal computers designed by Atari, Inc. using the Votrax
SC01 chip in 1983. The 1400XL/1450XL computers used a Finite State Machine
to enable World English Spelling text-to-speech synthesis. [32] Unfortunately, the
1400XL/1450XL personal computers never shipped in quantity.
The Atari ST computers were sold with "stspeech.tos" on floppy disk.
Apple
In 10.4 (Tiger) and the first releases of 10.5 (Leopard) there was only one
standard voice shipping with Mac OS X. Starting with 10.6 (Snow Leopard), the
user can choose from a wide range of voices. VoiceOver voices
feature the taking of realistic-sounding breaths between sentences, as well as
improved clarity at high read rates over PlainTalk. Mac OS X also includes say,
a command-line based application that converts text to audible speech.
The AppleScript Standard Additions includes a say verb that allows a script to
use any of the installed voices and to control the pitch, speaking rate and
modulation of the spoken text.
AmigaOS
Microsoft Windows
Android
Internet
The most recent TTS development in the web browser is the JavaScript Text to
Speech work of Yury Delendik, which ports the Flite C engine to pure JavaScript.
This allows web pages to convert text to audio using HTML5 technology. The
ability to use Yury's TTS port currently requires a custom browser build that uses
Mozilla's Audio-Data-API. However, much work is being done in the context of
the W3C to move this technology into the mainstream browser market through
the W3C Audio Incubator Group with the involvement of The BBC and Google
Inc.
Currently, there are a number of applications, plug-ins and gadgets that can
read messages directly from an e-mail client and web pages from a web
browser or Google Toolbar, such as voice, which is an add-on to Firefox. Some
specialized software can narrate RSS feeds. On one hand, online RSS narrators
simplify information delivery by allowing users to listen to their favorite news
sources and to convert them to podcasts. On the other hand, online RSS
readers are available on almost any PC connected to the Internet. Users can
download generated audio files to portable devices, e.g. with the help
of a podcast receiver, and listen to them while walking, jogging or commuting to
work.
A growing field in Internet-based TTS is web-based assistive technology, e.g.
'Browsealoud' from a UK company and ReadSpeaker. These can deliver TTS
functionality to anyone (for reasons of accessibility, convenience, entertainment
or information) with access to a web browser. The non-profit
project Pediaphon was created in 2006 to provide a similar web-based TTS
interface to Wikipedia.[36] Additionally, SPEAK.TO.ME from Oxford Information
Laboratories is capable of delivering text to speech through any browser without
the need to download any special applications, and includes smart delivery
technology to ensure only what is seen is spoken and the content is logically
pathed.
Others
Applications
Software such as Vocaloid can generate singing voices via lyrics and melody.
This is also the aim of the Singing Computer project (which uses GNU
LilyPond and Festival) to help blind people check their lyric input.[41]
Besides these applications, text-to-speech software is also popular
in interactive voice response systems, often in combination with speech
recognition. Examples of such voices can be found
at speechsynthesissoftware.com or Nextup.
Visual Studio includes a code editor supporting IntelliSense as well as code
refactoring. The integrated debugger works both as a source-level debugger and
a machine-level debugger. Other built-in tools include a forms designer for
building GUI applications, web designer, class designer, and database
schema designer. It accepts plug-ins that enhance the functionality at almost
every level, including adding support for source-control systems
(like Subversion and Visual SourceSafe) and adding new toolsets like editors and
visual designers for domain-specific languages or toolsets for other aspects of
the software development lifecycle (like the Team Foundation Server client:
Team Explorer).
Introduction
However, the costs for these tools are high and there is no tool that integrates
the environment for compiling and debugging programs. Furthermore, there is
not enough assistance to help them learn to program in the leading-edge
language C#. Blind programmers could compete in the IT industry when its
infrastructure was better suited to mainframes [6]. These days, with computers
throughout the workplace, graphical windows applications are far more common. This
means that blind programmers are now at a competitive disadvantage in the
workplace and require special tools to be productive.
Blind and vision impaired people require two things to become programmers.
They need up to date knowledge of leading technology, and tools that meet
their own requirements [7]. This affects employment levels for blind and low
vision people. With the current unemployment rate for blind and vision impaired people
at almost 70%, which is over four times the national average, specialized tools
could help a great many people [8]. Our research project is to design an audio
programming tool that meets the specific needs of blind and vision impaired people
learning the C# programming language.
There are different forms of visual impairment: some people are blind from birth
or from a very early age, while others lose their sight as a result of accidents, disease
or some effects of medication [10]. Therefore we concentrate on text-to-speech
technology and we assume that blind and vision impaired people are not hearing
impaired. The text-to-speech technology is used to make all components in the
programming tool voice enabled. Text and other graphics features such as
control size, location, and color that a normal vision user can see on the screen
will be spoken out by a speech synthesizer.
This tool has opened a great possibility that allows blind and vision impaired
users to become programmers in the future. Currently, blind and vision impaired
people have little access to current tools and assistance required for them to
learn programming languages. Our aim is to help them achieve equality of
access and opportunity in information technology education that will ensure
meaningful and equitable employment for their lives.
We have invited blind and vision impaired people to evaluate our programming
tool. Evaluations have shown that the tool can help them design and implement
programs effectively. Our research project can potentially impact the lives of blind
and low vision people. This coupled with the impending labor shortage, as the
baby boomers retire, means that anything that can give blind people an
opportunity to acquire practical, technical qualifications could greatly benefit blind
people and the whole economy. A tool that teaches programming is also a
programming tool and it can potentially give jobs to people who were previously
unemployable. Our research project will also impact software development
companies, governments, and educational institutions to develop software
packages, educational programs and policies that meet the needs of blind and
vision impaired people.
Those are converted from text to speech and read aloud. The Kurzweil system scans
documents, stores them in files, and converts them to audio output [13]. Furthermore,
Optical Braille Recognition (OBR) allows a user to scan a Braille page and
convert it into text [14]. This is a Windows software application to retrieve
information that can be presented as the text used in all types of Windows
applications. The Braille information in a small letter can be retrieved into
computer form in the same easy way. For reading text materials in computer, the
most popular software for blind users is JAWS [3].
This software provides speech and Braille access to Windows operating system
and applications including Internet Explorer without the need of special
configurations. JAWS also provides a way to access Web pages. A research
project has been undertaken by Curtin University, Cisco Systems and the
Western Australia Association [10].
The project is to identify tools and techniques appropriate for vision impaired
students studying computing at tertiary level. Its recommended
improvements included professional development for lecturers and
improved student access to electronic educational materials.
This project aims to create a generation of blind computer users at different levels
nationwide, and to provide a community place to acquire computer skills and
share information. However, there is no existing software application designed to
help blind and vision impaired people learn programming subjects in information
technology.
It is seen that the more formats of material people can access, the higher their
employment opportunities are. There is a higher need for technical skills amongst
people who are blind or have low vision. Blind people require supporting tools
that meet their specific needs. The programming tool is designed not only for
blind users but also for vision impaired and normal vision users. The interface
should be designed in a way that complies with W3C standards for vision impaired
users and should be user friendly. The programming tool should be able to help a
blind user edit, save, compile, debug and run a program. Moreover, the tool
should have program templates and IntelliSense (auto-completion) options for
user convenience. In order to achieve these objectives, an iterative approach
was used. Each part was developed, tested then improved upon and tested
again.
This meant that usability issues were always found and improved. The tool has
been designed to provide voice for blind users and display suitable font,
font sizes and color scheme for vision impaired and normal vision people.
A user starts editing a new program or loads an existing program using the audio code
editor. The program in the editor can be saved to a file or can be compiled,
debugged and run. The code editor can speak each character as it is entered.
The user can use the left, right, up and down arrow keys to check any character in the
program by voice. Some of the key requirements for the code editor are as follows:
• Ask for the user’s confirmation before closing, saving a file or opening a file.
• Provide an option for the user to specify a line number and go to that line.
• For Windows applications, let the user design the graphical user interface by
typing details (size, location, text, name, etc.) in the code editor. The code editor
will convert the details to C# code and place the code in a file.
• Help the user write code quickly and correctly by speaking out properties,
classes, etc.
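Speaking each character requires a spoken name for symbols that a synthesizer might otherwise skip or mispronounce. The following Python sketch shows one way such a mapping could look; the symbol names, the "capital" prefix, and the function names are assumptions for illustration and do not describe the tool's actual implementation, which is written for .NET.

```python
# Hypothetical spoken names for characters common in C# source code.
SYMBOL_NAMES = {
    " ": "space", "\t": "tab", "\n": "new line",
    "{": "open brace", "}": "close brace",
    "(": "open parenthesis", ")": "close parenthesis",
    "[": "open bracket", "]": "close bracket",
    ";": "semicolon", ".": "dot", ",": "comma",
    "=": "equals", "+": "plus", "-": "minus", '"': "double quote",
}

def spoken_name(ch):
    """Return the phrase a synthesizer should say for a single character."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    if ch in SYMBOL_NAMES:
        return SYMBOL_NAMES[ch]
    if ch.isalpha() and ch.isupper():
        return f"capital {ch}"  # distinguish case, which matters in C#
    return ch  # lowercase letters, digits and unlisted symbols as-is

def spell_out(line):
    """Spoken names for every character of a line, e.g. for arrow-key review."""
    return [spoken_name(ch) for ch in line]
```

For example, `spell_out("x = 1;")` returns `["x", "space", "equals", "space", "1", "semicolon"]`, which could be passed one item at a time to a speech engine as the user moves the cursor.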
The code compiler uses the C# software development toolkit (SDK) to compile
the program. However, to produce voice output, a code modifier first adds voice code
to the current program, and the C# SDK then compiles the
modified program. For a console application, voice code is added by
identifying the code that outputs text and adding matching code for voice.
For a Windows program, adding code for voice is more complex. Mouse and key
event handlers are added so that the user can use the mouse or keyboard to design a
Windows form. Voice is output when a control on the Windows form receives
focus, to let the user know what the control is. The compiler also lets the user
know whether the compilation succeeded or produced an error.
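For the console case, the code-modification step can be pictured as a source-to-source pass that finds each text-output statement and inserts a voice call after it. The Python sketch below is one simplified way to do this with a regular expression; `VoiceHelper.Speak` is a hypothetical helper name, and the pattern only handles single-line `Console.WriteLine(...)` statements, so it is an illustration of the idea rather than the tool's actual modifier.

```python
import re

# Matches a single-line statement of the form: Console.WriteLine(<argument>);
WRITE_LINE = re.compile(r"^(\s*)Console\.WriteLine\((.*)\);\s*$")

def add_voice_calls(source):
    """Insert a (hypothetical) VoiceHelper.Speak call after each WriteLine."""
    output_lines = []
    for line in source.splitlines():
        output_lines.append(line)
        match = WRITE_LINE.match(line)
        # Skip argument-less calls such as Console.WriteLine();
        if match and match.group(2).strip():
            indent, argument = match.group(1), match.group(2)
            output_lines.append(f"{indent}VoiceHelper.Speak({argument});")
    return "\n".join(output_lines)

original = 'static void Main()\n{\n    Console.WriteLine("Hello");\n}'
modified = add_voice_calls(original)
```

The original file is left untouched and only the modified text is handed to the compiler, which is consistent with voice output being available only when the program is run inside the audio tool.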
When there is a compiling error, the tool tells the user that there are compiling
errors and then reads out the details of every error, with the file name and line number. If
the user presses predefined shortcut keys, it stops reading, jumps to that line in
that file and reads that line to the user. The user can then fix the code and
press the key combination to hear the next error, if any.
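Reading errors aloud and jumping to the offending line both depend on extracting the file name, line number and message from the compiler output. The sketch below assumes error lines of the form `File.cs(line,column): error CODE: message`; that format, the function names, and the wording of the spoken sentence are assumptions for this example.

```python
import re

# Assumed shape of one compiler error line:
#   Program.cs(7,34): error CS1002: ; expected
ERROR_LINE = re.compile(
    r"^(?P<file>.+?)\((?P<line>\d+),(?P<column>\d+)\):\s*error\s+"
    r"(?P<code>\w+):\s*(?P<message>.*)$"
)

def parse_error(text):
    """Return a dict describing one error line, or None if it does not match."""
    match = ERROR_LINE.match(text.strip())
    if match is None:
        return None
    info = match.groupdict()
    info["line"] = int(info["line"])
    info["column"] = int(info["column"])
    return info

def to_speech(error):
    """Build the sentence to be read out for one parsed error."""
    return f"Error in file {error['file']}, line {error['line']}: {error['message']}"

error = parse_error("Program.cs(7,34): error CS1002: ; expected")
sentence = to_speech(error)
```

The parsed `file` and `line` fields are what a shortcut key would use to move the editor cursor, while `sentence` is the text passed to the speech engine.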
The code compiler uses the C# software development toolkit (SDK) to compile
the program. However, to have voice output, we add code for voice accordingly
to the program before it is compiled. This is done for any program that provides
non-graphics or graphics output. Mouse or key event handlers will be added to
provide audio output when the user moves the mouse over a control or presses
the Tab key to focus on that control.
When the user finishes the program and wants to compile and run it, the compiler
will analyze the program and add code to produce voice accordingly. The
modified program will be compiled and debugged. Errors, if any, will be output to a
file, and the speech SDK will read out one error at a time and guide the user to the
line of code that contains the error in the program. This procedure will be
repeated until there are no errors in the program, at which point the C# SDK will run it. Voice
and text or graphics will be output and the user can use mouse or shortcut keys
to check the outputs.
Note that if the blind user saves the project to files and runs it in the normal
Visual Studio.NET, the output will be text or graphics only. Voice output is only
available if the user runs the project in the audio Studio.NET.
The proposed audio programming tool has been tested and evaluated by normal
vision users and then by blind and vision impaired users. In the first test, normal
vision users were required not to look at the computer monitor while they tested
the programming tool. It was observed that they were able to complete all stages of
writing a program by listening to voice output from the tool. In the second
test, standard keyboards and built-in text-to-speech tools were used. We found
that vision impaired and blind users were also able to perform the same task.
However, vision impaired users were interested in applications that use the mouse, while
blind users preferred those that use the keyboard. Most blind and vision impaired people
are familiar with the shortcut keys defined in JAWS, so adding new shortcut keys in
the programming tool is not recommended. Shortcut keys have been changed to
meet their specific needs. More programming lessons need to be provided to
help users become familiar with programming in .NET.
5 Conclusion
The tool has opened a great possibility that allows blind and vision impaired
users to become programmers in the future and to achieve equality of access
and opportunity in information technology education that will ensure meaningful
and equitable employment for their lives.
References:
5-
elkes.pdf?key1=964173&key2=4640659711&coll=GUIDE&dl=GUIDE&CFID=22945606&CFTOKEN=95515984.
http://www.abledata.com/abledata.cfm?pageid=19327&ksectionid=19327&top=13293