
INDEX

1. Introduction
1.1 History
2. Speech Recognition
2.1 Performance of speech recognition systems
2.2 Hidden Markov model (HMM)-based speech recognition
2.3 Dynamic time warping (DTW)-based speech recognition
3. Speech Understanding
4. Text Generation
5. Speech Synthesis
6. Language Resources
7. Applications
8. Conclusion
9. References

1. Introduction
Information Technology deals with the acquisition, organization, storage, processing, transmission and delivery of information. Human beings collect various types of data with the intention of extracting information relevant to decision making. A large part of data processing is conducted using computers thanks to their enormous capability for numerical computation. However, computers even today play the role of an assistant in decision making rather than the role of a decision maker, and rightly so. They fulfil this role by presenting the information and knowledge gleaned from data processing to humans in a form which is easily interpretable. Quite often, people issue commands to the computer to refine the information following some methodology which is dynamically determined depending on the problem at hand. Thus, the human decision making process with the help of computers involves a dialogue between man and machine.

Communication among human beings is inherently multi-modal, visual and aural modes being the primary modes. Currently, the principal means of human-machine communication is heavily biased towards the convenience of the machine rather than that of man. Mouse and keyboard are the primary input devices and the visual display unit is the primary output device. Usage of such interfaces requires special skills and a mental attitude which many people are not endowed with. This machine-centric mode of communication needs to be changed in favor of human-centric interfaces so that the benefit of the power of computers is shared by all people. While the visual mode is most effective in capturing information, speech remains the preferred and most convenient means of conveying information. The advantage of (and the compelling reason for) verbal communication has become even stronger today due to the convergence of computers and telecommunication systems, which allows people to access information on computers located remotely. Verbal communication involves natural language, and this brings to the fore the role of linguistics in information technology. From the above discussion, it is clear that a human-centric interface to the computer is one that lets people interact with it the way they share information, thoughts and ideas effortlessly among themselves.

Facilitating human-machine interaction using natural language involves several facets of human language technology: speech compression, recognition and understanding of speech and script, machine translation, text generation, and synthesis of speech and cursive script. Both forms of language, spoken and written, are useful for interaction with machines. Here, we confine ourselves to spoken language and discuss the role of linguistic knowledge in developing speech interfaces. The relevance of linguistics in speech recognition, speech understanding and speech synthesis will be dealt with in the following sections.

1.1 History
The first speech recognizer appeared in 1952 and consisted of a device for the recognition of single spoken digits. Another early device was the IBM Shoebox, exhibited at the 1964 New York World's Fair. One of the most notable domains for the commercial application of speech recognition in the United States has been health care, and in particular the work of the medical transcriptionist (MT). According to industry experts, at its inception speech recognition (SR) was sold as a way to completely eliminate transcription rather than to make the transcription process more efficient; hence it was not accepted. It was also the case that SR at that time was often technically deficient. Additionally, to be used effectively, it required changes to the ways physicians worked and documented clinical encounters, which many if not all were reluctant to make. The biggest limitation to speech recognition automating transcription, however, is seen as the software. The nature of narrative dictation is highly interpretive and often requires judgment that may be provided by a real human but not yet by an automated system. Another limitation has been the extensive amount of time required by the user and/or system provider to train the software. A distinction in automatic speech recognition (ASR) is often made between "artificial syntax systems", which are usually domain-specific, and "natural language processing", which is usually language-specific. Each of these types of application presents its own particular goals and challenges.

2. Speech Recognition



Speech recognition, the process of translating a speech signal into a sequence of words, is at the heart of speech input devices. Although tremendous progress has been made in the area of speech recognition (SR) technology, most of it has come from advances in modeling speech sounds and their influence on sounds in the immediate vicinity, and not from adequate modeling of natural language.

[Footnote: Remember the proverb "A picture is worth more than a thousand words". Imagine an attempt to convey something to a person outside a glass wall using only gestures (without the benefit of speech).]

The grammar is normally modeled in terms of statistical properties of language, not because engineers prefer statistical grammars but because there is no better working alternative in the form of language models with a strong foundation in formal linguistics. Phrase-structure grammars, for example, comprise several hundreds or thousands of rules describing different phrase types. Each of these rules is annotated by features and sometimes also by expressions in a programming language. When such grammars reach a certain size they become difficult to maintain, to extend and to reuse. The resulting systems might be sufficiently efficient for some applications, but they lack the speed of processing needed for interactive systems (such as applications involving spoken input) or systems that have to process large volumes of text (as in machine translation). Context-free grammars and their probabilistic versions have been tried, and their success in modeling unseen data has been only partial. Estimation of N-gram probabilities, the most popular statistical language model, has remained a sparse estimation problem despite the usage of very large corpora relevant to the task domain. For example, after observing all trigrams (i.e., consecutive word triplets) in 38 million words' worth of newspaper articles, a full one third of trigrams in new articles from the same source are novel.
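
The sparsity problem described above can be illustrated with a small sketch. It assumes plain whitespace tokenization, and the helper names `trigrams` and `novel_trigram_fraction` are hypothetical, chosen only for this illustration; the study quoted above is not reproduced here. The function reports what fraction of the trigrams in new text never occurred in the training text.

```python
def trigrams(words):
    """Return the list of consecutive word triplets in a token list."""
    return list(zip(words, words[1:], words[2:]))


def novel_trigram_fraction(train_text, test_text):
    """Fraction of trigrams in test_text that never occur in train_text."""
    seen = set(trigrams(train_text.split()))
    test = trigrams(test_text.split())
    if not test:
        raise ValueError("test text must contain at least three words")
    novel = sum(1 for t in test if t not in seen)
    return novel / len(test)


# Toy example: only the last trigram ("on", "the", "sofa") is unseen.
train = "the cat sat on the mat"
test = "the cat sat on the sofa"
print(novel_trigram_fraction(train, test))
```

On a real corpus the same count would be run over millions of words; any trigram counted as novel here would receive zero probability from an unsmoothed trigram model, which is why smoothing and back-off are needed in practice.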

Moreover, current language models are extremely sensitive to changes in the style, topic or genre of the text on which they are trained. A statistical language model trained with newswire text from one company will see its perplexity (the geometric average branching factor of the language according to the model) doubled when applied to news of the same time period from a similar agency! The inadequacy of language modeling is evident in the performance of speech recognition systems in competitive DARPA evaluations. In the most recent test of SR systems with noisy telephone speech, the best SR system showed only 62% word accuracy. Most SR systems expect the user to speak grammatically correct sentences. This puts a lot of load on users to formulate such syntactically correct sentences, with no out-of-vocabulary words, prior to speaking to the computer. A user-friendly speech input system should be able to handle speech disfluencies and flexible grammar. This is where computational linguists can play a crucial role.
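
Perplexity, mentioned above as the geometric average branching factor, can be computed from the probabilities a language model assigns to each word of a test text. The following is a minimal sketch; the function name `perplexity` and the idea of passing in a ready-made list of per-word probabilities are assumptions made for illustration, not part of any particular toolkit.

```python
import math


def perplexity(word_probs):
    """Perplexity = exp of the average negative log probability per word."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    avg_neg_log = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_neg_log)


# A model that gives every word probability 1/4 behaves like a uniform
# choice among 4 words; halving each probability doubles the perplexity.
in_domain = perplexity([0.25, 0.25, 0.25, 0.25])
out_of_domain = perplexity([0.125, 0.125, 0.125, 0.125])
print(in_domain, out_of_domain)
```

The toy numbers mirror the doubling described in the text: a model that assigns each word half the probability it would get on in-domain data has twice the perplexity.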

2.1 Performance of speech recognition systems


The performance of speech recognition systems is usually specified in terms of accuracy and speed. Accuracy is usually rated with word error rate (WER), whereas speed is measured with the real time factor. Other measures of accuracy include Single Word Error Rate (SWER) and Command Success Rate (CSR). Most speech recognition users would tend to agree that dictation machines can achieve very high performance in controlled conditions. There is some confusion, however, over the interchangeability of the terms "speech recognition" and "dictation". Commercially available speaker-dependent dictation systems usually require only a short period of training (sometimes also called 'enrollment') and may successfully capture continuous speech with a large vocabulary at normal pace with very high accuracy. Most commercial companies claim that recognition software can achieve between 98% and 99% accuracy if operated under optimal conditions. 'Optimal conditions' usually assume that users:

- have speech characteristics which match the training data,
- can achieve proper speaker adaptation, and
- work in a clean noise environment (e.g. quiet office or laboratory space).

This explains why some users, especially those whose speech is heavily accented, might achieve recognition rates much lower than expected. Speech recognition in video has become a popular search technology used by several video search companies. Limited vocabulary systems, requiring no training, can recognize a small number of words (for instance, the ten digits) as spoken by most speakers. Such systems are popular for routing incoming phone calls to their destinations in large organizations. Both acoustic modeling and language modeling are important parts of modern statistically-based speech recognition algorithms. Hidden Markov models (HMMs) are widely used in many systems. Language modeling has many other applications such as smart keyboards and document classification.
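
The two headline measures of this section, word error rate and real time factor, can be sketched in a few lines. The sketch assumes the common definition of WER as the word-level edit distance (substitutions, deletions and insertions) divided by the number of reference words, and of the real time factor as processing time divided by audio duration; the function names are hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))      # distances against an empty reference
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """Values below 1.0 mean the recognizer runs faster than real time."""
    return processing_seconds / audio_seconds


# One substitution (sat -> sit) and one deletion (the) over six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
print(real_time_factor(5.0, 10.0))
```

Because insertions count as errors, WER can exceed 100% for a hypothesis much longer than the reference, which is one reason accuracy figures quoted by vendors should be read together with the test conditions.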

2.2 Hidden Markov model (HMM)-based speech recognition

Modern general-purpose speech recognition systems are generally based on hidden Markov models. These are statistical models which output a sequence of symbols or quantities. One possible reason why HMMs are used in speech recognition is that a speech signal can be viewed as a piecewise stationary signal or a short-time stationary signal. That is, one can assume that over a short time, in the range of 10 milliseconds, speech can be approximated as a stationary process. Speech can thus be thought of as a Markov model for many stochastic processes. Another reason why HMMs are popular is that they can be trained automatically and are simple and computationally feasible to use. In speech recognition, the hidden Markov model would output a sequence of n-dimensional real-valued vectors (with n being a small integer, such as 10), outputting one of these every 10 milliseconds. The vectors would consist of cepstral coefficients, which are obtained by taking a Fourier transform of a short time window of speech and decorrelating the spectrum using a cosine transform, then taking the first (most significant) coefficients. The hidden Markov model will tend to have in each state a statistical distribution that is a mixture of diagonal covariance Gaussians, which will give a likelihood for each observed vector. Each word, or (for more general speech recognition systems) each phoneme, will have a different output distribution; a hidden Markov model for a sequence of words or phonemes is made by concatenating the individual trained hidden Markov models for the separate words and phonemes.

Described above are the core elements of the most common, HMM-based approach to speech recognition. Modern speech recognition systems use various combinations of a number of standard techniques in order to improve results over this basic approach. A typical large-vocabulary system would need context dependency for the phonemes (so phonemes with different left and right context have different realizations as HMM states); it would use cepstral normalization to normalize for different speaker and recording conditions; for further speaker normalization it might use vocal tract length normalization (VTLN) for male-female normalization and maximum likelihood linear regression (MLLR) for more general speaker adaptation. The features would have so-called delta and delta-delta coefficients to capture speech dynamics and in addition might use heteroscedastic linear discriminant analysis (HLDA); or might skip the delta and delta-delta coefficients and use splicing and an LDA-based projection followed perhaps by heteroscedastic linear discriminant analysis or a global semi-tied covariance transform (also known as maximum likelihood linear transform, or MLLT). Many systems use so-called discriminative training techniques which dispense with a purely statistical approach to HMM parameter estimation and instead optimize some classification-related measure of the training data. Examples are maximum mutual information (MMI), minimum classification error (MCE) and minimum phone error (MPE).

Decoding of the speech (the term for what happens when the system is presented with a new utterance and must compute the most likely source sentence) would probably use the Viterbi algorithm to find the best path, and here there is a choice between dynamically creating a combination hidden Markov model which includes both the acoustic and language model information, or combining it statically beforehand (the finite state transducer, or FST, approach).
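
Two pieces of the HMM-based approach described in this section can be sketched compactly: computing cepstral coefficients for one frame, and Viterbi decoding over states with diagonal-covariance Gaussian output distributions. The sketch below makes several simplifying assumptions that are not in the text: it takes the logarithm of the magnitude spectrum before the cosine transform (the usual practice), uses a naive DFT suited only to short frames, models each state with a single Gaussian rather than a mixture, and uses hypothetical function names throughout.

```python
import cmath
import math


def cepstral_coefficients(frame, n_coeffs=10):
    """Hamming window -> DFT -> log magnitude -> cosine transform (DCT-II)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    n_bins = n // 2 + 1
    log_spec = []
    for k in range(n_bins):                  # naive DFT, fine for short frames
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        log_spec.append(math.log(abs(s) + 1e-10))
    # Keep only the first (most significant) n_coeffs cosine-transform terms.
    return [sum(v * math.cos(math.pi * c * (2 * m + 1) / (2 * n_bins))
                for m, v in enumerate(log_spec))
            for c in range(n_coeffs)]


def log_gauss_diag(x, mean, var):
    """Log density of vector x under a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def viterbi(observations, means, variances, log_trans, log_init):
    """Return the most likely state sequence for a list of feature vectors."""
    n_states = len(means)

    def emis(t, s):
        return log_gauss_diag(observations[t], means[s], variances[s])

    score = [log_init[s] + emis(0, s) for s in range(n_states)]
    back = []                                # best predecessor per frame and state
    for t in range(1, len(observations)):
        prev = [max(range(n_states), key=lambda i: score[i] + log_trans[i][j])
                for j in range(n_states)]
        score = [score[prev[j]] + log_trans[prev[j]][j] + emis(t, j)
                 for j in range(n_states)]
        back.append(prev)
    path = [max(range(n_states), key=lambda s: score[s])]
    for prev in reversed(back):              # trace the best path backwards
        path.append(prev[path[-1]])
    return path[::-1]


# Toy two-state model with one-dimensional features.
obs = [[0.1], [-0.2], [4.9], [5.2]]
means, variances = [[0.0], [5.0]], [[1.0], [1.0]]
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
log_init = [math.log(0.5), math.log(0.5)]
print(viterbi(obs, means, variances, log_trans, log_init))
```

In a real recognizer the feature vectors passed to the decoder would be the cepstral coefficients (plus delta terms) of successive frames, and the state graph would come from concatenated word or phoneme models rather than a hand-written two-state toy.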

2.3 Dynamic time warping (DTW)-based speech recognition


Dynamic time warping is an approach that was historically used for speech recognition but has now largely been displaced by the more successful HMM-based approach. Dynamic time warping is an algorithm for measuring similarity between two sequences which may vary in time or speed. For instance, similarities in walking patterns would be detected even if in one video the person was walking slowly and in another they were walking more quickly, or even if there were accelerations and decelerations during the course of one observation. DTW has been applied to video, audio, and graphics; indeed, any data which can be turned into a linear representation can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. In general, it is a method that allows a computer to find an optimal match between two given sequences (e.g. time series) with certain restrictions, i.e. the sequences are "warped" non-linearly to match each other. This sequence alignment method is often used in the context of hidden Markov models.
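
The alignment idea can be made concrete with the classic dynamic-programming formulation of DTW. This is a minimal sketch for sequences of numbers, assuming absolute difference as the local distance and no window or slope constraints; a speech system would compare feature vectors per frame instead of scalars. The function name `dtw_distance` is hypothetical.

```python
def dtw_distance(a, b):
    """Cost of the best non-linear alignment between two numeric sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j]: best cost of aligning the first i items of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# The same pattern spoken "twice as slowly" aligns at zero cost.
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))
print(dtw_distance([1, 2, 3], [1, 2, 4]))
```

A template-based recognizer built on this idea would store one reference sequence per word and pick the word whose template has the smallest DTW distance to the input.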

Further information
Popular speech recognition conferences held each year or two include ICASSP, Eurospeech/ICSLP (now named Interspeech) and the IEEE ASRU. Conferences in the field of natural language processing, such as ACL, NAACL, EMNLP, and HLT, are beginning to include papers on speech processing. Important journals include the IEEE Transactions on Speech and Audio Processing (now named IEEE Transactions on Audio, Speech and Language Processing), Computer Speech and Language, and Speech Communication. Books like "Fundamentals of Speech Recognition" by Lawrence Rabiner can be useful to acquire basic knowledge but may not be fully up to date (1993). Another good source can be "Statistical Methods for Speech Recognition" by Frederick Jelinek and "Spoken Language Processing (2001)" by Xuedong Huang et al. More up to date is "Computer Speech", by Manfred R. Schroeder, second edition published in 2004. The recently updated textbook "Speech and Language Processing (2008)" by Jurafsky and Martin presents the basics and the state of the art for ASR. A good insight into the techniques used in the best modern systems can be gained by paying attention to government sponsored evaluations such as those organized by DARPA (the largest speech recognition-related project ongoing as of 2007 is the GALE project, which involves both speech recognition and translation components).

In terms of freely available resources, Carnegie Mellon University's SPHINX toolkit is one place to start to both learn about speech recognition and to start experimenting. Another resource (free as in free beer, not free software) is the HTK book (and the accompanying HTK toolkit). The AT&T libraries GRM library and DCD library are also general software libraries for large-vocabulary speech recognition. A useful review of the area of robustness in ASR is provided by Junqua and Haton (1995).


3. Speech Understanding


Speech understanding involves the integration of speech recognition and natural language (NL) understanding. This integration has great advantages: to NL, SR can bring prosodic information (information important for syntax and semantics but not well represented in text); NL can bring to SR additional knowledge sources (e.g., syntax and semantics). The integration of these technologies presents technical challenges, and challenges related to the quite different cultures, techniques and beliefs of the people representing the component technologies. In large part, NL research has been pursued in computer science and linguistics departments; the goal is to model language understanding, motivated by a desire to understand cognitive processes. Hence, the underlying theories tend to be from linguistics and psychology. Practical applications have been less important than increasing intuitions about human processes. Therefore, coverage of phenomena of theoretical interest (usually the rarer phenomena) has traditionally been more important than broad coverage. On the other hand, speech recognition research has largely been practiced in engineering departments with practical applications in mind. Techniques motivated by knowledge of human processes have therefore been less important than techniques that can be automatically developed or tuned, and broad coverage of a representative sample is more important than coverage of any particular phenomenon. The integration of SR and NL needs to overcome not only technical challenges but also the differences in motivation, interests, theoretical underpinnings, techniques, tools, and criteria for success of the two groups. However, both groups have much to gain from collaboration, and such a trend is visible around the world.

3.1 Integration of Speech Recognition and Natural Language Processing

SR is concerned with acoustic attributes of words to a large extent, and with lexical and syntactic information to a lesser extent. On the other hand, human speech understanding involves the integration of a great variety of knowledge sources, including knowledge of the world or context, knowledge of the speaker and/or topic, lexical frequency, previous uses of a word or a semantically related topic, facial expressions, and prosody. Thus, integration of SR and NL has been a consistent goal. However, as grammatical coverage increases, standard NL techniques can become computationally difficult. Further, with increased coverage, NL tends to provide less constraint for SR. Simple-minded concatenation of an existing speech recognition system and an existing NL system is suboptimal due to the unidirectional flow of information. Not only can errors in the SR system propagate, but there is also no way for the higher level knowledge sources to help the SR system in pruning the search space.

Most importantly, most NL systems deal with written language rather than spoken language. In the former case, one can expect grammatically correct sentences, whereas in an interactive dialogue, speech disfluencies such as restarts, revisions, repetitions, filler sounds and hesitations are common. Most NL systems are concerned more with correct analyses of complete sentences than with methods for recovering interpretations when parses are incomplete, which is what spoken language understanding needs. Thus there is a need to reformulate existing knowledge of NL systems and to devise computational models of spoken language. Traditionally, linguists have studied the properties of natural language and documented the observations in various forms. This knowledge has to be transformed into an algorithmic form as far as possible so that the collective knowledge and wisdom of the linguistic community becomes more useful to humankind. This kind of collaborative work between linguists and engineers is especially relevant in a multi-lingual country such as India.


4. Text Generation


One of the purposes of interacting with computers is to access information. This information has to be drawn from a database and be presented to the user. A spoken query of a user is processed by a speech understanding system which formulates a database query. The information in the database has to be transformed into natural language which can be presented to the user as spoken output, given the data representation, context and dialogue state. In simple, specific task domains (for example, information about the availability of railway reservations), response sentence templates can be used to generate text. However, a versatile text generation module should generate coherent multi-sentential responses, and interpret and respond to users' subsequent utterances in the context of an ongoing interaction. Spoken language generation requires deciding what concepts to include and how to realize them in words. In addition, it needs to determine intonational forms for speech synthesis.
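
Template-based generation for a simple task domain, as described above, can be sketched as follows. The railway example, the record fields (`train`, `date`, `seats`) and the sentence wording are all hypothetical, invented only to show the mechanism; a real system would choose templates from the dialogue state as well as from the database result.

```python
from string import Template

# One response template per situation the system needs to report.
TEMPLATES = {
    "available": Template(
        "There are $seats seats available on train $train on $date."),
    "unavailable": Template(
        "Sorry, train $train is fully booked on $date."),
}


def generate_response(record):
    """Pick a template from the database record and fill in its slots."""
    key = "available" if record["seats"] > 0 else "unavailable"
    return TEMPLATES[key].substitute(record)


print(generate_response({"train": "12627", "date": "5 May", "seats": 4}))
print(generate_response({"train": "12627", "date": "5 May", "seats": 0}))
```

Templates of this kind are easy to write and predictable, but they cannot produce the coherent multi-sentence, context-dependent responses that the text asks of a versatile generation module.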


5. Speech Synthesis

The task of a speech synthesis module is to synthesize an intelligible, natural, easily interpreted and appropriate spoken version of the response, taking advantage of the context and dialogue state to emphasize certain information. The acoustic evidence needs to inform abstract units in syntax, semantics, discourse, and pragmatics. While the intelligibility of speech generated by current speech synthesis systems is good, the naturalness leaves much to be desired. The issue of intelligibility is primarily related to the generation and combination of speech sounds, whereas imparting naturalness involves incorporating suprasegmental information. This involves prosodic phrasing, i.e., chunking a long sentence into prosodic phrases. Patterns of variation in fundamental frequency, duration, amplitude or intensity, pauses, and speaking rate have been shown to carry information about such prosodic elements as lexical stress, phrase breaks, and declarative or interrogative sentence form. A high quality speech synthesis system in any language has to deal with these issues in order to become acceptable to people as a means of delivery of information.


6. Language Resources

The term linguistic resources refers to (usually large) sets of language data and descriptions in machine readable form, to be used in building, improving, or evaluating natural language (NL) and speech recognition and synthesis systems. Examples of linguistic resources are written and spoken corpora, lexical databases and grammars. The need for such linguistic resources in vast quantities is even more important for speech recognition systems as they use statistical models for representing acoustic units as well as language.

In the health care domain, even in the wake of improving speech recognition technologies, medical transcriptionists (MTs) have not yet become obsolete. Many experts in the field anticipate that with increased use of speech recognition technology, the services provided may be redistributed rather than replaced. Speech recognition is used to enable deaf people to understand the spoken word via speech to text conversion, which is very helpful. Speech recognition can be implemented in the front end or back end of the medical documentation process. Front-End SR is where the provider dictates into a speech-recognition engine, the recognized words are displayed right after they are spoken, and the dictator is responsible for editing and signing off on the document. It never goes through an MT/editor.


Back-End SR or Deferred SR is where the provider dictates into a digital dictation system, and the voice is routed through a speech-recognition machine and the recognized draft document is routed along with the original voice file to the MT/editor, who edits the draft and finalizes the report. Deferred SR is being widely used in the industry currently. Many Electronic Medical Records (EMR) applications can be more effective and may be performed more easily when deployed in conjunction with a speech-recognition engine. Searches, queries, and form filling may all be faster to perform by voice than by using a keyboard.

Substantial efforts have been devoted in the last decade to the test and evaluation of speech recognition in fighter aircraft. Of particular note are the U.S. program in speech recognition for the Advanced Fighter Technology Integration (AFTI)/F-16 aircraft (F-16 VISTA), the program in France on installing speech recognition systems on Mirage aircraft, and programs in the UK dealing with a variety of aircraft platforms. In these programs, speech recognizers have been operated successfully in fighter aircraft with applications including: setting radio frequencies, commanding an autopilot system, setting steer-point coordinates and weapons release parameters, and controlling flight displays. Generally, only very limited, constrained vocabularies have been used successfully, and a major effort has been devoted to integration of the speech recognizer with the avionics system. Some important conclusions from the work were as follows:

1. Speech recognition has definite potential for reducing pilot workload, but this potential was not realized consistently.
2. Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful; with lower recognition rates, pilots would not use the system.
3. More natural vocabulary and grammar, and shorter training times would be useful, but only if very high recognition rates could be maintained.

Laboratory research in robust speech recognition for military environments has produced promising results which, if extendable to the cockpit, should improve the utility of speech recognition in high-performance aircraft. Working with Swedish pilots flying in the JAS-39 Gripen cockpit, Englund (2004) found recognition deteriorated with increasing G-loads. It was also concluded that adaptation greatly improved the results in all cases and introducing models for breathing was shown to improve recognition scores significantly. Contrary to what might be expected, no effects of the broken English of the speakers were found. It was evident that spontaneous speech caused problems for the recognizer, as could be expected. A restricted vocabulary, and above all, a proper syntax, could thus be expected to improve recognition accuracy substantially.

The Eurofighter Typhoon currently in service with the UK RAF employs a speaker-dependent system, i.e. it requires each pilot to create a template. The system is not used for any safety critical or weapon critical tasks, such as weapon release or lowering of the undercarriage, but is used for a wide range of other cockpit functions. Voice commands are confirmed by visual and/or aural feedback. The system is seen as a major design feature in the reduction of pilot workload, and even allows the pilot to assign targets to himself with two simple voice commands or to any of his wingmen with only five commands.

Helicopters
The problems of achieving high recognition accuracy under stress and noise pertain strongly to the helicopter environment as well as to the fighter environment. The acoustic noise problem is actually more severe in the helicopter environment, not only because of the high noise levels but also because the helicopter pilot generally does not wear a facemask, which would reduce acoustic noise in the microphone. Substantial test and evaluation programs have been carried out in the past decade in speech recognition systems applications in helicopters, notably by the U.S. Army Avionics Research and Development Activity (AVRADA) and by the Royal Aerospace Establishment (RAE) in the UK. Work in France has included speech recognition in the Puma helicopter. There has also been much useful work in Canada. Results have been encouraging, and voice applications have included: control of communication radios; setting of navigation systems; and control of an automated target handover system. As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness. Encouraging results are reported for the AVRADA tests, although these represent only a feasibility demonstration in a test environment. Much remains to be done both in speech recognition and in overall speech technology in order to consistently achieve performance improvements in operational settings.

Battle management command centers generally require rapid access to and control of large, rapidly changing information databases. Commanders and system operators need to query these databases as conveniently as possible, in an eyes-busy environment where much of the information is presented in a display format. Human-machine interaction by voice has the potential to be very useful in these environments. A number of efforts have been undertaken to interface commercially available isolated-word recognizers into battle management environments. In one feasibility study, speech recognition equipment was tested in conjunction with an integrated information display for naval battle management applications. Users were very optimistic about the potential of the system, although capabilities were limited. Speech understanding programs sponsored by the Defense Advanced Research Projects Agency (DARPA) in the U.S. have focused on this problem of natural speech interface. Speech recognition efforts have focused on a database of continuous speech recognition (CSR), large-vocabulary speech which is designed to be representative of the naval resource management task. Significant advances in the state of the art in CSR have been achieved, and current efforts are focused on integrating speech recognition and natural language processing to allow spoken language interaction with a naval resource management system.

Training air traffic controllers


Training for military (or civilian) air traffic controllers (ATC) represents an excellent application for speech recognition systems. Many ATC training systems currently require a person to act as a "pseudo-pilot", engaging in a voice dialog with the trainee controller, which simulates the dialog which the controller would have to conduct with pilots in a real ATC situation. Speech recognition and synthesis techniques offer the potential to eliminate the need for a person to act as pseudo-pilot, thus reducing training and support personnel. Air controller tasks are also characterized by highly structured speech as the primary output of the controller, hence reducing the difficulty of the speech recognition task. The U.S. Naval Training Equipment Center has sponsored a number of developments of prototype ATC trainers using speech recognition. Generally, the recognition accuracy falls short of providing graceful interaction between the trainee and the system. However, the prototype training systems have demonstrated a significant potential for voice interaction in these systems, and in other training applications.

The U.S. Navy has sponsored a large-scale effort in ATC training systems, where a commercial speech recognition unit was integrated with a complex training system including displays and scenario creation. Although the recognizer was constrained in vocabulary, one of the goals of the training programs was to teach the controllers to speak in a constrained language, using vocabulary specifically designed for the ATC task. Research in France has focused on the application of speech recognition in ATC training systems, directed at issues both in speech recognition and in application of task-domain grammar constraints. The USAF, USMC, US Army, and FAA are currently using ATC simulators with speech recognition from a number of different vendors, including UFA, Inc, and Adacel Systems Inc (ASI). This software uses speech recognition and synthetic speech to enable the trainee to control aircraft and ground vehicles in the simulation without the need for pseudo-pilots.


Another approach to ATC simulation with speech recognition has been created by Supremes. The Supremes system is not constrained by rigid grammars imposed by the underlying limitations of other recognition strategies.
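The idea of a constrained, task-domain grammar can be illustrated with a minimal sketch. The phraseology below (a callsign followed by one of three commands and a numeric value) is a hypothetical toy grammar invented for illustration, not real ATC phraseology and not the grammar of any system named above; the function name `parse_utterance` is likewise an assumption. It only shows how a rigid grammar accepts utterances that fit the expected structure and rejects everything else.

```python
import re

# Hypothetical toy phraseology: "<callsign> <command> <value>".
# Real ATC phraseology is far richer; this is only an illustration.
CALLSIGN = r"(?P<callsign>[a-z]+ \d{1,4})"

COMMAND_PATTERNS = {
    "climb": re.compile(rf"^{CALLSIGN} climb (?P<value>\d{{3,5}})$"),
    "descend": re.compile(rf"^{CALLSIGN} descend (?P<value>\d{{3,5}})$"),
    "heading": re.compile(rf"^{CALLSIGN} turn heading (?P<value>\d{{3}})$"),
}


def parse_utterance(text):
    """Return (callsign, command, value) if the recognized text fits
    the toy grammar, otherwise None."""
    # Lower-case and collapse whitespace so the match is case-insensitive.
    normalized = " ".join(text.lower().split())
    for command, pattern in COMMAND_PATTERNS.items():
        match = pattern.match(normalized)
        if match:
            return match.group("callsign"), command, int(match.group("value"))
    return None


if __name__ == "__main__":
    print(parse_utterance("Delta 123 climb 5000"))
    print(parse_utterance("Delta 123 please go up a bit"))
```

An utterance outside the grammar is simply rejected, which is why trainees in such systems are taught to speak in the constrained language, and why a recognizer that is not bound to a rigid grammar takes a different approach.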

Telephony and other domains


ASR in the field of telephony is now commonplace, and in the field of computer gaming and simulation it is becoming more widespread. Despite the high level of integration with word processing in general personal computing, however, ASR in the field of document production has not seen the expected increases in use. The improvement of mobile processor speeds made speech-enabled Symbian and Windows Mobile smartphones feasible. Current speech-to-text programs are too large and require too much CPU power to be practical for the Pocket PC. Speech is used mostly as a part of the user interface, for creating pre-defined or custom speech commands. Leading software vendors in this field are: Microsoft Corporation (Microsoft Voice Command), Nuance Communications (Nuance Voice Control), Vito Technology (VITO Voice2Go), Speereo Software (Speereo Voice Translator) and SVOX.

People with disabilities can benefit from speech recognition programs. Speech recognition is especially useful for people who have difficulty using their hands, ranging from mild repetitive stress injuries to involved disabilities that preclude using conventional computer input devices. In fact, people who used the keyboard a lot and developed RSI became an urgent early market for speech recognition. Speech recognition is used in deaf telephony, such as voicemail to text, relay services, and captioned telephone. Individuals with learning disabilities who have problems with thought-to-paper communication (essentially, they think of an idea but it is processed incorrectly, causing it to end up differently on paper) can benefit from the software.
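The use of speech for pre-defined or custom commands in a user interface can be sketched as a simple lookup from recognized text to an action. This is a minimal illustration under stated assumptions: the class `VoiceCommands` and its methods are hypothetical names, not the API of any product listed above, and the recognizer is assumed to have already turned audio into text.

```python
class VoiceCommands:
    """Maps recognized phrases to handler functions (hypothetical sketch)."""

    def __init__(self):
        self._handlers = {}

    @staticmethod
    def _normalize(phrase):
        # Lower-case and collapse whitespace so matching is forgiving.
        return " ".join(phrase.lower().split())

    def register(self, phrase, handler):
        """Register a pre-defined or custom command phrase."""
        self._handlers[self._normalize(phrase)] = handler

    def dispatch(self, recognized_text):
        """Run the handler for the recognized text; return None if the
        text does not match any registered command."""
        handler = self._handlers.get(self._normalize(recognized_text))
        if handler is None:
            return None
        return handler()


if __name__ == "__main__":
    commands = VoiceCommands()
    commands.register("Open Calendar", lambda: "calendar opened")
    print(commands.dispatch("open calendar"))
    print(commands.dispatch("call home"))
```

Matching whole phrases against a small fixed set is far cheaper than general speech-to-text, which is consistent with command-style interfaces being practical on devices with limited CPU power.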

7. APPLICATIONS

Automatic translation
Automotive speech recognition (e.g., Ford Sync)
Telematics (e.g., vehicle navigation systems)
Court reporting (real-time voice writing)
Hands-free computing: voice command recognition computer user interface
Home automation
Interactive voice response
Mobile telephony, including mobile email
Multimodal interaction
Pronunciation evaluation in computer-aided language learning applications
Robotics
Video games, with possible expansion into the RTS genre following Tom Clancy's EndWar
Transcription (digital speech-to-text)
Speech-to-text (transcription of speech into mobile text messages)
Air traffic control speech recognition

8. FUTURE SCOPE

Important journals for following future developments include the IEEE Transactions on Speech and Audio Processing (now named IEEE Transactions on Audio, Speech and Language Processing) and Computer Speech and Language. A good insight into the techniques used in the best modern systems can be gained by paying attention to government-sponsored evaluations such as those organized by DARPA (the largest speech recognition-related project ongoing as of 2007 is the GALE project, which involves both speech recognition and translation components).

9. CONCLUSION

The benefit of the information technology revolution can be enjoyed by the masses in India only when human-oriented interfaces to computers are developed and deployed. Spoken language is still the means of communication used first and foremost by humans. Therefore, it is natural for people to expect speech interfaces with computers. Spoken dialogue with machines requires significant advances in the integration of speech input/output technologies and natural language processing technologies. This necessitates strengthening existing collaborations between linguists and speech engineers as well as initiating new ones.

