Academic documents
Professional documents
Cultural documents
Language Processing
Mehryar Mohri
AT&T Laboratories
mohri@research.att.com
[Figures: small example automata, weighted automata (arcs such as b/1, a/0), and transducers (arcs such as a:b, b:a/1) over the alphabet {a, b}.]
On-the-fly expansion
lexical analyzer → syntax analyzer → semantic analyzer →
intermediate code generator → code optimizer → code generator
Spellchecker
Inflected forms
Index
Set of positions
[Figure: transducers with ε transitions (a:d, ε:e, d:a).]
On-the-fly implementation
[Figure: panels (b)-(d): the ε-filter construction for composition, with ε1/ε2 markers and filter states such as (2,1) and (4,3).]
[Figures: determinization examples: an automaton whose subset construction yields pair states such as (0,1), (2,4), and weighted automata with arc weights such as b/1, a/3.]
[Figure: weighted word lattice over which, flight(s), leave(s), Detroit.]
Figure 14: Toy language model (16 states, 53 transitions, 162 paths).
[Figure: an equivalent, compacted version of the toy language model.]
[Figures: determinization of an automaton (subset states {0}, {1,2}, {3}) and of a weighted automaton (states labeled with weighted subsets such as {(1,0),(2,3)}), applied to an input machine t4.]
[Figures: determinization of a transducer (states labeled with residual outputs, e.g. {(1,a),(2,ε)}) and of a weighted transducer (states labeled with residuals and weights, e.g. {(1,a,1),(2,ε,0)}).]
On-the-fly implementation
Reduction of redundancy
Control of the resulting size (flexibility)
Equivalent function (or equal set)
No loss of information
[Figures: example weighted transducers (arcs such as b:c/3, a:a/3, c:b/2) and their determinized forms, with residual-weight state labels such as {(1,a,0),(2,b,1),(3,a,2)}.]
Definition of parameters
Applies to all automata, weighted automata, transducers, and
weighted transducers
Can be used with other semirings (e.g. (ℝ, +, ×))
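The same algorithms are parameterized by the semiring's two operations; a minimal sketch (class and function names hypothetical) contrasting the tropical semiring (min, +) with the probability semiring (+, ×) on a two-path example:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Semiring:
    zero: float                       # identity for "plus" (combining paths)
    one: float                        # identity for "times" (extending a path)
    plus: Callable
    times: Callable

# Tropical semiring (min, +): weights are costs, the best path wins.
TROPICAL = Semiring(zero=float("inf"), one=0.0, plus=min,
                    times=lambda a, b: a + b)
# Probability semiring (+, x): weights are probabilities, paths are summed.
REAL = Semiring(zero=0.0, one=1.0, plus=lambda a, b: a + b,
                times=lambda a, b: a * b)

def total_weight(paths, sr):
    """Combine the weights of paths (each a list of arc weights) over semiring sr."""
    total = sr.zero
    for path in paths:
        w = sr.one
        for arc in path:
            w = sr.times(w, arc)
        total = sr.plus(total, w)
    return total

paths = [[1.0, 2.0], [4.0]]
print(total_weight(paths, TROPICAL))  # min(1+2, 4) = 3.0
print(total_weight(paths, REAL))      # 1*2 + 4 = 6.0
```

Only semiring properties (associativity, distributivity) are used, which is why one shortest-distance routine serves both cost and probability computations.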
[Figures: the compacted toy language model and the same machine after weight pushing (e.g. which/291 near the initial state, most other weights 0).]
[Figures: example automata t96 and t97 over {a, b, c}, and weighted automata over a-e (arcs such as a:3, b:7, d:6) before and after minimization.]
[Figures: a transducer with string outputs (a:ABCDB, b:CCDDB, d:CDB, e:C, b:ε, c:ε), an equivalent transducer after minimization, and a weighted version with pushed weights such as a:ABCDB/15.]
Automata: O(|E| log(|Q|)); O(|Q| + |E|)
Michael Riley
AT&T Laboratories
riley@research.att.com
Find s_0, s_k maximizing
P(s_0, s_k) = P(s_k | s_0) P(s_0) = P(s_0) ∑_{s_1,…,s_{k−1}} ∏_{1≤j≤k} P(s_j | s_{j−1})
“Viterbi” approximation: replace the sum over intermediate state sequences by a max.
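The approximation can be checked by brute force on a toy Markov chain (transition matrix hypothetical): the Viterbi score replaces the sum over intermediate state sequences with a max over them. In practice this max is computed by dynamic programming, not enumeration.

```python
import itertools

# Toy Markov chain over states {0, 1}; trans[i][j] = P(next = j | cur = i).
trans = [[0.7, 0.3],
         [0.4, 0.6]]

def path_prob(seq):
    """Product of transition probabilities along a state sequence."""
    p = 1.0
    for a, b in zip(seq, seq[1:]):
        p *= trans[a][b]
    return p

def exact(s0, sk, k):
    """Sum over all intermediate sequences s1..s_{k-1} (the full marginal)."""
    return sum(path_prob((s0,) + mid + (sk,))
               for mid in itertools.product(range(2), repeat=k - 1))

def viterbi(s0, sk, k):
    """Same quantity with the sum replaced by a max (best single sequence)."""
    return max(path_prob((s0,) + mid + (sk,))
               for mid in itertools.product(range(2), repeat=k - 1))

print(exact(0, 1, 3))    # sum over the 4 sequences 0 -> ? -> ? -> 1
print(viterbi(0, 1, 3))  # probability of the single best sequence
```

The Viterbi score is always a lower bound on the exact marginal, and the gap is small when one state sequence dominates.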
Short-time (25 msec Hamming window) spectrum of /ae/: Hz vs. dB.
Scale selection:
– Cepstral smoothing
– Parameter sampling (13 parameters)
x ∈ A vs. x ∈ X − A, ∀A ⊆ X.
[Plot: error rate (0.0 to 0.015) vs. number of terminal nodes (0 to 100); + = raw, o = cross-validated.]
[Figure: classification tree (root coverage 48294/52895); splits test bprob (< or > 27.29), eprob (< or > 1.045), the case of the next token (cap, upcase, lcase, num, …), and the abbreviation type (addr, com, group, state, title, unit), with yes/no leaf decisions.]
Phonemic Context:
– Phoneme to predict
– Three phonemes to left
– Three phonemes to right
Stress (0, 1, 2)
Lexical Position:
– Phoneme count from start of word
– Phoneme count from end of word
The trees were grown until the data criterion of 500 frames per
distribution was met
Trees pruned using cost-complexity pruning and cross-validation to
select best contexts
About 44000 context-dependent phone models
About 16000 distributions
P[x_1 x_2 … x_N] = P[x_N | x_1 … x_{N−1}] P[x_{N−1} | x_1 … x_{N−2}] … P[x_2 | x_1] P[x_1]
where, e.g.,
P[x_1 … x_n] = c(x_1 … x_n) / N
is the maximum likelihood estimate of that n-gram probability.
For conditional probabilities,
P[x_n | x_1 … x_{n−1}] = c(x_1 … x_n) / c(x_1 … x_{n−1}).
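The maximum-likelihood estimates above are plain count ratios; a small sketch with a hypothetical toy corpus:

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
N = len(corpus)

def p_unigram(x):
    # P[x] = c(x) / N
    return unigrams[x] / N

def p_cond(x, prev):
    # P[x_n | x_{n-1}] = c(x_{n-1} x_n) / c(x_{n-1})
    return bigrams[(prev, x)] / unigrams[prev]

print(p_unigram("the"))      # c(the)/N = 3/9
print(p_cond("cat", "the"))  # c(the cat)/c(the) = 2/3
```

Real language models smooth these raw ratios (e.g. Katz back-off, cited in the references) because unseen n-grams get probability zero here.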
O ∘ A ∘ D ∘ M
observations → phones → words
[Figure: weighted phone lattice with log-probability arc weights, e.g. s/-6.200, hh/-9.043, aa/-2.743, th/-3.264.]
[Figure: phone-to-word transducer fragment for the word hostile, e.g. hh:hostile/0.134, hv:hostile/2.635, with -:- filler arcs.]
[Figures: word lattice over battle, bottle, hostile with weights, and the best-path result hostile/-19.407 battle/-15.250.]
[Figure: context-dependency transducer over {x, y}: arcs such as x/x_y:x map the context-dependent unit “x in left context x and right context y” to the phone x.]
B:start := A:start
B:final(state) := A:final(state)
B:arcs(state) := A:arcs(state)
Cache Disciplines:
Expand each state of A exactly once, i.e. always save in the cache
(memoize).
Cache, but forget 'old' states using a least-recently-used criterion.
Use instructions (ref counts) from the user (decoder) to save and forget.
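The memoizing discipline can be sketched as a wrapper that expands a state only on first request (all names hypothetical; a real implementation would expand composition states rather than this toy chain):

```python
class LazyMachine:
    """Expose automaton A on demand; expand each state at most once (memoize)."""

    def __init__(self, expand_state):
        self.expand_state = expand_state  # state -> list of (label, next_state)
        self.cache = {}
        self.expansions = 0               # count real expansions, for illustration

    def arcs(self, state):
        if state not in self.cache:       # expand only on first request
            self.cache[state] = self.expand_state(state)
            self.expansions += 1
        return self.cache[state]

# Hypothetical expansion rule: a 10-state chain automaton.
machine = LazyMachine(lambda q: [("a", q + 1)] if q < 9 else [])

# Visit only states 0 and 1: the other 8 states are never built.
machine.arcs(0); machine.arcs(1); machine.arcs(0)
print(machine.expansions)  # 2
```

The LRU and ref-count disciplines differ only in when entries are evicted from `cache`; the on-demand interface is the same.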
                  states      arcs
context              762     40386
lexicon             3150      4816
grammar            48758    359532
full expansion   1.6×10⁶   5.1×10⁶
For the same recognition accuracy as with a static, fully expanded
network, on-demand composition expands just 1.6% of the total number
of arcs.
Lexicon ∘ Grammar
– Pre-tabulate non-coaccessible states in the composition of:
Det(Lexicon) ∘ Grammar
[Figure: lexicon transducer shape: phone arcs output ε until a word label (w1, …, wi, …, wn) is emitted.]
O ∘ A ∘ D ∘ M
cost-so-far(s) + lower-bound-to-complete(s)
With a tight lower bound, states not on good paths are not explored
With a loose lower bound, this is no better than Dijkstra's algorithm
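The priority above is the A* search criterion; a minimal sketch over a hypothetical weighted graph (with h ≡ 0 it degenerates to Dijkstra's algorithm):

```python
import heapq

def a_star(start, goal, arcs, h):
    """Best-first search ordered by cost-so-far(s) + lower-bound-to-complete(s).

    h must be a lower bound on the true remaining cost (admissible);
    h == 0 everywhere gives Dijkstra's algorithm.
    """
    queue = [(h(start), 0.0, start)]
    seen = set()
    while queue:
        _, cost, state = heapq.heappop(queue)
        if state == goal:
            return cost
        if state in seen:
            continue
        seen.add(state)
        for weight, nxt in arcs.get(state, []):
            heapq.heappush(queue, (cost + weight + h(nxt), cost + weight, nxt))
    return float("inf")

# Hypothetical 4-state graph: arcs[state] = [(weight, next_state), ...].
graph = {0: [(1.0, 1), (4.0, 2)], 1: [(2.0, 2)], 2: [(1.0, 3)]}
print(a_star(0, 3, graph, lambda s: 0.0))  # best path 0 -> 1 -> 2 -> 3, cost 4.0
```

With a tighter admissible h, fewer states reach the front of the queue, which is exactly the pruning effect described above.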
M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART II 100
Rescoring
Most successful approach in practice:
approximate n-best rescoring
[Figure: cheap models generate an n-best hypothesis list w1, …, wn; detailed models rescore it.]
PART III
Finite State Methods in Language
Processing
Richard Sproat
rws@bell-labs.com
Overview
Syntactic analysis
The Nature of the TTS Problem
Linguistic Analysis → phonemes, durations, and pitch contours → Speech Synthesis → speech waveforms
From Text to Linguistic Representation
[Figure: a Chinese sentence glossed ‘The rat is eating the oil’ (N V N), annotated with its moraic (μ), syllable (σ), prosodic-word (ω), tonal (LH), and phrasal (Φ) structure.]
Russian Percentages: The Problem
How do you say ‘%’ in Russian?
Adjectival forms when modifying nouns
20% skidka ⇒ dvadcati-procentnaja skidka
(‘20% discount’)
s 20% rastvorom ⇒ s dvadcati-procentnym rastvorom
(‘with 20% solution’)
Nominal forms otherwise
21% ⇒ dvadcat’ odin procent
23% ⇒ dvadcat’ tri procenta
20% ⇒ dvadcat’ procentov
s 20% ⇒ s dvadcat’ju procentami
(‘with 20%’)
Text Analysis Problems
Segment text into words.
Segment text into sentences, checking for and expanding
abbreviations:
Expand numbers
Word pronunciation
– Homograph disambiguation
Phrasing
Accentuation
Desiderata for a Model of Text Analysis for TTS
Overall Architectural Matters
Example: word pronunciation in Russian
Morphological analysis:
kost’ër{noun}{masc}{inan}+’a{sg}{gen}
Pronunciation: /kstr’a/
(Sproat, 1996)
Overall Architectural Matters
[Figure: text-analysis architecture: the lexical-analysis WFST is built by composing language-model, morphological-analysis, and phonological-analysis transducers; e.g. the lexical string #KOST"{E}P{noun}{masc}{inan}+"A{sg}{gen}# maps to the surface string #KOSTP"A#.]
Orthography → Lexical Representation
A Closer Look
Numerals: Expansions
Chinese Word Segmentation
[Figure: fragment of a weighted Chinese dictionary WFST with entries such as le0 (perf), liao3jie3 ‘understand’, da4 ‘big’, da4jie1 ‘avenue’, bu4 ‘not’, zai4 ‘at’, wang4 ‘forget’, wo3 ‘I’, fang4 ‘place’, fang4da4 ‘enlarge’, na3li3 ‘where’, jie1 ‘avenue’, jie3fang4 ‘liberation’, each tagged with a part of speech (vb, nc, adv, …) and a weight.]
Chinese Word Segmentation
Space = #
BestPath(input ∘ L) =
wo3 pro/4.88 # wang4+bu4+liao3 vb+npot/12.23 # jie3fang4 nc/10.92 # da4jie1 nc/11.45 …
‘I couldn’t forget where Liberation Avenue is.’
Numeral Expansion
234 ∘ Factorization ∘ NumberLexicon
⇓
zwei+hundert+vier+und+dreißig
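The factorization step can be sketched as a plain function (a stand-in for the factorization transducer; the mapping from factors to German morphs is left to the lexicon):

```python
def factorize(number_string):
    """Map a digit string to the digit/power factor sequence the
    numeral lexicon rewrites, e.g. '234' -> 2 10^2 3 10^1 4."""
    factors = []
    n = len(number_string)
    for i, digit in enumerate(number_string):
        if digit == "0":
            continue                  # zero factors contribute nothing
        factors.append(digit)
        power = n - i - 1
        if power > 0:
            factors.append("10^%d" % power)
    return factors

print(factorize("234"))  # ['2', '10^2', '3', '10^1', '4']
print(factorize("20"))   # ['2', '10^1']
```

The lexicon then rewrites {2}{10^2} as zwei+hundert, {4} as vier, and so on, with language-specific reorderings (vier+und+dreißig) handled by further rules.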
Numeral Expansion
[Figure: factorization transducer for digit strings: digits map to themselves (0:0, 1:1, …, 9:9) while ε:10^1 and ε:10^2 arcs insert the power factors.]
German Numeral Lexicon
/{1} : (’eins{num}({masc}|{neut}){sg}{##})/
/{2} : (zw’ei{num}{##})/
/{3} : (dr’ei{num}{##})/
...
/({0}{+++}{1}{10^1}) : (z’ehn{num}{##})/
/({1}{+++}{1}{10^1}) : (’elf{num}{##})/
/({2}{+++}{1}{10^1}) : (zw’ölf{num}{##})/
/({3}{+++}{1}{10^1}) : (dr’ei{++}zehn{num}{##})/
...
/({2}{10^1}) : (zw’an{++}zig{num}{##})/
/({3}{10^1}) : (dr’ei{++}ßig{num}{##})/
...
/({10^2}) : (h’undert{num}{##})/
/({10^3}) : (t’ausend{num}{neut}{##})/
Morphology: Paradigmatic Specifications
Paradigm {A1}
# strong inflection (e.g. after an indefinite article)
Suffix {++}er {sg}{masc}{nom}
Suffix {++}en {sg}{masc}({gen}|{dat}|{acc})
Suffix {++}e {sg}{femi}({nom}|{acc})
Suffix {++}en {sg}({femi}|{neut})({gen}|{dat})
Suffix {++}es {sg}{neut}({nom}|{acc})
Suffix {++}e {pl}({nom}|{acc})
Suffix {++}er {pl}{gen}
Suffix {++}en {pl}{dat}
Morphology: Paradigmatic Specifications
##### Possessives ("mein, euer")
Paradigm {A6}
Suffix {++}{Eps} {sg}({masc}|{neut}){nom}
Suffix {++}e {sg}{femi}{nom}
Suffix {++}es {sg}({masc}|{neut}){gen}
Suffix {++}er {sg}{femi}({gen}|{dat})
Suffix {++}em {sg}({masc}|{neut}){dat}
Suffix {++}en {sg}{masc}{acc}
Suffix {++}{Eps} {sg}{neut}{acc}
Suffix {++}e {pl}({nom}|{acc})
Suffix {++}er {pl}{gen}
Suffix {++}en {pl}{dat}
Morphology: Paradigmatic Specifications
/{A1} : (’aal{++}glatt{adj})/
/{A1} : (’ab{++}änder{++}lich{adj}{umlt})/
/{A1} : (’ab{++}artig{adj})/
/{A1} : (’ab{++}bau{++}würdig{adj}{umlt})/
...
/{A6} : (d’ein{adj})/
/{A6} : (’euer{adj})/
/{A6} : (’ihr{adj})/
/{A6} : (’Ihr{adj})/
/{A6} : (m’ein{adj})/
/{A6} : (s’ein{adj})/
/{A6} : (’unser{adj})/
Morphology: Paradigmatic Specifications
[Figure: automaton compactly encoding the inflected forms of m’ein{adj} (paradigm A6) with their number, gender, and case features.]
Morphology: Finite-State Grammar
Morphology: Finite-State Grammar
Morphology: Finite-State Grammar
Unanständigkeitsunterstellung
‘allegation of indecency’
⇓
"un{++}"an{++}st’änd{++}ig{++}keit{++}s{++}unter{++}st’ell{++}ung
Rewrite Rule Compilation
General form:
φ → ψ / λ __ ρ
φ, ψ, λ, ρ regular expressions.
Constraint: the output of a rule cannot itself be rewritten, but it can be used as a context
Example:
a → b / c __ b
(Johnson, 1972; Kaplan & Kay, 1994; Karttunen, 1995; Mohri & Sproat,
1996)
Example
a → b / c __ b
w = cab
Example
Input: c a b
After r: c a > b
After f: c <1 a > b (with a parallel <2 branch)
Example
[Figures: intermediate lattices after replace, l1, and l2; the final output is c b b.]
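The input-output behavior of the compiled rule (not the transducer construction itself) can be checked with an ordinary lookaround rewrite; like the cascade, re.sub does not re-apply the rule to its own output:

```python
import re

def apply_rule(w):
    # a -> b / c __ b : rewrite 'a' as 'b' when preceded by 'c' and followed by 'b'.
    # Zero-width lookbehind/lookahead keep the context unconsumed.
    return re.sub(r"(?<=c)a(?=b)", "b", w)

print(apply_rule("cab"))  # 'cbb'
print(apply_rule("aab"))  # unchanged: no 'c' on the left
```

Unlike a regex pass, the compiled transducer composes with other machines and applies to whole lattices of inputs at once, which is the point of the construction.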
Rewrite Rule Compilation
Principle
Efficiency
Rule Compilation Method
r ∘ f ∘ replace ∘ l1 ∘ l2
r: Σ* ρ → Σ* > ρ
f: (Σ ∪ {>})* φ > → (Σ ∪ {>})* {<1, <2} φ >
Marking Transducers
[Figure: a marking transducer: an identity machine (a:a, b:b, c:c, d:d) around state q, extended with an ε:# arc from q to q’ that inserts the marker #.]
Marker of Type 2
[Figure: marker of type 2: a #:ε arc deletes the marker at state q.]
The Transducers as Expressions Using Markers
Example: r for rule a ! b=c b
[Figure: building r for the rule a → b / c __ b: an automaton for the right context ρ = b is reversed (Σ* reverse()), ε:> (Eps:>) arcs insert the marker >, and the result is reversed back.]
The Replace Transducer
[Figure: the replace transducer deletes the > marker (>:ε) while performing the rewrite between the <1 and > markers.]
Extension to Weighted Rules
φ → ψ / λ __ ρ
φ, λ, ρ regular expressions,
ψ a formal power series on the tropical semiring
Example:
c → (.9 c) + (.1 t) / a __ t
Rational power series
Compilation of weighted rules
Rewrite Rules: An Example
[Figure: compiled transducer for a rule rewriting s as z (s:z) in the context of vowels (V), voiced stops (VStop), and the boundary symbols # and $.]
Syllable structure
[Weighted syllabification rules T1-T4: T1 inserts candidate syllable boundaries $ after (C* V C*) spans, penalizing codas; T2-T4 penalize boundaries that split permissible onset clusters (CG, OL, [+cor] or /x/ followed by /l/ or /r/).]
estrella: BestPath(/estreya/ ∘ Intro($) ∘ Syl) = ##es$tre$ya##
atlas: BestPath(/atlas/ ∘ Intro($) ∘ Syl) = ##at$las##
Russian Percentage Expansion: An example
s 5% skidkoĭ
Lexical Analysis FST
Language Model FSTs:
[Rewrite rules choosing between the nominal (procent{noun}) and adjectival (procentn{adj}) forms of ‘%’, and rules propagating case (e.g. instr) and number (sg/pl) agreement across the # word boundary.]
⇓
s pjati{gen} -procentn{adj} oj{sg+instr} skidkoj
Percentage Expansion: Continued
s 5% skidkoĭ
⇓
s pjati{gen} -procentn{adj} oj{sg+instr} skidkoj
LP
⇓
s # PiT"!pr@c"Entn&y # sK"!tk&y
Phrasing Prediction
Problem: predict intonational phrase boundaries in long
unpunctuated utterances:
For his part, Clinton told reporters in Little Rock, Ark., on Wednesday
‖ that the pact can be a good thing for America ‖ if we change our
economic policy ‖ to rebuild American industry here at home ‖ and if
we get the kind of guarantees we need on environmental and labor
standards in Mexico ‖ and a real plan ‖ to help the people who will
be dislocated by it.
The Bell Labs synthesizer uses a CART-based predictor trained on labeled
corpora (Wang & Hirschberg 1992).
Phrasing Prediction: Variables
For each pair <wi, wj>:
length of utterance; distance of wi in syllables/
stressed syllables/words … from the beginning/end of the sentence
automatically predicted pitch accent for wi and wj
part-of-speech (POS) for a 4-word window around <wi, wj>
(largest syntactic constituent dominating wi but not wj and vice
versa, and smallest constituent dominating them both)
whether <wi, wj> is dominated by an NP and, if so, distance of
wi from the beginning of that NP, and distance/length
(mutual information scores for a four-word window around <wi, wj>)
The most successful of these predictors so far appear to be POS, some
constituency information, and mutual information
Phrasing Prediction: Sample Tree
[Figure: sample phrasing-prediction tree (root coverage 69923/82958); splits test punctuation (punc), POS-window features (j1f, j2n, j3f, j3n), syllable counts (syls), and predicted accent status (raj4), with yes/no leaf decisions.]
Phrasing Prediction: Results
Results for multi-speaker read speech:
– major boundaries only: 91.2%
– collapsed major/minor phrases: 88.4%
– 3-way distinction between major, minor and null boundary:
81.9%
Results for spontaneous speech:
– major boundaries only: 88.2%
– collapsed major/minor phrases: 84.4%
– 3-way distinction between major, minor and null boundary:
78.9%
Results for 85K words of hand-annotated text, cross-validated on
training data: 95.4%.
Tree-Based Modeling: Prosodic Phrase Prediction
[Figure: prosodic-phrase prediction tree (root coverage 920/1800); splits test distance from punctuation (dpunc) and the POS of the left/right words (lpos, rpos), with yes/no leaf decisions.]
The Tree Compilation Algorithm
(Sproat & Riley, 1996)
Each leaf node corresponds to a single rule defining a constrained weighted
mapping for the input symbol associated with the tree
Decisions at each node are stateable as regular expressions restricting the left
or right context of the rule(s) dominated by the branch
The full left/right context of the rule at a leaf node is derived by intersecting
the expressions traversed between the root and the leaf node
The transducer for the entire tree represents the conjunction of all the
constraints expressed at the leaf nodes; it is derived by intersecting
the set of WFSTs corresponding to each of the leaves
– Note that intersection is defined for transducers that express same-length
relations
The alphabet is defined to be an alphabet of all correspondence pairs that
were determined empirically to be possible
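The same-length restriction is what makes intersection well defined: each (input, output) correspondence pair behaves as a single symbol, so transducers can be intersected like automata. A toy sketch over finite sets of pair strings (a stand-in for the WFST operation):

```python
def pairs(inp, out):
    """Encode a same-length relation element as a string of (input, output) pairs."""
    assert len(inp) == len(out)       # only same-length relations qualify
    return tuple(zip(inp, out))

# Two constraints, each given here as a finite set of allowed pair strings.
t1 = {pairs("ab", "xy"), pairs("ab", "xz")}
t2 = {pairs("ab", "xy"), pairs("cd", "uv")}

# Intersecting the pair-symbol languages conjoins the two constraints.
both = t1 & t2
print(both == {pairs("ab", "xy")})  # True
```

With unequal lengths, ε symbols would be needed and the pair encoding (hence intersection) would no longer be well defined, which is why the tree-compilation construction restricts itself to same-length relations.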
Interpretation of Tree as a Ruleset
Node 16
1 (Σ (I! [ I! #! [ I! #! #! ))
\ 3 N [A
2 (Σ (N [ V [ A [ Adv [ D)) \
4 (Σ (I! [ I! #! ))
# ) (I1:09 [ #0:41 ) = I (! #)?(N [ V [ A [ Adv [ D) N [A
M.Mohri-M.Riley-R.Sproat Algorithms for Speech Recognition and Language Processing PART III 149
Summary of Compilation Algorithm
Each rule represents a weighted two-level surface coercion rule
Rule_L = Compile(T → L / ⋂_{p ∈ P_L} λ_p __ ⋂_{p ∈ P_L} ρ_p)
Rule_F = ⋂_{T ∈ F} Rule_T
Lexical Ambiguity Resolution
Word sense disambiguation:
She handed down a harsh sentence. peine
This sentence is ungrammatical. phrase
Homograph disambiguation:
Homograph Disambiguation 1
N-Grams
Evidence           /lɛd/  /lid/  Logprob
lead level/N         219      0    11.10
of lead in           162      0    10.66
the lead in            0    301    10.59
lead poisoning       110      0    10.16
lead role              0    285    10.51
narrow lead            0     70     8.49
Predicate-Argument Relationships
follow/V + lead        0    527    11.40
take/V + lead          1    665     7.76
Wide Context
zinc ↔ lead          235      0    11.20
copper ↔ lead        130      0    10.35
Other Features (e.g. Capitalization)
Homograph Disambiguation 2
Sort by |log( Pr(Pron1 | Collocation_i) / Pr(Pron2 | Collocation_i) )|
Homograph Disambiguation 4: Use
Choose single best piece of matching evidence.
Decision List for lead
Logprob  Evidence           Pronunciation
11.40    follow/V + lead    ⇒ /lid/
11.20    zinc ↔ lead        ⇒ /lɛd/
11.10    lead level/N       ⇒ /lɛd/
10.66    of lead in         ⇒ /lɛd/
10.59    the lead in        ⇒ /lid/
10.51    lead role          ⇒ /lid/
10.35    copper ↔ lead      ⇒ /lɛd/
10.28    lead time          ⇒ /lid/
10.16    lead poisoning     ⇒ /lɛd/
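Using the list is then "first (highest-scoring) matching piece of evidence wins"; a sketch with hypothetical predicates and ASCII stand-ins for the two pronunciations:

```python
# Decision list: (score, predicate, pronunciation), sorted by score descending.
# The predicates here are simplified substring tests, not the real collocations.
decision_list = [
    (11.40, lambda ctx: "follow/V" in ctx,  "/lid/"),
    (11.20, lambda ctx: "zinc" in ctx,      "/lEd/"),
    (10.51, lambda ctx: "lead role" in ctx, "/lid/"),
]

def classify(context, default="/lEd/"):
    """Return the pronunciation chosen by the single best matching evidence."""
    for _, matches, pron in decision_list:
        if matches(context):          # list is sorted, so first match is best
            return pron
    return default

print(classify("took the lead role in the play"))  # '/lid/'
print(classify("lead pipe"))                       # no match: default '/lEd/'
```

Unlike naive Bayes, only the single strongest matching piece of evidence decides, which is exactly the decision-list criterion.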
Homograph Disambiguation: Evaluation
Word      Pron1     Pron2    Sample Size  Prior  Performance
lives     laivz     livz     33186        .69    .98
wound     waYnd     wund     4483         .55    .98
Nice      nais      nis      573          .56    .94
Begin     biqgin    beigin   1143         .75    .97
Chi       tSi       kai      1288         .53    .98
Colon     koYqloYn  qkoYln   1984         .69    .98
lead (N)  lid       lɛd      12165        .66    .98
tear (N)  tɛ        ti       2271         .88    .97
axes (N)  qæksiz    qæksiz   1344         .72    .96
IV        ai vi     fAMW     1442         .76    .98
Jan       dcæn      jn       1327         .90    .98
routed    Mutid     MaYtid   589          .60    .94
bass      beis      bæs      1865         .57    .99
TOTAL                        63660        .67    .97
Decision Lists: Summary
Decision Lists as WFSTs
The lead example
[Figure: WFST that matches ## l e a d 1 nn ## exactly (with φ:φ self-loops elsewhere) and emits the homograph tag H1 via a final <e>:H1 arc.]
Decision Lists as WFSTs
Construct an environmental classifier consisting of a pair of transducers C1 and C2 ,
where
– C1 optionally rewrites any symbol except the word boundary or the homograph tags
H0, H1, …, as a single dummy symbol ∆
– C2 classifies contextual evidence from the decision list according to its type, and
assigns a cost equal to the position of the evidence in the list; and otherwise passes
∆, word boundary and H0, H1, … through:
## follow vb ## → ## ∆ V0 ## <1>
## zinc nn ## → ## ∆ C1 ## <2>
## level(s?) nn ## → ## ∆ R1 ## <3>
## of pp ## → ## ∆ [1 ## <2>
## in pp ## → ## ∆ 1] ## <2>
…
Decision Lists as WFSTs
Decision Lists as WFSTs
[Figure: linear input acceptor for “## c o p p e r nn ## a n d ## l e a d … ##”, branching over nn vs. vb after lead.]
Syntactic Parsing and Analysis
Intersection Grammars
[Figure: text automaton fragment … hold vb pl tuna nn sg …, with @@ marking word boundaries.]
Series of syntactic FSAs to be intersected with the text automaton,
constraining it.
[Figures: a syntactic constraint FSA over the dt, cans nn pl, hold vb pl, tuna nn sg with ρ (default) transitions, and the linear text automaton for “the dt @ cans nn pl @ hold vb pl @ tuna nn sg @@”.]
Top Down Parsing
S = [S the cans hold tuna S]
Tdic =
[Figure: the dictionary transducer Tdic rewrites opening brackets ([S, [NP, [VP) as predictions ((S, (NP, (VP), passes words (the, cans, hold, tuna, factory) through, inserts predicted constituents via ε arcs (e:[NP, e:[VP), and closes them (NP]:NP), VP]:VP), S]:S)).]
Local Grammars
[Figure: local grammar automaton with φ (failure) transitions, a WORD wildcard, and arcs labeled dt and vb.]
Indexation of natural language texts
Motivation
– Use of linguistic knowledge in indexation
– Optimal complexities
Preprocessing of a text t: O(|t|)
Search for the positions of a string x:
O(|x| + NumOccurrences(x))
Existing efficient indexation algorithms (PAT), but not convenient
(use with large linguistic information)
Example (1)
[Figure: the index of t = aabba represented as a subsequential transducer with position-offset outputs (e.g. b:1, b:2, a:1).]
Example (2)
[Figure: states labeled with ending-position sets {0,1,2,3,4,5}, {1,2,5}, {2}, {3}, {4}, {5}, {3,4} over transitions a, b.]
Figure 28: Indexation with automata t = aabba.
Algorithms
Based on the definition of an equivalence relation R on Σ*:
(w1 R w2) iff w1 and w2 have the same set of ending positions in t
Construction
– Minimal machines (subsequential transducer or automaton)
– Use of a failure function to distinguish equivalence classes
Can be adapted to natural language text
(not storing list of positions of short words)
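The equivalence classes can be computed by brute force for a short text (quadratic in |t|, unlike the efficient constructions cited; function name hypothetical). For t = aabba this reproduces the position sets of Figure 28:

```python
from collections import defaultdict

def ending_positions(t):
    """Map each factor (substring) of t to the set of positions where it ends.
    Two factors with the same set are R-equivalent and share an index state."""
    index = defaultdict(set)
    for i in range(len(t)):
        for j in range(i + 1, len(t) + 1):
            index[t[i:j]].add(j)      # factor t[i:j] ends at position j
    return index

index = ending_positions("aabba")
print(sorted(index["a"]))   # [1, 2, 5]
print(sorted(index["ab"]))  # [3]
print(sorted(index["b"]))   # [3, 4]
```

Searching for x then amounts to looking up its state and enumerating the stored positions, giving the O(|x| + NumOccurrences(x)) search behavior described above.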
Indexation with finite-state machines
Complexity
– Transducers (Crochemore, 1986): preprocessing O(|t|), search
O(|x| + NumOccurrences(x)) if using complex labels
– Automata (Mohri, 1996b): preprocessing quadratic, search
O(|x| + NumOccurrences(x))
Advantage: use of linguistic information
– Extended search: composition with morphological transducer
– Refinement: composition with finite-state grammar
Applications to WWW (Internet)
References
[1] Aho, Alfred V., John E. Hopcroft, and Jeffrey D. Ullman. 1974. The
design and analysis of computer algorithms. Addison Wesley: Reading,
MA.
[2] Aho, Alfred V., Ravi Sethi, and Jeffrey D. Ullman. 1986. Compilers,
Principles, Techniques and Tools. Addison Wesley: Reading, MA.
[3] Austin, S., R. Schwartz, and P. Placeway. 1991. The
forward-backward search algorithm. Proc. ICASSP ‘91, S10.3.
[4] Bahl, L.R., F. Jelinek, and R. Mercer. 1983. A maximum likelihood
approach to continuous speech recognition. IEEE Transactions on PAMI,
pp 179-190, March, 1983.
[5] Bellegarda, J. 1990. Tied mixture continuous parameter modeling for
speech recognition. IEEE Transactions on ASSP 38(12): 2033–2045.
[6] Berstel, Jean. 1979. Transductions and Context-Free Languages.
Teubner Studienbücher: Stuttgart.
[7] Berstel, Jean, and Christophe Reutenauer. 1988. Rational Series and
Their Languages. Springer-Verlag: Berlin-New York.
[8] Bird, Steven, and T. Mark Ellison. 1994. One-level phonology:
Autosegmental representations and rules as finite automata.
Computational Linguistics, 20(1):55–90.
[9] Breiman, Leo, Jerome H. Friedman, Richard A. Olshen, and Charles J.
Stone. 1984. Classification and Regression Trees. Wadsworth & Brooks,
Pacific Grove CA.
[10] Chen, F. 1990. Identification of contextual factors for pronunciation
networks. Proc. ICASSP ‘90, S14.9.
[11] Choffrut, Christian. 1978. Contributions à l’étude de quelques
familles remarquables de fonctions rationnelles. Ph.D. thesis, (thèse de
doctorat d’Etat), Université Paris 7, LITP: Paris, France.
[12] Cormen, T., C. Leiserson, and R. Rivest. 1992. Introduction to
Algorithms. The MIT Press: Cambridge, MA.
[13] Eilenberg, Samuel. 1974-1976. Automata, Languages and Machines,
volume A-B. Academic Press.
[14] Elgot, C. C. and J. E. Mezei. 1965. On relations defined by
generalized finite automata. IBM Journal of Research and Development,
9.
[15] Gildea, Daniel, and Daniel Jurafsky. 1995. Automatic induction of
finite state transducers for simple phonological rules. In 33rd Annual
Meeting of the Association for Computational Linguistics, pages 9–15.
ACL.
[16] Good, I. 1953. The population frequencies of species and the
estimation of population parameters. Biometrika 40(3–4): 237–264.
[17] Gopalakrishnan, P., L. Bahl, and R. Mercer. 1995. A tree search
strategy for large-vocabulary continuous speech recognition. Proc.
ICASSP ‘95, 572–575.
[18] Gross, Maurice. 1989. The use of finite automata in the lexical
representation of natural language. Lecture Notes in Computer Science,
377.
[19] Hearst, Marti. 1991. Noun homograph disambiguation using local
context in large text corpora. In Using Corpora, Waterloo, Ontario.
University of Waterloo.
[20] Hopcroft, John E. and Jeffrey D. Ullman. 1979. Introduction to
Automata Theory, Languages, and Computation. Addison Wesley:
Reading, MA.
[21] Johnson, C. Douglas. 1972. Formal Aspects of Phonological
Description. Mouton: The Hague.
[22] Joshi, Aravind K. 1996. A parser from antiquity: an early
application of finite state transducers to natural language processing. In
ECAI-96 Workshop, Budapest, Hungary. ECAI.
[23] Kaplan, Ronald M. and Martin Kay. 1994. Regular models of
phonological rule systems. Computational Linguistics, 20(3).
[24] Karlsson, Fred, Atro Voutilainen, Juha Heikkilä, and Arto Anttila.
1995. Constraint Grammar: A Language-Independent System for Parsing
Unrestricted Text. Mouton de Gruyter.
[25] Karttunen, Lauri. 1995. The replace operator. In 33rd Meeting of the
Association for Computational Linguistics (ACL 95), Proceedings of the
Conference, MIT, Cambridge, Massachusetts. ACL.
[26] Karttunen, Lauri. 1996. Directed replacement. In 34th Annual
Meeting of the Association for Computational Linguistics, pages
108–115. ACL.
[27] Karttunen, Lauri, and Kenneth Beesley. 1992. Two-level rule
compiler. Technical Report P92–00149, Xerox Palo Alto Research Center.
[28] Karttunen, Lauri, Ronald M. Kaplan, and Annie Zaenen. 1992.
Two-level morphology with composition. In Proceedings of the fifteenth
International Conference on Computational Linguistics (COLING’92),
Nantes, France. COLING.
[29] Katz, S. 1987. Estimation of probabilities from sparse data for the
language model component of a speech recognizer. IEEE Trans. of ASSP,
35(3):400–401.
[30] Kiraz, George Anton. 1996. Ṣemḥe: A generalized two-level
system. In 34th Annual Meeting of the Association for Computational
Linguistics, pages 159–166. ACL.
[31] Koskenniemi, Kimmo. 1983. Two-Level Morphology: a General
Computational Model for Word-Form Recognition and Production. PhD
thesis. University of Helsinki, Helsinki.
[32] Koskenniemi, Kimmo. 1985. Compilation of automata from
morphological two-level rules. In Proceedings of the Fifth Scandinavian
Conference of Computational Linguistics, Helsinki, Finland.
[33] Koskenniemi, Kimmo. 1990. Finite-state parsing and
disambiguation. In Proceedings of the thirteenth International
Conference on Computational Linguistics (COLING’90), Helsinki,
Finland. COLING.
[34] Kuich, Werner and Arto Salomaa. 1986. Semirings, Automata,
Languages. Springer-Verlag: Berlin-New York.
[35] Lee, C. and L. Rabiner. 1989. A frame-synchronous network search
algorithm for connected word recognition. IEEE Transactions on ASSP,
37(11):1649–1658.
[36] Lee, K.F. 1990. Context dependent phonetic hidden Markov models
for continuous speech recognition. IEEE Transactions on ASSP. 38(4)
599–609.
[37] van Leeuwen, Hugo. 1989. TooLiP: A Development Tool for
Linguistic Rules. PhD thesis. Technical University Eindhoven.
[38] Leggetter, C.J. and P.C. Woodland. 1994. Speaker adaptation of
continuous density HMMs using linear regression. Proc. ICSLP ’94 2:
451-454. Yokohama.
[39] Ljolje, A. 1994. High accuracy phone recognition using context
clustering and quasi-triphonic models. Computer Speech and Language,
8:129–151.
[40] Ljolje, A., M. Riley, and D. Hindle. 1996. Recent improvements in
the AT&T 60,000 word speech-to-text system. Proc. DARPA Speech and
Natural Language Workshop, February 1996. Harriman, NY.
[41] Lothaire, M. 1990. Mots. Hermes: Paris, France.
[42] Lucassen, J., and R. Mercer 1984. An information theoretic
approach to the automatic determination of phonemic baseforms.
Proceedings of ICASSP 84, 42.5.1-42.5.4.
[43] Magerman, David. 1995. Statistical decision-tree models for
parsing. In 33rd Annual Meeting of the Association for Computational
Linguistics, pages 276–283. ACL.
[44] Mohri, Mehryar. 1994a. Compact representations by finite-state
transducers. In 32nd Meeting of the Association for Computational
Linguistics (ACL 94), Proceedings of the Conference, Las Cruces, New
Mexico. ACL.
[45] Mohri, Mehryar. 1994b. Minimization of sequential transducers.
Lecture Notes in Computer Science, 807.
[46] Mohri, Mehryar. 1994c. Syntactic analysis by local grammars
automata: an efficient algorithm. In Papers in Computational
Lexicography: COMPLEX '94, pages 179–191, Budapest. Research
Institute for Linguistics, Hungarian Academy of Sciences.
[47] Mohri, Mehryar. 1995. Matching patterns of an automaton. Lecture
Notes in Computer Science, 937.
[48] Mohri, Mehryar. 1996a. Finite-state transducers in language and
speech processing. Computational Linguistics, to appear.
[49] Mohri, Mehryar. 1996b. On some applications of finite-state
automata theory to natural language processing. Journal of Natural
Language Engineering, to appear.
[50] Mohri, Mehryar, Fernando C. N. Pereira, and Michael Riley. 1996.
Weighted automata in text and speech processing. In ECAI-96 Workshop,
Budapest, Hungary. ECAI.
[51] Mohri, Mehryar and Richard Sproat. 1996. An efficient compiler for
weighted rewrite rules. In 34th Meeting of the Association for
Computational Linguistics (ACL 96), Proceedings of the Conference,
Santa Cruz, California. ACL.
[52] Murveit, H., J. Butzberger, V. Digalakis, and M. Weintraub. 1993.
Large-vocabulary dictation using SRI's DECIPHER speech recognition
system: Progressive search techniques.
In Proceedings of the International Conference on Acoustics, Speech, and
Signal Processing, pages II.319–II.322. ICASSP93.
[53] Odell, J.J., V. Valtchev, P. Woodland, and S. Young. 1994. A one-pass
decoder design for large vocabulary recognition. Proc. ARPA Human
Language Technology Workshop, March 1994. Morgan Kaufmann.
[54] Oncina, José, Pedro Garcı́a, and Enrique Vidal. 1993. Learning
subsequential transducers for pattern recognition tasks. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 15:448–458.
[55] Ostendorf, M., and N. Veilleux. 1994. A hierarchical stochastic
model for automatic prediction of prosodic boundary location.
Computational Linguistics, 20(1):27–54.
[56] Paul, D. 1991. Algorithms for an optimal A* search and linearizing
the search in the stack decoder. Proc. ICASSP '91, S10.2.
[57] Pereira, Fernando C. N., and Michael Riley. 1996. Speech
recognition by composition of weighted finite automata. CMP-LG
archive paper 9603001.
[58] Pereira, Fernando C. N., Michael Riley, and Richard Sproat. 1994.
Weighted rational transductions and their application to human language
processing. In ARPA Workshop on Human Language Technology.
Advanced Research Projects Agency.
[59] Pereira, Fernando C. N. and Rebecca N. Wright. 1991. Finite-state
approximation of phrase structure grammars. In 29th Annual Meeting of
the Association for Computational Linguistics, (ACL 91), Proceedings of
the conference, Berkeley, California. ACL.
[60] Perrin, Dominique. 1990. Finite automata. In J. van Leeuwen, editor,
Handbook of Theoretical Computer Science, Volume B: Formal Models
and Semantics. Elsevier, Amsterdam, pages 1–57.
[61] Rabiner, L. 1989. A tutorial on hidden Markov models and selected
applications in speech recognition. Proc. IEEE. 77(2) February 1989.
[62] Randolph, M. 1990. A data-driven method for discovering and
predicting allophonic variation. Proc. ICASSP '90, S14.10.
[63] Revuz, Dominique. 1991. Dictionnaires et lexiques, méthodes et
algorithmes. Ph.D. thesis, Université Paris 7: Paris, France.
[64] Riccardi, G., E. Bocchieri, and R. Pieraccini. 1995.
Non-deterministic stochastic language models for speech recognition. In
Proceedings of the International Conference on Acoustics, Speech, and
Signal Processing, pages I.237–I.240. ICASSP95, May 1995.
[65] Riley, M. 1989. Some applications of tree-based modeling to speech
and language. Proc. DARPA Speech and Natural Language Workshop.
Cape Cod, MA, 339-352, Oct. 1989.
[66] Riley, M. 1991. A statistical model for generating pronunciation
networks. In Proceedings of the International Conference on Acoustics,
Speech, and Signal Processing, pages S11.1.–S11.4. ICASSP91, October
1991.
[67] Riley, M., and A. Ljolje. 1992. Recognizing phonemes vs.
recognizing phones: a comparison. Proceedings of ICSLP, Banff,
Canada, October 1992.
[68] Riley, M., A. Ljolje, D. Hindle, and F. Pereira. 1995. The AT&T
60,000 word speech-to-text system. Eurospeech ’95, Madrid, Spain, Sept.
1995.
[69] Riley, M., F. Pereira, and E. Chung. 1995. Lazy transducer
composition: a flexible method for on-the-fly expansion of
context-dependent grammar networks. IEEE Workshop on Automatic
Speech Recognition, Snowbird, Utah, December 1995.
[70] Roche, Emmanuel. 1993. Analyse syntaxique transformationnelle du
français par transducteur et lexique-grammaire. Ph.D. thesis, Université
Paris 7: Paris, France.
[71] Roche, Emmanuel. 1996. Finite-state transducers: Parsing free and
frozen sentences. In Proceedings of the ECAI-96 Workshop on Extended
Finite State Models of Language, Budapest, Hungary. European
Conference on Artificial Intelligence.
[72] Salomaa, Arto and Matti Soittola. 1978. Automata-Theoretic Aspects
of Formal Power Series. Springer-Verlag: New York.
[73] Schützenberger, Marcel Paul. 1961. A remark on finite transducers.
Information and Control, 4:185–196.
[74] Schützenberger, Marcel Paul. 1961. On the definition of a family of
automata. Information and Control, 4.
[75] Schützenberger, Marcel Paul. 1977. Sur une variante des fonctions
séquentielles. Theoretical Computer Science.
[76] Schützenberger, Marcel Paul. 1987. Polynomial decomposition of
rational functions. Lecture Notes in Computer Science, volume 386.
Springer-Verlag: Berlin Heidelberg New York.
[77] Silberztein, Max. 1993. Dictionnaires électroniques et analyse
automatique de textes: le système INTEX. Masson: Paris, France.
[78] Sproat, Richard. 1995. A finite-state architecture for tokenization
and grapheme-to-phoneme conversion in multilingual text analysis. In
Proceedings of the ACL SIGDAT Workshop, Dublin, Ireland. ACL.
[79] Sproat, Richard. 1996. Multilingual text analysis for text-to-speech
synthesis. In Proceedings of the ECAI-96 Workshop on Extended Finite
State Models of Language, Budapest, Hungary. European Conference on
Artificial Intelligence.
[80] Sproat, Richard, Julia Hirschberg, and David Yarowsky. 1992. A
corpus-based synthesizer. In Proceedings of the International Conference
on Spoken Language Processing, pages 563–566, Banff. ICSLP.
[81] Sproat, Richard, and Michael Riley. 1996. Compilation of weighted
finite-state transducers from decision trees. In 34th Annual Meeting of the
Association for Computational Linguistics, pages 215–222. ACL.
[82] Sproat, Richard, Chilin Shih, William Gale, and Nancy Chang. 1996.
A stochastic finite-state word-segmentation algorithm for Chinese.
Computational Linguistics, 22(3).
[83] Tarjan, R. 1983. Data Structures and Network Algorithms. SIAM.
Philadelphia.
[84] Traber, Christof. 1995. SVOX: The implementation of a
text-to-speech system for German. Technical Report 7. Swiss Federal
Institute of Technology, Zurich.
[85] Tzoukermann, Evelyne, and Mark Liberman. 1990. A finite-state
morphological processor for Spanish. In COLING-90, volume 3, pages
277–286. COLING.
[86] Voutilainen, Atro. 1994. Designing a parsing grammar. Technical
Report 22, University of Helsinki.
[87] Wang, Michelle, and Julia Hirschberg. 1992. Automatic
classification of intonational phrase boundaries. Computer Speech and
Language, 6:175–196.
[88] Woods, W.A. 1970. Transition network grammars for natural
language analysis. Communications of the Association for Computing
Machinery, 13(10).
[89] Woods, W.A. 1980. Cascaded ATN grammars. Computational
Linguistics, 6(1).
[90] Yarowsky, David. 1992. Word-sense disambiguation using statistical
models of Roget's categories trained on large corpora. In Proceedings of
COLING-92. Nantes, France. COLING.
[91] Yarowsky, David. 1996. Homograph disambiguation in
text-to-speech synthesis. In Jan van Santen, Richard Sproat, Joseph Olive,
and Julia Hirschberg, editors, Progress in Speech Synthesis. Springer,
New York.
[92] Young, S.J., J.J. Odell, and P.C Woodland. 1994. Tree-based
state-tying for high accuracy acoustic modelling. Proc. ARPA Human
Language Technology Workshop. March 1994. Morgan Kaufmann.