Overview :
- Presenting an artificial neural network to recognize and classify speech.
- Spoken digits: “one”, “two”, “three”, etc.
- Choosing a speech representation scheme.
- Training the perceptron.
- Results.
Representing Speech :
- Problem
- Solution
  - Extract speech-related information.
  - See: Spectrogram.
Representing Speech :
[Figure: the spoken word “one” shown as a waveform and as a spectrogram.]
Spectrogram :
- Shows the change in amplitude spectra over time.
- Three dimensions:
  - X axis: time
  - Y axis: frequency
  - Z axis: color intensity represents magnitude.
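A minimal NumPy sketch of how a magnitude spectrogram can be computed with a short-time Fourier transform. It assumes a Hann window, 256-sample frames and a 128-sample hop; none of these values come from the slides, and the function name `spectrogram` is hypothetical. Rows are time frames (X axis), columns are frequency bins (Y axis), and the values are magnitudes (the color intensity).

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: one row per time frame, one column per frequency bin."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)                 # taper each frame to reduce leakage
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))     # keep magnitude, drop phase

# Example: one second of a 440 Hz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
spec = spectrogram(np.sin(2 * np.pi * 440 * t))
peak_bin = spec.mean(axis=0).argmax()              # strongest frequency bin on average
```

Each bin spans fs / frame_len = 31.25 Hz here, so the 440 Hz tone peaks near bin 14.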
Mel Frequency Cepstrum Coefficients :
- Input layer
  - 26 cepstral coefficients
- Hidden layer
  - 100 fully connected hidden-layer units
  - Weights range between -1 and +1
  - Initially random
  - Remain constant (not updated during training)
- Output
  - 1 output unit for each target
  - Limited to values between 0 and +1
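A sketch of the forward pass through this architecture, under stated assumptions. The slides do not give the activation functions, so a logistic sigmoid is assumed for both the hidden and output units (this keeps the output between 0 and +1), and the bias of -1 from the Results slide is assumed to be added to the output unit's net input. The names `W`, `hidden` and `output` are hypothetical, and the input is random placeholder data rather than real cepstral coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

N_INPUT = 26    # cepstral coefficients per stimulus
N_HIDDEN = 100  # fully connected hidden-layer units

# Input-to-hidden weights: random in [-1, +1) and never updated afterwards.
W = rng.uniform(-1.0, 1.0, size=(N_HIDDEN, N_INPUT))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hidden(x):
    """Hidden-layer activation vector h for one input x (assumed logistic units)."""
    return sigmoid(W @ x)

def output(v, h, bias=-1.0):
    """One output unit in (0, 1); v is the trainable hidden-to-output weight vector."""
    return sigmoid(v @ h + bias)

x = rng.normal(size=N_INPUT)  # placeholder features, not real cepstral coefficients
v = np.zeros(N_HIDDEN)        # untrained output weights
h = hidden(x)
o = output(v, h)
```

With untrained (zero) output weights the output is just sigmoid(-1), about 0.27; training only has to adjust `v`.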
Sample Training Stimuli (Spectrograms) :
[Figure: spectrograms of the training stimuli “one”, “two” and “three”.]
Training the network :
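- Supervised learning
  - Choose the intended target and create a target vector.
  - 56-dimensional target vector.
  - If training the network to recognize spoken “one”, the target vector has a value of +1 for each of the known “one” stimuli and 0 for everything else.
- Update weights
  - $\mathbf{v} = \mathbf{v}_{\text{previous}} + \gamma\,(t - o)\,\mathbf{h}^{T}$
  - v is the weight vector between the hidden-layer units and the output.
  - γ (gamma) is the learning rate.
  - t is the target value, o is the network's output, and h is the vector of hidden-layer activations.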
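A sketch of supervised training with this update rule, applied one stimulus at a time to a single output unit. Assumptions: a logistic output unit with the -1 bias from the Results slide, a hypothetical set of 56 labelled stimuli (8 per word; the real split is not given), and synthetic hidden-layer activations in which the first 10 units respond to “one”. The slides report a learning rate of +1 on their own data; this synthetic example uses 0.01. The function names `make_target` and `train_output_weights` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def make_target(labels, target_word):
    """Target vector: +1 for each stimulus of target_word, 0 for everything else."""
    return np.array([1.0 if label == target_word else 0.0 for label in labels])

def train_output_weights(H, t, lr=1.0, bias=-1.0, epochs=3000):
    """Delta rule v = v + lr * (t - o) * h, applied one stimulus at a time.

    H: (n_stimuli, n_hidden) hidden-layer activations, one row per stimulus.
    t: (n_stimuli,) target vector.
    """
    v = np.zeros(H.shape[1])
    for _ in range(epochs):
        for h, target in zip(H, t):
            o = sigmoid(v @ h + bias)      # current output for this stimulus
            v = v + lr * (target - o) * h  # nudge the output toward its target
    return v

# Hypothetical stimulus set: 56 labels, 8 per word.
words = ["one", "two", "three", "four", "five", "six", "seven"]
labels = words * 8
t = make_target(labels, "one")

# Synthetic hidden activations: the first 10 units respond to "one", the rest are noise.
rng = np.random.default_rng(0)
H = rng.uniform(0.0, 1.0, size=(len(labels), 100))
H[:, :10] = np.where(t[:, None] == 1.0, 0.9, 0.1)

v = train_output_weights(H, t, lr=0.01, epochs=1000)
outputs = sigmoid(H @ v - 1.0)  # same -1 bias as during training
```

Only `v` is learned here; in the full network the rows of `H` would come from the fixed, randomly weighted hidden layer.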
Results :
Target = “one”
- Learning rate: +1
- Bias: -1
- 100 hidden-layer units
- 3000 iterations
- 316 seconds to learn the target
Results :