P. Mahesha
Department of Computer Science and Engineering
S.J. College of Engineering
Mysore-570006
maheshsjce@yahoo.com

D.S. Vinod
Department of Information Science and Engineering
S.J. College of Engineering
Mysore-570006
dsvinod@daad-alumni.de
3. METHODOLOGY
The overall process of dysfluent and normal speech classification is divided into 4 steps as shown in figure 1.

The diagram for computing MFCC is given in figure 2. The step-by-step computations of MFCC are discussed briefly in the following sections.
3.3.4 Step 4: Mel Filter Bank Processing
A set of triangular filter banks is used to approximate the frequency resolution of the human ear. The Mel frequency scale is linear up to 1000 Hz and logarithmic thereafter [1]. A set of overlapping Mel filters is constructed such that their center frequencies are equidistant on the Mel scale. The filter banks can be implemented in both the time domain and the frequency domain. For the purpose of MFCC processing, the filter banks are implemented in the frequency domain. The filter bank according to the Mel scale is shown in figure 3.

The function is only approximated locally, and all computation is delayed until classification is done. Each query object (test speech signal) is compared with each training object (training speech signal). The object is then classified by a majority vote of its neighbors, being assigned to the class most common amongst its k nearest neighbors (k is a positive integer, typically small). If k = 1, the object is simply assigned to the class of its nearest neighbor [8].

In this study, the minimum distance is calculated from the test speech signal to each of the training speech signals in the training set. This classifies a test speech sample as belonging to the same class as the most similar, or nearest, sample point in the training set. A Euclidean distance measure is used to find the closeness between each training sample and the test sample. The Euclidean distance measure is given by:
d_e(a, b) = \sqrt{\sum_{i=1}^{n} (b_i - a_i)^2}        (8)
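The k-NN decision rule with the Euclidean distance of equation (8) can be sketched as follows. The feature vectors and class labels below are toy stand-ins for the MFCC features and the dysfluent/normal classes, not data from this study:

```python
# Sketch of the k-NN classifier described above: each test vector is compared
# with every training vector via Euclidean distance (equation 8) and takes the
# majority label of its k nearest neighbors. Toy data only, not study data.
import numpy as np
from collections import Counter

def euclidean(a, b):
    """d_e(a, b) = sqrt(sum_i (b_i - a_i)^2), equation (8)."""
    return np.sqrt(np.sum((np.asarray(b) - np.asarray(a)) ** 2))

def knn_classify(test_vec, train_vecs, train_labels, k=1):
    dists = [euclidean(test_vec, t) for t in train_vecs]
    nearest = np.argsort(dists)[:k]        # indices of the k closest training samples
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]      # majority vote (k = 1: nearest neighbor)

# Toy example: two clusters standing in for "dysfluent" and "normal" features.
train = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
labels = ["dysfluent", "dysfluent", "normal", "normal"]
print(knn_classify([4.8, 5.1], train, labels, k=3))  # -> normal
```

With k = 1 this reduces to the minimum-distance rule used in the study: the test sample inherits the class of its single nearest training sample.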
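The triangular Mel filter bank of Step 4 can be sketched in Python as follows; the filter count, FFT size, and sampling rate here are illustrative assumptions, not the values used in this work:

```python
# Sketch of Step 4: triangular filters whose center frequencies are
# equidistant on the Mel scale, evaluated on FFT frequency bins.
# n_filters, n_fft, and sample_rate are assumed values for illustration.
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters=26, n_fft=512, sample_rate=16000):
    """Build a (n_filters, n_fft//2 + 1) matrix of triangular Mel filters."""
    low_mel, high_mel = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 equally spaced Mel points: filter centers plus the two edges
    mel_points = np.linspace(low_mel, high_mel, n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)

    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / (right - center)
    return fbank
```

Applying this matrix to a frame's power spectrum yields one energy value per Mel band; centers are equally spaced in Mel, so the filters widen with frequency, mimicking the ear's coarser resolution at high frequencies.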
Table 3. Dysfluent speech classification result with 3 different sets

Speech samples              Set 1   Set 2   Set 3
Training number               40      40      40
Test number                   10      10      10
Correct classification         8       8       8
Rate of classification (%)    90      80      80
Average classification (%)         86.67

Table 4. Normal speech classification result with 3 different sets

Speech samples              Set 1   Set 2   Set 3
Training number               40      40      40
Test number                   10      10      10
Correct classification         9       9       9
Rate of classification (%)    90      90      90
Average classification (%)         86.67

5. CONCLUSION
The speech signal can be used as a reliable indicator of speech abnormalities. We have proposed an approach to discriminate dysfluent and normal speech based on MFCC feature analysis. Using a k-NN classifier, average accuracies of 86.67% and 93.34% are obtained for dysfluent and normal speech, respectively. In this work we have considered a combination of three types of dysfluencies which are important in the classification of dysfluent speech. In future work, the number of training samples can be increased to improve test accuracy, and different feature extraction algorithms can be used to improve performance.

6. REFERENCES
[1] Speech technology: A practical introduction. Topic: Spectrogram, cepstrum and mel-frequency analysis. Technical report, Carnegie Mellon University and International Institute of Information Technology Hyderabad.
[2] C. Becchetti and L. Prina Ricotti. Speech Recognition. John Wiley and Sons, England.
[3] O. Bloodstein. A Handbook on Stuttering. Singular Publishing Group Inc., San Diego and London.
[4] C. Büchel and M. Sommer. What causes stuttering? PLoS Biol 2(2): e46, doi:10.1371/journal.pbio.0020046, 2004.
[5] D. Sherman. Clinical and experimental use of the Iowa scale of severity of stuttering. Journal of Speech and Hearing Disorders, pages 316-320, 1952.
[6] Johnson et al. The Onset of Stuttering: Research Findings and Implications. University of Minnesota Press, Minneapolis.
[7] Lindasalwa et al. Voice recognition algorithms using Mel frequency cepstral coefficients (MFCC) and dynamic time warping (DTW) techniques. Journal of Computing, 2(3):2151-9617, March 2010.
[8] Faxin Yu, Zheming Lu, Hao Luo and Pinghui Wang. Three-dimensional Model Analysis and Processing. Advanced Topics in Science and Technology, Springer, 2010.
[9] J.G. Proakis and D.G. Manolakis. Digital Signal Processing: Principles, Algorithms and Applications. MacMillan, New York.
[10] J. Harrington and S. Cassidy. Techniques in Speech Acoustics. Kluwer Academic Publishers, Dordrecht.
[11] M.A. Young. Predicting ratings of severity of stuttering. Journal of Speech and Hearing Disorders, pages 31-54, 1961.
[12] M.E. Wingate. Criteria for stuttering. Journal of Speech and Hearing Research, 13:596-607, 1977.
[13] Ibrahim Patel and Y. Srinivasa Rao. A frequency spectral feature modeling for hidden Markov model based automated speech recognition. The Second International Conference on Networks and Communications, 2010.
[14] P. Howell and M. Huckvale. Facilities to assist people to research into stammered speech. Stammering Research: an on-line journal published by the British Stammering Association, 1:130-242, 2004.
[15] P. Howell, S. Davis and J. Batrip. The UCLASS archive of stuttered speech. Journal of Speech, Language and Hearing Research, 52:556-559, April 2009.
[16] W.L. Cullinan, E.M. Prather and D. Williams. Comparison of procedures for scaling severity of stuttering. Journal of Speech and Hearing Research, pages 187-194, 1963.