CHAPTER 2 LITERATURE SURVEY

2.1 Adaptive Frequency Cepstral Coefficients for Word Mispronunciation Detection

In 2011, Sudhendu R. Sharma and Mark J. T. Smith observed that systems based on automatic speech recognition (ASR) technology can provide important functionality in computer-assisted language learning applications. This is a young but growing area of research, motivated by the large number of students studying foreign languages. They propose a Hidden Markov Model (HMM) based method to detect mispronunciations. Exploiting the specific dialog scripting employed in language learning software, HMMs are trained for different pronunciations.

New adaptive features are obtained through an adaptive warping of the frequency scale prior to computing the cepstral coefficients. The optimization criterion for the warping function is to maximize the separation of the two major groups of pronunciations (native and non-native) in terms of classification rate. Experimental results show that the adaptive frequency scale yields a better coefficient representation, leading to higher classification rates than conventional HMMs using Mel-frequency cepstral coefficients.

In this mispronunciation detection project, the talkers are 20 native speakers of English who have completed a one-year college-level introductory Spanish course and 20 students who are native speakers of Spanish. The corpus comprises 10 Spanish words, and each speaker pronounces each word 10 times. The human scoring juries are composed of 22 adult native speakers of Spanish. Scores range from 1 (poor) to 7 (excellent) based on the level of mispronunciation. When training the correct and incorrect pronunciation groups, outliers are removed so that the samples within each group are more homogeneous. Examples of outliers are samples from non-native speakers that are pronounced well enough to score close to the mean of the native group, or vice versa.
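To make the notion of frequency warping concrete, the following minimal sketch shows the conventional fixed Mel warping that serves as the baseline in the comparison above. The function names and the `warp`/`unwarp` parameters are illustrative assumptions, not taken from the surveyed paper. The paper's adaptive warping function is not specified in this survey; the sketch only indicates where a different warping could be substituted.

```python
import math


def hz_to_mel(f_hz):
    """Commonly used Mel-scale formula: frequency in Hz to mels."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel: mels back to Hz."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filter_centers_hz(n_filters, f_min, f_max, warp=hz_to_mel, unwarp=mel_to_hz):
    """Place filter-bank centre frequencies evenly on a warped scale.

    `warp` and `unwarp` are hypothetical hooks: passing a different
    invertible warping pair changes where the filters sit on the Hz axis,
    which is the role an adaptive warping would play.
    """
    lo, hi = warp(f_min), warp(f_max)
    step = (hi - lo) / (n_filters + 1)
    # Centres are equally spaced on the warped axis, then mapped back to Hz.
    return [unwarp(lo + step * k) for k in range(1, n_filters + 1)]


if __name__ == "__main__":
    for centre in filter_centers_hz(10, 0.0, 4000.0):
        print(round(centre, 1))
```

With the Mel warping, the centre frequencies are closely spaced at low frequencies and spread out at high frequencies. The cepstral coefficients are then computed from the filter-bank energies at these centres.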

SRI SATYA SAI INSTITUTE OF SCIENCE & TECHNOLOGY, SEHORE Isolated Word Speech Recognition System Using Mel Spectrum and Dynamic Time Warping

2.2 Unsupervised Intralingual and Cross-Lingual Speaker Adaptation for HMM-Based Speech Synthesis Using Two-Pass Decision Tree Construction

In 2011, Matthew Gibson and William Byrne presented unsupervised intralingual and cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction. The paper first presents an approach to the unsupervised speaker adaptation task for HMM-based speech synthesis models which avoids the need for supplementary acoustic models. This is achieved by defining a mapping between HMM-based synthesis models and ASR-style models via a two-pass decision tree construction process. Second, it is shown that this mapping also enables unsupervised adaptation of HMM-based speech synthesis models without the need to perform linguistic analysis of the estimated transcription of the adaptation data. Third, the paper demonstrates how this technique lends itself to unsupervised cross-lingual adaptation of HMM-based speech synthesis models, and explains the advantages of such an approach. Finally, listener evaluations reveal that the proposed unsupervised adaptation methods deliver performance approaching that of supervised adaptation.

2.3 Proposed Work

Although HMM-based speech recognition systems are fairly accurate, the amount of processing they require is not suitable for mobile devices. Thus, there is a need for a speech recognition system capable of recognizing speech using fewer resources than an HMM. This means sacrificing some accuracy, but small mobile devices cannot meet the resource demands of an HMM, and an operating system running on a small platform cannot provide the memory that the HMM algorithm requires. A simpler approach is dynamic programming, which is what this project implements. The idea of using dynamic programming came from the observation that HMMs are employed where strict speaker-dependent speech recognition is necessary, not where word-by-word recognition of isolated words is required. A graphical user interface is proposed based on our work on dynamic programming, with a facility to store ten different voices.
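The dynamic programming approach named in the project title, dynamic time warping (DTW), can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the function names `dtw_distance` and `recognize`, the Euclidean frame distance, and the one-dimensional toy templates are all hypothetical. In practice each frame would be a feature vector such as a set of Mel-spectrum coefficients.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW distance between two sequences of feature vectors.

    Assumes a Euclidean distance between frames and the basic step
    pattern (insertion, deletion, match) with no path constraints.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the stored template nearest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


if __name__ == "__main__":
    # Toy one-dimensional "features"; real templates would be recorded words.
    templates = {
        "yes": [(0.0,), (1.0,), (2.0,)],
        "no": [(5.0,), (5.0,), (4.0,)],
    }
    spoken = [(0.0,), (0.0,), (1.0,), (2.0,), (2.0,)]  # slower version of "yes"
    print(recognize(spoken, templates))
```

The table needs memory proportional to the product of the two sequence lengths and no trained model parameters. Recognition reduces to comparing the input against each stored template and choosing the nearest one.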
