1. Formants
When the frequency spectrum of a speech signal is computed, peaks can be observed. The closely spaced fine peaks are harmonics, which lie at integer multiples of the fundamental frequency of the sampled voice, while the broader peaks in the spectral envelope are the formants. An example is shown in Figure 1.
Figure 1: Spectrum of the vowel "ah" showing three formant regions. Obtained from http://www.sfu.ca/sonic-studio/handbook/Formant.html

A formant is a peak in an acoustic frequency spectrum which results from the resonant frequencies of any acoustical system. These formants describe the spectral structure of voiced speech. They are the characteristic partials that enable us to identify the type of sound produced, especially the vowels.
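As a rough illustration of formants as peaks in the spectral envelope, the sketch below synthesizes a vowel-like signal and recovers its envelope peaks. Everything in it is an assumption made for the example: the sampling rate, the 100 Hz fundamental, the three resonance centres (700, 1200 and 2600 Hz), the bandwidth, and the helper name `envelope`. It is not a method taken from the papers reviewed here, only a toy showing the difference between harmonics and formants.

```python
import numpy as np

fs = 16000                           # sampling rate in Hz (assumed)
f0 = 100.0                           # fundamental frequency in Hz (assumed)
formants = [700.0, 1200.0, 2600.0]   # illustrative resonance centres, not measured data
bandwidth = 100.0                    # assumed resonance half-width in Hz


def envelope(f):
    """Toy spectral envelope: one resonance-shaped bump per assumed formant."""
    f = np.asarray(f, dtype=float)
    return sum(1.0 / (1.0 + ((f - fc) / bandwidth) ** 2) for fc in formants)


# One second of signal: harmonics of f0, each scaled by the envelope.
t = np.arange(fs) / fs
harmonics = np.arange(1, 40) * f0    # 100 Hz, 200 Hz, ..., 3900 Hz
signal = sum(envelope(h) * np.sin(2 * np.pi * h * t) for h in harmonics)

# Magnitude spectrum. With a 1 s window, each FFT bin is exactly 1 Hz wide,
# so bin k holds the amplitude of the component at k Hz.
spectrum = np.abs(np.fft.rfft(signal)) / (fs / 2)

# Read off the amplitude at each harmonic, then keep the harmonics that are
# local maxima of that sampled envelope.
harmonic_amps = spectrum[harmonics.astype(int)]
is_peak = (harmonic_amps[1:-1] > harmonic_amps[:-2]) & (
    harmonic_amps[1:-1] > harmonic_amps[2:]
)
estimated_formants = harmonics[1:-1][is_peak]

# The assumed centres (700, 1200 and 2600 Hz) are recovered.
print(estimated_formants)
```

Every harmonic produces a fine peak in the spectrum, but only the harmonics closest to a resonance stand out as envelope peaks. Real speech is far less clean, which is one reason dedicated formant tracking methods exist.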
The advantages of using this algorithm are that it is conceptually simple, is easy to implement in parallel, behaves well in the presence of nasalization, and provides realistic formant estimates.
Formant tracking is then achieved by searching the codebook for the most suitable set of formant values. There are two methods of formant tracking:

1. Frame-by-frame formant tracking: formants are estimated for each frame independently. In this case the MAP estimate reduces to the ML estimate.
2. Formant tracking with continuity constraints: continuity constraints are added in the form of formant transition probabilities, and tracking is performed using a Viterbi search.

This method has two advantages over other approaches. First, the relationship between formant values and their contribution to the acoustic measurement is explicitly represented through the predictor codebook. Second, it explores the complete formant space, thus avoiding errors due to premature elimination of formant candidates during the analysis step.
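The two tracking strategies can be contrasted with a minimal sketch. It assumes the per-frame log-likelihood of every codebook entry is already available as a matrix `log_lik` (frames by entries), and it models the transition score as a penalty on squared formant jumps scaled by a hypothetical `smoothness` parameter. The function names, the three-entry codebook, and the numbers are all invented for illustration; the source does not specify the form of the transition probabilities.

```python
import numpy as np


def track_frame_by_frame(log_lik):
    """Pick the most likely codebook entry in each frame independently."""
    return np.argmax(log_lik, axis=1)


def track_viterbi(log_lik, codebook, smoothness=1e-5):
    """Viterbi search with a continuity penalty on formant jumps (assumed form)."""
    n_frames, n_entries = log_lik.shape
    # Transition log-score [prev, next]: larger squared jumps in Hz cost more.
    diff = codebook[:, None, :] - codebook[None, :, :]
    log_trans = -smoothness * np.sum(diff ** 2, axis=2)

    score = log_lik[0].copy()
    back = np.zeros((n_frames, n_entries), dtype=int)
    for t in range(1, n_frames):
        cand = score[:, None] + log_trans          # best way to reach each entry
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(n_entries)] + log_lik[t]

    # Trace the best path backwards from the final frame.
    path = np.empty(n_frames, dtype=int)
    path[-1] = np.argmax(score)
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path


# Hypothetical codebook: each row is one candidate (F1, F2, F3) set in Hz.
codebook = np.array([
    [500.0, 1500.0, 2500.0],
    [520.0, 1480.0, 2520.0],
    [300.0, 2300.0, 3000.0],
])

# Hypothetical log-likelihoods for 5 frames; frame 2 slightly favours entry 2.
log_lik = np.array([
    [-1.0, -2.0, -5.0],
    [-1.0, -2.0, -5.0],
    [-3.0, -3.5, -2.5],
    [-1.0, -2.0, -5.0],
    [-1.0, -2.0, -5.0],
])

independent_path = track_frame_by_frame(log_lik)
smooth_path = track_viterbi(log_lik, codebook)
print(independent_path)   # jumps to entry 2 in the middle frame
print(smooth_path)        # stays on entry 0 throughout
```

In this toy case the frame-by-frame tracker follows the momentary outlier, while the continuity penalty makes the large jump to entry 2 and back more expensive than accepting a slightly worse likelihood in one frame.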
3. Brief Conclusion
The majority of these papers are motivated by the error rate or other disadvantages of using linear prediction (LP) polynomials to perform formant tracking. I do not yet have a thorough understanding of these methods, and I have not yet researched the LP methods. This is what I will be working on for the next few weeks.