Mel-Spectrum Computation

Seminar Speech Recognition
Mel-spectrum computation
new_fe_sp.c
Presentation by Yu Zhang
scuyuzh@hotmail.com
Oct 1st, 2003
Mel-frequency Wrapping
The human ear perceives tones on a linear scale for frequencies
below 1 kHz and on a logarithmic scale for frequencies above
1 kHz.
The mel-frequency scale is a linear frequency spacing
below 1000 Hz and a logarithmic spacing above 1000
Hz.
Voice signals have most of their energy in the low
frequencies, so it is natural to use a mel-spaced filter
bank, which has the characteristics described above.
Mel-frequency Wrapping
Use the following approximate formula to compute the mels for a
given frequency f in Hz:
$$\mathrm{mel}(f) = 2595 \, \log_{10}\left(1 + \frac{f}{700}\right)$$

line 165 of new_fe_sp.c
// Convert a frequency x in Hz to mels
float32 fe_mel(float32 x)
{
    return (2595.0 * (float32) log10(1.0 + x / 700.0));
}
// Convert a value x in mels back to Hz (inverse of fe_mel)
float32 fe_melinv(float32 x)
{
    return (700.0 * ((float32) pow(10.0, x / 2595.0) - 1.0));
}
The mel-frequency scale is a linear frequency spacing below 1000 Hz
and a logarithmic spacing above 1000 Hz.
For each tone with an actual frequency, f, measured in Hz, a subjective pitch
is measured on a scale called the mel scale.
The pitch of a 1 kHz tone, 40 dB above the perceptual hearing threshold, is
defined as 1000 mels.
Mel-frequency Wrapping
Figure 1: Power Spectrum without Mel-frequency Wrapping
Figure 2: Mel-frequency Wrapping of Power Spectrum
Viewed as a whole, the image with mel-frequency wrapping contains less information than the one without it.
Looking at the details, however, the image with mel-frequency wrapping keeps the low frequencies
and removes some of the remaining information.
To summarize, mel-frequency wrapping lets us keep only the useful part of the information.
Mel spectrum
The Mel spectrum is computed by multiplying the Power
Spectrum by each of the Triangular Mel Weighting filters
and integrating the result.
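$$\tilde{S}[l] = \sum_{k=0}^{N/2} S[k] \, M_l[k], \qquad l = 0, 1, \ldots, L-1$$
\tilde{S}[l] is the mel spectrum
S[k] is the power spectrum
M_l[k] is the l-th triangular mel weighting filter
N is the length of the Discrete Fourier Transform
L is the total number of triangular mel weighting filters.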
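The sum above can be sketched directly in C. This is a minimal illustration, not the code from new_fe_sp.c: the function name mel_spectrum, its parameter names, and the dense row-major layout of the filter weights (one full row of N/2 + 1 weights per filter) are assumptions made for clarity.

```c
#include <stddef.h>

/*
 * Direct form of the sum above:
 *   mel_spec[l] = sum over k = 0 .. N/2 of power_spec[k] * filters[l][k]
 *
 * Assumed layout (illustrative): filters is row-major, with one row of
 * (n_fft / 2 + 1) weights for each of the n_filters filters.
 */
void mel_spectrum(const double *power_spec, const double *filters,
                  size_t n_fft, size_t n_filters, double *mel_spec)
{
    size_t n_bins = n_fft / 2 + 1;   /* bins k = 0 .. N/2 */

    for (size_t l = 0; l < n_filters; ++l) {
        double acc = 0.0;
        for (size_t k = 0; k < n_bins; ++k)
            acc += power_spec[k] * filters[l * n_bins + k];
        mel_spec[l] = acc;
    }
}
```

Because each triangular filter is zero outside a narrow band, most terms of the inner loop contribute nothing; storing only the non-zero span of each filter avoids that wasted work.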
Building the Triangular Mel Weighting filters
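line 62 in new_fe_sp.c
int32 fe_build_melfilters(melfb_t *MEL_FB)
{
    // (variable declarations are not shown in this excerpt)
    // estimate filter coefficients
    MEL_FB->filter_coeffs = (float32 **) fe_create_2d(MEL_FB->num_filters,
                                MEL_FB->fft_size, sizeof(float32));
    MEL_FB->left_apex = (float32 *) calloc(MEL_FB->num_filters, sizeof(float32));
    MEL_FB->width = (int32 *) calloc(MEL_FB->num_filters, sizeof(int32));
    // num_filters + 2 edge frequencies
    filt_edge = (float32 *) calloc(MEL_FB->num_filters + 2, sizeof(float32));
    melmax = fe_mel(MEL_FB->upper_filt_freq);
    melmin = fe_mel(MEL_FB->lower_filt_freq);
    // mel spacing between adjacent edges (step not shown on the slide; inferred from the loop below)
    dmelbw = (melmax - melmin) / (MEL_FB->num_filters + 1);
    // edges are equally spaced on the mel scale, then converted back to Hz
    for (i = 0; i <= MEL_FB->num_filters + 1; ++i) {
        filt_edge[i] = fe_melinv(i * dmelbw + melmin);
    }
    for (whichfilt = 0; whichfilt < MEL_FB->num_filters; ++whichfilt) {
        // Building the triangular mel weighting filters
        // (loop body not shown in this excerpt)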
Building the Mel spectrum
line 156 in new_fe_sp.c
void fe_mel_spec(fe_t *FE, float64 *spec, float64 *mfspec)
{
    int32 whichfilt, start, i;
    float32 dfreq;
    dfreq = FE->SAMPLING_RATE / (float32) FE->FFT_SIZE;
    for (whichfilt = 0; whichfilt < FE->MEL_FB->num_filters; whichfilt++) {
        start = (int32) (FE->MEL_FB->left_apex[whichfilt] / dfreq) + 1;
        mfspec[whichfilt] = 0;
        for (i = 0; i < FE->MEL_FB->width[whichfilt]; i++)
            mfspec[whichfilt] += FE->MEL_FB->filter_coeffs[whichfilt][i] * spec[start + i];
    }
}
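$$\tilde{S}[l] = \sum_{k=0}^{N/2} S[k] \, M_l[k], \qquad l = 0, 1, \ldots, L-1$$
/*
 * FE holds the triangular mel weighting filters (in FE->MEL_FB)
 * spec is the power spectrum S[k]
 * mfspec is the mel spectrum \tilde{S}[l]
 * FE->MEL_FB->filter_coeffs holds the coefficients M_l[k] of the mel weighting
 * filters (marked in red on the original slide)
 */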
