Robust Pitch Detection using DCT
based Spectral Autocorrelation
Under the guidance of
Dr. Rajesh Kumar Dubey
Submitted by:
Sudhakar Rai (15316004)
Nikhil Singh Gaur (15316024)

Pitch can be defined as the extent to which a sound is high or low.
Pitch is the perceived fundamental frequency of a sound.
Pitch detection means determining that fundamental frequency from the voice signal.
Pitch detection is very important in several related voice processing tasks. It is a crucial step in singing voice separation, and it also plays an important role in music information retrieval, singer identification, and lyric recognition.
Pitch can help identify the gender of a singing voice.
Pitch can also help determine the time, or time slot, of a voice recording.

Techniques in pitch extraction
• Time domain approaches
  (1) ACF (autocorrelation function) and MACF (modified autocorrelation function)
  (2) NCCF (normalized cross-correlation function)
  (3) AMDF (average magnitude difference function)
• Frequency domain approaches
  (4) CPD (cepstrum pitch determination)
  (5) DCT (discrete cosine transform) based spectral autocorrelation

Method 1:
ACF (Autocorrelation function)
By definition, the autocorrelation is

$$R(m) = \lim_{N \to \infty} \frac{1}{2N+1} \sum_{n=-N}^{N} x(n)\, x(n+m), \qquad 0 \le m \le M_0$$

R is symmetric for 'n' and '-n', so only n ≥ 0 is used:

$$R(m) = \frac{1}{N} \sum_{n=0}^{N-1-m} x(n)\, x(n+m), \qquad 0 \le m \le M_0$$
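As a minimal sketch of how the finite-sum form of R(m) can be used for pitch picking, the Python below evaluates R(m) on a synthetic frame and takes the lag of the largest value inside a search range. The 8 kHz sampling rate, the 60 to 400 Hz search range, the 40 ms test frame, and the names `acf` and `acf_pitch` are illustrative assumptions, not part of the original slides.

```python
import numpy as np

def acf(x, max_lag):
    """Short-time autocorrelation R(m) = (1/N) * sum_{n=0}^{N-1-m} x(n) x(n+m)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[:n - m], x[m:]) / n for m in range(max_lag + 1)])

def acf_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Return fs / lag for the lag with the largest R(m) in an assumed pitch range."""
    lag_min = int(fs / fmax)   # shortest pitch period considered, in samples
    lag_max = int(fs / fmin)   # longest pitch period considered, in samples
    r = acf(frame, lag_max)
    best_lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / best_lag

# Synthetic voiced-like frame: 200 Hz fundamental plus its second harmonic.
fs = 8000                                  # assumed sampling rate in Hz
t = np.arange(fs * 40 // 1000) / fs        # 40 ms of samples
frame = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 400 * t)
f0_acf = acf_pitch(frame, fs)
print(f0_acf)
```

Because of the 1/N scaling with a shrinking number of terms, R(m) decays with lag, which favours the first pitch period over its multiples in this sketch.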

What is autocorrelation, R(m)?
• E.g. x = [1 5 7 1 4], N = 5
• R(0) = x(0)*x(0) + x(1)*x(1) + x(2)*x(2) + x(3)*x(3) + x(4)*x(4)
• R(0) = 1 + 25 + 49 + 1 + 16 = 92
• R(1) = x(0)*x(1) + x(1)*x(2) + x(2)*x(3) + x(3)*x(4)
    x                  = [1 5 7 1 4]
    x shifted by one   =   [1 5 7 1 4]
    R(1) = 5 + 35 + 7 + 4 = 51
• And so on…
• R = [92 51 40 21 4]
(These sums omit the 1/N scale factor, which does not change where the peaks of R(m) occur.)
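The worked example can be checked numerically. This short sketch uses NumPy's `numpy.correlate` in `"full"` mode and keeps only the non-negative lags; like the example, it leaves out the 1/N scaling.

```python
import numpy as np

x = np.array([1, 5, 7, 1, 4], dtype=float)

# "full" mode returns lags -(N-1) .. (N-1); lag 0 sits at index N-1.
full = np.correlate(x, x, mode="full")
r_example = full[len(x) - 1:]

print(r_example.tolist())  # [92.0, 51.0, 40.0, 21.0, 4.0]
```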

Importance of linear prediction analysis in speech
A speech signal is produced by the convolution of the excitation source with the time-varying vocal tract system.
To study these two components independently, they have to be separated from the available speech signal. Linear prediction (LP) analysis was developed to deconvolve the given speech into its excitation and vocal tract system components.
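The following is a minimal sketch of that separation, assuming the autocorrelation method of linear prediction: each sample is predicted from a few past samples, and what remains (the residual) approximates the excitation. The second-order synthetic "vocal tract", the 100-sample pulse spacing, and the function names are illustrative assumptions rather than the authors' setup.

```python
import numpy as np

def lp_coefficients(s, order):
    """Solve the autocorrelation normal equations for the predictor coefficients."""
    s = np.asarray(s, dtype=float)
    n = len(s)
    r = np.array([np.dot(s[:n - m], s[m:]) for m in range(order + 1)])
    # Toeplitz system: sum_k a_k * r(|i - k|) = r(i), for i = 1..order
    big_r = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(big_r, r[1:order + 1])

def lp_residual(s, a):
    """Prediction error: s(n) minus the weighted sum of past samples."""
    s = np.asarray(s, dtype=float)
    pred = np.zeros_like(s)
    for k, a_k in enumerate(a, start=1):
        pred[k:] += a_k * s[:-k]
    return s - pred

# Synthetic "speech": an impulse train passed through an assumed all-pole filter.
u = np.zeros(2000)
u[::100] = 1.0                       # excitation: one pulse every 100 samples
s = np.zeros_like(u)
for n in range(len(u)):
    s[n] = u[n]
    if n >= 1:
        s[n] += 1.3 * s[n - 1]
    if n >= 2:
        s[n] -= 0.8 * s[n - 2]

a_hat = lp_coefficients(s, order=2)
e = lp_residual(s, a_hat)
print(a_hat)                         # close to the filter coefficients 1.3 and -0.8
```

In this toy case the residual `e` is close to the original pulse train `u`, which is the property the pitch detector relies on.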

• The speech samples s(n) are related to the excitation u(n) by the simple difference equation, which in its standard form (p is the predictor order, a_k the filter coefficients, G the gain) reads

$$s(n) = \sum_{k=1}^{p} a_k\, s(n-k) + G\, u(n)$$

Between the pitch pulses the excitation is zero, so G u(n) = 0 and the present speech sample can be predicted from a linear weighted sum of the past speech samples.
We process the speech signal through the linear predictor with predictor coefficients α_k, and the output is:

$$\tilde{s}(n) = \sum_{k=1}^{p} \alpha_k\, s(n-k)$$

The error between the actual signal and the predicted value is given by:

$$e(n) = s(n) - \tilde{s}(n) = s(n) - \sum_{k=1}^{p} \alpha_k\, s(n-k)$$

• e(n) consists of a train of impulses. Before computing the spectral autocorrelation function we perform linear prediction analysis, so that the LP residual is a train of impulses whose Fourier transform is flat.

Example 2: Discrete Cosine Transform (DCT)
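$$A = C_{N \times N} = \begin{bmatrix} a_{11} & \cdots & a_{1N} \\ \vdots & \ddots & \vdots \\ a_{N1} & \cdots & a_{NN} \end{bmatrix}$$

$$a_{kl} = \begin{cases} \sqrt{\dfrac{1}{N}}, & k = 1,\ 1 \le l \le N \\[2ex] \sqrt{\dfrac{2}{N}} \cos\dfrac{(2l-1)(k-1)\pi}{2N}, & 2 \le k \le N,\ 1 \le l \le N \end{cases}$$

$$C^{*} = C, \qquad C^{-1} = C^{T}$$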
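A small numerical check of the DCT matrix, written as a sketch: it builds C from the entries a_kl (1/sqrt(N) in the first row, sqrt(2/N)·cos((2l−1)(k−1)π/2N) elsewhere) and confirms that C is real and orthogonal, so its inverse is its transpose. The size N = 8 and the name `dct_matrix` are arbitrary choices for illustration.

```python
import numpy as np

def dct_matrix(n):
    """N x N DCT matrix with 1-based row index k and column index l."""
    k = np.arange(1, n + 1)[:, None]
    l = np.arange(1, n + 1)[None, :]
    c = np.sqrt(2.0 / n) * np.cos((2 * l - 1) * (k - 1) * np.pi / (2 * n))
    c[0, :] = 1.0 / np.sqrt(n)       # first row (k = 1) uses the 1/sqrt(N) scaling
    return c

C = dct_matrix(8)
is_orthogonal = np.allclose(C @ C.T, np.eye(8))   # C^{-1} = C^T
is_real = np.isrealobj(C)                         # C* = C
print(is_orthogonal, is_real)  # True True
```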

However, the Fourier transform has strong drawbacks:
– it is complex-valued
– it has poor energy compaction
• Energy compaction is the ability to pack the energy of the spatial sequence into as few frequency coefficients as possible.
If compaction is high, only a few coefficients have to be transmitted.
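To make energy compaction concrete, the sketch below counts how many of the largest coefficients are needed to hold 99% of the signal energy, for an orthonormal DCT and for a unitary DFT. The linear ramp test signal, the 99% threshold, and the helper names are assumptions chosen for illustration. A ramp is a case where the DFT suffers from the jump between the last and first sample, so results will differ for other signals.

```python
import numpy as np

def dct_ortho(x):
    """Orthonormal DCT-II of a 1-D signal via its matrix form."""
    n = len(x)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] = 1.0 / np.sqrt(n)
    return c @ x

def coeffs_needed(coeffs, fraction=0.99):
    """Number of largest-energy coefficients that hold `fraction` of the energy."""
    energy = np.sort(np.abs(coeffs) ** 2)[::-1]
    cumulative = np.cumsum(energy)
    return int(np.searchsorted(cumulative, fraction * cumulative[-1]) + 1)

x = np.arange(64, dtype=float)                     # linear ramp
n_dct = coeffs_needed(dct_ortho(x))
n_dft = coeffs_needed(np.fft.fft(x) / np.sqrt(len(x)))
print(n_dct, n_dft)   # the DCT count is much smaller for this signal
```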

Algorithm
1. Record a speech signal.
2. Preprocess the speech signal with linear prediction analysis to flatten the spectrum.
3. Use a frame size of 20 ms with an overlap of 10 ms to get a pitch contour.
4. Find the DCT magnitude spectrum of each analysis frame. The DCT spectrum is smoothed by the following window:
   W(k) = 1                          for 0 < k < N/2
   W(k) = 0.5*(1 - cos(2*pi*k/N))    for N/2 < k < N
5. Apply the spectral autocorrelation function (SAF) to the smoothed DCT spectrum.
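Steps 3 to 5 above can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the input is taken to be an LP residual already (here an ideal pulse train aligned with the frame starts), the mean of the spectrum is removed before the autocorrelation, the peak is searched only between 60 and 400 Hz, and no zero-padding is used, so the frequency resolution is fs/(2N) = 25 Hz for a 20 ms frame at 8 kHz. All function names are hypothetical.

```python
import numpy as np

def dct2(x):
    """Unnormalised DCT-II through an explicit cosine matrix (fine for short frames)."""
    n = len(x)
    k = np.arange(n)[:, None]        # frequency index
    m = np.arange(n)[None, :]        # time index
    return np.cos(np.pi * (2 * m + 1) * k / (2 * n)) @ x

def smoothing_window(n):
    """W(k) = 1 in the lower half, raised-cosine taper for N/2 < k < N."""
    k = np.arange(n)
    w = np.ones(n)
    upper = k > n / 2
    w[upper] = 0.5 * (1.0 - np.cos(2.0 * np.pi * k[upper] / n))
    return w

def saf_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Pitch from the peak of the autocorrelation of the smoothed DCT magnitude."""
    n = len(frame)
    spec = np.abs(dct2(frame)) * smoothing_window(n)
    spec = spec - spec.mean()        # assumption: remove the offset before correlating
    saf = np.array([np.dot(spec[:n - lag], spec[lag:]) for lag in range(n)])
    hz_per_bin = fs / (2.0 * n)      # DCT bin k corresponds to k * fs / (2N) Hz
    lo = int(np.ceil(fmin / hz_per_bin))
    hi = int(np.floor(fmax / hz_per_bin))
    best = lo + int(np.argmax(saf[lo:hi + 1]))
    return best * hz_per_bin

def pitch_contour(residual, fs):
    """20 ms frames with a 10 ms hop, one pitch value per frame."""
    frame_len = fs * 20 // 1000
    hop = fs * 10 // 1000
    starts = range(0, len(residual) - frame_len + 1, hop)
    return np.array([saf_pitch(residual[s:s + frame_len], fs) for s in starts])

fs = 8000                            # assumed sampling rate in Hz
residual = np.zeros(800)
residual[::40] = 1.0                 # ideal residual: one pulse every 5 ms (200 Hz)
contour = pitch_contour(residual, fs)
print(contour)
```

The DCT magnitude depends on where the pulses fall inside the frame, so on real speech this bare sketch can make octave errors; zero-padding the frame and smoothing the contour are common remedies that are left out here.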

Result:
This algorithm was tested on a number of speech segments and was found to be a robust tool for obtaining a good estimate of the fundamental frequency, as shown in the graph.

Thank you
Any questions?