
CONTENT MONITORING OF MUSIC USING

MATLAB
B.E (EL) PROJECT REPORT

PREPARED BY:
SYEDA KOMAL FATIMA (EL-096)
MASOOMA BATOOL (EL-117)
AIMEN NAZ ASLAM (EL-080)

PROJECT ADVISORS
MR. DANISH MAHMOOD KHAN (INTERNAL)
MR. SYED ABBAS ALI (EXTERNAL)

Department of Electronic Engineering


NED University of Engg. & Tech.,
Karachi 7527

ACKNOWLEDGEMENT
First and foremost, we owe our sincere gratitude to ALMIGHTY ALLAH for blessing us and helping us through every difficulty we faced, big and small, and for taking us out of every trouble, little and huge. It was only He who showed us a path when we could not find any. We are immeasurably thankful to Him for all the massive support, emotional and spiritual, during the journey of this project. It was only due to His blessings that what we thought was impossible became possible. All praise and gratefulness to ALLAH ALMIGHTY. May He make us follow the right path always. (Ameen)
It is a mind-teasing exertion to design a task of Final Year Project caliber. It requires an out-of-the-box vision and massive knowledge. We thank our external advisor Mr. Abbas Ali from the bottom of our hearts, who proposed the idea of this project. We are thankful to him for taking time out of his very busy schedule to guide and supervise us. The idea proposed by him contained all the elements of innovation and learning that should be part of a Final Year Project.
We also deeply thank our internal advisor Mr. Danish Mehmood Khan for his guidance on a path which was new and unfamiliar to us, but which, under his supervision, eventually became familiar.

DEDICATION
This report is dedicated to our parents for their support, their encouragement, and their unconditional love. They bore hardships to see us succeed and complete our work with flying colors. We also dedicate it to our respectable and honorable teachers who accompanied us on this journey towards success.

ABSTRACT
We present a system for facilitating the music and entertainment industry by automating the judging in music-related reality shows. In this system, the songs as sung by the candidates are matched against the songs as sung by experts. The system loads the song as a whole and then splits the waveform into segments for easier comparison. Each segment of the waveform is examined for musical features. Flaws are detected when the pieces of information are consolidated back into a single waveform and the musical information of the two songs is compared. The comparison then produces the result in the form of a percentage.

TABLE OF CONTENTS
ACKNOWLEDGMENT
DEDICATION
ABSTRACT
TABLE OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER 1
INTRODUCTION
1.1 BACKGROUND
1.2 OUR FOCUS
1.3 CURRENT TREND
1.4 MOTIVATION AND NEED
1.5 SCOPE
1.6 TOOLS
CHAPTER 2
2.1 SOUND (AUDIO) PROCESSING
2.2 FORMATS OF AUDIO FILES
2.3 WHAT IS MATLAB?
2.3.1 MATLAB
2.3.2 GRAPHICS AND GRAPHICAL USER INTERFACE PROGRAMMING
2.3.3 AUDIO PROCESSING IN MATLAB
2.3.4 MATLAB AUDIO PROCESSING EXAMPLES
2.3.5 AUDIO FORMATS SUPPORTED BY MATLAB
2.3.6 ADVANTAGES OF USING MATLAB FOR SIGNAL PROCESSING
2.4 MIR TOOLBOX
CHAPTER 3
PROCESSES INVOLVED
3.1 INTRODUCTION
3.2 AUDIO FILES
3.3 SEGMENTATION
3.4 FEATURE EXTRACTION
3.5 MATCHING
3.6 RESULT
CHAPTER 4
METHODOLOGY
4.1 LOADING OF AN AUDIO FILE
4.2 SEGMENTATION METHODS
4.3 PITCH EXTRACTION METHODS
4.3.1 FILTER BANK
4.3.2 AUTO-CORRELATION
4.3.3 SUMMING
4.3.4 PEAK PICKING
4.3.5 PITCH DETERMINATION
4.5 MATCHING METHOD
CHAPTER 5
RESULTS AND DISCUSSION
CHAPTER 6
CONCLUSION AND FUTURE ENHANCEMENT
REFERENCES

TABLE OF FIGURES
CHAPTER 2
2.1 EXAMPLE OF FFT 1
2.2 EXAMPLE OF FFT 2
2.3 TABLE OF AUDIO FORMATS SUPPORTED BY MATLAB
CHAPTER 3
3.1 FLOW DIAGRAM OF PROJECT
CHAPTER 4
4.1 AUDIO WAVEFORM GENERATED BY miraudio COMMAND
4.2 SEGMENTED AUDIO WAVEFORM
4.3 PITCH EXTRACTION STEPS
4.4 USING AUTO-CORRELATION TO FIND PITCH
4.5 ENVELOPE AUTO-CORRELATION
4.6 SUMMATION OF ALL AUTO-CORRELATED SIGNALS
4.7 PEAK PICKING OF A SIGNAL
4.8 PITCH WAVEFORM OF A SIGNAL
4.9 FILTER BANK METHOD
CHAPTER 5
5.1 AUDIO WAVEFORM OF alvida
5.2 AUDIO WAVEFORM OF alvida-sample
5.3 SEGMENTED WAVEFORM OF alvida
5.4 SEGMENTED AUDIO WAVEFORM OF alvida-sample
5.5 PITCH GRAPH OF alvida
5.6 PITCH GRAPH OF alvida-sample

LIST OF ABBREVIATIONS
HCDF - Harmonic Change Detection Function
RMS - Root Mean Square
MIR - Music Information Retrieval
MFCC - Mel-Frequency Cepstrum Coefficient
MIDI - Musical Instrument Digital Interface

CHAPTER 1
INTRODUCTION
1.1 BACKGROUND
As the years pass, all of us have noticed the great popularity being received by newly emerging singers in the field of entertainment. Owing to the immense number of TV shows coming up, this business has become one of the prominent industries in the entertainment sector across the world. Because more and more people now spend a great part of their day following such reality shows to relax and get relief from everyday stress, this kind of entertainment has become a vital part of their daily lives. It has become the need of the time to automate this system, as it is now global. First of all, we need to understand the main issue caused by the absence of automation in these reality shows: it leaves the judges open to bias, while the deserving candidate needs to win. Since a great part of the world's audience now follows and assesses the judgments made in these programs, such a system would be much needed and appreciated.

1.2 OUR FOCUS


Our focus in this project is to design a technique for matching the music as sung by the candidates against the music already embedded by us within our system. The system will load the musical tone as a whole and then segment the waveform into parts for easier comparison. Each piece of music is examined for several structures. Defects are detected when the sections of information are combined back into a single waveform. This single waveform is matched against the music information embedded within the software as sung by the experts. The comparison then produces the result.

1.3 CURRENT TREND


A lot of research is being done in the area of music by engineers all across the globe. Some of the research and work that has already been done is as follows:

MATCHING PURSUIT
This algorithm takes in music and automatically puts out sheet music. A team from the University of Jaen (UJA), Spain, has created a method to automatically detect the musical notes in an audio file and generate sheet music. The system can determine the notes even if a different type of instrument, different musicians, a different genre of music, or a varied recording-studio environment is used. It is capable of adapting to variance in the recording medium and is hence very effective.

SPECIALIZED TECHNIQUE FOR MELODY MATCHING
Literal matching of musical notes is quite different from musical perception. This paper analyzes the properties of music, its perception, the manner in which it is used by database users, and its use in extracting monophonic melodies from polyphonic music; such melodies can eventually be used for matching different pieces of music. The paper was presented by the Computer Science Department of the Royal Melbourne Institute of Technology, Melbourne, Victoria, Australia.

PITCH MATCHING AND SCANNING
The major issue here was to differentiate between the quality of music as sung by musicians and non-musicians. The aim of this paper was to produce a fast yet reliable procedure based on comparison with the original song's temporal organization. It is used as a way of determining who actually knows the correct application of notes and who is a layman. This research was done by the Department of Speech-Language Pathology and Audiology, Universidade Federal de São Paulo (UNIFESP), São Paulo (SP), Brazil.

1.4 MOTIVATION AND NEED


Looking at the hype being created by reality shows nowadays, and analyzing how far it has gone and how fast it has been gaining popularity, there is an urgent need to automate the process of manual judgment. There are thousands of people who love melodies. Numerous reality shows with singing competitions are happening all around the world, but the judgment is based purely on a human response to the notes. Here we introduce an automatic judging system that stores the notes of the original song and compares them with the singer's notes. The result is shown in percentage form, indicating how accurate the person's performance was. Participants are often unhappy with the judgments made, as they think they are based on unfair means. This project could bring some advancement to the media.
Shows like X FACTOR, AMERICAN IDOL, INDIAN IDOL and now PAKISTAN IDOL are the most famous among the youth, so this project would be well appreciated by everyone who loves to sing.

1.5 SCOPE
It will be helpful for the music industry, which is growing rapidly nowadays, as well as for business and academics. Taking a broader point of view, in some countries music is taught as a subject, so this work can be very useful internationally. It can also be used for Music Information Retrieval, which has several applications such as recommender systems, track separation and instrument recognition, automatic music transcription, automatic categorization, music generation, etc.

1.6 TOOLS
MATLAB


CHAPTER 2
2.1 SOUND (AUDIO) PROCESSING
Audio signal processing: the field of audio processing was developed to address the problems associated with handling audio, the use of which is nowadays very common. It is the manipulation of sound or auditory signals to achieve a desired effect. Signal processing may occur in the analog as well as the digital domain. While analog processing works on the actual signal, digital processing works on a binary representation of that signal.
The human ear does not differentiate between analog and digitally processed signals, but experiment and logic show that control of an audio signal is better achieved when dealing with digital information in discrete packets.
Audio processing is often used for the enhancement of particular signals before they are actually transmitted. Audio signal processing is popular all over the world these days, for there is a vast range of techniques available for the manipulation of audio signals. Audio processing is also required for altering sound characteristics such as pitch, timbre, and jitter; in all of these we need to vary the sound, for example lowering or enhancing it as required, while at the same time eliminating noise bursts that might occur.

When we send a signal, it is converted to one of two forms, analog or digital; when it reaches the destination, it is converted again to be made compatible with the particular receiving device.

Digital Signal Processing for Music: Musical tones can be differentiated from each other on the basis of volume, timbre and pitch. Volume is measured in decibels and reflects the power or amplitude of the corresponding wave. Frequency is measured in hertz (Hz) and determines the pitch of the tone. A typical piano has notes between about 28 and 4,000 Hz. The third distinguishing feature, timbre, is used to differentiate between instruments: every instrument produces its own combination of sine waves, so timbre provides good information for telling instruments apart even when they play the same note.

Sampling: The sound that we receive or want to send needs to be sampled and quantized. To store a sound wave created by a human voice (or a musical instrument) in a computer, it must be discretized in time and in amplitude; this is done through sampling and quantization respectively. According to the Nyquist-Shannon theorem, if the highest frequency in the signal is F Hz, the sampling frequency (Fs) must be at least 2F. This sampling theorem has to be observed if least distortion is required.
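The sampling and quantization steps can be illustrated with a short sketch. The code below is in Python purely for illustration (the report's own work is in MATLAB); the 440 Hz tone, 8000 Hz rate and 16-bit depth are arbitrary example choices that satisfy the Nyquist condition Fs >= 2F:

```python
import math

def sample_tone(freq_hz, fs_hz, duration_s):
    """Sample a pure sine tone at rate fs_hz (the sampling step)."""
    n = int(fs_hz * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / fs_hz) for i in range(n)]

def quantize(samples, bits=16):
    """Map samples in [-1, 1] to signed integers (the quantization step),
    as a 16-bit WAV file would store them."""
    scale = 2 ** (bits - 1) - 1
    return [round(s * scale) for s in samples]

# 440 Hz tone sampled at 8000 Hz: 8000 >= 2 * 440, so Nyquist holds.
tone = sample_tone(440, 8000, 0.01)   # 0.01 s -> 80 samples
pcm = quantize(tone)
```

Sampling below 2F would instead fold the tone onto a lower, aliased frequency.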


Frequency and Fourier Transforms: To split a signal, here a musical tone, into its constituent sinusoids, the Fourier transform is used. For discrete signals the discrete Fourier transform (DFT) is used, commonly computed via the FFT. As this system deals with pitches, frequency information is essential.

Through the FFT it is easier to observe lower pitches.

Fig2.1: Example Of An FFT


The x-axis shows frequency in Hz, while the corresponding magnitude is on the y-axis.

A closer view of the first peak:

The fundamental frequency and its harmonics are easily detectable here. The lowest peak position corresponds to the lowest frequency present in the sound. The harmonics, which are multiples of the fundamental, appear as the further peaks in the graph above.
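The peak structure described above can be reproduced numerically. The Python sketch below (illustrative only; the 100 Hz tone and its weaker second harmonic are invented for the example) uses a naive DFT and reads the fundamental off the largest spectral peak:

```python
import math, cmath

def dft_magnitudes(x):
    """Naive DFT; returns |X[k]| for the positive-frequency bins k = 0 .. N//2."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

fs = 1000   # sampling rate in Hz
n = 500     # 0.5 s of signal, so each bin is fs/n = 2 Hz wide
# 100 Hz fundamental plus a weaker second harmonic at 200 Hz.
x = [math.sin(2 * math.pi * 100 * t / fs) + 0.4 * math.sin(2 * math.pi * 200 * t / fs)
     for t in range(n)]

mags = dft_magnitudes(x)
peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])  # skip the DC bin
fundamental = peak_bin * fs / n   # convert bin index back to Hz -> 100.0
```

The weaker peak at twice the fundamental is the harmonic, exactly the pattern visible in Fig 2.1.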


2.2 FORMATS OF AUDIO FILES

There are currently many audio file formats in use. Some of them are listed below:

wav - best used on Windows PCs; it can store large music files.

mp3 - the most popular format for music, downloading and storage. The perceived audio quality is kept almost the same by removing components of the music that are barely audible. The mp3 format is useful where storage size needs to be kept small.

au - one of the standard audio file formats, best used with Sun, Unix and Java.

wma - one of the most common formats; Windows Media Audio, owned by Microsoft.

Other formats include: ogg, dct, flac, aiff, vox, raw, aac, ram, dss, msv and dvf.

Less common formats include: atrac (.oma, .omg, .atp), mid and ape.

2.3 WHAT IS MATLAB?


2.3.1 MATLAB
MATLAB (matrix laboratory) is a multipurpose interactive environment built around a high-level language. It is developed by MathWorks. MATLAB allows matrix computation, plotting of graphs, and the creation and manipulation of algorithms. It interfaces well with programs written in other languages, for example C, C++, Java or Python.
Although MATLAB's main focus is numerical computing, many additional toolboxes extend its features. We tried some of them before settling on the MIR Toolbox, among them the Chroma Toolbox and the MIDI Toolbox. Other facilities include SIMULINK, the Signal Processing Toolbox, etc.
MATLAB handles complex tasks through these abilities and provides compact and easy solutions.
Today, MATLAB is used by thousands of users all around the world for commercial as well as private purposes.
Some of the many fields in which MATLAB shows its strength are given below:
Some of the many fields in which MATLAB shows its expertise are given below:

Audio processing.

Image processing.

Video processing.

Control systems.

Signal communication.

Digital signal processing and many more.

2.3.2 GRAPHICS AND GRAPHICAL USER INTERFACE PROGRAMMING


A good thing about MATLAB is that it supports building applications with graphical user interface features. MATLAB provides GUIDE (the GUI development environment) for designing GUIs graphically. It also has built-in graph-plotting features: using the plot function, for example, a graph of one vector against another may be drawn.


2.3.3 Audio Processing in Matlab


We use MATLAB for signal processing and analysis. Some key concepts and functions in MATLAB that are handy in music and audio processing are the following:

MATLAB can be used to extract and manipulate discrete signals as used in digital processing.

An audio file can easily be imported into MATLAB using the wavread() command.

Individual expressions may be typed into the MATLAB interpreter. A sequence of commands can also be saved in a script file with the .m extension and run whenever desired. Users can also write their own MATLAB functions.

Commonly used functions: size, abs, sum, plot, axis, stem, fft, ifft, grid, etc.

The help function in MATLAB is of immense help. Even someone who is not fully aware of MATLAB commands may go through the help to learn about any feature required for manipulating audio.

Through MATLAB we are also able to implement real-time processing; MATLAB's results in this regard are more than satisfactory.

2.3.4 Matlab Audio Processing Examples


A lot of audio processing projects have been done in MATLAB; some of them are described briefly here as examples. It can be observed that these projects also work on the matching of music, but in these projects matching is done solely on the basis of notes produced by musical instruments. The same musical instrument, wherever in the world it is played, will produce the same frequencies for the same notes, and even a different instrument can easily be related to other instruments; but a song sung by two human beings will never have the same frequencies at the same notes. Our system is therefore unique with respect to the projects described below, as it is based on the matching of music created by the human voice.

Synchronization of MIDI scores to music audio - This project was designed to obtain a transcript of a real audio file containing music. For this purpose a MIDI version of the track is found and aligned in time. The resulting MIDI file can then be used as an approximate transcript of that musical audio file. To do this, the MIDI file is first converted into a mask in MATLAB and its spectrogram is observed. The cells of the spectrogram containing energy are determined, and this is then aligned with the spectrogram of the original musical audio file.

Identification of chroma features - Chroma features take in the melodic/harmonic spectrum and convert it into special spectral features, like MFCCs. They combine elements of the audio with chromatic information.

Matching music through a beat tracker - Through chroma features, beats can be defined; through this, songs may be matched which have similar harmonic content even though the instruments and temporal order used might differ.

2.3.5 Audio Formats Supported By Matlab:

Table 2.2: audio formats supported by MATLAB (table not reproduced here).

2.3.6 Advantages Of Using Matlab For Signal Processing

A vast and diverse library of built-in functions is available, with extensive help.

Algorithms involving audio and music are quickly built.

It can handle a variety of file formats.

Audio can be visualized through vivid plots.

Very large audio files can be processed in blocks to avoid memory issues.


2.4 MIR TOOLBOX


MIRtoolbox is a MATLAB-based toolbox which contains numerous functions for musical feature extraction. These musical features include pitch, tonality, rhythm, structure, etc. The objective of the toolbox is to provide an overall view of the computational approaches used in music information retrieval. The basic functions used in this system, for auto-correlation, filter banks, pitch extraction, etc., are already built into this toolbox.


CHAPTER 3
PROCESSES INVOLVED
3.1 INTRODUCTION
The basic flow of the project is as follows:

Loading of an Audio file

Segmentation

Feature Extraction

Matching

Result
Figure3.1: Flow Diagram Of Project


3.2 AUDIO FILES


In this system there are two audio files on which processing is applied. The first audio file is the original song as sung by the expert, and the second audio file is the sample as sung by the contestant. Our aim is to match the sample audio file with the original audio file. Both files are loaded into MATLAB separately.

3.3 SEGMENTATION
The loaded audio files are then segmented, i.e. each audio file is divided into smaller chunks. The segmentation can be done manually on the basis of temporal positions, or it can be automated using different methods.

3.4 FEATURE EXTRACTION


In this project we focus mainly on one feature, pitch. The segmented audio files are processed and the pitch is extracted for each segment. Different pitch extraction methods can be used, but auto-correlation pitch extraction is the main one used in this project. The smaller the segment size, the greater the accuracy of the result.


3.5 MATCHING
The extracted pitches of the segmented audio files are represented as graphs: one graph represents the pitches of the original audio file and the other the pitches of the sample audio file. For matching purposes, the slopes of both graphs are found and the individual slopes are matched, giving a percentage for each.

3.6 RESULT
After the slopes are individually matched, all the resulting percentages are averaged and the final result is found. The final result shows the match between the two audio files, original and sample, as a percentage.


CHAPTER 4
METHODOLOGY
4.1 LOADING OF AN AUDIO FILE
Audio files are loaded with the help of the command miraudio('audio-name'). This command loads the audio file whose name is written inside the quotation marks. The audio file must be in WAV or AU format; in this project all audio files used are in WAV format. miraudio loads the audio file, transforms it, and displays it as a waveform. The audio waveform can be resampled to our requirements using the syntax miraudio('mysong', 'Sampling', r), where the desired rate, which must be numeric, is written in place of r. By default the sampling rate is 44100 Hz.

Fig4.1: Audio Waveform Generated By miraudio Command.


4.2 SEGMENTATION METHODS

An audio file can be segmented on different bases. The syntax used for segmentation in MIRtoolbox is mirsegment(audio-file, segmentation-method).

Manual Segmentation:
If segmentation is manual and done on temporal positions, a row matrix is provided in place of the segmentation method. Suppose v is a row matrix with N columns; then each column indicates a segmentation point. The command looks like this: mirsegment(audio-file, v).

Automated Segmentation:
Automated segmentation methods can also be used. The methods supported by MIRtoolbox are:
1. HCDF
2. Novelty
3. RMS

If one of these methods is used, the mirsegment command appears as follows:
mirsegment(audio-file, 'Novelty')


In this project manual segmentation is applied.


The different colors in the segmented audio file's waveform show the different segments.

Fig4.2: Segmented audio waveform.
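Conceptually, manual segmentation at temporal positions just cuts the sample array at the given instants. A Python sketch of the idea (the cut points and sampling rate are hypothetical; this is not MIRtoolbox code):

```python
def segment(samples, fs, boundaries_s):
    """Split a sample array at temporal positions given in seconds,
    mimicking manual segmentation as with mirsegment(a, v)."""
    cuts = [0] + [int(t * fs) for t in boundaries_s] + [len(samples)]
    return [samples[cuts[i]:cuts[i + 1]] for i in range(len(cuts) - 1)]

fs = 1000
audio = list(range(3000))              # 3 s of dummy samples
segs = segment(audio, fs, [0.5, 1.8])  # two cut points give three segments
```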

4.3 PITCH EXTRACTION METHOD


Pitch can be defined as the perceived frequency of a sound. To extract the pitch of a segmented musical signal, multiple steps are followed in our system. The flow chart of these steps is as follows:


Fig4.3: Pitch Extraction Steps

To extract the pitch, each segment is first passed through a filter bank, which decomposes it into audio signals in two frequency channels. Auto-correlation is then applied to the channels and they are summed back together. The peaks of the resulting waveform are then determined. After the peaks have been determined for each segment, the pitch is extracted, and the peaks of the final waveform are found to represent the resulting pitches.


4.3.1 FILTER BANK:


In order to bring our system as near to human listening perception as possible, it is useful to decompose the audio signal into different frequency components. The human ear decomposes the audio signal into frequency bands in the cochlea; a bank of filters can likewise be used to separate the audio signal into different frequency components. The filter banks supported by MIRtoolbox are the Gammatone filter bank and the 2-channels filter bank. The command used in MIRtoolbox for filtering an input signal through a bank of filters is mirfilterbank(..., type-of-filterbank). The type of filter bank used in this system is the 2-channels filter bank.

2-Channels Filter Bank:

To make the filter bank computationally simple, mirfilterbank(..., '2Channels') is used, which uses two frequency channels: one for low frequencies, below 1 kHz, and one for high frequencies, above 1 kHz. Envelope extraction is done using half-wave rectification on both the high-frequency and the low-frequency channel. The 2-channels filter bank method is especially useful for multi-pitch extraction.


Fig4.9: Filter Bank Process
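The two-channel idea can be approximated with very simple filters. The Python sketch below is illustrative only (a one-pole filter pair is far cruder than the actual MIRtoolbox filter bank), but it shows the structure: split at roughly 1 kHz, then half-wave rectify each channel as a crude envelope:

```python
import math

def two_channel_split(x, fs, cutoff_hz=1000.0):
    """Crude two-channel filter bank: a one-pole low-pass below the cutoff,
    the residual as the high channel, then half-wave rectification of both."""
    alpha = 1.0 - math.exp(-2 * math.pi * cutoff_hz / fs)  # one-pole coefficient
    low, y = [], 0.0
    for s in x:
        y += alpha * (s - y)      # simple low-pass (exponential smoothing)
        low.append(y)
    high = [s - l for s, l in zip(x, low)]                 # what the low-pass removed
    rectify = lambda ch: [max(v, 0.0) for v in ch]         # half-wave rectification
    return rectify(low), rectify(high)

fs = 8000
low_tone = [math.sin(2 * math.pi * 200 * i / fs) for i in range(800)]  # 200 Hz tone
lo, hi = two_channel_split(low_tone, fs)  # most energy lands in the low channel
```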

4.3.2 AUTO-CORRELATION
Speech processing is a challenging task due to the complexity of the human voice. Pitch is one of the characteristics which define human speech and, in our case, musical sound as sung by a human.
The pitch of a tone is perceived by the brain through its fundamental frequency as well as its periodicity. Even if the fundamental frequency is missing from a sound signal, the same melody is perceived, because the periodicity is the same. Therefore, we can say that pitch detection depends upon the periodic qualities of the sound waveform.
Auto-correlation is a method based not on the amplitude characteristics of a sound signal but on its periodic characteristics. The auto-correlation function transforms the signal so as to display the structure of the waveform.


It can therefore be used for pitch detection on the same principle. Suppose x(m) is a signal which is exactly periodic with period P; we can write

x(m) = x(m + P)    for all values of m

Then it can easily be shown that

Rx(n) = Rx(n + P)    (1)

Equation 1 says that the auto-correlation of a periodic signal is also periodic with the same period P.
The general method of finding pitch through auto-correlation is first to decompose the sound signal into smaller frames, then to overlap a frame with itself shifted along the time axis, and finally to multiply the signals together. If two frames are similar or almost similar, the auto-correlation shows a distinctive peak. The lag difference between the maximum peak and the second maximum peak gives the fundamental period of the signal; dividing the sampling frequency by this difference, the pitch is obtained.
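A minimal version of this procedure can be sketched in a few lines (Python, for illustration; it uses the common lag-of-the-maximum variant of the autocorrelation method rather than MIRtoolbox's exact implementation, and the 50-1000 Hz search range is an assumed plausible pitch range):

```python
import math

def autocorr_pitch(x, fs, fmin=50.0, fmax=1000.0):
    """Estimate pitch as fs / (lag of the autocorrelation maximum),
    searching only lags that correspond to plausible pitches."""
    lo, hi = int(fs / fmax), int(fs / fmin)
    n = len(x)
    def r(lag):  # autocorrelation at a single lag
        return sum(x[i] * x[i + lag] for i in range(n - lag))
    best_lag = max(range(lo, hi + 1), key=r)
    return fs / best_lag

fs = 8000
period = 40   # samples, i.e. a 200 Hz tone at fs = 8000 Hz
x = [math.sin(2 * math.pi * i / period) for i in range(800)]
pitch = autocorr_pitch(x, fs)   # -> 200.0 Hz
```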


Fig4.4: Using Autocorrelation To Find Out Pitch

In MIRtoolbox the function used for finding the autocorrelation of the signal is mirautocor. Each segment passed through the filter bank has been decomposed into two channels; auto-correlation is now applied to these channels separately.
mirautocor(..., 'Generalized', k) performs the auto-correlation in the frequency domain, including compression of the magnitude of the spectral representation. Autocorrelation using the discrete Fourier transform is expressed as
y = IDFT(|DFT(x)|^2),
and more generally as
y = IDFT(|DFT(x)|^k).


It is recommended to compress the auto-correlation (k < 2) to decrease the width of the peaks in the result, but this compression may increase the signal's sensitivity to noise. According to the study of Tolonen & Karjalainen (2000), a value of k = 0.67 achieves a good compromise.
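The frequency-domain formulation above can be checked directly: with k = 2 it must reproduce the ordinary (circular) autocorrelation, while k < 2 gives the compressed variant. A naive-DFT Python sketch, illustrative only:

```python
import math, cmath

def generalized_autocorr(x, k=2.0):
    """y = IDFT(|DFT(x)|^k): circular autocorrelation for k = 2,
    magnitude-compressed autocorrelation for k < 2 (e.g. k = 0.67)."""
    n = len(x)
    X = [sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n))
         for f in range(n)]
    Yk = [abs(v) ** k for v in X]
    return [sum(Yk[f] * cmath.exp(2j * math.pi * f * t / n) for f in range(n)).real / n
            for t in range(n)]

x = [0.0, 1.0, 0.5, -0.3, 0.2, -0.8, 0.1, 0.4]
y2 = generalized_autocorr(x, k=2)      # equals the circular autocorrelation of x
yc = generalized_autocorr(x, k=0.67)   # compressed variant with narrower peaks
```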
Due to harmonics, the autocorrelation function shows peaks not only at the lag corresponding to a particular periodicity but also at multiples of that lag. To avoid this redundancy and automatically remove these harmonics, mirautocor(..., 'Enhanced', b) is used, where b is by default 2 to 10. This function first half-wave rectifies the original auto-correlation function, time-scales it by the factor b (or by each factor in the list b), and then subtracts it from the original clipped function.

4.3.3 SUMMING
Up to this step, all segments have been passed through the filter bank, the envelope has been extracted, the low-frequency and high-frequency parts have been separated, and the auto-correlation function has been applied to both. To sum these auto-correlated envelope channels back together, the mirsum command is used.
For example, after autocorrelation has been applied to the envelopes of the channels using the mirautocor function:
ac = mirautocor(e)


Fig4.5: Envelope Autocorrelation


then these channels can be summed back using the command:
s = mirsum(ac)

Fig4.6: Summation Of All Auto correlated Signals


4.3.4 PEAK PICKING


To find the pitch from the result of the autocorrelation function, its peaks are required. To find these peaks MIRtoolbox uses the function mirpeaks(): if x is a curve, its peaks are given by mirpeaks(x).

Fig4.7: Peak Picking Of A Signal
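A bare-bones stand-in for this step simply reports local maxima (Python, illustrative; mirpeaks itself is far more configurable):

```python
def pick_peaks(curve):
    """Indices of local maxima, i.e. points strictly higher than both neighbours."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i] > curve[i - 1] and curve[i] > curve[i + 1]]

curve = [0, 2, 1, 3, 7, 3, 2, 5, 1]
peaks = pick_peaks(curve)   # -> [1, 4, 7]
```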


4.3.5 PITCH DETERMINATION


After the peaks have been determined for each segment, the corresponding pitch height is found, and finally the pitches of all segments are represented as a waveform graph in which the maximum and minimum pitches are marked by pulse signs using the mirpeaks() function; this corresponds to the last block of the pitch-extraction flow chart. The last peak of each segment is not shown by default. The whole sequence of filter bank, auto-correlation and peak picking can be done in MIRtoolbox using the mirpitch(...) command, where the input, in the case of this system, is the segmented musical sound waveform.

Fig4.8: Pitch Waveform Of A Signal


4.5 MATCHING METHOD


The aim of this project is to find the similarity between a song sung by different singers. After obtaining the graph from the mirpitch command, which shows the pitches of each segment, it was observed that it is possible to match the high notes and low notes of both songs using a slope-matching method. The slope between two points is found at the same time instants at which segmentation was done.
The formula for the slope between points (x1, y1) and (x2, y2) is:

Slope = (y2 - y1) / (x2 - x1)

First the individual slopes of each segment of both songs are found at the same temporal positions.
The slope of the first song at a particular temporal position is matched with the slope of the second song at the same temporal position, and the result is stored as a percentage in an array. It is then found how many of the slopes matched above 90%, how many between 80% and 90%, and so on down to 0%. These results are again stored in an array and averaged to give the final result.
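The matching procedure can be sketched as follows (Python, illustrative: the pitch values are hypothetical, and the per-slope percentage formula shown is one plausible choice, since the report does not spell out its exact formula):

```python
def slope(p1, p2):
    """Slope = (y2 - y1) / (x2 - x1) between two (time, pitch) points."""
    (x1, y1), (x2, y2) = p1, p2
    return (y2 - y1) / (x2 - x1)

def match_percent(s1, s2):
    """Similarity of two slopes as a percentage (100 means identical);
    an assumed formula based on their relative difference."""
    if s1 == s2:
        return 100.0
    return max(0.0, 100.0 * (1 - abs(s1 - s2) / max(abs(s1), abs(s2))))

# Hypothetical pitches (Hz) of both songs at the same segment boundaries (s).
times = [0.0, 0.5, 1.0, 1.5]
orig = [220.0, 260.0, 240.0, 300.0]    # "expert" pitches
samp = [218.0, 255.0, 244.0, 290.0]    # "contestant" pitches

scores = [match_percent(slope((times[i], orig[i]), (times[i + 1], orig[i + 1])),
                        slope((times[i], samp[i]), (times[i + 1], samp[i + 1])))
          for i in range(len(times) - 1)]
overall = sum(scores) / len(scores)    # final match percentage
```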


CHAPTER 5
RESULTS AND DISCUSSION
Here, results are obtained by observing two audio files. The audio file named alvida contains the song sung by an expert, and the audio file named alvida-sample contains the song sung by a contestant.
Both files are loaded separately, and the whole pitch-extraction process is applied to each individually.
The following waveform graphs were obtained for the two audio files, first loaded using miraudio('alvida') and miraudio('alvida-sample').


Fig5.1:Audio Waveform Of Alvida.

Fig5.2: Audio Waveform Of Alvida-Sample


Now the audio waveforms of both songs are segmented using the mirsegment function.

Fig5.3: Segmented Audio Waveform Of Alvida.

Fig5.4: Segmented Audio Waveform Of Alvida-Sample.


Finally, the pitches of both songs are found using the mirpitch function.

Fig5.5: Pitch Graph Of Alvida.

Fig5.6: Pitch Graph Of Alvida-Sample


Now the slopes are to be matched, for which an algorithm has been designed according to the principle discussed in Chapter 4.
The individual matching of slopes came out as follows:

The number of slopes matching in the ranges 90-100%, 80-90% and so on, down to 10% and above, is determined. The slopes which matched below 10% have been ignored for simplification, as these slopes were degrading the result. 21 slopes out of 55 matched 90% or above, but when they were averaged together with the below-10% matching slopes, the result was not as close to human perception as it was after ignoring those slopes. Also, when the sample was listened to and judged by a musical expert, his verdict was nearer to the result obtained by ignoring slopes matched below 10%. 21 slopes are in the above-90% range, 3 in 80-70%, 4 in 70-60%, 2 in 60-50%, 0 in 50-40%, 2 in 40-30%, 3 in 40-30%, 4 in 30-20% and 4 in 20-10%.


These numbers are then multiplied by their respective ranges and averaged to find the final answer.
The final percentage came out as follows:

The final answer shows an approximately 66% match between the song sung by the expert and the song sung by the contestant. In this system the threshold for qualification is set around a 45-55% match: if the result is between 45% and 55%, the contestant is considerable; below this range the contestant is disqualified; above it the contestant can be considered qualified. This range was set by observing many samples and comparing their results with actual human perception, i.e. how good and well-matched the sample sounds to human ears.
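The qualification rule described above is a simple three-way threshold, which can be sketched as follows (Python, illustrative):

```python
def verdict(match_percent):
    """Map the final match percentage onto the report's bands:
    below 45% disqualified, 45-55% considerable, above 55% qualified."""
    if match_percent < 45:
        return "disqualified"
    if match_percent <= 55:
        return "considerable"
    return "qualified"

result = verdict(66)   # the approximately 66% match above -> "qualified"
```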


CHAPTER 6
CONCLUSION AND FUTURE
ENHANCEMENT
In conclusion, it can be said that this unique system provides a basic structure to facilitate judging in reality shows, but there is still much room for improvement to make it more accurate. The system is based on only one feature of musical sound signals, whereas there are multiple features, including rhythm, tonality, jitter, etc., which play an important part in enhancing or degrading the quality of a song. Future work can therefore include more features in the system. Also, pitch extraction in our system is done through the auto-correlation method, which is the most basic of the available methods and has its limitations; other methods can be used and tested against the method adopted here. Nevertheless, this system provides a backbone for all future enhancements and for automating the judging in the ever-growing entertainment area of music.


REFERENCES
http://en.wikipedia.org/wiki/MATLAB
http://www.wisegeek.com/what-is-audio-processing.htm
http://en.wikipedia.org/wiki/Audio_signal_processing
http://www.cs.hmc.edu/~kperdue/MusicalDSP.html
http://www.ee.columbia.edu/~dpwe/resources/matlab/
http://scitation.aip.org/content/asa/journal/poma/18/1/10.1121/1.4794857

http://www.mathworks.com/help/matlab/import_export/supported

http://www.montgomerycollege.edu/Departments/StudentJournal/Automatic.pdf
http://auditoryneuroscience.com/topics/missing-fundamental
https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/mirtoolbox
http://en.wikipedia.org/wiki/Missing_fundamental
http://en.wikipedia.org/wiki/Pitch_detection_algorithm
http://www.ee.columbia.edu/~dpwe/papers/KarjT99-pitch.pdf


http://en.wikipedia.org/wiki/Pitch_%28music%29#Pitch_and_frequency
http://www.researchgate.net/publication/228854783_Pitch_detection_algorithm_autocorrelation_method_and_AMDF/links/0deec52c640858b90b000000
http://www.cse.cuhk.edu.hk/~khwong/www2/cmsc5707/5707_4_pitch.ppt
