B.E (EL) PROJECT REPORT
PREPARED BY:
SYEDA KOMAL FATIMA (EL-096)
MASOOMA BATOOL (EL-117)
AIMEN NAZ ASLAM (EL-080)
PROJECT ADVISORS
MR. DANISH MAHMOOD KHAN (INTERNAL)
MR. SYED ABBAS ALI (EXTERNAL)
ACKNOWLEDGEMENT
First and foremost, we owe our sincere gratitude to ALMIGHTY ALLAH for blessing and helping us in every difficulty we faced, big and small, and for taking us out of every trouble, little and huge. It was only He who showed us a path when we could not find any. We are immeasurably thankful to Him for all the massive support, emotional and spiritual, during the journey of this project. It was only due to His blessings that what we thought was impossible became possible. All praise and gratefulness to ALLAH ALMIGHTY. May He make us always follow the right path. (Ameen)
It is a mind-teasing exertion to design a task of Final Year Project caliber; it requires an out-of-the-box vision and massive knowledge. We thank our external advisor Mr. Abbas Ali from the bottom of our hearts for proposing the idea of this project, and for taking time out of his very busy schedule to guide and supervise us. The idea he proposed contained all the elements of innovation and learning that should be part of a Final Year Project.
We also deeply thank our internal advisor Mr. Danish Mehmood Khan for his guidance on a path which was new and unfamiliar to us; under his supervision, that path eventually became familiar.
DEDICATION
This report is dedicated to our parents, who bore hardships to see us succeed and complete our work with flying colors, for their support, encouragement, and unconditional love. We also dedicate it to our respectable and honorable teachers who accompanied us on this journey towards success.
ABSTRACT
We present a system for facilitating the music and entertainment industry by automating the judging in music-related reality shows. In this system, songs as sung by candidates are matched against the same songs as sung by experts. The system first identifies the song as a whole and then splits the waveform into segments for easier comparison. Each segment of the waveform is examined for musical features. Flaws are detected when the pieces of information are consolidated back into a single waveform and the musical information of both songs is compared. The comparison then produces the result in the form of a percentage.
TABLE OF CONTENTS

ACKNOWLEDGEMENT
DEDICATION
ABSTRACT
TABLE OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER 1
INTRODUCTION
1.1 BACKGROUND
1.2 OUR FOCUS
1.3 CURRENT TREND
1.4 MOTIVATION AND NEED
1.5 SCOPE
1.6 TOOLS

CHAPTER 2
2.1 SOUND (AUDIO) PROCESSING
2.2 FORMAT OF AUDIO FILES
2.3 WHAT IS MATLAB?
2.3.1 MATLAB
2.3.2 GRAPHICS AND GRAPHICAL USER INTERFACE PROGRAMMING
2.3.3 AUDIO PROCESSING IN MATLAB
2.3.4 MATLAB AUDIO PROCESSING EXAMPLES
2.3.5 AUDIO FORMATS SUPPORTED BY MATLAB
2.3.6 ADVANTAGES OF USING MATLAB FOR SIGNAL PROCESSING
2.4 MIR TOOLBOX

CHAPTER 3
PROCESSES INVOLVED
3.1 INTRODUCTION
3.2 AUDIO FILES
3.3 SEGMENTATION
3.4 FEATURE EXTRACTION
3.5 MATCHING
3.6 RESULT

CHAPTER 4
METHODOLOGY
4.1 LOADING OF AN AUDIO FILE

CHAPTER 5
RESULTS AND DISCUSSIONS

CHAPTER 6
CONCLUSIONS AND FURTHER ENHANCEMENT

REFERENCES
TABLE OF FIGURES

CHAPTER 2
2.1 EXAMPLE OF FFT 1
2.2 EXAMPLE OF FFT 2
2.3 TABLE OF AUDIO FORMATS SUPPORTED BY MATLAB

CHAPTER 3
3.1 FLOW DIAGRAM OF PROJECT

CHAPTER 4
4.1 AUDIO WAVEFORM GENERATED BY miraudio COMMAND
4.2 SEGMENTED AUDIO WAVEFORM
4.3 PITCH EXTRACTION STEPS
4.4 USING AUTOCORRELATION TO FIND PITCH
4.5 ENVELOPE AUTOCORRELATION
4.6 SUMMATION OF ALL AUTOCORRELATED SIGNALS
4.7 PEAK PICKING OF A SIGNAL
4.8 PITCH WAVEFORM OF A SIGNAL
4.9 FILTER BANK METHOD

CHAPTER 5
5.1 AUDIO WAVEFORM OF alvida
5.2 AUDIO WAVEFORM OF alvida-sample
5.3 SEGMENTED WAVEFORM OF alvida
5.4 SEGMENTED AUDIO WAVEFORM OF alvida-sample
5.5 PITCH GRAPH OF alvida
5.6 PITCH GRAPH OF alvida-sample
LIST OF ABBREVIATIONS

HCDF: Harmonic Change Detection Function
RMS: Root Mean Square
MIR: Music Information Retrieval
MFCC: Mel-Frequency Cepstral Coefficients
MIDI: Musical Instrument Digital Interface
CHAPTER 1
INTRODUCTION
1.1 BACKGROUND
As the years pass, all of us are noticing the great popularity received by newly emerging singers in the field of entertainment. Owing to the immense number of TV shows coming up, this business has become one of the prominent industries in the entertainment sector across the world. Because more and more people now spend a great part of their day following such reality shows to relax and get relief from everyday stress, such entertainment has become a vital part of their daily lives. It has become a need of the time to automate this system, as it is now global. First of all, we need to understand the main issue caused by the absence of automation in these reality shows: it leaves the judging open to bias, whereas the deserving candidate needs to win. Since a great part of the world's audience now follows and assesses the judgments made in these programs, such a system would be much needed and appreciated.
The system will identify the musical tone as a whole and then segment the waveform into parts for easier comparison. Each piece of music is examined for several structures. Defects are detected when the sections of information are combined back into a single waveform. This single waveform is matched against the music information embedded within the software, as sung by the experts. The comparison then produces the result.
1.5 SCOPE
This system will be helpful for the music industry, which is growing rapidly nowadays, as well as for business and academics. Taking a broader point of view, in some countries music is taught as a subject, so this work can be very useful internationally. It can also be used for Music Information Retrieval, which has several applications such as recommender systems.
1.6 TOOLS
MATLAB
CHAPTER 2
2.1 SOUND (AUDIO) PROCESSING
Audio signal processing: the field of audio processing was developed to address the problems that arise when dealing with audio, the use of which is nowadays very common. It is the manipulation of sound, or auditory signals, to achieve a desired effect. Signal processing may occur in the analog as well as the digital domain: while analog processing works on the actual signal, digital processing works on a binary representation of that signal.
The human ear does not differentiate between an analog and a digitally processed signal, but experiment and logic show that control of an audio signal is better achieved when dealing with digital information in discrete packets.
Audio processing is often used for the enhancement of particular signals before they are actually transmitted. Audio signal processing is popular all over the world these days, for there is a vast range of techniques available for manipulating audio signals. Audio processing is also required for altering sound characteristics such as pitch, timbre, and jitter. In all these, we need to vary the sound system, for example lowering or enhancing the sound as required to meet our needs, and at the same time eliminating noise bursts that might occur.
When we send a signal it is changed to one of two forms, analog or digital; when it reaches the destination it is further made compatible with that particular device, by conversion.
Digital Signal Processing for Music: Musical tones can be differentiated from each other on the basis of volume, timbre, and pitch. Volume is measured in decibels and is defined by the power or amplitude of the corresponding wave. Frequency is measured in hertz (Hz) and determines the pitch of the tone. A typical piano has notes between about 28 and 4,000 Hz. The third distinguishing feature, timbre, is used to differentiate between instruments: since every instrument produces its own combination of sine waves, timbre provides good information for telling instruments apart even when they play the same note.
Sampling: The sound that we receive or want to send needs to be sampled and quantized. To store a sound wave created by a human voice (or a musical instrument) in a computer, it needs to be discretized; this is done through sampling and quantization respectively. According to the Nyquist-Shannon sampling theorem, if the highest frequency in the signal is F Hz, the sampling frequency (Fs) needs to be at least 2F. This sampling theorem has to be observed if least distortion is required.
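The sampling requirement above can be illustrated numerically. The sketch below (Python with NumPy, an illustration only, not part of the project's MATLAB workflow) samples a 5 Hz sine once above and once below the Nyquist rate; when Fs < 2F, the strongest FFT bin appears at an aliased frequency instead of the true one.

```python
import numpy as np

def dominant_freq(f_signal, fs, duration=4.0):
    """Sample a pure sine of frequency f_signal at rate fs and
    return the frequency of the strongest FFT bin."""
    t = np.arange(0, duration, 1.0 / fs)
    x = np.sin(2 * np.pi * f_signal * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

# fs = 50 Hz satisfies Fs >= 2F for a 5 Hz tone: peak at 5 Hz
print(dominant_freq(5, 50))
# fs = 8 Hz violates the Nyquist criterion: the tone aliases to |5 - 8| = 3 Hz
print(dominant_freq(5, 8))
```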
Frequency and Fourier Transforms: To split a signal, here a musical tone, into its constituent sinusoids, the Fourier transform is used. For discrete signals the discrete Fourier transform (DFT) is used, commonly computed via the fast Fourier transform (FFT). As this system deals with pitches, frequency information is essential.
The fundamental frequency and its harmonics are easily detectable in such a spectrum. The lowest vibration corresponds to the lowest frequency present in the sound, and the harmonics, which are integer multiples of the fundamental, appear as peaks in the FFT graph.
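As a quick illustration of those peaks, the following sketch (Python/NumPy, illustrative only) builds a tone from a 220 Hz fundamental plus two weaker harmonics and recovers all three as the strongest FFT bins:

```python
import numpy as np

fs = 8000                           # sampling rate, Hz
t = np.arange(0, 1.0, 1.0 / fs)
# 220 Hz fundamental plus harmonics at 440 Hz and 660 Hz, decreasing in strength
x = (np.sin(2 * np.pi * 220 * t)
     + 0.5 * np.sin(2 * np.pi * 440 * t)
     + 0.25 * np.sin(2 * np.pi * 660 * t))

spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
# the three strongest bins are the fundamental and its two harmonics
strongest = sorted(freqs[np.argsort(spectrum)[-3:]])
print(strongest)  # [220.0, 440.0, 660.0]
```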
wav: best used on Windows PCs; able to store large music files.
mp3: the most popular format for music, downloading, and storage. Audio quality is preserved by removing components of the music that are barely audible, so MP3 is useful where storage size needs to be kept small.
au: one of the standard audio file formats, best used with Sun, Unix, and Java.
ogg
dct
flac
aiff
vox
raw
wma: Windows Media Audio, a very common format owned by Microsoft.
aac
ram
dss
msv
dvf
mid
ape
Several toolboxes extend MATLAB for music analysis, among them the Chroma Toolbox and the MIDI Toolbox. Other features include SIMULINK, the Signal Processing Toolbox, etc.
MATLAB performs all these complex tasks together through its built-in capabilities and provides compact, easy solutions.
Today, MATLAB is used by thousands of users all around the world, for commercial as well as private use.
Some of the many fields in which MATLAB shows its expertise are given below:
Audio processing.
Image processing.
Video processing.
Control systems.
Signal communication.
MATLAB can be used to extract and manipulate discrete signals as used in digital processing.
Frequently used functions include size, abs, sum, plot, axis, stem, fft, ifft, and grid.
The help function in MATLAB is of immense help: even someone who is not fully familiar with MATLAB commands can go through the help entries to learn any feature required for manipulating audio.
MATLAB is also able to handle real-time processing implementations, and its performance in this regard is more than satisfactory.
musical instruments. The same musical instrument, wherever in the world it is, will produce the same frequencies for the same notes, and even a different instrument can easily be related to others. A song sung by two human beings, however, can never have the same frequencies at the same notes; therefore our system is unique with respect to the projects described below, as it is based on matching music created by the human voice.
Matching music through a beat tracker: Through chroma features, beats can be defined, and through this, songs with similar harmonic content may be matched even though the instruments and temporal order used might differ.
TABLE 2.2
A vast and diverse library of built-in functions, with immense help, is available.
Processing of very large audio files is blocked to avoid memory issues.
CHAPTER 3
PROCESSES INVOLVED
3.1 INTRODUCTION
The basic flow of the project is as follows:
Segmentation
Feature Extraction
Matching
Result
Figure 3.1: Flow Diagram of Project
3.3 SEGMENTATION
The loaded audio files are then segmented, i.e. each whole audio file is divided into smaller chunks. The segmentation can be done manually on the basis of temporal positions, or it can be automated using different methods.
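Manual segmentation by temporal position amounts to cutting the sample array at the sample indices corresponding to the chosen times. A minimal sketch of that idea (Python/NumPy, illustrative; the project itself uses MIRtoolbox's mirsegment for this step):

```python
import numpy as np

def segment(x, fs, boundaries):
    """Split signal x (sampled at fs Hz) at the given temporal
    positions (in seconds), returning a list of chunks."""
    cut_points = [int(b * fs) for b in boundaries]
    edges = [0] + cut_points + [len(x)]
    return [x[a:b] for a, b in zip(edges[:-1], edges[1:])]

x = np.arange(10)                 # stand-in for a 10-sample audio signal
chunks = segment(x, fs=1, boundaries=[3, 7])
print([len(c) for c in chunks])   # [3, 4, 3]
```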
3.5 MATCHING
The pitches extracted from the segmented audio files are represented in the form of graphs. There are therefore two graphs: one representing the pitches of the original audio file, and the other representing the pitches of the sample audio file. For matching, the slopes of both graphs are computed and the individual slopes are matched, giving a percentage.
3.6 RESULT
After the slopes are individually matched, all the resulting percentages are averaged to obtain the final result, which expresses the match between the two audio files, original and sample, as a percentage.
CHAPTER 4
METHODOLOGY
4.1 LOADING OF AN AUDIO FILE
Audio files are loaded with the command miraudio('audio-name'). This command loads the audio file whose name is written inside the quotation marks. The audio file must be in WAV or AU format; in this project all audio files used are in WAV format. miraudio loads the audio file, transforms it, and displays it as a waveform. The audio waveform can be resampled as required using the syntax miraudio('mysong', 'Sampling', sampling-rate), where the desired numeric rate is written in place of sampling-rate. By default the sampling rate is 44100 Hz.
Manual Segmentation:
If segmentation is manual and based on temporal position, a row matrix is provided in place of the segmentation method. Suppose v is a row matrix with N columns, where each column indicates a segmentation point; the command then looks like mirsegment(audio-file, v).
Automated Segmentation:
Automated segmentation methods can also be used. The methods supported by MIRtoolbox are:
1. HCDF
2. Novelty
3. RMS
If one of these methods is used, the mirsegment command appears as follows:
mirsegment(audio-file, 'Novelty')
To extract pitch, each segment is first passed through a filter bank which decomposes the segment into audio signals in two frequency channels. Autocorrelation is then applied to the channels, and they are summed back together. The peaks of the resulting waveform are then determined. After the peaks have been determined from the resultant waveform of each segment, the pitch is extracted, and the peaks of the final waveform are found to represent the resulting pitches.
4.3.2 AUTO-CORRELATION
Speech processing is a challenging task due to the complexity of the human voice. Pitch is one of the characteristics that define human speech and, in our case, musical sound as sung by a human.
The pitch of a tone is perceived by the brain through its fundamental frequency as well as its periodicity. Even if the fundamental frequency is missing from a sound signal, the same melody is perceived, because the periodicity is the same. Therefore, we can say that pitch detection depends upon the periodic qualities of the sound waveform.
Autocorrelation is a method based not on the amplitude characteristics of a sound signal but on its periodic characteristics. The autocorrelation function transforms the signal so as to display the structure of the waveform.
It can therefore be used for pitch detection on the same principle. Suppose x(m) is a signal which is exactly periodic with period P; we can write:

x(m) = x(m + P)    (1)

From Equation 1 it follows that the autocorrelation of a periodic signal is itself periodic with the same period P, i.e. r(k) = r(k + P).
The general method of finding pitch through autocorrelation is first to decompose the sound signal into smaller frames, then overlap each frame with itself, shifting it along the time axis, and finally multiply the overlapped signals together. If any two frames are similar or almost similar, the autocorrelation shows a distinctive peak. The lag difference between the maximum peak and the second maximum peak gives the fundamental period of the signal, and dividing the sampling frequency by this lag yields the pitch.
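The procedure above can be sketched for a pure tone as follows (Python/NumPy, an illustrative stand-in for the MIRtoolbox pipeline; here the pitch is taken as the sampling frequency divided by the lag of the first periodic autocorrelation peak after lag zero):

```python
import numpy as np

def pitch_autocorr(x, fs):
    """Estimate pitch as fs divided by the lag of the first
    autocorrelation peak after lag zero."""
    r = np.correlate(x, x, mode='full')[len(x) - 1:]   # keep lags >= 0
    d = np.diff(r)
    start = int(np.argmax(d > 0))    # first lag where r starts rising again
    lag = start + int(np.argmax(r[start:]))
    return fs / lag

fs = 8000
t = np.arange(0, 0.5, 1.0 / fs)
x = np.sin(2 * np.pi * 200 * t)      # 200 Hz tone, period = 40 samples
print(pitch_autocorr(x, fs))         # ~200 Hz
```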
In MIRtoolbox the function for finding the autocorrelation of a signal is mirautocor. Each segment passed through the filter bank has been decomposed into 2 channels, and autocorrelation is applied to these channels separately.
mirautocor(..., 'Generalized', k) performs the autocorrelation in the frequency domain and includes a compression of the magnitude of the spectral representation. Autocorrelation using the Discrete Fourier Transform is expressed as:

y = IDFT(|DFT(x)|^2),

and more generally as:

y = IDFT(|DFT(x)|^k).
Compressing the autocorrelation with k < 2 is recommended to decrease the width of the peaks in the autocorrelation result, though this compression may increase the signal's sensitivity to noise. According to the study of Tolonen & Karjalainen (2000), a value of k = 0.67 achieves a good compromise.
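The frequency-domain formulation can be written directly from the equation above. A minimal sketch (Python/NumPy, illustrative; mirautocor itself is the MIRtoolbox implementation): with k = 2 this reduces to the ordinary circular autocorrelation, while k = 0.67 gives the compressed version just described.

```python
import numpy as np

def generalized_autocorr(x, k=0.67):
    """Compute y = IDFT(|DFT(x)|^k); k = 2 is the ordinary circular
    autocorrelation, while k < 2 compresses the spectral magnitude."""
    X = np.fft.fft(x)
    return np.real(np.fft.ifft(np.abs(X) ** k))

x = np.array([1.0, 2.0, 3.0, 4.0])
print(generalized_autocorr(x, k=2))   # circular autocorrelation of x
```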
Due to harmonics, the autocorrelation function shows peaks at lags other than the one corresponding to a particular periodicity, namely at multiples of that periodicity. To avoid this redundancy and automatically remove these harmonics, mirautocor(..., 'Enhanced', b) is used, where b is by default 2 to 10. This function first half-wave rectifies the original autocorrelation function, time-scales it by the factor b (or a list of factors b), and then subtracts the result from the original clipped function.
4.3.3 SUMMING
Up to this step, all segments have been passed through the filter bank, the envelope has been extracted, the low-frequency and high-frequency parts have been separated, and the autocorrelation function has been applied to both. To sum these autocorrelated envelope channel signals, the mirsum command is used.
For example, after autocorrelation has been applied to the envelopes of the channels using the mirautocor function:

ac = mirautocor(e)
Slope = (y2 - y1) / (x2 - x1), i.e. the change in pitch between two consecutive temporal positions divided by the time between them.
First, the individual slopes of each segment of both songs are found at the same temporal positions.
The slope of the first song at a particular temporal position is matched with the slope of the second song at the same temporal position, and the result is stored as a percentage in an array. It is then counted how many of the slopes matched above 90%, how many between 80% and 90%, and so on down to 0%. These counts are again stored in an array and averaged to give the final result.
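The report does not give the exact per-slope similarity formula, so the sketch below (Python/NumPy) is only one plausible reading of the step above: each segment's match is 100% when the two slopes are equal and falls toward 0% as they diverge, after which the per-segment percentages can be binned and averaged as described.

```python
import numpy as np

def slope_match(pitch_a, pitch_b, times):
    """Per-segment slope similarity between two pitch curves, as
    percentages; the formula is hypothetical, for illustration only."""
    s_a = np.diff(pitch_a) / np.diff(times)
    s_b = np.diff(pitch_b) / np.diff(times)
    denom = np.maximum(np.abs(s_a), np.abs(s_b))
    denom[denom == 0] = 1.0          # two flat slopes match perfectly
    return np.clip(100.0 * (1.0 - np.abs(s_a - s_b) / denom), 0.0, 100.0)

t = np.array([0.0, 1.0, 2.0, 3.0])
expert = np.array([100.0, 110.0, 105.0, 120.0])
print(slope_match(expert, expert, t))   # identical curves: all 100%
```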
CHAPTER 5
RESULTS AND DISCUSSION
Results are observed for two audio files. The file named alvida contains the song sung by an expert, and the file named alvida-sample contains the song sung by a contestant.
Both files are loaded separately, and the whole pitch-extraction process is applied to each individually.
The waveform graphs of both audio files are obtained by first using miraudio('alvida') and miraudio('alvida-sample') to load the two files.
The audio waveforms of both songs are then segmented using the mirsegment function.
Finally, the pitches of both songs are found using the mirpitch function.
The slopes are then matched using an algorithm designed according to the principle discussed in Chapter 4.
The individual matching of the slopes came out as follows. The number of slopes matching in the range 90-100%, 80-90%, and so on down to 10% was determined; slopes that matched below 10% were ignored for simplification, as they degraded the result. 21 slopes out of 55 matched 90% or above, but when these were averaged along with the below-10% slopes, the result was less consistent with human perception than after ignoring them. Also, when the sample was listened to and judged by a musical expert, his result was closer to the result obtained by ignoring slopes matched below 10%. In total, 21 slopes fell in the 90%-and-above range, 3 in 80-90%, 4 in 70-80%, 2 in 60-70%, 0 in 50-60%, 2 in 40-50%, 3 in 30-40%, 4 in 20-30%, and 4 in 10-20%.
These counts are then multiplied by their respective ranges and averaged to find the final answer.
The final percentage shows approximately a 66% match between the song sung by the expert and the song sung by the contestant. In this system the threshold for qualification is set around a 45-55% match: if the result falls within 45-55% the contestant is worth considering, if it is below this range the contestant is disqualified, and if it is above this range the contestant can be considered qualified. This range was set by observing many samples and comparing their results with actual human perception, i.e. how good and how well matched the sample sounds to human ears.
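The exact weighting is not spelled out in the report, but weighting each bin's count by the lower edge of its range and dividing by the 43 slopes matched at 10% or above reproduces the quoted figure. A short sketch of that assumption (Python):

```python
# Slope counts per bin, from 90%-and-above down to 10-20% (Chapter 5)
counts = [21, 3, 4, 2, 0, 2, 3, 4, 4]
lower_edges = [90, 80, 70, 60, 50, 40, 30, 20, 10]   # lower bound of each range

matched = sum(counts)    # 43 slopes matched at >= 10%
final = sum(c * w for c, w in zip(counts, lower_edges)) / matched
print(round(final))      # ~66
```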
CHAPTER 6
CONCLUSION AND FUTURE ENHANCEMENT
In conclusion, this unique system provides a basic structure to facilitate judging in reality shows, though there is still much room for improvement in its accuracy. The system is based on only one feature of musical sound signals, whereas there are multiple features, including rhythm, tonality, and jitter, that play an important part in enhancing or degrading the quality of a song. Future work can therefore include more features in this system. Also, pitch extraction in our system is done through the autocorrelation method, which is the most basic of the available methods and has its limitations; other methods can be implemented and tested against the one used here. Nevertheless, this system provides the backbone for future enhancements that can automate judging in the ever-growing entertainment area of music.
REFERENCES

http://en.wikipedia.org/wiki/MATLAB
http://www.wisegeek.com/what-is-audio-processing.htm
http://en.wikipedia.org/wiki/Audio_signal_processing
http://www.cs.hmc.edu/~kperdue/MusicalDSP.html
http://www.ee.columbia.edu/~dpwe/resources/matlab/
http://scitation.aip.org/content/asa/journal/poma/18/1/10.1121/1.4794857
http://www.mathworks.com/help/matlab/import_export/supported
http://www.montgomerycollege.edu/Departments/StudentJournal/Automatic.pdf
http://auditoryneuroscience.com/topics/missing-fundamental
https://www.jyu.fi/hum/laitokset/musiikki/en/research/coe/materials/mirtoolbox
http://en.wikipedia.org/wiki/Missing_fundamental
http://en.wikipedia.org/wiki/Pitch_detection_algorithm
http://www.ee.columbia.edu/~dpwe/papers/KarjT99-pitch.pdf
http://en.wikipedia.org/wiki/Pitch_%28music%29#Pitch_and_frequency
http://www.researchgate.net/publication/228854783_Pitch_detection_algorithm_autocorrelation_method_and_AMDF/links/0deec52c640858b90b000000
http://www.cse.cuhk.edu.hk/~khwong/www2/cmsc5707/5707_4_pitch.ppt