Neural Network

1st Amit Kishor Raturi
Dept. of Computer Science and Engineering
NIT Uttarakhand
Srinagar Garhwal, India
Amitkishorraturi.cse16@nituk.ac.in

2nd Dhruv Verma
Dept. of Computer Science and Engineering
NIT Uttarakhand
Srinagar Garhwal, India
Dhruv.cse16@nituk.ac.in
(b) Hyperbolic tangent activation function.

Fig. 3. Proposed architecture of the classifier for urban sound classification.
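The hyperbolic tangent activation used in the hidden layers maps any input to the range (-1, 1): tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)). Below is a minimal sketch of a fully connected classifier in the spirit of Fig. 3, written with Keras; the 193-dimensional input feature vector, the hidden-layer widths, and the 10 output classes are assumptions suggested by the figure labels and the urban sound setting, not the authors' exact configuration.

# A hedged sketch, not the authors' implementation: a small fully
# connected network with tanh hidden activations for urban sound
# classification. Input size, layer widths, and class count are assumed.
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Input(shape=(193,)),              # assumed: 193 audio features per clip
    layers.Dense(280, activation="tanh"),    # hypothetical hidden widths
    layers.Dense(300, activation="tanh"),
    layers.Dense(10, activation="softmax"),  # assumed: 10 urban sound classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

The softmax output and cross-entropy loss are standard choices for multi-class classification; the paper's exact optimizer and loss are not stated in this section.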
Fig. 5. Confusion matrix for the urban sound classifier.
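As a hedged illustration of how a confusion matrix like Fig. 5 is typically obtained from a trained classifier (the scikit-learn call and the placeholder labels below are assumptions, not the authors' code):

# Rows of the matrix are true classes, columns are predicted classes;
# in practice y_pred would be model.predict(x_test).argmax(axis=1).
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 3, 3, 7, 1])   # placeholder ground-truth labels
y_pred = np.array([0, 3, 5, 7, 1])   # placeholder predictions
print(confusion_matrix(y_true, y_pred))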
Fig. 6. Cost function vs. iterations.
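Fig. 6 plots the training cost against the number of iterations. A minimal sketch of how such a curve can be recorded and plotted is shown below; the Keras fit/History workflow, the random placeholder data, and the epoch count are illustrative assumptions rather than the paper's actual training script.

import numpy as np
import matplotlib.pyplot as plt

# Placeholder data purely for illustration; the real inputs would be the
# extracted audio feature vectors and their integer class labels.
x_train = np.random.rand(200, 193).astype("float32")
y_train = np.random.randint(0, 10, size=200)

# 'model' is the fully connected classifier sketched after Fig. 3.
history = model.fit(x_train, y_train, epochs=50, batch_size=32, verbose=0)

plt.plot(history.history["loss"])   # one cost value recorded per epoch
plt.xlabel("Iteration (epoch)")
plt.ylabel("Cost")
plt.title("Cost function vs. iterations")
plt.show()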
V. CONCLUSION
The goal of this paper was to evaluate whether fully connected neural networks can be successfully applied to environmental sound classification tasks, especially considering the limited nature of the datasets available in this field. They do appear to be a viable solution to this problem. The conducted experiments show that a fully connected neural network model can compete with other common approaches, such as convolutional neural networks, where the scarcity of data can be a major issue. We were able to produce an accuracy of 78%. Although the result is far from ground-breaking, particularly when the much longer training times are taken into account, it shows that fully connected neural networks can be applied effectively to environmental sound classification tasks even with limited datasets. What is more, it is quite likely that a considerable increase in the size of the available dataset would vastly improve the performance of the trained models, as the gap to human accuracy is still profound. One question open for future inquiry is whether fully connected neural networks could outperform convolutional neural network approaches.
ACKNOWLEDGMENT

We are highly grateful to Dr. Maroti Deshmukh, Assistant Professor (Dept. of Computer Science & Engineering, NIT Uttarakhand), for their guidance during the course of this research project.