Optimization of Feed Forward Neural Network for Audio Classification Systems
Onkar Singh, Department of Electronics and Communication Engineering, Rayat Institute of Engineering & Information Technology, SBS Nagar, India, aujlaonkarsingh@gmail.com
Neeru Singla, Department of Electronics and Communication Engineering, Rayat Institute of Engineering & Information Technology, SBS Nagar, India, neerusingla99@gmail.com
Manish Dev Sharma, Department of Physics, Panjab University, Chandigarh, India, mds@pu.ac.in
Abstract—In this paper we classified audio using feedforward neural networks to measure their suitability in terms of classification accuracy and time taken to classify. We investigated and analyzed this system to optimize the neural network, determining which numbers of layers and neurons are most suitable for classifying audio WAV files. Accuracy above 99% is reported.
I. INTRODUCTION
Classification of audio signals according to their content has been a major concern in recent years, and there have been many studies on audio content analysis using different features and different methods. It is a well-known fact that audio signals are baseband, one-dimensional signals. General audio spans a wide range of sound phenomena such as music, sound effects, environmental sounds, and speech and non-speech signals. The classification of audio, at the first step, requires the extraction of certain features related to the input sound sample, which may include the root-mean-square amplitude envelope, constant-Q transform frequency spectrum, multidimensional scaling trajectories, cepstral coefficients, spectral centroid, and presence of vibrato [1]. There are two main approaches to content-based classification built on such extracted features [2]: one that uses deterministic methods and one that utilizes probabilistic techniques. Despite many research efforts, high-accuracy audio classification has been achieved only for simple cases such as speech/music discrimination. Previous works have presented a theoretic framework and application of automatic audio content analysis using perceptual features, and an audio classifier based on simple features such as zero-crossing rate and short-time energy for radio broadcast [3]. Researchers have conducted many experiments with different classification models, including the GMM (Gaussian Mixture Model) [4], BP-ANN (Back-Propagation Artificial Neural Network), and KNN (K-Nearest Neighbour) [5]. Other works have enhanced audio classification algorithms, for example by pre-classifying audio recordings into speech, silence, laughter, and non-speech sounds in order to segment discussion recordings in meetings [6]. The use of taxonomic structures also helps to enhance classification performance.
Pitch-tracking methods have also been introduced to discriminate audio recordings into more classes, such as songs and speech over music, with a heuristic-based model. Figure 1 below is a block diagram of the classification system. An audio file stored in WAV format is passed to a feature extraction function, which calculates numerical features that characterize the sample. When training the system, this feature extraction process is performed on many different input WAV files to create a matrix of column feature vectors [7]. This matrix is then preprocessed to reduce the number of inputs to the neural network and then sent to the neural network for training. After training, single column vectors can be fed to the preprocessing block, which processes them in the same manner as the training vectors, and then classified by the neural network.
Onkar Singh et al. / (IJAEST) International Journal of Advanced Engineering Sciences and Technologies, Vol. 7, Issue 1, pp. 098-102, ISSN: 2230-7818, © 2011, http://www.ijaest.iserp.org. All rights reserved.
Figure 1. Block diagram of the audio classification system.
II. SYSTEM SETUP
This section describes the setup of the digital audio classification system. The system is composed primarily of the blocks above and was developed in the MATLAB environment. MATLAB code can be provided upon request.
Data for training and testing the system was taken from ten compact discs. The tracks on each of these CDs were extracted, converted to WAV format, and then divided into segments. WAV files are taken as the input files. MP3 files could also be used as input instead of WAV files if desired; the system presented in this paper can easily be converted to take MP3 input by prepending an MP3-to-WAV converter.
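The paper's implementation is in MATLAB; as a purely illustrative sketch (not the authors' code), the step of loading a WAV segment into an array of samples can be done with Python's standard-library `wave` module. The file name and signal values below are made up for the demonstration, and 16-bit mono PCM is assumed:

```python
import struct
import wave

# Write a tiny synthetic 16-bit mono WAV file so the example is self-contained.
with wave.open("demo.wav", "wb") as w:
    w.setnchannels(1)          # mono
    w.setsampwidth(2)          # 16-bit PCM
    w.setframerate(8000)       # 8 kHz sample rate
    w.writeframes(struct.pack("<4h", 0, 1000, -1000, 0))

def read_wav_samples(path):
    """Read a 16-bit PCM WAV file and return (sample_rate, samples)."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        n = w.getnframes() * w.getnchannels()
        samples = list(struct.unpack("<%dh" % n, w.readframes(w.getnframes())))
    return rate, samples

rate, samples = read_wav_samples("demo.wav")
print(rate, samples)   # 8000 [0, 1000, -1000, 0]
```

An MP3 front end would simply decode to this same PCM representation before feature extraction.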
A. Feature extraction
Discriminative features contribute greatly to the audio classification task. In order to improve the accuracy of classification and segmentation for an audio sequence, it is important to choose features that properly represent its temporal and spectral characteristics. In our system, we select the mel-frequency cepstral coefficients (MFCC), which have been proved effective for speech and music discrimination [7].
B. Data preprocessing
The feature vectors returned by the feature extraction block were preprocessed before being input to the neural network. Two types of preprocessing were performed: one to scale the data to fall within the range of -1 to 1, and one to reduce the length of the input vector. The data was divided into three sets: one for training, one for validation, and one for testing. The preprocessing parameters were determined using the matrix containing all feature vectors used for training and validation. For testing, these same parameters were used to preprocess test feature vectors before passing them to the trained neural network. The first preprocessing function used was premnmx, which scales the data so that the minimum and maximum of each feature, across all training and validation feature vectors, are -1 and 1. Premnmx returns two parameters, minp and maxp, which were used with the function tramnmx for preprocessing the test feature vectors. The second preprocessing function used was prepca, which performs principal component analysis on the training and validation feature vectors. Principal component analysis is used to reduce the dimensionality of the feature vectors from a length of 124 to a length more manageable by the neural network. It does this by orthogonalizing the features across all feature vectors, ordering them so that those with the most variation come first, and then removing those that contribute least to the variation. Prepca was used with a threshold of 0.001, so that only those features contributing to 99.9% of the variation were kept; this procedure reduced the length of the feature vectors by one half. Prepca returns the matrix transMat, which is used with the function trapca to perform the same principal component analysis on the test feature vectors as was performed on the training and validation feature vectors. This was done before passing the test feature vectors to the trained neural network.
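premnmx and prepca are MATLAB Neural Network Toolbox functions. A hedged numpy sketch of equivalent behavior is given below; the function names, the random toy data, and the 40-sample matrix size are illustrative assumptions, while the 124-dimensional vectors and the 0.001 variance threshold come from the paper:

```python
import numpy as np

def premnmx_like(X):
    """Scale each feature (row) of X to [-1, 1], as premnmx does.
    Returns the scaled matrix plus the per-feature min/max needed to
    preprocess test vectors the same way (cf. tramnmx)."""
    minp = X.min(axis=1, keepdims=True)
    maxp = X.max(axis=1, keepdims=True)
    Xs = 2 * (X - minp) / (maxp - minp) - 1
    return Xs, minp, maxp

def prepca_like(X, frac_discard=0.001):
    """PCA keeping the components that explain 99.9% of the variance
    (cf. prepca with a 0.001 threshold). Columns of X are feature
    vectors, as in the paper; transMat is reused on test data."""
    Xc = X - X.mean(axis=1, keepdims=True)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    var = s**2 / np.sum(s**2)
    keep = np.searchsorted(np.cumsum(var), 1 - frac_discard) + 1
    transMat = U[:, :keep].T       # analogous to prepca's transMat
    return transMat @ Xc, transMat

# Toy demo: 124-dimensional feature vectors for 40 samples.
rng = np.random.default_rng(0)
X = rng.normal(size=(124, 40))
Xs, minp, maxp = premnmx_like(X)
Y, transMat = prepca_like(Xs)
print(Xs.min(), Xs.max())          # scaled extremes: -1.0, 1.0
print(Y.shape[0] <= 124)           # dimensionality reduced
```

Applying `transMat` to min-max-scaled test vectors reproduces the trapca step described above.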
C. Neural Network
A three-layer feedforward backpropagation neural network, shown in the figure, was used for classifying the feature vectors [6]. By trial and error, an architecture consisting of 20 adalines in the input layer, 10 adalines in the middle layer, and 3 adalines in the output layer was found to provide good performance. The transfer function used for all adalines was the tangent sigmoid, 'tansig'. The Levenberg-Marquardt backpropagation algorithm, 'trainlm', was used to train the network.
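A minimal numpy sketch of the forward pass through this 20-10-3 'tansig' architecture follows. The weights here are random and untrained, the 62-dimensional input length is an assumption (the paper says PCA roughly halves the 124-length vectors), and the Levenberg-Marquardt training step itself ('trainlm') is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)

# Layer sizes from the paper: 20 -> 10 -> 3, tansig (tanh) at every layer.
# 62 is an assumed PCA-reduced input length, not a value the paper states.
sizes = [62, 20, 10, 3]
weights = [rng.normal(scale=0.1, size=(m, n)) for n, m in zip(sizes, sizes[1:])]
biases = [np.zeros((m, 1)) for m in sizes[1:]]

def forward(x):
    """Forward pass with tanh activations, matching MATLAB's 'tansig'."""
    a = x
    for W, b in zip(weights, biases):
        a = np.tanh(W @ a + b)
    return a

x = rng.normal(size=(62, 1))   # one preprocessed feature vector
out = forward(x)
print(out.shape)               # (3, 1): one score per audio class
```

Because tanh saturates at ±1, each of the three outputs stays in [-1, 1], matching the [-1, 1] scaling used for the targets.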
III. EXPERIMENTAL RESULTS
This section discusses the results of training and testing the classification system.
A. Classification
Our aim here was to build a classifier that can predict the correct classification of an audio WAV file based on features extracted from that file. Neural networks have proved themselves proficient classifiers and are particularly well suited to addressing non-linear problems. Given the non-linear nature of real-world phenomena such as audio classification, a neural network is certainly a good candidate for solving the problem [6]. The extracted parameters act as inputs to the neural network, and the predicted classification is the target. Given an input constituting the measured parameter values of a WAV file, the neural network is expected to identify whether the audio classification is correct or not. This is achieved by presenting previously recorded parameters of WAV files to the neural network and tuning it to produce the desired target outputs; this process is called neural network training. The samples are divided into three sets: 1. training, 2. validation, and 3. test. The training set is used to teach the network, and training continues as long as the network keeps improving on the validation set. The test set provides a completely independent measure of network accuracy. The trained neural network is tested with the testing samples, and the network response is compared against the desired target response to build the classification matrix, which provides a comprehensive picture of system performance. The next step was to create the neural network discussed in the system setup section above. The training function used was the feedforward backpropagation algorithm 'trainlm'. Using these parameters we can classify the audio files given as inputs. The classification plots for some audio files are shown in the figures, which show the performance and epoch plots. After training, the system was tested using the data set reserved for testing.
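The split-and-compare protocol described above can be sketched as follows. The 60/20/20 split ratio, the class count of 3, and the synthetic targets are illustrative assumptions, not values from the paper; the point is how network responses and desired targets combine into the classification (confusion) matrix:

```python
import numpy as np

rng = np.random.default_rng(2)

# Partition 100 sample indices into training / validation / test sets
# (60/20/20 here is an assumed ratio; the paper does not state one).
n = 100
idx = rng.permutation(n)
train, val, test = idx[:60], idx[60:80], idx[80:]

# Compare network responses against desired targets on the test set.
classes = 3
targets = rng.integers(0, classes, size=test.size)   # desired outputs
responses = targets.copy()
responses[:2] = (responses[:2] + 1) % classes        # simulate two errors

# Classification matrix: rows are true classes, columns are predictions.
confusion = np.zeros((classes, classes), dtype=int)
for t, r in zip(targets, responses):
    confusion[t, r] += 1

print(confusion.sum())      # 20 test samples in total
print(np.trace(confusion))  # 18 correct detections on the diagonal
```

Off-diagonal entries show exactly which classes are being confused, which is why the matrix gives a more comprehensive picture than a single accuracy number.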
Before passing the test feature vectors to the trained neural network, data preprocessing was performed using the saved parameters from the preprocessing of the training data. Figure 3 shows the plot of performance against epochs, with 97.34% efficiency; the performance reached 0.0040275 before a validation stop occurred. Figures 4, 5, 6, and 7 also show performance against epochs, with efficiencies of 95.5%, 97%, 98.2%, and 99.15% respectively; in these figures the performance reached 0.000785268, 5.51221e-5, 0.0032053, and 0.00202006 respectively.
B. Final result
For the final result we plotted the graph of accuracy against the number of neurons. Accuracy is the ratio of correct detections over all detections [5].
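This accuracy definition, correct detections over all detections, can be computed directly from a classification matrix. The 3-class matrix below is a hypothetical example, not data from the paper:

```python
import numpy as np

# Hypothetical 3-class classification matrix
# (rows: true class, columns: predicted class).
confusion = np.array([[50, 1, 0],
                      [0, 49, 1],
                      [0, 0, 49]])

# Accuracy = correct detections (diagonal) / all detections (total).
accuracy = np.trace(confusion) / confusion.sum()
print(round(100 * accuracy, 2))   # 98.67
```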
The figure shows the graph of percentage accuracy against the number of neurons. We noted that accuracy is above 99% at 3 and 6 neurons, and that the maximum accuracy is obtained at 3 neurons. This result was produced by varying the number of neurons to find the configuration at which the result is most accurate. Thus, in our project we show that, instead of using one fixed network, we obtain a more accurate result by varying the number of neurons.
IV. CONCLUSIONS AND FUTURE WORK
The classifier we have built provides excellent and robust discrimination among speech signals. We extracted features from the audio content and built feature vectors, then applied a neural network, trained with the feedforward backpropagation procedure, to classify the audio; 99% accuracy is reported. A particular, specific neuron count is used to make the classification result more accurate. There are many interesting directions that can be explored in the future. To achieve this goal, we need to explore more audio features that can characterize the audio system. A second direction is to improve the computational efficiency of the neural network.
REFERENCES
[1] Yu Song, Wen-Hong Wang, Feng-Juan Guo, "Feature extraction and classification for audio information in news video," International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR 2009), pp. 43-46, 12-15 July 2009.
[2] V. Mitra, C.J. Wang, "A Neural Network based Audio Content Classification," International Joint Conference on Neural Networks (IJCNN 2007), pp. 1494-1499, 12-17 Aug. 2007.
[3] Xi Shao, Changsheng Xu, M.S. Kankanhalli, "Applying neural network on the content-based audio classification," Proceedings of the 2003 Joint Conference of the Fourth International Conference on Information, Communications and Signal Processing and the Fourth Pacific Rim Conference on Multimedia, vol. 3, pp. 1821-1825, 15-18 Dec. 2003.
[4] Jae-Young Kim, Dong-Chul Park, "Application of Bhattacharyya kernel-based Centroid Neural Network to the classification of audio signals," International Joint Conference on Neural Networks (IJCNN 2009), pp. 1606-1610, 14-19 June 2009.
[5] Yu Song, Wen-Hong Wang, Feng-Juan Guo, "Feature extraction and classification for audio information in news video," International Conference on Wavelet Analysis and Pattern Recognition (ICWAPR 2009), pp. 43-46, 12-15 July 2009.
[6] L. Ballan, A. Bazzica, M. Bertini, A. Del Bimbo, G. Serra, "Deep networks for audio event classification in soccer videos," IEEE International Conference on Multimedia and Expo (ICME 2009), pp. 474-477, June-July 2009.
[7] Xin He, Yingchun Shi, Fuming Peng, Xianzhong Zhou, "A Method Based on General Model and Rough Set for Audio Classification," Chinese Conference on Pattern Recognition (CCPR 2009), pp. 1-5, 4-6 Nov. 2009.