
Audio Visual Emotion Based Music Player

Prof. Minal P. Nerkar, Surykant Wankhade, Neha Chhajed


nerkar.minal@gmail.com, surykantw10@2gmail.com, Nehachhajed12317@gmail.com
Computer Department AISSMS IOIT,
Pune

ABSTRACT

This project puts forth a framework for a real-time face and voice recognition and emotion detection system based on facial features, their actions, and the intensity of the voice. The key elements of the face are considered for predicting the facial emotions of the user. Differences in the facial features, which are invariant to scaling as well as rotation, are used to determine the different emotions of the face. Machine learning algorithms are used for the recognition and classification of different classes of facial emotions by training on different sets of images. The algorithms implemented here would contribute to several areas of identification and to many real-world problems. The proposed algorithm is implemented using the Open Source Computer Vision library (OpenCV).

Keywords
Open Source Computer Vision (OpenCV); object recognition; Sound Meter

1. INTRODUCTION
Since machine analysis of human behavior commenced, it has seen great progress. Facial expression and speech are the two main channels of human emotional expression, since people mostly rely on facial expression and speech to understand someone's present state. In recent years, recognition of human emotions from facial expression and speech, i.e., audio-visual emotion recognition, has grown, attracting extensive attention in artificial intelligence. As audio-visual emotion recognition is motivated by the goal of establishing a reliable and less complex relationship between humans and computers, its importance to human-computer interaction (HCI) is increasing. This research mainly focuses on easing human work by increasing the interaction between humans and computers, which in turn helps to increase the use of computers in day-to-day work. As computers have become an important asset of our fast-paced lives, the need for meaningful and easy communication between computers and humans has also increased. In addition, speech recognition is a successfully established area of research, but its main limitation is that it cannot respond appropriately to the emotions of different people. To overcome this drawback, systems are being developed that are capable of detecting, understanding, and replying to the multiple emotional states of various people, similar to the way a human being does. Hence, audio emotion recognition (AER) is a recent field of study that is providing great advancement in the field of Human-Computer Interaction (HCI).

2. PREVIOUS WORKS
Most works on machine recognition of human emotions from facial expressions or emotional speech use only a single modality, either speech or video, as the data. Relatively little work has been done on combining the two modalities for the machine to recognize human emotions. In this paper, we investigate the integration of both visual and acoustic information for the machine to recognize apparent emotions, which may or may not be the true emotions of the human user. We assume the user is willingly showing his or her emotions through facial expression and through speech as a means to communicate.
There are some systems that require manual selection of the current emotion from a list of predefined emotions. Websites like stereomood lack capability in the sense that the user needs to type in what he is feeling, rather than having computer vision determine his emotion.



Similarly, an Android application named Pindrop also provides predefined emotions, and users are required to select one of the available emotions. These applications are dynamically updated with the latest songs but fall short in determining the user's exact emotion using computer vision. A lot of work has been done in the past to solve the problem of emotion recognition. To extract and determine the emotion of a user, we need to extract features from an image and use them against a trained data set to classify the input and determine the emotion.

3. PROPOSED SYSTEM

 System architecture
The system architecture for the proposed system is given in Fig 3.1 [1]. The input image is loaded into the system in .jpg format. Each image then undergoes preprocessing, i.e., removal of unwanted information such as background color and illumination, and resizing of the images. The required features are then extracted from the image and stored as useful information. These features are later passed to the classifier, where the expression is recognized with the help of the Scale Invariant Feature Transform (SIFT) algorithm. The smaller the calculated distance, the nearer the match that is found. Finally, a music track is played based on the emotion detected for the user.

Fig 1. Basic steps

Basic Steps for the web module

A. Image Acquisition
In any image processing technique, the first task is to acquire the image from the source. These images can be acquired either through a camera or through standard data sets that are available online. The images should be in .jpg format. We have used our own data set for real-time emotion detection.

B. Pre-processing
The Scale Invariant Feature Transform (SIFT) algorithm is used for pre-processing, which is mainly done to eliminate unwanted information from the acquired image and to fix certain values for it, so that those values remain the same throughout. In the pre-processing phase, the images are converted from RGB to grayscale and are resized to 256*256 pixels. The images considered are in .jpg format; any other format is not considered for further processing. During pre-processing, the eyes, eyebrows, and mouth are considered to be the regions of interest.
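
As a minimal sketch of steps A and B together with the distance-based matching described above, the following Python/OpenCV code reads a .jpg image, applies the grayscale conversion and 256*256 resize, and compares SIFT descriptors against a few labelled template images. The template file names and the one-template-per-emotion scheme are our own simplifying assumptions, not part of the paper.

# A minimal sketch, assuming one labelled template image per emotion.
# File names below are illustrative, not from the paper.
import cv2

TEMPLATES = {"happy": "happy.jpg", "sad": "sad.jpg", "angry": "angry.jpg"}

def preprocess(img):
    # Pre-processing as described: drop color, fix the size at 256x256.
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    return cv2.resize(gray, (256, 256))

def detect_emotion(image_path):
    sift = cv2.SIFT_create()
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    _, query_desc = sift.detectAndCompute(preprocess(cv2.imread(image_path)), None)

    best_emotion, best_dist = None, float("inf")
    for emotion, path in TEMPLATES.items():
        _, tmpl_desc = sift.detectAndCompute(preprocess(cv2.imread(path)), None)
        matches = matcher.match(query_desc, tmpl_desc)
        # The smaller the mean descriptor distance, the nearer the match.
        dist = sum(m.distance for m in matches) / max(len(matches), 1)
        if dist < best_dist:
            best_emotion, best_dist = emotion, dist
    return best_emotion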

C. Facial Feature Extraction


After pre-processing, the next step in the Scale Invariant Feature Transform (SIFT) algorithm is feature extraction. The extracted facial features are stored as useful information.

The following facial features can be considered: "mouth, forehead, eyes, complexion of skin, cheek and chin dimple, eyebrows, nose, and wrinkles on the face". In this work, the eyes, eyebrows, and mouth are considered for feature extraction, for the reason that these depict the most telling expressions. With the mouth being opened or an eyebrow being raised, one can easily recognize that the person is either surprised or fearful; a person's complexion, by contrast, can never convey this.
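
As an illustrative sketch (the paper does not name its region detector), the eye regions could be located with OpenCV's bundled Haar cascades and then described with SIFT:

# Locating regions of interest (here: the eyes) with OpenCV's bundled
# Haar cascades, then describing them with SIFT. The cascade choice is
# our assumption; the paper does not specify its region detector.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def eye_descriptors(gray):
    sift = cv2.SIFT_create()
    descriptors = []
    for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
        face = gray[y:y + h, x:x + w]
        for (ex, ey, ew, eh) in eye_cascade.detectMultiScale(face):
            _, desc = sift.detectAndCompute(face[ey:ey + eh, ex:ex + ew], None)
            if desc is not None:
                descriptors.append(desc)
    return descriptors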
Basic steps for the Android module:

A. Detection
Detection of sound is done using the sound meter in an Android phone.
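
The sound meter reports the intensity (loudness) of the recorded voice in decibels. The Android implementation is not shown in the paper; as an illustration of the underlying computation, the sketch below derives the RMS level in dBFS from a 16-bit mono WAV recording in Python.

# What a sound meter measures, in miniature: the RMS level of the
# signal in dB relative to full scale. Assumes a 16-bit mono WAV file.
import math
import struct
import wave

def rms_dbfs(path):
    with wave.open(path, "rb") as wav:
        frames = wav.readframes(wav.getnframes())
    samples = struct.unpack("<%dh" % (len(frames) // 2), frames)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    # 32768 is the full-scale amplitude of a signed 16-bit sample.
    return 20 * math.log10(rms / 32768.0) if rms > 0 else float("-inf")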

B. Classification
The k-nearest-neighbor (k-NN) algorithm is a classification algorithm that classifies data using training examples. Even when the target class is multi-modal, the algorithm can still achieve high precision. A major disadvantage of the k-NN algorithm is that it weights every feature equally when computing similarity. The accuracy of k-NN remains high in most cases, although as the size of the dataset increases, the accuracy of the system decreases. Overall, k-NN performs effectively in terms of accuracy, but as the dataset grows, the time consumed in predicting values with k-NN also increases.
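
For illustration, a minimal k-NN classifier over feature vectors is sketched below; the tiny two-feature training set is invented purely to show the majority-vote mechanics (in practice, OpenCV's cv2.ml.KNearest_create() could serve the same role).

# A minimal k-NN sketch: label a query by majority vote among the k
# training examples nearest to it. The data here is made up.
import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    dists = np.linalg.norm(train_x - query, axis=1)   # distance to each example
    nearest = train_y[np.argsort(dists)[:k]]          # labels of the k closest
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]                  # majority vote

train_x = np.array([[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]])
train_y = np.array(["happy", "happy", "sad", "sad"])
print(knn_predict(train_x, train_y, np.array([0.15, 0.85])))  # -> happy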
4. TEST CASES

5. ACKNOWLEDGMENTS
We would like to express our gratitude towards our project guide, Prof. Minal P. Nerkar, for her expert advice and encouragement throughout this difficult project, as well as to our project coordinator, Dr. K. S. Wagh, and the Head of Department, Prof. S. N. Zaware. Without their continuous support and encouragement, this project might not have been possible.

6. REFERENCES
[1] S. Yakkali, V. Nara, N. Tikone, and D. Ingle, "Robust Object Detection and Tracking Using SIFT Algorithm," SIESGST, Mumbai University, Maharashtra, India.

[2] D. Rivero, E. Fernandez-Blanco, J. Dorado, and A. Pazos, "A New Signal Classification Technique by Means of Genetic Algorithms and kNN," Department of Information and Communications Technologies, University of A Coruña, A Coruña, Spain.

[3] "Comparing Accuracy of K-Nearest-Neighbor and Support-Vector-Machines for Age Estimation."

[4] A. Alahi, R. Ortiz, and P. Vandergheynst, "FREAK: Fast Retina Keypoint," in IEEE Conference on Computer Vision and Pattern Recognition, 2012.

[5] D. G. Lowe, "Object Recognition from Local Scale-Invariant Features," in Proc. Seventh Int'l Conf. on Computer Vision, pp. 1150-1157, 1999.

[6] B. K. Bairagi, "Expressions Invariant Face Recognition Using SURF and Gabor Features," ARC India Pvt. Ltd., Kolkata, India.

[7] Z. Zeng and T. S. Huang, Beckman Institute, University of Illinois at Urbana-Champaign, 405 N. Mathews Ave., Urbana, IL 61801.

[8] "Recognition System using Parallel Classifiers and Audio Feature Analyzer," in 2011 3rd Int. Conf. on Computational Intelligence, Modelling and Simulation, 2011, pp. 210-215.

[9] Review, European Journal for Scientific Research, vol. 33, no. 3, 2009, pp. 480-501.

[10] "Expression Analysis in Determining Emotional Valence and Intensity with Benefit for Human Space Flight Studies," in 5th IEEE International Conference on E-Health and Bioengineering (EHB), pp. 1-4, November 2015.
