Introduction to Sentiment Analysis

Rajesh Piryani
Department of Computer Science
South Asian University, New Delhi
What is Sentiment Analysis?
It is a natural language processing task that uses an algorithmic formulation to
categorize an opinionated text into either positive or negative sentiment classes (or
sometimes a neutral class equivalent to having no opinion polarity).
SA (Sentiment Analysis) is defined as a quintuple
$\langle O_i, F_{ij}, S_{ijkl}, H_k, T_l \rangle$
where
$O_i$ = targeted object
$F_{ij}$ = feature of the object
$S_{ijkl}$ = sentiment polarity
$H_k$ = opinion holder $k$
$T_l$ = time when the opinion is expressed
Example
$O_i$ = Samsung Mobile
$F_{ij}$ = Battery, Camera, Memory Card, Design, etc.
$S_{ijkl}$ = positive for six months, negative after that
$H_k$ = Myself
$T_l$ = "When I purchased the Samsung mobile it was good, but now after 6 months it gets heated in 4 to 5 minutes." (i.e., at purchase and six months later)
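To make the quintuple concrete, here is a minimal Python sketch that stores each opinion as one record. The class name `Opinion`, its field names, and the time labels "at purchase" and "after 6 months" are illustrative assumptions, not part of the definition. The feature label "GENERAL" (an opinion about the phone as a whole, since the review does not name a specific feature) is also an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    """One opinion expressed as the quintuple <O_i, F_ij, S_ijkl, H_k, T_l>."""
    target: str     # O_i: the targeted object
    feature: str    # F_ij: the feature (aspect) of the object the opinion is about
    sentiment: str  # S_ijkl: sentiment polarity of the opinion
    holder: str     # H_k: the opinion holder
    time: str       # T_l: when the opinion applies / is expressed


# The Samsung review as two quintuples, one per time period.
# "GENERAL" (opinion on the phone as a whole) is an assumed feature label.
opinions = [
    Opinion("Samsung Mobile", "GENERAL", "positive", "Myself", "at purchase"),
    Opinion("Samsung Mobile", "GENERAL", "negative", "Myself", "after 6 months"),
]

for op in opinions:
    print(f"{op.holder}: {op.target} [{op.feature}] -> {op.sentiment} ({op.time})")
```

Splitting the review into two records is what lets the change of polarity over time be represented at all; a single document-level label would lose it.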

11/10/2017 INTRODUCTION TO SENTIMENT ANALYSIS 2


Why Sentiment Analysis?
Mainly because of the Web: huge volumes of opinionated text
User-generated media: one can express opinions on anything in reviews, forums,
discussion groups, and blogs
Opinions at a global scale: no longer limited to:
Individuals: one's circle of friends
Businesses: small-scale surveys, tiny focus groups, etc.

Example 1
I love this movie! It's sweet, but with satirical humor. The dialogue is great and the
adventure scenes are fun. It manages to be whimsical and romantic while laughing at
the conventions of the fairy tale genre. I would recommend it to just about anyone.
I've seen it several times, and I'm always happy to see it again whenever I have a
friend who hasn't seen it yet.

Example 2
My XYZ CAR was delivered yesterday. It looks fabulous. We went on a long
highway drive the very second day of getting the car. It was a smooth, comfortable and
wonderful drive. Had a wonderful experience with family. It's an awesome car. I am
loving it!

Classification of Sentence
Sentences are first classified as Objective or Subjective.
Objective: opinion without sentiment (objectivity)
I believe the world is flat.
Samsung Galaxy has a resolution of 14 MP.
Subjective: sentiment always involves the holder's emotions or desires (subjectivity)
I think intervention in Libya will put the US in a difficult situation.
The US attack on Afghanistan is wrong.
Video quality of iPhone is awesome.
iPhone6 is newest in the market.
Subjective sentences are further classified as Positive, Negative, or Neutral.
Figure 1. Classification of Sentence

Levels of Sentiment Analysis
Sentiment analysis can be performed at three levels: Document Level, Sentence Level, and Aspect Level.

Figure 2. Levels of Sentiment Analysis

Example 3
iPhone- User Review:
I bought an iPhone a few days ago. It was such a nice phone. The touch screen was
really cool. The voice quality was clear too. Although the battery life was not long, that
is ok for me. However, my mother was mad with me as I did not tell her before I
bought the phone. She also thought the phone was too expensive, and wanted me to
return it to the shop.

Visual Comparison of Aspect-based Sentiment Analysis

Figure 3. Visual Comparison of Aspect Level based Sentiment Analysis

Approaches to perform Sentiment Analysis
Machine Learning Classifier Approach
Naïve Bayes, Maximum Entropy, Support Vector Machine, etc.
Unsupervised Semantic Orientation Approach
Semantic Orientation using Pointwise Mutual Information and Information Retrieval (PMI-IR)
Semi-supervised SentiWordNet-based Approaches
SentiWordNet, SenticNet
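To give a feel for the unsupervised approach, here is a sketch of the semantic orientation score as we understand it from Turney's PMI-IR work: a phrase's orientation is the log-ratio of how often it co-occurs with "excellent" versus "poor", measured with search-engine hit counts:

SO(phrase) = log2 [ hits(phrase NEAR "excellent") × hits("poor") / (hits(phrase NEAR "poor") × hits("excellent")) ]

The function name, the 0.01 smoothing constant, and the hit counts below are assumptions for illustration only; no search engine is queried.

```python
import math


def semantic_orientation(hits_near_excellent, hits_near_poor,
                         hits_excellent, hits_poor, eps=0.01):
    """log2 of (hits(p NEAR excellent) * hits(poor)) / (hits(p NEAR poor) * hits(excellent)).

    eps is added to every count to avoid division by zero (assumed value).
    """
    numerator = (hits_near_excellent + eps) * (hits_poor + eps)
    denominator = (hits_near_poor + eps) * (hits_excellent + eps)
    return math.log2(numerator / denominator)


# Hypothetical hit counts: the phrase co-occurs with "excellent" 8x more often
so = semantic_orientation(hits_near_excellent=800, hits_near_poor=100,
                          hits_excellent=10000, hits_poor=10000)
print(round(so, 3), "positive" if so > 0 else "negative")
```

A positive score suggests positive orientation and a negative score negative orientation; a review can then be classified by averaging the scores of its phrases.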

ML Supervised Algorithm Block Diagram

Figure 4. Block diagram of ML Supervised Algorithm

Preprocessing of data for ML Algorithm

Review/Text → Tokenization → Punctuation marks removal → Stop word removal → Stemming

Figure 5. Steps for pre-processing of data
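The first three steps of the pipeline can be sketched in a few lines of Python. Whitespace tokenization, ASCII punctuation stripping, and the tiny stop-word list are simplifying assumptions, and all function names are hypothetical; stemming is left as a pluggable function (identity by default).

```python
import string

# Hypothetical, deliberately tiny stop-word list; real systems use much longer lists
STOP_WORDS = {"the", "is", "and", "who", "a", "an", "i", "this", "it"}


def tokenize(text):
    """Tokenization: lower-case the text and split it on whitespace."""
    return text.lower().split()


def remove_punctuation(tokens):
    """Punctuation marks removal: strip ASCII punctuation, drop tokens left empty."""
    table = str.maketrans("", "", string.punctuation)
    stripped = (tok.translate(table) for tok in tokens)
    return [tok for tok in stripped if tok]


def remove_stop_words(tokens):
    """Stop word removal: drop tokens that appear in STOP_WORDS."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


def preprocess(text, stem=lambda word: word):
    """Run the pipeline; stemming is a pluggable function (identity by default)."""
    tokens = remove_stop_words(remove_punctuation(tokenize(text)))
    return [stem(tok) for tok in tokens]


print(preprocess("I love this movie! It's sweet, but with satirical humor."))
```

Note that stripping punctuation after tokenization turns "It's" into "its", which then escapes the stop-word list; the order of the steps affects the output.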

Preprocessing of data for ML Algorithm
Stop Words:
Common words that have low discriminating power (e.g., "the", "is", "and", "who")
Usually filtered out before processing the text
Stemming:
The goal of stemming is to reduce inflectional forms, and sometimes derivationally related
forms, of a word (noun, adjective, verb, adverb, etc.) to a common base form
Example: "argue", "argued", "argues", "arguing", and "argus" reduce to the stem "argu"
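Real systems typically use an established stemming algorithm such as the Porter stemmer. As a rough illustration only, the sketch below is a hypothetical suffix-stripping stemmer (not Porter's algorithm) whose hand-picked suffix list happens to reproduce the "argu" example; it will mis-stem many other words.

```python
# Hypothetical suffix list, longer suffixes first so that "ing" is tried before "s"
SUFFIXES = ("ing", "ed", "es", "e", "s")


def toy_stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


for w in ["argue", "argued", "argues", "arguing", "argus"]:
    print(w, "->", toy_stem(w))
```

The minimum stem length stops very short words such as "is" from being reduced to a single letter.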

Supervised Machine Learning
Input:
a document $d$
a fixed set of classes $C = \{c_1, c_2, \ldots, c_J\}$
a training set of $m$ hand-labeled documents $(d_1, c_1), \ldots, (d_m, c_m)$

Output:
a learned classifier $\gamma: d \to c$

The bag of words representation
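As a minimal sketch of the idea, a document is reduced to an unordered multiset of word counts, so word order is discarded. `bag_of_words` is a hypothetical helper built on Python's `collections.Counter`; the input is words taken from Example 1, lower-cased, with sentence punctuation removed.

```python
from collections import Counter


def bag_of_words(tokens):
    """Map each word to the number of times it occurs; word order is discarded."""
    return Counter(tokens)


# Words taken from Example 1 (lower-cased, sentence punctuation removed)
text = ("i love this movie it's sweet but with satirical humor "
        "i would recommend it to just about anyone "
        "i've seen it several times")
bow = bag_of_words(text.split())
print(bow["i"], bow["it"], bow["love"], bow["boring"])
```

A `Counter` returns 0 for words that never occur, which matches the idea that every vocabulary word has a count, possibly zero.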

The bag of words representation: using a subset of words
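A sketch of the subset variant: only words from a chosen vocabulary are counted, and every document becomes a fixed-length count vector in vocabulary order. The vocabulary below is a hypothetical hand-picked list of sentiment-bearing words, and the tokens are adapted from Example 1.

```python
from collections import Counter


def bag_of_selected_words(tokens, vocabulary):
    """Count only the words in `vocabulary`; every other word is ignored.

    Zero counts are kept so that every document maps to a vector of the
    same length, in the same order as `vocabulary`.
    """
    counts = Counter(tok for tok in tokens if tok in vocabulary)
    return [counts[word] for word in vocabulary]


# Hypothetical hand-picked subset of sentiment-bearing words
vocabulary = ["great", "love", "recommend", "happy", "boring"]
tokens = ("i love this movie the dialogue is great "
          "i would recommend it i am always happy to see it again").split()
print(bag_of_selected_words(tokens, vocabulary))
```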

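NB Machine Learning Approach
The probability of a document $d$ being in class $c$ is computed as
$P(c \mid d) \propto P(c) \prod_{1 \le k \le n_d} P(t_k \mid c)$

where $P(t_k \mid c)$ is the conditional probability of term $t_k$ occurring in a document of class $c$, and $n_d$ is the number of tokens in $d$.
The goal is to find the best class, i.e., the Maximum A Posteriori (MAP) class:
$c_{map} = \arg\max_{c \in C} \hat{P}(c \mid d) = \arg\max_{c \in C} \hat{P}(c) \prod_{1 \le k \le n_d} \hat{P}(t_k \mid c)$

Because the logarithm is monotonic, this can be reframed as a sum of logarithms, which also avoids floating-point underflow:
$c_{map} = \arg\max_{c \in C} \left[ \log \hat{P}(c) + \sum_{1 \le k \le n_d} \log \hat{P}(t_k \mid c) \right]$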
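The log-space MAP rule can be sketched in Python as below. `map_class` is a hypothetical helper; skipping tokens that are missing from any class's probability table is an assumption of this sketch (with add-1 smoothing every vocabulary word has a nonzero probability). The last two lines show why logs are used: a product of many small probabilities underflows in double precision, while the sum of logs stays finite.

```python
import math


def map_class(tokens, priors, likelihoods):
    """c_map = argmax_c [log P(c) + sum_k log P(t_k | c)].

    priors:      dict mapping class -> P(c)
    likelihoods: dict mapping class -> {term: P(term | c)}
    Tokens not present in every class's table are skipped (an assumption).
    """
    known = [t for t in tokens if all(t in likelihoods[c] for c in priors)]

    def log_score(c):
        return math.log(priors[c]) + sum(math.log(likelihoods[c][t]) for t in known)

    return max(priors, key=log_score)


# Why logs: a product of 200 probabilities of 0.01 underflows to 0.0 ...
print(0.01 ** 200)
# ... while the equivalent sum of logs is finite (about -921.03)
print(200 * math.log(0.01))
```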

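NB Machine Learning Approach (Contd.)
$\hat{P}(c)$ and $\hat{P}(t \mid c)$ are maximum likelihood estimates based on the training data and can be computed as:

$\hat{P}(c) = \frac{N_c}{N}, \qquad \hat{P}(t \mid c) = \frac{T_{ct}}{\sum_{t' \in V} T_{ct'}}$

Laplace (add-1) smoothing for Naïve Bayes:

$\hat{P}(t \mid c) = \frac{T_{ct} + 1}{\sum_{t' \in V} (T_{ct'} + 1)} = \frac{T_{ct} + 1}{\left(\sum_{t' \in V} T_{ct'}\right) + |V|}$

where $N$ is the total number of documents,
$N_c$ is the number of documents in class $c$,
$T_{ct}$ is the number of occurrences of term $t$ in training documents from class $c$,
$|V|$ is the number of unique words in the vocabulary.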
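These estimates can be sketched in Python as follows. `train_multinomial_nb` is a hypothetical helper name; it follows the formulas above, with the vocabulary $V$ taken as all terms seen in training (an assumption; unseen test terms would need separate handling). The toy training data is made up for illustration.

```python
from collections import Counter


def train_multinomial_nb(docs):
    """Estimate NB parameters from labelled documents.

    docs: list of (tokens, label) pairs.
    Returns (priors, cond_prob, vocab) where
      priors[c]       = N_c / N
      cond_prob[c][t] = (T_ct + 1) / (sum_t' T_ct' + |V|)   # Laplace (add-1)
    """
    vocab = {t for tokens, _ in docs for t in tokens}
    n_docs = len(docs)                                           # N
    priors, cond_prob = {}, {}
    for c in sorted({label for _, label in docs}):
        class_docs = [tokens for tokens, label in docs if label == c]
        priors[c] = len(class_docs) / n_docs                     # N_c / N
        counts = Counter(t for tokens in class_docs for t in tokens)  # T_ct
        total = sum(counts.values())                             # sum over t' of T_ct'
        cond_prob[c] = {t: (counts[t] + 1) / (total + len(vocab)) for t in vocab}
    return priors, cond_prob, vocab


# Hypothetical toy training data
docs = [("good fun".split(), "pos"), ("bad boring".split(), "neg"), (["fun"], "pos")]
priors, cond_prob, vocab = train_multinomial_nb(docs)
print(priors)
print(cond_prob["pos"]["fun"], cond_prob["neg"]["fun"])
```

Because of add-1 smoothing, a word never seen in a class (here "fun" in class "neg") still gets a small nonzero probability, and each class's probabilities still sum to 1 over $V$.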

Example
S: I love this fun film.
Steps:
Assigning each word: $P(\text{word} \mid c)$
Assigning each sentence: $P(s \mid c) = \prod_{\text{word} \in s} P(\text{word} \mid c)$

Which class assigns the higher probability to s?

Example (Contd.)
S: I love this fun film.

Word    Model Positive    Model Negative
I       0.1               0.2
love    0.1               0.001
this    0.01              0.01
fun     0.05              0.005
film    0.1               0.1

$P(s \mid \text{pos}) = 0.1 \times 0.1 \times 0.01 \times 0.05 \times 0.1 = 5 \times 10^{-7}$
$P(s \mid \text{neg}) = 0.2 \times 0.001 \times 0.01 \times 0.005 \times 0.1 = 1 \times 10^{-9}$
$P(s \mid \text{pos}) > P(s \mid \text{neg})$, so the positive model assigns the higher probability to s.
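The comparison can be reproduced with a short Python sketch; the two dictionaries hold the per-word probabilities from the table above, and `sentence_likelihood` is a hypothetical helper implementing $P(s \mid c) = \prod P(\text{word} \mid c)$.

```python
import math

# Per-word probabilities from the two models in the table above
positive = {"i": 0.1, "love": 0.1, "this": 0.01, "fun": 0.05, "film": 0.1}
negative = {"i": 0.2, "love": 0.001, "this": 0.01, "fun": 0.005, "film": 0.1}


def sentence_likelihood(words, model):
    """P(s | c) = product over the words of P(word | c)."""
    return math.prod(model[w] for w in words)


s = "I love this fun film".lower().split()
p_pos = sentence_likelihood(s, positive)
p_neg = sentence_likelihood(s, negative)
print(f"P(s|pos) = {p_pos:.1e}, P(s|neg) = {p_neg:.1e}")
print("positive" if p_pos > p_neg else "negative")
```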

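Example
                 Doc   Words                                  Class
Training         1     Chinese Beijing Chinese                c
Documents        2     Chinese Chinese Shanghai               c
                 3     Chinese Macao                          c
                 4     Tokyo Japan Chinese                    j
Test Document    5     Chinese Chinese Chinese Tokyo Japan    ?

Formulas
$\hat{P}(c) = \frac{N_c}{N}$
$\hat{P}(t \mid c) = \frac{T_{ct} + 1}{\left(\sum_{t' \in V} T_{ct'}\right) + |V|}$

Priors
$\hat{P}(c) = \frac{3}{4}, \qquad \hat{P}(j) = \frac{1}{4}$

Conditional Probabilities ($|V| = 6$; class c has 8 tokens, class j has 3 tokens)
$\hat{P}(\text{Chinese} \mid c) = \frac{5+1}{8+6} = \frac{6}{14} = \frac{3}{7} \qquad \hat{P}(\text{Chinese} \mid j) = \frac{1+1}{3+6} = \frac{2}{9}$
$\hat{P}(\text{Beijing} \mid c) = \frac{1+1}{8+6} = \frac{2}{14} = \frac{1}{7} \qquad \hat{P}(\text{Beijing} \mid j) = \frac{0+1}{3+6} = \frac{1}{9}$
$\hat{P}(\text{Shanghai} \mid c) = \frac{1+1}{8+6} = \frac{2}{14} = \frac{1}{7} \qquad \hat{P}(\text{Shanghai} \mid j) = \frac{0+1}{3+6} = \frac{1}{9}$
$\hat{P}(\text{Macao} \mid c) = \frac{1+1}{8+6} = \frac{2}{14} = \frac{1}{7} \qquad \hat{P}(\text{Macao} \mid j) = \frac{0+1}{3+6} = \frac{1}{9}$
$\hat{P}(\text{Tokyo} \mid c) = \frac{0+1}{8+6} = \frac{1}{14} \qquad \hat{P}(\text{Tokyo} \mid j) = \frac{1+1}{3+6} = \frac{2}{9}$
$\hat{P}(\text{Japan} \mid c) = \frac{0+1}{8+6} = \frac{1}{14} \qquad \hat{P}(\text{Japan} \mid j) = \frac{1+1}{3+6} = \frac{2}{9}$

Choosing a Class
$\hat{P}(c \mid d_5) \propto \frac{3}{4} \times \left(\frac{3}{7}\right)^3 \times \frac{1}{14} \times \frac{1}{14} \approx 0.0003$
$\hat{P}(j \mid d_5) \propto \frac{1}{4} \times \left(\frac{2}{9}\right)^3 \times \frac{2}{9} \times \frac{2}{9} \approx 0.0001$
Hence $d_5$ is assigned to class c.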
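The arithmetic above can be checked with a short, self-contained Python sketch that applies the same add-1 estimates to the test document; the variable names are illustrative.

```python
import math
from collections import Counter

train = [
    ("Chinese Beijing Chinese", "c"),
    ("Chinese Chinese Shanghai", "c"),
    ("Chinese Macao", "c"),
    ("Tokyo Japan Chinese", "j"),
]
test_doc = "Chinese Chinese Chinese Tokyo Japan"

docs = [(text.split(), label) for text, label in train]
vocab = {w for words, _ in docs for w in words}          # |V| = 6
classes = sorted({label for _, label in docs})

scores = {}
for c in classes:
    class_words = [w for words, label in docs if label == c for w in words]
    prior = sum(1 for _, label in docs if label == c) / len(docs)
    counts = Counter(class_words)
    denom = len(class_words) + len(vocab)
    # Unnormalised posterior: P(c) * prod_k P(t_k | c), with add-1 smoothing
    scores[c] = prior * math.prod((counts[w] + 1) / denom for w in test_doc.split())

for c, score in scores.items():
    print(f"P({c}|d5) is proportional to {score:.4f}")
print("chosen class:", max(scores, key=scores.get))
```

The three occurrences of "Chinese" outweigh the two j-indicative terms "Tokyo" and "Japan", which is why class c wins despite the test document containing both.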

Algorithm

Performance Evaluation
Definitions of some terms
TP: A true positive (TP) decision assigns two similar documents to the same class.
TN: A true negative (TN) decision assigns two dissimilar documents to different classes.
FP: A false positive (FP) decision assigns two dissimilar documents to the same class.
FN: A false negative (FN) decision assigns two similar documents to different classes.

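Performance Evaluation
Accuracy (A)
$A = \frac{TP + TN}{TP + FP + FN + TN}$

Precision (P)
$P = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{retrieved}\}|} = \frac{TP}{TP + FP}$

Recall (R)
$R = \frac{|\{\text{relevant}\} \cap \{\text{retrieved}\}|}{|\{\text{relevant}\}|} = \frac{TP}{TP + FN}$

F-measure (F)
$F = \frac{2PR}{P + R}$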
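A minimal Python sketch of these four measures, computed from the TP, TN, FP and FN counts defined earlier. Returning 0.0 when a denominator is zero is an assumed convention, `evaluate` is a hypothetical helper name, and the counts in the usage line are made up for illustration.

```python
def evaluate(tp, tn, fp, fn):
    """Accuracy, precision, recall and F-measure from decision counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F is the harmonic mean of P and R (0.0 if both are zero, by assumption)
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return accuracy, precision, recall, f_measure


# Hypothetical counts, for illustration only
print(evaluate(tp=40, tn=45, fp=10, fn=5))
```

Because F is a harmonic mean, it stays close to the smaller of P and R, so a classifier cannot score well by maximising only one of them.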

Exercise
                 Doc   Words                                   Class
Training         1     India Eden India Wicket                 Cricket
Documents        2     India India Sachin                      Cricket
                 3     Sachin India Eden                       Cricket
                 4     Japan Mesi India                        Football
Test Document    5     India Sachin India Japan Eden Wicket    ?

Compute the conditional probability of each unique word, and then compute the class of doc 5.

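Hint (training and test documents as in the Exercise)

Formulas
$\hat{P}(c) = \frac{N_c}{N}$
$\hat{P}(t \mid c) = \frac{T_{ct} + 1}{\left(\sum_{t' \in V} T_{ct'}\right) + |V|}$

Priors
$\hat{P}(\text{Cricket}) = ?, \qquad \hat{P}(\text{Football}) = ?$

Conditional Probabilities
$\hat{P}(\text{India} \mid \text{Cricket}) = ? \qquad \hat{P}(\text{India} \mid \text{Football}) = ?$
$\hat{P}(\text{Eden} \mid \text{Cricket}) = ? \qquad \hat{P}(\text{Eden} \mid \text{Football}) = ?$
$\hat{P}(\text{Wicket} \mid \text{Cricket}) = ? \qquad \hat{P}(\text{Wicket} \mid \text{Football}) = ?$
$\hat{P}(\text{Sachin} \mid \text{Cricket}) = ? \qquad \hat{P}(\text{Sachin} \mid \text{Football}) = ?$
$\hat{P}(\text{Japan} \mid \text{Cricket}) = ? \qquad \hat{P}(\text{Japan} \mid \text{Football}) = ?$
$\hat{P}(\text{Mesi} \mid \text{Cricket}) = ? \qquad \hat{P}(\text{Mesi} \mid \text{Football}) = ?$

Choosing a Class
$\hat{P}(\text{Cricket} \mid d_5) \propto ?$
$\hat{P}(\text{Football} \mid d_5) \propto ?$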
