
2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT)

Recursive Deep Learning for Sentiment Analysis over Social Data
Changliang Li, Bo Xu, Gaowei Wu, Saike He, Guanhua Tian, Hongwei Hao
Institute of Automation, Chinese Academy of Sciences
95 Zhongguancun East Road, 100190, Beijing, China
{changliang.li; xubo; gaowei.wu;saike.he;guanhua.tian;hongwei.hao}@ia.ac.cn

Abstract: Sentiment analysis has become a popular research problem in the NLP field. However, very little research has been conducted on sentiment analysis for Chinese. Progress is held back by the lack of a large labelled corpus and of powerful models. To remedy this deficiency, we build a Chinese Sentiment Treebank over social data. It contains 13550 labeled sentences drawn from movie reviews. Furthermore, we introduce a novel Recursive Neural Deep Model (RNDM) to predict sentiment labels based on recursive deep learning. We consider the problem of classifying one sentence by its overall sentiment, determining whether a review is positive or negative. On predicting sentiment labels at the sentence level, our model outperforms commonly used baselines, such as Naïve Bayes, Maximum Entropy and SVM, by a large margin.

Keywords: Sentiment analysis; Chinese Sentiment Treebank; recursive deep learning

I. INTRODUCTION

Sentiment analysis is the field of study that analyzes people's opinions, sentiments, evaluations, appraisals, attitudes, and emotions towards entities such as products, services, organizations, individuals, issues, events, topics, and their attributes [1].

With the development of Web 2.0 and the enormous growth of social data on the Internet, sentiment analysis has become a popular research problem. In contrast to Web sites where people are limited to the passive viewing of content, Web 2.0 allows users to easily express their views and opinions on social networking sites, such as Twitter and Facebook.

The opinion information users leave behind is of great value. For example, by collecting movie reviews, film companies can gather feedback from their customers to further improve their products. Customers can decide which movie is worth watching based on fellow customers' movie reviews. Hence, in recent years, sentiment analysis has become a popular topic for many research communities, including artificial intelligence and natural language processing.

There has been a large amount of prior research in sentiment analysis, especially in the domain of product reviews, movie reviews, and blogs. The work of [2] focused on manually constructing several lexica and rules for both polar words and related content-word negators, such as "prevent cancer", where "prevent" reverses the negative polarity of "cancer". The work of [3] introduced an approach based on CRFs with hidden variables, with very good performance. The RNTN was employed to predict sentiment labels at the phrase and full-sentence level [4].

However, there is relatively little work on Chinese sentiment analysis. Progress in Chinese sentiment analysis is held back by the lack of a large labelled corpus and of powerful models. To address this problem, we introduce a Chinese Sentiment Treebank and a powerful recursive deep model that can accurately predict sentence-level sentiment labels on this new corpus. The corpus is based on movie reviews from social networks, and it is the first Chinese sentiment corpus with labeled parse trees. We introduce a novel recursive neural deep model. Our model represents a sentence through word vectors and a parse tree and then computes vectors for higher nodes in the tree using the same composition function.

Sentence-level sentiment analysis looks at individual sentences and determines whether each sentence expresses a positive, negative, or neutral opinion. A neutral opinion usually means no opinion. In this paper, we consider the problem of classifying one sentence by its overall sentiment, determining whether a review is positive or negative.

In combination with the Chinese Sentiment Treebank, our recursive deep model is robust in predicting sentiment label distributions. We compare against several baselines, such as Naïve Bayes (NB), Maximum Entropy and SVM. Our model achieves the best performance. Meanwhile, based on the Chinese Sentiment Treebank, all baseline models also obtain high performance.

The rest of the paper is organized as follows: In Section 2, we introduce related work on word representations, recursive deep learning and sentiment analysis. Section 3 introduces our Chinese Sentiment Treebank. Section 4 describes our recursive neural deep model. Section 5 presents the experiments on predicting Chinese sentiment labels. The conclusion and future work are presented in Section 6.

II. RELATED WORK

This work is mainly connected to three areas of NLP research: word representations, recursive deep learning and sentiment analysis.

A. Word Representations

A word representation is a mathematical object associated with each word, often a vector. Each dimension's value corresponds to a feature and might even have a semantic or grammatical interpretation, so we call it a word feature [5].

An effective approach to word representation is to learn word embeddings. Each dimension of the embedding vector represents a latent feature of the word, hopefully capturing useful syntactic and semantic properties [5]. Many approaches have been proposed to learn high-quality word embeddings, and several embedding datasets are publicly available for evaluation, such as SENNA's embeddings [6], Turian's embeddings [5], HLBL's embeddings [7], and Huang's embeddings [8].

Training complexity is a persistent difficulty when training a variety of language models. The distributed Skip-gram and continuous Bag-of-Words (CBOW) models were recently proposed to address this problem. These models learn word representations using a simple neural network architecture that aims to predict the neighbors of a word.

Due to their simplicity, the Skip-gram and CBOW models can be trained on a large amount of text data; a parallelized implementation can learn a model from billions of words in hours [9]. The word2vec tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words.
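As a concrete illustration of this step, the short sketch below trains skip-gram embeddings on word-segmented review text with the gensim implementation of word2vec. The paper does not say which implementation or hyperparameters were used, so gensim, the input file name and all parameter values here are assumptions made only for the example.

from gensim.models import Word2Vec

# One segmented review per line, tokens separated by spaces (hypothetical file).
with open("segmented_reviews.txt", encoding="utf-8") as f:
    sentences = [line.split() for line in f]

# sg=1 selects skip-gram (sg=0 would select CBOW); vector_size=50 matches the
# 50-dimensional vectors used in the experiments of Section V.
model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=1, workers=4)
vector = model.wv["电影"]   # embedding of one vocabulary word ("movie"), illustrative lookup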
B. Recursive Deep Learning

Recursive neural networks (RNNs) are able to process structured inputs by repeatedly applying the same neural network at each node of a directed acyclic graph (DAG) [10]. The recursive use of parameters is the main difference from standard neural networks. The inputs to all these replicated feed-forward networks are either given by using the children's labels to look up the associated representation, or by their previously computed representation.

The matrix-vector RNN (MV-RNN) represents a word as both a continuous vector and a matrix of parameters. It assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases [11].

The RNTN aims to build greater interactions between the input vectors [4]. It takes as input phrases of any length and utilizes the same tensor-based composition function for all nodes. RNTN has been successfully applied to predicting phrase- and sentence-level sentiment labels.
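The minimal numpy sketch below shows the mechanism these models share, namely one network applied recursively at every node of a binary parse tree. It uses the plain composition p = tanh(W[left; right]) rather than the matrix-vector or tensor variants, and all shapes and values are illustrative.

import numpy as np

d = 50                                         # dimensionality of word/phrase vectors
rng = np.random.default_rng(0)
W = rng.uniform(-0.01, 0.01, (d, 2 * d))       # composition weights shared by all nodes
embeddings = {w: rng.uniform(-0.01, 0.01, d)   # toy word vectors
              for w in ("the", "best", "movie")}

def encode(tree):
    """tree is a word (leaf) or a pair (left_subtree, right_subtree)."""
    if isinstance(tree, str):
        return embeddings[tree]                # leaf: look up its representation
    left, right = tree
    children = np.concatenate([encode(left), encode(right)])
    return np.tanh(W @ children)               # same network applied at every node

root = encode(("the", ("best", "movie")))      # vector for the whole phrase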
positive; 4 means very positive. In order to avoid collecting too
C. Sentiment Analysis many reviews from one movie, the number of collected
Sentiment Analysis touches every aspect of NLP, e.g., reviews from one movie is set to be less than 150. We ignored
coreference resolution, negation handling, and word sense the neutral classes and filtered the movie reviews with
disambiguation, which add more difficulties since these are not sentiment label 2. If one sentences sentiment label value  >
solved problems in NLP [1]. 2, then it will be classified into positive (+). If  < 2, then it
will be classified into negative ().
However, it is also useful to realize that sentiment analysis
is a highly restricted NLP problem because the system does not Table 1 shows some examples. S-Num in Table 1 means
need to fully understand the semantics of each sentence or the number of sentences in each class.
document but only needs to understand some aspects of it, i.e.,

III. CHINESE SENTIMENT TREEBANK

There is little publicly available Chinese sentiment corpus. To address this problem, we build the first Chinese corpus with labeled parse trees that allows for a complete sentiment analysis. We call this corpus the Chinese Sentiment Treebank.

We collected reviews of 2270 movies from http://movie.douban.com/, a popular movie social website with a large number of movie reviews. We then filtered out the movie reviews that met any of the following criteria:

• too many typos;
• special symbols;
• rude language;
• no label;
• more than one sentence;
• multiple languages;
• too long, for example more than 30 words in one movie review;
• too short, for example fewer than 5 words in one movie review.

After filtering out the unqualified movie reviews, we used the Chinese word segmentation tool ICTCLAS to segment all the movie reviews.

Each sentence was labeled into 5 classes by human annotators: 0 means very negative; 1 means negative; 2 means neutral; 3 means positive; 4 means very positive. In order to avoid collecting too many reviews from one movie, the number of collected reviews per movie was limited to fewer than 150. We ignored the neutral class and filtered out the movie reviews with sentiment label 2. If a sentence's sentiment label value is greater than 2, it is classified as positive (+); if it is less than 2, it is classified as negative (−).

Table 1 shows some examples. S-Num in Table 1 means the number of sentences in each class.
TABLE I. EXAMPLES OF MOVIE REVIEWS AFTER FILTERING AND WORD SEGMENTATION
(English translations of the original Chinese reviews are shown in parentheses.)

Sentiment    | Label | S-Num | Example
Positive (+) | 4     | 4976  | (a good story worth reading)
             |       |       | (Does it need a reason for recommending this story?)
             |       |       | (It always brings happiness to people.)
             |       |       | (It is a really good movie, deserving your tears and smiles.)
Positive (+) | 3     | 6463  | (When I was a little child, I saw a good movie, which is full of wonderful fantasy.)
             |       |       | (The picture of this Korean movie is really beautiful.)
Negative (−) | 1     | 1364  | (The movie lacks a real story about everyday life.)
             |       |       | (It is too boring to watch through.)
             |       |       | (The female leading role of this movie is not suitable for me at all.)
             |       |       | (It is too bad to be called a movie.)
Negative (−) | 0     | 747   | (It is a piece of junk!)
             |       |       | (The story of this movie is really boring.)
Finally, we employed the Stanford parser to parse all sentences [21]. Note that all punctuation marks are also treated as words. As a result, we obtained the Chinese Sentiment Treebank. Fig. 1 shows two examples from the Chinese Sentiment Treebank: the top one shows a positive movie review, while the bottom one shows a negative movie review.

Fig. 1. Examples in the Chinese Sentiment Treebank.

The Chinese Sentiment Treebank contains 14964 Chinese words and 13550 sentences. It allows us to better predict sentence-level sentiment labels with various machine learning frameworks, and we can train and test various models on the new corpus.

Combined with the recursive deep model proposed in this work, it yields good performance on the sentiment label prediction task. The details are described in the next section.
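To make the construction rules above concrete, the following sketch expresses the filtering and label binarization in code. The review dictionary layout and the unimplemented placeholder checks are assumptions; the paper does not describe how criteria such as typos, special symbols or rude language were detected.

def keep_review(review):
    """review: dict with 'tokens' (segmented words) and 'label' (0-4 or None)."""
    if review["label"] is None:                   # reviews without labels are dropped
        return False
    if review["label"] == 2:                      # neutral reviews are dropped
        return False
    if not (5 <= len(review["tokens"]) <= 30):    # too short or too long
        return False
    # Placeholder checks for the remaining criteria (typos, special symbols,
    # rude language, multiple sentences, multiple languages) would go here.
    return True

def binarize(label):
    """Map the 5-point label to the binary task: > 2 positive, < 2 negative."""
    return "+" if label > 2 else "-"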
IV. RECURSIVE DEEP LEARNING

In this section, we first introduce the Recursive Neural Deep Model (RNDM), and then describe the details of parameter learning.

A. Recursive Neural Deep Model

RNDM can compute compositional vector representations for phrases of any length and for full sentences. These representations are then used as features to predict the sentiment label. Fig. 2 shows the structure of RNDM.

For ease of exposition, we use a Chinese tri-gram (corresponding to "the best movie" in English) to explain our model.

Fig. 2. RNDM for predicting sentiment label. (The figure shows a binary tree over the word vectors a, b and c, with parent vectors p1 = g(b, c) and p2 = g(a, p1), the shared transformation parameters W, and a sentiment classification layer Ws that predicts the label at each node.)

In our model, each word is represented as a d-dimensional vector. All the word vectors are stacked in the word embedding matrix L ∈ R^(d×|V|), where |V| is the size of the vocabulary. The word vectors can be seen as parameters that are trained jointly with the model.

When an n-gram is given to the model, it is parsed into a binary tree and each leaf node, corresponding to a word, is represented as a vector. In Fig. 2, nodes a, b and c are the word vector representations of the three words.

The sentiment label vector is C-dimensional. Ws ∈ R^(C×d) is the sentiment classification matrix; W ∈ R^(d×2d) is the transformation matrix. In our model, we omit the bias terms for simplicity.

RNDM uses a tensor-based compositionality function [22]. Its main property is that it can directly relate input vectors. In our example, the vector of p1 in Fig. 2, the parent node of b and c, is computed through formula (1).

p1 = f([b; c]^T V^[1:d] [b; c] + W [b; c])    (1)

V^[1:d] ∈ R^(2d×2d×d) represents the tensor that defines multiple bilinear forms. [b; c] denotes the concatenation of the two column vectors, resulting in a 2d-dimensional vector. W ∈ R^(d×2d) is the transformation matrix and also the main parameter to learn. f = tanh is a standard element-wise nonlinearity.
After computing the first two nodes, the network is shifted by one position, takes the next input vectors and again computes a potential parent node. The next parent vector p2 in Fig. 2 is computed as formula (2).

p2 = f([a; p1]^T V^[1:d] [a; p1] + W [a; p1])    (2)

Note that the parent vectors must be of the same dimensionality as the word vectors to be recursively compatible and to be usable as input to the next composition.
We then employ the obtained vectors as inputs to a softmax classifier. For classification into C classes, we compute the posterior over labels given the vector at a node. For example, we compute the sentiment label of the phrase at node p1 in Fig. 2 as described in formula (3).

y^p1 = softmax(Ws p1)    (3)

where Ws ∈ R^(C×d) is the sentiment classification matrix.
B. Backprop Through RNDM

In this sub-section, we give a brief introduction to the way we train our model. Given a sentence, we aim to maximize the probability of the correct prediction, i.e., to minimize the cross-entropy error between the predicted sentiment label y^i and the target sentiment label t^i.

Let θ = (V, W, Ws, L) be our model parameters and λ a vector of regularization hyperparameters for all model parameters. The error as a function of the RNDM parameters over the corpus is given by formula (4).

E(θ) = −Σ_{i=1}^{N} Σ_j t_j^i log y_j^i + λ ||θ||^2    (4)

Here i denotes the i-th sentence in the corpus and N is the total number of sentences in the corpus.

The derivative for the weights of the sentiment classification matrix Ws at sentiment node p2 is computed as formula (5).

∂E/∂Ws = ((y^p2 − t^p2) ⊙ f'(x^p2)) (p2)^T    (5)

⊙ is the Hadamard product between two vectors, x^p2 is the input to the nonlinearity at node p2, and f' is the element-wise derivative of f, which in the standard case of f = tanh can be computed as f'(x) = 1 − f^2(x).

We define the full incoming error for a node i as δ^(i,full). For the node p2, the received error is computed as formula (6).

δ^(p2,full) = (Ws^T (y^p2 − t^p2)) ⊙ f'(x^p2)    (6)

Then we compute the error passed down to the two children of p2 by formula (7).

δ^(p2,down) = (W^T δ^(p2,full) + S) ⊙ f'([a; p1])    (7)

We define S as formula (8).

S = Σ_{k=1}^{d} δ_k^(p2,full) (V^[k] + (V^[k])^T) [a; p1]    (8)

p1, the right child of p2, then takes half of this vector to compute its full incoming error, as described in formula (9).

δ^(p1,full) = δ^(p2,down)[d+1 : 2d]    (9)

δ^(p2,down)[d+1 : 2d] means that p1 is the right child of p2 and hence takes the second half of the error; for the left child of p2, the derivative for a is δ^(p2,down)[1 : d].

The full derivative for W is the sum of the derivatives at each node. In our example, we use formula (10) to compute the full derivative for W.

∂E(θ)/∂W = δ^(p2,full) [a; p1]^T + δ^(p1,full) [b; c]^T    (10)

Similarly, we use formula (11) to compute the full derivative for each tensor slice V^[k].

∂E/∂V^[k] = δ_k^(p2,full) [a; p1][a; p1]^T + δ_k^(p1,full) [b; c][b; c]^T    (11)

V^[k] ∈ R^(2d×2d) is the k-th slice of the tensor; δ_k^(p2,full) and δ_k^(p1,full) are the k-th elements of δ^(p2,full) and δ^(p1,full), respectively. More details about learning V can be found in [4]. For the optimization, we use the AdaGrad algorithm to find the optimal solution [23].
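The paper names AdaGrad [23] but does not spell out the update, so the sketch below shows the standard per-parameter AdaGrad step; the learning rate and epsilon values are illustrative.

import numpy as np

def adagrad_step(theta, grad, hist, lr=0.01, eps=1e-8):
    """One AdaGrad update: theta <- theta - lr * grad / sqrt(hist + eps)."""
    hist += grad ** 2                            # accumulated squared gradients
    theta -= lr * grad / np.sqrt(hist + eps)
    return theta, hist

# Usage: keep one accumulator of the same shape as each parameter (V, W, Ws and
# the embedding matrix L) and call adagrad_step with the gradients obtained
# from formulas (5)-(11).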
V. EXPERIMENTS

We represent words using 50-dimensional word vectors and initialize all word vectors by randomly sampling each value from a uniform distribution U(−r, r), where r = 0.0001.

The sentences in the Chinese Sentiment Treebank were split into three sets: a training set (10627 sentences), a validation set (665 sentences) and a test set (2258 sentences).

We use the validation set and cross-validation to tune the regularization, the learning rate, the weight initialization and the minibatch size for the AdaGrad algorithm.
We compare against three commonly used baselines: Naïve Bayes (NB), the Maximum Entropy model (ME) and the support vector machine (SVM). We describe the three baselines briefly in the following sub-sections.
A. Naïve Bayes

Naïve Bayes is a simple model based on a conditional independence assumption. Given a feature vector, the algorithm computes the posterior probability that the sentence s belongs to a label c. We use a multinomial Naïve Bayes model as described in formula (12).

P_NB(c|s) = P(c) Π_{i=1}^{m} P(f_i|c)^{n_i(s)} / P(s)    (12)

In this formula, f_i represents a feature and n_i(s) represents the count of feature f_i found in movie review s. There are a total of m features. The parameters P(c) and P(f|c) are obtained through maximum likelihood estimation, and a smoothing algorithm is utilized for unseen features.

We assign a movie review to the sentiment label with the highest posterior probability, as described in formula (13).

c* = arg max_c P_NB(c|s)    (13)
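For illustration, the sketch below implements the multinomial Naïve Bayes classifier of formulas (12) and (13) with add-one smoothing over token-count features; the feature choice and smoothing constant are assumptions, since the paper does not specify them.

from collections import Counter, defaultdict
import math

def train_nb(reviews, labels):
    """reviews: list of token lists; labels: list of '+'/'-' tags."""
    label_count = Counter(labels)
    feat_count = defaultdict(Counter)             # per-label feature counts
    vocab = set()
    for tokens, y in zip(reviews, labels):
        feat_count[y].update(tokens)
        vocab.update(tokens)
    return label_count, feat_count, vocab

def predict_nb(tokens, label_count, feat_count, vocab):
    """Pick the label with the highest posterior (formula (13)), in log space."""
    total = sum(label_count.values())
    best, best_score = None, float("-inf")
    for y, ny in label_count.items():
        denom = sum(feat_count[y].values()) + len(vocab)
        score = math.log(ny / total)
        for f in tokens:                          # log P(f|c) with add-one smoothing
            score += math.log((feat_count[y][f] + 1) / denom)
        if score > best_score:
            best, best_score = y, score
    return best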
B. Maximum Entropy

Maximum Entropy models are feature-based models that make no independence assumptions about their features, so we can add features such as bigrams and phrases without worrying about overlapping features. The model is represented by formula (14).

P_ME(c|s, λ) = exp(Σ_i λ_i f_i(c, s)) / Σ_{c'} exp(Σ_i λ_i f_i(c', s))    (14)

In this formula, c is the sentiment label, s is the movie review, and λ is a weight vector. The weights decide the significance of a feature in classification: a higher weight means that the feature is a strong indicator of the sentiment label.

C. SVM

Support vector machines (SVMs) are supervised learning models with associated learning algorithms that analyze data and recognize patterns, used for classification and regression analysis. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. Multiple variants of SVM have been developed.

Here we employ a linear SVM due to its popularity and high performance. The optimization of the SVM (in its dual form) is to maximize formula (15)

L(α) = Σ_{i=1}^{n} α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j x_i^T x_j    (15)

subject to, for any i = 1, 2, ..., n, the constraints of formula (16).

α_i ≥ 0,    Σ_{i=1}^{n} α_i y_i = 0    (16)

Each x_i is a real-valued feature vector and y_i is the sentiment label to which x_i belongs.
D. Binary Sentiment

This setup is similar to other relevant work on English corpora (the original Rotten Tomatoes dataset), which used full sentence labels and binary classification into positive/negative.

We report the overall accuracy of predicting the sentiment labels of the movie reviews in the test set. RNDM obtains an accuracy of 90.8%, compared to NB (78.65%), ME (87.46%) and SVM (84.9%), as shown in Fig. 3.

Fig. 3. Accuracy for predictions at sentence level.

From Fig. 3, we can see that RNDM achieves the highest performance in predicting positive/negative sentiment, followed by ME, SVM and NB. In combination with our Chinese Sentiment Treebank, even the baselines NB and ME, despite their simplicity, achieve high accuracy in predicting sentiment labels.

The result highlights the fact that, in combination with the Chinese Sentiment Treebank, our model is reliable in predicting the sentiment labels of sentences.

E. Contrastive Conjunction

In this sub-section, we use a subset of the test set consisting of sentences with a contrastive conjunction, like "X but Y" in English. The conjunction is interpreted as an argument for the second conjunct, with the first conjunct functioning concessively.

The contrastive conjunction structure is a difficult problem in sentence-level sentiment analysis, because the first part of the sentence often expresses a different sentiment from the second part. We select the sentences with the structure "X but Y", where the conjunction is one of the common Chinese contrastive conjunctions.

The total number of sentences with the contrastive conjunction structure in the test set is 60. Table 2 lists some movie reviews with this structure.

For the resulting 60 cases with the contrastive conjunction structure, RNDM obtains the highest accuracy of 95%, compared to NB (85%), ME (85%) and SVM (86.7%), as shown in Fig. 4.

From the results, we can see that RNDM outperforms all three baselines by a large margin. Both NB and ME achieve the same accuracy of 85%, and SVM obtains an accuracy of 86.7%. The result shows that our recursive deep model is more reliable in predicting the sentiment label of sentences with a contrastive conjunction structure.
TABLE II. EXAMPLES OF MOVIE REVIEWS WITH CONTRASTIVE CONJUNCTION STRUCTURE
(English translations of the original Chinese reviews are shown in parentheses.)

Sentiment    | Label | Movie reviews with contrastive conjunction structure
Negative (−) | 0     | (Wu Zhenyu has good acting skill, but he is not suitable for a role in the comedy.)
             |       | (It is really exquisite, but not interesting.)
Negative (−) | 1     | (It has good picture, but with an arrogant and disgusting taste.)
             |       | (It is full of suspended story, but it is really uninteresting.)
             |       | (I can play the game, but I am dizzied when watching it.)
             |       | (It has some interesting parts, but the whole story has no principal line.)
Positive (+) | 3     | (At first, I am not interested in it, but in the final part, I am really touched by it.)
             |       | (The story of it is simple, but the whole movie is fairly good.)
Positive (+) | 4     | (Although it lacks a self-help story, the movie has plentiful stories, like the movie Forrest Gump.)
             |       | (At the beginning, the movie is terrifying for me, but it turns out to be a touching and warm story at the end.)
Fig. 4. Accuracy for predicting sentences with contrastive conjunction structure.
VI. CONCLUSION

In this work, we focus on sentence-level sentiment analysis for Chinese. We first introduce the Chinese Sentiment Treebank, based on movie reviews from social websites, and then introduce RNDM to predict the sentiment labels of movie reviews at the sentence level. In combination with the Chinese Sentiment Treebank, even the baselines obtain good performance. However, RNDM achieves the highest accuracy in predicting binary sentiment labels at the sentence level. In predicting the sentiment labels of sentences with a contrastive conjunction structure, RNDM outperforms the baselines by a large margin.
VII. ACKNOWLEDGMENT

This work is supported by the National Program on Key Basic Research Project (973 Program) under Grant 2013CB329302 and the National Natural Science Foundation of China under Grants No. 61175050, No. 61203281 and No. 61303172.

REFERENCES

[1] B. Liu. Sentiment analysis and opinion mining. Synthesis Lectures on Human Language Technologies, Morgan & Claypool Publishers, 2012.
[2] Y. Choi and C. Cardie. 2008. Learning with compositional semantics as structural inference for subsentential sentiment analysis. In EMNLP.
[3] T. Nakagawa, K. Inui, and S. Kurohashi. 2010. Dependency tree-based sentiment classification using CRFs with hidden variables. In NAACL HLT.
[4] R. Socher, A. Perelygin, J. Y. Wu, J. Chuang, C. D. Manning, A. Y. Ng, and C. Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In EMNLP.
[5] J. Turian, L. Ratinov, and Y. Bengio. 2010. Word representations: a simple and general method for semi-supervised learning. In Annual Meeting of the Association for Computational Linguistics.
[6] R. Collobert. 2011. Deep learning for efficient discriminative parsing. In International Conference on Artificial Intelligence and Statistics.
[7] A. Mnih and G. E. Hinton. 2009. A scalable hierarchical distributed language model. In NIPS, pp. 1081-1088.
[8] E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng. 2012. Improving word representations via global context and multiple word prototypes. In Annual Meeting of the Association for Computational Linguistics.
[9] T. Mikolov, K. Chen, G. Corrado, and J. Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.
[10] R. Socher, C. D. Manning, and A. Y. Ng. 2010. Learning continuous phrase representations and syntactic parsing with recursive neural networks. In Proceedings of the NIPS-2010 Deep Learning and Unsupervised Feature Learning Workshop.
[11] R. Socher, B. Huval, C. D. Manning, and A. Y. Ng. 2012. Semantic compositionality through recursive matrix-vector spaces. In EMNLP.
[12] B. Pang and L. Lee. 2004. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In ACL.
[13] H. Cui, V. Mittal, and M. Datar. 2006. Comparative experiments on sentiment classification for online product reviews. In Proceedings of the Twenty-First National Conference on Artificial Intelligence.
[14] J. Blitzer, M. Dredze, and F. Pereira. 2007. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In Annual Meeting of the Association for Computational Linguistics.
[15] P. D. Turney. 2002. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics.
[16] B. Liu. 2010. Sentiment analysis and subjectivity. In Handbook of Natural Language Processing, Second Edition.
[17] Q. Mei, X. Ling, M. Wondra, H. Su, and C. Zhai. 2007. Topic sentiment mixture: modeling facets and opinions in weblogs. In Proceedings of the 16th International Conference on World Wide Web, pp. 171-180.
[18] I. Titov and R. McDonald. 2008. Modeling online reviews with multi-grain topic models. In WWW '08: Proceedings of the 17th International Conference on World Wide Web, pp. 111-120, New York, NY, USA.
[19] C. Lin and Y. He. 2009. Joint sentiment/topic model for sentiment analysis. In the 18th ACM Conference on Information and Knowledge Management.
[20] S. Tan and J. Zhang. 2008. An empirical study of sentiment analysis for Chinese documents. Expert Systems with Applications, 34(4), pp. 2622-2629.
[21] D. Klein and C. D. Manning. 2003. Accurate unlexicalized parsing. In ACL.
[22] J. Mitchell and M. Lapata. 2010. Composition in distributional models of semantics. Cognitive Science, 34(8):1388-1429.
[23] J. Duchi, E. Hazan, and Y. Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 12, July.

