
Sentiment Analysis of Polarity in Product Reviews in Social Media

Hafsa Dar
Dept. of Software Engineering
University of Gujrat
Gujrat, Pakistan
hafsa.dar@uog.edu.pk

Ikram Ullah Lali
Dept. of Computer Science
University of Gujrat
Gujrat, Pakistan
ikramullah@uog.edu.pk

Marium Nafees
Dept. of Computer Science
University of Sargodha
Sargodha, Pakistan
mariumnafees15@gmail.com

Salman Tiwana
Dept. of Computer Science
University of Sargodha
Sargodha, Pakistan
Salmantiwana@ymail.edu.pk

Abstract— Sentiment analysis is the area of natural language processing (NLP) concerned with identifying the mood or opinion expressed within a text. This paper focuses on the methods used to classify natural-language text reviews according to the opinions they express, in order to determine whether the overall attitude is negative, positive or neutral. The abundance of discussion platforms, weblogs, product review sites, e-commerce sites and social networking sites has encouraged a stream of thoughts and the articulation of opinions. Social media is considered a major platform for sentiments, reviews and opinion evaluation. The data used in this study are online product reviews collected from Twitter, which are used to rank classifiers for sentiment analysis. Polarity classification is examined experimentally using well-known classifiers, namely Naïve Bayes, Support Vector Machine and Logistic Regression, to predict the polarity of user reviews.

Keywords— Sentiment analysis; Sentiment polarity classification; Product reviews; Weka; Machine learning

I. INTRODUCTION

Media convergence has extended the use of the web and given rise to the online networking known as social media. Social media gives individuals a universal platform for sharing their news, perspectives and opinions about the events around them, and its presence and importance in society are growing. Social sites enable users to communicate with others in the network by sharing ideas, graphics, status updates, posts and items. Social media has become one of the largest platforms for expressing user opinions [1] and is considered a gold mine for understanding public sentiment. Social media is defined as a group of Internet-based applications that build on the ideological and technological foundations of Web 2.0 and allow users to create and exchange content [2].

Blogs and discussions on Facebook, Twitter, LinkedIn and other social networking channels have given consumers another way to communicate their opinions about products and services, and these opinions may influence other potential purchasers.

Measuring customers' attitudes towards items or services on social sites is described as sentiment analysis [3]. Sentiment analysis combines NLP with artificial intelligence to evaluate text posts on different social platforms in order to determine the polarity of reviews of a specific product or service. Sentiment analysis can be performed at three main levels: 1) sentence level, 2) concept level and 3) document level. The methodologies used to examine sentiments fall into three main categories: lexicon-based, machine learning and hybrid approaches.

Text and emoticons are used to express individual feelings, which are shared on the Internet. The ideas people share may reflect opinions of varying strength about a particular topic, communicating positive, negative or even neutral feelings [4]. The ideas posted on social sites and weblogs are therefore considered helpful for understanding the opinions people share about service providers, in support of product improvement and maintenance [5]. In order to recognize customers' emotions, the large volume of data collected from social media must be processed properly, and sentiment analysis answers this need. Sentiment analysis is a broad area of NLP and text mining. Its goal is to analyze the polarity of natural-language text [6]. Microblogging services such as Twitter provide analysts with an abundance of data on how people communicate on social networks. The common method of communication is
microblog posts such as tweets, which frequently convey their authors' ideas and emotional states [7].

Twitter is a popular social networking and microblogging service that lets its users post messages of up to 140 characters, called "tweets". The service launched in July 2006 [8]. Its popularity grew rapidly: by 2012 it had 140 million active users generating 340 million tweets a day, along with 1.6 billion search queries being handled. Tweets are the basic atomic building block of all things Twitter and are also known as "status updates". Tweets can be embedded, replied to, deleted, liked and disliked [9].

Twitter data has become very popular as a data source in research and in applications across different domains because of its use by a huge community. Market analysis is one area where Twitter data is widely used to help generate business. Researchers have used Twitter to predict the success of online services and products and to identify potential customers who follow accounts to obtain product-related information [10]. Data mining research has produced various tools, techniques and algorithms for handling large data to solve real-world problems. The objectives of data mining are to deal effectively with large-scale data, discover patterns and gain insight [11]. Big data is the term used to describe the exponential growth and availability of structured and unstructured information. It helps in determining business trends and the direction of research. Working with big data is complex because the data comes from multiple sources. In this research, large data is organized by considering big data applications and dependencies.

Social media is growing rapidly, which is why examining social data plays a vital role in analyzing customer behavior. We therefore analyzed Twitter data using sentiment analysis to find the polarity of customer reviews of selected products. Figure 1 illustrates how the collected social media dataset is processed and analyzed. The products of five well-known companies were selected for data collection: 1. Unilever 2. Procter & Gamble 3. Samsung 4. GlaxoSmithKline and 5. Mobilink.

Figure 1 Social media process analysis

In this research, a document-level approach is used for sentiment analysis. In the proposed work, data is collected from the social website Twitter and sentiment analysis of product reviews is performed on it. The focus of the work is to find the polarity of the selected data by applying sentiment analysis. Two main steps are applied: 1) text classification and 2) emoticon classification. The sentiment of product reviews is analyzed using well-trained machine learning classifiers: Naïve Bayes (NB), Support Vector Machine (SVM) and Logistic Regression (LR). The results show that, using hybrid information from text and emoticons, reviews are classified as positive, negative or neutral.

In summary, the contribution of this work is four-fold:

1. Data collection using the Twitter API
2. Analysis of the text and emoticon classification of tweet data
3. Finding the polarity distribution by applying sentiment analysis
4. Finding the accuracy of the respective analysis

II. SENTIMENT ANALYSIS

"What people think" has always been an important piece of information for everyone, especially in decision-making. Sentiment analysis uses natural language processing to identify the polarity of data. It is also known as opinion mining because it analyses the opinion or attitude of a user. Its aim is to analyse what users think about particular topics. This helps to determine the overall contextual polarity of a document and the sentiments at different levels: it categorizes the whole document and also determines the polarity of the user's words or expressions within the document. Many companies use sentiment analysis to monitor the standing of their products and services in general.

Sentiments can be described as emotions, opinions or ideas prompted or coloured by emotions [12]. Sentiment analysis is a core task of NLP that examines an individual's sentiments, attitudes or feelings towards certain entities. It is the practice of using a machine to determine the polarity of an opinion, whether positive, negative or neutral [13]. Sentiment analysis is used to analyze the polarity of data and assess the positive and negative content of the input text. Figure 2 describes how this analysis helps to analyze human ideas.
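The idea of letting a machine decide whether a text is positive, negative or neutral can be illustrated with a minimal sketch. It is not the method used in this paper: the word lists, the emoticon lists and the function name `polarity` are hypothetical and chosen only for illustration. It simply counts positive and negative cues from both text and emoticons.

```python
# Hypothetical cue lists for illustration only; a real lexicon would be far larger.
POSITIVE_WORDS = {"good", "great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible"}
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}


def polarity(text: str) -> str:
    """Return 'positive', 'negative' or 'neutral' from word and emoticon cues."""
    score = 0
    for raw in text.split():
        # Emoticons are matched on the raw token so that ':D' is not lowercased.
        if raw in POSITIVE_EMOTICONS:
            score += 1
        elif raw in NEGATIVE_EMOTICONS:
            score -= 1
        else:
            # Words are matched after stripping punctuation and lowercasing.
            word = raw.strip(".,!?;:\"'").lower()
            if word in POSITIVE_WORDS:
                score += 1
            elif word in NEGATIVE_WORDS:
                score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity("Love the new phone :)"))
print(polarity("great screen, bad battery"))
```

A text with no cues, or with cues that cancel out, is treated as neutral in this sketch; other tie-breaking rules are equally possible.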


Figure 2 Sentiment analysis measurement areas

Machine learning is used to analyze sentiments. Its solutions include supervised learning, which uses a labelled training data set; unsupervised learning, which works without a labelled training data set; and semi-supervised learning, which uses a mix of labelled and unlabelled data [14].

III. RELATED WORK

The classification of online reviews is very important for the advancement of e-commerce and social media applications. The aim of sentiment analysis is to determine people's attitude towards a particular product. The two main tasks involved in mining customer reviews are 1. classification of the data feature set and 2. sentiment analysis of customer reviews based on the classification of the data set.

The authors of [15] applied three ML methods, NB, SVM and maximum entropy classification, to data from a movie rating dataset classified into three categories: positive, negative and neutral. A standard bag-of-features framework was used, and SVM was found to perform best compared with the others [15]. A novel ML strategy was applied by Pang & Lee in experiments on movie reviews classified as positive or negative. To gather subjective sentences or expressions, 5000 movie reviews were collected from www.rottentomatoes.com and from the Internet Movie Database (imdb.com). Naïve Bayes and SVM classifiers were applied after a basic subjectivity detector. The authors report improvements for NB (86.4% versus 85.2%) and for SVM (86.15% versus 85.45%) [16]. Asur & Huberman presented a model that estimates the expected revenue of a movie by evaluating social media data [17].

A lexicon-based structure was used for sentiment analysis of a movie review dataset, yielding a precision of 59.5% [18]. A lexicon-based analysis was performed by [19] on 2080 tweets containing emoticons. Polarity accuracy with and without emoticons was reported as ranging from 22% to 94%, while sentence-level accuracy was 59%. Rhetorical Structure Theory (RST) was used by [20] to examine the framework of a why-type Question Answering System (QAS) on product review data. The MPQA Lexicon and SentiWordNet were found to be efficient.

A key issue in sentiment analysis and polarity classification was addressed by [1]. Data was collected from amazon.com, a well-known site, and contained reviews of four classes of products: beauty, books, electronics and home. The reviews were written by 3.2 million customers about 20,062 product items. Every review comprises 1) reviewer ID, 2) product ID, 3) rating, 4) review time, 5) helpfulness and 6) text. SVM and NB models were introduced for sentence-level and review-level analysis. An approach to polarity determination using emoticon graphs was presented by [22], implemented in C++ with Qt Creator. It was concluded that emoticon graphs are preferable to conventional graphs because they make polarity measurement easy.

IV. METHODOLOGY

Twitter is one of the biggest social networking services (SNS) and microblogging services, and lets its users post messages of up to 140 characters, called "tweets". The service launched in July 2006 [8]. Tweets can be embedded, deleted, replied to, liked and disliked [9]. Twitter is one of the most popular and widely used platforms because of the limited length of its tweets. Machine learning is one of the most capable methodologies for polarity identification in sentiment analysis. For sentiment analysis, text-level sentence classification is carried out through the following steps: pre-processing, labelling, classification and tokenization.

The proposed methodology consists of three stages: 1) data collection with labelling, 2) data mining and 3) sentiment analysis. Figure 4 illustrates the proposed methodology, with all stages described. Text and emoticons are used to analyse the effectiveness of the results. Data mining strategies are used to classify the results. The Weka data mining tool is used for the analysis of results. It is a well-known workbench that gives researchers simple access to state-of-the-art methods in ML [22]. In this research work, Support Vector Machine (SVM), Naïve Bayes (NB) and Logistic Regression (LR) classifiers are used for the analysis of results.

A. Data Collection

The collected data contains tweets about five well-known companies from different categories, selected for this research: 1) Unilever, 2) Samsung, 3) Procter and Gamble (P&G), 4) Mobilink and 5) GlaxoSmithKline (GSK). Table I lists the selected products with the corresponding number of tweets collected [23]. Data was collected from January to April 2017 using the Twitter streaming API and Twitter search. This API delivers tweets matching a specific user request, along with data about the author, continuously in real time; the author's permission is not required to collect tweets [27].
TABLE I: TOTAL TWEETS COLLECTED FROM SELECTED PRODUCTS

Products                    No. of tweets
Samsung                     16k
Unilever                    11.8k
Procter & Gamble (P&G)      2.6k
Mobilink                    6.7k
GlaxoSmithKline (GSK)       5.4k

B. Data Labelling and Pre-processing

WEKA is open-source software released under the GNU General Public License that provides an interface to many built-in functions. In ML and data mining, Weka is regarded as a landmark system [24]. Figure 3 shows the preprocessing module, in which data is prepared for training and testing. Besides pre-processing, Weka includes association rules, classification, clustering, regression and visualization [28].

Data is labelled by applying a hybrid approach based on text and emoticons and is categorized as 1 for positive, 0 for negative and 2 for neutral to distinguish the polarity. Table II gives the total number of tweets for the selected products, their polarity classification (positive, negative and neutral) and the number of attributes.

TABLE II: INFORMATION ABOUT TWEETS WITH POLARITY AND ATTRIBUTES

                       Polarity Classification
Products    Tweets    Positive    Negative    Neutral    No. of Attributes
Samsung     16087     6779        1524        7784       1806
Unilever    11804     8723        1399        1682       2245
P&G         2632      1651        110         871        3320
Mobilink    6671      3220        513         2938       2635
GSK         5428      3115        316         1997       2794

C. Learning Algorithms

All algorithms have their own advantages and disadvantages. The algorithms selected for this research are SVM, NB and LR. Naïve Bayes is widely used for sentiment analysis because of its efficiency. It assumes that the predictors are independent of one another and is useful for large datasets. The text is assigned to the class with the highest probability.

For text classification, SVM is considered the best algorithm because it is a robust and systematic learning algorithm when the feature space is large. This classifier has low interpretability because of its dot product calculation and normalization [25] [29]. LR comes from the field of statistics and uses a probability function for binary and multi-class classification. In this research, we used it for polarity classification. Three evaluation measures (Precision, Recall and F-measure) are used to evaluate the performance of SVM, NB and LR.
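As a minimal illustration of the Naïve Bayes rule described in this subsection (independent predictors, text assigned to the class with the highest probability), the sketch below implements a small multinomial Naïve Bayes with add-one smoothing in plain Python. It is not Weka's implementation and not the experimental setup of this paper: the function names, the smoothing choice and the toy tweets are assumptions made for illustration. Only the label encoding (1 positive, 0 negative, 2 neutral) is taken from the text.

```python
import math
from collections import Counter, defaultdict

# Toy training data (hypothetical); labels follow the paper's encoding:
# 1 = positive, 0 = negative, 2 = neutral.
TOY_DOCS = [
    ("love this phone".split(), 1),
    ("great battery love it".split(), 1),
    ("hate this phone".split(), 0),
    ("terrible battery hate it".split(), 0),
    ("phone launch today".split(), 2),
    ("battery sale starts today".split(), 2),
]


def train_nb(docs):
    """Collect per-class document counts, per-class word counts and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log-probability for the given tokens."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # unseen words carry no evidence in this sketch
            # Add-one (Laplace) smoothing avoids zero probabilities.
            likelihood = (word_counts[label][tok] + 1) / (total_words + len(vocab))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_nb(TOY_DOCS)
print(predict_nb(model, "love the battery".split()))
```

Working in log space keeps the product of many small probabilities numerically stable, which matters once the vocabulary grows to thousands of attributes as in Table II.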

Precision (P) is the number of true positives (Tp) divided by the number of true positives plus the number of false positives (Fp). Precision expresses the exactness of a classifier: of all the records retrieved, it is the fraction that is relevant, so it depends on the number of relevant records retrieved and the number of irrelevant records retrieved.
Figure 3 Data preprocessing module

Figure 4 Proposed methodology flowchart


Recall (R) is the number of true positives (Tp) divided by the number of true positives plus the number of false negatives (Fn). Equivalently, it is the fraction of all relevant records that are retrieved, so it depends on the number of relevant records retrieved and the number of relevant records not retrieved. Precision and recall tend to be inversely related [26]. The F-measure is the harmonic mean of recall and precision. When the data distribution is unbalanced, precision and recall are better metrics than others for measuring classifier performance [30].
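Written out with the symbols used above, the three measures take the following standard form. The symbol F for the F-measure is introduced here for convenience; the balanced harmonic mean is assumed, since the text does not mention a weighting.

```latex
P = \frac{T_p}{T_p + F_p}, \qquad
R = \frac{T_p}{T_p + F_n}, \qquad
F = \frac{2 \, P \, R}{P + R}
```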

In extreme cases, when recall approaches 100% but precision is low, the F-measure is considered the best evaluation metric. Table III reports the average results of the three evaluation measures for the SVM, NB and LR classifiers. These results were computed on the preprocessed dataset after applying each classifier separately.
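The extreme case mentioned above can be made concrete with a short sketch. The counts are hypothetical: a classifier that labels all 1000 tweets as positive when only 100 really are positive reaches 100% recall but only 10% precision. The function names are chosen for illustration.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision = Tp / (Tp + Fp), recall = Tp / (Tp + Fn)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: everything is predicted positive.
p, r = precision_recall(tp=100, fp=900, fn=0)
f = f_measure(p, r)
arithmetic_mean = (p + r) / 2

print(f"precision={p:.2f} recall={r:.2f} F={f:.3f} mean={arithmetic_mean:.2f}")
```

The harmonic mean stays close to the weaker of the two values (about 0.18 here), whereas the arithmetic mean (0.55) would make the same classifier look acceptable.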
TABLE III: AVERAGE OF THREE EVALUATION MEASURES FOR SVM, NB & LR

                      Evaluation Measures
Classifiers    Precision    Recall    F-measure
SVM            73.34%       72.2%     72.76%
NB             55.4%        56.8%     56%
LR             66.72%       63%       64.80%

V. EXPERIMENTAL SETUP AND DISCUSSION

Different tools can be used to perform sentiment analysis. WEKA [25], an open-source tool, was chosen for dataset classification analysis in this research. It aims to give researchers a complete collection of machine learning algorithms and data pre-processing tools, and it allows users to quickly try and compare different ML methods on their data.

We had a training corpus of 42,622 reviews, all manually annotated with polarity. For training and testing, the data file is converted into the Attribute-Relation File Format (ARFF) because Weka accepts files in this format. Data is classified into three polarity classes according to the nature of the tweets: 1 for positive, 0 for negative and 2 for neutral. In Weka, pre-processing tools are known as filters. For pre-processing, the data is converted into word vectors with the unsupervised StringToWordVector filter so that the proposed algorithms can be applied to determine sentiment polarity.

After pre-processing, experiments are performed with the following classifiers: 1) Support Vector Machine (SVM), 2) Naïve Bayes (NB) and 3) Logistic Regression (LR). Figure 5 shows a graphical representation of the results of all applied algorithms. The resulting data help to find the best algorithm for sentiment polarity classification.

Figure 5 Collective result of all classifiers

The final results give the overall accuracy of all applied algorithms. Although all the classifiers are suitable for sentiment analysis, SVM is the most efficient approach and achieves the highest accuracy for polarity distribution. Figure 6 shows the cumulative accuracy results to make clear which algorithm gives the best accuracy: SVM, with an accuracy of 71.8%, is shown in blue; NB, with 56.4%, in orange; and LR, with 62.4%, in grey.

Figure 6 Cumulative analysis result

VI. CONCLUSION

In this article, tweets collected from product reviews are analyzed and modelled with two approaches: 1) text-based and 2) emoticon-based. The well-known classifiers SVM, NB and LR are trained in WEKA to measure their effectiveness on the data. Tweets are categorized to determine the polarity of the data. After applying the different classifiers to the review data,
SVM is considered efficient and the best choice because it achieves the highest accuracy.

In future, we plan to analyze reviews together with images from different social networking data and to compare them. Data from different social sites would be a useful means of achieving higher accuracy by applying more competitive algorithms.

REFERENCES

1. Rekha, K. S. (2014). Opinion Mining and Classification of User Reviews in Social Media. International Journal, 2.

2. Kaplan, A. M., & Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of Social Media. Business Horizons, 53, 59-68.

3. Arora, D., Li, K. F., & Neville, S. W. (2015). Consumer's sentiment analysis of popular phone brands and operating system preference using Twitter data: A feasibility study. Advanced Information Networking and Applications (AINA), 2015 IEEE 29th International Conference on, (pp. 680-686).

4. Hogenboom, A., Bal, D., Frasincar, F., Bal, M., de Jong, F., & Kaymak, U. (2013). Exploiting emoticons in sentiment analysis. Proceedings of the 28th Annual ACM Symposium on Applied Computing, (pp. 703-710).

6. Guimarães, R., Rodríguez, D. Z., Rosa, R. L., & Bressan, G. (2016). Recommendation system using sentiment analysis considering the polarity of the adverb. Consumer Electronics (ISCE), 2016 IEEE International Symposium on, (pp. 71-72).

7. Roberts, K., Roach, M. A., Johnson, J., Guthrie, J., & Harabagiu, S. M. (2012). EmpaTweet: Annotating and Detecting Emotions on Twitter. LREC, 12, (pp. 3806-3813).

8. Pak, A., & Paroubek, P. (2010). Twitter as a Corpus for Sentiment Analysis and Opinion Mining. LREC, 10.

9. Duncan, B., & Zhang, Y. (2015). Neural networks for sentiment analysis on twitter. Cognitive Informatics & Cognitive Computing (ICCI*CC), 2015 IEEE 14th International Conference on, (pp. 275-278).

10. Lavanya, T., Miraclin Joyce Pamila, J. C., & Veningston, K. (2016). Online review analytics using word alignment model on Twitter data. Advanced Computing and Communication Systems (ICACCS), 2016 3rd International Conference on, 1, (pp. 1-6).

11. Jeyapriya, A., & Selvi, C. S. (2015). Extracting aspects and mining opinions in product reviews using supervised learning algorithm. Electronics and Communication Systems (ICECS), 2015 2nd International Conference on, (pp. 548-552).

12. Bhardwaj, N., Shukla, A., & Swarnakar, P. (2014). Users' Sentiment Analysis in Social Media Context using Natural Language Processing. The International Conference on Digital Information, Networking, and Wireless Communications (DINWC), (pp. 103).

13. Mukherjee, S., & Bhattacharyya, P. (2012). Feature specific sentiment analysis for product reviews. Computational Linguistics and Intelligent Text Processing, 475-487.

14. Ahmed, K., El Tazi, N., & Hossny, A. H. (2015). Sentiment Analysis over Social Networks: An Overview. Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on, (pp. 2174-2179).

15. Das, T. K., Acharjya, D. P., & Patra, M. R. (2014). Opinion mining about a product by analyzing public tweets in Twitter. Computer Communication and Informatics (ICCCI), 2014 International Conference on, (pp. 1-4).

16. Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up?: sentiment classification using machine learning techniques. Proceedings of the ACL Conference on Empirical Methods in Natural Language Processing, Volume 10, (pp. 79-86).

17. Turney, P. D. (2002). Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews. Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, (pp. 417-424).

18. Pang, B., & Lee, L. (2004). A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, (pp. 271).

19. Lakkaraju, H., Bhattacharyya, C., Bhattacharya, I., & Merugu, S. (2011). Exploiting coherence for the simultaneous discovery of latent facets and associated sentiments. Proceedings of the 2011 SIAM International Conference on Data Mining, (pp. 498-509).

20. Mostafa, M. M. (2013). More than words: Social networks' text mining for consumer brand sentiments. Expert Systems with Applications, 40, 4241-4251.

21. Mishra, A., & Jain, S. K. (2014). An Approach for Computing Sentiment Polarity Analysis of Complex Why-type Questions on Product Review Sites. Research in Computing Science, 84, (pp. 65-76).

22. Lovins, J. B. (1968). Development of a stemming algorithm. Mech. Translat. & Comp. Linguistics, 11, (pp. 22-31).

23. Tao, K., Abel, F., Hauff, C., & Houben, G.-J. (2012). What makes a tweet relevant for a topic?, (pp. 49-56).

24. Kherwa, P., Sachdeva, A., Mahajan, D., Pande, N., & Singh, P. K. (2014). An approach towards comprehensive sentimental data analysis and opinion mining. Advance Computing Conference (IACC), 2014 IEEE International, (pp. 606-612).

25. Holmes, G., Donkin, A., & Witten, I. H. (1994). Weka: A machine learning workbench. Intelligent Information Systems, 1994. Proceedings of the 1994 Second Australian and New Zealand Conference on, (pp. 357-361).

26. Bravo-Marquez, F., Frank, E., Mohammad, S. M., & Pfahringer, B. (2016). Determining Word-Emotion Associations from Tweets by Multi-label Classification. Web Intelligence (WI), 2016 IEEE/WIC/ACM International Conference on, (pp. 536-539).

27. M. Saqib, M. Bilal, M. Ikram et al. (2017). Effectiveness of Social Media Data in Healthcare Communication. J. Med. Imaging Health Inf., Vol. 7, issue 6, pp. 1365-1371.

28. Basit S., M. Ikram et al. (2017). Discovery and Classification of User Interests on Social Media. Information Discovery and Delivery, vol. 45.

29. Raza M., Saqib N., Basit S., Javed F., et al. (2017). Early Detection of Controversial Urdu Speeches from Social Media. Data Science and Pattern Recognition, Vol. 1 (2), pp. 26-42.

30. M. Salman, Farooq J., Ikram Ullah, et al. (2018). Comparative Analysis of Context based Classification of Twitter. 4th IEEE Int. Conference on Advances in Computing, Communication and Automation, Malaysia.
