
SENTIMENT ANALYSIS OF APPLICATION REVIEWS ON GOOGLE PLAYSTORE
Saarim Momin, BE(IT); Sangeeta Panigrahi, BE(IT); Pooja Patil, BE(IT); Parag Tetambe, BE(IT)
Department of Information Technology, Padmabhushan Vasantdada Patil Pratishthan's College of
Engineering, Mumbai University, Mumbai, India.

Abstract-Posting reviews online has become an increasingly popular way for
people to express opinions and sentiments towards the products they have
bought or the services they have received. Analyzing the large volume of
online reviews would produce useful, actionable knowledge that could be of
economic value to vendors and other interested parties. Sentiment analysis
refers to the use of natural language processing, text analysis and
computational linguistics to identify and extract subjective information in
source materials. The increasing use of smartphones has led to the increasing
use of a vast variety of applications. The developers of these applications
need to keep them up to date in order to keep them in the top lists.

Keywords-Social media, Learning Approaches, sentiment analysis

I. INTRODUCTION
With the explosive growth of social media (i.e., reviews, forum discussions,
blogs and social networks) on the Web, individuals and organizations are
increasingly using public opinions in these media for their decision making.
However, finding and monitoring opinion sites on the Web and distilling the
information contained in them remains a formidable task because of the
proliferation of diverse sites.
The earliest stage of the project will comprise the extraction of user
reviews for analysis. The user reviews consist of text feedback. Sentiment
analysis is then carried out on them, producing a score for each text review.
The evaluation and estimation process will then help the developer to study
the possible areas of improvement and expertise, which will help the
developer to create a better application for the users in future. High-level
features and information can be extracted on the basis of the information
generated, and the sentiment scores are evaluated. The dataset collected from
all these sites can be used effectively and efficiently for marketing and
social networking. For example, an application developer may be interested in
the following questions:
• What do people think about our product (service, company etc.)?
• How positive (or negative) are people about our product?
• What would people prefer our product to be like?
The need to collect opinions for many applications, and to draw conclusions
about what people like or dislike, has become an important aspect in today's
scenario. The objective of this paper is to discuss the concept of sentiment
analysis of application reviews on Google Playstore and to perform a
comparative study of its various techniques.

II. LITERATURE REVIEW
The methods that have been used over time to help developers find the
sentiment of opinions, and the resulting sentiment scores, vary with various
factors. The types of methods we came across are:
1. Lexicon based techniques.
2. Machine learning based techniques.
Lexicon based techniques use a dictionary to perform entity-level sentiment
analysis. This technique uses dictionaries of words annotated with their
semantic orientation (polarity and strength) and calculates a score for the
polarity of the document. Usually this method gives high precision but low
recall.

Learning based techniques require creating a model by training the classifier
with labeled examples. This means that you must first gather a dataset with
examples for the positive, negative and neutral classes, extract the
features/words from the examples and then train the algorithm based on the
examples.
Choosing which method to use depends heavily on the application, domain and
language. Using lexicon based techniques with large dictionaries enables us to
achieve very good results. Nevertheless, they require using a lexicon,
something which is not always available in all languages. On the other hand,
learning based techniques deliver good results; nevertheless, they require
obtaining datasets and require training.

Fig 2.1: Approaches of sentiment analysis

Machine learning tasks are typically classified into three broad categories,
depending on the nature of the learning "signal" or "feedback" available to a
learning system. These include:

Supervised learning: The computer is presented with example inputs and their
desired outputs, given by a "teacher", and the goal is to learn a general rule
that maps inputs to outputs.

Unsupervised learning: No labels are given to the learning algorithm, leaving
it on its own to find structure in its input. Unsupervised learning can be a
goal in itself (discovering hidden patterns in data) or a means towards an
end.

Sentiment lexicon - Sentiment words or phrases (also called polar words,
opinion bearing words, etc.). E.g.,
Positive: beautiful, wonderful, good, amazing.
Negative: bad, poor, terrible, cost an arm and a leg.
Many of them are context dependent, not just application domain dependent.
There are two main ways to compile such lists:

Corpus-based approaches: Often use a double propagation between opinion words
and the items they modify; they require a large corpus to get good coverage.

Dictionary-based methods: Typically use WordNet's synsets and hierarchies to
acquire opinion words, and usually do not give domain or context dependent
meanings.

III. PROPOSED SYSTEM

Fig 3.1: Overview of this approach

As the above diagram explains, the earliest stage of the project will comprise
the extraction of user reviews for analysis. The user reviews consist of text
feedback. Sentiment analysis is then carried out on them, producing a score
for each text review.

The evaluation and estimation process will then help the developer to study
the possible areas of improvement and expertise, which will help the
developer to create a better application for the users in future. High-level
features and information can be extracted on the basis of the information
generated, and the sentiment scores are evaluated.

3.2 PROCESSING STEPS
There are standard methods involved in the above techniques. Those are as
follows:

3.2.1 Data Collection
The raw data (user reviews) is collected from Google Playstore for analysis.

3.2.2 Preprocessing
Data in the form of raw comments is acquired by using a Python library which
provides a package for simple processing through an application interface. A
comment acquired by this method has a lot of raw information in it which we
may or may not find
useful for our particular application. It comes in the form of the Python
"dictionary" data type with various key-value pairs. Since there is a lot of
information, we filter out only the information that we need and discard the
rest.

3.2.3 Classification models
For the purpose of classification we will classify the comments into three
types:

• Positive: If the entire review has a positive/happy/excited/joyful attitude
or if something is mentioned with positive connotations. Also if more than one
sentiment is expressed in the comment but the positive sentiment is more
dominant. Example: "It is a great application and works perfectly fine on my
phone!!".

• Negative: If the entire comment has a negative/sad/displeased attitude or if
something is mentioned with negative connotations. Also if more than one
sentiment is expressed in the comment but the negative sentiment is more
dominant. Example: "I did not like the application as it provides too many
tabs and lags too much on my phone".

• Neutral/Objective: If the creator of the review expresses no personal
sentiment/opinion in the comment and merely transmits information. Example:
"I recently downloaded this application on my new phone".

3.2.4 Feature Extraction

• Tokenization: It is the process of breaking a stream of text up into words,
symbols and other meaningful elements called "tokens". Tokens can be separated
by whitespace characters and/or punctuation characters. It is done so that we
can look at tokens as the individual components that make up a comment.

• Sentence splitting: Splits a sequence of tokens into sentences.

• Parts-of-Speech Tagging: POS tagging is the process of assigning a tag to
each word in the sentence indicating which grammatical part of speech that
word belongs to, i.e. noun, verb, adjective, adverb, coordinating conjunction
etc.

• Named Entity Recognition (NER): Recognizes named (PERSON, LOCATION,
ORGANIZATION, MISC) and numerical (MONEY, NUMBER, DATE, TIME, DURATION, SET)
entities. With the default annotators, named entities are recognized using a
combination of CRF sequence taggers trained on various corpora. A simple,
rule-based NER over token sequences, building on Java regular expressions, is
also implemented. The goal of this annotator is to provide a simple framework
that allows a user to incorporate NE labels that are not annotated in
traditional NL corpora. For example, a default list of regular expressions
distributed in the models file recognizes ideologies (IDEOLOGY), nationalities
(NATIONALITY), religions (RELIGION), and titles (TITLE).

• Stop-words removal: Stop words are a class of extremely common words which
hold no additional information when used in a text and are thus considered to
be useless. Examples include "a", "an", "the", "he", "she", "by", "on", etc.
It is sometimes convenient to remove these words because they are used almost
equally in all classes of text, for example when computing the prior sentiment
polarity of words in a comment according to their frequency of occurrence in
different classes and using this polarity to calculate the average sentiment
of the comment over the set of words used in that comment.

IV. COMPARATIVE STUDY
The comparative analysis will focus on comparing various important aspects of
the implementation. Comparing the various parameters is crucial for
understanding the potential limitations, advantages, and disadvantages of
popular methods in analyzing the content.

This comparison needs to be done in order to have an analyzer for various
types of emotions, sentiments, attitudes, opinions, feelings and affects that
provides the maximum amount of accuracy.

This will also help us to study the drawbacks and shortcomings of various
algorithms, while at the same time helping us to see which algorithms and
techniques will provide excellent results. The dataset used will also play an
important role in deciding the accuracy level and the total calculation of the
final results.

The following table will give us a detailed idea of the various implementation
techniques, which will help us to decide on a particular analysis technique
that gives the most accurate results and provides the correct sentiments.
Fig.4.1: Comparative Analysis

V. FUTURE SCOPE
The subsequent phases will comprise the development of the application
interface, along with added functionality for more detailed and fine-grained
results for application developers, making it easier for the developer to
utilize all the features of the system.

VI. CONCLUSION
This paper discussed an approach for extracting app features mentioned in user
reviews and their associated sentiments. These can help app analysts and
developers to analyze and quantify users' opinions about single app features
and to use this information, e.g., for identifying new requirements or
planning future releases.
A number of works have been done on informal reviews and blogs. We look
forward to working on the application world and helping to provide users with
more refined and critic free applications on Google Playstore. As the user
reviews of apps vary from category to category, the proposed procedure is
efficient from that point of view. The result is sufficient for judging an
Android app, and the developers are also able to predict the problems and the
improvements needed in the app for its popularity in less time. Sentiment
analysis therefore offers a major advantage in the mobile environment for
analyzing the reviews of users of Google apps.
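
As an illustration of the preprocessing and feature extraction steps of Sections 3.2.2 and 3.2.4, the following is a minimal sketch using only the Python standard library. The record layout (the field names `userName`, `score` and `content`), the stop-word list and all function names are assumptions made for this example; they are not the schema or API of any particular library, and the regular-expression tokenizer is a simple stand-in for a full NLP toolkit.

```python
import re

# Hypothetical raw record: the field names are assumptions made for
# illustration, not the schema of any particular library.
raw_comment = {
    "userName": "user123",
    "score": 4,
    "content": "It is a great application. It works perfectly fine on my phone!!",
}

# Small illustrative stop-word list (examples from Section 3.2.4 plus a few more).
STOP_WORDS = {"a", "an", "the", "he", "she", "by", "on", "it", "is", "my"}
SENTENCE_END = {".", "!", "?"}


def keep_fields(record, wanted=("content",)):
    """Keep only the key-value pairs needed for analysis; discard the rest."""
    return {key: record[key] for key in wanted if key in record}


def tokenize(text):
    """Break text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def split_sentences(tokens):
    """Group a token sequence into sentences at '.', '!' or '?'."""
    sentences, current = [], []
    for tok in tokens:
        if tok in SENTENCE_END and not current and sentences:
            # Repeated terminator such as "!!": attach it to the previous sentence.
            sentences[-1].append(tok)
            continue
        current.append(tok)
        if tok in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list (case-insensitive)."""
    return [tok for tok in tokens if tok.lower() not in STOP_WORDS]


text = keep_fields(raw_comment)["content"]
for sentence in split_sentences(tokenize(text)):
    print(remove_stop_words(sentence))
```

For the sample record this prints `['great', 'application', '.']` followed by `['works', 'perfectly', 'fine', 'phone', '!', '!']`. POS tagging and named entity recognition are left out because they need trained models rather than a few lines of rules.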

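
The lexicon based technique of Section II, combined with the three classes of Section 3.2.3, can be sketched in a similar way. The lexicon below is a tiny hypothetical word list with made-up strengths, and the threshold value is likewise an assumption; a real system would use a large annotated dictionary.

```python
import re

# Hypothetical toy lexicon: word -> signed strength (illustrative values only).
LEXICON = {
    "great": 2.0, "good": 1.0, "wonderful": 2.0, "amazing": 2.0, "beautiful": 1.5,
    "bad": -1.0, "poor": -1.0, "terrible": -2.0,
}


def polarity_score(review):
    """Sum the lexicon strengths of the words found in the review."""
    words = re.findall(r"[a-z']+", review.lower())
    return sum(LEXICON.get(word, 0.0) for word in words)


def classify(review, threshold=0.5):
    """Map the polarity score to positive, negative or neutral."""
    score = polarity_score(review)
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


print(classify("It is a great application and works perfectly fine on my phone!!"))
print(classify("I recently downloaded this application on my new phone"))
print(classify("I did not like the application as it provides too many tabs "
               "and lags too much on my phone"))
```

The first two reviews are labeled positive and neutral. The third, which is the negative example from Section 3.2.3, contains no word from this toy lexicon and is therefore also labeled neutral, which illustrates why a lexicon based method tends to give high precision but low recall.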