
4th International Multi-Track Conference on Sciences, Engineering & Technical Innovations, October 5-6, 2018

Twitter Sentiment Analysis of Indian Telecom Companies for Subscriber Churn Prediction

Sandeep Ranjan¹, Sumesh Sood²


Department of Computer Science & Engineering
I K Gujral Punjab Technical University
Kapurthala, India
¹ersandeepranjan@yahoo.com, ²sumesh64@gmail.com

Abstract—Predicting customer churn is a major challenge and a matter of survival for telecom operators. In a large and competitive market like India, it is essential to gather real-time customer feedback as a health indicator. Social networks have evolved into a rich source of real-time sentiments and opinions of the general public. In this research, tweets for the Twitter handles of five major telecom brands in India, namely Aircel, Bharti Airtel, Idea Cellular, Reliance Jio and Vodafone India, were extracted over six months to develop a model that predicts telecom subscriber churn from the sentiment score. A Naïve Bayes classifier implementation and the TextBlob library of Python were used to assign polarities to user sentiments. Customer satisfaction, represented by the overall monthly sentiment score, was used to predict customer churn. The predictions made by the model were validated using IBM SPSS and were within acceptable limits. The results of the sentiment analysis based prediction model can help telecom operators take timely action to improve future customer experience and avoid customer churn.

Keywords—Customer Churn, Naive Bayes, Opinion Mining, Sentiment Analysis, Social Network.

I. INTRODUCTION

Customer churn prediction, especially in the telecom sector, has attracted a number of researchers over the last few years [1], [2]. The telecom market is becoming more competitive day by day as new products are launched to attract new customers and retain existing ones. Indian customers have recently been given the option to port out of their existing operator and shift to any other operator on grounds of dissatisfaction, service issues or any other reason. Customer churn is a concern for telecom operators because the cost of acquiring new customers is higher than the cost of retaining existing ones.

Telecom operators allocate a large share of their budget to brand promotion and to monitoring and gathering real-time customer experience feedback [3]. This information is treated as a health indicator of sales, revenues and market position. Traditional monitoring and feedback methods are costly and suffer from significant latency [4]. The compiled feedback results reach higher management with a delay of up to a couple of weeks after feedback collection starts. During this time lag, market factors and competitors' strategies can change considerably. This delay and the resulting ineffective planning can lead to customer churn and ultimately to a decrease in market share, and planning and implementing corrective actions takes still more time. Market segments like Fast Moving Consumer Goods (FMCG), the telecom sector and the movie industry are the most affected, as their customer base is very large and spread over a vast geographic area. It is difficult to monitor customer responses day in and day out in such complex segments.

With the emergence of the Web 2.0 era of the Internet, the amount of content posted by the general public has been rising exponentially. Online social media has evolved into a platform for information exchange where users generate their own content and share their pictures, opinions and videos with fellow users [5]. Social networks like Twitter have evolved into an accessible, rich and vast repository of user opinions and sentiments. Today, many service providers, including the hospitality industry, airlines, mobile operators, banks and insurance companies, use social media to develop and manage customer relations and try to reach as many customers as possible by initiating word of mouth [6]. Celebrity and brand Twitter handles attract general public sentiment and attention [7]. The sentiments mined from social media are used for feature extraction in various models to monitor, and even predict, the behavior of the network.

The research presented here used a Naïve Bayes classifier and the TextBlob library of Python to develop a model that predicts churn for Indian telecom operators in terms of monthly subscriber addition. Section II reviews related work. Section III covers dataset creation, tweet preprocessing and community detection. Section IV covers sentiment analysis, Section V presents the proposed prediction model, Section VI validates it, and Section VII concludes the paper.

II. LITERATURE REVIEW

Social networks have high potential for providing concrete, actionable insights and have become popular among researchers, academia and industry [8]. Social network data mining and sentiment analysis have become challenging fields, and the problems they address are attractive as organizations have started dealing with large volumes of complex, continuously arriving data using the latest hardware and software. Extracting reliable and actionable inputs from this data pool is a resource-consuming and challenging task. A successful business must master this art of predictive analysis from the available information to stay ahead of competitors and remain competitive in the market.

Thanks to the recent smartphone revolution and new technologies in the telecom sector, mobile phone density has reached higher levels [9]. In this scenario, subscriber churn has raised alarms for telecom operators as customers change operators for service and quality reasons. Telecom operators use various churn prediction techniques and

ISBN: 978-81-929077-8-9 | Page 317



algorithms based on metrics like network performance, service usage, pricing and local information. A recent advancement in these techniques is the application of social network sentiment analysis to identify customers who could become potential churners if corrective and timely actions are not taken. Additionally, a subscriber who churns out can have a great impact on other subscribers in his or her social circle.

Valuable knowledge is hidden in the large quantity of data stored in social media repositories, and social network mining has become a basic ingredient of successful and effective strategic marketing campaigns [10]. The authors of [10] surveyed social network data and the current techniques that deal with it. They concluded that social network sentiment analysis is capable of transforming human society and of delivering a wide range of benefits, such as higher profits from increased operating margins, higher employment rates, promising and attractive market figures, time saved through timely decision making and, finally, increased customer satisfaction.

With the growing size of social networks and their penetration into the lives of individuals, community detection has gained much interest in recent times. Community detection algorithms aim to discover striking patterns in network graphs in the form of communities. A graph can be viewed as a collection of densely connected subgraphs, and the task is to identify subgraphs whose constituent nodes are more densely interconnected among themselves than with the rest of the network. In its simplest form, the community detection problem can be described as a clustering problem [11]. The algorithm proposed in [11] first computes an edge centrality based on k-paths and then calculates the pairwise proximity of the nodes in the network.

The process of social network sentiment analysis is similar to opinion mining. As present-day social networks contain big data made up of large amounts of opinions and content, summarizing these opinions and sentiments is necessary to convert them into meaningful information [12]. Sentiment analysis research has taken a multidisciplinary form encompassing computer science, social science and many areas of management science, and it is gaining significance in academia, research, business and society as a whole. Support Vector Machines, the Naïve Bayes classifier and Artificial Neural Networks are among the most popular and efficient techniques for performing sentiment analysis on social network datasets. The work in [12] presents a method for developing a prediction model based on sentiment analysis using RapidMiner.

III. DATA MINING & COMMUNITY DETECTION

A. Data Mining

Tweepy, a popular Python library for Twitter mining, was used to fetch tweets for the telecom handles. In the research, tweets for the hashtags #aircel, #airtelindia, #ideacellular, #reliancejio and #vodafonein, corresponding to the Twitter handles of the telecom companies, were extracted on a daily basis from 1st August 2017 to 31st January 2018. Aircel India, which had been operating in all telecom circles in India for many years, started losing its subscribers. This subscriber churn resulted in the addition of new subscribers to other operators, particularly Reliance Jio, which emerged as the biggest gainer in terms of subscriber addition. After applying a filter to keep only distinct tweets, the numbers of tweets for #aircel, #airtelindia, #ideacellular, #reliancejio and #vodafonein were 18268, 36142, 26124, 42764 and 29564 respectively.

B. Tweet Preprocessing

Some preprocessing and cleaning of the text is required before sentiment analysis can be performed on it. The research selected Tweet-preprocessor, a tweet cleaning and preprocessing Python library. It has built-in functions capable of parsing and cleaning tweet datasets and supports Python 2.7 and 3.3+. Its functions support filtering, and generating tokens for, emojis and smileys, special characters and Twitter reserved words.

C. Community Detection

Detecting network communities with clustering techniques is a complicated and detailed task [13]. A large number of algorithms are available for community detection in social networks. The research presented here carried out network community structure discovery based on betweenness centrality using the Louvain method, which relies on the concept of network modularity.

Consider a graph G(V, E) composed of a set of vertices V and a set of edges E. Let G be composed of M communities, let L_s be the number of edges shared by the nodes of the s-th community, and let d_s be the sum of the degrees of the nodes of the s-th community. The network modularity of G is calculated as:

$$Q = \sum_{s=1}^{M} \left[ \frac{L_s}{|E|} - \left( \frac{d_s}{2|E|} \right)^{2} \right]$$

where |E| is the total number of edges in G.

The study used the Generalized Louvain method implemented in Python to detect communities. A subgraph with a higher betweenness centrality value represents a well-connected, influential and concentrated information region [14]. The betweenness measure has been used in railway networks, freight networks and many other real-life networks to determine bottlenecks and other problems. A node with a comparatively higher betweenness value than other nodes acts as a traffic checkpoint and can shut down or boost network traffic [15]. The communities detected using the Louvain implementation in Python were selected for further processing. Self-loops and nodes that do not form part of any online word of mouth were rejected in this phase of the experiment, as they have no influence on other network users.

IV. SENTIMENT ANALYSIS

Sentiment analysis is defined as the process of analyzing user sentiments, emotions and opinions towards real-world entities like brands, organizations, services, fellow individuals and events expressed on public platforms [16]. Opinions expressed on social networks contain user sentiments that can be classified into positive, negative and neutral categories. It is very important for brand promoters to capture the feelings of their existing and prospective customers [17].

A. Naïve Bayes Classifier

Naïve Bayes is a supervised probabilistic classifier that can learn patterns from a given set of documents [18], [19]. It evaluates the contents against a given set of words or a user-defined dictionary to assign the source documents to the correct class or category. Let d be a tweet and c a class that may be assigned to d. The classifier assigns d to the class

$$c^{*} = \arg\max_{c} P_{NB}(c \mid d)$$

where

$$P_{NB}(c \mid d) = \frac{P(c) \prod_{i=1}^{m} P(f_i \mid c)^{n_i(d)}}{P(d)}$$




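As a minimal sketch of the multinomial Naïve Bayes rule described in Section IV.A, the classifier below picks the class c that maximizes log P(c) + Σ_i n_i(d) log P(f_i | c), dropping P(d) because it is the same for every class. It is not the study's code: the tokenised training tweets and the `pos`/`neg` labels are hypothetical, and add-alpha (Laplace) smoothing is an assumption added so that a feature never seen with a class does not produce log(0). With `alpha=0` the estimates reduce to the plain maximum-likelihood ones mentioned in the text.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over bags of tokens (illustrative sketch)."""

    def __init__(self, alpha: float = 1.0):
        # alpha > 0 is add-alpha smoothing; alpha = 0 gives plain maximum-likelihood
        # estimates, which make math.log fail for unseen (feature, class) pairs.
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)         # used for P(c)
        self.feature_counts = defaultdict(Counter)  # class -> feature -> count
        for tokens, c in zip(docs, labels):
            self.feature_counts[c].update(tokens)
        self.vocab = {f for counts in self.feature_counts.values() for f in counts}
        self.n_docs = len(labels)
        return self

    def log_score(self, tokens, c):
        """log P(c) + sum_i n_i(d) * log P(f_i | c); P(d) is omitted."""
        counts = self.feature_counts[c]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        score = math.log(self.class_counts[c] / self.n_docs)
        for feature, n_i in Counter(tokens).items():
            if feature in self.vocab:  # features never seen in training are ignored
                score += n_i * math.log((counts[feature] + self.alpha) / denom)
        return score

    def predict(self, tokens):
        return max(self.class_counts, key=lambda c: self.log_score(tokens, c))


if __name__ == "__main__":
    train = [["great", "network"], ["love", "fast", "data"],
             ["worst", "service"], ["slow", "network", "worst"]]
    labels = ["pos", "pos", "neg", "neg"]
    model = MultinomialNaiveBayes().fit(train, labels)
    print(model.predict(["great", "data"]))  # pos
    print(model.predict(["worst", "slow"]))  # neg
```

Working in log space avoids numerical underflow when a tweet contains many features, which is why the product in the formula becomes a sum here.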
In the Naïve Bayes equations above, f_i is the i-th feature, n_i(d) is the count of feature f_i in d, where d represents an individual tweet, and m denotes the number of features. P(c) and P(f_i | c) are computed through maximum likelihood estimates. Python functions and libraries are used to train the Naïve Bayes classifier and to classify tweets with it.

B. TextBlob

TextBlob is a Python library built on top of the Natural Language Toolkit (NLTK). It is relatively easy to learn and implement and has a range of features covering sentiment analysis and noun-phrase extraction. The TextBlob library enables automated and convenient procedures for various Natural Language Processing tasks, and its Application Programming Interface is similar to that of other Python libraries. Tokenization breaks the phrases and sentences of a given text into tokens, which correspond to words in natural language. It is done in two steps:
• Create a TextBlob object, passing the text strings to it.
• Call various functions from the TextBlob library to perform task-specific calculations.

The most common task performed with TextBlob is sentiment analysis, for which it provides the sentiment property. This property returns a tuple of the form (polarity, subjectivity), where polarity is a floating-point value in the range [-1.0, 1.0] (-1.0 means a purely negative sentiment, 0 a neutral sentiment and 1.0 a purely positive sentiment) and subjectivity is a floating-point value in the range [0.0, 1.0] (0.0 is highly objective and 1.0 is highly subjective).

In the research presented, the polarity and subjectivity values are calculated for each tweet. Negative polarity values (less than zero) represent a negative sentiment and positive polarity values (greater than zero) represent a positive sentiment. The final polarity is calculated as the product of the polarity and subjectivity values.

Based on the final polarity value, Polarity_Final, highly positive tweets or blogs were labeled P+, slightly positive ones P, highly negative ones N+, slightly negative ones N and the remainder neutral (NEU). Weights were assigned to the tweets or blogs as shown in Table I.

TABLE I. POLARITY AND WEIGHTS

Polarity | N+ | N | NEU | P | P+
Polarity_Final range | -1.0 to -0.5 | -0.5 to -0.1 | -0.1 to 0.1 | 0.1 to 0.5 | 0.5 to 1.0
Weight | -2 | -1 | 0 | 1 | 2

TABLE II. POLARITY DISTRIBUTION FOR TELECOM HASHTAGS FOR AUGUST 2017

Hashtag | N+ | N | NEU | P | P+
#aircel | 648 | 708 | 780 | 1190 | 829
#airtelindia | 1147 | 1210 | 678 | 2329 | 2068
#ideacellular | 901 | 451 | 1327 | 1805 | 893
#reliancejio | 176 | 491 | 1258 | 4456 | 3032
#vodafonein | 1526 | 1041 | 898 | 2227 | 2784

V. PREDICTION MODEL

Using the polarity labels and their weights, the overall sentiment score of each telecom operator was calculated for each month as the weighted sum of its tweet counts per label. Table III shows the sentiment scores for August 2017, which was taken as the base month for the experiment. The month-wise predicted values for the telecom operators were compared with month-wise subscriber addition data fetched from the website of the Telecom Regulatory Authority of India (TRAI). Fig. 1 shows the prediction model developed in the research. Table IV shows the projected month-wise community sentiment scores for the telecom operators, and Table V shows the subscriber addition data obtained from TRAI at https://trai.gov.in/release-publication/reports/telecom-subscriptions-reports.
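The cleaning and distinct-tweet filtering of Sections III.A and III.B can be sketched with the standard library alone. This is a hypothetical regex stand-in, not the tweet-preprocessor library the study used (that library exposes its own entry points, such as `preprocessor.clean` and `preprocessor.tokenize`). The rule that two tweets are duplicates when their cleaned, lower-cased text is identical is also an assumption, since the paper does not define its filter.

```python
import re

# Hypothetical regex stand-in for tweet-preprocessor's cleaning step;
# the library's actual rules may differ.
RESERVED_RE = re.compile(r"^(?:RT|FAV)\b:?\s*", re.IGNORECASE)  # retweet/favourite markers
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+:?")
HASHTAG_RE = re.compile(r"#(\w+)")  # keep the word, drop the '#'
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
SMILEY_RE = re.compile(r"[:;=][-']?[)(DPp]")
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Strip reserved words, URLs, mentions, emojis and smileys; collapse spaces."""
    text = RESERVED_RE.sub("", text.strip())
    text = URL_RE.sub(" ", text)  # before hashtags, since URLs may contain '#'
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)
    text = EMOJI_RE.sub(" ", text)
    text = SMILEY_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def distinct_tweets(tweets):
    """Keep the first tweet for each distinct cleaned, lower-cased text."""
    seen, kept = set(), []
    for tweet in tweets:
        key = clean_tweet(tweet).lower()
        if key and key not in seen:
            seen.add(key)
            kept.append(tweet)
    return kept


if __name__ == "__main__":
    sample = [
        "RT @airtelindia: Great network today :) https://t.co/x",
        "Great network today",
        "great   network TODAY \U0001F600",
        "Worst service #vodafonein",
    ]
    print(distinct_tweets(sample))  # keeps the first and the last tweet
```

Deduplicating on cleaned text rather than raw text means retweets and copies that differ only in links, mentions or emojis count once, which is one plausible reading of "distinct tweets".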

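The modularity Q of Section III.C, Q = Σ_s [L_s / |E| − (d_s / 2|E|)²] for an undirected, unweighted graph, can be computed directly from an edge list and a node-to-community map. The sketch below only scores a given partition; the Louvain search that produces the partition is not shown (networkx, for example, provides `louvain_communities` and `modularity` in `networkx.algorithms.community`). The toy graph and community names are hypothetical, and self-loops are dropped first, mirroring the study's rejection of self-loops.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Q = sum_s [ L_s/|E| - (d_s / (2|E|))**2 ] for an undirected, unweighted graph.

    edges: iterable of (u, v) pairs; community_of: dict mapping node -> community id.
    """
    edges = [(u, v) for u, v in edges if u != v]  # reject self-loops first
    m = len(edges)                                # |E|
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")
    internal = defaultdict(int)    # L_s: edges with both ends in community s
    degree_sum = defaultdict(int)  # d_s: total degree of the nodes in community s
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[s] / m - (degree_sum[s] / (2 * m)) ** 2 for s in degree_sum)


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3), plus one self-loop.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (5, 5)]
    split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
    merged = {node: "A" for node in split}
    print(round(modularity(edges, split), 4))   # 0.3571 (= 5/14)
    print(round(modularity(edges, merged), 4))  # 0.0
```

Splitting the two triangles scores higher than lumping every node into one community, which is exactly the signal a Louvain-style search climbs when it moves nodes between communities.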
TABLE III. TELECOM SENTIMENT SCORES FOR AUGUST 2017

Polarity | N+ | N | NEU | P | P+ | Total
Weight (W) | -2 | -1 | 0 | 1 | 2 |
No. of tweets for #aircel (AC) | 648 | 708 | 780 | 1190 | 829 | 4155
W × AC | -1296 | -708 | 0 | 1190 | 1658 | 844
No. of tweets for #airtelindia (AT) | 1147 | 1210 | 678 | 2329 | 2068 | 7432
W × AT | -2294 | -1210 | 0 | 2329 | 4136 | 2961
No. of tweets for #ideacellular (I) | 901 | 451 | 1327 | 1805 | 893 | 5377
W × I | -1802 | -451 | 0 | 1805 | 1786 | 1338
No. of tweets for #reliancejio (R) | 176 | 491 | 1258 | 4456 | 3032 | 9413
W × R | -352 | -491 | 0 | 4456 | 6064 | 9677
No. of tweets for #vodafonein (V) | 1526 | 1041 | 898 | 2227 | 2784 | 8476
W × V | -3052 | -1041 | 0 | 2227 | 5568 | 3702
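Tables I to III together define how individual tweets become a monthly score: each tweet's Polarity_Final (TextBlob polarity × subjectivity, Section IV.B) is binned into a label, each label carries a weight, and an operator's score is the weighted sum of its label counts. The sketch below reproduces the August 2017 totals of Table III from the Table II counts. How values lying exactly on a bin boundary (for example 0.1 or 0.5) are labeled is an assumption, because Table I only gives ranges; with TextBlob installed, the two inputs for a tweet would come from `TextBlob(text).sentiment.polarity` and `TextBlob(text).sentiment.subjectivity`.

```python
# Weights from Table I.
WEIGHTS = {"N+": -2, "N": -1, "NEU": 0, "P": 1, "P+": 2}


def final_polarity(polarity: float, subjectivity: float) -> float:
    """Polarity_Final = polarity * subjectivity (Section IV.B)."""
    return polarity * subjectivity


def label(pf: float) -> str:
    """Bin Polarity_Final into the Table I labels.

    Boundary inclusivity is an assumption: (0.5, 1.0] -> P+, (0.1, 0.5] -> P,
    [-0.1, 0.1] -> NEU, [-0.5, -0.1) -> N, [-1.0, -0.5) -> N+.
    """
    if pf > 0.5:
        return "P+"
    if pf > 0.1:
        return "P"
    if pf >= -0.1:
        return "NEU"
    if pf >= -0.5:
        return "N"
    return "N+"


def sentiment_score(counts: dict) -> int:
    """Weighted sum of label counts, as in the W x count rows of Table III."""
    return sum(WEIGHTS[lbl] * n for lbl, n in counts.items())


# Label counts for August 2017 (Table II).
AUGUST_2017 = {
    "#aircel":       {"N+": 648,  "N": 708,  "NEU": 780,  "P": 1190, "P+": 829},
    "#airtelindia":  {"N+": 1147, "N": 1210, "NEU": 678,  "P": 2329, "P+": 2068},
    "#ideacellular": {"N+": 901,  "N": 451,  "NEU": 1327, "P": 1805, "P+": 893},
    "#reliancejio":  {"N+": 176,  "N": 491,  "NEU": 1258, "P": 4456, "P+": 3032},
    "#vodafonein":   {"N+": 1526, "N": 1041, "NEU": 898,  "P": 2227, "P+": 2784},
}

if __name__ == "__main__":
    print(label(final_polarity(0.8, 0.9)))   # P+ (Polarity_Final is about 0.72)
    print(label(final_polarity(-0.3, 0.5)))  # N  (Polarity_Final is -0.15)
    for tag, counts in AUGUST_2017.items():
        print(tag, sentiment_score(counts))
    # #aircel 844, #airtelindia 2961, #ideacellular 1338,
    # #reliancejio 9677, #vodafonein 3702 (the Table III totals)
```

Because the weights are symmetric around NEU, neutral tweets never move the score, and one strongly negative tweet cancels one strongly positive tweet.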




Fig 1. Sentiment analysis based prediction model


TABLE IV. TELECOM TWEET COMMUNITY SENTIMENT SCORE (PREDICTED GROWTH)

Hashtag | August '17 base score | September '17 score (growth %) | October '17 score (growth %) | November '17 score (growth %) | December '17 score (growth %) | January '18 score (growth %)
#aircel | 844 | 837 (-0.83) | 831 (-0.72) | 816 (-1.81) | 762 (-6.62) | 701 (-8.01)
#airtelindia | 2961 | 2967 (0.2) | 2987 (0.67) | 3094 (3.58) | 3101 (0.23) | 3156 (1.77)
#ideacellular | 1338 | 1316 (-1.64) | 1339 (1.75) | 1387 (3.58) | 1428 (2.96) | 1429 (0.07)
#reliancejio | 9677 | 9915 (2.46) | 10276 (3.64) | 10904 (6.11) | 11348 (4.07) | 11865 (4.56)
#vodafonein | 3702 | 3719 (0.46) | 3761 (1.13) | 3776 (0.4) | 3781 (0.13) | 3887 (2.8)

TABLE V. TELECOM SUBSCRIBER ADDITIONS (ACTUAL GROWTH)

Telecom operator | August '17 base (subscribers) | September '17 addition (growth %) | October '17 addition (growth %) | November '17 addition (growth %) | December '17 addition (growth %) | January '18 addition (growth %)
Aircel | 89146187 | -394209 (-0.44) | -497264 (-0.56) | -665916 (-0.75) | -2654140 (-3.03) | -3491753 (-4.11)
Bharti Airtel | 281043837 | 1003632 (0.36) | 3148164 (1.12) | 4340871 (1.52) | 576575 (0.2) | 1502755 (0.52)
Idea Cellular | 191059301 | -904137 (-0.47) | 713408 (0.38) | 3198450 (1.68) | 2431152 (1.25) | 1144631 (0.58)
Reliance Jio | 132679328 | 5936576 (4.47) | 7344371 (5.3) | 6117260 (4.19) | 8013707 (5.27) | 8300054 (5.18)
Vodafone India | 208144702 | -700687 (-0.34) | 879413 (0.42) | 2701848 (1.3) | 1503087 (0.71) | 1282261 (0.6)
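The growth rates in Tables IV and V are consistent with a month-over-month change relative to the previous month's level; for example, the #aircel September score gives (837 − 844) / 844 ≈ −0.83%, and Aircel's September addition gives −394209 / 89146187 ≈ −0.44%. This reading is inferred from the figures rather than stated in the text. The sketch below applies that formula, turning subscriber additions into running totals first so that each month is compared with the previous month's subscriber base.

```python
def growth_rates(levels):
    """Month-over-month growth in percent, rounded to 2 decimals."""
    return [round((cur - prev) / prev * 100, 2) for prev, cur in zip(levels, levels[1:])]


def running_totals(base, additions):
    """Subscriber base after each month: base, base + a1, base + a1 + a2, ..."""
    levels = [base]
    for added in additions:
        levels.append(levels[-1] + added)
    return levels


if __name__ == "__main__":
    # #aircel sentiment scores, August 2017 (base) to January 2018 (Table IV)
    print(growth_rates([844, 837, 831, 816, 762, 701]))
    # [-0.83, -0.72, -1.81, -6.62, -8.01]

    # Aircel subscriber base and monthly additions (Table V)
    aircel = running_totals(89146187, [-394209, -497264, -665916, -2654140, -3491753])
    print(growth_rates(aircel))
    # [-0.44, -0.56, -0.75, -3.03, -4.11]
```

Expressing both series as percentages puts sentiment scores in the hundreds and subscriber bases in the hundreds of millions on a comparable scale before they are correlated.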




TABLE VI. IBM SPSS CORRELATION TEST BASED VALIDATION

Month | September | October | November | December | January
Pearson correlation | 0.879 | 0.884 | 0.925 | 0.906 | 0.911
Significance value | 0.049 | 0.046 | 0.024 | 0.034 | 0.031
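Table VI reports one Pearson correlation per month. Correlating the five operators' predicted growth rates (Table IV) with their actual growth rates (Table V) reproduces the September and October coefficients (0.879 and 0.884), so the sketch below computes r that way. The t statistic r·√((n − 2)/(1 − r²)) is included because the usual two-tailed significance value for r, as reported by SPSS, is based on it with n − 2 = 3 degrees of freedom; converting it into a p-value needs a t-distribution CDF, for example from SciPy, whose `scipy.stats.pearsonr` returns both r and p.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Growth rates (%) in the order Aircel, Bharti Airtel, Idea, Reliance Jio, Vodafone.
PREDICTED = {  # Table IV
    "September": [-0.83, 0.20, -1.64, 2.46, 0.46],
    "October":   [-0.72, 0.67, 1.75, 3.64, 1.13],
}
ACTUAL = {  # Table V
    "September": [-0.44, 0.36, -0.47, 4.47, -0.34],
    "October":   [-0.56, 1.12, 0.38, 5.30, 0.42],
}

if __name__ == "__main__":
    for month in PREDICTED:
        r = pearson(PREDICTED[month], ACTUAL[month])
        print(month, round(r, 3), round(t_statistic(r, len(PREDICTED[month])), 2))
```

With only five operators per month, each coefficient rests on five points, so a single operator with an extreme value (Reliance Jio here) has a strong influence on r.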

VI. VALIDATION

The results obtained from the proposed research model were validated with IBM SPSS 24 by applying correlation analysis, as shown in Table VI. The validation tests were conducted on the month-wise predicted growth rate (sentiment score growth rate, %) and the actual growth rate (subscriber addition growth rate, %). For every month, the Pearson correlation between the two variables exceeds 0.85 and the significance value is below 0.05, which validates the findings of the prediction model.

VII. CONCLUSION

Keeping a close watch on customer feedback and opinions is an important task for strategic planning. It can help predict customer churn, which can then be prevented by taking timely action, and it can help attract new customers. The research employed Twitter community sentiment analysis to capture real-time sentiments of the general public, who represent both customers and potential customers. The proposed model, using month-wise sentiment scores of the Twitter hashtags of Indian telecom operators, successfully predicted their growth rate in terms of subscriber addition.

The Twitter sentiment score obtained from the dataset communities, which is the online word of mouth representing the wisdom of the crowds, accurately predicts the popularity and success of the telecom operators. The success of telecom operators is reflected in the month-wise subscriber addition data. A positive sentiment score for a company indicates public brand preference, while a negative sentiment score indicates customer dissatisfaction or an inclination towards another company better suited to customers' requirements. Managers can use data mining and sentiment analysis techniques to take timely action to predict and prevent such customer churn [20], [21].

REFERENCES

[1] M. Oskarsdottir, C. Bravo, W. Verbeke, C. Sarraute, B. Baesens, and J. Vanthienen, “A comparative study of social network classifiers for predicting churn in the telecommunication industry,” in 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2016, pp. 1151–1158.
[2] A. Amin et al., “Customer churn prediction in the telecommunication sector using a rough set approach,” Neurocomputing, 2017.
[3] V. Mahajan, R. Misra, and R. Mahajan, “Review on factors affecting customer churn in telecom sector,” Int. J. Data Anal. Tech. Strateg., vol. 9, no. 2, p. 122, 2017.
[4] S. Ranjan and S. Sood, “Analyzing Social Media Community Sentiment Score for Prediction of Success of Bollywood Movies,” Int. J. Latest Eng. Manag. Res., vol. 3, no. 2(S), pp. 80–88, 2018.
[5] M. Nanda, C. Pattnaik, and Q. Lu, “Innovation in social media strategy for movie success: a study of the Bollywood movie industry,” Manag. Decis., vol. 56, no. 1, pp. 233–251, 2018.
[6] S. Hudson, M. S. Roth, T. J. Madden, and R. Hudson, “The effects of social media on emotions, brand relationship quality, and word of mouth: An empirical study of music festival attendees,” Tour. Manag., vol. 47, pp. 68–76, 2015.
[7] M. Ghiassi, J. Skinner, and D. Zimbra, “Twitter brand sentiment analysis: A hybrid system using n-gram analysis and dynamic artificial neural network,” Expert Syst. Appl., vol. 40, no. 16, pp. 6266–6282, 2013.
[8] D. Arora and P. Malik, “Analytics: Key to go from generating big data to deriving business value,” in Proceedings - IEEE 1st International Conference on Big Data Computing Service and Applications, BigDataService 2015, 2015, pp. 446–452.
[9] J. Spiess, Y. T. Joens, R. Dragnea, and P. Spencer, “Using Big Data to Improve Customer Experience and Business Performance,” Bell Labs Tech. J., vol. 18, no. 4, pp. 3–17, 2014.
[10] P. Ducange, R. Pecori, and P. Mezzina, “A glimpse on big data analytics in the framework of marketing strategies,” Soft Comput., vol. 22, no. 1, pp. 325–342, 2018.
[11] P. De Meo, E. Ferrara, G. Fiumara, and A. Provetti, “Generalized Louvain method for community detection in large networks,” in International Conference on Intelligent Systems Design and Applications, ISDA, 2011, pp. 88–93.
[12] A. F. Alsaqer and S. Sasi, “Movie review summarization and sentiment analysis using RapidMiner,” in 2017 International Conference on Networks and Advances in Computational Technologies, NetACT 2017, 2017, pp. 329–335.
[13] J. Leskovec, K. J. Lang, and M. W. Mahoney, “Empirical Comparison of Algorithms for Network Community Detection,” in Proceedings of the 19th International Conference on World Wide Web, 2010, pp. 631–640.
[14] T. Tsekeris, “Interregional trade network analysis for road freight transport in Greece,” Transp. Res. Part E Logist. Transp. Rev., vol. 85, pp. 132–148, 2016.
[15] A. G. Nikolaev, R. Razib, and A. Kucheriya, “On efficient use of entropy centrality for social network analysis and community detection,” Soc. Networks, vol. 40, pp. 154–162, 2015.
[16] N. M. Sharef, H. M. Zin, and S. Nadali, “Overview and future opportunities of Sentiment Analysis approaches for big data,” J. Comput. Sci., vol. 12, no. 3, pp. 153–168, 2016.
[17] R. Sharma, V. Ahuja, and S. Alavi, “The Future Scope of Netnography and Social Network Analysis in the Field of Marketing,” J. Internet Commer., vol. 2861, pp. 1–20, 2018.
[18] V. A. Kharde and S. S. Sonawane, “Sentiment Analysis of Twitter Data: A Survey of Techniques,” Int. J. Comput. Appl., vol. 139, no. 11, pp. 975–8887, 2016.
[19] P. Gamallo and M. García, “Citius: A Naive-Bayes Strategy for Sentiment Analysis on English Tweets,” in 8th International Workshop on Semantic Evaluation (SemEval 2014), 2014, pp. 171–175.
[20] A. M. Almana, M. S. Aksoy, and R. Alzahrani, “A Survey On Data Mining Techniques In Customer Churn Analysis For Telecom Industry,” J. Eng. Res. Appl., vol. 4, no. 5, pp. 165–171, 2014.
[21] W. Verbeke, D. Martens, and B. Baesens, “Social network analysis for customer churn prediction,” Appl. Soft Comput. J., vol. 14, no. PART C, pp. 431–446, 2014.

