Detecting Potential Cyber Armies of Election Campaigns Based on Behavioral Analysis

Ming-Hung Wang(B) , Nhut-Lam Nguyen, and Chyi-Ren Dow

Department of Information Engineering and Computer Science, Feng Chia University, Taichung, Taiwan
{mhwang,lam,crdow}@mail.fcu.edu.tw

Abstract. Recently, online social networks have become popular venues for election campaigns to monitor public opinion, spread information, and even attempt to influence discussions on the platforms. In this study, we focus on collusion among potential political teams or individuals who attempt to manufacture discussions and series of positive/negative comments to support or attack specific candidates. We collect user behavior data from the largest online forum in Taiwan before a national election and use statistical analysis to identify such users. We also verify the results by manually reading the content published by these users. We hope this study can help users recognize underlying information manipulation and preserve the trustworthiness of online society.

1 Introduction
Online social media such as Twitter, Facebook, and Reddit have become important platforms for political discussion, as many users publish, endorse, and comment on political issues and share their thoughts. This phenomenon attracts mass media, political organizations, governments, and even individuals who want to understand online public opinion. Furthermore, scientists use online data to predict election results [7, 10, 11].
With increasing public attention to these platforms, political organizations have started to conduct election campaigns on social networks. They may recruit professional web users, whom we call a "cyber army," to monitor discussions, promote their candidates, and even try to influence public opinion. However, when such behavior becomes common, it is difficult for ordinary users to distinguish whether messages come from election campaigns or from the public. Moreover, some users may leave a platform when they consider that cyber armies have flooded it.
To preserve the trustworthiness of political messages on online platforms, in this study we aim to identify these professional users according to their behavior. First, we expect recruited users to be firm in their political stances: they may endorse specific candidates and attack other candidates. To investigate such characteristics, we address the following research questions.
R1: Is there a group of users who always give negative or positive ratings to attack or support certain candidates?
Second, as cyber armies are recruited, they should spend more time online, or even be online all the time, to respond to attacks and promote their candidates quickly. To find such users, research question R2 needs to be answered.

R2: Is there a group of users who are always online and rapidly respond to any information about candidates?
To address the above two questions, we collect a dataset from the largest political discussion forum in Taiwan, where a national election will be held in November 2018. The collection contains more than 10 thousand articles published from January to August 2018. We leverage statistical methods to investigate article rating and commenting behaviors. From the results, we successfully find potential cyber armies whose characteristics differ from those of other users. We also manually verify the results by investigating the comments published by these users.
This paper is organized as follows. In Sect. 2, we briefly introduce the background of using statistical analysis to identify influential authors on the Internet. An overview of our proposed approach and data collection is presented in Sect. 3. In Sect. 4, we present the results of our method. We discuss issues to address in the future and conclude our research in Sect. 5.

2 Related Works
Understanding user influence in social networks has attracted increasing attention from researchers [1–3, 8]. Many scholars have examined comments, tweets, and retweets to analyze user behavior. In [6], Lei et al. identified characteristics of temporal tweeting, retweeting, and commenting among Weibo users to cluster the users into different groups. Jamali et al. [5] used users' comments to identify user relationships and predict comment popularity. To analyze user influence across different microblogging services, Tsugawa et al. [9] proposed an influencer detection method that uses sampled data from Twitter and Facebook to identify influencers. Different metrics, such as degree, closeness, and PageRank, have been used to evaluate the reliability of the method. However, this approach is not a suitable solution for a social network with special characteristics, such as a political networking site.
In the political domain, social networks have been used as a tool for diffusing information and exchanging political opinions. Hoang et al. [4] examined political tweets on Twitter to evaluate the influence of tweet sentiment on user behavior. They found that sentiment impacts users in different ways. In [13], Wong et al. proposed a method to infer the political leaning of Twitter users based on their tweets and retweets. Wang and Lei [12] provide regression models to predict influential authors and articles by evaluating the very first rating scores and reply quantities.
Some studies have focused on analyzing online messages to predict elections. Tumasjan et al. [10] collected 100,000 Twitter messages related to the German federal election and found that the popularity of candidates on Twitter matched the vote outcome. O'Connor et al. [7] also demonstrate the predictability of Twitter data using sentiment analysis. Wang and Lei [11] provide a hybrid method combining the number of articles, the sentiment scores of articles, and online ratings toward candidates to predict election results. However, there are still very few works that concentrate on identifying grouped political behavior intended to influence public opinion online. This study provides a practical example to demonstrate the phenomenon and presents an identification method to address such issues.

3 Methodology

To identify potential cyber armies, we start by distinguishing them through apparently different user behaviors. Compared with normal users, we expect recruited users to have at least two features: (1) they should be persistent and eager to express their support for or rejection of certain candidates; (2) they should spend more time on the forum and may respond to articles rapidly. To identify these professional users, we collect data from the largest political discussion forum in Taiwan, where a national election will be held in November 2018. The dataset was crawled from one of the most popular bulletin board systems in the world, the PTT Bulletin Board System (PTT). PTT is the most influential forum in Taiwan and consists of boards focusing on different topics. The most popular one, "Gossiping," concentrates on political discussions. We collect all articles and comments from January 1 to August 1, 2018, a 7-month-long observation period before the national election. Each item posted on PTT comprises the following information (a minimal record sketch follows the list):

1. Author information: the author ID, nickname, and IP address.
2. Article metadata: the publication time and IP address.
3. Article content: the textual part of the article.
4. User comments and ratings: users can give a positive/neutral/negative rating to an article along with a comment.
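For concreteness, the following is a minimal sketch of how one crawled item could be represented in Python; the field names and types are illustrative assumptions for this sketch, not the actual schema of our crawler.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class Comment:
    commenter_id: str
    rating: int            # +1 = positive ("push"), 0 = neutral, -1 = negative ("boo")
    text: str
    time: datetime         # when the comment was left

@dataclass
class Article:
    author_id: str         # author information
    author_nickname: str
    author_ip: str
    published: datetime    # article metadata
    post_ip: str
    content: str           # textual part of the article
    comments: List[Comment] = field(default_factory=list)
```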

From the crawled dataset, we split the articles into three sub-datasets according to which of the three major candidates they relate to: Wen-Je Ko (the incumbent mayor of Taipei City), Wen-Chih Yao (nominated by the ruling party, the Democratic Progressive Party), and Shou-Chung Ting (nominated by the major opposition party, the Kuomintang). A summary of our dataset is given in Table 1.

Table 1. A summary of our dataset

Candidate        # articles  # commenters  # authors  # comments
Wen-Je Ko             8,408        43,490      2,569     610,936
Wen-Chih Yao          3,608        24,693      1,372     213,623
Shou-Chung Ting       1,456        15,173        709      79,068
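As a rough sketch of how articles could be attributed to candidates, one plausible approach is keyword matching over the article text, as below; the keyword lists and the matching rule are illustrative assumptions, not the exact procedure used to build the sub-datasets above.

```python
# Hypothetical attribution of crawled articles to candidate sub-datasets by
# keyword matching; keyword lists and matching rule are assumptions.
CANDIDATE_KEYWORDS = {
    "Ko":   ["柯文哲", "Wen-Je Ko"],
    "Yao":  ["姚文智", "Wen-Chih Yao"],
    "Ting": ["丁守中", "Shou-Chung Ting"],
}

def split_by_candidate(articles):
    """Return {candidate: [articles mentioning that candidate]}."""
    subsets = {name: [] for name in CANDIDATE_KEYWORDS}
    for article in articles:
        for name, keywords in CANDIDATE_KEYWORDS.items():
            if any(kw in article.content for kw in keywords):
                subsets[name].append(article)
    return subsets
```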
From this 7-month-long observation of a single forum, we attempt to distinguish professional users from normal ones. Our methodology is therefore built on anomaly detection among all users in our data collection. We conduct our analysis on peer-to-peer ratings, commenter-candidate relationships, and the response times of users.

4 Results

4.1 Candidate Popularity

From Table 1, we find that the incumbent mayor, Wen-Je Ko, is the most popular of the three candidates. In Fig. 1, the AUC (area under the curve) for Ko is the largest, indicating that Ko receives much more attention than the other two. The same figure shows the distribution of the number of comments (x-axis) against the number of commenters (y-axis). The distributions for all three candidates follow a power-law distribution (for better readability, we take the logarithm of both axes): most commenters post only a small number of comments, while a small portion of commenters publish a significant number of comments. These active commenters appear very often in articles related to the candidates. Compared with the majority of users, who do not comment very often, we consider that these users deserve further investigation.
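As a minimal sketch of how the distribution in Fig. 1 could be derived from the crawled data, the snippet below counts comments per commenter and tallies how many commenters share each count; it assumes the record sketch from Sect. 3, and ko_articles is a hypothetical variable holding one candidate's sub-dataset.

```python
from collections import Counter

def comment_distribution(articles):
    """Map each comment count to the number of commenters with that count
    (the x- and y-axes of Fig. 1)."""
    per_commenter = Counter(c.commenter_id for a in articles for c in a.comments)
    return Counter(per_commenter.values())

# A heavy-tailed (power-law-like) distribution looks roughly linear on
# log-log axes, e.g. with matplotlib:
#   xs, ys = zip(*sorted(comment_distribution(ko_articles).items()))
#   plt.loglog(xs, ys, "o")
```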

Fig. 1. The distribution of comments given by commenters

4.2 Commenter Polarity

Focusing on active commenters, we attempt to answer research question R1 by investigating the commenting activity of every user.
R1: Is there a group of commenters who always give negative or positive ratings to attack or support certain candidates?
To address R1, we count the number of comments made by each commenter in our dataset. Only the top 100 commenters who have posted the most comments are investigated in this part, as we wish to concentrate on the most active users. Because we want to identify whether there are groups of people who constantly give positive or negative ratings toward certain candidates, for each commenter in the selected set we sum the number of negative ratings and positive ratings, denoted NR and PR, respectively. The rating polarity of commenter c is calculated as follows:

Polarity_c = PR_c − NR_c        (1)
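A minimal sketch of Eq. (1) applied to the top 100 commenters, again assuming the record sketch from Sect. 3; the function name and the way ratings are tied to comments are illustrative.

```python
from collections import Counter, defaultdict

def polarity_of_top_commenters(articles, top_n=100):
    """Polarity_c = PR_c - NR_c for the top_n most active commenters (Eq. 1)."""
    activity = Counter()
    pos, neg = defaultdict(int), defaultdict(int)
    for article in articles:
        for c in article.comments:
            activity[c.commenter_id] += 1
            if c.rating > 0:
                pos[c.commenter_id] += 1
            elif c.rating < 0:
                neg[c.commenter_id] += 1
    top = [cid for cid, _ in activity.most_common(top_n)]
    return {cid: pos[cid] - neg[cid] for cid in top}
```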

Fig. 2. Analysis of polarity and number of articles of the top 100 commenters for each
candidate

Comment Quantity and Polarity of the Top Commenters. Figure 2 shows the top commenters on articles about the three candidates. We also mark the top 20% of commenters, by number of articles commented on or by positive ratings, with yellow diamonds. From the figure, we find that the commenters for the three candidates behave differently. Top commenters are much more active in Ko's discussions than in those of the other two candidates. Because Ko has emerged and gained popularity on the Internet over the last few years, this result is reasonable. We also find that Yao's commenters are more willing to comment than Ting's commenters.
Interestingly, we find that more comments do not correspond to higher online ratings, especially for Yao. From Fig. 2, the slope of the correlation line in Ko's dataset is steeper than in Ting's and Yao's, with a positive relationship. The results show that commenters are willing to give more positive ratings in Ko's discussions and deliver more negative responses to Yao's and Ting's articles. The results match the latest election polls (see footnote 1), where Ko receives much more support than the other two candidates.
From the above results, we observe some outliers that may be our targets, the "professional users" we aim to identify in this study. However, polarity toward a single candidate is not enough to judge whether these users were recruited. In the following paragraphs, we therefore examine the between-candidate polarity of each user.

Analysis of Polarity of the Top Commenters Between Candidates. Since a user recruited by a political campaign should help promote their candidate and may even attack other candidates, we show the polarity of the top commenters toward the three candidates in Fig. 3. From the figure, we find that user 010 (Ko: 132, Yao: −245, Ting: −94) rates Yao and Ting extremely negatively but gives positive ratings to Ko. Likewise, users 063 (Ko: 48, Yao: −17, Ting: −63), 052 (Ko: 97, Yao: −19, Ting: −29), and 050 (Ko: 156, Yao: −29, Ting: −20) rate Ko positively and the other two candidates negatively.
Compared with the abovementioned users, another type is user 005 (Ko: −200, Yao: 2, Ting: −5). This user gives a large number of negative ratings to Ko but stays neutral toward the other two candidates. These users reveal strong political support for, or rejection of, particular candidates. However, an obvious political tendency alone is not enough to judge a user to be a potential cyber army. We discuss another metric, response time, in the next analysis; a sketch of how such one-sided commenters could be flagged programmatically is given below.
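A minimal sketch of flagging the cross-candidate pattern described above (strongly positive toward exactly one candidate, negative toward the others); the thresholds are illustrative assumptions, not values used in our analysis.

```python
def flag_one_sided_commenters(polarity_by_candidate, min_support=40, max_negative=-15):
    """polarity_by_candidate: {candidate: {commenter_id: polarity}}.
    Flag commenters strongly positive toward exactly one candidate and
    negative toward all the others; thresholds are illustrative only."""
    flagged = {}
    commenters = set().union(*[p.keys() for p in polarity_by_candidate.values()])
    for cid in commenters:
        scores = {cand: p.get(cid, 0) for cand, p in polarity_by_candidate.items()}
        liked = [c for c, s in scores.items() if s >= min_support]
        disliked = [c for c, s in scores.items() if s <= max_negative]
        if len(liked) == 1 and len(disliked) == len(scores) - 1:
            flagged[cid] = scores
    return flagged
```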

4.3 Commenter Response Time

We address the second research question in this section.

R2: Is there a group of users who are always online and rapidly respond to any information about certain candidates?
Although we do not have the online duration or login/logout records of each user, we consider the article response time to be another metric that can address this question: when a user usually responds to an article within minutes, he or she is likely to be online and focused on the posts. We investigate the top 100 commenters in each candidate's dataset and present the median response times of these commenters in Fig. 4 and Table 2. We also mark the top 20% of commenters, by number of comments or by smallest response time, with yellow diamonds in the figure.
Footnote 1: Wikipedia maintains a series of poll results for the 2018 Taipei mayoral election, as shown in this Wiki entry.
Fig. 3. Analysis of polarity of comments of popular commenters on the 3 candidates

From the results, we find that users 010 and 063, identified in the previous analysis, responded very quickly to articles related to all three candidates. Users 001 and 003 also reply rapidly to articles related to the candidates. In addition, multiple users comment heavily on a single candidate and respond to the related articles within a short period.
Table 2 shows the polarity and response times of the selected users toward the three candidates. From the table, we find that users 010, 063, 001, and 003 responded very quickly (median response time under 20 minutes) to candidates' articles, and they show apparent political leanings according to their ratings toward the three candidates (see footnote 2). Based on these results, we manually check the content published by these users. The material they posted corresponds to the polarity of their ratings toward the candidates. Accordingly, we consider these to be the users we aim to identify in our two research questions.

Footnote 2: Here we apply a strict threshold (i.e., 20 minutes) to avoid false positive identification of cyber armies. Consequently, some cyber armies may not be recognized in the above results.
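A minimal sketch of how the per-commenter median response time and the 20-minute screen could be computed, again assuming the record sketch from Sect. 3; the helper names and the rule of requiring the threshold to hold in every candidate's dataset are illustrative assumptions.

```python
from collections import defaultdict
from statistics import median

def median_response_minutes(articles, top_commenters):
    """Median delay (minutes) between an article's publication and each
    top commenter's comments on it."""
    delays = defaultdict(list)
    for article in articles:
        for c in article.comments:
            if c.commenter_id in top_commenters:
                delays[c.commenter_id].append(
                    (c.time - article.published).total_seconds() / 60.0)
    return {cid: median(ds) for cid, ds in delays.items()}

def fast_responders(medians_by_candidate, threshold_minutes=20):
    """Flag commenters whose median response time stays under the threshold
    in every candidate's dataset (the strict screen discussed in Sect. 4.3)."""
    shared = set.intersection(*[set(m) for m in medians_by_candidate.values()])
    return {cid for cid in shared
            if all(m[cid] < threshold_minutes for m in medians_by_candidate.values())}
```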
Fig. 4. Analysis of response time (median) and number of comments of the top 100
commenters for each candidate

Table 2. Polarity and the median of response time (minutes) of selected users

ID Polarity Ko Polarity Yao Polarity Ting Response Ko Response Yao Response Ting
010 132.0 −245.0 −94.0 5.0 4.0 5.0
063 48.0 −17.0 −63.0 6.0 4.0 6.0
052 97.0 −19.0 −29.0 30.0 27.0 22.0
050 156.0 −29.0 −20.0 28.0 27.0 22.0
005 −200.0 2.0 −5.0 113.0 21.0 34.0
001 1,317.0 151.0 61.0 10.0 10.5 9.0
003 486.0 59.0 93.0 18.0 11.5 12.5

5 Conclusion

To reduce the influence of cyber armies in election campaigns and restore the trustworthiness of online platforms, we conduct a behavioral analysis to identify potentially recruited users, based on a 7-month observation of a popular forum. Starting from the polarity of users toward candidates, we filter several users who regularly rate specific candidates positively or negatively. Second, from another point of view, we investigate users' response times after political discussions are published; we find other groups of users who consistently respond to articles within only 10 minutes of posting. Combining these two findings, we manually check the response content of the detected users, which indicates their support for specific candidates. This work thus provides a study of potential cyber army identification. The main contributions of this study are threefold:

1. We collect a 7-month-long dataset consisting of over 10 thousand articles and 80 thousand users from the largest election discussion platform in Taiwan.
2. We provide a series of analyses to distinguish cyber armies using two characteristics, and several potential cyber armies are identified from our results.
3. We verify the identified users by manually checking that the content they published corresponds to our analysis.

As information on the Internet becomes as popular and influential as conventional mass media, retaining the trustworthiness of online information has become a crucial issue for platforms and for all users. This study provides a practical example of how to identify such professional users. Even though we cannot confirm whether these users are professionals, owing to the anonymity of the Internet, we strive to provide evidence for further investigation. We consider that this work could help not only the government but also political organizations to understand user behavior from a bird's-eye viewpoint. Furthermore, it would benefit online society and platforms in filtering potential information manipulation. In the future, we seek to develop a formalized and systematic method to identify such users. Automatic verification of our results based on content analysis or other evidence should also be investigated. We hope the outcomes of this research can help online society improve discussion efficiency, hold democratic debates, and increase the transparency of online information.

Acknowledgements. This work was supported by the Ministry of Science and Technology, Taiwan, under Grant MOST 107-2218-E-035-009-MY3. We would like to thank the reviewers for their valuable comments and suggestions for improving the manuscript.

References
1. Benevenuto, F., Rodrigues, T., Cha, M., Almeida, V.: Characterizing user behavior
in online social networks. In: Proceedings of the 9th ACM SIGCOMM Conference
on Internet Measurement, pp. 49–62. ACM (2009)
2. Boutet, A., Kim, H., Yoneki, E.: What's in Twitter: I know what parties are popular and who you are supporting now! Soc. Netw. Anal. Min. 3(4), 1379–1391 (2013)
3. Guo, D., Lin, F., Chen, C.: User behaviors in an online social network. In: IEEE
International Conference on Network Infrastructure and Digital Content, 2009,
IC-NIDC 2009, pp. 430–434. IEEE (2009)
4. Hoang, T.A., Cohen, W.W., Lim, E.P., Pierce, D., Redlawsk, D.P.: Politics, sharing and emotion in microblogs. In: Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 282–289. ACM (2013)
5. Jamali, S., Rangwala, H.: Digging Digg: comment mining, popularity prediction, and social network analysis. In: International Conference on Web Information Systems and Mining, WISM 2009, pp. 32–38. IEEE (2009)
6. Lei, K., et al.: Understanding user behavior in Sina Weibo online social network: a community approach. IEEE Access 6, 13302–13316 (2018)
7. O’Connor, B., Balasubramanyan, R., Routledge, B.R., Smith, N.A.: From tweets
to polls: linking text sentiment to public opinion time series. In: Proceedings of the
Fourth International AAAI Conference on Weblogs and Social Media, pp. 122–129.
AAAI Press, Menlo Park, Washington, D.C. (2010)
8. Priambodo, R., Satria, R.: User behavior pattern of mobile online social network
service. In: International Conference on Cloud Computing and Social Networking
(ICCCSN), 2012, pp. 1–4. IEEE (2012)
9. Tsugawa, S., Kimura, K.: Identifying influencers from sampled social networks.
Phys. A: Stat. Mech. Its Appl. 507, 294–303 (2018)
10. Tumasjan, A., Sprenger, T., Sandner, P., Welpe, I.: Predicting elections with Twit-
ter: What 140 characters reveal about political sentiment. In: Proceedings of the
Fourth International AAAI Conference on Weblogs and Social Media, pp. 178–185.
AAAI Press, Menlo Park, Washington, D.C. (2010)
11. Wang, M.H., Lei, C.L.: Boosting election prediction accuracy by crowd wisdom
on social forums. In: 2016 13th IEEE Annual Consumer Communications and
Networking Conference (CCNC), pp. 348–353. IEEE (2016)
12. Wang, M.H., Lei, C.L.: SocialDNA: a novel approach for distinguishing notable
articles and authors through social events. J. Inf. Sci. Eng. 34(6), 1579–1598 (2018)
13. Wong, F.M.F., Tan, C.W., Sen, S., Chiang, M.: Quantifying political leaning from
tweets, retweets, and retweeters. IEEE Trans. Knowl. Data Eng. 28(8), 2158–2172
(2016)
