
Towards Detecting Malicious Activity On Twitter

Muhammad Saad

Muhammad Fareed Zaffar

James Kirk

Twentieth Century Fox Springfield, USA


School of Sciences
and Montgomery Scott
Email: homer@thesimpsons.com
and Engineering
Starfleet Academy
Lahore University of Management Sciences
San Francisco, California 96678-2391
Email: arrowloop@gmail.com
Telephone: (800) 5551212
Fax: (888) 5551212
Abstract: In Section 4 we demonstrate how we identified
collusion networks on Twitter that are used to promote the content
of a renowned user. Within the collusion network, we isolated the
bots and compromised accounts. We also identified the popular
services that allow users to schedule their tweets and retweets
in favor of another user. In Section 5 we introduce another
network, known as the C2 network, which comprises the
beneficiaries of multiple collusion networks. In Section 6 we
propose security suggestions to mitigate the ramifications of
anomalous activities. Finally, in Section 7 we give insights into
our future work.
Keywords: IEEEtran, journal, LaTeX, paper, template.

I. INTRODUCTION

Twitter, often referred to as the SMS of the Internet, has
evolved from a micro-blogging service into a giant pool for the
interactive dissemination of information. As of March 2016,
it has over 310 million monthly active users. The estimated
total volume of Twitter users is around 1.3 billion. In the United
States alone, 29.2% of social media users are active on Twitter. On
average, 500 million tweets are generated every day. Even
though Twitter is blocked in China, it has still been able to
garner over 100 million users there. Paper (X) analyzes how
swiftly information propagates over Twitter. Within seconds,
a single tweet can bring a user to the attention of hundreds of
other users. A record 618,725 tweets were generated in under a
minute during the 2014 FIFA World Cup final.
Twitter has also emerged as a mouthpiece for marketing and
business promotion. Twitter's worldwide revenue is around
$2.22 billion. In the United States, 65.8% of companies with 100
or more employees use Twitter as a marketing tool to publicize their
brands and products. Beyond this polished view, however,
the Twitter ecosystem carries an undertone of competitive
mischief. As the user base grew, a number of services
materialized to capture the market. Politicians, government
agencies, celebrities from show business and sports, journalists
and business tycoons joined Twitter. With Twitter, users have
a chance of interacting directly with famous figures.
This is neither as common nor as easy on other social networking
sites, and so the popularity of Twitter grew. More users and more
celebrities joined Twitter to reach out to other people. The
most visible measure of a celebrity's popularity is the number
of followers the account has and the number of retweets
and likes it reaps. To exploit this competitive environment,
malicious online services surfaced that provide Twitter
followers for a couple of dollars. As such fake
followers became cheaper, common users also subscribed to
these services to boost their profiles [reference]. Competing
celebrities employed the same techniques to surpass each other
in fame and followers. As political influence
increased, politicians bought dedicated accounts to create
trends in their favor. Campaigns such as Brexit, the Turkish coup
and the American presidential election were heavily contested
on Twitter. Thus the concept of collusion networks began to
take shape. On October 21, 2015, Twitter introduced a
new feature known as Polls, with which users can collect votes
on a particular statement. This further accentuated the need
for maximum user participation to display a stream of
opinions. Polls on elections, movies, books, technology and
sports became very common. Trends and polls on Twitter
tend to influence the mindset of common users. Opinions can
tilt if they are clouded by monotonous data and news feeds.
News on Twitter often spreads virally while its content is
preserved, so its impact is enormous.
Our aim in this paper is to identify how malicious activities are
organized and executed on Twitter. We identify potential
vulnerabilities in the Twitter framework that allow such
activities to succeed. We point out the behavior of
underground black markets, patterns among fake followers
and the dedicated collusion networks that exist on this platform.
We also suggest techniques to identify them and thwart
their influence.
In Section 1, we explain how certain black markets operate
to provide organic and inorganic fake followers, and we
highlight certain security flaws and their little-known but
deleterious effects. We also reveal how top users benefit
from each other's followings and how Twitter's environment
helps them gain each other's followers. Some privacy intrusion,
data breach and information leakage techniques concerning other
users are brought to notice. The system we developed to
depict and replicate the behavior of black markets is called
Smart Auto Following (SAT).
In Section 2 we introduce techniques to catch fake Twitter
followers. Due to changes in the Twitter API, previous work
on catching fake followers [Y] is no longer efficient.
We have instead leveraged the change to our advantage
and devised an efficient strategy to map and categorize the fake
followers of a particular account. The system used for the
combined analysis of fake followers is named Monitoring
Fake Tweeps (MFT). We also refer to notable earlier work
in this field and to the constraints we faced while
putting it all together. Section 3 shows the results of MFT:
how, under the given constraints, it was able to detect fake
followers based upon its parameters. We also apply checks
to verify the results and cross-check the findings.
In Section 4 we demonstrate how we identified collusion
networks on Twitter that are used to promote the content of a
renowned user. Within the collusion network, we isolated the
bots and compromised accounts. We also identified the popular
services that allow users to schedule their tweets and retweets
in favor of another user. In Section 5 we introduce
another network, known as the C2 network, which comprises
the beneficiaries of multiple collusion networks. In Section 6
we propose security suggestions to mitigate
the ramifications of anomalous activities. Finally, in Section 7
we give insights into our future work.

A. SECTION 1
Fake Twitter followers have become a popular industry.
While the normal way of increasing the follower count is
tiresome and demanding, a user can easily purchase followers
online. Fake followers are now available for as little as $1. Sifting
through such services, we noticed two common trends.
Some websites provide organic bulk followers while others
offer inorganic daily followers. For example, sites like
buycheapfollowersfast.com offer real followers at different
prices, while sites like buyrealmarketing.com offer daily
followers, followers from the USA, retweets and favorites. The
phenomenon of organic followers is well known: these
sites create a pool of accounts and, upon subscription, direct the
pool towards the target account. However, we found the new
trend of daily followers intriguing. Our initial assumption
was that underlying vulnerabilities in Twitter
permit the daily creation of bulk users. We traced those
weaknesses, tested our assumptions against our findings and
finally automated the behavior to simulate how this industry
works.
Signing up for Twitter is an easy and straightforward
process. A user enters a name, types in an email ID or
phone number, selects a password, verifies the phone number, clicks
through the remaining pages and finally signs up. Twitter sends
a verification message to the email ID, the user later verifies
it, and everything is up and running. As easy as it sounds, the
process exposes some key security flaws. If
a legitimate user has already signed up for Twitter using the
email legit@gmail.com, then Twitter will not allow another
account to be set up with this email ID. However, this can easily
be circumvented by adding a "." or a "+" sign
before the @ character. If legit@gmail.com is changed
by the addition of a "." to le.git@gmail.com,
then Twitter will allow an account to be set up with this email,
although the verification link will be sent to the original
legit@gmail.com. Another way of signing up is to use a ghost
email ID, one that does not exist, e.g.
thisemailcannotpossibilyexist11987462728626@gmail.com. Twitter will inform the
user that a verification email has been sent, even though that
email ID does not exist, and the malicious user can create
an account by exploiting this.
Figure 1 shows that when a malicious user signs up with
legit@gmail.com he is not allowed. Figure 2 shows that when
he modifies the same email string with a dot, le.git@gmail.com,
he is given permission, marked by a blue tick in the right
corner.

Fig. 1

Fig. 2

Such an activity can have many probable impacts.

1) A number of accounts can be created over one
email ID using permutations of characters.
2) Accounts can be created over non-existent email
IDs.
3) Malicious activity carried out from a permuted
account can ultimately frame a legitimate user if
he is unaware of the account's existence.
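One way a sign-up system could detect the permuted addresses described above is to reduce every address to a canonical form before checking for duplicates. The sketch below is only an illustration and is not Twitter's actual logic: the names canonical_email, is_duplicate and GMAIL_DOMAINS are assumptions introduced here. It relies on Gmail's behaviour of ignoring dots in the local part and everything after a "+".

```python
# Hypothetical helper: collapse Gmail address variants to one canonical form.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}  # assumption: domains treated as Gmail


def canonical_email(address: str) -> str:
    """Return a canonical form of `address` for duplicate detection."""
    local, _, domain = address.strip().lower().rpartition("@")
    if domain in GMAIL_DOMAINS:
        local = local.split("+", 1)[0]  # drop any "+tag" suffix
        local = local.replace(".", "")  # dots are not significant in Gmail
        domain = "gmail.com"
    return f"{local}@{domain}"


def is_duplicate(new_address: str, registered: set) -> bool:
    """True if `new_address` maps to an already registered canonical address."""
    return canonical_email(new_address) in registered


# Usage: the registry stores canonical forms only.
registered = {canonical_email("legit@gmail.com")}
```

With such a check, le.git@gmail.com would be rejected as a duplicate of legit@gmail.com. Ghost addresses that do not exist need a different safeguard, such as keeping the account inactive until the verification link is opened.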

Fig. 3 illustrates one such instance, where an account is
created over a ghost email ID that does not exist:
thisemailcannotpossibilyexist11987462728626@gmail.com.
Once the first phase is passed or bypassed, the user lands
on a page where he is asked to verify his phone number. Unlike
on other social media platforms, many accounts can be
registered with one phone number on Twitter. Even more alarming is the
fact that one can bypass this phase altogether by clicking the skip
button, as Fig. 4 illustrates.
By clicking the skip button, the malicious user bypasses
the confirmation phase and moves on to the username selection
page. Even that can be skipped by clicking the skip button
(Fig. 5). By this time an account has been created, and the user can skip
the remaining sign-up phases and move on to the homepage.

However, for a Daily Followers service provider, the actual
activity begins here. The final screen shows the 21 most
popular accounts based upon the user's location. On that screen
the user can also add other target accounts that he intends to follow.
If the user is not interested in following some of the 21
accounts recommended by Twitter, he can uncheck them
and continue. We call this the Club of 21, and we later
present our in-depth analysis of this club. The user is then taken to the
Twitter home screen, shown in Fig. 6. The identities
of the accounts are withheld for privacy reasons. This peculiar
behavior opens up several possibilities.

Fig. 3

Fig. 4

Fig. 5

Fig. 6

1) If a user follows those 21 accounts, he becomes a common
follower of all of them.
2) If any of the 21 accounts subscribes to a
daily followers service, it inevitably provides followers to
the other 20 accounts.
3) If a normal user subscribes to such a
service, then, based upon the location of the service provider, the
21 recommended accounts of that location benefit along with him.
4) Daily Followers providers can easily automate these
steps and generate accounts.
Having worked out this theory, we replicated it on a test account.
We created a web automation script in a headless browser.
Using a pool of random names and ghost email IDs, we
followed all the steps mentioned above. We unchecked all the
accounts in the Club of 21 and followed our target account.
Web automation is simple and can be done using
DOM elements or XPath. A random delay was added to the
execution of the script to avoid temporal similarity in account
creation. Over a period of three days we created 400 such
accounts and followed the target account. Later on, the account
was deactivated. Since the 400 accounts were not following
any other account, no damage was done. However, this
gave us key insights into how such follower providers
work and which vulnerabilities in Twitter can
be exploited. It is alarming that if an account is
created on a permuted email of a valid account and
serious malicious activity is performed through it (e.g. cyber crime,
harassment, violent threats), the valid user may ultimately
suffer. Given the credibility and impact of Twitter, this
situation must be taken very seriously and dealt with through
appropriate measures.

B. SECTION 2

As mentioned earlier, the Club of 21 includes the top 21
accounts based upon geographical location. The recommendations can also
be shaped by the interests selected on the screen before the
final screen. We assume that if any one account in a given club has
subscribed to a follower service, the other twenty are
inevitable recipients of those followers. In another
scenario, if several of those accounts have subscribed to the
same service, then their followers will be following the same
accounts. If the provision of followers is not selective, then
the number of mutual followers of all 21 will be reasonably high. If the
provision is selective, then the percentage of mutual followers
will show spikes in mutuality. We first observed this manually
and built a hypothesis. Next, we had to come up with an
algorithm that condensed the follower lists of the 21 selected
accounts in a way that shows their mutual followers. One way
was to crawl a set of followers of all 21 accounts and find
the intersection of each account with every other account. This
turned out to be a laborious and tiresome task, since the number
of possible intersections of 21 accounts was 1540. We worked out
the number of possible intersections by writing code and deriving a
mathematical equation from it.

Sets    Intersections
0       0
1       0
2       1
3       4
4       10
5       20
6       35
7       56
8       84
9       120
10      165
11      220
12      286
13      364
14      455
15      560
16      680
17      816
18      969
19      1140
20      1330
21      1540
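The intersection counts listed above are consistent with the closed form I(n) = (n - 1) n (n + 1) / 6, that is, the running sum of the triangular numbers 0, 1, 3, 6, and so on. This closed form is inferred here from the tabulated values and is not quoted from the source. A short Python check, with function names chosen only for illustration:

```python
def intersections_iterative(sets: int) -> int:
    """Sum the running totals 0, 1, 3, 6, ... for `sets` accounts."""
    adder = 0
    total = 0
    for loop in range(sets):
        adder += loop   # triangular number reached at this step
        total += adder  # accumulate into the final count
    return total


def intersections_closed_form(sets: int) -> int:
    """Closed form inferred from the table: (n - 1) * n * (n + 1) / 6."""
    return (sets - 1) * sets * (sets + 1) // 6


# Rebuild the whole column for 0 to 21 sets.
table = [intersections_iterative(n) for n in range(22)]
```

Both functions agree for every row from 0 to 21 sets, which suggests that the iterative procedure and the closed form describe the same progression.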

PseudoCode
sets = 21
adder = 0
count = [empty]
fnlcnt = 0
for loop = 0; loop < sets; loop = loop + 1:
    adder = adder + loop
    count.push(adder)
end
for secondloop = 0; secondloop < count.length; secondloop = secondloop + 1:
    fnlcnt = fnlcnt + count[secondloop]
end
output fnlcnt

From the code we derived an equation that
follows the progression of intersections as the set count
increases. The accompanying table shows the number of accounts and
the possible intersections of those accounts.

The process of calculating all sets, subsets and possible
intersections of 21 accounts can take too much time for
analysis and distribution. Besides the complex calculation,
the overall picture would not have been clear. So we
derived an algorithm that resolves the complexity and maintains
brevity. The algorithm maps the convergence onto key-value pairs.

Algorithm
1) Crawl 5000 followers of all 21 accounts {A1...A21}.
2) Store all the followers in one collection, keeping duplicates.
3) Find the number of repetitions of each element in the collection.
4) Generate a key-value pair for each element in the
collection along with its number of occurrences.
5) Output it in the form {e1: 12}, {e22: 4}, {e13: 21}.
Here element e13 occurs 21 times, element e22 occurs 4
times and so on.
6) Count the number of keys with the same value, e.g. if {e1: 12}
and {e10: 12}, add them.
7) Finally, generate a new key-value pair in which the values of
the previous object become the keys of the new object and the new
values denote the number of occurrences, e.g. {21: 30}, {20: 15},
{19: 40}, ... The number of followers following 21 accounts is 30,
the number following 20 accounts is 15 and so on.
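The algorithm can be sketched in Python with collections.Counter. This is a minimal sketch under stated assumptions, not the authors' implementation: the crawl itself is omitted, and followers_by_account is a hypothetical mapping from each account to the follower IDs already collected for it.

```python
from collections import Counter


def overlap_histogram(followers_by_account: dict) -> dict:
    """Map k -> number of followers who follow exactly k of the given accounts."""
    # Pool all followers and count how many accounts each one follows.
    # Each account's list is de-duplicated so a follower counts once per account.
    per_follower = Counter(
        follower
        for followers in followers_by_account.values()
        for follower in set(followers)
    )
    # Invert the mapping: the occurrence counts become the keys, and the
    # values count how many followers share that occurrence count.
    return dict(Counter(per_follower.values()))


# Toy data (hypothetical follower IDs, three accounts instead of 21).
sample = {"A1": ["u1", "u2", "u3"], "A2": ["u1", "u2"], "A3": ["u1"]}
histogram = overlap_histogram(sample)
```

In the toy data, u1 follows all three accounts, u2 follows two and u3 follows one, so each of the keys 3, 2 and 1 maps to a single follower. A single pass over the pooled followers avoids enumerating pairwise intersections.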

Using this algorithm, we were able to determine how
many followers mapped to each number of accounts. We
were able to see the mutual convergence across all possible
intersections while keeping the output simple and easy to
calculate. Another reason for applying it was that we were
interested in the number of completely independent
followers and the number of totally common followers.
These maxima and minima indicate
how exclusive the accounts are with respect to each other.
We took the top 21 accounts from our location. They included
politicians, media persons, sportsmen, journalists and show
business icons. Notably, the politicians
were from opposing camps and the sportsmen had nothing in
common with the journalists. We ran our experiment first with
5000 followers and then with up to 10,000 followers, and repeated
it three times, five days apart, at the same times of day. The
reason for repeating after five days is elaborated in Section
(X). We wanted to ensure, with upper-bound confidence,
that within 5 days all our accounts had gained at least 5000 new
followers and there was no overlap in the sample space. The
results showed that the percentage of independent followers,
those who followed just one particular account among A1, A2, ..., A21,
was always in the range of 11%-17%. The percentage of users
who followed at least one other account was thus between 83% and
88% (figures have been rounded off; the actual values are in decimals).
The percentage of followers following all 21 accounts
wavered between 18% and 35%. To maintain fairness, we
repeated the entire experiment after removing one figure from
the Club of 21 and adding another renowned personality who was
not part of the club. This was a control experiment, and the
result was not surprising: the percentage of followers following all 21
accounts was 0 in the most recent 5000 and 0.05% in 100000. So our
original theory stood on solid ground. However, analysis of the
entire data set showed that the provision of followers was selective.
Although the percentage common to 21 accounts was always higher than
the percentage common to 20, the followers common to 18 and
12 accounts had the highest percentages. Since the 21 accounts
are constantly fed followers by services, they remain
at the top in terms of follower count and hence keep
appearing in the recommendations for new accounts, making the
job easier for black market service providers. Another key
factor about these 21 accounts is the behavior of their gain
in followers. We discuss that in Section 4, where we
introduce our new techniques to catch fake followers.
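Percentages of the kind reported above (independent followers, followers shared with at least one other account, followers common to all 21) can be read off a histogram that maps "number of accounts followed" to "number of followers". The sketch below is illustrative only: the function name and the toy histogram are assumptions, and the numbers are not the paper's data.

```python
def overlap_percentages(histogram: dict, total_accounts: int) -> dict:
    """Share of followers that are independent, shared, or common to every account."""
    total = sum(histogram.values())
    if total == 0:
        raise ValueError("histogram is empty")
    independent = histogram.get(1, 0)              # follow exactly one account
    common_all = histogram.get(total_accounts, 0)  # follow every account
    return {
        "independent_pct": 100.0 * independent / total,
        "shared_pct": 100.0 * (total - independent) / total,
        "common_all_pct": 100.0 * common_all / total,
    }


# Toy histogram (made-up values): 100 followers in total.
toy = {1: 15, 12: 30, 18: 30, 21: 25}
result = overlap_percentages(toy, total_accounts=21)
```

For the toy histogram, 15 of 100 followers are independent, 85 follow at least one other account and 25 follow all 21 accounts.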

C. SECTION 3
Next we explored the similarities in the follower counts
of the selected 21 accounts. If they were being automatically
provided followers by services, then there should be a pattern
in the follower gain and also in the accounts created. This is
related to the work mentioned in the previous section and
also to the coming segment, where we propose strategies for
catching fake followers. Note that in our
cluster of 21 accounts, no account has fewer than 1.5 million
followers. If we crawl the follower count of a target account
every 15 minutes for a day, and then for the next 5 consecutive
days, we can safely assume the following about the
account, based upon normal human behavior:
1) There cannot be one specific pattern of followers
distributed over the day.
2) The patterns of two days cannot be similar.
3) The patterns of two accounts over the same day cannot
be correlated.
4) The patterns of two accounts over an increasing number
of days cannot reflect similarity.
5) As the number of accounts increases, their dissimilarity
should increase.
6) As the number of days increases, the pattern of gain
in followers should be dissimilar.
7) Their autocorrelation and cross-correlation should
decrease as the sample size increases.

These assumptions are based upon the ground realities of
human behavior. Even if a bulk of users follows a bunch of
accounts at a certain moment on one day, the same behavior is
unlikely to be replicated the next day, and the probability
decreases as the sample space of accounts, users and time
increases. To reflect this, we also crawled
followers of other renowned test accounts that are not
part of the Club of 21, as a control experiment. The follower counts
of the 21 accounts were sampled every 15 minutes, so over the duration
of 24 hours we obtained 96 discrete counts. Fig. X shows
the follower counts of four key accounts over the day; the five
curves above the base curve show the follower
gain for the remaining five days.

Fig. 7

Fig. 8

Fig. 9

Fig. 11

Fig. 12

Fig. 13

Fig. 14

Four accounts and their figures were considered sufficient
for analysis, but the similarities extend to all 21. A
close look at the figures reveals that almost all our
assumptions about normal behavior are violated, and the accounts
are therefore to be considered suspicious. All the curves depict a specific
behavior over the day with very little deviation. Surprisingly,
similar behavior is followed the next day and continues
for six days. This means that, on average, the same
number of followers is garnered every day with consistent
frequency, and that this unnatural behavior repeats
over six days, not just for one account but for all four accounts
mentioned here and the 21 accounts that we have identified.
The curves show the same rise over the same duration and a constant
drop at equal durations. This substantiates the idea that the accounts
subscribed to the same service provider or are beneficiaries
of the same service provider. Since the time axis is divided
into 15-minute intervals, another peculiar behavior was noticed: within
fifteen minutes, a sudden drop in followers was observed.
For example, User 1 lost 70 followers within 15 minutes. Gaining
70 followers in that duration is plausible, but 70 followers
suddenly unfollowing within that period
is suspicious. Upon further scrutiny, we observed that
all the accounts lost around the same number of followers in
that time period. In other words, around 70 to 90 followers
(from the data) unfollowed all 21 accounts within a window
of 15 minutes. The only reasonable explanation is that
Twitter finds a bunch of followers suspicious and
removes them. Since they are common to all 21 accounts, the drop
is observed in the curves of all accounts. To look further into
this, we calculated the autocorrelation of one curve of an
account with the other 5 curves of that account over the duration
of six days. We also calculated the cross-correlation of
one account's total behavior with that of the other three accounts.
Then we found the net gain in followers by subtracting each
follower count from the count retrieved 15 minutes later. For
example, if the count for an account A is 30 now and
becomes 50 after 15 minutes, then the net gain is
50 - 30 = 20. So for all discrete values we obtained the corresponding
net gain or loss in followers. Taking their average over 96
data points gave us the average gain in followers for that day.
Using the next 96 data points, we obtained the average
for the second day and so on. The intent was to see whether the
average gain in followers per day remains within a general range.
Deviations were expected, since these accounts are famous and
have a strong propensity to gain genuine followers too.
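The net-gain and daily-average computation can be sketched as follows. This is a minimal illustration: the function names are assumptions, and the sketch treats a day as 96 consecutive gains, so it needs one more count than gains (96 gains require 97 counts). The text does not say how the first sample of each day was handled.

```python
def net_gains(counts: list) -> list:
    """Difference between each follower count and the one sampled before it."""
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


def daily_average_gain(counts: list, samples_per_day: int = 96) -> list:
    """Average net gain per 15-minute sample for each complete day."""
    gains = net_gains(counts)
    averages = []
    # Walk over the gains one full day at a time; a trailing partial day is ignored.
    for start in range(0, len(gains) - samples_per_day + 1, samples_per_day):
        day = gains[start:start + samples_per_day]
        averages.append(sum(day) / samples_per_day)
    return averages


# Worked example from the text: 30 followers now, 50 after 15 minutes.
example_gain = net_gains([30, 50])
```

A sudden negative entry in the gain series corresponds to the kind of drop described for User 1.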
However, the more predictable the data, the lower the chances
that the followers are genuine. Given measurements $Y_1, Y_2, \ldots, Y_N$ at
times $X_1, X_2, \ldots, X_N$, the lag-$k$ autocorrelation function is
defined as

\[ r_k = \frac{\sum_{i=1}^{N-k} (Y_i - \bar{Y})(Y_{i+k} - \bar{Y})}{\sum_{i=1}^{N} (Y_i - \bar{Y})^2} \]

Although the
time variable $X$ is not used in the formula for autocorrelation,
the assumption is that the observations are equally spaced.
Autocorrelation is a correlation coefficient. However, instead of
a correlation between two different variables, the correlation is
between two values of the same variable at times $X_i$ and $X_{i+k}$.
Autocorrelation is used to detect non-randomness. A positive
(negative) autocorrelation means that an increase in the time
series is often followed by another increase (a decrease). If
the autocorrelation is close to 1, then an increase is almost
certainly followed by another increase; in other words, the
average value of the time series is increasing. Alternatively, a
decrease is almost certainly followed by a decrease; in other
words, the average level of the time series is decreasing. The
trend part follows trivially.
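The lag-k autocorrelation can be computed directly from its definition: the sum of products of deviations from the mean, taken k samples apart, divided by the total sum of squared deviations. The function below is a minimal pure-Python sketch with an illustrative name. It assumes equally spaced samples, such as the 15-minute follower counts.

```python
def autocorrelation(series: list, lag: int) -> float:
    """Lag-k autocorrelation r_k of an equally spaced series."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    # Denominator: total sum of squared deviations over all N points.
    denominator = sum((y - mean) ** 2 for y in series)
    if denominator == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Numerator: products of deviations that are `lag` samples apart.
    numerator = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return numerator / denominator


steady_rise = autocorrelation([1, 2, 3, 4, 5], lag=1)   # positive
alternating = autocorrelation([1, -1, 1, -1], lag=1)    # negative
```

A steadily rising series gives a positive lag-1 value, while a series that flips sign at every step gives a negative one.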


Tables below show the autocorrelation matrix of all four
accounts
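The pairwise comparison between accounts can be sketched in the same spirit. The source does not state which cross-correlation estimator was used, so the zero-lag Pearson correlation below is an assumption, and the names pearson and correlation_matrix are hypothetical.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Zero-lag Pearson correlation between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denominator = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denominator == 0:
        raise ValueError("a constant series has no defined correlation")
    return sum(a * b for a, b in zip(dx, dy)) / denominator


def correlation_matrix(series_by_account: dict) -> dict:
    """Pairwise correlation for every ordered pair of accounts."""
    return {
        (a, b): pearson(xs, ys)
        for a, xs in series_by_account.items()
        for b, ys in series_by_account.items()
    }


# Toy gain series (made-up values) for two hypothetical accounts.
matrix = correlation_matrix({"user1": [1, 2, 3], "user2": [2, 4, 6]})
```

Values close to 1 across unrelated accounts would point to the kind of synchronized follower gain the text describes.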
II. CONCLUSION

ACKNOWLEDGMENT
The authors would like to thank...
