
Abstract- Social networks are a popular medium for exercising freedom of speech. Such speech concerns people of different religions, geographical locations, races, etc. However, different religious and ethnic groups often take exception to what is said under freedom of speech, which may sometimes stir people into action. Although the concerned authorities are regulating freedom of speech on social sites, automated systems are required to implement such regulations. We consider spiritual belief to be the domain most targeted for spreading hatred speech. This work focuses on devising a methodology that can filter tweets/messages on Twitter and identify hatred speeches. We have investigated the utility of supervised learning algorithms, i.e. Support Vector Machine (SVM), Naive Bayes (NB) and k Nearest Neighbors (kNN), for performing this task. The SVM, NB and kNN classifiers are applied to tweets first to categorize opinions and then to find their sentiment polarity. In both cases, SVM showed better performance than NB and kNN.

Index Terms- Sentiment Analysis; Twitter; Freedom of Speech; Religion; Hatred Speech.

I. INTRODUCTION

Over the past decade, social media has emerged as a dynamic form of worldwide interpersonal communication. It lets users share information continuously, make connections and convey their thoughts across the world through different media. Today, Facebook boasts 1.3 billion users, 82% of whom are from outside the United States and Canada. Twitter has 270 million active users, and 500 million tweets are sent per day. Furthermore, each day more than 4 billion videos are viewed on YouTube and 60 million photos are uploaded to Instagram [5]. Owing to the wide reach and popularity of social media websites, organizations also use them to plan and mobilize protests and public demonstrations [1]. Twitter is a well-known platform for sharing opinions and information, and it is mostly used before, during and after live events [3]. Twitter plays an active part in any major event happening around the world. Access to social media websites means access to freedom of speech and a voice [4]. Freedom of speech is understood here as the freedom to express views or opinions on any issue or to post any material.

The definition of hate speech is often contested, and it generally lies in a complex nexus with freedom of expression, individual, group and minority rights, and the concepts of equality, liberty and dignity. Hate speech generally refers to expressions, speech, gestures or writing that advocate, threaten or encourage violent acts. Popular social media websites, blogs and forums are frequently misused by many groups to promote online radicalization (also referred to as cyber-extremism, cyber-crime and cyber-hate propaganda). Research [22, 25, 26] shows that extremist groups post hateful speech and videos, as well as offensive and violent comments and messages, in support of their mission. Many hate-promoting groups use popular social media websites to promote their ideology by spreading extremist content among their viewers [2]. Hate speech can stay online across multiple platforms for a long period of time, and it can be linked to repeatedly. According to Oboler (CEO of the Online Hate Prevention Institute), “The longer the content stays available, the more damage it can inflict on the victims and empower the perpetrators” [27]. If the content is removed at an early stage, exposure can be limited. This is like cleaning up litter: it does not stop people from littering, but if the problem is not taken care of, it piles up and gets worse.

Opinion mining and sentiment analysis refer to techniques for detecting and extracting subjective information in text documents [28]. Using sentiment analysis, we can find the overall contextual polarity that an author expresses toward a topic in a document. A challenging task in opinion mining is sentiment classification, which infers the opinion expressed about a target, e.g. a book, a movie, a product, or an issue in politics or religion. Opinions can be expressed at the sentence, document or feature level, and the task is to label them as positive, negative or neutral.
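As an illustration of the labelling task described above, the following is a minimal sketch of a three-class polarity classifier based on multinomial Naive Bayes with add-one smoothing, written in plain Python. It is not the implementation used in this work: the class name NaiveBayesPolarity, the whitespace tokenizer and the six training sentences are all assumptions invented for the example, and a realistic system would need far more data and better preprocessing.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately crude tokenizer."""
    return text.lower().split()


class NaiveBayesPolarity:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, n_docs in self.class_counts.items():
            # Start from the log prior, then add smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            n_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (n_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy examples, invented purely for illustration.
TRAIN_TEXTS = [
    "i love this peaceful community",
    "what a wonderful respectful talk",
    "i hate these awful people",
    "this is a terrible disgusting idea",
    "the meeting starts at noon",
    "the report was published today",
]
TRAIN_LABELS = ["positive", "positive", "negative", "negative", "neutral", "neutral"]

model = NaiveBayesPolarity().fit(TRAIN_TEXTS, TRAIN_LABELS)
```

Calling model.predict on a new sentence returns one of the three labels. Working in log space avoids numerical underflow when many small probabilities are multiplied, and the add-one smoothing keeps a word that was never seen with a given class from forcing that class's probability to zero.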

In this work, we apply opinion mining techniques to tweets related to spiritual beliefs. The main aim is to detect hate speech in tweets and to analyse their sentiment polarity. Section II discusses related work. Section III describes our methodology, including how the tweets were collected. Section IV presents the experiments performed and the results obtained. The article concludes with some remarks in Section V.

International Journal of Advances in Electronics and Computer Science, ISSN: 2393-2835, Volume-4, Issue-1, Jan.-2017

http://iraj.in

Identification of Hatred Speeches on Twitter

II. RELATED WORK

In this section, we present related work on the detection of hatred speech on websites, in newspapers and on social networking sites. Spertus [8], who called insulting messages "flames", used two approaches to flame recognition: human message rating and message classification (machine classification). In human message rating, 1222 messages were rated by four speakers of American English (two males and two females), who marked a message as a flame if it contained abusive or insulting language. In message classification, the author used the decision tree generator C4.5 and the MLC++ utilities to build a classifier, which achieved 64% accuracy in detecting flames. The author also provided a prototype system, Smokey.

In [7], an attempt was made to design a generic hatred speech identification system that catches hateful phrases in web pages. Two datasets were used: NSM (Natural Semantic Module) company log files containing 372 sentences, and Usenet newsgroup messages (1288 in total) that were already annotated for flame detection, i.e. for containing abusive phrases. A three-level classification method was used. At the first level, a Naive Bayes (NB) classifier was used to select the most discriminative features. At the second level, a Multinomial Updatable NB classifier was used; after initial trimming, it provided newly labelled sentences that were added to the system for adaptive learning, which also involved extracting new features from the previously labelled data. Finally, a rule-based classifier, the Decision Table/Naive Bayes hybrid (DTNB), took the data processed at the second level and made the final decision on a given instance. An accuracy of 68% was achieved on the most frequent flames using 10-fold cross-validation, which shows the inefficiency of the generic hatred speech identification method.
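The 10-fold cross-validation mentioned above can be sketched as follows. This is a generic illustration in plain Python, not the evaluation code of [7]: the function names k_fold_indices, cross_validate and train_majority are hypothetical, the folds are contiguous and unshuffled for simplicity, and the majority-class learner is only a placeholder for a real classifier such as NB or DTNB.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    if n_samples < k:
        raise ValueError("need at least as many samples as folds")
    # The first (n_samples % k) folds receive one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(samples, labels, train_fn, k=10):
    """Return the mean accuracy over k held-out folds.

    train_fn(train_samples, train_labels) must return a predict(sample) callable.
    """
    accuracies = []
    for test_idx in k_fold_indices(len(samples), k):
        held_out = set(test_idx)
        train_x = [s for i, s in enumerate(samples) if i not in held_out]
        train_y = [y for i, y in enumerate(labels) if i not in held_out]
        predict = train_fn(train_x, train_y)
        correct = sum(predict(samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def train_majority(train_samples, train_labels):
    """Placeholder learner: always predicts the most common training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda sample: majority
```

Each message is held out exactly once, so the reported accuracy is an average over predictions made on data the classifier did not see during training. In practice the data would usually be shuffled, and often stratified by class, before being split into folds.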
