Information & Management: Jiexun Li, Xin Li, Bin Zhu

G Model
INFMAN 2917 No. of Pages 10
Information & Management xxx (2016) xxxxxx
Contents lists available at ScienceDirect
Information & Management

journal homepage: www.elsevier.com/locate/im
User opinion classification in social media: A global consistency maximization approach
Jiexun Li (a,*), Xin Li (b), Bin Zhu (c)
(a) College of Business & Economics, Department of Decision Sciences, Western Washington University, Bellingham, WA 98225, USA
(b) Department of Information Systems, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong Special Administrative Region
(c) College of Business, Oregon State University, Corvallis, OR 97331, USA
ARTICLE INFO

Article history:
Received 17 July 2015
Received in revised form 10 May 2016
Accepted 5 June 2016
Available online xxx

Keywords:
Big data
Social media
Opinion mining
Collective classification

ABSTRACT

Social media is a major platform for opinion sharing. In order to better understand and exploit opinions on social media, we aim to classify users with opposite opinions on a topic for decision support. Rather than mining text content, we introduce a link-based classification model, named global consistency maximization (GCM), that partitions a social network into two classes of users with opposite opinions. Experiments on a Twitter data set show that: (1) our global approach achieves higher accuracy than two baseline approaches and (2) link-based classifiers are more robust to small training samples if selected properly.

© 2016 Elsevier B.V. All rights reserved.
* Corresponding author.
E-mail addresses: jiexun@gmail.com (J. Li), xin.li.phd@gmail.com (X. Li), bin.zhu@oregonstate.edu (B. Zhu).
http://dx.doi.org/10.1016/j.im.2016.06.004
0378-7206/© 2016 Elsevier B.V. All rights reserved.
Please cite this article in press as: J. Li, et al., User opinion classification in social media: A global consistency maximization approach, Inf. Manage. (2016), http://dx.doi.org/10.1016/j.im.2016.06.004

1. Introduction

In electronic commerce, a major challenge faced by companies is to understand their customers' opinions to design and conduct marketing campaigns. The emergence of big-data platforms and technologies makes it possible for e-commerce companies to access and analyze data about a large population of (potential) customers. In particular, social media is a major source and battlefield of e-commerce big-data analytics, due to its growing popularity for information distribution and opinion sharing. Mining data in social media to gain marketing intelligence has been identified as one of the major applications of big-data analytics [1]. The analysis of social media data not only enables businesses to have more effective conversations with their customers [2] but also facilitates the understanding of people's opinions about their products/services [3]. Zhao et al. [8] identified social media analytics as a critical complement to traditional survey analytics to increase the validity of market studies. Online opinions have been used in a variety of analytic solutions, such as sales prediction [4,5], search engine design [6], and financial market forecasting [7].

Mining social media data requires big-data solutions. For instance, microblogging sites like Twitter are growing at a high rate with hundreds of millions of active users and postings generated per day. In order to mine data of such large volume and high velocity, businesses must be equipped with advanced analytical technology to obtain an accurate overview of the social opinion landscape for better decision support. Furthermore, social media contains a mixture of various types of data, such as textual content, multimedia content, and social interactions. Analytical solutions for social media must be capable of modeling such data variety. Hence, among the main research directions in business intelligence and analytics [9], mining opinions from noisy unstructured text in streams and mining link structures in social networks pose significant challenges and opportunities for the big-data analytics research community.

In order to successfully make use of opinions in social media for decision making, it is important to classify individuals' viewpoints into two sides of a debate. Many business questions can be formulated as a binary decision, such as "Should our product have a larger screen size?" or "Do we need a luxury version of our current product/service?" It is crucial for companies to accurately estimate the "for" and "against" populations in social media platforms and identify target groups for marketing campaigns. Because it is difficult to fully understand the demographics or opinions of all users before starting a campaign, it is a common practice to begin with a select group of seed users.

Most traditional methods of opinion mining try to classify users' opinions by analyzing the text content of the postings [10]. In social media such as Twitter, the performance of such approaches is often unsatisfactory due to the short length of
online postings and informal writing styles. Furthermore, with a limited number of seed users as training data, content-based opinion classifiers often fail to achieve high classification accuracy.

Enlightened by the variety nature of big data, our study takes a different perspective to address the opinion classification problem. As we know, the network structure of users interacting with each other in social media contains rich information suggesting opinion groups. How opinions are formed, divided, and spread is often guided by the principles of homophily and social influence. In microblogging, as participants receive messages only from those who they choose to follow, it is highly unlikely that they will follow a person whom they do not like or care about. Hence, the following relationships among users may suggest a layer of homophily that can be exploited to identify opinion camps. Another valuable information source for opinion mining is the retweeting relationships among users. According to studies investigating the motivation behind online activities, microblog participants usually retweet a message because they think it could help others make decisions [11]. They may also retweet messages to stay connected with others who receive the message or show support to the sender of the message [12]. As a result, in the context of microblogging, people are less likely to retweet a message if they disagree with the content of the message or dislike the sender of the message.

On the basis of social influence theories, link structures among users could also suggest users' opinion. Most existing studies take a local view and classify each user's opinions separately. In such approaches, early errors may propagate to later classifications and thus lead to lower accuracy. We propose a novel method that takes advantage of the global structure of social interactions to alleviate the opinion classification problem in a collective manner. In particular, we model the individuals involved in a topic discussion as a graph according to their communication linkages. Our conjecture is that this graph reflects stabilized social relations where people with common opinions tend to interact more and people with opposite opinions tend to communicate less. On the basis of this conjecture, we propose a global consistency optimization algorithm to collectively classify the opinions of users involved in debates in social media. Our research shows that our proposed opinion classifier is not only accurate but also robust to the size of seed users as long as the seeds are chosen appropriately.

We consider our study a novel and significant contribution to the design of analytical solutions in electronic commerce. Our work focuses on analyzing the large volume of everchanging opinion data generated by users in social media. In particular, our proposed approach takes advantage of the variety of social media data by leveraging the link relationships among users to achieve highly accurate classification of opinions. Furthermore, the global maximization algorithm shows significant robustness to the size of training set. This strength of our approach can help mitigate a common problem in big-data analytics, that is, the limited amount of labeled data. Our research has significant implications for both researchers and practitioners of big-data analytics in electronic commerce.

2. Literature review

Opinion mining is an important problem in text mining [10], which has been examining the subjectivity and polarity of text. In the context of this research, we are more interested in the classification of opinion polarity, that is, positive or negative. In existing literature, related work on opinion classification can be divided into content-based and link-based approaches.

2.1. Content-based approaches

Opinion classification is often based on textual contents, which can take two approaches: lexicon-based and learning-based methods. Lexicon-based approaches use lexicons and predefined rules to annotate sentiments of text [13]. For example, Demers and Vega [14] used a lexicon approach to measure the tone of news. Learning-based methods use machine-learning techniques upon linguistic features, including the lexicon features if available, to build opinion classification models. For example, Yang et al. [15] applied association rules and a Naïve Bayes (NB) classifier to classify the sentiments of online consumer reviews on e-commerce websites.

With the rise of social media, opinion mining is widely applied to social media applications such as Twitter and Facebook. Most existing studies used the textual contents to resolve this problem. Mostafa [16] used a lexicon-based approach to classify brand sentiments. Li and Xu [17] constructed a rule-based system to detect the events in microblog posts that cause emotional effects. In the machine-learning approach camp, Sayeedunnissa et al. [18] took a classic NB approach with the bag-of-words model and information gain-based feature selection to classify Twitter sentiments. Akaichi et al. [19] and Hamouda and Akaichi [20] used support vector machine (SVM) and NB approach to classify Facebook status sentiments. Myslin et al. [21] compared multiple classic machine-learning approaches to classify Twitter sentiments on tobacco products. It is worth noting that Hassan et al. [22] proposed a bootstrapping ensemble approach to address the Twitter sentiment classification problem. The approach provided more accurate and balanced predictions and built sentiment time series that better reflect events eliciting strong sentiments from users. There were also studies combining machine-learning with lexicon-based approaches to classify sentiments [23,24]. In the machine-learning approach, clustering methods were also used to analyze social media opinions. For example, Paltoglou and Thelwall [25] proposed an unsupervised lexicon-based approach that estimates the level of emotional intensity contained in text. Feng et al. [26] used PLSA to cluster blogs with sentiments as a latent variable, which was able to find sentiment coherent groups.

2.2. Link-based approaches

Social media provides a new platform for users to communicate and interact with each other. When a user posts a message online to express his/her opinion, it can incur a series of responses such as compliments, praises, disagreements, and even attacks from other users. Each of these responses can lead to even more responses in a spreading manner. Such response relationships form a social network of opinionated users. The linkage (i.e., relationship) information has become new evidence that can help distinguish users' opinions. Because users are connected to each other in a social network and their class labels are intercorrelated, opinion mining has become a collective classification problem [27].

Many studies have tried to use the linkage information embedded in social networks for classifying users' opinions in social media. However, they view the semantics of linkages in social networks differently. In an early effort of link-based opinion classification, Agrawal et al. [28] analyzed a social network formed by respond-to relationships in newsgroups. By assuming that a respond-to relationship represents disagreement, they use a max-cut graph-partitioning algorithm to break highly weighted disagreeing edges and separate users into two groups. Their assumption of respond-to indicating disagreement may not hold in other popular social media platforms. On Twitter, following or retweeting someone is more likely to mean that the users agree
with each other or share a similar opinion. Therefore, more studies on link-based opinion mining are based on the homophily assumption [29], that is, the phenomenon of "birds of a feather flock together" [30]. Users who are connected by a mutual relationship are more likely to share common opinions. In the context of opinion classification on a social media platform like Twitter, researchers have investigated users' mutual relationships in a variety of forms, including follow [31–34], mention [32,35], or retweet [35–37]. In particular, using @ mentions, as a way of creating connections on Twitter, may also indicate a desire to pay attention (e.g., to information of interest). Among the homophily connections, several studies suggested that the follow links had little positive impact on classification accuracy [31]. Conover et al. [35] found that combining attention links with follow links is superior to using follow links alone. Wong et al. [36] also argue that following (a tweeter) is not a robust indicator of approval or agreement on political opinions. A user may follow two sources on Twitter with opposite political stances to obtain a more unbiased comprehensive view. A follow link may exist as a stale edge in the Twitter following network, simply because a user forgets to unfollow a prominent tweeter whom he/she is no longer interested in. By contrast, retweeting is often an explicit act of approval and therefore stronger evidence for agreement in opinions.

On the basis of the homophily assumption, a variety of analytical techniques have been developed for opinion classification. One of the most popular techniques used in several studies is label propagation (LP), which analyzes the labels in a node's neighborhood and tries to assign a label to each node in an iterative manner. Ren et al. [38] used this method for determining class labels of customer reviews based on a graph consisting of links representing similarity between nodes. In order to predict the political alignment of Twitter users, Conover et al. [35] studied two types of communication networks, based on mention edges and retweet edges, and proposed a solution of community detection using an LP algorithm [39]. Speriosu et al. [31] developed an opinion polarization approach by combining both lexical links (e.g., text, hash tags, and emotions) and following links in a graph. A semi-supervised LP algorithm was used for classification with different seeding methods. In addition, for tweet-level opinion classification, Rajadesingan and Liu [37] made an assumption that two tweets being retweeted by the same users within a short time period are likely similar in terms of opinion. They introduced a graph-based algorithm, namely ReLP (retweet label propagation), that starts with a set of seeds and iteratively propagates labels to similar tweets. Tan et al. [32] proposed an opinion polarization approach by incorporating social network information such as follow and mention linkages. Their graph-based classification models consist of loopy belief propagation to infer user-level sentiment labels. Rabelo et al. [33,34] applied a relational classifier combined with a relaxation labeling algorithm on a Twitter follower network to collectively predict political polarity of users. All the aforementioned studies showed that classification models based on links in a graph outperformed traditional content-based opinion classifiers.

Most of these existing collective opinion classifiers, such as LP [35] or relaxation labeling [33,34], take a local view and label each node based on the labels of its direct neighbors. Such a labeling process iterates in the graph until the solution converges, which may be highly computationally expensive. More importantly, propagating labels based on dependencies in local neighborhoods may not necessarily lead to a global optimal solution.

In optimization theory and graph theory, minimum cut (min-cut) is a combinatorial optimization problem that partitions the vertices of a graph into two disjoint subsets such that the total capacity of the removed edges is minimum [40]. The min-cut method and its dual, max-flow, have been widely used in a number of real-world applications such as project scheduling [41], gene function prediction [42], and image segmentation [43]. Unlike LP, the min-cut method takes a global view and finds an optimal partition of the entire graph to reach maximum consistency in the two subgraphs. To the best of our knowledge, no prior study has used the min-cut approach to classify users' opinions on Twitter.

2.3. Research questions

Among the state-of-the-art techniques for opinion classification, content-based approaches (e.g., NB classifiers) [18–20] ignore the rich information of social linkages among users, while link-based approaches (e.g., LP) [31,35,39] exploit linkage information but take a local view. Our study, based on the homophily assumption and taking a global view, is aimed at modeling users' opinions collectively in the entire structure of the network. In this study, we investigate whether such a global optimization approach can outperform the state-of-the-art opinion classifiers in terms of accuracy.

Furthermore, traditional content-based opinion classification methods generally require a training data set with a reasonably large number of labeled instances. Data annotation is known to be a tedious task that requires a large amount of time, effort, and domain expertise. Given the enormous volume of data on social media platforms, how many data instances (e.g., tweets) do we need to review and label for training to guarantee the potential capability of the classifier? We are interested in determining the robustness of opinion classifiers to the training set size and the method of choosing the most important data instances for training opinion classifiers. In particular, this study is aimed at addressing the following research questions:

Q1. Can a collective classifier based on global optimization outperform existing models for opinion classification?
Q2. How robust are opinion classifiers to different sizes of training data sets?
Q3. What is the best strategy to choose seed users as training data for opinion classification?

3. Model

Debates in social media are often a dynamic process involving heated discussion between two sides. The involved users, however, may only interact with a portion of users to express their opinion. In circumstances where one feels that his/her opinion is fully expressed, he/she may not generate much content. In such circumstances, predicting each user's opinion by analyzing his/her postings and relationships separately, as in most existing work, may not be the best solution. Rather, we introduce a novel approach that takes into account the interdependencies between users and classifies user opinions in a collective manner. Our approach builds an undirected graph to represent users involved in an online debate. On the basis of the homophily assumption, we model opinion classification as a global consistency maximization (GCM) problem [42]. Our collective opinion classifier can find an optimal solution by partitioning the graph into two components, each representing one side of the debate.

3.1. Problem formulation

We represent a social network as an undirected graph: G(V, E). In such a graph, V: {v_1, v_2, ..., v_n} is a set of n vertices/nodes, in which each node represents an online user. E: {e_ij} is a set of edges/links, in which each link e_ij represents a certain relationship(s) between two users i and j. The users discuss certain topics. For a specific topic, a user i may hold an opinion state x_i. We assume only two possible opinion states in this study and set x_i = 1 if user i is a
supporter (for) and x_i = −1 if user i is an opponent (against) of a certain topic.

In practice, it is often easier to identify some highly visible opinion leaders. This group consists of the most vocal and opinionated users such as activists, politicians, and celebrities. We can manually assign the opinion labels (either 1 or −1) to such users as seeds (or training data), denoted by V_v. For the remaining (majority) users in the network, the opinions are unknown (x_i = 0). Our goal is to assign a state value x_i to each user i with x_i = 0 by analyzing the linkage between users in the social network.

3.2. Global consistency maximization

On Twitter, two nodes i and j can have different relationships/interactions, for example:

- Following: user i follows user j, which indicates that i is interested in j.
- Retweeting: user i retweets messages by user j, which often indicates that i is distributing j's opinion, possibly adding his/her own opinion on the topic.
- Commenting/replying: user i comments on messages posted by user j, which indicates that i is expressing an opinion to either agree or disagree with j.
- Liking/favoriting: user i likes messages by user j, which often indicates that i agrees with j.

Each of these relationships between i and j can be regarded as evidence of (dis)agreement between the two users. All the evidence can be aggregated to an agreement score w_ij (associated with each edge e_ij) of their opinions on the topic. It is important to note that w_ij can be either positive (agreement) or negative (disagreement). In this study, we focus only on retweeting relationships. Retweeting is often considered an explicit act of approval and a more robust indicator of agreement than other relationships such as following [36]. Like most related work [35–37], our approach is based on the homophily assumption, that is, when a user i retweets another user j, they tend to share the same opinion on this topic. Therefore, the agreement score w_ij between the two users is proportional to the number of retweets between them (either i retweeting j or j retweeting i).

For each pair of users i and j with opinion states x_i and x_j, we define w_ij x_i x_j as a consistency score of the edge connecting i and j. As w_ij > 0 indicates they are likely to agree with each other, we generally want to set x_i and x_j to be the same (either 1 or −1). Accordingly, if w_ij < 0, we generally expect the two to disagree, and x_i and x_j to have opposite values. Thus, at the social network level, we argue that the optimal opinion state assignment should provide us the highest overall consistency across the network:

\[
\text{Maximize} \quad \sum_{i} \sum_{j} w_{ij}\, x_i\, x_j
\qquad \text{s.t.} \quad x_i, x_j \in \{1, -1, 0\}
\]

In order to solve this optimization problem, we construct a new directed graph H from the undirected graph G (which has users with unknown opinion states). Each node of H corresponds to a node of G. H also contains two new nodes: a source node s (for) and sink node t (against). Unlike edges in G, each edge in H is directed. Fig. 1 illustrates how to transform an undirected graph G (Fig. 1(a)) into a directed graph H (Fig. 1(b)) by creating six different types of edges based on the following rules:

1. For each node i in G with x_i = 1, we create a directed edge from node s to node i with w_si = ∞.
2. For each node i in G with x_i = −1, we create a directed edge from node i to node t with w_it = ∞.
3. For each edge in G connecting node i and node j such that x_i = 1 and x_j = 0, we create a directed edge from node i to node j with edge weight w_ij copied from G.
4. For each edge in G connecting node i and node j such that x_i = 0 and x_j = −1, we create a directed edge from node i to node j with edge weight w_ij copied from G.
5. For each edge in G connecting node i and node j such that x_i = 0 and x_j = 0, we create a directed edge from node i to node j and a directed edge from j to i, with w_ij = w_ji copied from G.
6. The edges in G that are not incident on a node in state 0 are ignored.

[Fig. 1. Transforming an Undirected Graph G into Directed Graph H. Panels: (a) Undirected graph G; (b) Directed graph H, with added source node s and sink node t. Node legend: for, against, to be determined.]

With this new directed graph H, classifying opinions of users is converted to a min-cut problem. An s–t cut in graph H is a partition of the nodes of H into two sets S and T, where S contains the source node s and T contains node t. An edge (u, v) crosses the cut if u lies in S and v lies in T. The weight of the cut is the sum of the weights of the edges crossing the cut. Because a high weight represents a high degree of agreement between two users, separating two highly agreeing nodes on two sides is considered a reduction in consistency. Therefore, our objective is to find the s–t cut C with the smallest weight in H. For each node i in S, we set x_i = 1; for each node i in T, we set x_i = −1. Because we have set the weights of edges incident on s and t (edges of types 1 and 2) to be infinity, they will not be selected to participate in C. Hence, the only edges in C are those incident on nodes with state equal to 0. Each edge in C of types 3 or 4 corresponds to one inconsistent edge in G, an edge between two users who retweeted one another but have opposite opinions. In each pair of edges of type 5, at most one edge can belong to C; according to the definition of the s–t cut, the edge in the pair directed from a node in T to a node in S does not belong to the cut. Therefore, each edge in C corresponds to exactly one inconsistent edge in G in a one-to-one manner. Hence, the cut C with the smallest weight gives the optimal state assignment that minimizes the total weight of inconsistent edges in G.

In the example in Fig. 1, suppose the weights of all edges not incident on s or t are 1 and the min-cut in H is the edge connecting the node in state 0 to the one in state 1. Thus, the algorithm should assign a state of 1 to both nodes in state 0.
4. Experimental evaluation

4.1. Data set

In order to compare and evaluate different opinion classification methods, we use a real-world Twitter data set from Ref. [37]. This data set spans a period of 5 days during the heated debate on gun reform from 15 April 2013 to 18 April 2013. This data set was collected using Twitter's streaming API with keywords such as "gun" and "gun control." The data set consists of 916,171 postings (505,637 original tweets and 410,534 retweets) by 491,860 users. Among the users, there are visibly opinionated users, who are used as seeds for model training, and moderately opinionated users, who are used for performance evaluation.

- Visibly opinionated users: On Twitter, lists are created and maintained by users as a form of groups. For example, lists such as "Protect 2nd Amendment" or "Guns Save Lives" are clearly against gun reform; other lists such as "Prevent Gun Violence" and "Gun Safety" are apparently supporters for gun reform. In this data set, 262 users (84 for and 178 against gun reform) were identified as visibly opinionated users from such related lists. This subset of users (V_v) is used as a training set in our experiments.
- Moderately opinionated users: This group contains users who posted between 2 and 4 tweets during the collection cycle, do not belong to any relevant list, and do not label themselves as for/against gun reform in their Twitter profile page. From a randomly selected sample of 500 users in this group, each was manually annotated as for or against based on their tweets. Out of the 500 users, 276 are for gun reform, 120 are against, and the rest shared information tweets (such as gun reform-related news) but did not voice their personal opinions and were hence ignored. This subset of 396 users is considered as the test set in our experiments.

Given this data set, we build a graph G that represents the social network of all users. In this graph G, each vertex represents a user and each edge represents a relationship between two users. This study only considers the retweet relationship between users. Thus, an edge e_ij between vertices i and j indicates user i retweeted j and/or j retweeted i. The weight of e_ij, w_ij, is defined as the number of retweets between i and j. It is worth noting that 215,669 out of the 491,860 users are isolated vertices in the graph, because they did not retweet or were not retweeted by others. Table 1 provides an overall description of this Twitter data set.

Table 1
Data description of the Twitter data set.

# users (vertices)               491,860
# isolated users                 215,669
# connected users                276,191
# visibly opinionated users      262 (84 for; 178 against)
# moderately opinionated users   396 (276 for; 120 against)
# of postings                    916,171
# of original tweets             505,637
# of retweets                    410,534
# edges                          384,331

4.2. Baseline methods and implementation

For comparison, we develop two baseline methods for classifying users of opposing opinions.

Content-based classifier

A content-based classifier (CC) differentiates two sides of a debate, supporters and opponents, solely based on the content of their postings. During a Twitter debate, each user v_i may express his/her opinion via multiple tweets or retweets {t_ij}. We use a basic bag-of-words method to represent each posting as a feature vector of word occurrences. For each visibly opinionated user, who always takes one side of two opinions, we can label all his/her postings as 1 (for) if he/she is a supporter or −1 (against) if an opponent. Such a collection of postings is used as the training set for CC. In previous opinion-mining research, NB classifiers are found among the top-performing CC [18–20]. Hence, in our experiments, we also choose NB to build an opinion classification model as a baseline. Then, for a user v_i, whose opinion is to be determined, we use the trained NB classifier to predict the opinion of each of his/her postings. For each posting t_ij by v_i, the NB classifier gives a probability score p_ij^+ of the posting t_ij being positive and p_ij^− of it being negative. Then, all predicted probability scores of his/her postings are aggregated by class to determine user v_i's class label x_i as follows:

\[
x_i =
\begin{cases}
1 & \text{if } \sum_{j} p_{ij}^{+} > \sum_{j} p_{ij}^{-} \\
-1 & \text{if } \sum_{j} p_{ij}^{+} < \sum_{j} p_{ij}^{-} \\
0 & \text{otherwise}
\end{cases}
\]

If the sum of positive scores is higher than the sum of negative scores, this user is labeled 1; if the sum of positive scores is less than the sum of negative scores, this user is labeled −1; otherwise, the user's opinion is undetermined and labeled 0.

Label propagation

Among the link-based opinion classifiers, LP is one of the most popular and effective techniques in several studies [31,35,39]. In our experiments, we implement an LP algorithm for opinion classification as a second baseline. Like GCM, LP is based on the same homophily assumption, that is, a user tends to share the same opinion as his/her neighbors. Fig. 2 shows the pseudo-code for the LP algorithm. The algorithm starts with graph G, including a subset of seeds V_v, that is, visibly opinionated users, whose class labels are known. In each iteration, the algorithm traverses all vertices in G in a random sequence. Each vertex i is assigned the majority label of its neighbors. In order to determine the majority label, we take into account the weight w_ij of the edge between vertices i and j, which can be calculated in the same manner as in GCM. In this study, we only consider the number of retweets between two users. The

    Input: G including labeled vertices V_v
    Output: G with all vertices labeled
    Procedure:
      new ← TRUE
      while (new)
        new ← FALSE
        S ← random sequence of all vertices in G
        for each vertex i in S:
          l ← the majority label of i's neighbors
          if i's label x_i != l
            x_i ← l
            new ← TRUE

Fig. 2. Pseudo-code of a Label Propagation Algorithm.
G Model
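The procedure in Fig. 2 can be illustrated with a minimal Python sketch. This is not the authors' implementation: the study builds its graphs with NetworkX, whereas the sketch uses plain dictionaries to stay dependency-free. The function names build_retweet_graph and label_propagation, the (retweeter, author) input format, the fixed seed labels, the tie rule, and the max_iter cap are all illustrative assumptions.

```python
import random
from collections import defaultdict


def build_retweet_graph(retweets, users=()):
    """Return an adjacency dict where graph[i][j] is w_ij, the retweet count between i and j."""
    graph = {user: {} for user in users}  # users with no retweets stay isolated
    for retweeter, author in retweets:
        if retweeter == author:
            continue  # assumption: self-retweets carry no link information
        graph.setdefault(retweeter, {})
        graph.setdefault(author, {})
        graph[retweeter][author] = graph[retweeter].get(author, 0) + 1
        graph[author][retweeter] = graph[author].get(retweeter, 0) + 1
    return graph


def label_propagation(graph, seeds, rng=None, max_iter=100):
    """Weighted-majority label propagation following the loop in Fig. 2."""
    rng = rng or random.Random(0)
    labels = dict(seeds)
    changed = True
    iterations = 0
    while changed and iterations < max_iter:  # max_iter is a safety cap (assumption)
        changed = False
        iterations += 1
        order = list(graph)
        rng.shuffle(order)  # step 4: random traversal order
        for node in order:
            if node in seeds:
                continue  # assumption: seed labels are never overwritten
            votes = defaultdict(float)
            for neighbor, weight in graph[node].items():
                if neighbor in labels:
                    votes[labels[neighbor]] += weight
            if not votes:
                continue  # isolated, or no labeled neighbor yet: stays undetermined
            best = max(votes.values())
            winners = [label for label, total in votes.items() if total == best]
            # assumption: a tie leaves the vertex in its current state
            if len(winners) == 1 and labels.get(node) != winners[0]:
                labels[node] = winners[0]
                changed = True
    return labels


# Toy example: a and d play the role of visibly opinionated seeds; g never retweets.
retweets = [("b", "a")] * 3 + [("c", "b")] * 2 + [("e", "d")] * 2 + [("f", "e"), ("c", "e")]
graph = build_retweet_graph(retweets, users=["g"])
labels = label_propagation(graph, seeds={"a": "for", "d": "against"})
print(sorted(labels.items()))
```

In this toy network, b and c end up on the side of seed a, and e and f on the side of seed d, whatever the traversal order. The isolated vertex g never receives a label, because it has no neighbor from which a label could propagate.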
The propagation process stops when there is no further change on the vertices' labels.

Our experiments are conducted on a laptop computer with the following configuration: Intel Core i5-4310U CPU @ 2.00 GHz 2.60 GHz, 8 GB RAM, 64-bit Operating system, Windows 8.1 Enterprise. All three opinion classification models are implemented in Python. For both link-based classifiers, LP and GCM, we use a well-known Python package for network analysis called NetworkX (Version 1.8.1) (https://networkx.github.io/) to build the undirected graph G and the directed graph H for GCM. A function named minimum_edge_cut() in NetworkX is used to compute the min-cut for GCM. In our evaluation, CC trains the classifier using the tweets by the 262 visibly opinionated users and predicts the labels of the 396 moderately opinionated users, with no need to classify all other users in the large network of approximately 500,000 users. The computing time (including training and testing) for CC was 105 s. By contrast, both LP and GCM require computing over the entire network(s) of connected nodes, not just those 396 in the test set, and hence consume more time. In particular, in our experiments, the computing time for LP was 5013 s and that for GCM was 6351 s. It is worth noting that, although seemingly more time-consuming, LP and GCM complete classifying all 276,191 connected users in the network, while CC classifies only 396 for evaluation. Nevertheless, the efficiency of the link-based classifiers is not yet satisfactory and we encourage further research for improvement.

4.3. Evaluation metrics

Our experiments compare the performance of three opinion classifiers: CC, LP, and GCM. This task is a binary classification of users into two categories of opinions: "for" versus "against". We use standard measures, that is, accuracy, precision, recall, and F-measure. Accuracy assesses the overall correctness of classification. Precision, recall, and F-measure evaluate the accuracy of each class. These four metrics are defined as follows:

\[ \mathrm{accuracy} = \frac{\#\ \text{of correctly classified instances}}{\#\ \text{of all classified instances}} \]

\[ \mathrm{precision}_i = \frac{\#\ \text{of instances correctly classified in class } i}{\#\ \text{of instances classified in class } i} \]

\[ \mathrm{recall}_i = \frac{\#\ \text{of instances correctly classified in class } i}{\#\ \text{of instances in class } i} \]

\[ F\text{-measure}_i = \frac{2 \times \mathrm{precision}_i \times \mathrm{recall}_i}{\mathrm{precision}_i + \mathrm{recall}_i} \]

It is worth noting that, for the two link-based classifiers LP and GCM, if a user is an isolated vertex in the graph without any retweeting edges with others, the classifiers are unable to determine its label. Hence, such vertices will be labeled as "undetermined". In our evaluation, such undetermined users are also counted as errors.

5. Results

5.1. Classification performance

In order to answer our research question Q1, Tables 2 and 3 show the results of the three opinion classification models. The two link-based classifiers, LP and GCM, achieve higher classification accuracy than the CC by >20%. In particular, GCM achieves the highest overall accuracy of 93.43%. For both classes, GCM achieves the highest precision, recall, and F-measure: 94.35%, 96.74%, and 95.53% for class "for"; 92.79%, 85.83%, and 89.18% for class "against", respectively. Of the 396 users in the test set, LP and GCM provide almost identical classification results, except for three users. In particular, of these three users, GCM incorrectly classifies one to the "against" category, while LP incorrectly classifies two, one to "for" and the other to "against". By checking the linkages of these three users, we find that each of them has two retweeting linkages to other users. When the two linked nodes are assigned opposite labels, link-based classifiers have difficulty in determining the correct label for this node, unless additional information (e.g., content and weights on linkages) is available and considered.

Table 2
Classification matrices of three opinion classification methods.
Method   Predicted       Actual: For   Actual: Against
CC       For             184           17
         Against         92            103
         Undetermined    0             0
LP       For             267           17
         Against         8             102
         Undetermined    1             1
GCM      For             267           16
         Against         8             103
         Undetermined    1             1

Nevertheless, it is worth noting that the link-based classifiers share a limitation when predicting the label of a node: the node has to be directly or indirectly connected with some labeled nodes in the graph. In our experiments, because the test set contains two moderately opinionated users who are isolated vertices in the graph, neither LP nor GCM can determine their class label. This limitation, however, is not a problem for CC, because it predicts the class label of a user based on the content of his/her tweets. For the 215,669 isolated users of the 491,860 users in our data set, predicting their class labels would require a content-based classifier such as the CC implemented in this study, which achieves a classification accuracy of 72.47%.

5.2. Training sample selection

Because both LP and GCM are link-based, they can infer the class label of a user based on what is known of his/her associations. In order to answer our research questions Q2 and Q3, we conducted further experiments and investigated how the link-based methods perform in response to different sizes of training samples.

In our Twitter data set of 491,860 users, 262 are identified as visibly opinionated users (for: 84; against: 178). We then examine the robustness of the three classifiers by reducing the number of visibly opinionated users in the training set. In the context of Twitter, there are different ways of choosing users to create a training set. We consider the following four ranking methods in our experiments:

1. Rank by the number of tweets (# tweets): This indicates how active a user is when disseminating information and promoting his/her opinions.
2. Rank by the number of followers (# followers): This number is directly available on Twitter to show the popularity of the user.
3. Rank by the number of times being retweeted (# retweeted): Retweeting is a major mechanism on Twitter for disseminating information and opinions. A highly retweeted user tends to be
one with strong opinions and high influence on a particular topic.
4. Rank by the number of times being retweeted and retweeting others (degree): In a directed graph that shows the retweeting network among users, this number is the degree (in-degree + out-degree) centrality score of each node.

Table 3
Prediction performance of the three opinion classification methods.
                         For                                          Against
Methods   Accuracy (%)   Precision (%)   Recall (%)   F-measure (%)   Precision (%)   Recall (%)   F-measure (%)
CC        72.47          91.54           66.67        77.15           52.82           85.83        65.40
LP        93.18          94.01           96.74        95.36           92.73           85.00        88.70
GCM       93.43          94.35           96.74        95.53           92.79           85.83        89.18

For each ranking method, we compare the performance of the three opinion classifiers (CC, LP, and GCM) by varying the percentage of top-ranked users included in the training set. In particular, when the percentage of top users is 100%, all visibly opinionated users are included in the training set. Then, we gradually decrease the percentage until only the topmost user from each class is included. Figs. 3-5 show the average precision, recall, and F-measure scores of robustness tests for the three classifiers.

As shown in Fig. 3, as the size of the training set decreases, the performance of CC is initially stable and starts decreasing when the percentage reaches about 25%. The irregular shapes of the curves near the left end (percentage < 10%) indicate that the classifiers fail due to insufficient training data and therefore assign all or most instances in the test set to only one of the two classes. Among the four ranking methods, the "rank by # tweets" method seems to be more robust and can maintain performance until the training set size is below approximately 10%. This is because the selected users are the most active ones who posted the most tweets, which provides sufficient data instances to achieve the best performance possible for this classifier.

Fig. 3. Robustness Test for CC (performance vs. % of training data used).

Figs. 4 and 5 show the results for the two link-based methods, LP and GCM. For all four ranking methods, with only minor fluctuations, LP shows consistently high performance for classification, even when only a very low percentage of data instances are included in the training set. In particular, for "rank by # retweeted" and "rank by degree", even if only one positive and one negative instance are available in the training set, due to the high connectivity of the graph, LP can still successfully propagate the correct class labels to nodes and achieve almost the same performance. For "rank by # tweets" and "rank by # followers", LP also maintains good classification performance until the training set is reduced below 1.5% (i.e., two instances for each class) and 1% (one instance for each class), respectively.

GCM achieves slightly better results than LP when 100% of the opinionated users are used as training data. However, the robustness of GCM against the training data size varies for different ranking methods used to select the seed users. In particular, for "rank by # tweets" and "rank by # followers", GCM's
classification performance starts dropping drastically when the training set size is below 75%. It is highly likely that some key nodes that are critical to graph partition in GCM classifiers were excluded from the training set due to a low number of tweets and/or followers. Therefore, GCM fails to work for the remaining nodes in the graph. Nevertheless, for the other two ranking methods, "rank by # retweeted" and "rank by degree", GCM's performance is a lot more robust. When the training set percentage is reduced from 100% to 2%, GCM consistently gives exactly the same classification results (best among all: 96.96% precision, 95.63% recall, and 96.29% F-measure). Only when the training set is reduced below 2% (i.e., fewer than two positive and two negative instances) does the GCM classifier fail to work.

Fig. 4. Robustness Test for LP (performance vs. % of training data used).

6. Discussion

Our experimental study helps us answer the three research questions. The results also provide us with several interesting insights into the interaction dynamics of online debates on social media such as Twitter.

The main challenges for opinion mining on Twitter data include the short length (i.e., 140-character limit) and informal style. Given limited word features and high variations in style, traditional CC showed poor predictive capability for differentiating users with opposite opinions. People can refer to different entities when expressing their opinions. However, their precise words may not clearly reveal which side they take in a debate. It is important to consider that a debate must involve people from two sides arguing against each other. On each side, remarks of opinionated leaders can be widely spread by supporters via actions such as "retweet" and "like" on a social media platform. Such interaction and communication results in a high degree of connectivity in social networks. These linkages between users become an additional and evidently more reliable source for opinion classification. Link-based classifiers, such as LP and GCM, consider that users' opinions are not independent, but interrelated. Rather than predicting the opinion of each individual user separately, LP and GCM try to collectively classify users based on their linkage structures. In a social network involving two sides debating, linkages may carry a variety of meanings regarding the relationship between the two connected users. In our study, we design the classification algorithm based on a highly simplified assumption that a retweeter tends to share the same opinion as the retweetee. Even under such a strong assumption, the classifiers can give surprisingly high accuracy.

Moreover, the results of robustness tests further strengthen our conclusion on the superiority of link-based opinion classifiers over CC, particularly when only a small training set is available. As shown in Fig. 3, CC shows fine robustness against reduced training set size. Particularly for the "rank by # tweets" method, the classification accuracy (72%) did not drop much until the training set was reduced to approximately 10%. This might be partly attributed to the high degree of information redundancy exhibited in tweets and retweets, especially during a short time period of debate on one specific topic. Even visibly opinionated users do not always post original tweets. They retweet, too. As long as we have enough content (tweets and retweets) from the most active users, CC can still show appreciable performance. By contrast, for link-based classifiers, if chosen wisely (e.g., based on "rank by # retweeted" and "rank by degree"), only a handful (e.g., one or two instances from each class) of visibly opinionated users could be sufficient as training data instances to achieve the highest classification accuracy (93%). Nevertheless, the link-based classifier, GCM, could become unpredictable if some critical annotated nodes are left out (e.g., based on "rank by # tweets" and "rank by # followers"). Therefore, the connectivity to other nodes in the graph is key to robust performance of link-based
classifiers. During an online debate, opinion leaders may not necessarily be the most active tweeters or well-known celebrities with the most followers. Rather, given a reasonable size of followers, someone who posts tweets with sharp opinions or brilliant remarks, as long as they get more retweets, can stand out as real leaders of public opinion. Successfully identifying these critical opinionated users could guarantee a high accuracy for opinion classification with only minimum effort to create a training data set. On the contrary, if inappropriate ranking methods were chosen, one may fail to identify the real critical opinionated users, which can lead to low accuracy of opinion classification. Furthermore, identifying the critical users and further analyzing their content and behaviors can provide better insights into the key arguments and political appeal.

Fig. 5. Robustness Test for GCM (performance vs. % of training data used).

7. Conclusions and future directions

In this study, we propose a GCM algorithm to address the collective opinion classification problem. Our algorithm collectively assigns users' states in Twitter discussions that match their retweeting behaviors to others' postings. In experiments on a real-world data set, the proposed approach is significantly more accurate than the state-of-the-art approach. Further analysis shows that our proposed algorithm performs exceptionally well even with only a handful of training users.

Classifying user opinions is a critical challenge for e-commerce companies. This study provides a novel opinion classification solution that addresses the volume, velocity, and variety issues of big social media data. By analyzing a large amount of opinion-related data in microblogging sites, our solution based on GCM exploits the social interactions among users for classification. It shows high accuracy and robustness to the limited size of labeled data for training.

This study has significant practical and theoretical implications for big data analytics in e-commerce. For practitioners, a more accurate classifier that requires a smaller training data set can significantly reduce the effort needed to plan and conduct marketing campaigns. Facing a social network of competitive opinions, companies need to develop different marketing strategies for populations that are for or against their products. By conducting opinion classification on a topic at multiple time points, companies can obtain a dynamic view of how opinions evolve over time so that they can react and adjust their strategies accordingly. In the big-data era, such data-driven analytics will become increasingly important to commerce. Moreover, our results provide support for further theoretical studies on the global consistency of social networks in social media analytics. As we have argued, global consistency may be accounted for by the joint force of network connectivity and social influence. Unlike most previous research focused on individual-level influences, our study indicates the existence of a global-level self-organization phenomenon, which is worth further investigation.

In the future, we will explore the following directions to extend this study. (1) We will investigate the combination of content data and linkage data for collective opinion classification. (2) We will investigate social network structures other than two-sided debates on social media and develop new opinion classification algorithms. (3) We will continue studying factors (e.g., topics and stages of events) that can affect the performance of opinion classification in more social media data sets and contexts. Our ultimate goal is to build an effective approach that is scalable to incorporate insights from social relationships for opinion mining.
References

[1] H. Chen, R.H.L. Chiang, V.C. Storey, Business intelligence and analytics: from big data to big impact, MIS Q. 36 (2012) 1165-1188, doi:http://dx.doi.org/10.1145/2463676.2463712.
[2] R.F. Lusch, Y. Liu, Y. Chen, The phase transition of markets and organizations: the new intelligence and entrepreneurial frontier, IEEE Intell. Syst. 2010 (2016) 71-75, doi:http://dx.doi.org/10.1109/MIS.2010.27.
[3] A. Doan, R. Ramakrishnan, A.Y. Halevy, Crowdsourcing systems on the world-wide web, Commun. ACM 54 (2011) 86, doi:http://dx.doi.org/10.1145/1924421.1924442.
[4] C. Forman, A. Ghose, B. Wiesenfeld, Examining the relationship between reviews and sales: the role of reviewer identity disclosure in electronic markets, Inf. Syst. Res. 19 (2008) 291-313, doi:http://dx.doi.org/10.1287/isre.1080.0193.
[5] N. Archak, A. Ghose, P.G. Ipeirotis, Deriving the pricing power of product features by mining consumer reviews, Manage. Sci. 57 (2011) 1485-1509, doi:http://dx.doi.org/10.1287/mnsc.1110.1370.
[6] A. Ghose, P.G. Ipeirotis, B. Li, Designing ranking systems for hotels on travel search engines by mining user-generated and crowdsourced content, Mark. Sci. 31 (2012) 493-520, doi:http://dx.doi.org/10.1287/mksc.1110.0700.
[7] C. Oh, O. Sheng, Investigating predictive power of stock micro blog sentiment in forecasting future stock price directional movement, ICIS (2011) 1-19.
[8] J.L. Zhao, S. Fan, D. Hu, Business challenges and research directions of management analytics in the big data era, J. Manag. Anal. 1 (2014) 169-174, doi:http://dx.doi.org/10.1080/23270012.2014.968643.
[9] E.-P. Lim, H. Chen, G. Chen, Business intelligence and analytics: research directions, ACM Trans. Manage. Inf. Syst. 3 (2013) 1-10, doi:http://dx.doi.org/10.1145/2407740.2407741.
[10] B. Pang, L. Lee, Opinion mining and sentiment analysis, Found. Trends Inf. Retr. 2 (2008) 1-135, doi:http://dx.doi.org/10.1561/1500000011.
[11] G. Walsh, K.P. Gwinner, S.R. Swanson, What makes mavens tick? Exploring the motives of market mavens' initiation of information diffusion, J. Consum. Mark. 21 (2004) 109-122, doi:http://dx.doi.org/10.1108/07363760410525678.
[12] J.E. Phelps, R. Lewis, L. Mobilio, D. Perry, N. Raman, Viral marketing or electronic word-of-mouth advertising: examining consumer responses and motivations to pass along email, J. Advert. Res. 44 (2004) 333-348, doi:http://dx.doi.org/10.1017/S0021849904040371.
[13] Z. Zhang, X. Li, Y. Chen, Deciphering word-of-mouth in social media, ACM Trans. Manage. Inf. Syst. 3 (2012) 1-23, doi:http://dx.doi.org/10.1145/2151163.2151168.
[14] E. Demers, C. Vega, Soft information in earnings announcements: news or noise? INSEAD Bus. Sch. World (2010) 1-70, doi:http://dx.doi.org/10.2139/ssrn.1153450.
[15] C.C. Yang, Understanding online consumer review opinions with sentiment analysis using machine learning, Pacific Asia J. Assoc. Inf. Syst. 2 (2010) 6.
[16] M.M. Mostafa, More than words: social networks' text mining for consumer brand sentiments, Exp. Syst. Appl. 40 (2013) 4241-4251, doi:http://dx.doi.org/10.1016/j.eswa.2013.01.019.
[17] W. Li, H. Xu, Text-based emotion classification using emotion cause extraction, Exp. Syst. Appl. 41 (2014) 1742-1749, doi:http://dx.doi.org/10.1016/j.eswa.2013.08.073.
[18] S. Fouzia Sayeedunnissa, A. Hussain, M. Hameed, Supervised opinion mining of social network data using a bag-of-words approach on the cloud, in: J.C. Bansal, P. Singh, K. Deep, M. Pant, A. Nagar (Eds.), Proc. Seventh Int. Conf. Bio-Inspired Comput. Theor. Appl. (BIC-TA 2012), Springer, India, 2012, pp. 299-309, doi:http://dx.doi.org/10.1007/978-81-322-1041-2_26.
[19] Z. Akaichi, Text mining facebook status updates for sentiment classification, 2013 17th Int. Conf. Syst. Theory, Control Comput. ICSTCC 2013; Jt. Conf. SINTES 2013, SACCS 2013, SIMSIS 2013 - Proc. (2013) 640-645, doi:http://dx.doi.org/10.1109/ICSTCC.2013.6689032.
[20] S. Ben Hamouda, J. Akaichi, Social networks' text mining for sentiment classification: the case of facebook' statuses updates in the arabic spring era, Int. J. Appl. Innov. Eng. Manage. 2 (2013) 470-478.
[21] M. Myslín, S.H. Zhu, W. Chapman, M. Conway, Using twitter to examine smoking behavior and perceptions of emerging tobacco products, J. Med. Internet Res. 15 (2013), doi:http://dx.doi.org/10.2196/jmir.2534.
[22] A. Hassan, A. Abbasi, D. Zeng, Twitter sentiment analysis: a bootstrap ensemble framework, in: Proc. - Soc. 2013, 2013, pp. 357-364, doi:http://dx.doi.org/10.1109/SocialCom.2013.56.
[23] F.H. Khan, S. Bashir, U. Qamar, TOM: twitter opinion mining framework using hybrid classification scheme, Decis. Support Syst. 57 (2014) 245-257, doi:http://dx.doi.org/10.1016/j.dss.2013.09.004.
[24] W. Maharani, Microblogging sentiment analysis with lexical based and machine learning approaches, Inf. Commun. Technol. (ICoICT), 2013 Int. Conf. (2013) 439-443, doi:http://dx.doi.org/10.1109/ICoICT.2013.6574616.
[25] G. Paltoglou, M. Thelwall, Twitter, MySpace, digg: unsupervised sentiment analysis in social media, ACM Trans. Intell. Syst. Technol. 3 (2012) 1-19, doi:http://dx.doi.org/10.1145/2337542.2337551.
[26] S. Feng, D. Wang, G. Yu, W. Gao, K.F. Wong, Extracting common emotions from blogs based on fine-grained sentiment clustering, Knowl. Inf. Syst. 27 (2011) 281-302, doi:http://dx.doi.org/10.1007/s10115-010-0325-9.
[27] P. Sen, G.M. Namata, M. Bilgic, L. Getoor, B. Gallagher, T. Eliassi-Rad, Collective classification in network data, AI Mag. 29 (2008) 93-106, doi:http://dx.doi.org/10.1145/1217299.1217304.
[28] R. Agrawal, S. Rajagopalan, R. Srikant, Y. Xu, Mining newsgroups using networks arising from social behavior, Proc. Twelfth Int. Conf. World Wide Web - WWW '03 (2003) 529, doi:http://dx.doi.org/10.1145/775152.775227.
[29] P.F. Lazarsfeld, R.K. Merton, Friendship as a social process: a substantive and methodological analysis, Free. Control Mod. Soc. 18 (1954) 18-66, doi:http://dx.doi.org/10.1111/j.1467-8705.2012.02056_3.x.
[30] M. McPherson, L. Smith-Lovin, J.M. Cook, Birds of a feather: homophily in social networks, Annu. Rev. Sociol. 27 (2001) 415-444, doi:http://dx.doi.org/10.1146/annurev.soc.27.1.415.
[31] M. Speriosu, N. Sudan, S. Upadhyay, J. Baldridge, Twitter polarity classification with label propagation over lexical links and the follower graph, Proc. Conf. Empir. Methods Nat. Lang. Process (2011) 53-56.
[32] C. Tan, L. Lee, J. Tang, L. Jiang, M. Zhou, P. Li, User-level sentiment analysis incorporating social networks, Proc. 17th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. - KDD '11 136 (2011) 1397, doi:http://dx.doi.org/10.1145/2020408.2020614.
[33] J. Rabelo, R.B.C. Prudencio, F. Barros, Collective classification for sentiment analysis in social networks, 2012 IEEE 24th Int. Conf. Tools with Artif. Intell. (2012) 958-963, doi:http://dx.doi.org/10.1109/ICTAI.2012.135.
[34] J. Rabelo, R.B.C. Prudencio, F. Barros, Using link structure to infer opinions in social networks, IEEE Int. Conf. Syst. Man, Cybern., Seoul, Korea, 2012, pp. 681-685.
[35] M.D. Conover, B. Gonçalves, J. Ratkiewicz, A. Flammini, F. Menczer, Predicting the political alignment of twitter users, Proc. 2011 IEEE Int. Conf. Privacy, Secur. Risk Trust IEEE Int. Conf. Soc. Comput. PASSAT/SocialCom 2011 (2011) 192-199, doi:http://dx.doi.org/10.1109/PASSAT/SocialCom.2011.34.
[36] F. Wong, C. Tan, S. Sen, M. Chiang, Quantifying political leaning from tweets and retweets, Int. AAAI Conf. Weblogs Soc. Media (2013).
[37] A. Rajadesingan, H. Liu, Identifying users with opposing opinions in Twitter debates, Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) (2014) 153-160, doi:http://dx.doi.org/10.1007/978-3-319-05579-4-19.
[38] Y. Ren, N. Kaji, N. Yoshinaga, M. Toyoda, M. Kitsuregawa, Sentiment classification in resource-scarce languages by using label propagation, 25th Pacific Asia Conf. Lang. Inf. Comput. (2011) 420-429.
[39] U.N. Raghavan, R. Albert, S. Kumara, Near linear time algorithm to detect community structures in large-scale networks, Phys. Rev. E Stat. Nonlinear Soft Matter Phys. 76 (2007), doi:http://dx.doi.org/10.1103/PhysRevE.76.036106.
[40] J.X. Hao, J.B. Orlin, A faster algorithm for finding the minimum cut in a directed graph, J. Algorithms 17 (1994) 424-446, doi:http://dx.doi.org/10.1006/jagm.1994.1043.
[41] R.H. Möhring, A.S. Schulz, F. Stork, M. Uetz, Solving project scheduling problems by minimum cut computations, Manage. Sci. 49 (2003) 330-350, doi:http://dx.doi.org/10.1287/mnsc.49.3.330.12737.
[42] T.M. Murali, C.-J. Wu, S. Kasif, The art of gene function prediction, Nat. Biotech. 24 (2006) 1474-1475, doi:http://dx.doi.org/10.1038/nbt1206-1474.
[43] P.F. Felzenszwalb, D.P. Huttenlocher, Efficient graph-based image segmentation, Int. J. Comput. Vis. 59 (2004) 167-181, doi:http://dx.doi.org/10.1023/B:VISI.0000022288.19776.77.

Jiexun Li is an Assistant Professor in the Department of Decision Sciences, College of Business & Economics, at Western Washington University. He earned his Ph.D. in MIS from the Eller College of Management at the University of Arizona, and his M.S. and B.S. in MIS at Tsinghua University in China. His research interests include data mining, business analytics, social media analytics, and health informatics. His research has appeared in journals including JMIS, DSS, IEEE Transactions, JASIST, JAIS, Bioinformatics, CACM, ESA, ISF, and so on.

Xin Li is an Assistant Professor in the Department of Information Systems at the City University of Hong Kong. He received his Ph.D. in Management Information Systems from the University of Arizona. He received his Bachelor's and Master's degrees from the Department of Automation at Tsinghua University, China. His work has appeared in the MISQ, JMIS, INFORMS JOC, DSS, JASIST, ACM and IEEE Transactions, among others.

Bin Zhu is an Associate Professor of Business Information Systems at Oregon State University. She earned her Ph.D. in Management Information Systems from the University of Arizona. Her current research interests include business intelligence, information analysis, social network, human-computer interaction, information visualization, computer-mediated communication, and knowledge management systems. Her work has appeared in ISR, DSS, JASIST, IEEE Transactions, D-Lib Magazine, and so on.
