
Journal of Marketing Management

ISSN: 0267-257X (Print) 1472-1376 (Online) Journal homepage: http://www.tandfonline.com/loi/rjmm20

We (don't) know how you feel: a comparative study of automated vs. manual analysis of social media conversations

Ana Isabel Canhoto & Yuvraj Padmanabhan

To cite this article: Ana Isabel Canhoto & Yuvraj Padmanabhan (2015) We (don't) know how you feel: a comparative study of automated vs. manual analysis of social media conversations, Journal of Marketing Management, 31:9-10, 1141-1157, DOI: 10.1080/0267257X.2015.1047466

To link to this article: http://dx.doi.org/10.1080/0267257X.2015.1047466

Published online: 18 Jun 2015.



Download by: [University of Veracruzana] Date: 06 November 2015, At: 16:51


Journal of Marketing Management, 2015
Vol. 31, Nos. 9-10, 1141-1157, http://dx.doi.org/10.1080/0267257X.2015.1047466

We (don't) know how you feel: a comparative study of automated vs. manual analysis of social media conversations
Ana Isabel Canhoto, Faculty of Business, Oxford Brookes University,
UK
Yuvraj Padmanabhan, Mindgraph, UK

Abstract The ever-growing volume of brand-related conversations on social media platforms has captivated the attention of academics and practitioners, as the analysis of those conversations promises to offer unparalleled insight into consumers' emotions. This article takes a step back from the hype, and investigates the vulnerabilities related to the analysis of social media data concerning consumers' sentiment. A review of the literature indicates that the form, focus, source and context of the communication may negatively impact on the analyst's ability to identify sentiment polarity and emotional state. Likewise, the selection of analytical tool, the creation of codes, and the classification of the data adversely affect the researcher's ability to accurately assess the sentiment expressed in a social media conversation. Our study of Twitter conversations about coffee shows low levels of agreement between manual and automated analysis, which is of grave concern given the popularity of the latter in consumer research.

Keywords consumer behaviour; emotions; sentiment analysis; social media; data analysis; CAQDAS

Introduction
Social media data have been heralded as revolutionary to study consumer
behaviour, by practitioners (e.g. Casteleyn, Mottart, & Rutten, 2009) and
social scientists (e.g. Baker, 2009) alike. Social media platforms allow for the
collection of data in real time and in a non-intrusive manner (Murthy, 2008),
and are more cost-effective than traditional approaches (Christiansen, 2011).
They are particularly promising in the study of feelings and emotions (Cooke
& Buckley, 2008). Accordingly, researchers have investigated various aspects of
social media data collection, such as the ethics of using such data (e.g. Nunan &
Domenico, 2013), or the impact of varying levels of use of social media
platforms by class, race and gender on research results and even the type of
work done (Murthy, 2008). However, there is very limited discussion in the
literature of the issues arising once the data have been collected.

© 2015 Westburn Publishers Ltd.

Such absence of research represents a significant gap in the literature, given that 'how researchers process and analyse online data is profoundly social with tremendous sociological implications' (Halford, Pope, & Weal, 2013, p. 180). That is, the
lack of published research specifically examining how social media data is
processed and analysed presents a significant gap in the understanding of the
value and limitations of using such data in consumer research and, specifically, in
the study of consumer sentiment.
The volume of data available and the pressure to process it quickly mean that,
increasingly, data analysis is done without human intervention (Nunan & Domenico,
2013). Accordingly, there is now an abundance of commercial software tools that
mine textual data and produce reports of expressed opinions and sentiment (Sterne,
2010). These tools produce scores reflecting the emotions expressed in the segments
of text analysed (Cambria & Hussain, 2012), though it is difficult to assess how good
or limited the tools are, and what aspects of automated sentiment analysis are
particularly strong or weak, given that the companies behind such commercial
applications do not reveal their algorithms (Beer & Burrows, 2013). This limitation
violates one of the key principles of using software to analyse qualitative data, namely
that researchers need to verify the accuracy of those tools (Brown, Taylor, Baldy,
Edwards, & Oppenheimer, 1990).
The promotional literature of the providers of automated sentiment analysis tools typically reports an accuracy rate between 60% and 85% (Carson, 2014). While these percentages suggest a high level of accuracy (see Gwet, 2012), it is not clear how the coefficients were calculated. Moreover, it is not possible to independently verify the companies' claims, as there is no information on the input for those calculations; for
instance, the sample used. This uncertainty is problematic for research, given that the
purpose for which a software product was developed may make it unsuitable for
application in another type of research (Rose, Spinks, & Canhoto, 2014). Given that
social media data are increasingly used as a source of insight into consumers'
emotions, it is imperative to investigate the issues emerging in the analysis of such
data, both concerning the type of research (i.e. emotions) and concerning the use of
software in the analysis of social media conversations. Accordingly, this study
investigates the following research question:
What are the vulnerabilities related to the analysis of social media data concerning consumers' sentiment towards a product?
The next section outlines the rationale, goals and processes of sentiment analysis,
before considering how the nature of social media conversations and the
characteristics of automated analysis tools may limit the researcher's ability to
identify sentiment polarity and emotional state in social media content. The
subsequent section details the research design that saw 200 Twitter posts about
coffee being analysed manually and with various software products, to study
sentiment towards this drink and its consumption practices. The results revealed
low levels of inter-coder agreement, not just in terms of manual vs. automated
approaches, but also between the automated tools considered. These low levels of
agreement were particularly noticed for negative or neutral sentiments, for segments
of text with multiple foci, and where the expression of sentiment is made via
abbreviations and subtle elements, or results from the absence of the product in
question. The implications of these findings for research on consumer behaviour are
considered, and directions for further research presented.

Using social media data to study sentiment

The study of emotions is a topical subject within the marketing literature, given their impact on consumer behaviour: for instance, being in a good mood makes people more willing to take risks (Johnson & Tversky, 1983) and shortens the decision-making process (Forgas, 1991). The role of emotions in consumer
behaviour (see Loewenstein & Lerner, 2003, for a detailed discussion) led to the
development of sentiment analysis, also known as opinion mining. Sentiment
analysis consists of a number of techniques and, increasingly, technical artefacts
to identify and analyse feelings. The goals of sentiment analysis are to identify
whether consumers are expressing emotions, as well as the nature of those emotions
and how strong those feelings are.
Sentiment data can be collected via experiments, as traditionally done in the field
of psychology, for instance. Experiments allow for the simultaneous assessment of various dependent variables, but have several limitations in terms of mood induction
and manipulation, as well as the isolation of independent and dependent effects
(Cohen, Pham, & Andrade, 2008). An alternative approach is to conduct
interviews or surveys, which ask participants to reflect on previous emotionally
charged experiences, but this approach, too, has its drawbacks. One limitation is
that participants may be unwilling to invoke or revisit emotionally charged memories
(Cohen et al., 2008). The other restriction is that the quality of the insight depends
on the participants' ability to verbalise their emotions (Cooke & Buckley, 2008).
Social media promise to overcome these limitations. As individuals became
'lifecasters' (Patterson, 2012) who share information about themselves, their
behaviours and relationships (Kietzmann, Hermkens, McCarthy, & Silvestre, 2011),
so social media platforms became popular vehicles to study consumers on a large
scale and in a natural setting (Kivran-Swaine, Brody, Diakopoulos, & Naaman,
2012). Moreover, as a significant share of social media conversations express
sentiment about products and brands (Jansen, Zhang, Sobel, & Chowdury, 2009),
these platforms have become very appealing as a source of information about
consumer sentiment for product and brand managers.
Once data, for instance online reviews, have been collected from the relevant
social media platforms, researchers can analyse those inputs looking for terms,
phrases or expressions that reflect sentiment. There are a number of specialist
software products available to mine documents using a range of keywords (Thet,
Na, & Khoo, 2010), such as 'great' for a positive emotion, or 'revolting' for a negative one. The text segments collected are subsequently classified according to
their sentiment polarity, that is, whether the overall feeling expressed in the unit
of text selected is positive or negative (Thelwall, Buckley, & Paltoglou, 2011).
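The keyword-based mining and polarity classification described above can be sketched as follows. This is a minimal illustrative sketch only: the tiny lexicon, the tokeniser and the scoring rule are assumptions invented for this example, not the algorithm of any tool discussed in this article.

```python
# Minimal sketch of lexicon-based sentiment polarity classification.
# The tiny lexicon, tokeniser and scoring rule are illustrative only;
# commercial tools combine far larger lexicons with NLP techniques.

LEXICON = {
    "great": 1, "good": 1, "love": 1, "yummy": 1,
    "revolting": -1, "sucks": -1, "worst": -1, "tired": -1,
}

def polarity(text: str) -> str:
    """Classify a text segment as positive, negative or neutral."""
    tokens = text.lower().replace("!", " ").replace(".", " ").split()
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("Found a great cup of coffee"))  # positive
# A bare keyword count cancels mixed sentiments out:
print(polarity("The early shift sucks. Oh well at least my latte is yummy"))  # neutral
```

Note how the second call returns 'neutral' for a tweet expressing both a negative and a positive sentiment: a simple keyword count cannot separate multiple foci, a limitation discussed below.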
In addition to studying sentiment polarity, researchers can analyse emotional
states. It is valuable to understand the specific type of emotion experienced by the
consumer because emotions are highly differentiated in their impact (Laros &
Steenkamp, 2005). For instance, unhappiness and anger are both negative
emotions. Yet, they have different consequences in terms of consumer behaviour, in
that the former lacks a focus, whereas the latter tends to be targeted and will lead to
context-specific responses (Bushman, Baumeister, & Phillips, 2001).
The study of sentiment polarity and emotional state may be increasingly popular,
but is not without its challenges. First, the expression of sentiment varies with

cultures and over time, both in terms of the language's syntactic features and in terms
of style (Abbasi, Chen, & Salem, 2008). Moreover, a single segment of text may
express more than one sentiment, and refer to more than one object, creating
uncertainty regarding the prevalent sentiment. For instance, the author of a
product review may judge the product positively, but express dissatisfaction with
specific features (Liu, 2010). In this example, whether the review should be classified
as positive or negative depends on whether the focus of the analysis is the overall
impression or the specific features, respectively. Another challenging factor is that
sentiment may be expressed through subtle elements such as the use of exception or
conditional clauses (Kim & Hovy, 2006), or even the choice of words and their
placement (Davis & O'Flaherty, 2012). Lastly, sentiment about an object may not be
expressed directly but through comparisons, instead, which requires the analyst to
have domain knowledge to identify whether the comparative terms used reflect a
positive or a negative opinion about the product (Liu, 2010).


In addition to these inherent challenges of sentiment analysis, studying the
expression of emotions on social media may present its own additional difficulties,
as summarised in Table 1 and discussed next.
In terms of the syntactic and stylistic aspects of expressing sentiment, it should be
noted that social media users tend to apply certain colloquialisms and abbreviations
with multiple and ever-evolving meanings. For instance, 'LOL' started by being an acronym for 'lots of love', but now is also used as a replacement for 'laughing out loud'. In addition, users may also employ an increasing array of text symbols or
emoticons to support the communication of feelings and emotional statuses.
Another challenge is that social media messages such as status updates or
comments on a blog tend to be fairly short. This characteristic of social media
messages arises partly because of the features of particular platforms, such as the
limit of 140 characters for Twitter messages. However, this characteristic also
reflects the instant and informal nature of communication on social media.
Messages are frequently complemented, or even replaced, by non-textual
elements such as links to external sources of information, pictures, videos
and audio files (Kietzmann et al., 2011). The use of short messages and non-
textual elements may hinder the identification of multiple foci within one
segment of text.
It has also been noted that social media content that is critical of brands often
employs irony and sarcasm (e.g. Dahl, 2015). It is very difficult for analysts to detect
and classify sarcastic content in general, due to its nuances and multidimensional
nature (Vanden Bergh, Lee, Quilliam, & Hove, 2011), and the same applies for
sentiment analysis. The importance of contextual knowledge for the study of online content has been emphasised by several authors, including Kozinets (2002), who reflected on the complexity of studying meaning in social media conversations. Yet, existing data analysis software has very limited ability to analyse data in context (Cambria & Hussain, 2012).

Table 1 Challenges in sentiment analysis.

Related to . . .   General                           Social media specific
Form               Syntax and style                  Use of colloquialisms, abbreviations, symbols and emoticons
Focus              Multiple sentiments and objects   Short text segments and use of non-textual elements
Source             Subtlety                          Use of irony and sarcasm
Context            Contextual knowledge              Complexity of social media
The large volume of social media data available, and the complexity of
monitoring conversations over multiple and very diverse platforms, have led
both managers (Davis & O'Flaherty, 2012) and researchers (Nunan &
Domenico, 2013) to turn to third-party providers of automated tracking and
analysis of social media data. This is not a problem in itself, given that specialist
software can help with the manipulation of big data sets and the generation of
codes in qualitative research (Lage & Godoy, 2008). Using software to analyse
textual data can also improve the credibility of a qualitative study, even if it does
not change the rigour of the analytical work done, or the outcome of the analysis
(Ryan, 2009). However, the characteristics of automated tools (for instance, the algorithms used) reflect the purpose for which the software was developed and,
thus, analysts need to carefully assess the suitability of that software for their
projects (Rose et al., 2014). Researchers also need to be actively involved in the
creation of categories, and in deciding what data to retrieve and collate (Basit,
2003). Finally, they need to carefully verify the accuracy of the classification, as
content analysis software has limitations in terms of discerning nuances in
meaning, leading to the partial retrieval of information (Brown et al., 1990). In
the case of sentiment analysis software, it is very difficult for researchers to verify
any of these aspects (Beer & Burrows, 2013), putting researchers at risk of acting
on inaccurate data analysis outcomes.
In summary, the study of sentiment in social media conversations, while popular
and promising, may be negatively impacted by issues related to the communication of
sentiment, as well as issues related to the automated analysis of that sentiment. These
vulnerabilities may affect the researcher's ability to identify and classify the text's sentiment polarity and emotional state, as depicted in Figure 1.
The following section describes how the issues depicted in Figure 1 were
investigated in an empirical setting.

Figure 1 Sources of vulnerability in the study of sentiment in social media conversations.

[Diagram: issues associated with the communication of sentiment (form, focus, source and context) and with the automated analysis of sentiment (selection of tool, creation of codes and classification of data) negatively impact on the accuracy of sentiment polarity and emotional state classifications.]

The empirical study

To investigate the issues depicted in Figure 1 empirically, we set out to study the expression of sentiment on social media following a qualitative content analysis (QCA) approach. QCA is 'a systematic approach to the analysis of both verbal and visual textual material in either paper or digital format, including online material' (Rose et al., 2014, p. 135). In QCA, the language or imagery used is the focus of the research, rather than a resource, and so it is the words themselves, and how they are used, that are analysed (Schreier, 2012).
As the topic of food and beverages is the one most widely discussed on social
media (Forsyth, 2011), this was chosen as the focus for data collection. The topic of
food is also extremely important in the social sciences literature, with the past
20 years having seen 'an explosion of work' (p. 369), particularly within the sub-topics of children, health and social aspects (Uprichard, 2013). Within the broad
topic of food, it was decided to focus on social media conversations about coffee as
this broadly popular beverage is charged with a wide range of cultural meanings
(Grinshpun, 2014). Moreover, coffee has been the subject of other netnographic
studies (e.g. Kozinets, 2002), which offered a starting point for the development of
coding schemes for the present study.
The specific online community selected for observation was Twitter. This is
because the Twitter platform is often used as a source of qualitative data by both
practitioners and academics (Williams, Terras, & Warwick, 2013). Tweets were
collected over a period of one month, using the search term 'coffee' and its variants 'latte', 'mocha', 'cappuccino', 'espresso' and 'Americano', as well as the terms 'flavour', 'aroma' and 'caffeine'. Care was taken to include multiple users
and to exclude tweets by manufacturers and retailers. Of the corpus of remaining
tweets, 200 were selected randomly to test the framework depicted in Figure 1.
In the research methods literature, accuracy is assessed by the extent to which
different researchers agree on the classification of a particular data object, that is the
rate of inter-coder agreement (Gwet, 2012). In the case of an automated tool, this
will be the extent to which the classification produced by the software matches that
of human coders. Therefore, to investigate the accuracy of automated sentiment
analysis, we used more than one software product, and used the rate of inter-coder
agreement as a proxy for accuracy. Specifically, we checked the rate of agreement
between coders, between software products, and between coders and software
products.
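The pairwise agreement checks described above can be sketched as follows. The label lists are invented for illustration; the study's own comparisons were run over the 200 coded tweets, not these toy lists.

```python
# Sketch of the pairwise agreement checks described above: the share of
# items that two coders (human or software) classified identically.
# The label lists below are invented for illustration.

def agreement_rate(codes_a, codes_b):
    """Percentage agreement between two coders over the same items."""
    if len(codes_a) != len(codes_b):
        raise ValueError("coders must classify the same items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)

manual     = ["positive", "neutral", "negative", "positive"]
software_l = ["positive", "negative", "negative", "neutral"]
software_t = ["positive", "neutral", "negative", "negative"]

print(agreement_rate(manual, software_l))  # 0.5
print(agreement_rate(manual, software_t))  # 0.75
```

Note that raw percentage agreement does not correct for agreement expected by chance; chance-corrected coefficients, such as those discussed by Gwet (2012), adjust for this.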
Specifically, data were analysed manually and with two popular automated
sentiment analysis tools. Software L was a commercial product offered by the leading international provider of social media analytics, and uses natural language processing and adaptive learning techniques.1 Software T was a product
developed and commercialised by a leading international university and is based on
computational linguistics.2 Coding was done with a scheme that reflected polarity of
emotion (positive vs. negative). In addition, as advised by Koppel and Schler (2006),
comments that related to the product (i.e. coffee and its synonyms, as previously
described) but that did not express an emotion, were not excluded from the sample;
instead, they were coded as neutral.

1 Source: company website.
2 Source: company website.

Subsequently, data were analysed by type of emotion, because different emotions produce different behavioural consequences (Laros & Steenkamp, 2005). This was done manually and with software L described above, as well as two academic products using semantic analysis techniques (specifically, rule-based and M-C-based sentic computing; for more on sentic computing, see Cambria & Hussain, 2012). Coding was done using Plutchik's (2001) wheel of emotions schema, which identifies eight primary bipolar emotions.
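As a simple illustration of this coding scheme, Plutchik's eight primary emotions and their bipolar pairings can be represented as a small data structure. The helper function is a hypothetical illustration, not part of the study's actual coding procedure.

```python
# Plutchik's (2001) wheel of emotions identifies eight primary emotions
# arranged as four bipolar pairs. A hypothetical helper for such a coding
# scheme; not part of the study's actual procedure.

PLUTCHIK_PAIRS = {
    "joy": "sadness",
    "trust": "disgust",
    "fear": "anger",
    "surprise": "anticipation",
}

def opposite(emotion: str) -> str:
    """Return the bipolar opposite of one of the eight primary emotions."""
    for a, b in PLUTCHIK_PAIRS.items():
        if emotion == a:
            return b
        if emotion == b:
            return a
    raise ValueError(f"{emotion!r} is not a primary emotion")

print(opposite("joy"))    # sadness
print(opposite("anger"))  # fear
```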

Findings and discussion

This section presents the outcomes of the sentiment analysis conducted manually and
with the software products, before examining the reasons for the problems of
accuracy encountered in the exercise.

Agreement in the classification of sentiment polarity and emotional state


There was a high rate (89%) of inter-coder agreement between the two manual
coders. However, there were significant differences between the outcomes of the
manual vs. the automated approaches to sentiment analysis, as depicted in
Figure 2.
As Figure 2 shows, the three approaches (manual vs. software L vs. software T)
only delivered the same score in circa one-third of the cases (32%). In 7% of the cases
both software products produced the same outcome, but differed from the manual
analysis. In other words, the overall agreement rate between the two software
products was just under 40%. While it had been noted that automated tools are
blunt instruments to study sentiment (e.g. Cambria & Hussain, 2012), this was still a
surprising result, well below the rates typically reported in the commercial literature,
as discussed in Davis and O'Flaherty (2012), and even taking into account the
challenges presented by social media data, as per Figure 1.

Figure 2 Extent of inter-coder agreement in the analysis of sentiment polarity.

[Pie chart: all approaches produce the same outcome, 32%; manual analysis and software L produce the same outcome, 24%; manual analysis and software T produce the same outcome, 28%; the software products produce the same outcome, which differs from manual analysis, 7%; all approaches produce different outcomes, 11%.]

In 11% of the cases, each approach delivered a different score. In around half the
cases the outcome of the manual analysis mirrored that of one of the software
products but not the other. Hence, it cannot be said that one of the software
products is clearly superior to the other in terms of accuracy (using inter-coder
agreement rates, as a proxy), even though the software had such different origins.
On the contrary, both products have similar rates of manual vs. automated coding
agreement.
Overall, the manual analysis of tweets was most likely to result in a positive score.
One of the software products was most likely to produce negative scores, while the
other had a higher proportion of neutral classifications.
The discrepancy between manual and automated analysis was also evident in the
analysis of emotional states, though this tended to vary for specific states, as
illustrated in Figures 3 and 4. Again, there wasn't clear evidence of superiority of one software over the others, contrary to what may be seen with other analytical
software as noted by Rose et al. (2014).
Agreement tended to occur around the positive emotion 'joy'. In turn, differences were particularly marked for the emotion 'surprise': this was evident both between manual vs. automated analysis, and between the various automated software, as illustrated by the quote provided in Table 2.

Investigation of the causes for disagreement in coding


Further analysis sought to probe the factors related to the communication of
sentiment on social media and the factors related to the automated analysis of
sentiment, and how they impact on the classification of tweets.

Figure 3 Extent of inter-coder agreement in the analysis of emotional state: manual (pink, upper line) vs. software L (blue, lower line). (This figure is available in full colour in the online version of the article.)

[Line chart comparing manual and software L emotional-state scores across 75 rows.]

Figure 4 Extent of inter-coder agreement in the analysis of emotional state: manual (solid line) vs. rule-based sentic (dashed line) vs. M-C-based sentic (dotted line).

[Line chart comparing manual, rule-based sentic and M-C-based sentic emotional-state scores across 75 rows.]

Table 2 Example of disparity in emotional state classification.

Entry: 12. 'This coffee shop needs to change there music up every once and a while. Or maybe I should go home'
Manual: 4. Anger; Software L: 3. Surprise; Rule-based sentic: 3. Surprise; M-C-based sentic: 2. Joy

As exemplified by the entries in Table 3, there were instances of agreement between manual coding and the two software products for all types of messages: neutral, positive or negative. However, focusing on those messages where all types of coders agreed, it is
interesting to see that they are most likely to reflect positive emotions, as exemplified by these tweets: 'Found a euro cent on my walk and have a great cup of coffee in hand. Monday is already off to a good start' (entry 18) and 'Feeling much more alive this morning now that I've had my coffee. Thank you #Nespresso' (entry 28). Similarly, emotions that were clearly positive, like joy, showed higher rates of agreement than those that were neutral or negative (Table 2). By contrast, an example of a problematic sentence is the entry 'Think I need an IV of caffeine today. So tired, courtesy of my beautiful angelic children. . .' (entry 99). The expressions 'courtesy' and 'angelic' were used sarcastically in this tweet, and the software disagreed on how to code it (software T deemed it negative, whereas software L deemed it positive). These observations extend
previous findings (e.g. Vanden Bergh et al., 2011) noting that it is difficult to detect and
accurately code irony and sarcasm. Specifically, these findings indicate that, as far as Twitter is concerned, there are challenges associated with the expression of neutral and negative sentiments in general, as discussed next, not just those of a sarcastic nature (though these, too, presented challenges).

Table 3 Examples of messages where there was agreement between all coders.

Entry 1: 'Mommys making me a pot of coffee for the night ahead. #ohboy #thisisgonnaberough'
  Comment: The 'rough' comment implies a negative sentiment, but it seems to apply to the night ahead, not the coffee.
  Manual: 1. Neutral; Software L: 1. Neutral; Software T: 1. Neutral

Entry 2: 'I am doing several things over the weeks of Lent. Food, TV, Internet, Caffeine, Music, Sleep, Shopping for Non-Essentials'
  Comment: User is stating a fact and listing various items. No clear or implicit expression of emotion.
  Manual: 1. Neutral; Software L: 1. Neutral; Software T: 1. Neutral

Entry 3: 'Coffee and sunny skies. . . Life is good.'
  Comment: The reference to sunny skies and the use of the term 'good' suggest a positive emotion.
  Manual: 2. Positive; Software L: 2. Positive; Software T: 2. Positive

Entry 4: 'Doing some late night paper/laptop work . . . Hope to be done in the next few hours. lol . . . Yep, a hot cup of coffee sounds mighty good now! ;o)'
  Comment: Use of the expression 'mighty good' and a smiley face suggest positive emotion.
  Manual: 2. Positive; Software L: 2. Positive; Software T: 2. Positive

Entry 5: 'i think coffee make headache more worst -.- get a tea or juice.'
  Comment: Referring to a negative side effect of coffee. Considers alternative products.
  Manual: 3. Negative; Software L: 3. Negative; Software T: 3. Negative

Entry 6: 'Good morning! Hope everyone had a good nights sleep! I will never drink anything with caffeine ever again! I was up sick half the night!'
  Comment: Negative side effect of drinking coffee. Expresses intention to avoid coffee in the future.
  Manual: 3. Negative; Software L: 3. Negative; Software T: 3. Negative
The very small number of tweets where the software products agreed with each other, but the classification differed from the manual one, were for text segments that were very short, such as these examples: 'In uni. I think without this cup of coffee I would hulk out' (entry 69) or 'Cups and cups of coffee is whats going to keep me up at work tonight' (entry 36). Both of these extracts have fewer than 70 characters, and were deemed neutral by the automated software but positive by the manual coders.

In terms of the text segments that caused the software to produce different outcomes from each other, it was observed that problems tended to occur where more than one sentiment was expressed, and more than one object was mentioned, in a single segment of text, such as 'Its so cold and guess whats on my room. . . Air conditioning. Seriously well off to work I go. Might have to stop and get some coffee' (entry 51) or 'The early shift sucks. Oh well at least my latte is yummy :)' (entry 19). In entry 19 we have two objects, namely the early shift and the coffee drink; and two sentiments, a negative one towards the early shift and a positive one towards the drink. It has been noted elsewhere (e.g. Liu, 2010) that automated tools have difficulty in coding this type of message. And this was certainly the case in this sentence with software T, which failed to adjust to the nuance of multiple objects and emotions, deeming this sentence as displaying a negative sentiment.
Differences also arose when the expression of sentiment was implied rather than explicit; in other words, when the segment of text itself did not contain any of the keywords associated with emotion, but was nonetheless rich in meaning. For instance, this segment depicts coffee positively, as a reward, even though that word (or a synonym) was never mentioned: '100 copies of [product title] sold overnight means a definite Starbucks run this morning. Possibly coffee out twice this week! Maybe even sushi!!' (entry 46). In this specific example, software T attributed a negative score to the segment, and software L a neutral one. Correctly classifying this text segment requires an understanding of the context of the conversation, namely that selling that particular number of copies overnight should be considered a positive event. It also requires an understanding of the cultural meanings attached to coffee. For instance, in the United Kingdom, drinking coffee is traditionally seen as an energy booster; however, having coffee out is deemed a treat (Forsyth, 2011). This particular entry mentions having sushi and coffee out twice in a week, which the manual analysts construed as being a very special treat. This level of contextual and cultural understanding would be very difficult to programme in an algorithm.
Another factor that may cause problems in the classification of tweets is
subtlety, in particular where the negativity emerges from the absence of
coffee. In such cases, the polarity of the text segment is negative, but it actually
expresses a positive attitude towards coffee, as in: 'how the heck am I
supposed to be able to sleep well without coffee in my system? fucking snow'
(entry 31).
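One way to flag such cases is a simple heuristic that checks whether the product is mentioned in an absence context before a document-level score is attributed to it. The function below is a hypothetical illustration; the list of absence markers is our assumption and is not part of either software package studied:

```python
# Hypothetical heuristic: flag segments where the negativity stems from
# the *absence* of the product, so that a negative document-level score
# is not wrongly attributed to the product itself. Deliberately crude:
# it is a plain substring check, with no negation-scope analysis.
ABSENCE_MARKERS = ("without", "no", "out of", "ran out of")

def absence_of(product: str, text: str) -> bool:
    t = text.lower()
    return any(f"{marker} {product}" in t for marker in ABSENCE_MARKERS)

tweet = ("how the heck am I supposed to be able to sleep well "
         "without coffee in my system? fucking snow")
print(absence_of("coffee", tweet))  # → True
```

A production tool would need to pair such a flag with polarity scoring, inverting or suppressing the attribution of the negative score to the product when absence is detected.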
There were also problems caused by syntax and style, namely around the use of
abbreviations and slang. One example is the tweet 'Having coffee with my grandma
before work right now. QT' (entry 25). The abbreviation QT is used here instead of
the phrase 'quality time' and expresses a positive emotion. Software T deemed the
tweet positive, and software L neutral.
Finally, a small number of tweets were picked up and classified by the software
because they contained the keyword 'coffee', but were not expressing any
emotion towards the drink itself. For instance, the segment 'This coffee shop
needs to change there music up every once and a while. Or maybe I should go
home' (entry 12) expresses anger, a negative sentiment. However, the consumer
is referring to a place, not to the coffee drink itself.
In summary, the software tools struggled to cope with some very short sentences,
which they tended to deem neutral unless these were clearly positive or
negative. The main issues arose where a negative sentiment was expressed but
resulted from the absence of coffee, and the entry was thus classified as
positive by the manual coders; where a negative or neutral sentiment was
expressed towards a different object (e.g. shift work) while a positive
sentiment was expressed towards the coffee; and where the positive sentiment
was not explicitly expressed but rather implied through cultural associations,
such as having coffee out, or through abbreviations, such as QT. These issues
combined to make the overall sentiment of the corpus of tweets more positive
than initially considered. Conversely, some entries were reclassified as negative
or neutral due to the use of irony and sarcasm, or because they contained the
word 'coffee' and expressed a sentiment but did not refer to the drink itself.
Conclusions
Emotions are key to both explaining and anticipating consumer behaviour, and
sentiment analysis offers marketers in academia and in industry a way of
measuring and summarising those emotions. Emotions displayed in social media
conversations are particularly appealing for research, as these platforms offer
many opportunities to listen to conversations in real time, with minimal
disruption to the individuals expressing those emotions, and in a cost-effective
way. Despite its promise and popularity, however, the sentiment analysis of
social media conversations is neither a simple nor a straightforward process.
The research question in this study asked, 'What are the vulnerabilities related
to the processing and analysis of social media data concerning consumers'
sentiment towards a product?' The framework in Figure 1 pointed to two
categories of vulnerabilities, both of which were present in our analysis. Unlike
the collection and analysis of quantitative data (for instance, studying the
correlation between two variables), where there are well-established standards
for both processing and analysing data, the study of emotions is embedded in
nuance and subjectivity. Many words can be associated with any given emotion;
some words can be associated with more than one emotion; and, as our study
showed, it is also possible to communicate emotion without using emotionally
charged words. These challenges are accentuated by the fact that the segments
of text available on social media are very short, rich in abbreviations and slang,
and often contain typos or grammatical errors.
In this study, not only were multiple types of software used, but products of
both commercial and academic origin were employed. There were no marked
differences in performance between the various products, indicating that this is
not a failure of one product or another but, rather, a challenge presented by the
subject matter (emotions and sentiments) and by the channel, with its technical
limitations and very specific culture and netiquette.
The characteristics of the researcher and, in particular, his or her use of social
media platforms also influence data selection and analysis (Murthy, 2008).
For instance, in this case, the researchers used British English to formulate their
search queries (e.g. 'flavour'), unconsciously leaving out the American spelling
of the word (namely, 'flavor'). Likewise, due to their age, they may have failed
to capture or decode particular spellings or abbreviations and, indeed, sarcasm.
These vulnerabilities have a number of effects on the use of automated tools
to analyse sentiment in online conversations. The first is that the problems
with the classification of tweets led to an inaccurate representation of the
overall sentiment towards coffee, both in terms of sentiment polarity and in
terms of emotional state. The second is that segments that should have been
excluded from the analysis because they did not relate to the topic under
analysis (coffee) were retained in the corpus of data, possibly skewing the
results. Given that so many commercial and, increasingly, academic research
projects rely on the automated analysis of sentiment data, these findings raise
concerns about the quality of those insights and of subsequent decisions.
These results are very concerning given the popularity of automated
sentiment analysis in consumer behaviour research. They are concerning for
academics, particularly the novice user, who may be too reliant on these tools
to analyse large volumes of consumer data. One of the reasons why using
qualitative data analysis software may improve the credibility of a qualitative
study is that the software enables researchers to make visible their data coding
and data analysis processes (Rademaker, Grace, & Curda, 2012). This is not the
case with most automated sentiment analysis tools, given that the coding and
analysis process is performed by algorithms that are closely guarded by the commercial
organisations that sell these applications (Beer & Burrows, 2013). It is even
more concerning for practitioners in search of speedy and inexpensive customer
insight, who are unlikely to assess the robustness of the automated tools as
we did in this study.
The findings from this study have important implications for consumer behaviour
research in academia and in industry. One likely impact of the low inter-coder
agreement rates observed in this study is that they may discourage researchers
from using sentiment analysis software, which would limit their ability to use
large data sets or to see their work accepted and recognised. Alternatively, they
may discourage some researchers from using social media data altogether in
their research, which would be a great loss for the development of the discipline
of consumer behaviour.
To improve the classification of tweets, sentiment analysis needs to take into
consideration the social context within which the conversation takes place; for
instance, analysts need to look at the tweets before or after the one being coded,
or to consider wider patterns (e.g. more negative tweets on Mondays). Moreover,
analysts need to consider the cultural connotations of the object that they are
studying, including international variations: in Japan, for instance, the
consumption of coffee is associated with the idea of foreignness (Grinshpun,
2014), whereas this is no longer the case in the United Kingdom (Forsyth,
2011). Additionally, it is important to keep developing dictionaries that reflect
the specific syntax and style used in social media conversations, or even software
solutions that, in a first stage of analysis, replace commonly used abbreviations
with their formal equivalents (for instance, replacing 'BRB' with 'be right
back'). However, it must be recognised that, as language and communication
styles are constantly evolving, these dictionaries and tools will never
completely reflect the full range of variations and nuances in social media
communication. Moreover, they will struggle to capture sarcasm and highly
contextualised uses of language, such as teenagers using the term 'sick'
to refer to a very good experience.
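A first-stage abbreviation normalisation of the kind suggested above can be sketched as follows. The dictionary here is a tiny, hypothetical sample; a production tool would need a much larger, regularly updated lexicon:

```python
import re

# Hypothetical pre-processing step: expand common social-media
# abbreviations to their formal equivalents before sentiment scoring.
ABBREVIATIONS = {
    "brb": "be right back",
    "qt": "quality time",
}

def expand_abbreviations(text: str) -> str:
    # Replace each alphabetic token found in the dictionary,
    # leaving every other word untouched.
    def replace(match: re.Match) -> str:
        word = match.group(0)
        return ABBREVIATIONS.get(word.lower(), word)
    return re.sub(r"[A-Za-z]+", replace, text)

print(expand_abbreviations("Having coffee with my grandma before work right now. QT"))
# → Having coffee with my grandma before work right now. quality time
```

With entry 25 normalised in this way, a keyword-based scorer at least has a chance of matching 'quality time' against its lexicon, whereas the raw token 'QT' would match nothing.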
It needs to be emphasised that software is constantly being updated and
improved, and that some of the problems highlighted here might have been
addressed in versions of the software released after this study was conducted:
there is ever more data from which the software can learn, dictionaries can be
improved, and new techniques can be implemented.
This study does not aim to discourage researchers from using automated
sentiment analysis tools in general, or those mentioned here in particular.
Instead, our message is that researchers need to spend considerable time
familiarising themselves with the technical and pragmatic aspects of
communication in the social media environment, and with the characteristics
and limitations of the software that they may use to analyse social media data.
Social media offers a window into consumers' minds and holds much promise for
the development of consumer behaviour research. In particular, the analysis of
social media conversations offers many advantages over alternative methods of
studying consumers' emotions. However, as this research showed, the researcher's
ability to accurately identify the sentiment expressed in a tweet or other similarly
short textual data extract is limited by how emotions are verbalised and by the
contextual nature of the communication of emotions. Moreover, while automated
tools may be effective at processing large volumes of data, their lack of
sophistication and contextual awareness, plus the biases introduced by the
researchers themselves, reduce their ability to accurately identify sentiment
polarity or emotional state. As with any other automated data analysis tool,
researchers need to carefully assess the suitability of sentiment analysis software
for their projects, and to understand its limitations.
Disclosure statement

No potential conflict of interest was reported by the authors.
References
Abbasi, A., Chen, H., & Salem, A. (2008). Sentiment analysis in multiple languages: Feature selection for opinion classification in web forums. ACM Transactions on Information Systems, 26(3), 1–34. doi:10.1145/1361684.1361685
Baker, S. (2009, May 21). Learning, and profiting, from online friendships. Bloomberg Businessweek Magazine.
Basit, T. N. (2003). Manual or electronic? The role of coding in qualitative data analysis. Educational Research, 45(2), 143–154. doi:10.1080/0013188032000133548
Beer, D., & Burrows, R. (2013). Popular culture, digital archives and the new social life of data. Theory, Culture & Society, 30(4), 47–71. doi:10.1177/0263276413476542
Brown, D., Taylor, C., Baldy, R., Edwards, G., & Oppenheimer, E. (1990). Computers and QDA - can they help it? A report on a qualitative data analysis programme. The Sociological Review, 38(1), 134–150. doi:10.1111/j.1467-954X.1990.tb00850.x
Bushman, B., Baumeister, R. F., & Phillips, C. M. (2001). Do people aggress to improve their mood? Catharsis beliefs, affect regulation opportunity, and aggressive responding. Journal of Personality and Social Psychology, 81(1), 17–32. doi:10.1037/0022-3514.81.1.17
Cambria, E., & Hussain, A. (2012). Sentic computing: Techniques, tools, and applications. Dordrecht: Springer.
Carson, E. (2014, June 18). Sentiment analysis: Understanding customers who don't mean what they say. TechRepublic.
Casteleyn, J., Mottart, A., & Rutten, K. (2009). How to use Facebook in your market research. International Journal of Market Research, 51(4), 439–447. doi:10.2501/S1470785309200669
Christiansen, L. (2011). Personal privacy and internet marketing: An impossible conflict or a marriage made in heaven? Business Horizons, 54(6), 509–514. doi:10.1016/j.bushor.2011.06.002
Cohen, J. B., Pham, M. T., & Andrade, E. B. (2008). The nature and role of affect in consumer behavior. In C. P. Haugtvedt, P. Herr, & F. Kardes (Eds.), Handbook of consumer psychology (pp. 297–348). Mahwah, NJ: Lawrence Erlbaum.
Cooke, M., & Buckley, N. (2008). Web 2.0, social networks and the future of market research. International Journal of Market Research, 50(2), 267–292.
Dahl, S. (2015). Social media marketing - theories & applications. London: Sage.
Davis, J. J., & O'Flaherty, S. (2012). Assessing the accuracy of automated twitter sentiment coding. Academy of Marketing Studies Journal, 16(Suppl.), 35–50.
Forgas, J. P. (1991). Affective influences on partner choice: Role of mood in social decisions. Journal of Personality and Social Psychology, 61(5), 708–720. doi:10.1037/0022-3514.61.5.708
Forsyth, J. (2011). Coffee - UK (pp. 1–69). London: Mintel.
Grinshpun, H. (2014). Deconstructing a global commodity: Coffee, culture, and consumption in Japan. Journal of Consumer Culture, 14(3), 343–364. doi:10.1177/1469540513488405
Gwet, K. L. (2012). Handbook of inter-rater reliability. Gaithersburg, MD: StatAxis Publishing Company.
Halford, S., Pope, C., & Weal, M. (2013). Digital futures? Sociological challenges and opportunities in the emergent semantic web. Sociology, 47(1), 173–189. doi:10.1177/0038038512453798
Jansen, B. J., Zhang, M., Sobel, K., & Chowdury, A. (2009). Twitter power: Tweets as electronic word of mouth. Journal of the American Society for Information Science and Technology, 60(11), 2169–2188. doi:10.1002/asi.21149
Johnson, E. J., & Tversky, A. (1983). Affect, generalization, and the perception of risk. Journal of Personality and Social Psychology, 45(1), 20–31. doi:10.1037/0022-3514.45.1.20
Kietzmann, J. H., Hermkens, K., McCarthy, I. P., & Silvestre, B. S. (2011). Social media? Get serious! Understanding the functional building blocks of social media. Business Horizons, 54(3), 241–251. doi:10.1016/j.bushor.2011.01.005
Kim, S.-M., & Hovy, E. (2006, July). Automatic identification of pro and con reasons in online reviews. Paper presented at the COLING/ACL, Sydney.
Kivran-Swaine, F., Brody, S., Diakopoulos, N., & Naaman, M. (2012, May). Of joy and gender: Emotional expression in online social networks. In Companion proceedings of ACM CSCW'12 conference on Computer Supported Cooperative Work (pp. 139–142). New York, NY: ACM.
Koppel, M., & Schler, J. (2006). The importance of neutral examples for learning sentiment. Computational Intelligence, 22(2), 100–109. doi:10.1111/coin.2006.22.issue-2
Kozinets, R. V. (2002). The field behind the screen: Using netnography for marketing research in online communities. Journal of Marketing Research, 39(1), 61–72. doi:10.1509/jmkr.39.1.61.18935
Lage, M. C., & Godoy, A. S. (2008). Computer-aided qualitative data analysis: Emerging questions. RAM. Revista de Administração Mackenzie, 9(4), 75–98. doi:10.1590/S1678-69712008000400006
Laros, F. J. M., & Steenkamp, J.-B. E. M. (2005). Emotions in consumer behavior: A hierarchical approach. Journal of Business Research, 58(10), 1437–1445. doi:10.1016/j.jbusres.2003.09.013
Liu, B. (2010). Sentiment analysis and subjectivity. In N. Indurkhya & F. J. Damerau (Eds.), Handbook of natural language processing (pp. 627–666). Boca Raton, FL: Taylor & Francis.
Loewenstein, G., & Lerner, J. S. (2003). The role of affect in decision making. In R. J. Davidson, K. R. Scherer, & H. H. Goldsmith (Eds.), Handbook of affective sciences (pp. 619–642). Oxford: Oxford University Press.
Murthy, D. (2008). Digital ethnography: An examination of the use of new technologies for social research. Sociology, 42(5), 837–855. doi:10.1177/0038038508094565
Nunan, D., & Domenico, M. D. (2013). Market research and the ethics of big data. International Journal of Market Research, 55(4), 505–520.
Patterson, A. (2012). Social-networkers of the world, unite and take over: A meta-introspective perspective on the Facebook brand. Journal of Business Research, 65(4), 527–534. doi:10.1016/j.jbusres.2011.02.032
Plutchik, R. (2001). The nature of emotions. American Scientist, 89(4), 344–350.
Rademaker, L., Grace, E., & Curda, S. (2012). Using computer-assisted qualitative data analysis software (CAQDAS) to re-examine traditionally analyzed data: Expanding our understanding of the data and of ourselves as scholars. Qualitative Report, 17(43), 1–11.
Rose, S., Spinks, N., & Canhoto, A. I. (2014). Management research - applying the principles. London: Routledge.
Ryan, M. E. (2009). Making visible the coding process: Using qualitative data software in a post-structural study. Issues in Educational Research, 19(2), 142–161.
Schreier, M. (2012). Qualitative content analysis in practice. London: Sage.
Sterne, J. (2010). Social media analytics: Effective tools for building, interpreting, and using metrics. London: Wiley.
Thelwall, M., Buckley, K., & Paltoglou, G. (2011). Sentiment in Twitter events. Journal of the American Society for Information Science and Technology, 62(2), 406–418. doi:10.1002/asi.21462
Thet, T. T., Na, J.-C., & Khoo, C. S. G. (2010). Aspect-based sentiment analysis of movie reviews on discussion boards. Journal of Information Science, 36(6), 823–848. doi:10.1177/0165551510388123
Uprichard, E. (2013). Describing description (and keeping causality): The case of academic articles on food and eating. Sociology, 47(2), 368–382. doi:10.1177/0038038512441279
Vanden Bergh, B. G., Lee, M., Quilliam, E. T., & Hove, T. (2011). The multidimensional nature and brand impact of user-generated ad parodies in social media. International Journal of Advertising, 30(1), 103–131. doi:10.2501/IJA-30-1-103-131
Williams, S. A., Terras, M., & Warwick, C. (2013). What do people study when they study Twitter? Classifying Twitter related academic papers. Journal of Documentation, 69(3), 384–410. doi:10.1108/JD-03-2012-0027
About the authors
Ana Isabel Canhoto is Senior Lecturer in Marketing at Oxford Brookes University and
Programme Lead of the MSc Marketing. She researches, writes and advises organisations on
how to identify and manage difficult customers, and terminate bad commercial relationships.
She is also interested in the use of social media to build customer profiles. Prior to joining
academia, she worked as a management consultant in the telecommunications industry and as a
portfolio manager at a leading media and entertainment company, among others.
Corresponding author: Ana Isabel Canhoto, Oxford Brookes University, Wheatley Campus,
Wheatley, Oxford OX33 1HX, England.
T +44 (0)1865 485858
E adomingos-canhoto@brookes.ac.uk
Yuvraj Padmanabhan is the managing director of Mindgraph, with a special expertise in social
media and sentiment analysis.