POSTNOTE

Number 460 March 2014

Social Media and Big Data


Overview

Analysing large quantities of readily available
data from social media has created new
opportunities to understand and influence how
people think and act. This POSTnote examines
the application of big data approaches to
social media in three key areas: elections and
polling, commercial applications and security. It
also covers the regulation of social media data
and public concerns around privacy.

Background
Social media platforms are websites and applications or
apps (software programs) that enable people to interact,
create, share and exchange information (see Box 1). Every
day people around the world post 400 million tweets on
Twitter, add 350 million photos to Facebook and view 4
billion videos on YouTube.1 In the UK, 57% of over-16s use
some form of social media, generating vast quantities of
data.2 This has prompted the development of new technical
and methodological ('big data') approaches to capture,
process and analyse large and complex data (see
forthcoming POSTnote on Big Data).
Big data approaches to analysing social media data can
increase understanding of how people think and act.
Organisations can use this information to inform their
activities, improve decision-making, target products and
services more effectively, and try to influence users'
future behaviour. In 2012, the Government allocated
£189 million for research in big data, with a further £73
million announced in February 2014. The Economic and
Social Research Council is investing £64 million in the Big
Data Network, which includes funding to facilitate access to
social media data and further research on its use.

• 57% of over-16s in the UK use social media,
  generating vast amounts of accessible data.
• Analysing social media data can help
  organisations to understand behaviours and
  target products and services more effectively.
• Key applications include profiling voters and
  complementing traditional polling, targeting
  adverts at consumers, credit scoring and
  informing policing decisions.
• There is debate about how to analyse social
  media data, including which methods to use
  and how to control for biases.
• Personal data can be shared or sold with
  users' consent, or if anonymised.
• There are concerns that users are not fully
  aware of how their data are being used and
  that it is often possible to identify individuals
  by linking anonymised datasets.
This POSTnote examines:
• the collection and analysis of social media data
• key applications of social media data analysis
• data regulation and public concerns around privacy.

Collecting Social Media Data


Access to large quantities of readily available data on
millions of people's activities and behaviours is a highly
valuable resource for researchers and organisations.
Further, social media data are not created for the purpose of
research, so they can offer insight into the way people
naturally interact online. Data can be automatically extracted
from social media sites via Application Programming
Interfaces (Box 2).

Box 1. Examples of Social Media Platforms
• Facebook: a social networking service that allows users to create
  personal profiles. Interactions take the form of posts, and users can
  indicate their preferences for user-generated content, articles,
  products and services through a 'Like' function. Other similar
  services include Google+ and LinkedIn.
• YouTube: allows users to upload, view and comment on videos.
• Twitter: a microblogging service that allows users to broadcast and
  read tweets of up to 140 characters.
• Flickr: allows users to upload, view and comment on photos. Other
  photo and image sharing sites include Instagram and Pinterest.

The Parliamentary Office of Science and Technology, 7 Millbank, London SW1P 3JA T 020 7219 2840 E post@parliament.uk www.parliament.uk/post

Box 2. Application Programming Interfaces (APIs)


APIs are instructions and tools for managing interactions between
different software, and can be used to automatically extract social media
data. Some social media platforms provide APIs for free, but impose
restrictions on the amount or type of data that can be accessed. A small
number of companies, most notably DataSift and Gnip in the UK, have
deals with social media companies to access greater quantities of data
through APIs, such as the full public Twitter feed. These are then sold on
to third parties, with prices starting from around £2,000 per month. APIs
can also be used to access some private data.

What data can be accessed for research or commercial
purposes depends in part on whether users have
designated their content as public (it can be viewed by
anyone) or private (the user specifies who can view it).
• Public data. This includes most Twitter content. Anyone
  can access publicly available data from social media
  platforms subject to terms and conditions.
• Private data. Facebook content is often private. Social
  media companies may transfer data that have been
  designated private to third parties if users have given their
  consent, or if the data are anonymised (identifying
  information has been removed).
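
The public/private distinction described above can be written as a simple decision rule. The sketch below is illustrative only: the `Post` fields and the `may_transfer` function are hypothetical names, and the rule is a simplification of the text that ignores platform terms and conditions and any wider legal test.

```python
from dataclasses import dataclass


@dataclass
class Post:
    """A hypothetical record for one item of social media content."""
    text: str
    is_public: bool        # the user has made the content viewable by anyone
    user_consented: bool   # the user has consented to transfer to third parties
    is_anonymised: bool    # identifying information has been removed


def may_transfer(post: Post) -> bool:
    """Simplified rule: public data are accessible; private data may be
    transferred only with consent or once anonymised."""
    if post.is_public:
        return True
    return post.user_consented or post.is_anonymised


posts = [
    Post("public tweet", True, False, False),
    Post("private post, consent given", False, True, False),
    Post("private post, anonymised", False, False, True),
    Post("private post, neither", False, False, False),
]

# Keep only the posts that pass the simplified rule.
transferable = [p.text for p in posts if may_transfer(p)]
print(transferable)
```

Only the last post is held back, because it is private and the user has neither consented nor had the data anonymised.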

Analysing Social Media Data


Unstructured data are produced on social media at a rate
that makes them difficult to analyse using traditional
methods that rely on human analysts. Social media analytics
is a new field of study that is developing automated or
semi-automated methods for analysing data. One prominent
technique is called sentiment analysis (see Box 3). This can
be a useful tool to assess public reaction to a particular
event, such as a protest or TV show. However, outside of
specific contexts the insights that can be drawn from this
technique are currently limited. Research is underway to
improve the technology and apply it to wider settings. One
example is the WeGov project, funded by the European
Commission, which is building tools to analyse responses to
government policies on social media.

Methodological Issues
Some advocates of big data argue that the sheer size of the
datasets reduces, or even eliminates, the need for
established statistical methods such as random sampling,
because all the data can be analysed.3 However, social
media data only cover people who use social media. In the
UK, around 49% of the population use Facebook and 24% use
Twitter, and not all users create content.4 There are
concerns that social media data may not represent
vulnerable groups in society, such as the elderly or those
from lower income backgrounds. This means that there are
significant gaps in the data, and there are not yet accepted
methods for controlling for biases.
There is also debate as to whether social media data should
be viewed and analysed as quantitative or qualitative data.
Many researchers have argued that greater methodological
clarity and consistency are needed for analysing large
quantities of unstructured and complex data.
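
A small worked example shows how an unrepresentative user base can skew an estimate. The sketch below uses post-stratification weighting, a standard survey technique chosen here for illustration; it is not an accepted method for social media data, and every figure (age groups, population shares, support levels) is invented. Weighting also assumes that users within a group resemble non-users in that group, and it cannot compensate for groups that are absent from the data altogether.

```python
# Hypothetical share of each age group in the whole population.
population_share = {"16-34": 0.30, "35-64": 0.50, "65+": 0.20}

# Hypothetical social media sample: number of users observed in each
# group and the proportion of them expressing support for a policy.
sample = {
    "16-34": {"n": 600, "support": 0.70},
    "35-64": {"n": 350, "support": 0.50},
    "65+": {"n": 50, "support": 0.30},
}

total_n = sum(group["n"] for group in sample.values())

# Raw estimate: every observed user counts equally, so the
# over-represented younger group dominates the result.
raw_estimate = sum(g["n"] * g["support"] for g in sample.values()) / total_n

# Weighted estimate: each group counts in proportion to its share
# of the population rather than its share of the sample.
weighted_estimate = sum(
    population_share[name] * sample[name]["support"] for name in sample
)

print(round(raw_estimate, 2))       # 0.61
print(round(weighted_estimate, 2))  # 0.52
```

In this invented case the raw figure overstates support by nine percentage points, because older people make up 20% of the population but only 5% of the sample.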

Box 3. Sentiment Analysis
Sentiment analysis uses natural language processing techniques to
read and attribute meaning to textual information, such as whether the
author felt positive, negative or neutral. It can provide broad insights
on public reactions to a particular event in ways that have not
previously been possible. The two main methods are:
• The list or corpus approach, where software is used to search for
  particular words, which are considered to have positive or negative
  meanings. This is the quickest and least labour-intensive approach.
• The machine learning approach, which builds on the list approach
  using machine learning algorithms. This involves a human analyst
  manually indicating, using examples of text, how the program should
  interpret specific terms or phrases in different contexts. This
  requires more human input, but produces more accurate results, for
  example by detecting humour or irony.
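
The list approach can be sketched in a few lines of code. The word lists and the `list_sentiment` function below are invented for illustration and are far smaller than any lexicon used in practice; the final example shows the kind of ironic message that a word list alone misclassifies.

```python
import re

# Tiny illustrative word lists; real lexicons hold thousands of entries.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "angry"}


def list_sentiment(text: str) -> str:
    """Classify text by counting words from each list."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(list_sentiment("I love this debate, great answers"))  # positive
print(list_sentiment("Terrible policy, I hate it"))         # negative
print(list_sentiment("The debate starts at eight"))         # neutral
# Irony is missed: the word 'great' alone makes this count as positive.
print(list_sentiment("Oh great, another delay"))            # positive
```

A machine learning approach would instead learn from examples labelled by a human analyst, which is how it can pick up context that a fixed word list cannot.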

Key Areas of Application


The following examples highlight some key applications in
politics, commerce and policing. Other key areas include
finance and the charitable sector, but because of space
constraints these are not discussed here.

Polling
Sentiment analysis of Twitter has been used to reveal
real-time insights into users' political opinions. During the
2010 UK General Election, this technique was used to
create a visual display of Twitter users' reactions to
televised political debates.5 During the 2012 Presidential
Election in the USA, the political news site Politico used
sentiment analysis to examine large volumes of public and
private data on Facebook as a complement to traditional
methods of polling. However, there have been doubts
expressed as to how useful analysis of the often
spontaneous, emotional content of social media is for
predicting the potentially more calculated decisions of voters
during elections. There have also been concerns raised
about privacy in the context of the use of aggregated social
media data about sensitive topics (see Regulation below).
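
A real-time display of reactions to a debate rests on grouping scored messages into time windows. The sketch below is not a description of the tools used in 2010; it assumes, hypothetically, that each tweet has already been given a sentiment score of 1, 0 or -1 and a timestamp in seconds from the start of the broadcast.

```python
from collections import defaultdict

# Hypothetical (seconds since start of debate, sentiment score) pairs.
scored_tweets = [(5, 1), (40, 1), (59, -1), (65, -1), (110, -1), (130, 0), (170, 1)]


def sentiment_by_minute(tweets):
    """Average the sentiment scores of tweets posted in each minute."""
    buckets = defaultdict(list)
    for seconds, score in tweets:
        buckets[seconds // 60].append(score)
    return {
        minute: sum(scores) / len(scores)
        for minute, scores in sorted(buckets.items())
    }


timeline = sentiment_by_minute(scored_tweets)
for minute, average in timeline.items():
    print(minute, round(average, 2))
```

Each average describes only the users who chose to tweet in that minute, which is one reason such displays complement rather than replace traditional polling.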

Electioneering
Big data analysis played a substantial role in guiding the
election strategy of the Obama 2012 campaign in the US.
The campaign had a large data analytics team, who used
data from social media (including Facebook and Twitter),
alongside data from their own party database, which
included information on approximately 180 million voters. By
looking for correlations in past voter characteristics and
behaviour, they were able to build up profiles of the kinds of
people who might vote for them, and to target resources
more efficiently. For example, TV adverts were broadcast
when they were known to have the most impact with the
targeted swing voters, rather than in premium generic
prime-time slots. Analytics was also used to determine
which households to target door-to-door. These approaches
have not yet been taken up to the same extent in the UK,
although they are likely to become more prominent in the
2015 general election. However, differences in data
regulation and campaign spending may affect how widely
social media data analysis is used in UK politics.

Marketing
In marketing, one of the most significant applications of
social media data is 'retargeting'. This is where
advertising companies mark or tag online users when they
visit a certain brand or company website, and re-advertise
only to the people who have shown some interest in the
brand. Retargeting potential consumers is a large market,
as it provides an opportunity to target the 98% of visits to
retailers' websites that do not result in a purchase. While
this is an established practice, applying big data analytics to
social media data on consumers has the potential to make it
more sophisticated (Box 4). These new personalised
retargeting techniques create tailored adverts in real time
by comparing an individual's specific browsing history and
social media profile with aggregated data on what
consumers with similar profiles have purchased. Sales data
from Amazon can be linked to social media data to create a
better idea of what 'people like them' have bought in the past.
By tracking and profiling consumers in this way, advertisers
can potentially increase revenue without bombarding
customers with unwanted marketing.
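
The comparison with 'consumers with similar profiles' can be illustrated with a simple similarity score. The sketch below is one possible approach, not a description of any company's system: it uses Jaccard similarity between sets of interests, and the consumer records, product names and the `rank_adverts` function are all invented.

```python
from collections import defaultdict


def jaccard(a: set, b: set) -> float:
    """Overlap between two sets: 0 means nothing shared, 1 means identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical aggregated data on past consumers.
PAST_CONSUMERS = [
    {"interests": {"running", "cycling", "nutrition"},
     "bought": {"trainers", "water bottle"}},
    {"interests": {"running", "nutrition"},
     "bought": {"trainers", "energy gel"}},
    {"interests": {"gardening", "cooking"},
     "bought": {"trowel"}},
]


def rank_adverts(interests: set, browsed: set) -> list:
    """Rank products by how similar their past buyers are to this visitor,
    with a boost for products the visitor viewed without buying."""
    scores = defaultdict(float)
    for consumer in PAST_CONSUMERS:
        similarity = jaccard(interests, consumer["interests"])
        for product in consumer["bought"]:
            scores[product] += similarity
    for product in browsed:
        scores[product] += 1.0  # the retargeting step: they showed interest
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, score in ranked if score > 0]


adverts = rank_adverts({"running", "cycling"}, browsed={"trainers"})
print(adverts)  # ['trainers', 'water bottle', 'energy gel']
```

The visitor is shown the product they browsed first, followed by items bought by people with overlapping interests; the gardening product is never shown because its buyer shares nothing with the visitor.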

Credit Scoring
Traditionally, credit agencies have decided whether to grant
a loan on the basis of scores derived primarily from financial
information, such as loan repayment histories. Some
individuals, especially those with lower incomes, are
excluded from credit by these metrics. Recently, a new
generation of loan companies has emerged, offering
short-term, high-interest payday loans with no traditional
credit check. These companies use alternative means of judging
creditworthiness. For example, Wonga in the UK, LendUp in
the US and Kreditech in Germany gather thousands of
pieces of information about individuals, including social
media data. These are used to determine whether to grant
an individual a loan, the maximum size of loan, and the rate
of interest. Payday loan companies and their advocates
argue that such techniques allow them to make the benefits
of credit more widely available to those rejected by
traditional credit checks. However, critics are concerned that
judging people on data they might assume is private, such
as who they are friends with online, sets a precedent that
may be followed in other sensitive areas, such as insurance.

Policing Public Demonstrations and Events


Social media data are increasingly being used as a new
source of intelligence for policing public demonstrations or
events. Social Media Intelligence, or SOCMINT, can help
to assess threats to public order and safety, for example by
providing indications of whether violence is likely to occur.
Potential benefits include better informed policing decisions,
such as more accurate estimates of the number of officers
required for ensuring public safety. Using social media data
to inform policing is regulated under the Data Protection Act
1998 (see below). It can also be subject to a number of
other laws such as the Human Rights Act 1998 and the
Regulation of Investigatory Powers Act (RIPA) 2000.
However, these were passed before the mainstream use of
social media and some commentators, such as the think-tank
Demos and lobby group Big Brother Watch, have argued that
more up-to-date legislation may be required to ensure that
SOCMINT does not infringe on civil liberties.6
Others have expressed concerns that police monitoring of
social media is not sufficiently open or transparent.
However, attempts to update RIPA have been controversial
(see POSTnote 436, Monitoring Internet Communications).

Regulating Social Media Data


The Data Protection Act 1998 (DPA) implements the EU
Data Protection Directive 95/46/EC. It regulates the
processing of personal data through restrictions on how
such data, including social media data, can be recorded,
stored, altered, used or disclosed. Under the DPA, 'personal
data' means data related to a living individual who can be
identified, either directly or indirectly, from the data, or from
other information held by the same organisation. There are
concerns over the effectiveness of the DPA. The European
Commission proposed a reform of the EU's data protection
rules in January 2012, to take account of new technologies
and the changing ways that personal data are being used.
The draft European General Data Protection Regulation
(GDPR) is currently being debated in the European
Parliament and could introduce more stringent regulation of
social media data. However, there is significant political
disagreement about the Regulation and it is unclear whether
it will proceed in its current form. Key concerns about data
protection and privacy are outlined below.

Giving Consent
Under the DPA, individuals must give their consent for their
personal data to be processed by an organisation, both at
the stage of initial registration for a social media service,
and for any subsequent changes to the terms of use of the
data. There are two main ways of obtaining consent online:
• Explicit. This requires users to consent to their data
  being processed and stored, usually by clicking 'accept'
  on a terms and conditions page in order to continue
  browsing or using an application. Users are usually
  required to provide explicit consent when initially
  registering for a social media service. Political opinions
  are classed as sensitive data under the DPA, which
  means users must provide explicit consent.
• Implicit. This requires companies to inform users that
  their personal data may be processed if they continue to
  browse a website or use a service. If they proceed, it is
  implied that they consent. Users' consent to changes in
  the terms and conditions of use of their data is usually
  implied; however, the GDPR is likely to require explicit
  consent for changes to terms.

Box 4. Tracking Online Behaviour through Social Media
Some social media companies offer third-party authentication services,
such as Facebook Connect and Google+ Sign-in, which allow users to
register with other websites through their social media account. Using
these services means that users' specific browsing behaviours can be
tracked using software tracking devices called cookies and pixels,
which are installed on computers when users visit certain websites.
Social media platforms can then link users' online activities outside of the
social media platform to the data from their personal profiles. This can also
allow other websites greater access to users' social media data.

The Open Rights Group (ORG) has expressed concerns
that users of social media do not always read or fully
understand the terms and conditions when they share data
online. For example, users may inadvertently grant
third parties permission to access their private data, in
return for access to free apps. ORG has argued that this is
because the terms are often intentionally long, complicated
or difficult to read, and do not highlight key points. There is
also controversy surrounding the frequent changes made to
terms of use. Further, research suggests that social media
users often think of their data as 'public' or 'private', but this
is not a distinction that is made by the DPA. The US Federal
Trade Commission (FTC) has handled numerous
investigations into privacy violations, mostly concerning
claims that some platforms misled their users into believing
that their profiles were more private than they actually were.
For example, Facebook is now subject to a 20-year consent
decree, monitored by the FTC, which requires it to get
explicit consent from users before implementing any
changes that would alter their privacy settings.

Withdrawing Consent
Under the DPA, it must be possible for the individual to
withdraw consent for their data to be processed or stored.
However, in practice this is not always straightforward to do.
Currently, Article 17 of the draft GDPR would give EU
citizens the 'Right to Erasure'. If included in the final
Regulation, this might require websites that store data on
users to provide an easy option for consumers to have all
data on them erased. Potentially, this would enable users to
erase all data that a social network held on them at the click
of a button. Supporters of this clause argue that it would
provide a clear mechanism through which individuals could
withdraw consent if they believed that their privacy was
being infringed. However, internet companies, the UK
Government and the Information Commissioner's Office
(ICO), which oversees the DPA, have expressed concern
that all aspects of this requirement could be difficult or
impossible to meet in practice because data from social
networks may have already been passed on or sold
numerous times, or used to create new datasets.

Re-Identification
The DPA does not apply to anonymised data: personal
data from which obvious identifiers (like name, date of
birth or email address) and non-obvious identifiers (such as
computer IP addresses) have been removed. Anonymised data
can be distributed or sold without users' consent. Data are
classed as anonymised under the DPA if 'all reasonable
efforts' have been taken by the organisation holding the data
(the data controller) to ensure that they cannot be
re-identified. However, research has indicated that it is
possible to re-identify individuals in large datasets through
cross-correlation with other databases.7 This is a particular
concern when multiple anonymised datasets containing the
same individuals are available to third parties, as they can
be matched up, or overlaid with one another, to re-identify
individuals. Anonymised Google search histories, Facebook
Friends lists and Netflix film ratings have all been shown to
be re-identifiable. This may permit far greater intrusions into
privacy than most people realise. Given these concerns,
some privacy advocates and lawyers have argued that the
definition of 'reasonable' in the DPA is insufficient and that
anonymised data should be regulated in a similar way to
identifiable personal data. The ICO has produced guidelines
to help organisations and companies assess the risks of
anonymisation and identification of individuals and comply
with the DPA.8
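
The matching of datasets described above can be shown with a toy example. In the sketch below, all records, names and field choices are invented: a set of 'anonymised' film ratings keeps partial postcode, birth year and gender, and is matched against a second dataset of public profiles that carries the same fields alongside names.

```python
from collections import defaultdict

# Fields that are not names but can still single people out in combination.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "gender")

# Hypothetical anonymised dataset: names removed, other fields kept.
anonymised_ratings = [
    {"postcode": "SW1P", "birth_year": 1980, "gender": "F", "film": "Film A"},
    {"postcode": "SW1P", "birth_year": 1975, "gender": "M", "film": "Film B"},
    {"postcode": "N1", "birth_year": 1980, "gender": "F", "film": "Film C"},
]

# Hypothetical second dataset containing names and the same fields.
public_profiles = [
    {"name": "Alice Example", "postcode": "SW1P", "birth_year": 1980, "gender": "F"},
    {"name": "Bob Example", "postcode": "SW1P", "birth_year": 1975, "gender": "M"},
    {"name": "Carol Example", "postcode": "N1", "birth_year": 1980, "gender": "F"},
    {"name": "Dana Example", "postcode": "N1", "birth_year": 1980, "gender": "F"},
]


def link(records, profiles):
    """Attach a name to each record whose combination of fields
    matches exactly one profile in the second dataset."""
    names_by_key = defaultdict(list)
    for profile in profiles:
        key = tuple(profile[field] for field in QUASI_IDENTIFIERS)
        names_by_key[key].append(profile["name"])

    reidentified = {}
    for record in records:
        key = tuple(record[field] for field in QUASI_IDENTIFIERS)
        matches = names_by_key.get(key, [])
        if len(matches) == 1:  # a unique match identifies the person
            reidentified[matches[0]] = record["film"]
    return reidentified


reidentified = link(anonymised_ratings, public_profiles)
print(reidentified)
```

Two of the three records are re-identified. The third is not, because two people share the same combination of fields, which illustrates why the risk grows as datasets carry more detail per person.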

Attitudes towards Personal Data


A survey conducted in June 2013 by ComRes found that
68% of UK respondents were concerned about their
personal privacy online. In the context of social media,
which involves the routine sharing of personal information,
some research suggests that users' understandings of
privacy are changing, and that concerns are centred on who
can see their personal information and how it is used. Social
media companies grant users access to their sites for free;
however, it has been estimated that Facebook and Google
make between $5 and $20 annually per user, and that each
user's data could be worth as much as $1,200 per year to
the wider advertising economy. This has led some
researchers and privacy advocates to raise concerns that
social media data are being considered as a resource to be
harvested, rather than as an expression of people's
identities. They argue that users should be given greater
control over their personal data and a share of the profits
being generated from its commercialisation.
Some users have directly responded to personal data
concerns by leaving, or threatening to leave, mainstream
platforms. For example, Instagram reversed a change in its
terms and conditions after a backlash from users who were
concerned that this could have allowed uploaded pictures to
be sold to advertisers. A new generation of social media
services, such as Snapchat, which automatically delete
content shortly after it is viewed, are becoming more popular
among younger users. Some researchers, however, have
argued that as social lives become increasingly dependent
on social media, opting out of a mainstream service
completely and losing their online data profile is considered
to be too high a price for some users. Data privacy
advocates have argued that users should be enabled to
easily transfer their data between social media sites to
create a 'market for privacy'. However, although some
services allow users to extract some of their personal data,
customer 'lock-in' is a key aspect of many social media
companies' business strategies, making them unlikely to
support this. A number of companies, like Mydex, Ctrlio and
Handshake, are developing services to let users securely
store their personal data and choose what to keep private
and what to allow access to in return for money or offers.

Endnotes
1 Social Media Today (2013). Social Media in 2013.
2 Office for National Statistics (2013). Social Networking.
3 Cukier, K. and Mayer-Schönberger, V. (2013). Big Data, Hachette.
4 McGrory, R. (2014). UK Social Media Statistics 2014.
5 Anstead, N. and O'Loughlin, B. (2012). Semantic polling, LSE.
6 Omand, D., et al. (2012). #Intelligence, Demos.
7 Ohm, P. (2010). Broken promises of privacy, UCLA Law Review, 57.
8 Information Commissioner's Office (2012). Anonymisation Code of Practice.

POST is an office of both Houses of Parliament, charged with providing independent and balanced analysis of policy issues that have a basis in science and technology.
POST is grateful to Benjamin Taylor for researching this briefing, to the AHRC for funding his parliamentary fellowship, and to all contributors and reviewers. For further
information on this subject, please contact the co-author, Dr Abbi Hobbs. Parliamentary Copyright 2014. Image copyright istockphoto.com.