Background
Social media platforms are websites and applications or
apps (software programs) that enable people to interact,
create, share and exchange information (see Box 1). Every
day people around the world post 400 million tweets on
Twitter, add 350 million photos to Facebook and view 4
billion videos on YouTube.1 57% of over-16s in the UK use
some form of social media, generating vast quantities of
data.2 This has prompted the development of new technical
and methodological (big data) approaches to capture,
process and analyse large and complex data (see
forthcoming POSTnote on Big Data).
Big data approaches to analysing social media data can
increase understanding of how people think and act.
Organisations can use this information to inform their
activities, improve decision-making, target products and
services more effectively, and to try to influence users'
behaviours in the future. In 2012, the Government allocated
£189 million for research in big data, with a further £73
million announced in February 2014. The Economic and
Social Research Council is investing £64 million in the Big
Data Network, which includes funding to facilitate access to
social media data and further research on its use.
The Parliamentary Office of Science and Technology, 7 Millbank, London SW1P 3JA T 020 7219 2840 E post@parliament.uk www.parliament.uk/post
Methodological Issues
Some advocates of big data argue that the sheer size of the
datasets reduces, or even eliminates, the need for
established statistical methods such as random sampling,
because all the data can be analysed.3 However, social
media data cover only people who use social media. In the
UK, around 49% of the population use Facebook and 24% use
Twitter, and not all users create content.4 There are
concerns that social media
data may not represent vulnerable groups in society, such
as the elderly or those from lower income backgrounds. This
means that there are significant gaps in the data, and there
are not yet accepted methods for controlling for biases.
There is also debate as to whether social media data should
be viewed and analysed as quantitative or qualitative data.
Many researchers have argued that greater methodological
clarity and consistency are needed for analysing large
quantities of unstructured and complex data.
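
The coverage gap described above can be illustrated with a small worked example. This is a minimal sketch, not a method from this briefing: the age groups, usage rates and opinion rates are all invented, and it assumes that within each group users and non-users hold the opinion at the same rate.

```python
# Hypothetical population, invented for illustration only.
# Each tuple: (group, share of population, share of group using
# social media, share of group holding some opinion).
GROUPS = [
    ("16-34", 0.30, 0.90, 0.60),
    ("35-64", 0.45, 0.60, 0.45),
    ("65+",   0.25, 0.20, 0.30),
]


def population_rate(groups):
    """Share of the whole population holding the opinion."""
    return sum(share * opinion for _, share, _, opinion in groups)


def social_media_rate(groups):
    """Share holding the opinion among social media users only."""
    users = sum(share * use for _, share, use, _ in groups)
    holders = sum(share * use * opinion for _, share, use, opinion in groups)
    return holders / users


true_rate = population_rate(GROUPS)
observed_rate = social_media_rate(GROUPS)
print(f"whole population:   {true_rate:.3f}")
print(f"social media users: {observed_rate:.3f}")
```

In this invented example the estimate drawn from social media users is higher than the population value even though every user is included, because the groups that use social media least are under-represented. Analysing all of the available data does not remove that gap.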
Polling
Sentiment analysis of Twitter has been used to reveal
real-time insights into users' political opinions. During the
2010 UK General Election, this technique was used to
create a visual display of Twitter users' reactions to
televised political debates.5 During the 2012 Presidential
Election in the USA, the political news site Politico used
sentiment analysis to examine large volumes of public and
private data on Facebook as a complement to traditional
methods of polling. However, there have been doubts
expressed as to how useful analysis of the often
spontaneous, emotional content of social media is for
predicting the potentially more calculated decisions of voters
during elections. There have also been concerns raised
about privacy in the context of the use of aggregated social
media data about sensitive topics (see Regulation below).
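
As a rough illustration of what sentiment analysis involves, the sketch below scores short posts against a small word list. It is a simplified assumption, not the technique used in the examples above: the word lists, the example posts and the function names are all hypothetical, and production systems typically use much larger lexicons or trained models.

```python
import re

# Tiny hypothetical lexicons, for illustration only.
POSITIVE = {"good", "great", "strong", "agree", "win"}
NEGATIVE = {"bad", "weak", "wrong", "disagree", "lose"}


def sentiment_score(text):
    """Count positive words minus negative words in a post."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def summarise(posts):
    """Tally posts as positive, negative or neutral by score."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for post in posts:
        score = sentiment_score(post)
        if score > 0:
            counts["positive"] += 1
        elif score < 0:
            counts["negative"] += 1
        else:
            counts["neutral"] += 1
    return counts


# Invented example posts.
posts = [
    "Great answer, I agree with that",
    "Weak and wrong on the economy",
    "Watching the debate now",
]
summary = summarise(posts)
print(summary)
```

A word-counting approach like this scores "not bad" as negative, which is one simple example of why spontaneous, informal content can be hard to interpret reliably.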
Electioneering
Big data analysis played a substantial role in guiding the
election strategy of the Obama 2012 campaign in the US.
The campaign had a large data analytics team, who used
data from social media (including Facebook and Twitter),
alongside data from their own party database, which
included information on approximately 180 million voters. By
looking for correlations in past voter characteristics and
behaviour, they were able to build up profiles of the kinds of
people who might vote for them, and to target resources
more efficiently. For example, TV adverts were broadcast
when they were known to have the most impact with the
targeted swing voters, rather than in premium generic
prime-time slots. Analytics was also used to determine
which households to target door-to-door. These approaches
have not yet been taken up to the same extent in the UK,
although they are likely to become more prominent in the
2015 general election. However, differences in data
regulation and campaign spending may affect how widely
social media data analysis is used in UK politics.
Marketing
In marketing, one of the most significant applications of
social media data is through retargeting. This is where
advertising companies mark or tag online users when they
visit a certain brand or company website, and re-advertise
only to the people who have shown some interest in the
brand. Retargeting potential consumers is a large market,
as it provides an opportunity to target the 98% of visits to
retailers' websites that do not result in a purchase. While
this is an established practice, applying big data analytics to
social media data on consumers has the potential to make it
more sophisticated (Box 4). These new personalised
retargeting techniques create tailored adverts in real-time
by comparing individuals' specific browsing history and
social media profile with aggregated data on what
consumers with similar profiles have purchased. For example,
sales data from Amazon can be linked to social media data to
build a better picture of what similar people have bought in
the past.
By tracking and profiling consumers in this way, advertisers
can potentially increase revenue without bombarding
customers with unwanted marketing.
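
The comparison with similar consumers described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any advertiser's system: the interest labels, the purchase records and the use of Jaccard similarity to compare profiles are all hypothetical choices.

```python
def jaccard(a, b):
    """Overlap between two sets of interests, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest(visitor_interests, past_buyers, exclude, top_n=2):
    """Rank products bought by consumers with similar interests."""
    scores = {}
    for interests, purchases in past_buyers:
        weight = jaccard(visitor_interests, interests)
        for item in purchases:
            if item not in exclude:
                scores[item] = scores.get(item, 0.0) + weight
    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


# Invented aggregated data: (interests, products purchased).
past_buyers = [
    ({"running", "cycling"}, ["trainers", "water bottle"]),
    ({"running", "yoga"}, ["trainers", "yoga mat"]),
    ({"cooking"}, ["saucepan"]),
]

visitor = {"running", "cycling"}
adverts = suggest(visitor, past_buyers, exclude={"water bottle"})
print(adverts)
```

Here the visitor is shown products that consumers with overlapping interests have bought, and products bought only by dissimilar consumers are dropped.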
Credit Scoring
Traditionally, credit agencies have decided whether to grant
a loan on the basis of scores derived primarily from financial
information, such as loan repayment histories. Some
individuals, especially those with lower incomes, are
excluded from credit by these metrics. Recently, a new
generation of loan companies has emerged, offering
short-term, high-interest payday loans with no traditional credit
check. These companies use alternative means of judging
creditworthiness. For example, Wonga in the UK, LendUp in
the US and Kreditech in Germany gather thousands of
pieces of information about individuals, including social
media data. These are used to determine whether to grant
an individual a loan, the maximum size of loan, and the rate
of interest. Payday loan companies and their advocates
argue that such techniques allow them to make the benefits
of credit more widely available to those rejected by
traditional credit checks. However, critics are concerned that
judging people on data they might assume is private, such
as who they are friends with online, sets a precedent that
may be followed in other sensitive areas, such as insurance.
Giving Consent
Under the DPA, individuals must give their consent for their
personal data to be processed by an organisation, both at
the stage of initial registration for a social media service,
and for any subsequent changes to the terms of use of the
data. There are two main ways of obtaining consent online:
Explicit. This requires users to consent to their data
being processed and stored, usually by clicking 'accept'
on a terms and conditions page in order to continue
browsing or using an application. Users are usually
required to provide explicit consent when initially
registering for a social media service. Political opinions
are classed as sensitive data under the DPA, which
means users must provide explicit consent.
Implicit. This requires companies to inform users that
their personal data may be processed if they continue to
browse a website or use a service. If they proceed, it is
implied that they consent. Users' consent to changes in
the terms and conditions of use of their data is usually
implied; however, the GDPR is likely to require explicit
consent for changes to terms.
Box 4. Tracking Online Behaviour through Social Media
Some social media companies offer third-party authentication services,
such as Facebook Connect and Google+ Sign-in, which allow users to
register with other websites through their social media account. Using
these services means that users' specific browsing behaviours can be
tracked using software tracking devices called cookies and pixels,
which are installed on computers when users visit certain websites.
Social media platforms can then link users' online activities outside of the
social media platform to the data from their personal profiles. It can also
allow other websites greater access to users' social media data.
Withdrawing Consent
Under the DPA, it must be possible for the individual to
withdraw consent for their data to be processed or stored.
However, in practice this is not always straightforward to do.
Currently, Article 17 of the draft GDPR would give EU
citizens the Right to Erasure. If included in the final
Regulation, this might require websites that store data on
users to provide an easy option for consumers to have all
data on them erased. Potentially, this would enable users to
erase all data that a social network held on them at the click
of a button. Supporters of this clause argue that it would
provide a clear mechanism through which individuals could
withdraw consent if they believed that their privacy was
being infringed. However, internet companies, the UK
Government and the Information Commissioner's Office
(ICO), which oversees the DPA, have expressed concern
that all aspects of this requirement could be difficult or
impossible to meet in practice because data from social
networks may have already been passed on or sold
numerous times, or used to create new datasets.
Re-Identification
The DPA does not apply to anonymised data: personal
data from which obvious identifiers (such as name, date of
birth or email address) and non-obvious identifiers (such as
computer IP addresses) have been removed. Anonymised data can be
distributed or sold without users' consent. Data are classed
as anonymised under the DPA if the organisation holding the
data (the data controller) has made all reasonable efforts
to ensure that they cannot be re-identified.
However, research has indicated that it is possible to
re-identify individuals in large datasets through
cross-correlation with other databases.7 This is a particular
concern when multiple anonymised datasets containing the
same individuals are available to third parties, as they can
be matched up, or overlaid with one another, to re-identify
individuals. Anonymised Google search histories, Facebook
Friends lists and Netflix film ratings have all been shown to
be re-identifiable. This may permit far greater intrusions into
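
The matching of datasets described in this section can be sketched with a toy example. This is a simplified assumption about how such linkage might work, not the method used in the research cited above: the records, the field names and the choice of linking attributes are all invented.

```python
# Attributes left in both datasets after names were removed (hypothetical).
LINK_FIELDS = ("postcode_area", "birth_year", "gender")

# Invented "anonymised" records: no names, but some attributes remain.
anonymised = [
    {"postcode_area": "SW1", "birth_year": 1980, "gender": "F", "rating": "Film A: 5"},
    {"postcode_area": "SW1", "birth_year": 1975, "gender": "M", "rating": "Film B: 2"},
    {"postcode_area": "N7", "birth_year": 1980, "gender": "F", "rating": "Film C: 4"},
]

# Invented second dataset that does contain names.
public = [
    {"name": "Person X", "postcode_area": "SW1", "birth_year": 1980, "gender": "F"},
    {"name": "Person Y", "postcode_area": "N7", "birth_year": 1980, "gender": "F"},
    {"name": "Person Z", "postcode_area": "N7", "birth_year": 1980, "gender": "F"},
    {"name": "Person W", "postcode_area": "E1", "birth_year": 1990, "gender": "M"},
]


def reidentify(anon_rows, public_rows, keys=LINK_FIELDS):
    """Return (name, rating) pairs where the shared attributes match exactly one person."""
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row["name"])
    matches = []
    for row in anon_rows:
        names = index.get(tuple(row[k] for k in keys), [])
        if len(names) == 1:  # only an unambiguous match identifies someone
            matches.append((names[0], row["rating"]))
    return matches


matches = reidentify(anonymised, public)
print(matches)
```

In this toy example one of the three records can be linked to a single named person. The other two cannot, because they match either nobody or more than one person in the second dataset. The more attributes two datasets share, the more records become unique in this way.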
POST is an office of both Houses of Parliament, charged with providing independent and balanced analysis of policy issues that have a basis in science and technology.
POST is grateful to Benjamin Taylor for researching this briefing, to the AHRC for funding his parliamentary fellowship, and to all contributors and reviewers. For further
information on this subject, please contact the co-author, Dr Abbi Hobbs. Parliamentary Copyright 2014. Image copyright istockphoto.com.