
Frontiers of Computational Journalism
Columbia Journalism School

Week 5: Social Filtering
October 3, 2014

[Figure: User, stories (x), stories not covered, filtering]

[Figure: User, stories (x)]

who user chooses to follow = social filtering

Twitter follower network


We have crawled the entire Twitter site and obtained
41.7 million user profiles, 1.47 billion social relations,
4,262 trending topics, and 106 million tweets. In its
follower-following topology analysis we have found a
non-power-law follower distribution, a short effective
diameter, and low reciprocity, which all mark a
deviation from known characteristics of human social
networks

- Kwak et al., What is Twitter, a Social Network or a
News Media?
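The "low reciprocity" in the quote above can be made concrete on a toy graph. The sketch below assumes the follower network is given as a list of directed (follower, followed) pairs; the function name and the sample accounts are illustrative and are not taken from the Kwak et al. data.

```python
def reciprocity(edges):
    """Fraction of directed follow edges (a, b) whose reverse (b, a) also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (a, b) in edge_set if (b, a) in edge_set)
    return mutual / len(edge_set)


# Hypothetical toy network: only alice and bob follow each other.
follows = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
    ("dave", "alice"),
]

print(reciprocity(follows))
```

In this toy case two of the four edges are reciprocated. A network where most accounts follow a few hubs that do not follow back would score lower.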

More followings than followers

Small avg distance between two nodes


(why? and what does this mean?)
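One way to explore the "why" is to measure average distance on small graphs with a breadth-first search from every node. The sketch below assumes an undirected adjacency list; both toy graphs and the function name are illustrative assumptions, not data from the Twitter crawl.

```python
from collections import deque


def average_distance(adjacency):
    """Mean shortest-path length over all ordered pairs of reachable nodes."""
    total, pairs = 0, 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency.get(node, ()):
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


# Four nodes arranged around one hub versus four nodes in a chain.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

print(average_distance(star), average_distance(chain))
```

With the same number of nodes and edges, the graph with a hub has the shorter average distance in this example, which is one intuition for how a few high-degree accounts can shrink a network.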

It's a news network


Small number of high-degree hubs


Different network structure than, e.g., Facebook.

Different uses.

why?

- Zeynep Tufekci, What Happens to #Ferguson Affects Ferguson:
Net Neutrality, Algorithmic Filtering and Ferguson

- John McDermott, Why Facebook is for ice buckets, Twitter is for Ferguson

- Sunita, Why #Ferguson broke out on Twitter, not Facebook

Finding sources on social media

Classify Users
Classic machine learning problem. Classify each
user as one of:
journalist/blogger
organization
ordinary individual

First, need to encode as a vector / select
features...

Features for user classifier

# of followers / following
# of posts, favorites
percentage of posts that are RTs, @replies,
links
presence/absence of named entities
topic distribution of tweets (IPTC top level
topics)
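As a rough illustration of encoding a user as a vector, the sketch below builds the count and percentage features from the list above. The `user` dictionary layout is a hypothetical format, the prefix tests for RTs and @replies are simplifying assumptions, and the named-entity and IPTC topic features are left out because they need separate models.

```python
def user_features(user):
    """Encode one user as a list of numeric features (hypothetical input format)."""
    posts = user["posts"]
    n = len(posts) or 1  # avoid dividing by zero for users with no posts
    return [
        user["followers"],
        user["following"],
        len(posts),
        user["favorites"],
        sum(p.startswith("RT ") for p in posts) / n,  # share of retweets
        sum(p.startswith("@") for p in posts) / n,    # share of @replies
        sum("http" in p for p in posts) / n,          # share of posts with links
    ]


example_user = {
    "followers": 10,
    "following": 20,
    "favorites": 3,
    "posts": ["RT @a: news", "@b thanks", "see http://example.com", "hello"],
}

print(user_features(example_user))
```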

Digression: IPTC Media Topic Codes


International standard hierarchical taxonomy, part of
the NewsML markup system. Defined by Reuters, AP,
NYTimes...

K-nearest neighbor classifier

Take K closest training points (in high dimensional
feature space), choose majority label.
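A minimal sketch of the rule above, assuming Euclidean distance and plain Python tuples as feature vectors. The two-dimensional training points and the value K = 3 are made up for illustration; the slides do not say which distance or K was used.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Return the majority label among the k training points closest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (feature vector, label) pairs.
train = [
    ((0, 0), "individual"),
    ((0, 1), "individual"),
    ((1, 0), "individual"),
    ((10, 10), "organization"),
    ((10, 11), "organization"),
    ((11, 10), "organization"),
]

print(knn_classify(train, (1, 1)))
print(knn_classify(train, (9, 9)))
```

With real features, raw follower counts would dominate the distance over percentages, so rescaling each feature to a comparable range first is probably needed.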

Creating the training data

1,850 random users
1,532 known organizations
1,490 known journalists and bloggers

Hired Mechanical Turk workers to apply labels.
Each user labeled by two workers, discarded if
disagreement.
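The agreement rule can be sketched as a simple filter. The input format (a dictionary from user to the pair of labels given by the two workers) and the sample accounts are assumptions for illustration.

```python
def agreed_labels(labels_by_user):
    """Keep only users whose two worker labels match; map each to that label."""
    return {
        user: first
        for user, (first, second) in labels_by_user.items()
        if first == second
    }


# Hypothetical worker output: user -> (label from worker 1, label from worker 2).
raw_labels = {
    "@newsdesk": ("organization", "organization"),
    "@jsmith": ("journalist/blogger", "ordinary individual"),
    "@reporter": ("journalist/blogger", "journalist/blogger"),
}

print(agreed_labels(raw_labels))
```

Here `@jsmith` is discarded because the two workers disagreed.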

Classifier Accuracy

Eyewitness classifier
Goal is to find individual tweets that are eyewitness
reports.





Started with LIWC (Linguistic Inquiry and Word
Count) dictionary that classifies English words
along 70 different dimensions, including emotion,
cognition, time, health...

Word Aspects

Used perception category words

plus insight and certainty words

Eyewitness tweet classifier

It's an eyewitness tweet if it contains any of
these special words! (or their stems)

High precision! Low recall.

89% of tweets classified as eyewitness actually were.
But only 32% of eyewitness tweets detected.
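The rule above amounts to a dictionary lookup. The sketch below is a guess at its shape: the stem list is a hypothetical stand-in (the actual LIWC-derived word list is not reproduced in these notes), and stemming is approximated by prefix matching.

```python
import re

# Hypothetical perception-style stems; NOT the actual word list from the slides.
EYEWITNESS_STEMS = ("see", "saw", "hear", "heard", "feel", "felt")


def is_eyewitness(tweet):
    """True if any word in the tweet starts with one of the listed stems."""
    words = re.findall(r"[a-z']+", tweet.lower())
    return any(word.startswith(stem) for word in words for stem in EYEWITNESS_STEMS)


print(is_eyewitness("I just heard a loud explosion downtown"))
print(is_eyewitness("Police confirm road closures"))
```

Prefix matching is crude (for example, "seems" matches the stem "see"), so a real stemmer would likely behave differently.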
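The two percentages above are precision and recall. As a small worked example, the sketch below computes both from parallel lists of predicted and true labels; the four sample tweets are invented and do not reproduce the 89% and 32% figures.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for parallel lists of booleans."""
    true_pos = sum(p and a for p, a in zip(predicted, actual))
    false_pos = sum(p and not a for p, a in zip(predicted, actual))
    false_neg = sum(a and not p for p, a in zip(predicted, actual))
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Hypothetical labels: classifier output versus hand-checked truth.
predicted = [True, True, False, False]
actual = [True, False, True, True]

print(precision_recall(predicted, actual))
```

Precision is the share of flagged tweets that really were eyewitness reports, and recall is the share of real eyewitness reports that were flagged.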

Other dimensions
Tweet contains URL to photo or video (used table of
domain names, e.g. flickr.com = photo)

Posted from mobile device (from tweet metadata naming
posting app)

Geocode user's stated location (this is painful and
unreliable)

Distribution of friends' locations. (Friend = mutual
following)
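The domain table lookup might look like the sketch below. Only the flickr.com entry comes from the slide; the youtube.com entry, the function name, and the URL-matching pattern are assumptions added for illustration.

```python
import re
from urllib.parse import urlparse

# Domain -> media type. Only flickr.com is from the slide; youtube.com is assumed.
MEDIA_DOMAINS = {"flickr.com": "photo", "youtube.com": "video"}


def media_types(tweet):
    """Return the set of media types linked from a tweet's URLs."""
    found = set()
    for url in re.findall(r"https?://\S+", tweet):
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host in MEDIA_DOMAINS:
            found.add(MEDIA_DOMAINS[host])
    return found


print(media_types("Smoke over the bridge http://www.flickr.com/photos/123"))
print(media_types("no links here"))
```

Shortened links would need to be expanded before this lookup could recognize them.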
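The friend definition above (mutual following) and the location distribution can be sketched together. The `following` and `locations` dictionaries are hypothetical input formats, and the locations are assumed to be already geocoded to place names.

```python
from collections import Counter


def friend_locations(user, following, locations):
    """Count known locations of accounts that the user follows and that follow back."""
    friends = {
        other
        for other in following.get(user, set())
        if user in following.get(other, set())
    }
    return Counter(locations[f] for f in friends if f in locations)


# Hypothetical data: account -> set of accounts it follows.
following = {"u": {"a", "b", "c"}, "a": {"u"}, "b": {"u"}, "c": set()}
locations = {"a": "Ferguson", "b": "Ferguson", "c": "New York"}

print(friend_locations("u", following, locations))
```

Account `c` is followed by `u` but does not follow back, so it does not count as a friend and its location is ignored.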

Test user reactions

"This gives you context, you have the context for
whether or not you think they're reputable or whether or
not they're worth reaching out to."

"It's giving me a lot of context which is really useful when
you're trying to verify if someone is reputable or not."

"I would tend to focus on the eyewitnesses and
journalists/bloggers. Eventually I'd look at everyone else
but I'd want to start my search with those two groups
because they would normally provide me with the most
information."

Test user reactions

Popular features:
Eyewitness filtering, user location, image/video filter

Unpopular features:
Entity extraction not helpful, no ability to filter by
location and eyewitness status, focus on users
instead of content

Social Software
Basic assumption: structure of software
influences how groups use it.







or: architecture influences behavior

Three ways to influence behavior

Norms: culture, habits, etiquette, the user's
sense of what is right or appropriate

Laws: rules enforced by the administrator

Code: what it is actually possible to do

Design problem...
What do we want the users to accomplish
together?

How do we encourage this?

We can write the code, but the culture is a
separate issue.
