
Web Content Analysis

Workshop in Advanced Techniques for Political Communication Research: Web Content Analysis Session
Professor Rachel Gibson, University of Manchester

Aims of the Session


- Defining and understanding the Web as an object of study, and the challenges of analysing web content
- Defining the shift from Web 1.0 to Web 2.0 or social media
- Approaches to analysing web content, particularly in relation to election campaign research and party homepages
- Beyond websites? Examples of how Web 2.0 campaigns are being studied: key research questions and methodologies

The Web as an object of study


- Inter-linkage
- Non-linearity
- Interactivity
- Multi-media
- Global reach
- Ephemerality
Source: Mitra, A. and Cohen, E. (1999) "Analyzing the Web: Directions and Challenges", in Jones, S. (ed.) Doing Internet Research. Sage.

From Web 1.0 to Web 2.0


Tim Berners-Lee
"Web 1.0 was all about connecting people. It was an interactive space, and I think Web 2.0 is of course a piece of jargon, nobody even knows what it means. If Web 2.0 for you is blogs and wikis, then that is people to people. But that was what the Web was supposed to be all along. And in fact you know, this Web 2.0, it means using the standards which have been produced by all these people working on Web 1.0." (from Anderson, 2007: 5)

Web 1.0: the read-only web
Web 2.0: the read/write web (Tim O'Reilly, 2005, "What is Web 2.0")

Web 2.0
Technological definition
- Web as platform, supplanting the desktop PC
- Inter-operability: through the browser, users access a wide range of programmes that fulfil a variety of online tasks, e.g. managing a home page, communicating with friends, sharing/publishing pictures, receiving news

Social understanding
- Web 2.0 is based around social networking activities; it relies on and is built through social or participatory software.
- Users create, distribute and add value to online content through blogs, Facebook profiles, online video, wikis and tagging tools.
- "Produsers" (Bruns, 2007) or "prosumers": the distinction between producers and users/consumers of content disappears.
- The hallmark of these applications is the way in which they devolve creative and classificatory power to ordinary users.

Web 1.0 & Web 2.0 applications: social media


Web 1.0
- Websites
- Email
- Discussion forums/chat rooms/IM
Web 2.0 or social media
- Blogs
- Social network sites (Facebook)
- Online video sharing/posting (YouTube)
- Online photo sharing/posting (Flickr)
- Microblogging sites (Twitter): @, #, RT
- Online link sharing/posting (Del.icio.us, Digg)

Methodologies used for analysing online campaigns (supply side)


Adapted from offline: "digitized methods" (Rogers, 2010), Web 1.0
- Online interviews, elite surveys
- Online discourse analysis
- Online content/feature analysis
Classic or offline approaches / web specific / mixed mode

Online specific: "digital methods", Web 2.0


- Hyperlinking and webspheres: VOSON, Issue Crawler, SocScibot Online, NodeXL
- Blog analysis toolkits and search engines: Technorati, BAT (Blog Analysis Toolkit)
- Twitter and Facebook scrapers: Infoscape Lab, 140kit.com, Twapperkeeper, Discovertext
- YouTube scrapers: Tubemogul, Tubekit
- Wikipedia: Wikiscanner

Methodologies for analysing online campaigns (demand side)


User Focused (demand)
- Focus groups: Price and Cappella (2001); Stromer-Galley and Foot (2002)
- Surveys: Bimber and Davis (2003); Gibson & McAllister (2007)
- Experiments: Iyengar (2002); Lupia & Philpott (2005)
- User statistics: Hitwise, Sitemeter, Alexa, Google Analytics
- Computer tracking devices: Phorm
- Online SNA software: NodeXL

Discourse analysis to study web content


The focus of the analysis is on web content as socio-cultural text, described and explored from a qualitative perspective to uncover underlying meaning and significance. Examples of studies:
- Markham (1998)
- Howard (2002)
- Hakken (1999)
- Hine (2000)
- Benoit and Benoit (2000) "The Virtual Campaign: Presidential Primary Websites in Campaign 2000", American Communication Journal
- Warnick (1998) "Appearance or Reality? Political Parody on the Web in Campaign '96", Critical Studies in Mass Communication

Web specific content/feature analysis


- Focus is on the web page/site as the unit of analysis
- Structural-functional and quantitative: a systematic set of criteria is developed and applied to sites to yield largely quantitative measures of content, functionality, usability and design features
- Quantitative, automated: Bauer and Scharl (2000)
- Quantitative, semi-automated: Web 1.0; Web 1.5?; Action Centers
- Heuristic evaluation: Nielsen & Molich (1990) "Heuristic evaluation of user interfaces"; Collings and Pearce (2002); Yates (2006)

Bauer and Scharl (2000) Internet Research

Quantitative semi-automated
1. Web 1.0: static/fixed homepages
- Parties and campaign sites: Gibson and Ward (2000; 2002); Foot & Schneider (2002)
- E-government field: see Baker (2009); Pina et al. (2009); Panopoulou et al. (2008); West (2007); Henriksson et al. (2006); Holzer and Kim (2005); Garcia et al. (2005) http://www.insidepolitics.org/egovt06us.pdf
2. Web 1.5? Web 1.0 schemes updated to incorporate Web 2.0 elements
- Party and campaign home pages: Gulati & Williams (2006); Foot and Schneider (2006)
- NSM: Stein (2009)
3. Action Centers: MyBO, Membersnet, MyConservatives, LibDemACT (Gibson, 2010; Lilleker and Jackson, 2010)

Gibson & Ward (2000) SSCORE

Gibson & Ward (2002) AJPS


Stein, L. (2009) NM&S


Gibson, R. 2010 APSA

Hyperlinking and webspheres


The focus is on the inter-relational/textual element of websites and the wider context. The unit of analysis is the web network rather than individual sites. A thematic set of websites is identified and captured/archived over time, and the linkage patterns are explored as well as the content.
Method pioneered by Schneider and Foot (2004): Web Sphere analysis. They define a Web Sphere as a hyperlinked set of dynamically defined digital resources spanning multiple Web sites and deemed relevant or related to a central theme or object. The borders of the sphere are defined in relation to the central theme/object and a given time period.

See Library of Congress, Internet Archive http://www.archive.org/index.php & WebArchivist.org http://webarchivist.org/

Hyperlink analysis
More sophisticated/automated methods have been developed for academic research since around 2003.
- Adaptations of search engine technology are the simplest form of online link analysis.
- Hindman et al. (2003) pioneered the term "googlearchy", referring to a power law operating in regard to the prominence of sites.
- Adamic and Glance (2005) examined linkage between left- and right-wing blogs in the U.S. http://www.blogpulse.com/papers/2005/AdamicGlanceBlogWWW.pdf
- Ackland and Gibson (2007): comparative analysis of political party system linkage, using the VOSON (http://anu.voson.edu.au) crawler to examine the outbound and inbound linkages among political parties in 6 different democracies.
- Hyperlinks = networked communication: inclusivity, identity, opponent dismissal, force multiplication. Tested via the number of links to other parties, the target of links by TLD, and inter-linkage within party groups.
- For tools see http://www.issuecrawler.net ; http://www.touchgraph.com ; http://socscibot.wlv.ac.uk/
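The link-based tests listed above (links to other parties, targets by TLD, inter-linkage within party groups) can be illustrated with a small Python sketch. This is not the VOSON workflow or the authors' code; the host names, the party-family mapping and the link list are all invented for illustration, and a real study would take the edge list from a crawler export.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

# Hypothetical seed sites mapped to a party family (illustrative only).
FAMILY = {
    "greens.example.org": "Ecologist",
    "ecoparty.example.de": "Ecologist",
    "labour.example.uk": "Left",
    "conservative.example.uk": "Right",
}

# Hypothetical crawl output: (source URL, target URL) pairs.
LINKS = [
    ("http://greens.example.org/", "http://ecoparty.example.de/about"),
    ("http://greens.example.org/", "http://labour.example.uk/"),
    ("http://labour.example.uk/", "http://conservative.example.uk/news"),
    ("http://ecoparty.example.de/", "http://greens.example.org/"),
]


def host(url):
    """Return the host name of a URL, e.g. 'greens.example.org'."""
    return urlparse(url).hostname


def tld(url):
    """Return the last label of the host name, e.g. 'org' or 'uk'."""
    return host(url).rsplit(".", 1)[-1]


def within_family_share(links, family):
    """Percentage of each family's party-to-party links that stay in the family."""
    total = defaultdict(int)
    within = defaultdict(int)
    for src, dst in links:
        src_family = family.get(host(src))
        dst_family = family.get(host(dst))
        if src_family is None or dst_family is None:
            continue  # skip links that do not connect two seed parties
        total[src_family] += 1
        if src_family == dst_family:
            within[src_family] += 1
    return {f: 100.0 * within[f] / total[f] for f in total}


def targets_by_tld(links):
    """Count link targets by top-level domain."""
    return Counter(tld(dst) for _, dst in links)


shares = within_family_share(LINKS, FAMILY)
tld_counts = targets_by_tld(LINKS)
print(shares)
print(tld_counts)
```

Families with no outbound links to other seed parties are simply absent from the result, which avoids dividing by zero.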

Inter-linkage between parties by ideological family


Table 4: Inter-linkage between political parties by party type

Columns: (1) Far Left; (2) Left; (3) Centre; (4) Right; (5) Far Right; (6) Ecol.; (7) Reg.; (8) Total no. of parties linking to other parties; (9) Total no. of links made to other parties; (10) % of links within party family

Party type    (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)  (9) (10)
Ecologist       0    0    2    1    0    6    0    6   14   79
Far Left       11    7    0    2    1    3    1   12   71   46
Left            3    4    3    2    0    2    0    2   22   50
Centre          0    4    2    2    0    1    0    5   12   25
Right           1    3    1    8    1    2    1    9   20   55
Far Right       0    0    0    1    2    0    0    3    7   71
Regionalist     0    0    0    0    0    0    1    1    1  100
Total                                             39  147
Note: The numbers in columns 1-7 are counts of parties (e.g. 11 far left parties linked to other far left parties). The same party can appear more than once in a given row in columns 1-7, which is why column 8 (showing the total number of parties that have linked to another seed site) is not equal to the sum of the preceding columns.

Beyond websites?
Blog analysis: key studies
Types, content, structure
- Sweetser Trammell, Kaye D. 2007. "Candidate Campaign Blogs: Directly Reaching Out to the Youth Vote". The American Behavioral Scientist 50(9): 1255-1264.
- Xifra, J. and A. Huertas. 2008. "Blogging PR: An Exploratory Analysis of Public Relations". Public Relations Review 34: 265-275.
- Herring, S. et al. 2005. "Weblogs as a Bridging Genre". Information Technology and People 18(2): 142-171.
Influence
- Karpf, D. 2008. "Measuring Influence in the Political Blogosphere". Politics and Technology Review, Institute for Politics, Democracy and the Internet: 33-41. http://www.the4dgroup.com/BAI/articles/PoliTechArticle.pdf
- Etling, B. et al. 2010. "Mapping the Arabic Blogosphere: Politics and Dissent Online". 12: 1225-1243.
- Elmer, G.; Ryan, P.M.; Devereaux, Z.; Langlois, G.; Redden, J. and McKelvey, F. 2007. "Election Bloggers: Methods for Determining Political Influence". First Monday 12(4).
- Drezner and Farrell. 2008. "The Power and Politics of Blogs". Public Choice 134: 15-30.

Beyond websites?
Twitter analysis
- Boyd, D. et al. 2010. "Tweet Tweet Retweet". Working paper. http://www.danah.org/papers/TweetTweetRetweet.pdf
- Tumasjan et al. 2010. "Predicting Elections with Twitter". Association for the Advancement of Artificial Intelligence; Social Science Computer Review. http://ssc.sagepub.com/content/early/2010/09/24/0894439310386557.full.pdf+html
- Zuckerman, E. "Studying Twitter and the Moldovan Protests". www.Ethanzuckermann.com/blog/

Beyond websites?
YouTube analysis
- Shah, C. 2010. "Supporting Research Data Collection from YouTube with Tubekit". Journal of Information Technology & Politics 7(2&3): 226-240.
- Wallsten, K. 2010. "Yes We Can: How Online Viewership, Blog Discussion, Campaign Statements, and Mainstream Media Coverage Produced a Viral Video Phenomenon". Journal of Information Technology & Politics 7(2&3): 163-181.
- Zink, M., Suh, K. and J. Kurose. 2009. "Characteristics of YouTube Network Traffic at a Campus Network". Computer Networks 53: 501-514.
- Tian, Y. 2010. "Organ Donation on Web 2.0: Content and Audience Analysis of Organ Donation Videos on YouTube". Health Communication 25: 238-246.

Online resources and tools for collecting and analysing Web 2.0 content
Digital Tool Kits
Digital Methods Initiative (Richard Rogers): https://wiki.digitalmethods.net/Dmi/ToolDatabase
- Natively digital: The Link | The URL | The Tag | The Domain | The PageRank | The Robots.txt
- Device centric: Google | Google Images | Google News | Google Blog Search | Yahoo | YouTube | Del.icio.us | Technorati | Wikipedia | Alexa | IssueCrawler | Twitter | Facebook

Infoscape Research Lab Tools, www.infoscapelab.ca (Greg Elmer)
- Blog Aggregator: measures activity levels over time and hyperlinks cited in blog posts
- YouTube & Twitter scraper: title, tags, links, date uploaded, author info, views per week
- Facebook Group scraper: title, type, number and members of group

Training / Research
- Exploring Online Research Methods website: http://www.geog.le.ac.uk/ORM/site/home.htm
- Digital Methods Initiative training course: https://www.digitalmethods.net/Digitalmethods/WebHome
- http://nms.sagepub.com/
- http://jcmc.indiana.edu/
- http://www.connectedaction.net/
- http://www.methodspace.com/group/sageresearchmethodsonline

Concluding comments
- There is a growing number of options open to researchers seeking to capture and analyse campaigns and their influence on voters.
- Approaches have evolved as the Web itself has: from the adaptation of traditional methods suited to analysing static, point-to-mass "old" media to newer web-specific tools that allow for the capture and analysis of a more dynamic many-to-many medium.
- Network analysis tools are increasingly used, reflecting a shift in the understanding and practice of the Web as a dynamic, conversational social context rather than a fixed infrastructure.
- The notion of content itself has changed fundamentally: it is user-generated rather than editor-controlled (although MSM are still dominant in news).
This has some significant implications for content analysis:
1. Linkage of the context to the content. To interpret the meaning of web content (Twitter feeds, Facebook profiles), locate it in the wider and ongoing stream of dialogue.
2. Locating content. Inter-operability of the various platforms means that content is shared across multiple sites, so researchers need to follow the content.
3. Volume of content increases, raising capturing and archiving issues.
4. Quality of content changes. A new language or terminology is emerging in which content = actions: posts, embeds, widgets, tweets, email, views, followers, befriending or liking a politician/party. A new software language has grown up: @, #, RT, http. Automated/manual coding.
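As a rough illustration of automated coding of the new "software language" (@, #, RT, http), the Python sketch below pulls those markers out of tweet text with regular expressions. The sample tweets and the `code_tweet` helper are invented for illustration, and the patterns are deliberately simple; real Twitter data would need more careful handling of usernames, punctuation and shortened links.

```python
import re
from collections import Counter

# Simple patterns for the markers named on the slide (assumed, not exhaustive).
URL = re.compile(r"https?://\S+")
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
RETWEET = re.compile(r"^RT\s+@(\w+)")


def code_tweet(text):
    """Code one tweet into retweet flag, mentions, hashtags and URLs."""
    urls = URL.findall(text)
    # Remove URLs first so that '#fragment' parts are not counted as hashtags.
    stripped = URL.sub("", text)
    return {
        "is_retweet": bool(RETWEET.match(text)),
        "mentions": MENTION.findall(stripped),
        "hashtags": [tag.lower() for tag in HASHTAG.findall(stripped)],
        "urls": urls,
    }


# Hypothetical sample tweets, invented for this example.
TWEETS = [
    "RT @party_hq: Join us tonight #election2010 http://example.org/rally#top",
    "Great debate performance by @candidate_a #Election2010 #debate",
]

coded = [code_tweet(t) for t in TWEETS]
hashtag_counts = Counter(tag for c in coded for tag in c["hashtags"])
print(coded[0])
print(hashtag_counts)
```

Counting hashtags across a set of coded tweets, as in `hashtag_counts`, is one simple way to follow a topic across many posts before any manual coding of meaning.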

Short demo of a digital method


Getting started with NodeXL. Manual: Hansen et al., Analyzing Social Media Networks with NodeXL.
- NodeXL is not focused on web content per se but on the structure or influence of that content. For example, compare politicians' Twitter networks, or import a trending topic by #.
- Example: a fake FB network; you can also import Pajek or UCINET datasets, or network data from Twitter, YouTube or email lists.
- Vertices are the nodes or agents; edges are the ties or relationships between them, which can be undirected or directed.
- An unweighted edge is the simplest: it records only whether a relationship exists. A weighted edge adds information (strength, frequency of the tie) and is drawn with a darker/thicker line.
- NodeXL works by setting up an edge list that represents a network. This network can then be represented in graphical form, and from it you can calculate various measures of the centrality of individuals or nodes in the network.
- The group function can group types of individuals and calculate a collective weight.
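The idea of an edge list and a centrality measure can be sketched outside NodeXL in a few lines of Python. This is not NodeXL itself; the names and weights in the edge list are invented, and degree centrality here is taken as (in-degree + out-degree) / (n - 1), which is one common normalisation for directed networks.

```python
from collections import defaultdict

# Hypothetical weighted, directed edge list: (source, target, weight).
EDGES = [
    ("ann", "bob", 3),
    ("ann", "cat", 1),
    ("bob", "cat", 2),
    ("dan", "ann", 1),
]


def degree_measures(edges):
    """Compute simple degree-based measures for every vertex in an edge list."""
    vertices = set()
    in_degree = defaultdict(int)
    out_degree = defaultdict(int)
    weighted_in = defaultdict(int)
    for source, target, weight in edges:
        vertices.update((source, target))
        out_degree[source] += 1
        in_degree[target] += 1
        weighted_in[target] += weight  # sum of tie strengths received
    denominator = max(len(vertices) - 1, 1)  # guard against a one-vertex network
    return {
        v: {
            "in_degree": in_degree[v],
            "out_degree": out_degree[v],
            "weighted_in_degree": weighted_in[v],
            "degree_centrality": (in_degree[v] + out_degree[v]) / denominator,
        }
        for v in sorted(vertices)
    }


measures = degree_measures(EDGES)
for vertex, values in measures.items():
    print(vertex, values)
```

In this toy network "ann" is tied to every other vertex, so her degree centrality is the maximum of 1.0, while "cat" receives the strongest incoming ties by weight.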
