
Technical Assignment

Get Real! Peter Melis 3641872 Master NMDC WG1: Ann-Sophie Lehmann

The Issue Crawler and colink analysis


As Lev Manovich writes in his article "Data Visualisation as New Abstraction and Anti-Sublime", examples of data visualization can already be found in the eighteenth century. Nowadays, with the aid of computer technology, we can visualize more and more data, and in more ways than we can imagine (4). This is why data visualization is a hot topic right now. Manovich makes a clear distinction between visualization and mapping. He uses the term visualization for "the situations when quantified data which by itself is not visual (...)". "Visualization (...) can be thought of as a particular subset of mapping in which a data set is mapped into an image" (3-4).

The subject I want to discuss is the Issue Crawler from Govcom.org. The Issue Crawler is web network location and visualization software. It consists of crawlers, analysis engines and visualisation modules. It is server-side software that crawls specified sites and captures the outlinks from the specified site (govcom.org). One of the methods used to visualize a network is colink analysis. In the case of the Issue Crawler, colink analysis means that the crawler is given a number of URLs to start with (seed URLs). The Issue Crawler crawls these URLs and retains the pages that receive at least two links from these seeds. Together with the visualization modules, this gives a mapping of, for example, the websites surrounding a specific subject or problem. Below is a cropped example of a visualization I made with the Issue Crawler, to give you an idea.

[Figure: cropped Issue Crawler visualization]
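The retention rule described above (keep the pages that receive links from at least two seeds) can be illustrated with a minimal sketch. This is not the Issue Crawler's actual code: the function name `colinked_targets`, the `.example` URLs and the choice to ignore self-links are all assumptions made for illustration only.

```python
from collections import defaultdict


def colinked_targets(outlinks, min_seeds=2):
    """Return the targets that receive links from at least `min_seeds` seeds.

    `outlinks` maps each seed URL to the set of URLs it links to.
    """
    linking_seeds = defaultdict(set)
    for seed, targets in outlinks.items():
        for target in targets:
            if target != seed:  # assumption: a site linking to itself does not count
                linking_seeds[target].add(seed)
    return {
        target: sorted(seeds)
        for target, seeds in linking_seeds.items()
        if len(seeds) >= min_seeds
    }


# Hypothetical seed URLs and the outlinks captured from each of them.
outlinks = {
    "seed-a.example": {"ngo.example", "news.example"},
    "seed-b.example": {"ngo.example", "blog.example"},
    "seed-c.example": {"ngo.example", "news.example", "seed-c.example"},
}

retained = colinked_targets(outlinks)
for target, seeds in sorted(retained.items()):
    print(target, "is linked from", seeds)
```

In this made-up data set, `blog.example` is dropped because only one seed links to it. Raising `min_seeds` makes the resulting network smaller and stricter.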

The concept of web colink analysis (WCA) comes from the concept of author cocitation analysis (ACA). Where ACA is concerned with which authors are well cited in which field of research, WCA generally results in web pages or sites that are well linked from other pages or sites (Zuccala 1488). There are two types of colinks: colinks that are based on inlinks and colinks that are based on outlinks (Qiu et al. 327). The quickest way to explain this is via the pictures below:

[Figure: two diagrams of Website A, Website B and Website C. Left panel: "Co-inlink". Right panel: "Co-outlink". The colinked websites are marked in each panel.]

In the left example B and C are colinked because A links to both. In the right example A is colinked because both B and C link to A. The Issue Crawler makes use of the co-inlink principle.

These different ways of looking at colinking are just the beginning. Different methods are being used and developed. All methods are based on a basic sequence of steps (see McCain in Qiu et al. 329):

1. Selection of the core set of items for the study.
2. Retrieval of co-citation frequency information for the core set.
3. Compilation of the raw co-citation frequency matrix.
4. Correlation analysis to convert the raw frequencies into correlation coefficients.
5. Multivariate analysis of the correlation matrix using principal components analysis, cluster analysis or multidimensional scaling techniques.
6. Interpretation of the resulting map and validation.

The first decision to make when performing colink analysis is whether your search or crawl looks at just a single page or at the whole site that page belongs to. This means choosing, for example, between looking only at the links on a homepage and looking at the links on the entire website. Vaughan and You write that for research into business relations, the links found on a homepage represent the relations better than all the links on the entire website (436). Lang et al., by contrast, believe that it is important to take into account not just the links on the homepages of your seed URLs but also the links on the rest of the website, if you want to avoid the restricted data problems found in analyzing smaller networks (161-162).

Furthermore, researchers have used keyword searches in combination with colink analysis to keep websites and companies that aren't actually involved in the same issues out of the results. They only took into account the websites in the results that mentioned a specific keyword on their homepages, to make sure all the companies were involved in the same issue (Vaughan and You 440-441).
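Steps 2 to 4 of the basic sequence listed above can be sketched in a few lines. This is a rough illustration under stated assumptions, not the procedure of any of the cited studies: the sites A to E, the source pages p1 to p9 and all function names are hypothetical, the raw count is taken as the number of pages linking to both sites, the two sites' own cells are left out when their profiles are compared, and an undefined correlation is reported as 0.0. Steps 5 and 6 (multivariate analysis and interpretation) are not covered.

```python
from itertools import combinations
from math import sqrt


def colink_matrix(core, inlinks):
    """Raw colink counts: the number of source pages that link to both sites."""
    raw = {a: {b: 0 for b in core} for a in core}
    for a, b in combinations(core, 2):
        raw[a][b] = raw[b][a] = len(inlinks[a] & inlinks[b])
    return raw


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either profile has no variance (assumption)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def correlation_matrix(core, raw):
    """Correlate the colink profiles of each pair of sites.

    Assumption: when comparing sites a and b, only their counts with the
    other sites are used, so the diagonal cells never enter the calculation.
    """
    corr = {a: {b: 1.0 for b in core} for a in core}
    for a, b in combinations(core, 2):
        others = [k for k in core if k not in (a, b)]
        r = pearson([raw[a][k] for k in others], [raw[b][k] for k in others])
        corr[a][b] = corr[b][a] = r
    return corr


# Step 1 (hypothetical): the core set and, per site, the pages that link to it.
core = ["A", "B", "C", "D", "E"]
inlinks = {
    "A": {"p1", "p2", "p3", "p4"},
    "B": {"p1", "p2", "p3", "p5"},
    "C": {"p2", "p3", "p6"},
    "D": {"p4", "p7", "p8"},
    "E": {"p7", "p8", "p9"},
}

raw = colink_matrix(core, inlinks)      # steps 2 and 3
corr = correlation_matrix(core, raw)    # step 4
print(raw["A"]["B"], round(corr["A"]["B"], 3))
```

With so few sites the coefficients mean very little. The sketch only shows how raw colink counts turn into a correlation matrix that a clustering or scaling technique could take as input.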
My last example of a customization of the colink analysis method is the method developed by Qiu et al. They state that most colink analysis is done by taking all found links into account. They stress that it is important to be critical about the quality of links. A link can be substantive ("they have their own real meanings and incentives such as agreement and recommendation") or non-substantive ("those that do not") (328). Their case study shows that when you look only at substantive links, you get more in-depth results that give a better representation of the relations between websites.

As you can see, there are a lot of considerations to take into account when performing colink analysis. This, combined with the fact that I wasn't happy with the results from my crawl, is why I think it is a very interesting subject to research. Whether this is because I did it wrong or because the Issue Crawler doesn't give correct representations, I don't know yet. It is Manovich who points us in this direction when he talks about data visualization and art:

"Since computers allow us to easily map any data set into another set, I often wonder, why did the artist choose this or that form of visualization or mapping when endless other choices were also possible? Even the very best works that use mapping suffer from this fundamental problem. This is the dark side of the operation of mapping and of computer media in general - its built-in existential angst. By allowing us to map anything onto anything else, to construct an infinite number of different interfaces to a media object, to follow infinite trajectories through the object, and so on, computer media simultaneously make all these choices appear arbitrary - unless the artist uses special strategies to motivate her or his choice" (7).

Literature

Lang, P. et al. "Site co-link analysis applied to small networks." Scientometrics 83 (2010): 157-166.
Manovich, L. "Data Visualisation as New Abstraction and Anti-Sublime." Small Tech: The Culture of Digital Tools. Eds. Hawk, B. et al. Minneapolis: University of Minnesota Press, 2008. 3-9.
Qiu, J. et al. "An exploratory study on substantive co-link analysis." Scientometrics 76-2 (2008): 327-341.
Vaughan, L. and J. You. "Content assisted web co-link analysis for competitive intelligence." Scientometrics 77-3 (2008): 433-444.
Zuccala, A. "Author Cocitation Analysis is to intellectual structure as Web Colink Analysis is to ?" Journal of the American Society for Information Science and Technology 57-11 (2006): 1487-1502.

Other sources

www.govcom.org (Issue Crawler)
