
Data Visualization

What is it?

Visualization is a technique to graphically represent sets of data. When data is large or abstract, visualization can help make the data easier to read or understand. Data visualization is the graphical representation of information. Bar charts, scatter graphs, and maps are examples of simple data visualizations that have been used for decades. Information technology combines the principles of visualization with powerful applications and large data sets to create sophisticated images and animations.

mapping systems and remote sensors and generates a visualization that shows where nitrates concentrate in soil and how different modes of fertilizer delivery, coupled with variables such as precipitation, affect the rates and locations of groundwater pollution. Faculty and researchers in a wide range of academic disciplines use visualizations to present data in ways that help generate new knowledge and understanding.

Need for Data Visualization


Let's say you need to understand thousands or even millions of rows of data, and you have a short time to do it in. The data may come from your team, in which case perhaps you're already familiar with what it's measuring and what the results are likely to be. Or it may come from another team, or maybe several teams at once, and be completely unfamiliar. Either way, the reason you're looking at it is that you have a decision to make, and you want to be informed by the data before making it. Something probably hangs in the balance: a customer, a product, or a profit. How are you going to make sense of all that information efficiently so you can make a good decision? Data visualization is an important answer to that question.

However, not all visualizations are actually that helpful. You may be all too familiar with lifeless bar graphs, or line graphs made with software defaults and couched in a slideshow presentation or lengthy document. They can be at best confusing, and at worst misleading. But the good ones are an absolute revelation. The best data visualizations are ones that expose something new about the underlying patterns and relationships contained within the data. Understanding those relationships and being able to observe them is key to good decision making.

The Periodic Table is a classic testament to the potential of visualization to reveal hidden relationships in even small datasets. One look at the table, and chemists and middle school students alike grasp the way atoms arrange themselves in groups: alkali metals, noble gases, halogens. If visualization done right can reveal so much in even a small dataset like this, imagine what it can reveal within terabytes or petabytes of information.

Cannot see a pattern without data visualization. Simply seeing numbers on a grid often does not convey the whole story, and in the worst case it can even lead to a wrong conclusion. This is best demonstrated by Anscombe's quartet, where four seemingly similar groups of x/y coordinates reveal very different patterns when represented in a graph (see Figure 1).

Cannot fit all of the necessary data points onto a single screen. Even with the smallest reasonably readable font, single-line spacing, and no grid, one cannot realistically fit more than a few thousand data points on a single page or screen using numerical information only. When using advanced data visualization techniques, one can fit tens of thousands of data points (an order-of-magnitude difference) onto a single screen. In his book The Visual Display of Quantitative Information, Edward Tufte gives an example of more than 21,000 data points effectively displayed on a US map that fits onto a single screen.

Cannot effectively show deep and broad data sets on a single screen. Fitting in and analyzing hundreds or thousands of columns of attributes (dimensions in BI speak) is an enormous challenge. Imagine a typical drug trial conducted by a pharmaceutical company where each patient has thousands of attributes: physical, psychological, genetic, behavioral, etc. Analysts looking for patterns, dependencies, and correlations typically need to run the data through complex statistical models before they can find a pattern or correlation. Building such models and running them through millions of rows of data can be time-consuming and can tax even the most advanced software and hardware resources. But with a technique often used in the pharma industry, reducing each data point in a column to a single pixel and color-coding the pixels according to their value ranges, an analyst can relatively easily visualize and identify a pattern and then quickly zoom in to research the details.

Types of visualization

It's important to point out that not all data visualization is created equal. Just as we have paints and pencils and chalk and film to help us capture the world in different ways, with different emphases and for different purposes, there are multiple ways in which to depict the same dataset. Or, to put it another way, think of visualization as a new set of languages you can use to communicate. French, Russian and Japanese are all ways of encoding ideas so that those ideas can be transported from one person's mind to another and decoded again, and certain languages are more conducive to certain ideas. In the same way, the various kinds of data visualization are a kind of bidirectional encoding that lets ideas and information be transported from the database into your brain.

Explaining and exploring

An important distinction lies between visualization for exploring and visualization for explaining. A third category, visual art, comprises images that encode data but cannot easily be decoded back to the original meaning by a viewer. This kind of visualization can be beautiful, but it is not helpful in making decisions.

Visualization for exploring can be imprecise. It's useful when you're not exactly sure what the data has to tell you and you're trying to get a sense of the relationships and patterns contained within it for the first time. It may take a while to figure out how to approach or clean the data, and which dimensions to include. Therefore, visualization for exploring is best done in a way that allows quick iteration and experimentation, so that you can find the signal within the noise. Software and automation are your friends here.

Visualization for explaining is best when it is cleanest. Here, paring the information down to its simplest form and stripping away the noise entirely increases the efficiency with which a decision maker can understand it. This is the approach to take once you understand what the data is telling you and want to communicate that to someone else. This is the kind of visualization you should find in presentations and sales reports. Visualization for explaining also includes infographics and other categories of hand-drawn or custom-made images. Automated tools can be used, but one size does not fit all.
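Anscombe's quartet, cited earlier as the classic argument for looking at data before drawing conclusions, can also be checked numerically. The sketch below uses the published values of the quartet (Anscombe, 1973) and only the Python standard library; it computes the summary statistics the four data sets share (mean and variance of x and y, correlation, least-squares line). Plotting is deliberately left out: the point is that these numbers alone hide the very different shapes a scatterplot would reveal. The function name `summary` is illustrative.

```python
from statistics import mean, variance

# Published values of Anscombe's quartet (Anscombe, 1973).
X_123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # x values shared by sets I-III
QUARTET = {
    "I": (X_123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X_123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X_123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(xs, ys):
    """Means, sample variances, Pearson r and the least-squares line y = a + b*x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    return {
        "mean_x": mx, "var_x": variance(xs),
        "mean_y": my, "var_y": variance(ys),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope, "intercept": my - slope * mx,
    }


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        stats = summary(xs, ys)
        print(name, {k: round(v, 3) for k, v in stats.items()})
```

All four sets print practically the same row of numbers, which is exactly why the graph, not the table, is needed.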

How does it work?


Visualizations encompass a wide and growing range of projects, reflecting creative ways of representing all sorts of data visually, with virtually no limit to what kind of information can be translated into an image. Visualizations have been created that represent the possible moves on a chess board, the structure of a piece of music, and the messages in a person's email inbox, to name a few. The designer of a visualization determines which visual element (color, shape, size, motion, and so forth) will represent individual data points. Images can be 2D or 3D, can be fixed or dynamic, and can allow user interaction. One application, for example, shows political contributions to various candidates. You can select a state, a political race in that state, and a monetary threshold for contributions. The application builds a 2D image that shows who supported each candidate and at what level (based on the relative sizes of the circles that represent contributors), revealing interesting webs of political influence. Astrophysicists use visualizations to create 3D images that model the forces of a supernova. In each case, images or animations are the products of applications that render data in a visual form based on the design of the visualization.
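The text does not say how the political contributions application scales its circles. A common convention, assumed in this sketch, is to make circle area (rather than radius) proportional to the value, so that a contribution 25 times larger gets 25 times the area and therefore 5 times the radius. The function name `radii_for_values` and its `max_radius` parameter are hypothetical.

```python
import math


def radii_for_values(values, max_radius=50.0):
    """Map non-negative values to circle radii so that circle AREA is
    proportional to the value: radius = max_radius * sqrt(value / largest)."""
    largest = max(values)
    if largest <= 0:
        return [0.0 for _ in values]  # nothing to scale against
    return [max_radius * math.sqrt(v / largest) for v in values]


if __name__ == "__main__":
    # Hypothetical contribution amounts for three contributors.
    print(radii_for_values([100, 400, 2500]))
```

Scaling the radius linearly instead would exaggerate large values, because the area would then grow with the square of the value.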

Data visualization techniques

Two-dimensional data

Two-dimensional data can be visualized in different ways. A very common visualization form is the scatterplot. In a scatterplot, the frame for the data presentation is a Cartesian coordinate system in which the axes correspond to the two dimensions. The data is usually represented by points in the coordinate system's first quadrant (assuming the data point values are not negative). When two or more data sets are displayed in the same coordinate system, different colours can be used to distinguish between the distinct plots. A problem with this way of displaying data arises when the number of data points gets very high, as the points become too dense. To avoid this, Becker suggests binning the data set [Sahling03]. The quality of the visualization then depends on the number of bins and their sizes. Figure 1 shows the distribution of miles per gallon (MPG) vs. horsepower for American (red), European (blue) and Japanese (green) cars.
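The binning step for a dense scatterplot can be sketched as follows, assuming simple equal-width rectangular bins over a fixed plotting range; the exact binning scheme Becker proposes may differ. Instead of drawing every point, each bin's count could then be drawn as a shaded cell or as a symbol sized by the count. `bin_2d` is a hypothetical helper name.

```python
from collections import Counter


def bin_2d(points, x_range, y_range, nx, ny):
    """Count (x, y) points per cell of an nx-by-ny grid of equal-width bins.

    Returns a Counter keyed by (ix, iy); points outside the ranges are ignored
    and points on the upper edge are counted in the last bin.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    wx, wy = (x1 - x0) / nx, (y1 - y0) / ny
    counts = Counter()
    for x, y in points:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue
        ix = min(int((x - x0) / wx), nx - 1)
        iy = min(int((y - y0) / wy), ny - 1)
        counts[(ix, iy)] += 1
    return counts


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (0.9, 0.9), (1.0, 1.0)]
    print(bin_2d(pts, (0, 1), (0, 1), 2, 2))
```

Changing `nx` and `ny` is exactly the trade-off the text mentions: more, smaller bins keep detail, fewer, larger bins keep the display readable.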

Figure 1: Scatterplot of car data set [Hoffmann99]

Another important visualization technique for two-dimensional data is the linegraph. The difference from scatterplots is that here the relation between the dimension on the horizontal axis and the one on the vertical axis is well defined: each value on the horizontal axis has exactly one corresponding value on the vertical axis. The following figure shows an example of a linegraph displaying the number of crimes in Niedersachsen in the years 1993 to 2002.

Figure 2: Total crimes (1993 - 2002)

Survey plots are an extension of linegraphs. They can be obtained by turning the plot 90 degrees clockwise, halving the length of the rays and adding this half on the other side of the now vertical axis. The last technique I would like to mention here is the visualization of data as barcharts. For the last figure, a barchart representation would look the same as above but with the area under the graph filled in. Histograms are a particular kind of barchart in which each bar stands for the number of data points in a class [Hoffmann02].
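The counting step behind a histogram can be sketched as follows, assuming equal-width classes over a closed range [lo, hi], where a value equal to hi is counted in the last class. `histogram` and `text_bars` are illustrative names; `text_bars` simply renders each count as a row of `#` characters, a crude barchart turned on its side.

```python
def histogram(values, lo, hi, n_bins):
    """Count values in n_bins equal-width classes covering [lo, hi].

    A value equal to hi goes into the last class; values outside are ignored.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v <= hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def text_bars(counts):
    """Render counts as rows of '#': a sideways barchart, one row per class."""
    return "\n".join("#" * c for c in counts)


if __name__ == "__main__":
    counts = histogram([1, 2, 2, 3, 3, 3, 4], lo=1, hi=4, n_bins=3)
    print(counts)
    print(text_bars(counts))
```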

3.2 Three-dimensional data

The two-dimensional techniques can easily be extended to three dimensions. In scatterplots and barcharts, the third dimension is achieved by adding a further axis orthogonal to the other two. Adding a dimension to a linegraph representation turns the resulting plot into a surface. Figure 3 shows an example that has been generated with Matlab.

Figure 3: 3D linegraph (surface) [generated with Matlab]

A very widespread technique for visualizing a third dimension in a two-dimensional coordinate system is the use of colour or a variation of the data point size. Another very interesting visualization technique is animation, for instance to show how the plot varies over time.

3.3 High-dimensional data

The visualization of high-dimensional data raises a very severe problem: the visualization space is limited to three dimensions, or even to only two, since data is usually displayed on screens or paper. One obstacle to discovering information in high-dimensional data sets, as Mihalisin [Mihalisin02] points out, is that techniques for extracting and displaying low-dimensional information cannot automatically be employed for high-dimensional data, as the data set size is too large. Next, we have to study how the number of possible data sets grows if we increase the number of variables or the number of values they can hold. To do this, consider the following example [Mihalisin02]: we have a data set consisting of six columns, which represent the attributes product, territory, sales channel, method of payment, time of payment and a unique identifier. Furthermore, we have 100,000 rows representing the records. Our company sells five products in five different territories via two sales channels. We also offer two distinct methods of payment, and the time of payment is divided into five quarters. This means there are $5 \times 5 \times 2 \times 2 \times 5 = 500$ possible cell results. Distributing 100,000 records, each having one of the 500 cell results, leads to

$\binom{100{,}499}{499} = \frac{100{,}499!}{100{,}000! \cdot 499!} \approx 10^{1364}$

different data sets. This is a huge number (far larger than the number of atoms in the universe), and it comes from only a very small database.

Coming now to the different visualization techniques, we distinguish between icon-based, hierarchical and geometrical methods.

3.3.1 Icon-based methods

Icon-based methods use icons (or glyphs) to represent high-dimensional data: they map data components to graphical attributes. The most famous technique is the use of Chernoff faces [Hoffmann02]. Here each data point is represented by an individual face whose features encode the data dimensions. Five different eye sizes could correspond to the five products of the example above, and the mouth might symbolize the two methods of payment. This scheme exploits people's ability to recognize faces. Figure 4 shows examples of Chernoff faces.
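The mapping step of the Chernoff-face example (five products to five eye sizes, two payment methods to two mouth shapes) can be sketched as follows. The feature names and numeric scales are hypothetical, chosen only for illustration; actually drawing a face from these parameters is left to whatever graphics library is at hand.

```python
# Hypothetical feature scales (illustrative numbers, not from the source):
# five eye sizes for the five products, two mouth curvatures for the two
# methods of payment.
EYE_SIZES = [0.2, 0.4, 0.6, 0.8, 1.0]
MOUTH_CURVES = [-1.0, 1.0]  # -1 = frown, +1 = smile


def face_features(product, payment):
    """Map a record's product (1-5) and payment method (1-2) to face parameters."""
    if not 1 <= product <= len(EYE_SIZES):
        raise ValueError("product must be between 1 and 5")
    if not 1 <= payment <= len(MOUTH_CURVES):
        raise ValueError("payment must be 1 or 2")
    return {"eye_size": EYE_SIZES[product - 1],
            "mouth_curve": MOUTH_CURVES[payment - 1]}


if __name__ == "__main__":
    print(face_features(product=3, payment=2))
```

Further dimensions (territory, sales channel, quarter) would be attached to further facial features in the same way.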

Figure 4: Chernoff faces [Ward99]

Probably the most common icon-based technique is the use of star glyphs to denote data points. A star glyph consists of a centre point with equally spaced rays. These rays correspond to the different dimensions, and the length of each ray marks the value of that dimension for the studied data point. A polygon line connects the outer ends of the rays [Oellien03]. Figure 5 illustrates the star glyph approach.
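The geometry of a single star glyph can be sketched as follows, assuming each dimension has already been normalized to [0, 1] and that ray i points at angle 2πi/n. The function names and the `center`/`scale` parameters are illustrative.

```python
import math


def star_glyph_vertices(values, center=(0.0, 0.0), scale=1.0):
    """Outer ends of the rays of a star glyph.

    values: one value per dimension, assumed normalized to [0, 1].
    Ray i points at angle 2*pi*i/n and its length is scale * values[i].
    """
    n = len(values)
    cx, cy = center
    return [(cx + scale * v * math.cos(2 * math.pi * i / n),
             cy + scale * v * math.sin(2 * math.pi * i / n))
            for i, v in enumerate(values)]


def star_glyph_outline(values, **kwargs):
    """Closed polygon through the ray ends (first vertex repeated at the end)."""
    pts = star_glyph_vertices(values, **kwargs)
    return pts + pts[:1]


if __name__ == "__main__":
    print(star_glyph_outline([0.9, 0.4, 0.7, 0.2, 0.5]))
```

Drawing many glyphs means choosing a `center` for each one, which is exactly the placement problem discussed next.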

Figure 5: Star glyphs [Oellien03]

These icon-based techniques are very vivid but have several disadvantages. A very severe problem is the organisation of the glyphs on the screen, as no coordinate system representing two of the dimensions is provided. Even if you decided to use a Cartesian system, it would put more weight on these two dimensions and so probably distort the data pattern. Another obstacle is the number of variables and the size of the data set itself. If the number of rays becomes too high, the different rays and the values they represent can no longer be distinguished. A similarly unclear display emerges if the number of data points exceeds a certain amount.

3.3.2 Hierarchical methods

The most important representative of the group of hierarchical visualization techniques is dimensional stacking. It is a method of recursively embedding coordinate systems into each other [Grinstein02a]. Consider again the example with five products, five territories, two sales channels, two methods of payment and five quarters [Mihalisin02]. First of all, you have to select the two outermost dimensions. We choose the quarters and the payment types. Our horizontal axis is now divided into five parts, while the vertical axis is split in half. We now decide that we would like the sales channel to be embedded into the method of payment, so each part of the payment-type axis is further divided into two parts that represent the different channels. The axis corresponding to the quarters embeds the products, so these elements are subdivided as well. Finally, the vertical axis also accommodates the five territories. The resulting combination of coordinate axes is the following:

Figure 6: Dimensional stacking

In the figure above, goods of product type four, sold in quarter one in territory four via the first sales channel and the first type of payment, are represented by the coloured rectangle. To visualize the number of data points, you can use a colour or grey scale. According to the colour scale drawn next to the plot, the filled rectangle represents fewer than 40,000 items. This value is binned, since otherwise a clear visualization would not be possible. The usual depiction of the dimensional stacking technique is more compact and less nicely presented than the one above: the rectangles, which are spaced out here to make the different attribute combinations easier to distinguish, usually sit directly next to each other, separated only by a thicker line. This method is very useful for hierarchical data sets that have only a small number of dimensions, as otherwise the embedding process makes the resulting plot too crowded. A great challenge is the question of labelling; the way chosen in the example is one possibility for naming the different variables in the plot.

A technique that recursively displays the correlation between dimensions (not the data itself!) is the fractal foam [Hoffmann02]. The starting point is a chosen dimension, depicted by a coloured circle. Attached to this circle are further circles, which symbolize the other dimensions. The size of these circles corresponds to their correlation with the inner circle: a high correlation is shown by a large circle. Attached to the second layer of circles is a third layer, which describes the correlations of those dimensions, and so on. An example of fractal foam can be found in Figure 7.
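Returning to dimensional stacking: nesting one axis inside another works like a mixed-radix number, so the cell of a record can be computed directly from its attribute values. The sketch below assumes the nesting order described above (quarters outside products on the horizontal axis; payment type, then sales channel, then territory on the vertical axis) and counts zero-based positions from the left and from the bottom; the orientation of the actual figure may differ. Under these assumptions the grid has 25 × 20 cells, and the highlighted rectangle (product four, quarter one, territory four, first channel, first payment type) lands in column 3, row 3. All names are illustrative.

```python
# Nesting order from the text, outermost first (number of values per attribute).
H_SIZES = (5, 5)     # horizontal: quarter, then product
V_SIZES = (2, 2, 5)  # vertical: payment type, then sales channel, then territory


def stacked_index(indices, sizes):
    """Flatten zero-based nested indices (outermost first) into one axis
    position, like the digits of a mixed-radix number."""
    pos = 0
    for i, n in zip(indices, sizes):
        if not 0 <= i < n:
            raise ValueError(f"index {i} out of range for size {n}")
        pos = pos * n + i
    return pos


def cell(quarter, product, payment, channel, territory):
    """Map 1-based attribute values to a zero-based (column, row) cell.

    Columns count from the left, rows from the bottom (an assumption)."""
    col = stacked_index((quarter - 1, product - 1), H_SIZES)
    row = stacked_index((payment - 1, channel - 1, territory - 1), V_SIZES)
    return col, row


if __name__ == "__main__":
    # Highlighted rectangle: product 4, quarter 1, territory 4, channel 1, payment 1.
    print(cell(quarter=1, product=4, payment=1, channel=1, territory=4))
```

Swapping the order of entries in `H_SIZES` or `V_SIZES` (and the matching indices) corresponds to choosing a different nesting, which changes the picture without changing the data.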

Figure 7: Fractal foam (sepal length centre (white), petal length right (red), petal width top (yellow), sepal width bottom (green)) [Hoffmann99]

3.3.3 Geometrical methods

Geometrical methods are a very large group of visualization techniques. Probably the simplest and most commonly used one is the method of parallel coordinates. Here the dimensions are represented by equally spaced parallel lines. They are linearly scaled so that the bottom of each axis stands for the lowest possible value and the top for the highest value. A data point is drawn into this system of axes as a polygonal line that crosses each axis at the position of the data point's value in that dimension. A simple example with three points and four dimensions is shown below. The points displayed are A = (1; 3; 2; 5), B = (2; 4; 1; 6) and C = (1; 4; 3; 5).
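How the polylines for A, B and C could be computed is sketched below. It assumes each axis is scaled between the minimum and maximum observed in the data (the text speaks of the lowest and highest possible values, which might instead be a fixed domain) and places axis j at horizontal position j. The function name is illustrative.

```python
def parallel_coordinates(points):
    """Return, for each named point, the (axis, height) vertices of its polyline.

    Axis j sits at horizontal position j; each value is scaled linearly so the
    minimum on that axis maps to 0 (bottom) and the maximum to 1 (top).
    """
    dims = len(next(iter(points.values())))
    lows = [min(p[j] for p in points.values()) for j in range(dims)]
    highs = [max(p[j] for p in points.values()) for j in range(dims)]
    lines = {}
    for name, p in points.items():
        verts = []
        for j, v in enumerate(p):
            span = highs[j] - lows[j]
            height = (v - lows[j]) / span if span else 0.5  # constant axis: middle
            verts.append((j, height))
        lines[name] = verts
    return lines


if __name__ == "__main__":
    example = {"A": (1, 3, 2, 5), "B": (2, 4, 1, 6), "C": (1, 4, 3, 5)}
    for name, verts in parallel_coordinates(example).items():
        print(name, verts)
```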

Figure 8: Parallel coordinates for data points A, B, C

This method is not restricted to data sets as simple as the one above. One of the familiar high-dimensional data sets used to explain data visualization techniques is the Iris data set. It covers three different Iris types, namely Iris setosa, Iris versicolor and Iris virginica. The variables of this data set are the sepal length, the sepal width, the petal length and the petal width, all measured in millimetres. As you can see in the plot below, the parallel coordinate technique is a tool that lets you find attributes which allow a categorization of the different flower types. In the diagram, the petal width seems to be a good classifier for the red Iris type. It is also a fairly good attribute to distinguish between the violet and green flower categories.

Figure 9: Parallel coordinates (Iris data set) [Grinstein01]

A very significant feature of this visualization technique is that all dimensions are treated equally. This permits a rearrangement of the displayed dimensions, which gives another view of the data and therefore might lead to the recognition of patterns (or classification attributes) that would otherwise remain hidden in the current arrangement. Figure 10 shows the same Iris data set, but this time normalized and with the dimensions sepal width and sepal length swapped. The resulting graph looks very different and much clearer.

Figure 10: Parallel coordinates (Iris data set) [Hoffmann99]

Another interesting geometrical visualization technique is the use of Andrews curves [Hoffmann02]. This method plots each data point as a function of its data values using a specific equation. The curves are usually drawn over the interval $-\pi < t < \pi$. The function that draws these curves is:

$f_x(t) = \frac{x_1}{\sqrt{2}} + x_2 \sin t + x_3 \cos t + x_4 \sin 2t + x_5 \cos 2t + \dots$

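where $x = (x_1, x_2, \dots, x_n)$ holds the values of the data point in each of the n dimensions. Consider the example of the three data points already used to explain the parallel coordinates technique (A = (1; 3; 2; 5), B = (2; 4; 1; 6) and C = (1; 4; 3; 5)). For data point A, the function is

$f_A(t) = \frac{1}{\sqrt{2}} + 3 \sin t + 2 \cos t + 5 \sin 2t$

For data point B, the function is

$f_B(t) = \frac{2}{\sqrt{2}} + 4 \sin t + \cos t + 6 \sin 2t$

For data point C, the function is

$f_C(t) = \frac{1}{\sqrt{2}} + 4 \sin t + 3 \cos t + 5 \sin 2t$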
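The Andrews functions of A, B and C can be evaluated numerically with a short sketch of the Andrews formula (x1/√2, followed by alternating sine and cosine terms whose frequency increases every two terms). Sampling 201 evenly spaced values of t over [-π, π] is an arbitrary choice; the endpoints are included, which does no harm because the curves are periodic. Function names are illustrative.

```python
import math

# The three example points from the text.
POINTS = {"A": (1, 3, 2, 5), "B": (2, 4, 1, 6), "C": (1, 4, 3, 5)}


def andrews(x, t):
    """f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..."""
    total = x[0] / math.sqrt(2)
    for k, value in enumerate(x[1:], start=1):
        freq = (k + 1) // 2                          # 1, 1, 2, 2, 3, 3, ...
        trig = math.sin if k % 2 == 1 else math.cos  # x2, x4, ... use sin
        total += value * trig(freq * t)
    return total


def andrews_curve(x, samples=201):
    """Sample f_x(t) at evenly spaced t in [-pi, pi]; returns (t, f) pairs."""
    step = 2 * math.pi / (samples - 1)
    return [(-math.pi + i * step, andrews(x, -math.pi + i * step))
            for i in range(samples)]


if __name__ == "__main__":
    for name, x in POINTS.items():
        print(name, round(andrews(x, 0.0), 4))
```

Feeding the sampled pairs of each point to any line-plotting tool reproduces a chart like Figure 11.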

If you plot these three curves in one coordinate system using Matlab, you obtain the following result:

Figure 11: Andrews curves for data points A, B, C

Applying this algorithm to the Iris data set mentioned before results in a graph that looks somewhat more complex.

Figure 12: Andrews curves (Iris data set) [Hoffmann99]

The advantage of this algorithm is that it is easily applied to data with a large number of dimensions. The disadvantage is the long computational time, as every data point requires the evaluation of trigonometric functions [Hoffmann02].

A very basic technique for visualizing high-dimensional data is the use of multiple views. They are often used with scatterplots or barcharts, leading to an n × n cell matrix, where n is the number of dimensions. Each cell of this matrix is then a scatterplot or a barchart, respectively. This method is widely employed for data sets that contain diverse attributes. It reveals correlations and disparities between variables, since placing the different attribute combinations next to each other allows a visual comparison of the possible connections. In the next example, the method has been applied to the car data set, another set widely used to demonstrate visualization techniques. This table contains the combinations of miles per gallon (MPG), year of manufacture, cylinders, acceleration, horsepower and weight for three different car types. The red spots in the figure below symbolize American cars, the green ones Japanese cars and the blue ones European cars.
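How the cells of a scatterplot matrix could be assembled is sketched below, together with the Pearson correlation for each pair of attributes, a numerical counterpart to the positive and negative correlations one reads off the plots. The attribute names and values in the usage example are made-up toy data, not the car data set, and all function names are illustrative. Note the quadratic growth: n attributes give n × n cells.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation; NaN if either column is constant."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def scatterplot_matrix(columns):
    """Assemble the n x n panels for a dict {attribute name: list of values}.

    Off-diagonal cell (i, j) holds the (x, y) pairs for plotting attribute j
    (horizontal) against attribute i (vertical) plus their correlation; diagonal
    cells only carry the attribute name, where a label or histogram would go.
    """
    names = list(columns)
    grid = []
    for row_name in names:
        row = []
        for col_name in names:
            if row_name == col_name:
                row.append({"label": row_name})
            else:
                xs, ys = columns[col_name], columns[row_name]
                row.append({"points": list(zip(xs, ys)), "r": pearson(xs, ys)})
        grid.append(row)
    return grid


if __name__ == "__main__":
    # Made-up toy columns, not the real car data set.
    toy = {"weight": [1, 2, 3, 4], "horsepower": [2, 4, 6, 8], "mpg": [8, 6, 4, 2]}
    for row in scatterplot_matrix(toy):
        print([panel.get("label") or round(panel["r"], 2) for panel in row])
```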

Figure 13: Scatter plot matrix (Car data set) [Hoffmann99]

This figure clearly identifies a positive correlation between horsepower and weight, whereas the combination of MPG and weight reveals a negative correlation [Hoffmann02]. Even though this method is a very functional tool for visualizing data, it has several disadvantages. A very problematic one is that the user becomes overwhelmed by the number of charts they have to evaluate and keep in mind while doing so. The use of space is a more practical aspect that needs consideration. The car example, with its six attributes, produces a 6 × 6 matrix, which is a manageable quantity not only to work with but also to display. If the data set were extended to ten dimensions, for instance, the corresponding 10 × 10 matrix could no longer be presented clearly.

The last two techniques I would like to present in this paper belong to the class of anchor visualization methods. They are both fairly new approaches to the problem, the second being a further development of the first. Radial Coordinate Visualization (RadViz) uses the spring paradigm [Hoffmann02]. From a centre point, n equally spaced limbs of the same length spread out, each representing one dimension. The ends of the lines mark the dimensional anchors (DAs) of the respective variables, and connecting them forms a circle. Before the data points can be visualized with this technique, they need to be normalized. Then one end of a spring is fastened to each dimensional anchor and the other end to the data point. The spring constant of each spring is the data point's value in the respective dimension. The data point is placed at the location where the sum of the spring forces equals zero. If you apply this method to the well-known Iris data set, you obtain Figure 14.
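The equilibrium condition can be solved in closed form. If the anchors sit at positions $A_i$ and the spring constants are the normalized values $v_i$, the forces on the point $p$ balance when $\sum_i v_i (A_i - p) = 0$, so $p = \sum_i v_i A_i / \sum_i v_i$: a weighted average of the anchors. The sketch below assumes min-max normalization per dimension and anchors on the unit circle; placing an all-zero record at the centre is an arbitrary convention, and the function names are illustrative.

```python
import math


def normalize_columns(rows):
    """Min-max normalize every column of equal-length rows to [0, 1]
    (a constant column becomes all zeros)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(row, lows, highs)] for row in rows]


def anchors(n):
    """Positions of n dimensional anchors equally spaced on the unit circle."""
    return [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
            for i in range(n)]


def radviz_point(values, dim_anchors):
    """Where the spring forces balance: the anchors averaged with weights = values."""
    total = sum(values)
    if total == 0:
        return (0.0, 0.0)  # no spring pulls at all: put the point at the centre
    x = sum(v * ax for v, (ax, _) in zip(values, dim_anchors)) / total
    y = sum(v * ay for v, (_, ay) in zip(values, dim_anchors)) / total
    return (x, y)


if __name__ == "__main__":
    rows = [[0, 10, 5], [5, 20, 5], [10, 30, 0]]  # hypothetical raw records
    a = anchors(3)
    for r in normalize_columns(rows):
        print(radviz_point(r, a))
```

Because every position is a weighted average, records with similar proportions land on the same spot even if their magnitudes differ, which is one source of the point overlap mentioned next.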

Figure 14: RadViz (Iris data set) [Grinstein01]

An advantage of RadViz is that it preserves certain symmetries of the data set [Hoffmann02]. Its major disadvantage is the overlap of points.

The second dimensional anchor technique, named PolyViz, addresses this problem. The resulting plot is a combination of RadViz and the barchart technique. It draws the DAs not as points, as in RadViz, but as lines, so that the graph becomes a polygon. This technique still shows the clustering of the data points in the middle of the polygon, as it uses the same spring paradigm. But it also makes it possible to study the distribution along the different dimensions, since it plots this scattering along the axes using the barchart technique [Hoffmann02].

Figure 15: PolyViz (Iris data set) [Grinstein01]

All the techniques explained above visualize data sets without trying to change them in order to simplify the visualization. In the following chapter, I will introduce non-linear projection methods that reduce the number of dimensions so that the data sets become easier to display.

Data visualization tools

1. Mindmaps

Trendmap 2007

Informationarchitects.jp presents the 200 most successful websites on the web, ordered by category, proximity, success, popularity and perspective in a mindmap. Apparently, websites are connected as they've never been before. Quite comprehensive.

2. Displaying News

Newsmap is an application that visually reflects the constantly changing landscape of the Google News news aggregator. The size of the data blocks is defined by their current popularity.

Voyage is an RSS reader which displays the latest news in the gravity area. News items can be zoomed in and out, and navigation works through a timeline.

Digg BigSpy arranges popular stories at the top when people digg them. Bigger stories have more diggs.

Digg Stack: Digg stories stack up as users digg them. The more diggs a story gets, the larger its stack.

3. Displaying Data

Amaztype, a typographic book search, collects information from Amazon and presents it in the shape of the keyword you've provided. To get more information about a given book, simply click on it.

A similar idea is used by Flickrtime. The tool uses the Flickr API to present uploaded images in real time. The images form a clock which shows the current time.

Time Magazine uses visual hills (spikes) on its map to emphasize the density of the American population.

CrazyEgg lets you explore the behavior of your visitors with a heat map. More popular sections, which are clicked more often, are highlighted in warm red.

Hans Rosling TED Talk is a legendary talk by the Swedish professor Hans Rosling, in which he explains a new way of presenting statistical data. His Trendalyzer software (recently acquired by Google) turns complex global trends into lively animations, making decades of data pop. Asian countries, as colorful bubbles, float across the grid toward better national health and wealth. Animated bell curves representing national income distribution squish and flatten. In Rosling's hands, global trends (life expectancy, child mortality, poverty rates) become clear, intuitive and even playful.

Three Views shows three views of the earth, in which each country is represented by a circle that shows the amount of money spent on the military (size of circle) and what fraction of the country's earnings that uses (colour). Compact and beautiful presentation of data.

We Feel Fine shows human feelings, calculated from a large number of weblogs.

Visualizing the Power Struggle in Wikipedia displays the most popular articles and the most frequent search queries in a heatmap.

Websites as graphs: an HTML DOM Visualizer applet that displays sites as graphs based on their links, tables, div tags, images, forms and other tags.

Interactive History Timeline presents the history of Great Britain, divided into interactive data blocks. The density of events is displayed on the map.

Winning Lotto Numbers is supposed to present how often each number appeared from one year to the next. This graph is definitely not one of the clearest.

Elastic Lists demonstrates the elastic list principle for browsing multi-faceted data structures. You can click any number of list entries to query the database for a combination of the selected attributes. The approach visualizes the relative proportions (weights) of metadata by size and the characteristicness of a metadata weight by brightness. The author's blog regularly reports on new experiments in the area of data visualization. Nice to observe, useful to bookmark.

The JFK Assassination Timeline: an Ajax-based approach for the visual presentation of historical events. The John F. Kennedy assassination is shown as a timeline with numerous presentation options. The related article gives further examples.

4. Displaying connections

Munterbund showcases the results of research into the graphical visualization of text similarities in essays in a book. The challenge is to find forms of graphical and/or typographical representation of the essays that are both appealing and informative. We have attempted to create a system which automatically generates graphics according to predefined rules.

Burst Labs suggests similar or connected items for your search queries (favourite artists, TV shows, movies, genres etc.) in a bubble. Not really new, but still inspiring.

Universe DayLife displays events, connections and news as circles which gravitate around the topic they are related to.

Musiclens gives music recommendations and presents your current mood and musical taste as a diagram.

Figdt Visualizer allows you to play around with your network. You interface with the Visualizer through Flickr and LastFM tags, using any tag to create a Magnet. Once a Tag Magnet is created, members of the network will gravitate towards it if they have photos or music with that same Tag. Available for Mac OS X, Windows and Linux. Alpha version.

What have I been listening to?: Lee Byron describes his approach of creating a histogram about his music listening history.

Shape Of Song: What does music look like? The Shape of Song is an attempt to answer this seemingly paradoxical question. The custom software in this work draws musical patterns in the form of translucent arches, allowing viewers to literally see the shape of any composition available on the Web.

Musicmap: connections are represented as connected lines; they create a web.

Musicovery displays music taste connections and lets you listen to the song and browse through similar songs.

Language Poster proves that even simple lines can be descriptive enough: the history of programming languages as an original timeline.

5. Displaying websites

Spacetime displays Google, Yahoo, Flickr, eBay and image search results in 3D. The tool displays all of your search results in an easy-to-view, elegant 3D arrangement. The company promises that the days of mining through pages and pages of tiny thumbnails in an effort to find the item you are looking for are over.

UBrowser is an open source test mule that renders interactive web pages onto geometry using OpenGL and an embedded instance of Gecko, the Mozilla rendering engine.

6. Articles & Resources

Visualcomplexity.com

The project presents the most beautiful methods of data visualization as well as further references and book suggestions. The gallery has over 450 entries.

In his article "Infosthetics: the beauty of data visualization", Andrew Vande Moere, well known through his blog Infosthetics, discusses the aesthetics of data visualization and modern approaches in this area. Creative design ideas combine form and content and generate fascinating graphs: is this a new area in the art of the next generation?

The article presents 13 new techniques of data visualization, with examples and further references.

16 Awesome Data Visualization Tools: "From navigating the Web in entirely new ways to seeing where in the world twitters are coming from, data visualization tools are changing the way we view content. We found the following 16 apps both visually stunning and delightfully useful." An extensive overview by Mashable.com.

Dataesthetics: Eric Blue provides some references to unusual data visualization methods.

infosthetics | information aesthetics: Andrew Vande Moere on data visualization, the latest developments and design ideas.

Visualizing Delicious Roundup: an overview of Del.icio.us tools you can use to visualize your bookmarks.

Periodic Table A periodic table of visualization methods.

7. Tools and Services


You can create your own timelines with Xtimeline and Circavie.

IBM Many Eyes

This Java-based service visualizes data online and helps you create pie charts, diagrams, tree maps, bar charts and histograms. Registration is required. Some examples are simply amazing.

prefuse | the prefuse visualization toolkit

Presents the beta version of a Java-based toolkit for programming applications with integrated data visualization methods.

Swivel

This service creates pie charts, diagrams and histograms on the fly. It also provides a Swivel API you can use to improve already existing visualization methods.

You can find even more tools for designing your own diagrams and charts online in our article Charts and Diagrams Tools.

Sites Dedicated to Visualization

IBM's Many Eyes is a shared visualization and discovery service offering all kinds of visualizations you can explore or create.

Informationarchitects.jp presents the 200 most successful websites on the web, ordered by category, proximity, success, popularity and perspective in a mindmap.

VisualComplexity.com is an online collection of visualizations.

Infosthetics discusses the aesthetics of data visualization.

Blogger Anonymous Professor is into visualization, offering visualizations like a 3D visualization/tour of classical music composers, a visualization of the StumbleUpon network, the value of a Digg and more.

Zip Codes visualized

Many Eyes Search

Heatmaps:

The heatmap site CrazyEgg applies heatmaps to tracking what visitors do on a user's website. Its software captures user clicks on each page and then presents a summary in the form of a heatmap. Other heatmap sites include Feng-GUI and FuseStats.

Summize applies heatmaps to shopping via its search engine.

Visualizing the Power Struggle in Wikipedia displays the most popular articles and the most frequent search queries in a heatmap.

Visual Search Engines:

Riya's Like.com: the first true visual search engine, doing visual search for shopping.

Searchme: an upcoming visual search engine for the web.

Xcavator: a photo search engine that uses visual clues you provide to identify and extract similar pictures from large collections of digital images.

ManagedQ: a visual search experiment with some built-in semantics.

oSkope: a visual search engine for finding products that searches Amazon, eBay, Flickr, Fotolia, Yahoo! Image Search and YouTube.

Quintura: a visual search engine that uses clouds, tags, and highlighting.

Tafiti: Microsoft's experimental visual search engine running on Silverlight.

Retrievr: an experimental service that lets you search and explore a selection of Flickr images by drawing a rough sketch.

Mooter: a visual search engine that organizes results in clusters.

KartOO: visual web search.

SearchCrystal: a search visualization tool that lets you compare, remix and share results from sources on the web, whether sites, images, videos, blogs, news engines or RSS feeds (see also KoolTorch).

Spacetime: search Google, YouTube, RSS, eBay, Amazon, Yahoo!, Flickr and images all in one 3D space.

grokker: web search or enterprise search offering map views of data.

Burst Labs: suggests similar or connected items to your search queries in a bubble.

UBrowser: renders interactive web pages onto geometry using OpenGL and an embedded instance of Gecko.

walk2web: enter a URL, then visually browse the web sites linked from it.

TouchGraph's Amazon Browser, Google Browser, and LiveJournal Browser
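The text does not describe how CrazyEgg or the other heatmap services actually build their images. As a minimal sketch under stated assumptions, the Python below treats each recorded click as an (x, y) pixel coordinate on the page, counts clicks per square grid cell, and shades each cell by how dense its clicks are. The function names `click_heatmap` and `render_ascii`, the 50-pixel cell size, and the ASCII shading are hypothetical choices made for illustration.

```python
from collections import Counter


def click_heatmap(clicks, page_width, page_height, cell_size=50):
    """Bin (x, y) click coordinates into a grid of cell_size-pixel squares.

    Returns a list of rows; each entry is the number of clicks in that cell.
    """
    cols = -(-page_width // cell_size)   # ceiling division
    rows = -(-page_height // cell_size)
    counts = Counter()
    for x, y in clicks:
        # Ignore clicks that fall outside the page area.
        if 0 <= x < page_width and 0 <= y < page_height:
            counts[(y // cell_size, x // cell_size)] += 1
    return [[counts[(r, c)] for c in range(cols)] for r in range(rows)]


def render_ascii(grid, shades=" .:*#"):
    """Map each cell's count to a character: more clicks -> darker shade."""
    peak = max((v for row in grid for v in row), default=0) or 1
    return "\n".join(
        "".join(shades[round(v / peak * (len(shades) - 1))] for v in row)
        for row in grid
    )


# Hypothetical clicks on a 200 x 100 pixel page.
clicks = [(10, 10), (12, 15), (40, 8), (160, 90)]
grid = click_heatmap(clicks, page_width=200, page_height=100)
# grid == [[3, 0, 0, 0], [0, 0, 0, 1]]
print(render_ascii(grid))
```

A real service would render colors rather than characters, but the summarizing step (aggregating many individual clicks into counts per region) is the part that turns raw tracking data into something readable at a glance.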

How does it work?


Visualizations encompass a wide and growing range of projects, reflecting creative ways of representing all sorts of data visually, with virtually no limit to what kind of information can be translated into an image. Visualizations have been created that represent the possible moves on a chess board, the structure of a piece of music, and the messages in a person's e-mail inbox, to name a few. The designer of a visualization determines which visual element (color, shape, size, motion, and so forth) will represent individual data points. Images can be 2D or 3D, can be fixed or dynamic, and can allow user interaction. One application, for example, shows political contributions to various candidates. You can select a state, a political race in that state, and a monetary threshold for contributions. The application builds a 2D image that shows who supported each candidate and at what level (based on the relative sizes of the circles that represent contributors), revealing interesting webs of political influence. Astrophysicists use visualizations to create 3D images that model the forces of a supernova. In each case, images or animations are the products of applications that render data in a visual form based on the design of the visualization.

Why is it significant?

Computer systems generate and store massive and growing amounts of data. At the same time, advanced networks, distributed processing, and other developments allow unprecedented access to data. Data visualizations offer one way to harness this infrastructure to find trends and correlations that can lead to important discoveries. Representing large amounts of disparate information in a visual form often allows you to see patterns that would otherwise be buried in vast, unconnected data sets. As opposed to the traditional hypothesis-and-test method of inquiry, which relies on asking the right questions, data visualizations bring themes and ideas to the surface, where they can be easily discerned.
Visualizations allow you to understand and process enormous amounts of information quickly because it is all represented in a single image or animation. Moreover, virtually any kind of data from a broad range of academic disciplines can be represented visually, making data visualization a potentially valuable approach to learning for a large number of students and researchers.
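The political-contributions application described under "How does it work?" encodes each contributor's level of support as the size of a circle and lets the user set a monetary threshold. How that application computes its circle sizes is not stated; the Python below is a minimal sketch of one common encoding, assuming contributions arrive as a name-to-amount mapping. The function name `circle_radii`, the `max_radius` parameter, and the donor data are hypothetical.

```python
import math


def circle_radii(contributions, threshold, max_radius=40.0):
    """Encode contribution amounts as circle radii.

    Contributions below `threshold` (or not positive) are dropped, mirroring
    a user-chosen cutoff. The radius grows with the square root of the
    amount, so circle *area* is proportional to the amount given.
    """
    kept = {
        name: amount
        for name, amount in contributions.items()
        if amount >= threshold and amount > 0
    }
    if not kept:
        return {}
    largest = max(kept.values())
    # The largest contributor gets max_radius; everyone else scales down.
    return {
        name: max_radius * math.sqrt(amount / largest)
        for name, amount in kept.items()
    }


donors = {"Donor A": 2500, "Donor B": 625, "Donor C": 50}
print(circle_radii(donors, threshold=100))
# {'Donor A': 40.0, 'Donor B': 20.0}
```

The square root is the design choice that matters here: if the radius grew linearly with the amount, a donor who gave four times as much would appear with sixteen times the area, exaggerating the difference in exactly the way a poorly conceived visualization can.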

What are the downsides?


Visualizations rely on accurate and matched data. If data are incomplete or faulty, or if data sets use different definitions or units, these issues must be resolved in order to create a valid visualization, and this can be time-consuming. Even if the data are reliable and consistent, a poorly conceived visualization might show nothing of consequence or exaggerate the significance of certain trends, resulting in flawed or misleading conclusions. In some cases, a lot of time and trouble go into a visualization that adds nothing to an understanding of the data that you wouldn't find in a simple table or even a textual description. Finally, users who prefer conventional ways of learning and processing information might be uncomfortable working with data visualizations, which require a different approach to understanding data.
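Resolving mismatched units and incomplete records is usually a preprocessing step done before anything is drawn. The text names no particular method; the Python below is a minimal sketch, assuming records arrive as dicts with hypothetical `value` and `unit` keys and using mass units purely as an example. The function name `harmonize` and the `TO_KG` table are illustrative assumptions; the pound-to-kilogram factor is the exact international definition.

```python
# Conversion factors to one common unit (kilograms), assumed for this sketch.
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def harmonize(records):
    """Convert (value, unit) records to kilograms and set aside unusable rows.

    Returns (clean, rejected): `clean` is a list of floats in kilograms;
    `rejected` holds records that are missing a value or use an unknown unit.
    """
    clean, rejected = [], []
    for rec in records:
        value, unit = rec.get("value"), rec.get("unit")
        if value is None or unit not in TO_KG:
            rejected.append(rec)   # needs manual review before plotting
        else:
            clean.append(value * TO_KG[unit])
    return clean, rejected


records = [
    {"value": 2, "unit": "kg"},
    {"value": 500, "unit": "g"},
    {"value": None, "unit": "kg"},   # incomplete record
    {"value": 3, "unit": "stone"},   # unit not in the conversion table
]
clean, rejected = harmonize(records)
# clean holds 2.0 and 0.5; both problem records end up in rejected
```

Keeping the rejected rows instead of silently dropping them makes the cleanup auditable, which is part of why this step can be time-consuming in practice.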