
Data Visualization

By: Taggert J. Brooks

Representing Data Graphically


Data visualization, sometimes called information visualization or infovis [1] for short, comes from the convergence of computer science, statistics and design. It is a marriage between science and art, between the left and right halves of the brain. The goal is to make data presentation interesting, aesthetically pleasing and, ideally, informative. Good data visualization goes further by revealing relationships in the data that might otherwise have gone unnoticed. Because it involves no hypothesis tests, it is easy to discount visualization as unscientific, but that would be a mistake. There are many uses of data visualization, and the reality is that hypothesis testing can bore an audience, if not completely exceed its level of understanding. Data visualization, then, is a means to an end for statisticians who want to be better communicators, and it is a pathway to a better understanding of the data for the designers among us.

"In our excitement to produce what we could only make before with great effort, many of us have lost sight of the real purpose of quantitative displays - to provide the reader with important, meaningful, and useful insight." Stephen Few

I would add that good visualization techniques will help not only the reader but also the producer of the visualization to discover meaningful insights. This document is meant to be an introduction to different visualization techniques, and though I provide some practical how-to, I do not provide everything. Where I fail, Google and the internet can fill in the gaps.

Too Much Data

The internet has led to an explosion in the amount of data that is collected, stored and easily accessible. It has done this by dramatically lowering the costs of those activities. The problem we now face is filtering the valuable data from the worthless data and determining how to use it to inform business decisions or research. A recent example of the ubiquity of new data can be taken from the presidential election.
We have data on the frequency of word searches in Google for each minute of the Vice Presidential debate between Senator Joe Biden and Governor Sarah Palin. [2] Apparently people were trying to figure out exactly what a maverick actually is.

What type of media will you use to make your presentation? How long does your audience have to take in the data? The longer the audience has, the more data-dense the visualization can and should be. The less time and autonomy your audience has to peruse the data, the more simplified the visualization should be.

[1] A wiki dedicated to Infovis: http://www.infovis-wiki.net/index.php?title=Main_Page
[2] A graph of the searches can be found here: http://www.readwriteweb.com/archives/google_has_changed_political_d.php


Will it be a written report, a PowerPoint presentation, or is the data going to be rendered on the web? In other words, will the visualization be static or dynamic? These questions are some of the first you should answer when selecting a visualization method. Visualization is about Discovery, Discerning Patterns, and Disseminating Information. Below is an infographic describing the continuum from data collection to data use.

A good example of the effectiveness of visualization for identifying outliers or data errors can be found below. This is derived from [3].

[3] http://www.visualizingeconomics.com/2009/07/12/data-scienist-data-geek-designer/


The picture above is a great example of using visualization to identify errant data. The underlying data in this case can be no more than 100%, yet we can see one mistaken observation. [4]

Selecting the Right Graph

"Design is choice. The theory of the visual display of quantitative information consists of principles that generate design options and that guide choices among options. The principles should not be applied rigidly or in a peevish spirit; they are not logically or mathematically certain; and it is better to violate any principle than to place graceless or inelegant marks on paper." Edward Tufte, The Visual Display of Quantitative Information

Selecting the appropriate display can be difficult because it requires a good understanding of the nature of your data and of statistics, as well as a good understanding of design principles. There are many possibilities for a given variable or dataset, but you need a place to start. There are a few web pages that try to help, but none addresses both statistics and design simultaneously. [5] As the quote from Tufte suggests, the choice of design does not easily fit into a simple algorithm.

[4] This is from the higher ed weblog http://blog.une.edu.au/robbi/2009/08/06/data-testing-usingvisualisation/
[5] This webpage http://interface.fh-potsdam.de/infodesignpatterns/news.php is closer to the visual end, while this webpage http://www.ncsu.edu/labwrite/res/gh/gh-graphtype.html does a better job of helping select the appropriate graph from a statistics perspective, and this one helps choose the right statistical test: http://www.ats.ucla.edu/stat/stata/whatstat/default.htm


Some other examples of websites that try to provide guidance in the choice of appropriate representations can be found in the blog entry titled "Things should be made as simple as possible, but not any simpler" [6], which is a famous Einstein quote.

1. Determine the relationship you want to display
2. Determine if you want to emphasize individual values or the overall pattern
3. Determine the chart type

Bad charts

Before we begin discussing some of the common, and not so common, visualizations, it might be helpful to provide some links to bad charts and improvements on them. Stephen Few provides some excellent examples of bad charts along with recommendations for fixing the problems. [7] Another set of examples is provided here. [8] Many of these criticisms and corrections are based upon the rules and suggestions from the work of Edward Tufte. His rules can be found at his website. [9]

[6] http://blog.xlcubed.com/chart-rules-as-simple-as-possible-but-not-any-simpler/ A follow-up can be found here as well: http://blog.xlcubed.com/household-income-distribution-1967-2005-as-small-multiples-chart/ Still another example of a chart chooser, which also produces Excel templates from your choices, can be found here: http://chartchooser.juiceanalytics.com/
[7] http://www.perceptualedge.com/examples.php
[8] http://lilt.ilstu.edu/jpda/charts/bad_charts1.htm
[9] http://www.washington.edu/computing/training/560/zz-tufte.html


Seth Godin, the famed marketer, has rules for making good graphs. [10]

Graph Types

Microsoft Excel is a common tool for creating graphic representations, but sadly its default choices are often not good design choices, and many of the default graphs it provides should never be used. While the defaults in Excel 2007 are much better than the horrible defaults in Excel 2003, both can benefit from some alterations. For some details on altering charts after Excel has created them using the default templates, see the links below. [11] [12] Some traditional graphical means of data representation, which can be found under the INSERT ribbon in Excel 2007, follow.

Pie chart

The pie chart is useful for representing the relative proportions of a few categories. The more categories there are, the greater the number of slices and the more difficult the chart is to read.

The field of information visualization is rather new, and like any new field it has very impassioned people with starkly different opinions. For some, their beliefs are almost religious, and the rules they profess are delivered with the same vigor as a Baptist minister delivering a sermon from the pulpit. An example of this occurred in the blogosphere when marketing guru Seth Godin suggested there should be no more bar charts, only pie charts. This led to a swift reply from the community of infovis folks, many of whom countered with the exact opposite advice. Remember the quote from Tufte above: the reality is always somewhere in between, born of the exercise of good judgment. The problem with pie charts, as infovis people will tell you, is that consumers of visualizations have a hard time estimating angles. In fact, they get them wrong, thus drawing the wrong inference from the slices of a pie chart. People are better at visually judging height, which is why many infovis people prefer the column chart. [13] The visual hierarchy of Cleveland is provided at this website. [14]

[10] http://sethgodin.typepad.com/seths_blog/2009/07/how-to-make-graphs-that-work.html
[11] How to alter the defaults in Excel: http://blog.xlcubed.com/defaults-in-excel-charting/
[12] http://www.juiceanalytics.com/writing/fixing-excel-charts/
[13] http://seedmagazine.com/content/article/getting_past_the_pie_chart/
[14] http://www.processtrends.com/TOC_data_visualization.htm


[Figure: pie chart example] [15]

Bar and Column Charts

Bar charts are often good for representing categorical data. You can present the frequency of responses in each category, or the relative frequency. [16] You can also present the frequency or relative frequency of one variable over the groups or categories of another variable, making it an excellent choice when you have two categorical variables.
[Figure: three example charts labeled Column Chart, Bar Chart and Stacked Bar Chart]

Here is a recent bar chart I used to highlight the US debt-to-GDP ratio. Notice the use of the single red bar to draw attention to the US relative to the rest of the OECD. Imagine how ugly and confusing this would look if I used a different color for every country. How would it look if I used the same color for every country? Obviously this works in color; would it work in grayscale?

[15] http://peltiertech.com/WordPress/pie-chart-for-pi-day/
[16] Most of the charts in this article were produced in Microsoft Excel 2007, unless otherwise noted. They were copied into Word 2007 using the Paste > Paste Special > Microsoft Excel Object function.

[Figure: bar chart titled "2008 Debt to GDP Ratio for OECD", value axis from 0 to 180. Countries in the order listed: Japan, Greece, Italy, Belgium, Portugal, Hungary, United Kingdom, Austria, France, Netherlands, Poland, Iceland, United States, Turkey, Germany, Sweden, Spain, Denmark, Finland, Korea, Canada, Ireland, Czech Republic, Slovak Republic, Mexico, Switzerland, New Zealand, Norway, Luxembourg, Australia]

Line Graph

The traditional line graph is generally used to show a single variable (usually continuous) over time, with time represented on the horizontal axis. It could also be used to show the relative frequency of a single response category over time.

[Figure: example line graph]


[Figure: line graph titled "U.S. Payroll Employment: Total Nonagricultural: SA, Thousands of Persons", horizontal axis labeled every 24 months from Jan-90 to Jan-08, vertical axis from 107.0 to 142.0]

A few quick notes about the above graph. I've removed the horizontal gridlines, as they were an example of ink with no purpose. The background fill of the chart area has been changed to white. I added shaded bars to denote recessions. If I were to improve this further, I would probably reduce the number of labels on the horizontal axis, say to one every 36 months rather than every 24. I'd also probably reduce the number of labels on the vertical axis, as it currently feels a bit cluttered. Finally, I might eliminate the title altogether and add a very small footnote containing the same information, or maybe just title the chart "Employment" and relegate the details to the footnote.

Area Chart

An area chart is a line chart with the area below the line shaded. This can be useful when you have two lines over time and one line represents a subset of the other. For example, you could have retail sales over time broken into two categories, durable and non-durable goods.
[Figure: two example area charts, the second with a horizontal axis from 1995 to 2004 and a vertical axis from 0 to 200]

Scatter Plot

Scatter plots are useful when you have two continuous variables, with one represented on the X axis and the other on the Y axis. A third variable can be used to set another attribute of the points, yielding a bubble chart, which will be discussed later.

[Figure: example scatter plot, both axes from 0 to 100]

Tables

We should not always rush to make a chart. Sometimes just presenting the numbers in tabular form is sufficient to get your point across, or maybe you blend both. Below are two examples using the conditional formatting in Excel 2007, which blends the graphic design of a chart with the data in tabular form. [17]
Leisure    Time Spent
biking     125
hiking     40
reading    30
singing    25
dancing    10
cleaning   5

[Figure: the table above shown twice, side by side, each with a different conditional format applied]

Whenever presenting data like this it is useful to rank order the data from largest to smallest. Failure to do so makes it a bit harder for the reader to sift through the data as you can see from the example below.
Leisure    Time Spent
biking     125
hiking     5
reading    50
singing    75
dancing    10
cleaning   80

[Figure: the table above shown twice, side by side, each with a different conditional format applied]
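The rank ordering recommended above can be done in one step before the table is drawn. A minimal sketch in Python, using the unsorted leisure-time figures from the example; the variable names are illustrative and not part of the original workflow, which used Excel.

```python
# Unsorted leisure-time figures from the example table.
leisure = {
    "biking": 125,
    "hiking": 5,
    "reading": 50,
    "singing": 75,
    "dancing": 10,
    "cleaning": 80,
}

# Rank order from largest to smallest so the reader does not have to sift.
ranked = sorted(leisure.items(), key=lambda item: item[1], reverse=True)

for activity, time_spent in ranked:
    print(f"{activity:<10}{time_spent:>5}")
```

In Excel the same result comes from sorting the value column in descending order before applying any formatting.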

A simple way to quickly de-emphasize the numbers is to change their font color to white.

Leisure    Time Spent
biking     125
hiking     40
reading    30
singing    25
dancing    10
cleaning   5

[17] In the Home ribbon, select Conditional Formatting > Data Bars.


The one very unfortunate issue with this technique is that Microsoft Excel violates an important statistical and visualization principle with its bars. Zero values should be represented by the absence of any color, bar or indicator. Yet no matter how small the lowest quantity in the range of cells is, its bar appears to be about 5%, even if the value is zero, as can be seen in the example below. [18]
Leisure    Time Spent
biking     125
hiking     40
reading    30
singing    25
dancing    10
cleaning   0
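The principle that a zero value should draw nothing is easy to state in code. The sketch below is a plain-text stand-in for data bars, not a description of how Excel works; the function name `data_bar` and the 20-character width are assumptions made for illustration.

```python
def data_bar(value, max_value, width=20):
    """Return a text bar whose length is proportional to value.

    A zero (or negative) value yields an empty string, so "nothing"
    is drawn as nothing rather than as a short stub.
    """
    if max_value <= 0 or value <= 0:
        return ""
    return "#" * round(width * value / max_value)


# Leisure-time figures from the table above, including the zero.
leisure = [("biking", 125), ("hiking", 40), ("reading", 30),
           ("singing", 25), ("dancing", 10), ("cleaning", 0)]

largest = max(time_spent for _, time_spent in leisure)
for activity, time_spent in leisure:
    print(f"{activity:<9}{time_spent:>4} {data_bar(time_spent, largest)}")
```

Because the length is a pure proportion of the maximum, very small positive values can also round down to no bar at this width, which is the honest outcome at that resolution.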

Sparklines

Sparklines are small inline line graphs developed by Edward Tufte. [19]

[Figure: two inline sparklines, each labeled "GDP [5.8%]"] [20]

Notice how simple the sparkline is. We have removed the clutter of the Y and X axis labels. Yet the important information is still there: you see the relative values, and clearly the series is not currently at its highest value, yet it is higher than in the previous period. Compare that to the more traditional graph below:

[Figure: traditional line chart titled "GDP", horizontal axis from 1995 to 2004, vertical axis from 0 to 10]
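To make the sparkline idea concrete, here is a minimal text-only sketch that maps a series onto eight Unicode block characters. It is not the TinyGraphs add-in mentioned in the footnotes; the function name `sparkline` and the growth figures are hypothetical, chosen only to resemble the chart described here.

```python
BLOCKS = "▁▂▃▄▅▆▇█"


def sparkline(values):
    """Map each value to one of eight block characters, scaled to the data range."""
    low, high = min(values), max(values)
    span = high - low
    if span == 0:
        # A flat series is drawn as a flat line of the lowest block.
        return BLOCKS[0] * len(values)
    last = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - low) / span * last)] for v in values)


# Hypothetical annual GDP growth rates (percent), for illustration only.
gdp_growth = [2.0, 6.0, 4.5, 4.2, 4.8, 4.1, 3.0, 3.4, 4.9, 5.8]

# Word-sized graphic: label, sparkline, most recent value.
print(f"GDP {sparkline(gdp_growth)} [{gdp_growth[-1]}%]")
```

The output fits inside a sentence, which is the point: no axes, no gridlines, just the shape of the series and its latest value.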

[18] Thanks to the excellent Juice Analytics for making this point. http://www.juiceanalytics.com/writing/excel-2007-and-lie-factor/
[19] Edward Tufte's explanation of the theory and practice of sparklines: http://www.edwardtufte.com/bboard/q-and-a-fetch-msg?msg_id=0001OR&topic_id=1
[20] The sparkline was created with the free open source add-in for Microsoft Excel called TinyGraphs. It can be found here: http://www.spreadsheetml.com/products.html


The traditional chart clearly consumes more space and invites the reader to linger on the chart rather than on the point you are trying to make about the data. However, this type of chart has its place. For example, it might be a better representation if it is important for the reader to see that the highest value occurred in 1996, or that the lowest value was in 1995, or if you want them to easily see that GDP fluctuates between 2% and 6%.

It is important to note that sparklines can be more than just line charts. They can be bar charts, pie charts, etc. The term sparklines merely refers to what Edward Tufte calls "Intense, Simple, Word-Sized Graphics." Sparklines are obviously not well suited for PowerPoint-type presentation graphics, but are well suited for written reports, or for the currently in vogue data-dense business intelligence reports referred to as dashboards. [21]

Bullet Graph

The bullet graph, due to Stephen Few, is another dashboard graphic.

[Figure: bullet graph example] [22]

There is also a Google gadget API for use in Google Docs that will produce this. [23]

Spine Plots / Mosaic Plots / Matrix Charts

These are best used for categorical data. Notice that we have added another dimension to the data by making the width of the bar proportional to the fraction of cars in that category (domestic versus foreign), thus taking the traditional bar chart and adding another level of data.

Made with Stata's ado file spineplot. Jon Peltier has a solution for Excel, which he calls a Matrix Chart. [24] It is available in the statistical language R as well. [25]
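The geometry behind a spine plot is two sets of proportions: each column's width is that category's share of all observations, and each segment's height is a share within the column. The sketch below computes both from a two-way table of counts. It is not the Stata, Excel or R implementation; the function name `spine_layout`, the repair-record variable and all counts are hypothetical.

```python
def spine_layout(counts):
    """Compute spine plot geometry from a two-way table of counts.

    counts maps each column category to {row category: count}.
    Returns {column: (width, {row: height})}, where widths sum to 1
    across columns and heights sum to 1 within each column.
    Assumes every column has at least one observation.
    """
    grand_total = sum(sum(rows.values()) for rows in counts.values())
    layout = {}
    for column, rows in counts.items():
        column_total = sum(rows.values())
        width = column_total / grand_total
        heights = {row: n / column_total for row, n in rows.items()}
        layout[column] = (width, heights)
    return layout


# Hypothetical counts of cars, for illustration only.
cars = {
    "domestic": {"good repair record": 30, "poor repair record": 20},
    "foreign": {"good repair record": 20, "poor repair record": 5},
}

layout = spine_layout(cars)
for origin, (width, heights) in layout.items():
    shares = {row: round(h, 2) for row, h in heights.items()}
    print(f"{origin}: width={width:.2f}, heights={shares}")
```

With these made-up counts the domestic column is twice as wide as the foreign one, which is exactly the extra level of data the plot adds over an ordinary stacked bar chart.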
[21] For some examples see http://www.ozgrid.com/excel-add-ins/spark-maker-explained.htm
[22] The picture comes from Stephen Few's Perceptual Edge: http://www.perceptualedge.com/blog/?p=375
[23] http://dealerdiagnostics.com/blog/2008/09/the-ddr-bullet-graph-gadget/
[24] http://pubs.logicalexpressions.com/Pub0009/LPMArticle.asp?ID=508
[25] http://ideas.repec.org/a/tsj/stataj/v8y2008i1p105-121.html


Heat maps

Heat maps are two-dimensional maps where the color intensity represents the underlying data. The above table on the right can be thought of as a heat map. The darker orange colors represent larger values. When choosing the different colors to use, designers rely on color theory. ColorBrewer is a useful website for making sure that viewers can clearly distinguish differences in your data. [26]

Choropleth Maps (Color Maps)

Choropleth maps are a specific type of heat map where the two-dimensional object is a geographical map. The map is then painted with color based upon the intensity of the underlying variable. Often darker colors represent larger values of the underlying variable. This is a great way to visually represent data that varies geographically. The example below was produced with Stata and comes from some foreclosure data I have by county. The data represent the number of foreclosure filings as a percentage of housing units in each county for 2007; the darker the shading of the county, the higher the rate of foreclosure filings in that county. Juneau County sticks out as the obvious county with the highest rate of foreclosure filings.

A similar graph for the state of Wisconsin is below. Note that the shading has changed relative to the previous graph and is now based upon different intervals.
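The effect of changing the intervals can be seen in a few lines of code. The sketch below computes filings as a percentage of housing units and assigns each county to a shading class; it is not the Stata code used for the maps, and the county names, counts and both sets of class limits are hypothetical.

```python
from bisect import bisect_right


def shade_class(rate, breaks):
    """Return the index of the interval that rate falls in (0 = lightest shade).

    breaks is a sorted list of class limits; a rate equal to a limit
    is placed in the next, darker class.
    """
    return bisect_right(breaks, rate)


# Hypothetical (foreclosure filings, housing units) per county.
counties = {
    "County A": (120, 12000),
    "County B": (45, 15000),
    "County C": (300, 50000),
}

# Filings as a percentage of housing units.
rates = {name: 100 * filings / units for name, (filings, units) in counties.items()}

regional_breaks = [0.4, 0.7, 0.9]    # hypothetical class limits, in percent
statewide_breaks = [0.5, 1.0, 1.5]   # a different, equally hypothetical set

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%  "
          f"regional class {shade_class(rate, regional_breaks)}, "
          f"statewide class {shade_class(rate, statewide_breaks)}")
```

The same county can land in the darkest class on one map and a middle class on the other, so shading should only be compared between maps that share their intervals.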

[26] The website can be found here: http://www.personal.psu.edu/cab38/ColorBrewer/ColorBrewer_intro.html


While I used a statistics program (Stata) to generate these graphics, there are many opportunities for producing your own choropleths on the web. Google Documents has added its own visualization tools, which include the ability to create choropleths for different countries. [27] These maps, and the geographic presentation of this data, intersect with the rapidly growing use of Geographic Information Systems (GIS) in economic geography. Can you imagine the marketing uses for this type of information?

There are of course problems with these types of maps as well. They can mislead a viewer. The geographic area may be completely unrelated to the area at risk. For example, if the map represents foreclosure rates, as these do, you might think Juneau County represents a large economic problem for the region. However, the reality is that the population of Juneau County is quite small relative to La Crosse County, and while the foreclosure rate might be high, the total number of foreclosures is still quite small, because there are fewer houses in that county than in some of the other counties. The fundamental problem is that the graphic invites you to infer economic importance in proportion to geographic size, when this is not true. One solution is to distort the geographic area based instead on the metric of interest.

Cartograms (Distorted Maps)

Another example of using colors and maps comes from the following distorted maps, where the distortion is based upon some underlying variable, in this case alcohol consumption. Here the color serves only to demarcate the different countries. Rather than color intensity conveying the values of the underlying variable, the creators have distorted the size of each country in proportion to its alcohol consumption. There are some people who feel cartograms hide more than they reveal. [28]

Alcohol Consumption (2001) [29]

[27] Details on producing these maps can be found here: http://documents.google.com/support/spreadsheets/bin/answer.py?answer=91599 and here: http://googlesystem.blogspot.com/2008/02/data-visualization-google-gadgets.html

Another example of a cartogram comes from the recent election. [30] Below is a reinterpretation of the simplistic red/blue map you might have seen on TV or in the newspaper. Now the colors are shaded based upon the vote, rather than simply one color for each party based upon the majority vote in that state. The states are also distorted by the number of votes cast in that state.

Compare that to the traditional depiction:


[28] http://flowingdata.com/2008/11/13/alternative-to-cartograms-using-transparency/
[29] The distorted maps presented here come from the following article: http://www.dailymail.co.uk/news/article-439315/How-world-really-shapes-up.html Producing the distorted cartograms requires substantial knowledge of programming and graph theory.
[30] http://www-personal.umich.edu/~mejn/election/2008/


Treemaps

Treemaps are another type of heat map, well suited for hierarchical data. The classic example on the internet is the smartmoney.com Map of the Market. [31] Here the hierarchy from the bottom up is as follows: start with individual stocks, which are grouped by company; each company is represented by its market capitalization (outstanding shares of that company times share price). A higher market capitalization for the firm means a larger area for its box. This would be the initial box. Then companies are further grouped together into a larger box by industry. The small boxes are then colored based upon the percentage gained or lost on the day, with green representing gains and red representing losses. Visually, it is very important to distinguish gains from losses by different colors. That was the major shortcoming of a recent NY Times heat map. [32]
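The sizing and coloring rules just described can be sketched in a few lines. This is a deliberately simplified, single-level "slice" layout that cuts one rectangle into side-by-side strips; it is not the algorithm SmartMoney uses, and production treemaps nest boxes and use more elaborate layouts. The company names, share counts, prices, and the gray color for "no change" are all assumptions for illustration.

```python
def slice_layout(sizes, x=0.0, y=0.0, width=1.0, height=1.0):
    """Split a rectangle into side-by-side strips with areas proportional to sizes.

    sizes maps a name to a positive number.
    Returns {name: (x, y, strip_width, height)}.
    """
    total = sum(sizes.values())
    boxes = {}
    left = x
    for name, size in sizes.items():
        strip_width = width * size / total
        boxes[name] = (left, y, strip_width, height)
        left += strip_width
    return boxes


def color_for(pct_change):
    """Green for a gain, red for a loss; gray for no change is an assumption."""
    if pct_change > 0:
        return "green"
    if pct_change < 0:
        return "red"
    return "gray"


# Hypothetical companies: (shares outstanding, share price, percent change today).
companies = {
    "Alpha": (1_000_000, 50.0, 1.2),
    "Beta": (2_000_000, 10.0, -0.8),
    "Gamma": (500_000, 60.0, 0.0),
}

# Market capitalization = outstanding shares times share price.
market_cap = {name: shares * price for name, (shares, price, _) in companies.items()}

boxes = slice_layout(market_cap, width=100.0, height=50.0)
for name, (bx, by, bw, bh) in boxes.items():
    print(f"{name}: area={bw * bh:.0f}, color={color_for(companies[name][2])}")
```

To add the industry level, the same function can be applied twice: once to industry totals, then again inside each industry's rectangle to place its companies.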

[31] SmartMoney's Map of the Market is updated with a 15 minute delay. The site is here: http://www.smartmoney.com/map-of-the-market/
[32] The graphic concerns the performance of the economy under different Presidents and it can be seen here: http://www.nytimes.com/interactive/2008/10/18/business/20081019-metrics-graphic.html


A recent bad day on Wall Street is captured by the following. [33]

It is possible to produce treemaps of your own, whether through Microsoft Research's Excel add-in [34] or IBM's web software Many Eyes. [35] There are several examples of data you may have which could be represented by a treemap. Let's say you are working on a project looking at students' choice of major. The hierarchy from the top down could be: College > Major > number of students. So the number of students determines the size of the box for each major. Then the majors are collected within the larger box of the college within which they are offered. The boxes could be colored by many different things. For example, let's say you were trying to get a sense of how many students change their major and what they change it to. You could then color the boxes by the percentage of the people in that major who have always had that major, or by the percentage that changed to that major within the last year.

Another example could be looking at the time students spend in different activities. Let's say you ask them the average number of hours per week they spend doing several things, such as studying, going to class, reading, writing, etc. Again it would be possible to break these down. You could size the first level of boxes by the average percentage of time spent in each activity. The next level of boxes would involve grouping the activities into broader areas, say academic versus non-academic. Basically, any data that can be grouped through some sort of hierarchy will make a good treemap.

Some examples of brilliant dynamic web treemaps are provided by the New York Times article on changes in inflation. [36] The New York Times also uses treemaps in a recent graphic depicting the year of heavy losses on Wall Street. [37]

[33] These data come from http://www.uie.com/brainsparks/2008/09/30/seeing-red-smartmoneycoms-mapof-the-market/
[34] Microsoft provides an add-in for treemaps: http://www.gilsmethod.com/node/81

Bump Charts

Bump charts are a good way of showing changes in rank order. Below, the New York Times looks at the challenges that face the US and other countries on infant mortality. [38] Where would you rather have an infant born, the US or Singapore? According to the chart, Singapore. However, remember that this is measuring the number of deaths of infants (one year of age or younger) per 1000 live births. We are more likely than other countries to have successful preterm births, but this group is very much at risk for early death.
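A bump chart plots rank rather than the raw measure, so the only preparation it needs is a ranking within each period. The sketch below shows that step; the function name `ranks_by_year`, the country labels and the rates are hypothetical and are not the figures behind the New York Times graphic.

```python
def ranks_by_year(rates):
    """Rank countries within each year, with rank 1 given to the lowest rate.

    rates maps a year to {country: rate}. Every country is assumed to
    appear in every year, and ties are not handled specially.
    Returns {country: [rank in each year, in year order]}.
    """
    series = {}
    for year in sorted(rates):
        ordered = sorted(rates[year], key=rates[year].get)
        for position, country in enumerate(ordered, start=1):
            series.setdefault(country, []).append(position)
    return series


# Hypothetical infant deaths per 1000 live births, for illustration only.
infant_mortality = {
    1960: {"Country A": 26.0, "Country B": 35.0, "Country C": 30.0},
    2004: {"Country A": 6.8, "Country B": 2.0, "Country C": 5.0},
}

bump_series = ranks_by_year(infant_mortality)
for country, positions in bump_series.items():
    print(country, positions)
```

Each list becomes one line on the bump chart. Note that Country A's rate fell sharply yet its rank worsened, which is the kind of story a bump chart tells and a reminder that rank hides the size of the underlying change.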

[35] The service is available here: http://services.alphaworks.ibm.com/manyeyes/page/Treemap_for_Comparisons.html
[36] A look at recent inflation: http://www.nytimes.com/interactive/2008/05/03/business/20080403_SPENDING_GRAPHIC.html?scp=1&sq=inflation%20chart&st=cse
[37] http://www.nytimes.com/interactive/2008/09/15/business/20080916-treemap-graphic.html
[38] http://www.nytimes.com/2009/04/07/health/07stat.html?ref=science


Word Clouds

Word clouds are good for representing responses to open-ended questions. [39] This is from the following question:

Looking ahead, which would you say is more likely - that in the country as a whole we'll have continuous good times during the next five years or so, or that we will have periods of widespread unemployment or depression?
A. Good times
B. Widespread unemployment or depression
C. Other, please specify

The word cloud is composed of the responses to the "C. Other, please specify" answer; I have removed the first two.

[39] An easy-to-use website, http://wordle.net/ , allows you to produce your own word clouds.


There are problems with this type of presentation. First, since the responses to the "other" answer were actually short phrases, we don't really capture the full phrase, but rather the frequency of the individual words. As a demonstration of this problem, let's say 10 people said "good times" and 10 said "bad times". Since the word "times" appears in both, it will be the most frequent word (appearing 20 times) and therefore the largest. But that doesn't tell us much about the sentiment being conveyed by the respondents.


This is solved below by tying all the words of a single response together with the tilde (~). Joining the words with a ~ like this (joined~words) allows Wordle to produce a phrase cloud, which is a great way of visualizing responses to questions with five or so categories, where a phrase represents each category. This is very easy to do in Excel: just highlight the column and do a find and replace, putting a blank space in the Find box and a ~ (tilde) in the Replace box. Then copy and paste the text into Wordle. Done.
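The difference between counting words and counting joined phrases is easy to verify. The sketch below reproduces the "good times" versus "bad times" demonstration and then applies the tilde join; it only mirrors the find-and-replace step and says nothing about how Wordle itself is implemented. The function names and the lowercasing are assumptions for illustration.

```python
from collections import Counter


def word_counts(responses):
    """Count individual words, which is what a plain word cloud sizes by."""
    return Counter(word for response in responses for word in response.lower().split())


def phrase_counts(responses):
    """Join each response's words with ~ so the whole phrase counts as one token."""
    return Counter("~".join(response.lower().split()) for response in responses)


# The demonstration from the text: 10 "good times" and 10 "bad times" responses.
responses = ["good times"] * 10 + ["bad times"] * 10

print(word_counts(responses).most_common(1))
print(phrase_counts(responses).most_common())
```

Counted by word, "times" dominates with 20 occurrences; counted by joined phrase, "good~times" and "bad~times" each appear 10 times, which preserves the sentiment.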

The other problem with this presentation is that it doesn't visually direct and steer the eye while making the point. Your eye wanders all over the place. The example uses the question:

When you think about the property taxes you or your landlord pay on the home in which you live and the services you receive for those taxes, would you say property taxes in Wisconsin (or your state of residence) are much too high, somewhat too high, about right, somewhat too low or much too low?

The answers that are joined are:
a. Much too high
b. Somewhat too high
c. About right
d. Somewhat too low
e. Much too low
f. Other


One could easily list the words by frequency from greatest to least, but word clouds are popular because they are more than just data: they are art. They invite the observer in, even if the observer gets a little lost in the presentation. Sometimes efficiently conveying information is sacrificed for the visual aesthetic of good design. Below is an example where the art matters more than some of the underlying data. [40]

[40] This graphic comes from the website http://www.pitchinteractive.com/election2008/ More artistic visualizations can be found here: http://www.visualcomplexity.com/vc/ and Slate has an excellent collection of artistic visualizations here: http://www.slate.com/id/2197749/


The edge of the doughnut lists the names of donors to the 2008 presidential campaigns. Clearly, at this level of presentation you cannot read the names. However, it still gets some ideas across, like the disproportionate amount of funds raised by Obama relative to McCain.

Bubble Charts

Bubble charts allow you to present three variables in two dimensions. They are basically traditional XY scatter plots in which the size of the bubble is proportional to a third variable. In the case below, the scatter plot represents the unemployment rate and foreclosure rate for each of the Wisconsin counties in the 7 Rivers Region, and the size of the bubble is proportional to the population of the county. It is a static presentation for one year, 2007.

[Figure: bubble chart titled "7 Rivers Region 2007". Vertical axis: Unemployment Rate, from 0 to 8. Horizontal axis: Foreclosure Rate, from 0 to 0.01. Bubbles labeled Jackson, Juneau, La Crosse, Monroe, Trempealeau and Vernon]
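One detail worth getting right when drawing bubbles by hand is what "size" means. A common recommendation is to make the bubble's area, not its radius, proportional to the third variable, since viewers tend to compare areas. The sketch below works under that assumption; the function name `bubble_radius`, the maximum radius and the populations are hypothetical.

```python
import math


def bubble_radius(value, max_value, max_radius=30.0):
    """Radius that makes the bubble's AREA proportional to value.

    Area grows with the square of the radius, so the radius scales with
    the square root of the value's share of the maximum.
    """
    if value <= 0 or max_value <= 0:
        return 0.0
    return max_radius * math.sqrt(value / max_value)


# Hypothetical county populations, for illustration only.
populations = {"County A": 100_000, "County B": 25_000, "County C": 4_000}

largest = max(populations.values())
for county, people in populations.items():
    print(county, round(bubble_radius(people, largest), 1))
```

A county with a quarter of the largest population gets half the radius, and therefore a quarter of the area. Scaling the radius directly would make it look far smaller than it should.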

Another example, which highlights the problem with too many colors competing for attention, can be found below. In example A the mind gets lost, whereas example B does a good job of highlighting, with context, the data in the orange circle. [41]

[41] http://charts.jorgecamoes.com/is-data-visualization-useful/


Dynamic bubble charts allow you to plot the above for different years and then watch the data change over the years. I've produced some examples with the foreclosure data to give you another idea for presenting the data. [42] One of the best examples of dynamic bubble charts can be found at Gapminder. [43] How would you insert them into presentations? In the past I have posted them to a webpage and rendered them separately or within PowerPoint. Obviously this type of presentation is not (currently) possible in a written report. I imagine the technology is not far behind, as you could imagine Amazon's Kindle bridging the gap.

These are beautiful graphics from the New York Times [44], but they might be difficult for you to re-create, though they should get you thinking about how data can be presented in a way that is graphically pleasing and at the same time informative.

Presenting data in a written format requires different techniques than presenting the same data orally. In a written piece the reader has more time to dig into the data, so the graph or chart can be more complex, as the New York Times pieces are. In the case of a PowerPoint presentation, keep it simple and active.

When science meets art, as in the case of graphs and design, it is important to realize there will be differences. There is less likely to be an objective standard. Some arguments will be over design, and some over content. Always ask yourself who your audience is, what the point of the graph is, and whether your design is in fact conveying what you want it to. [45] The following represents some important differences in preferences, but also important differences in terms of information presented. Some other tips can be found at the links. [46]

Dynamic/Interactive Graphs

These graphs can be dynamic in the sense that they are constantly updated and changing, either due to the influx of new data or interactive manipulations by the viewer.

42 http://www.uwlax.edu/faculty/brooks/prof/charts/foreclosure.htm and http://www.uwlax.edu/faculty/brooks/prof/charts/foreclosure-state.htm
43 http://googlegadgetsapi.blogspot.com/2008/06/spreadsheet-gadgets-free-dynamic-data.html http://code.google.com/apis/visualization/documentation/gadgetgallery.html
44 Movies: http://www.nytimes.com/interactive/2008/02/23/movies/20080223_REVENUE_GRAPHIC.html NY Times on spending: http://www.nytimes.com/interactive/2008/09/04/business/20080907-metricsgraphic.html Drug admts: http://www.nytimes.com/2008/06/14/opinion/14blow.html?_r=3&oref=slogin&oref=slogin&oref=slogin
45 http://sethgodin.typepad.com/seths_blog/2008/07/the-three-laws.html http://sethgodin.typepad.com/seths_blog/2008/07/bar-graphs-vs-p.html http://peltiertech.com/WordPress/2008/07/12/bar-graphs-vs-pie-charts/ http://www.perceptualedge.com/blog/?p=247 http://blog.xlcubed.com/chart-rules-as-simple-as-possible-but-not-any-simpler/
46 http://www.macworld.com/article/134708/2008/07/chartsandgraphs.html?t=103 http://www.giantflightlessbirds.com/workshops/better_graphs.pdf Some Excel tips: http://charts.jorgecamoes.com/category/how-to-and-tips/ http://services.alphaworks.ibm.com/manyeyes/app and another link: http://www.decisionsciencenews.com/?p=475


Data Visualization in Seminars/Talks/Presentations

When the audience is in front of you rather than at home in front of their computer, you are responsible for grabbing their attention and keeping them awake. Here is an example of the principle of simplicity in presenting data in a lecture, talk, or seminar. The chart below contains three values: the percentage of water in the body, the brain, and the blood. Put yourself in the shoes of the audience. If you saw this chart, would you find it interesting? Mind numbing?

[Figure: bar chart titled "PercentWater"; vertical axis from 0 to 90; bars for body, brain, and blood]

Now what if I presented these same three pieces of data on three different PowerPoint slides?


We could present the boring bar chart. It's simple and easy to understand, but not visually stimulating. It is more data dense than the three slides, yet I think you will agree the three slides would have a bigger impact in a presentation. They engage the audience visually in a way the bar chart does not. The slides came from the award-winning presentation entitled Thirst47. Another must-see slide presentation, entitled Death by PowerPoint48, is available on Slideshare. Garr Reynolds also provides a good section of his book Presentation Zen through his blog, where he details the four principles of design: Contrast, Repetition, Alignment, and Proximity49.

Contrast

47 Thirst won the 2008 award for the World's Best Presentation from Slideshare.com: http://www.slideshare.net/jbrenman/thirst
48 Slideshare has several good presentations on how to present: Death by PowerPoint http://www.slideshare.net/thecroaker/death-by-powerpoint and Presenting With Text http://www.slideshare.net/girba/presenting-with-text
49 Part of Chapter 6 can be downloaded here: http://www.presentationzen.com/chapter6_pages.pdf


Repetition

Alignment and Proximity


When thinking about PowerPoint design, think about other technology. What do we love about Apple? Simple design. What do we love about Facebook? The design and interface are much cleaner than most MySpace pages, though sadly that is changing50. Google redefined simple and clean, and I am convinced that it helped fuel their early success. Did I mention I think simplicity is important? Avoid all of the visual crap that Microsoft seems to think is important.

Good presentations are about more than just good slide design. They are also about being a good speaker and telling a good story. How do you learn this? Watch a few great presentations. Pay attention to how the speakers interact with the audience and how they've organized their thoughts. A great presentation by Hans Rosling can be found at the link below51. In fact, most of the TED talks are useful examples of good, succinct presentations52, 53. Some general principles of slide design by Garr Reynolds at Presentation Zen can be found at the link54. He makes the important point that slides should have a high signal-to-noise ratio55. Nancy Duarte of Duarte Design, responsible for designing some of the best TED talks and Al Gore's An Inconvenient Truth, provides a wonderful webinar on using PowerPoint56. Nancy also has an excellent book entitled Slide:ology57. Some insights on the presentations of Steve Jobs are also available58. And please, no bullet points59.

50 See this article: http://www.readwriteweb.com/archives/is_facebook_becoming_myspace.php
51 http://www.youtube.com/watch?v=hVimVzgtD6w
52 http://www.ted.com/
53 Additional notes on good presentation organization can be found here: http://www.extremepresentation.com/
54 http://www.presentationzen.com/presentationzen/2008/08/learning-from-the-design-around-youikea.html
55 http://www.presentationzen.com/presentationzen/2007/03/a_few_weeks_ago.html
56 http://www.vizthink.com/blog/2008/06/18/webinar-creating-powerful-presentations-with-nancy-duarte/
57 http://www.amazon.com/slide-ology-Science-CreatingPresentations/dp/0596522347/ref=pd_bbs_sr_1?ie=UTF8&s=books&qid=1238982954&sr=8-1
58 http://images.businessweek.com/ss/09/09/0929_jobs_presentations/1.htm
59 http://aralbalkan.com/1286


Finally, lest you think there is no fun in data visualization, here are some funny graphs60.

Some Dos and Don'ts

I hate to give you a list of things to do and things not to do because, as with any rules, there are times when they should be broken. However, with some rules in hand, you can make sure you break them only when you have good reason to.

Don't
Use 3-D graphics in Excel
Use Microsoft clip art
Use a PowerPoint design template
Read your presentation
Use bullet points

Do
Use pictures
Use repetition in your design
Practice/rehearse your presentation
Keep each slide to one idea

References and Endnotes

Some useful links to data visualization blogs and leading thinkers in the infoviz world:
http://junkcharts.typepad.com/
http://www.visualcomplexity.com/vc/
http://www.edwardtufte.com/tufte/
http://www.perceptualedge.com/
http://infoclarity.blogspot.com/
http://eagereyes.org/
http://charts.jorgecamoes.com/
http://visualizeit.wordpress.com/
http://www.visualizingeconomics.com
http://www.juiceanalytics.com/writing/

Presentation Related Blogs
http://blog.duarte.com/
http://www.presentationzen.com/presentationzen/

60 http://graphjam.com/


Duarte, N. (2008). Slide:ology: The Art and Science of Creating Great Presentations. O'Reilly.
Few, S. (2004). Show Me the Numbers: Designing Tables and Graphs to Enlighten (1st ed.). Oakland, CA: Analytics Press.
Few, S. (2006). Information Dashboard Design: The Effective Visual Communication of Data (1st ed.). Beijing; Cambridge [MA]: O'Reilly.
Reynolds, G. (2008). Presentation Zen: Simple Ideas on Presentation Design and Delivery. Berkeley, CA: New Riders.
Tufte, E. R. (2001). The Visual Display of Quantitative Information (2nd ed.). Cheshire, Conn.: Graphics Press.
Tufte, E. R. (2003). The Cognitive Style of PowerPoint. Cheshire, Conn.: Graphics Press.
Tufte, E. R. (2003). Envisioning Information (9th printing, Aug. 2003 ed.). Cheshire, Conn.: Graphics Press.
Tufte, E. R. (2006). Beautiful Evidence. Cheshire, Conn.: Graphics Press.
Tufte, E. R. (2007). Visual Explanations: Images and Quantities, Evidence and Narrative (8th printing, with revisions, June 2007 ed.). Cheshire, Conn.: Graphics Press.

Appendix: Tips for Excel 2007

How to change the axis of a chart to the logarithmic scale

From http://office.microsoft.com/en-us/excel/HP030656791033.aspx

Make changes to the scales of value axes

1. On a chart sheet or in an embedded chart, click the value (y) axis that you want to change.
2. On the Format menu, click Selected Axis.
3. On the Scale tab, do one of the following:
   - To change the number at which the value (y) axis starts and ends, type a different number in the Minimum box or the Maximum box.
   - To change the interval of tick marks and gridlines, type a different number in the Major unit box or Minor unit box.
   - To change the units displayed on the value (y) axis, click the units that you want or type a numeric value in the Display units list. To show a label that describes the units expressed, select the Show display units label on chart check box.
     Tip: If your chart values consist of large numbers, you can make the axis text shorter and more readable by changing the display unit of the axis. For example, if the chart values range from 1,000,000 to 50,000,000, you can display the numbers as 1 to 50 on the axis and show a label that indicates that the units express millions.
   - To change the value (y) axis to logarithmic, select the Logarithmic scale check box.
   - To reverse values so that you can flip bars or columns or other data markers, select the Values in reverse order check box.
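To get a feel for what the Display units and Logarithmic scale options do to the numbers on an axis, here is a minimal Python sketch. It is only an illustration under my own assumptions, not how Excel implements these options; the function names and sample values are hypothetical and do not come from the Microsoft article.

```python
import math

def to_display_units(values, unit=1_000_000):
    """Rescale raw values so the axis can show short numbers
    (for example 1 to 50) next to a label such as 'Millions'.
    Hypothetical helper, not an Excel function."""
    return [v / unit for v in values]

def log_axis_positions(values):
    """Return where each value would sit on a base-10 logarithmic axis.
    Every tenfold increase moves the point one unit along the axis.
    Hypothetical helper, not an Excel function."""
    if any(v <= 0 for v in values):
        raise ValueError("a logarithmic axis cannot show zero or negative values")
    return [math.log10(v) for v in values]

# Illustrative sample values chosen to match the range in the tip above.
raw = [1_000_000, 10_000_000, 50_000_000]

print(to_display_units(raw))    # [1.0, 10.0, 50.0]
print(log_axis_positions(raw))  # positions on the log axis
```

The point of the sketch is the contrast: display units only relabel the axis, while a logarithmic scale changes the spacing, so values that differ by orders of magnitude can share one chart.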

How to use the Histogram add-in in Excel

http://support.microsoft.com/kb/214269
SUMMARY
This step-by-step article describes how to create a histogram with a chart from a sample set of data. The Analysis ToolPak that is included with Microsoft Excel includes a Histogram tool.


Verify Installation of the Analysis ToolPak


Before you use the Histogram tool, you need to make sure the Analysis ToolPak Add-in is installed. To verify whether the Analysis ToolPak is installed, follow these steps:

1. In Microsoft Office Excel 2003 and in earlier versions of Excel, click Add-Ins on the Tools menu. In Microsoft Office Excel 2007, follow these steps:
   a. Click the Microsoft Office Button, and then click Excel Options.
   b. Click the Add-Ins category.
   c. In the Manage list, select Excel Add-ins, and then click Go.
2. In the Add-Ins dialog box, make sure that the Analysis ToolPak check box under Add-Ins available is selected. Click OK.
NOTE: In order for the Analysis ToolPak to be shown in the Add-Ins dialog box, it must be installed on your computer. If you do not see Analysis ToolPak in the Add-Ins dialog box, run Microsoft Excel Setup and add this component to the list of installed items.


Create a Histogram

1. Type the following in a new worksheet:

A1: 87     B1: 20
A2: 27     B2: 40
A3: 45     B3: 60
A4: 62     B4: 80
A5: 3      B5:
A6: 52     B6:
A7: 20     B7:
A8: 43     B8:
A9: 74     B9:
A10: 61    B10:

2. In Excel 2003 and in earlier versions of Excel, click Data Analysis on the Tools menu. In Excel 2007, click Data Analysis in the Analysis group on the Data tab.
3. In the Data Analysis dialog box, click Histogram, and then click OK.
4. In the Input Range box, type A1:A10.
5. In the Bin Range box, type B1:B4.
6. Under Output Options, click New Workbook, select the Chart Output check box, and then click OK.
A new workbook with a Histogram table and an embedded chart is generated.

Based on the sample data from step 1, the Histogram table will look like the following table:

A1: Bin     B1: Frequency
A2: 20      B2: 2
A3: 40      B3: 1
A4: 60      B4: 3
A5: 80      B5: 3
A6: More    B6: 1

Your chart will be a column chart that reflects the data in this Histogram table. Excel counts the number of data points in each data bin. A data point is included in a particular bin if it is greater than the bin's lower bound and less than or equal to its upper bound. In the example here, the bin that corresponds to data values from 0 to 20 contains two data points, 3 and 20. If you omit the bin range, Excel creates a set of evenly distributed bins between the data's minimum and maximum values.

NOTE: You will not be able to create the Histogram chart if you specify the options (Output range or New worksheet ply) that create the Histogram table in the same workbook as your data.
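If you want to check the Histogram table by hand, the binning rule described above can be sketched in a few lines of Python. This is my own minimal illustration of the rule (greater than the lower bound, less than or equal to the upper bound, with everything above the last bound counted as "More"), not Excel's code; the function name histogram_counts is hypothetical.

```python
from bisect import bisect_left

def histogram_counts(data, bin_edges):
    """Count values per bin, where each bin edge is an inclusive upper bound.
    The final extra slot collects values above the last edge ('More').
    Hypothetical helper illustrating the rule, not an Excel function."""
    edges = sorted(bin_edges)
    counts = [0] * (len(edges) + 1)
    for x in data:
        # bisect_left returns the index of the first edge >= x,
        # so a value equal to an edge lands in that edge's bin.
        counts[bisect_left(edges, x)] += 1
    return counts

# Sample data and bin range from step 1 (cells A1:A10 and B1:B4).
data = [87, 27, 45, 62, 3, 52, 20, 43, 74, 61]
bins = [20, 40, 60, 80]

counts = histogram_counts(data, bins)
for label, n in zip([str(b) for b in bins] + ["More"], counts):
    print(f"{label:>4}  {n}")
```

Run on the sample data, this reproduces the Frequency column of the table above: 2, 1, 3, 3, 1.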
