As an example in an industrial context, suppose
there is concern about the instruments used for
data collection on the factory floor and that the
gauges used on the production line are checked
against a more sophisticated gauge which is stored
somewhere to preserve it from damage. Two
checks which would sensibly be made on a
gauge are for repeatability and reproducibility.
Repeatability refers to a gauge giving consistent
readings for a single operator and reproducibility
refers to it giving consistent values for different
operators. In the real world, the effectiveness of
the statistician will often depend on his, or her,
relationships with those who require help, or from
whom help is required.
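
The distinction between repeatability and reproducibility drawn above can be made concrete with a small sketch. It assumes several operators each measure the same part several times with the same gauge; the function name `gauge_summary` and the data layout are hypothetical, and the two summary numbers are a deliberate simplification rather than a formal gauge study.

```python
from math import sqrt
from statistics import mean, stdev, variance


def gauge_summary(readings):
    """Summarise repeated readings of ONE part, grouped by operator.

    `readings` maps an operator label to a list of that operator's
    repeated readings (hypothetical layout; at least two operators and
    at least two readings per operator are assumed).
    Returns (repeatability_sd, reproducibility_sd).
    """
    groups = list(readings.values())

    # Repeatability: spread of readings WITHIN each operator,
    # pooled over operators (weighted by degrees of freedom).
    dfs = [len(g) - 1 for g in groups]
    pooled_var = sum(variance(g) * d for g, d in zip(groups, dfs)) / sum(dfs)
    repeatability_sd = sqrt(pooled_var)

    # Reproducibility: spread BETWEEN the operators' mean readings.
    reproducibility_sd = stdev([mean(g) for g in groups])

    return repeatability_sd, reproducibility_sd
```

A gauge can do well on one check and badly on the other: operators who each read consistently but disagree with one another give a small first number and a large second one.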
Finally, it just remains to make a quick reference
to the need to use as much real data in teaching as
possible, so that students learn about the relevance
of statistics to the real world, and to point out the
rightful use of statistics, so that it is not rubbished
by the rogues who misuse the techniques and give
statistics a bad name!

Amazing Graphs
KEYWORDS:
Teaching;
Displaying data;
Visualizing information;
Damped oscillation;
Statistical literacy.

Herman Callaert
Limburgs Universitair Centrum, Diepenbeek,
Belgium.
e-mail: herman.callaert@luc.ac.be
Summary
A cautionary tale about bad habits in graphical
displays.

^ INTRODUCTION ^

LEARNING, at least at the descriptive level,
how to understand the message from the
picture should be easy. Indeed, we all have an
enormous amount of practice from being exposed
in daily life to a multitude of eye-catching displays.
This article is about one particular graph which
takes some bad habits to an absurd extreme. It
might serve as an example of how to catch
students' attention and make them look critically
at any data display they may encounter in future
work.
From elementary to elaborate, graphs show up
everywhere, and several are far from optimal. It
is amazing how many recent graphical displays
comply with the rules given in H. Wainer's paper
(Wainer 1984) (note: it might be wise to first read
the title of the paper before proudly announcing
that at least your graph meets Wainer's rules).
Teaching students to look at graphs with a
critical eye will help them not only in the
interpretation but also in the construction of data
displays.

^ AN INTRIGUING GRAPH ^
Labour costs can be analysed from different
perspectives. One of several measures of interest
is `gross earnings'. Data for this variable are
available for manual workers in industry, and
they are summarized into one number per country
in the European Union. The study refers to 1993,
which explains why the earnings are expressed
in ECU. There are 16 data points because, for
Germany, the `Old Länder' and `New Länder'
are treated as separate entities for the purpose
of the study. The associated graph is shown in
figure 1.

Teaching Statistics. Volume 22, Number 1, Spring 2000.

Fig 1. Gross hourly earnings of manual workers in industry
in 1993 (ECU) for the 15 states of the European Union
(Germany has been given two entries: D1 for the `Old
Länder' and D2 for the `New Länder'). The abbreviations
for the member states of the European Union are:

B    Belgium     F    France            A    Austria
DK   Denmark     IRL  Ireland           P    Portugal
D    Germany     I    Italy             FIN  Finland
GR   Greece      L    Luxembourg        S    Sweden
E    Spain       NL   The Netherlands   UK   United Kingdom.

^ A SHORT ACTIVITY ^
Using an overhead projector, I showed this graph
to a group of students. I mentioned the full
country name for all the abbreviations on the
horizontal axis. I also drew their attention to the
fact that the countries appear in alphabetical
order of their name in their native language. This
explains why Finland (which is Suomi in Finnish)
has its place between Portugal (Republica
Portuguesa) and Sweden (Konungariket Sverige).
I then asked them to focus on Finland and on
countries similar to Finland as far as gross
earnings is concerned. Shortly after that, I took
the graph away, and then asked the students to
write down the four `nearest neighbours' of
Finland. I also urged them to mention any other
particular aspect they had seen in the graph
(such as clusters of countries with exceptionally
low or exceptionally high earnings). The result of
this activity was rather disappointing, and the
graph didn't seem helpful for discovering
information about earnings in the European Union.
When I later repeated the same experiment with
another group of students and with the graph from
figure 3, the students' answers were substantially
more accurate.

^ TAKING IT TO THE LIMIT ^


Figure 1 is an example of a (not uncommon) type
of graph, suitable for major improvement. The
data set consists of 16 points, each carrying information on gross earnings (a continuous variable)
and on country (a nominal variable). Connected
line segments assume continuity in the variable
plotted on the horizontal axis. This is certainly
not the case for the countries. Figure 1 shows, for
example, a sharp increase on the line segment
connecting Portugal with Finland. But there is no
reason why naming (and ordering) countries by
their English name would make the representation
less (or more) informative. English names would
put Finland between Denmark and France, on a
`downhill' environment, rather than on an `uphill'
one. If, after all, there is no natural ordering in
country names (they are just labels, identifying
the country), any ordering is as sensible as any
other one. Why then not construct a graph we
all know well and whose shape is easily
remembered?
The original figure 1 is only one way of displaying
the information in the 16 data points. Without
changing the type of display, one can choose
among the 16! = 20,922,789,888,000 equivalent
alternatives, so why not go for a damped oscillation as
shown in figure 2? Apart from the sharp edges,
$8 + 6e^{-0.17x}\sin(2.58x + 1.6)$ would make a good
fit.
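
The count of equivalent orderings, and one way of arriving at a damped-oscillation ordering, can be sketched as follows. This is a minimal illustration and not the procedure used to produce figure 2: the alternating lowest/highest heuristic and the function names are assumptions, and the labels and values in the usage note are placeholders, not the Eurostat data.

```python
import math


def count_orderings(n):
    """Number of ways to arrange n distinct labels along the axis (n!)."""
    return math.factorial(n)


def damped_oscillation_order(points):
    """Reorder (label, value) pairs so that a connected line plot swings
    low, high, low, high with shrinking amplitude.

    Heuristic (an assumption, not taken from the article): sort by value,
    then alternately take the smallest and the largest remaining point.
    """
    ordered = sorted(points, key=lambda pair: pair[1])
    low, high = 0, len(ordered) - 1
    result = []
    take_low = True
    while low <= high:
        if take_low:
            result.append(ordered[low])
            low += 1
        else:
            result.append(ordered[high])
            high -= 1
        take_low = not take_low
    return result


# Placeholder labels and values, purely for illustration.
example = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5), ("f", 6)]
print(count_orderings(16))
print([label for label, _ in damped_oscillation_order(example)])
```

Because the country labels carry no natural order, every one of these arrangements contains exactly the same information as figure 1; only the suggested shape changes.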

Fig 2. The same information displayed with the same type
of graph, after a particular reordering of the countries.

In my experience, once students have discovered
the equivalence of figures 1 and 2, and the
nonsense of both, many start hunting for more `damped
oscillation situations' in newspapers, magazines
and textbooks. They also don't want to be trapped
by the `damped oscillation' when constructing
graphical displays of their own.

^ ONE MORE PICTURE ^


If it is felt that the information in the 16 data
points needs a graphical representation,
something can be learned from Cleveland's excellent
book (Cleveland 1994). He explains how connected
line segments make us visually decode
information about relative local rate of change.
His pictures on pages 246-7 show that, for our
data, a dot plot might be the graphical
representation of choice. They also show that the
`Austria first' alphabetical principle (one of
Wainer's rules on how to display data badly) is
able to degrade substantially our visual decoding
of the distribution of earnings.
In figure 3 the quantitative information is ordered
from smallest to largest. Clusters of countries with
low (or high) earnings are easily located. Finding
Finland's `close neighbours' is easy now.
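
Ordering by value, and the `nearest neighbours' question from the classroom activity, can be sketched in a few lines. The function names are hypothetical, and the labels and values in the usage note are placeholders rather than the 1993 earnings figures.

```python
def order_by_value(data):
    """Return (label, value) pairs sorted from smallest to largest value."""
    return sorted(data.items(), key=lambda pair: pair[1])


def nearest_neighbours(data, target, k=4):
    """Return the k labels whose values lie closest to that of `target`."""
    reference = data[target]
    others = [(label, value) for label, value in data.items() if label != target]
    others.sort(key=lambda pair: abs(pair[1] - reference))
    return [label for label, _ in others[:k]]


# Placeholder labels and values, purely for illustration.
example = {"P": 1.0, "Q": 2.0, "R": 2.5, "S": 3.5, "T": 10.0}
print(order_by_value(example))
print(nearest_neighbours(example, "R", k=2))
```

Once the points are sorted, the neighbours of any country sit next to it in the display, which is why the question becomes easy to answer from figure 3.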

Fig 3. The same information in order of increasing
earnings.

^ SOURCE ^
The graph of figure 1 can be found on page 197
of the fourth edition of Europe in Figures (1995),
published by Eurostat. The book is advertised on
the Web as `A publication containing the essential
socio-economic information needed for a good
understanding of the European Union'. It is widely
circulated, not expensive (15 Euro) and available
in nine different languages. Eurostat is the
Statistical Office of the European Union.
References
Cleveland, W.S. (1994). The Elements of
Graphing Data (revised edn). Summit,
NJ: Hobart Press.
Europe in Figures (4th edn) (1995). Luxembourg:
Office for Official Publications of the
European Union.
Wainer, H. (1984). How to display data badly.
The American Statistician, 38, 137-47.

Data Have No Meaning when Separated from their Context
KEYWORDS:
Teaching;
Variation;
Evidence;
Fraud.

Jostein Lillestøl
The Norwegian School of Economics and
Business Administration, Bergen, Norway.
e-mail: jostein.lillestol@nhh.no
Summary
It is argued that many statisticians teach their
subject without much concern about the context
in which their methods will be used. This issue is
illustrated by the analysis of data on cashier fraud
at a supermarket. Here different contexts lead to
different questions and different modes of analysis.

^ INTRODUCTION ^

THE title of this article is sometimes referred
to as `The First Principle of Understanding
Data', attributed to Shewhart (see Wheeler 1993).
Many teachers of statistics largely neglect this
principle, and perhaps more so teachers who bring
their students into the realm of formal inference
theory. Teachers give their students exercises of
moderate practical relevance that fit the theory of
estimation and hypothesis testing, or they invite