
This article was downloaded by: [Carr, Nathan T.]
On: 1 February 2008
Access Details: [subscription number 790350697]
Publisher: Routledge
Informa Ltd Registered in England and Wales Registered Number: 1072954
Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK

Language Assessment Quarterly


Publication details, including instructions for authors and subscription information:
http://www.informaworld.com/smpp/title~content=t775653669

Using Microsoft Excel to Calculate Descriptive Statistics and Create Graphs
Nathan T. Carr
California State University, Fullerton, USA

Online Publication Date: 01 January 2008


To cite this Article: Carr, Nathan T. (2008) 'Using Microsoft Excel to Calculate
Descriptive Statistics and Create Graphs', Language Assessment Quarterly, 5:1, 43-62
To link to this article: DOI: 10.1080/15434300701776336
URL: http://dx.doi.org/10.1080/15434300701776336



Full terms and conditions of use: http://www.informaworld.com/terms-and-conditions-of-access.pdf
This article may be used for research, teaching and private study purposes. Any substantial or systematic reproduction,
re-distribution, re-selling, loan or sub-licensing, systematic supply or distribution in any form to anyone is expressly
forbidden.
The publisher does not give any warranty express or implied or make any representation that the contents will be
complete or accurate or up to date. The accuracy of any instructions, formulae and drug doses should be
independently verified with primary sources. The publisher shall not be liable for any loss, actions, claims, proceedings,
demand or costs or damages whatsoever or howsoever caused arising directly or indirectly in connection with or
arising out of the use of this material.


LANGUAGE ASSESSMENT QUARTERLY, 5(1), 43–62, 2008


Copyright Taylor & Francis Group, LLC
ISSN: 1543-4303 print / 1543-4311 online
DOI: 10.1080/15434300701776336

Using Microsoft Excel to Calculate Descriptive Statistics and Create Graphs


Descriptive Statistics, Graphs, and Excel


Carr

Nathan T. Carr
California State University, Fullerton

Descriptive statistics and appropriate visual representations of scores are important for all test developers, whether they are experienced testers working on
large-scale projects, or novices working on small-scale local tests. Many teachers
put in charge of testing projects do not know why they are important, however,
and are utterly convinced that they lack the mathematical ability to address the
matter anyway. This article begins by explaining why descriptives need to be calculated in the first place, and then discusses ways in which to display data visually, and how to do this using Microsoft Excel spreadsheet software. The article
then addresses three types of descriptive statistics: measures of central tendency,
measures of dispersion, and indicators of the shape of the distribution. It discusses some basic points about interpreting them and then provides simple
instructions for calculating them in Excel. The article assumes that readers are not
particularly familiar with Excel and does not assume a high level of mathematical
sophistication.

WHY DO WE NEED TO BOTHER WITH DESCRIPTIVES, ANYWAY?


It might be best to begin by discussing why we need to worry about descriptive
statistics and related graphical representations of scores in the first place. This
may be a relevant question for novice test developers, especially those involved
in constructing small-scale tests or tests to be used only in their own institutions.
Many language teachers who are developing tests for their institutions or even
just their own classes may find themselves wondering whether they really need
to. They may say that researchers should report these things when writing up
their results, and they may expect to see them when reading testing reports produced by high-priced Professional Testing Experts, but aside from calculating the
average score on a test, they may question the need for further statistical or
graphical description of scores. Such additional work is, no doubt, the province
of large testing organizations (see, e.g., Educational Testing Service, 2005). But
for locally developed tests, why bother, especially if it has never been done
before? On the other hand, others may recognize that they need to but find themselves unable to articulate any reasons why.

Correspondence should be addressed to Nathan T. Carr, Department of Modern Languages
and Literatures, California State University, 800 N. State College Boulevard, H-835, Fullerton,
CA 92831, USA. E-mail: ncarr@fullerton.edu

The easy answer is, "Well, we just do, because it is good testing practice, and
therefore one of our responsibilities as test developers." In other words, it is part
of what is expected when you are conducting serious assessment, or want your
work to be taken seriously. But why is that, and why should it matter, really? No
doubt some will reply that perhaps they do not want to be so serious, thank you
very much. The official answer is that it is, in fact, necessary for discharging our
ethical duties as language testers: Principle 1 Annotation 5 and Principle 8 Annotation 1 of the International Language Testing Association Code of Ethics (2000)
call, in part,¹ for communicating information in as meaningful a way as possible and for doing so accurately.
There are several practical reasons for calculating descriptive statistics as
well, however. The first is that descriptive statistics let us know whether it is
appropriate to perform certain statistical tests (e.g., whether it is appropriate to
perform a t test to determine whether two groups performed differently to a statistically significant degree). Discussion of these statistical tests is beyond the
scope of this article, but they cannot be considered appropriately without first
paying attention to the concerns discussed here. Another practical reason for calculating descriptives is that they also let us know which correlation coefficient
we can use appropriately on our test scores. This is particularly important when
it comes to deciding between the Pearson product-moment correlation coefficient (Pearson r) and Spearman rho (ρ).
Descriptive statistics are also used as a part of other statistical analyses that are
important to ensuring test quality, such as estimating test reliability (Bachman,
2004). For example, the formula for Cronbach's alpha (an estimate of the score
consistency of a test) requires calculating the variance for total test scores as well
as for the scores on each individual item. Similarly, if we are interested in improving
reliability by revising problematic items, the most common way of estimating item
difficulty (item facility) involves calculating the mean score for each item, as does
a family of well-known approaches to estimating item discrimination (upper-lower
method item discrimination, the difference index, and the B-index).

¹ The full text of Principle 1, Annotation 5 is "Language testers shall endeavour to communicate
the information they produce to all relevant stakeholders in as meaningful a way as possible." Principle 8, Annotation 1 reads "When test results are obtained on behalf of institutions (government
departments, professional bodies, universities, schools, companies) language testers have an obligation to report those results accurately, however unwelcome they may be to the test takers and other
stakeholders (families, prospective employers etc)."

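To make the role of means and variances in these analyses concrete, the following is a minimal Python sketch of item facility and Cronbach's alpha. It is an illustration only, not a procedure from this article: the item scores are invented, the function names are hypothetical, and population variances are used throughout (sample variances would also work, as long as the choice is applied consistently).

```python
from statistics import mean, pvariance

# Hypothetical item scores: one row per examinee, one column per item (1 = correct).
responses = [
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
]

def item_facility(rows):
    """Mean score on each item, i.e., the proportion answering it correctly."""
    return [mean(column) for column in zip(*rows)]

def cronbach_alpha(rows):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(rows[0])
    item_variances = [pvariance(column) for column in zip(*rows)]
    total_variance = pvariance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

facility = item_facility(responses)
alpha = cronbach_alpha(responses)
print(facility)
print(round(alpha, 3))
```

Both results rest on nothing more than the descriptive statistics discussed in this article: each facility value is an item mean, and alpha combines the item variances with the variance of the total scores.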
Even more important, descriptive statistics give us basic information about
how people did on our tests. Presumably, we are interested in how people performed overall. But how the students did on the test involves more than just
reporting the average score, as is explained next. Descriptives and related graphical representations of our results help us determine whether we have the distribution of scores that we expected, or even needed. In other words, is the test
functioning as we expected it to? Descriptive statistics offer a precise description
of this (Bachman, 2004), whereas graphs of the score distribution provide a more
intuitive and holistic description of score data.
For example, if we are administering a pretest in a language class (a criterion-referenced pretest, such as a baseline measure intended to verify that students do
not already know the material about to be taught), we normally expect that the
majority of the examinees will score quite poorly, because they have not been
taught the material yet. Knowing whether this expectation actually holds true is
important; if it does not, the students may be in the wrong class, the course content may need to be revised so as to better meet their learning needs, or perhaps
the test was inappropriately constructed. In contrast, on a final exam (a criterion-referenced post-test), we expect most of the students to have mastered the material. If the descriptive statistics and graphs of the score distribution do not show
this, however, we will know that we need to revise the course content or teaching, or revise the test. Similarly, if we have reason to expect a normal distribution
in our test scores (i.e., the classical bell curve), we can use descriptive statistics
and graphs to see how well our results match our expectations. Situations in
which this expectation might be reasonable include proficiency testing, as well as
any other test where we expect most people to be average, and equal numbers to
be above and below average.
Finally, descriptive statistics and graphical representations of data can be useful when making comparisons between sets of test scores. Although there are statistical tests for doing this more precisely, we can still get some indication of the
degree of similarity or difference between groups by comparing descriptives and
graphs. Furthermore, even if we establish that a difference is statistically significant (large enough that it probably did not happen by chance), that does not
mean that the difference is large enough to really matter. Examining descriptive
statistics such as the means and standard deviations, and comparing graphs of
score distributions, can help us judge whether significant differences are in fact
meaningfully large. One example of when we might want to do this is when we
wish to compare how a group of students performed on two different forms
(i.e., versions) of a test. Another example might be when we wish to compare the
performance of different groups taking the same test.


PUTTING DATA INTO EXCEL


Before we can construct charts of our data, or calculate any descriptive statistics,
we first need to get our score data into an electronic format. The best way to do
this is with a spreadsheet program; most people will probably use Microsoft
Excel spreadsheet software, although as Brown (2005) pointed out, the procedures are highly similar regardless of the particular program being used. The only
differences of real importance here are that programs other than Excel (such as
version 15 of SPSS; see Bachman & Kunnan, 2005, for comparison) may use
slightly different names for the functions discussed here and that the procedures
for creating histograms and other graphs will also differ. As Brown also noted,
the procedures described here are virtually identical for all versions of Excel. For
those who are easily intimidated by math in general and statistics in particular, I
should make something clear right now: Excel is not a program that people have
to be good at math to use. Rather, it is a program that people use when they do
not want to do math themselves. None of the procedures outlined in this article
will require you to do anything beyond adding, subtracting, multiplying, or dividing, and very little of that, too.
Unless the data are being imported from a text file, as would be the case with
tests that use optically scanned score sheets, results will probably have to be
entered by hand. The most accurate way to do this is to have one person reading
the names and data while a second person types everything. Not only is this
method usually faster, it also allows the person entering the data to watch the
screen at all times, increasing the likelihood that data entry errors will be caught
immediately. The nearly universal practice for arranging the file is to have each
examinee's data in its own separate row in the spreadsheet, with each score,
whether for individual items, sections of the test, or just total test score, in its
own column. Even when there are multiple sources of data for each test taker, as
when a speaking test has been scored by two raters, each test taker should have
one row in the data set. In other words, a given examinee's data should not be put
into more than one row. Aside from custom, there are practical reasons for
arranging the data this way, not least of which is that none of the procedures
described here will work otherwise.
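The layout just described can be sketched outside Excel as well. The short Python example below uses invented names and scores for a speaking test scored by two raters; it only illustrates the one-row-per-examinee arrangement and is not part of the Excel procedure.

```python
import csv
import io

# Hypothetical speaking-test data: each examinee occupies exactly one row,
# and each rater's score has its own column.
raw = """Name,Rater1,Rater2
Examinee A,4,5
Examinee B,3,3
Examinee C,2,3
"""

rows = list(csv.DictReader(io.StringIO(raw)))
names = [row["Name"] for row in rows]

# One row per examinee means that no name should appear twice.
assert len(names) == len(set(names)), "an examinee appears in more than one row"

# Because both ratings sit in the same row, combining them is straightforward.
averages = {row["Name"]: (int(row["Rater1"]) + int(row["Rater2"])) / 2 for row in rows}
print(averages)
```

Had the two ratings been entered in separate rows, the averaging step would first require matching rows to examinees, which is the kind of extra work the single-row layout avoids.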

DISPLAYING DATA VISUALLY


Overview of Visual Representations of Data
Visual representations of data can be extremely useful. Interpreting them is much
more intuitive than interpreting statistics alone, although using the two formats
together provides the greatest clarity. Essentially, visual representations simply
tell us how many test takers got each score. The first step in doing this is to compile a frequency distribution, a table that shows each score that someone received
on the test and how many test takers received it. In the past, this might have been done by hand, with paper and pencil;
for example, for a test worth 0 to 100 points, a tally mark would be made next to
each possible score, and after all the tests had been gone through, the marks
would be totaled and reported in a table going from highest score to lowest
(Brown, 2005; Guilford & Fruchter, 1978). As Bachman (2004) pointed out,
however, unless we have a small sample, it is more informative to group scores
in the frequency distribution. Fortunately, the entire process can now be done in
moments using Excel, as will be explained next. An example of a frequency distribution can be seen in Table 1, which reports frequencies for a small simulated
data set of 35 cases, similar in size to what a classroom teacher might expect to
deal with.

TABLE 1
Example of a Frequency Distribution

Bin    Frequency
0      0
10     0
20     1
30     4
40     7
50     10
60     10
70     1
80     2
90     0
100    0

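For readers who would like to see the tallying logic spelled out, here is a minimal Python sketch that groups scores into the bins of Table 1. The 35 scores are invented so that they reproduce the frequencies in the table, and the sketch assumes that each bin value is an inclusive upper bound (a score of 20 falls in the 20 bin, a score of 21 in the 30 bin); that is my understanding of how Excel's Histogram tool counts, but it should be treated as an assumption.

```python
from bisect import bisect_left

bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Hypothetical scores for 35 examinees, chosen to match Table 1.
scores = (
    [18]
    + [22, 25, 28, 30]
    + [31, 33, 35, 36, 38, 39, 40]
    + [41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
    + [51, 52, 53, 54, 55, 56, 57, 58, 59, 60]
    + [65]
    + [72, 78]
)

def frequency_distribution(scores, bins):
    """Count scores per bin; a score goes in the first bin whose value it does not exceed."""
    counts = [0] * (len(bins) + 1)  # the final slot collects scores above the last bin
    for score in scores:
        counts[bisect_left(bins, score)] += 1
    return counts

counts = frequency_distribution(scores, bins)
for upper_bound, count in zip(bins, counts):
    print(upper_bound, count)
```

The extra slot at the end of `counts` plays the role of a catch-all category for any score higher than the largest bin value.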
Guilford and Fruchter (1978) pointed out the importance, when grouping
scores in a frequency distribution, of using appropriately sized intervals, which
Excel refers to as bins.² They recommended using 10 to 20 intervals, with 10 to 15
being more common. They also recommended using certain sizes for intervals
(generally 2, 3, 5, 10, or 20 of whatever units are being used) and beginning each
interval with a number evenly divisible by the size being used (e.g., if 5-unit
intervals are used, then start each one with a number divisible by 5). Once the bin
size has been determined and the frequency distribution created, it can be
graphed. One way to do this is with a frequency polygon, which is basically a
line graph of the frequency distribution. An example can be seen in Figure 1.
Another type of graph that is even more commonly used is a histogram, which is
a bar graph of the frequency distribution (see Figure 2 for an example). Both of
the examples in Figures 1 and 2 are based on the frequency distribution in Table 1.

² I generally use the term bin instead of the better-sounding score interval in this article to remain
consistent with the usage in Excel. If the term does not seem to make much sense, imagine that we are
sorting potatoes, or buttons, and putting them into a number of storage bins based on their sizes.

FIGURE 1 Example of a frequency polygon.

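The interval guidelines attributed above to Guilford and Fruchter (1978) can be turned into a small helper. The Python sketch below is one possible reading of those guidelines, not their procedure: the function name is hypothetical, it tries the sizes 2, 3, 5, 10, and 20 in order, and it returns the first size that yields between 10 and 20 intervals, each beginning at a multiple of that size.

```python
PREFERRED_SIZES = (2, 3, 5, 10, 20)

def suggest_intervals(lowest, highest, min_count=10, max_count=20):
    """Return (size, lower_bounds) for the first preferred size giving 10 to 20 intervals."""
    for size in PREFERRED_SIZES:
        start = (lowest // size) * size        # round down to a multiple of the size
        count = (highest - start) // size + 1  # intervals needed to reach the top score
        if min_count <= count <= max_count:
            return size, [start + i * size for i in range(count)]
    return None  # no preferred size fits; choose the intervals by hand

size, lower_bounds = suggest_intervals(18, 78)
print(size, lower_bounds)
```

Note that this helper lists the lower bound of each interval, whereas the bin column in Table 1 lists boundaries in the form Excel expects, so the values would need to be adapted before being typed into a spreadsheet. Trying the smallest sizes first favors finer intervals; reversing the order would favor coarser ones.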
Creating Histograms in Excel
The first step is making sure that the Analysis ToolPak has been installed
in Excel. This does not always require the installation disk for your copy of
Excel, although it seems that newer versions of Excel, or computers that have
recently installed Microsoft updates, may be more likely to require it. Because
this ToolPak is not familiar to most Excel users, I briefly explain how to install it.
Begin by clicking on the "Tools" menu. If you see "Data Analysis...", the ToolPak
is installed already. If it is not, click on "Add-Ins..." and select the "Analysis
ToolPak." You will then be asked whether you want to install the feature; click
on "Yes" and follow any additional directions. You will not need to reboot your
computer or restart Excel when the installation is finished.

FIGURE 2 Example of a histogram.

Once the ToolPak is installed, you are ready to construct a histogram in Excel.
You begin by setting up the bins. Although this is theoretically optional, the
results will be much more useful if you set an appropriate bin size (see Figure 3
for an example of what happens when the bins are not specified in Excel and the
program is left to determine them itself). All this requires is finding an empty column in the spreadsheet and entering the interval boundaries in ascending order;
see the first column of Table 1 for an example. You do not need to create the frequency distribution yourself, as Excel will do this for you automatically when it
creates the histogram. Once the bins have been created, go to "Tools" → "Data
Analysis...", select the "Histogram" option, and click on the "OK" button.
The first text box is the input range; click on the button to its right (the one
with the red arrow), and the dialogue box will almost entirely disappear, aside
from a floating text box. This happens so that you can navigate the spreadsheet
and select the raw data for which you wish to construct a histogram. Once you
have selected the data, click on the button on the right edge of the floating text
box (the one with the red arrow). Then repeat the process for the bins range.
Another important part of the process is choosing from among the three
options for output location. Normally, it is better to select "Output Range" or
"New Worksheet Ply"; the former will put the histogram in the current worksheet, whereas the latter will create a new worksheet tab within the current workbook (i.e., within the same Excel file). Choosing "New Workbook" will create a
new Excel workbook, which will probably be neither necessary nor useful for
most users. Finally, it is important to select the "Chart Output" checkbox, or
Excel will only produce a frequency distribution table, with no histogram. When
this is done, click "OK" and watch the histogram appear.
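As a rough stand-in for the chart that Excel draws, the frequency distribution in Table 1 can also be rendered as a text-only bar chart. The Python sketch below is purely illustrative (the function name is hypothetical, and the output is not what Excel produces); it simply shows that a histogram is the frequency table with each count drawn as a bar.

```python
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
frequencies = [0, 0, 1, 4, 7, 10, 10, 1, 2, 0, 0]

def text_histogram(bins, frequencies, symbol="#"):
    """Return one line per bin: the bin label followed by a bar of symbols."""
    return [f"{b:>3} | {symbol * f}" for b, f in zip(bins, frequencies)]

lines = text_histogram(bins, frequencies)
print("\n".join(lines))
```

Turned on its side, the printed bars have the same shape as the histogram in Figure 2.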

FIGURE 3 Results of letting Excel determine the bins automatically. (x-axis: Test Score, with automatically generated bin boundaries such as 3.454545 and 14.90909, ending in "More"; y-axis: Frequency.)

Note that once one histogram has been created in a session of using Excel, the
next one will contain the previous settings as a default. It is also worth remembering that the ever-popular undo button will not work with a histogram: once
created, the chart and the frequency distribution table must be deleted if there has
been a mistake.
Because the histogram is a chart, it can be reformatted like any other chart in
Excel. Resizing works the same as with any chart or picture in Microsoft Office
System applications. Many users will probably want to revise the labels; for
example, "Bins," the default label for the x-axis, should probably be replaced
with something more informative, such as "Test Score." Likewise, unless the
document will be printed and copied in color, it is better to change all graphs to
black, white, and gray. The color of the bars can be changed by right-clicking on
one, selecting "Format Data Series...", and changing the color settings on the
"Patterns" tab. I recommend a dark gray color for clarity. The color of the plot
area can be changed to white in a similar fashion after right-clicking an empty
place in the plot area and selecting "Format Plot Area...". The X or Y axis may
be formatted (including the direction of the text) by right-clicking on any of the
text labeling the axis and selecting "Format Axis...". Text such as the title and
legend of the graph can be deleted entirely, if desired, by simply clicking on the
box and hitting the Delete key. Text that is not deleted can be formatted by right-clicking it, selecting the format option ("Format Axis...", "Format Axis Title...",
etc.), and clicking on the "Font" tab. One area for particular attention in the
"Font" tab is the "Auto scale" checkbox, which controls whether the text size
stays the same at all times or automatically adjusts as the chart is resized. It is
important to note that formatting must be applied separately for each text box
within the chart.
To add a trendline,³ right-click on one of the bars in the graph, and select
"Add Trendline...". Finally, a frequency polygon can be created by changing the
chart type. This is done by right-clicking one of the bars in the graph, selecting
"Chart Type...", and in the "Standard Types" tab selecting "Line" as the chart
type and clicking on "Line with markers displayed at each data value" as the
chart subtype. Note that even if the color of the bars had been changed already,
converting to a frequency polygon or adding a trend line will produce a colored
line, which should then be reformatted to black and white. A slightly reformatted
example of a histogram can be seen in Figure 4, whereas Figure 5 shows the
same histogram with a trend line added. Figure 6 shows a frequency polygon for
the same variable.
To insert a chart into a Microsoft Word document, simply click on the empty
space inside the borders of the chart (not a section with text or graphics) and
copy it. Open the Word document, put the cursor where you want to insert the
chart, and paste it in. It is important to note that all chart formatting should be
done in Excel first. Reformatting is not always possible in Word, and some
attempts at formatting in Word can even disrupt other, unrelated parts of the chart.
Therefore, once the chart is pasted into Word, you should plan to do no additional formatting beyond changing the size of the chart. If you do need to make
changes, make them in Excel, and then paste in the new version of the chart.

³ Note that Excel does not allow users to superimpose a normal curve over a histogram.

FIGURE 4 Histogram showing a relatively normal distribution.
FIGURE 5 Histogram showing a relatively normal distribution with a trend line added.
FIGURE 6 Frequency polygon corresponding to the histograms in Figures 4 and 5.


Figures 7 through 9 show histograms for 34, 194, and 991 cases, respectively.
Note that as the sample size increases, the shape of the histogram grows
smoother; that is, the larger the sample, the more it will tend to approximate the
normal distribution. This illustrates the point that large samples tend to yield
smoother graphs, although they do not guarantee that you will obtain the
distributional shape (discussed next) that you were anticipating. It should be
further noted that these three histograms deliberately violate a basic principle of
graphically representing data: When graphs are being compared, they should all
be on the same scale; in these three figures, it appears at first glance that similar
numbers of test takers are involved, unless one pays close attention to the scale of
the (vertical) y-axis. When histograms or other types of charts are going to be
compared or considered together, therefore, they need to be placed on the same
scale. It can be challenging to find a scale that works for sample sizes that differ
as much as the ones we see here; however, as Figures 10 through 12 show, doing
so can illustrate important information that might otherwise be missed. Such
adjustments in scale are made by right-clicking the labels of the axis in question,
selecting "Format axis...", clicking on the "Scale" tab, and adjusting the values
for minimum, maximum, and major unit. Minor units can usually be ignored, as
can the tick mark options on the "Patterns" tab. If the number labels do not fit on
the axis, it may be necessary to rotate the labels by reopening the "Format axis..."
dialogue box, clicking on the "Alignment" tab, and making the text vertical or
angled.⁴

FIGURE 7 Example histogram with 34 cases and automatically scaled y-axis.
FIGURE 8 Example histogram with 194 cases and automatically scaled y-axis.
FIGURE 9 Example histogram with 991 cases and automatically scaled y-axis.
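Choosing a shared scale amounts to finding the tallest bar across all of the charts being compared and rounding up to a convenient axis maximum. The following Python sketch illustrates that arithmetic under stated assumptions: the three sets of frequencies are invented, the function name is hypothetical, and a major unit of 10 is simply one reasonable choice.

```python
import math

def common_axis(frequency_sets, major_unit=10):
    """Return (minimum, maximum, major_unit) covering the tallest bar in any chart."""
    tallest = max(max(frequencies) for frequencies in frequency_sets)
    maximum = math.ceil(tallest / major_unit) * major_unit
    return 0, maximum, major_unit

# Hypothetical bar heights for a small, a medium, and a large sample.
small = [1, 3, 6, 2]
medium = [10, 28, 15]
large = [40, 104, 60]

axis = common_axis([small, medium, large])
print(axis)
```

The three values returned correspond to the minimum, maximum, and major unit settings mentioned above, and the same values would then be entered for every chart in the comparison.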

WHAT DESCRIPTIVES DO YOU NEED TO CALCULATE?


Having addressed why it is important to calculate descriptive statistics, the next
logical step is to identify which ones we need to report. Descriptives can be
divided into three groups: indexes that describe the shape of the distribution,
measures of central tendency, and measures of dispersion.
Indexes That Describe the Shape of the Distribution
When using statistics to describe the shape of a distribution, we report the skewness (also frequently called the skew) and the kurtosis. These two statistics are
probably the least familiar to beginners, but both are relatively simple to grasp.
Skewness, as its name suggests, indicates how far a distribution is skewed off-center. A perfectly normal distribution (the bell curve that is approximated in
Figure 13) has a skewness of zero. Figures 14 and 15, on the other hand, provide
examples of large positive and negative skewness, respectively. Many people
have trouble at first keeping straight which direction is negative and which is
positive, because the rule can seem counterintuitive. The thing to remember
is that "the tail tells the tale." In other words, the tail points in the direction of
the sign. The sign is not determined by which side the hump is on. For example, a
distribution whose tail stretches toward the high scores is positively skewed, even
though its hump sits among the low scores.

⁴ When the last number on the axis has more digits than the others, the final digit sometimes may
not display if the labels are oriented vertically. The solution to that problem is to put them at a slight
angle, as in many of the examples in this article, which use a 75° orientation for this very reason.

FIGURE 10 Example histogram with 34 cases and y-axis on the same scale as Figures 11 and 12.
FIGURE 11 Example histogram with 194 cases and y-axis on the same scale as Figures 10 and 12.
FIGURE 12 Example histogram with 991 cases and y-axis on the same scale as Figures 10 and 11.
FIGURE 13 Histogram showing a near-perfect normal distribution.
FIGURE 14 Histogram of a positively skewed distribution.
FIGURE 15 Histogram of a negatively skewed distribution.

The other statistic used to describe the shape of the distribution is the kurtosis.
This tells us how flat or peaked the distribution is. A perfectly normal distribution has a kurtosis of zero; a distribution in which the scores are clustered tightly
together (see Figure 16) has positive kurtosis, whereas one in which they are
spread out (see Figure 17) has negative kurtosis. If this seems difficult to keep
straight at first, remember that high, peaked distributions are positive, and low,
flattened-out ones are negative.

FIGURE 16 Histogram of a distribution with positive kurtosis.
FIGURE 17 Histogram of a distribution with negative kurtosis.

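The sign conventions for skewness and kurtosis can be checked on small invented data sets. The Python sketch below uses the bias-adjusted sample formulas that, to the best of my knowledge, correspond to Excel's SKEW and KURT functions; treat that correspondence as an assumption. The kurtosis computed here is "excess" kurtosis, so a normal distribution scores zero, as described above.

```python
from statistics import mean, stdev

def skewness(scores):
    """Bias-adjusted sample skewness (needs at least 3 scores)."""
    n = len(scores)
    m, s = mean(scores), stdev(scores)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in scores)

def kurtosis(scores):
    """Bias-adjusted sample excess kurtosis (needs at least 4 scores)."""
    n = len(scores)
    m, s = mean(scores), stdev(scores)
    fourth = sum(((x - m) / s) ** 4 for x in scores)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

# Hypothetical score sets illustrating each sign.
tail_high = [1, 1, 2, 2, 3, 10]       # tail points toward high scores
tail_low = [1, 8, 9, 9, 10, 10]       # tail points toward low scores
flat = [1, 2, 3, 4, 5]                # evenly spread out
peaked = [5, 5, 5, 5, 5, 5, 1, 9]     # tightly clustered, with two stragglers

print(skewness(tail_high), skewness(tail_low))
print(kurtosis(flat), kurtosis(peaked))
```

The first pair of values comes out positive and then negative, following the direction of the tail, and the second pair comes out negative for the flat set and positive for the peaked one.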
Measures of Central Tendency
Measures of central tendency (or measures of grouping; Bachman, 2004), as the
term suggests, tell us where the middle values in our data are located. There are
three that are commonly reported: the mean, the median, and the mode. The
mean is usually described as the arithmetic average⁵ of the scores. The median is
the middle score; that is, half of the scores are above it, and half are below it,
meaning that it is at the 50th percentile. When there is an even number of scores,
the median is the average of the middle two scores. The third measure of central
tendency, the mode, is simply the most common score.

⁵ People taking their first (or second, for that matter) course in testing, statistics, or research
methods often wonder what the difference is between an arithmetic average and the "ordinary" averages they learned to calculate in elementary school. There is no difference.

These three measures provide differing levels of information and are useful in
different contexts. The mode provides us with the least information of the three;
knowing the most common score is nice, but it does not necessarily tell us much
about what is going on with everyone else who took our test. The mode is more
useful in cases where the variable being described is not a score, but a category,
for example, when we are classifying students by first language. As Bachman
(2004) pointed out, the median is particularly useful when the distribution is
skewed (extreme values in the tail of the distribution will have a disproportionate effect on the mean) and in small samples, although the latter reason is largely
because small samples are unlikely to have a normal distribution. It is also appropriate for rankings and for scores where the distance between levels is not necessarily the same in every case (e.g., if an essay test is rated using a 5-point rating scale,
the difference in quality between a 2 and 3 might not be the same as the difference
between a 3 and 4). The mean is appropriate for any case in which the variable is
not highly skewed and the distances between levels of the variable are equivalent;
this is usually the case when test scores are based on a number of items.
In a normal distribution, the three measures of central tendency should be grouped
or clustered together fairly closely. If the distribution is skewed, they will be farther
apart. In particular, the larger the skewness is, the greater the distance will be between
the mean and the median. The median will be closer to the center of the hump of
the distribution, and the mean will be closer to the tail. The mode, of course, will be at
the tallest point of the hump, because that represents the most common score.
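
As a cross-check outside Excel, the three measures of central tendency can be sketched with Python's standard statistics module. This is only an illustration: the variable names are invented here, and the data are the ten example scores from the article's sample data set (45, 50, 38, 79, 56, 38, 57, 3, 30, and 63). The final comparison simply applies the description above, in which the mean lies closer to the tail than the median does.

```python
from statistics import mean, median, multimode

# The ten example scores from the article's sample data set.
scores = [45, 50, 38, 79, 56, 38, 57, 3, 30, 63]

m = mean(scores)            # arithmetic average
mdn = median(scores)        # average of the two middle scores when n is even
modes = multimode(scores)   # list holding the most common score(s)

# Following the description above: the mean is pulled toward the tail,
# so a mean below the median points to a tail on the low side.
if m < mdn:
    tail_side = "low"
elif m > mdn:
    tail_side = "high"
else:
    tail_side = "neither"

print(f"mean={m:.1f}, median={mdn:.1f}, mode(s)={modes}, tail side={tail_side}")
```

For these scores, the mean (45.9) falls slightly below the median (47.5), which is what we would expect given the single very low score of 3.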
Measures of Dispersion
Measures of dispersion, as their name suggests, indicate how spread out scores
are for a particular variable. They include the standard deviation, variance, semi-interquartile range, and range. As with the measures of central tendency, these
indexes provide varying amounts of information about the data they describe.
The standard deviation is the most informative measure of dispersion and is
appropriate any time that it is appropriate to use the mean. To understand what
the standard deviation is, it is useful to keep in mind that on a given test, very few
test takers will receive scores exactly equal to the mean; that is, there will be
some difference between each examinee's score and the mean. Conceptually, the
standard deviation is similar⁶ to the average of these differences.

⁶ Strictly speaking, the standard deviation is not really the average of the differences; that average is
referred to as the mean deviation (Gorard, 2004). The mean deviation is used so seldom, however,
that thinking of the standard deviation as the average of the differences is unlikely to cause any
problems.
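
To make the distinction concrete, the following sketch compares the mean deviation with the population standard deviation for the article's ten example scores. The variable names are illustrative assumptions, and the mean deviation is computed here as the average of the absolute differences from the mean.

```python
from statistics import mean, pstdev

# The ten example scores from the article's sample data set.
scores = [45, 50, 38, 79, 56, 38, 57, 3, 30, 63]

m = mean(scores)

# Mean deviation: the plain average of the absolute differences from the mean.
mean_deviation = sum(abs(x - m) for x in scores) / len(scores)

# Population standard deviation: the square root of the average squared difference.
sd_population = pstdev(scores)

print(f"mean deviation={mean_deviation:.1f}, population SD={sd_population:.1f}")
```

The two values are of similar size but not identical; squaring gives extra weight to the scores farthest from the mean, so the standard deviation is the larger of the two here.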


Further complicating the picture is that there are two formulas for the standard
deviation: one for when we are calculating the standard deviation for a sample,
and one for when we are calculating the standard deviation for the entire
population of interest. The population formula for the standard deviation is

$$S = \sqrt{\frac{\sum (X - M)^2}{N}}$$

and the sample formula is

$$s = \sqrt{\frac{\sum (X - M)^2}{n - 1}}$$

(Brown, 2005), where X is an individual test taker's score, M is the mean, and N or n is the population or sample size, respectively.⁷ The formulas yield almost identical results if
the number of cases is large, but in small groups, the difference can be noticeable. If the test takers whose data are being analyzed are all of the test takers who
could be expected to take that test, then the population formula should be used;
this is the case, for example, when all of the students in a particular language program are
included in the analysis (Brown, 2005). This issue arises again in the context of
estimating test reliability, but further discussion of the matter lies beyond the
scope of this article.
The variance is simply the square of the standard deviation; therefore, as there
are two formulas for the standard deviation, there are also two for the variance:
one for the variance of a population and one for the variance of a sample. The variance is not very useful in and of itself, but it is used in calculating a number of
other statistics, such as Cronbach's alpha, an estimate of internal consistency reliability (Allen & Yen, 1979).
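
The population and sample formulas can be compared side by side in a short Python sketch. The hand-written versions follow the two standard deviation formulas given above; the calls to the standard statistics module are included only as a cross-check. All variable names are illustrative, and the data are the article's ten example scores.

```python
import math
from statistics import mean, pstdev, pvariance, stdev, variance

# The ten example scores from the article's sample data set.
scores = [45, 50, 38, 79, 56, 38, 57, 3, 30, 63]

n = len(scores)
m = mean(scores)
sum_of_squares = sum((x - m) ** 2 for x in scores)  # sum of (X - M)^2

# Hand-written versions of the two formulas.
sd_population_manual = math.sqrt(sum_of_squares / n)    # divide by N
sd_sample_manual = math.sqrt(sum_of_squares / (n - 1))  # divide by n - 1

# Library versions, used here as a cross-check.
sd_population = pstdev(scores)
sd_sample = stdev(scores)
var_population = pvariance(scores)  # square of the population SD
var_sample = variance(scores)       # square of the sample SD

print(f"SD: population={sd_population:.1f}, sample={sd_sample:.1f}")
print(f"Variance: population={var_population:.1f}, sample={var_sample:.1f}")
```

With only 10 cases, the difference between the two versions is noticeable (19.6 versus 20.7 for the standard deviation); with several hundred cases it would all but disappear.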
The semi-interquartile range (Bachman, 2004) is based on the notion of quartiles,
divisions of the scores into four equally sized groups. Also referred to as the
quartile deviation, it is the average of the difference between the median (the
50th percentile) and the 25th and 75th percentiles; that is, between the second,
first, and third quartiles, respectively. Its calculation is very straightforward once
the values of the first and third quartiles are calculated:

$$Q = \frac{Q_3 - Q_1}{2}$$

Fortunately, finding these values is very simple in Excel. The semi-interquartile range should
be reported any time the median is used.
The range is probably the simplest of the indicators of dispersion and is equal
to the highest score minus the lowest score, plus 1. As Bachman (2004) noted,
although it is the simplest of these indicators, it is also the least informative, as
distributions with widely varying shapes may all have the same range. This is the
case, in fact, in Figures 13 through 17.
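
The semi-interquartile range and the range can be sketched in Python as follows. One assumption should be flagged: quartiles can be interpolated in more than one way, and the "inclusive" method is chosen here on the assumption that it corresponds to the interpolation used by Excel's QUARTILE function; for the article's ten example scores it gives the same first and third quartiles as those reported in Table 2. The variable names are illustrative.

```python
from statistics import quantiles

# The ten example scores from the article's sample data set.
scores = [45, 50, 38, 79, 56, 38, 57, 3, 30, 63]

# Cut points dividing the scores into four groups: Q1, Q2 (the median), Q3.
# method="inclusive" is an assumption made to mirror Excel's QUARTILE.
q1, q2, q3 = quantiles(scores, n=4, method="inclusive")

# Semi-interquartile range: Q = (Q3 - Q1) / 2
semi_interquartile_range = (q3 - q1) / 2

# Range as defined above: highest score minus lowest score, plus 1.
score_range = max(scores) - min(scores) + 1

print(f"Q1={q1}, Q3={q3}, Q={semi_interquartile_range}, range={score_range}")
```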

⁷ Note the use of capital S in the population formula, and lowercase s for the sample formula. The
abbreviation SD is also commonly used, and does not specifically refer to either the population or
sample version.


What Are Good Values for Descriptive Statistics, and How Normal Is
Normal Enough?
A common question from people applying these statistics for the first time is
what counts as a good value for each descriptive statistic. There is no single
answer to this, as it depends on the distribution that is expected or needed in a
particular situation. For example, we would expect scores on a proficiency test to
be normally distributed (see Figure 13 for an example of an approximately normal distribution). In most cases, we might expect the same for a placement test,
even if it is criterion referenced. A normal distribution is also an assumption for
certain statistical tests. On the other hand, in a criterion-referenced pretest, given
to learners to assess their knowledge of something that has not been taught yet,
we generally expect a positively skewed distribution (see Figure 14). That is
because most of the test takers will have very low scores, although a few may
already know the material they are about to be taught, and will thus score higher
than the others. Similarly, a criterion-referenced posttest should have a negatively skewed distribution (see Figure 15), because the vast majority of the students will (we hope!) have mastered the content, and only a few will have very
low scores.
When we do not get the distribution we expect, there is something wrong. The
problem may lie with the test itself, or there may have been something problematic about our assumptions. To learn which it was will require gathering additional information about our students, further analyzing the test (e.g., item and
reliability analyses), or both. That is why we need to calculate descriptive statistics and create graphs of score distributions: to tell us whether our tests are functioning as we expected or not.
So how normal is normal enough, and how skewed should we expect our
pretests and posttests to be? Bachman (2004, p. 74) advised that as long as the
skewness and kurtosis values are between -2 and +2, the distribution is reasonably normal, meaning that it would be appropriate to perform analyses that
require normality (e.g., calculating the Pearson r, or performing
a t test). On the other hand, that does not automatically mean that a criterion-referenced pretest should have a skewness of at least +2, or that a criterion-referenced posttest should have a skewness of -2 or lower. There are no rules of thumb of
which I am aware for these values; I therefore recommend looking at not only the
skewness statistic but also a histogram or frequency polygon of the scores to see
whether it has a shape that seems reasonable in light of the content being tested
and what you expect the students to know already when they take the test.
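
The -2 to +2 guideline can be turned into a simple check. The sketch below is an illustration under stated assumptions: the skewness and kurtosis formulas are the sample-adjusted versions that Excel's SKEW and KURT functions are assumed to use, the function names are invented for this example, and treating the limits as strict inequalities is a choice made here rather than something specified in the guideline.

```python
import math

def _mean_and_sample_sd(values):
    """Return the mean and the sample standard deviation (n - 1 denominator)."""
    n = len(values)
    m = sum(values) / n
    s = math.sqrt(sum((x - m) ** 2 for x in values) / (n - 1))
    return m, s

def skewness(values):
    """Sample-adjusted skewness (assumed to correspond to Excel's SKEW)."""
    n = len(values)
    m, s = _mean_and_sample_sd(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in values)

def kurtosis(values):
    """Sample-adjusted excess kurtosis (assumed to correspond to Excel's KURT)."""
    n = len(values)
    m, s = _mean_and_sample_sd(values)
    fourth_powers = sum(((x - m) / s) ** 4 for x in values)
    return (n * (n + 1)) / ((n - 1) * (n - 2) * (n - 3)) * fourth_powers \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))

def reasonably_normal(values, limit=2.0):
    """Apply the guideline: both statistics must fall between -limit and +limit."""
    return abs(skewness(values)) < limit and abs(kurtosis(values)) < limit

# The ten example scores from the article's sample data set.
scores = [45, 50, 38, 79, 56, 38, 57, 3, 30, 63]
print(f"skewness={skewness(scores):.3f}, kurtosis={kurtosis(scores):.3f}, "
      f"reasonably normal={reasonably_normal(scores)}")
```

A check of this kind is a supplement to, not a substitute for, looking at a histogram or frequency polygon of the scores.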
Finally, skewness and kurtosis are probably the most commonly misinterpreted and overinterpreted statistics discussed in this article. In particular, it is
important for beginners to keep in mind that any distribution found in real life
will have some degree of positive or negative skewness and kurtosis. Thus, having
a minor negative skewness (e.g., -0.034) does not necessarily suggest that a test
was a posttest. The same holds true for minor positive skewness and pretests.

TABLE 2
Functions and Formulas for Calculating Descriptive Statistics in Microsoft Excel Spreadsheet Software

Statistic                                     Excel Function or Formula   Result for the Example Data Set
Mean (M)                                      = AVERAGE(B2:B11)           45.9
Median (Mdn)                                  = MEDIAN(B2:B11)            47.5
Mode                                          = MODE(B2:B11)              38
High score                                    = MAX(B2:B11)               79
Low score                                     = MIN(B2:B11)               3
Range                                         = B16-B17+1                 77
Third quartile (Q3)                           = QUARTILE(B2:B11, 3)       56.8
First quartile (Q1)                           = QUARTILE(B2:B11, 1)       38
Semi-interquartile range (Q)                  = (B19-B20)/2               9.4
Standard deviation (sample) (s, sd_sample)    = STDEV(B2:B11)             20.7
Standard deviation (population) (S, sd_pop)   = STDEVP(B2:B11)            19.6
Variance (sample) (s², var_sample)            = VAR(B2:B11)               427.7
Variance (population) (S², var_pop)           = VARP(B2:B11)              384.9
Skewness                                      = SKEW(B2:B11)              -0.632
Kurtosis                                      = KURT(B2:B11)              1.354

Calculating These Statistics in Excel
Calculating the measures of central tendency and dispersion just discussed is quite
easy in Excel. In most cases, Excel already contains a function that calculates the
desired statistic. In the case of the range and semi-interquartile range, however, there
is no dedicated function, so users must first use functions that Excel does have to find the
highest score, lowest score, first quartile (Q1), and third quartile (Q3). These values
are then used to calculate the desired statistics. The functions and formulas used are
given in Table 2, along with the results for this sample data set.⁸ The ranges used in
the functions are based on the example in Figure 18, in which descriptives are calculated using a data set with 10 cases located in cells B2 through B11.
When using a function in Excel, there are two ways to proceed. The faster way is to
type in the function as written in Table 2, starting with the equals sign. After typing
the opening parenthesis, move the cursor to the top of the range (group of contiguous
cells) containing the scores. Click on that cell, and then select the entire range. You can
select the range by holding down the mouse button and dragging the cursor down to the
bottom of the data, or by holding down the Shift key and pressing the down arrow (↓)
or Page Down key on the keyboard. When the entire range is selected, type the closing
parenthesis, and hit the Enter key. Note that the functions are not case sensitive.

⁸ For those having trouble reading the scores in the figure, they are 45, 50, 38, 79, 56, 38, 57, 3, 30,
and 63.

FIGURE 18 Example of calculating the mean in Microsoft Excel spreadsheet software.
Note. Microsoft Excel screen shot reprinted with permission from Microsoft Corporation.

When a user is less confident, however, or cannot remember exactly how a
function works, it is possible to click on Insert Function..., and find the desired
function there. Something that may prove initially confusing is that for many functions, Excel has two text boxes, labeled Number 1 and Number 2. Simply
ignore the second one, and click on the button to the right of the first text box (the
one with the red arrow). As with creating histograms, most of the dialogue box will
disappear, and you select the range containing the data. After finishing, press the
Enter key, or click on the button to the right of the floating text box (again, the button with the red arrow). Click OK, and the function is calculated.
When entering the formulas for the range and semi-interquartile range, instead
of typing the specific cell addresses given in Table 2, it is necessary to use the
addresses in your own spreadsheet. When it is time to type a cell address,
use the mouse to click on the desired cell (e.g., the cell containing the value for
Q3), and then continue typing; that is, do not then click on the cell where you are


entering the formula. When the formula is finished, hit the Enter key. As a final
point, the spaces in the formulas are optional.

CONCLUSION
In summary, it is important that we calculate descriptive statistics for our tests
and that we create visual representations of our data. Only by doing both will we
gain, or provide to others, a full picture of what is going on with our tests.
Many teachers and others responsible for local test development have probably
long avoided taking these essential steps, though, because of unfamiliarity with
statistics or a lack of access to statistical analysis software. As has just been
shown, however, the statistics required are not actually that complex, do not
require a high level of mathematical sophistication, and can be calculated easily
using one of the most ubiquitous computer programs in the world.

ACKNOWLEDGMENT
I thank two anonymous reviewers for the feedback and encouragement that they
offered on a previous draft of this article. Any remaining shortcomings are, of
course, my own responsibility.

REFERENCES
Allen, M. J., & Yen, W. M. (1979). Introduction to measurement theory. Monterey, CA: Brooks/Cole.
Bachman, L. F. (2004). Statistical analyses for language assessment. Cambridge, UK: Cambridge
University Press.
Bachman, L. F., & Kunnan, A. J. (2005). Workbook and CD for statistical analyses for language
assessment. Cambridge, UK: Cambridge University Press.
Brown, J. D. (2005). Testing in language programs: A comprehensive guide to English language
assessment (2nd ed.). New York: McGraw-Hill.
Educational Testing Service. (2005). TOEFL test and score data summary: 2004-2005 test year data
(Report No. TOEFL-SUM-0405-DATA). Princeton, NJ: Author. Retrieved December 17, 2006,
from http://www.ets.org/Media/Tests/TOEFL/pdf/Test%20and%20Score%20Data%20Summary%2004_05.pdf
Gorard, S. (2004, September). Revisiting a 90-year-old-debate: The advantages of the mean deviation. Paper presented at the British Educational Research Association Annual Conference, University of Manchester, England. Retrieved December 20, 2006, from http://www.leeds.ac.uk/educol/documents/00003759.htm
Guilford, J. P., & Fruchter, B. (1978). Fundamental statistics in psychology and education (6th ed.).
New York: McGraw-Hill.
International Language Testing Association. (2000). Code of ethics. Retrieved December 17, 2006,
from http://www.iltaonline.com/code.pdf
