CHAPTER
CORE
Organising and displaying data
What is the difference between categorical and numerical data?
What is a frequency table, how is it constructed and when is it used?
What is the mode and how do we determine its value?
What are bar charts, histograms, stem plots and dot plots? How are they
constructed and when are they used?
How do you describe the features of bar charts, histograms and stem plots when
writing a statistical report?
1.1 Classifying data
Statistics is a science concerned with understanding the world through data. The first step in
this process is to put the data into a form that makes it easier to see patterns or trends.
Some data
The data contained in Table 1.1 are part of a larger set of data collected from a group of
university students.
Table 1.1 Student data
Height   Weight   Age       Sex   Plays sport   Pulse rate
(cm)     (kg)     (years)                       (beats/min)
173      57       18        M     2             86
179      58       19        M     2             82
167      62       18        M     1             96
195      84       18        F     1             71
173      64       18        M     3             90
184      74       22        F     3             78
175      60       19        F     3             88
140      50       34        M     3             70
Sex: M = male, F = female
Plays sport: 1 = regularly, 2 = sometimes, 3 = rarely
ISBN: 9781107655904
Peter Jones, Michael Evans, Kay Lipson 2012
Photocopying is restricted under law and this material must not be transferred to another party
Variables
In a data set, we call the things about which we record information variables. An important
first step in analysing any set of data is to identify the variables involved, their units of
measurement (where appropriate) and the values they take. In this particular data set there are
six variables:
height (in centimetres)
sex (M = male, F = female)
weight (in kilograms)
plays sport (1 = regularly, 2 = sometimes, 3 = rarely)
age (in years)
pulse rate (beats/minute)
Warning!!
It is not the variable name itself that determines whether the data are numerical or categorical; it is
the way the data for the variable are recorded.
For example:
weight recorded in kilograms is a numerical variable
weight recorded as 1 = underweight, 2 = normal weight, 3 = overweight is a categorical
variable
Exercise 1A
1 What is:
a a numerical variable? Give an example.
4 Classify the data for each of the variables in Table 1.1 as numerical or categorical.
1.2
Example 1
Solution
1 Set up a table as shown. The variable Sex has two categories: Male and Female.
2 Count up the number of females (6) and males (5). Record this in the Count column.
3 Add the counts to find the total count, 11 (6 + 5). Record this in the Count column opposite Total.
4 Convert the counts into percentages. Record this in the Per cent column. For example:
$\text{percentage of females} = \frac{6}{11} \times 100\% = 54.5\%$
5 Finally, total the percentages and record.

             Frequency
Sex       Count   Per cent
Female        6       54.5
Male          5       45.5
Total        11      100.0
There are two things to note in constructing the frequency table in Example 1.
1 In setting up this frequency table, the order in which we have listed the categories Female
and Male is quite arbitrary; there is no natural order. However, if the categories had been,
for example, First, Second and Third, then it would make sense to list the categories in
that order.
2 The Total count should always equal the total number of observations; in this case, 11.
The percentages should add to 100%. However, if percentages are rounded to one decimal
place, a total of 99.9 or 100.1 is sometimes obtained. This is due to rounding error. Totalling
the counts and percentages helps you check your counting and your percentage calculations.
How has forming a frequency table helped?
The process of forming a frequency table for a categorical variable:
displays the data in a compact form
tells us something about the way the data values are distributed (the pattern of the
data).
Example 2
                  Frequency
Climate type   Count   Per cent
Cold               3       13.0
Moderate          14       60.9
Hot                6       26.1
Total             23      100.0
Solution
[Bar chart: Frequency (vertical axis) against Climate type (Cold, Moderate, Hot)]
The mode
One of the features of a data set that is quickly revealed with a bar chart is the mode or modal
category. This is the most frequently occurring value or category. In a bar chart, it is given by the
category with the tallest bar. For the bar chart above, the modal category is clearly 'Moderate'.
That is, for the countries considered, the most frequently occurring climate type is 'Moderate'.
However, the mode is only of interest when a single value or category in the frequency table
occurs much more often than the others. Modes are of particular importance in popularity
polls; for example, in answering questions such as 'Which is the most frequently watched TV
station between the hours of 6.00 and 8.00 p.m.?' or 'What are the times when a supermarket
is in peak demand: morning, afternoon or night?'
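Reading the mode from a frequency table can also be sketched in code. The short Python example below is an illustration only, not part of the original text: it uses the counts from the climate table in Example 2, and the function name `modal_categories` is a hypothetical helper. It returns every category that shares the highest count, so a tie would show up as more than one mode.

```python
# Counts taken from the climate frequency table in Example 2.
counts = {"Cold": 3, "Moderate": 14, "Hot": 6}


def modal_categories(counts):
    """Return every category whose count equals the highest count."""
    highest = max(counts.values())
    return [category for category, n in counts.items() if n == highest]


print(modal_categories(counts))  # ['Moderate']
```

Returning a list rather than a single value is a design choice: it makes it obvious when no single category stands out.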
Report
The climate types of 23 countries were classified as being 'cold', 'moderate' or 'hot'. The
majority of the countries, 60.9%, were found to have a moderate climate. Of the remaining
countries, 26.1% were found to have a hot climate while 13.0% were found to have a cold
climate.
[Segmented bar chart: Frequency (vertical axis, 0 to 25); legend Climate: Hot, Moderate, Cold]
A variation on the standard bar chart is the segmented or stacked bar chart. In a
segmented bar chart, the bars are stacked on one another to give a single bar with
several components. The lengths of the segments are determined by the frequencies.
When this is done, the height of the bar gives the total frequency. Segmented bar charts
should only be used when there are a relatively small number of components; usually no
more than four or five. Otherwise it becomes difficult to distinguish the components. The
segmented bar chart above was formed from the climate data used in Example 2. Note that
a legend has been included to identify the segments.
In a percentage segmented bar chart, the lengths of each of the segments in the
bar are determined by the percentages. When this is done, the height of the bar is
100. The percentage segmented bar chart opposite was formed from the climate data
used in Example 2.
[Percentage segmented bar chart: Percentage (vertical axis, 0 to 100); legend Climate: Hot, Moderate, Cold]
Percentage segmented bar charts are most useful when we come to analyse the
relationship between two categorical variables, as we will see in Chapter 4.
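To see how the segments of a percentage segmented bar chart stack up to 100, the following Python sketch computes where each segment starts and ends. It is an illustration only: the percentages are those from Example 2, the function name `stacked_segments` is hypothetical, and the stacking order (Cold at the bottom, Hot at the top) is an assumption based on the order of the legend.

```python
def stacked_segments(percentages):
    """Return (category, bottom, top) for each segment, stacked in the order given."""
    segments = []
    bottom = 0.0
    for category, percent in percentages.items():
        top = round(bottom + percent, 1)  # round to avoid floating-point noise
        segments.append((category, bottom, top))
        bottom = top
    return segments


# Percentages from Example 2; bottom-to-top order is an assumption.
segments = stacked_segments({"Cold": 13.0, "Moderate": 60.9, "Hot": 26.1})
for category, bottom, top in segments:
    print(f"{category:<9} from {bottom:5.1f} to {top:5.1f}")
```

The top of the last segment is the height of the whole bar, which should be 100 (or very close to it when the percentages have been rounded).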
Exercise 1B
1 a In a frequency table, what is the mode?
b Identify the mode in the following data sets:
i Grades:
A A C B A B B B B D C
ii Shoe size: 8 9 9 10 8 8 7 9 8 10 12
10
2 The following data identifies the state of residence of a group of people, where
1 = Victoria, 2 = SA and 3 = WA.
2 1 1 1 3 1 3 1 1 3 3
a Form a frequency table (with both counts and percentages) to show the distribution of
state of residence for this group of people. Use the table in Example 1 as a model.
b Construct a bar chart using Example 2 as a model.
3 The size (S = small, M = medium, L = large) of 20 cars was recorded as follows:
S S L M M M L S S M
M S L S M M M S S M
a Form a frequency table (with both counts and percentages) to show the distribution of
size for these cars. Use the table in Example 1 as a model.
b Construct a bar chart using Example 2 as a model.
4 The table shows the frequency distribution of School type for a number of schools. The table
is incomplete.
a Write down the information missing from the table.
b How many schools are categorised
as Independent?
c How many schools are there in total?
d What percentage of schools are
categorised as Government?
e Use the information in the frequency table
to complete the following report.
School type
Catholic
Government
Independent
Total
Frequency
Count Percent
4
11
5
20
25
100
Report
____ schools were classified according to school type. The majority of these schools,
____%, were found to be ____ schools. Of the remaining schools, ____% were ____
while 20% were ____ schools.
5 The table shows the frequency distribution of the place of birth for 500 Australians.
a Is Place of birth a categorical or a numerical variable?
b Display the data in the form of a percentage
segmented bar chart.
c Use the information in the frequency table to
write a brief report.
Place of birth   Per cent
Australia            78.3
Overseas             21.8
Total               100.1
6 The table records the number of new cars sold in Australia during the first quarter of one
year, categorised by type (private vehicle or commercial vehicle).
a Copy and complete the table giving the
percentages correct to the nearest
whole number.
b Display the data in the form of a
percentage segmented bar chart.
                      Frequency
Type of vehicle     Count   Per cent
Private           132 736
Commercial         49 109
Total
7 The table shows the frequency distribution of eye colour of 11 preschool children.
a Use the information in the table to construct a bar chart. Place the columns in order of
decreasing frequency.
b Use the information in the table to construct a percentage segmented bar chart.
c Use the information in the table to write a brief report.

                 Frequency
Eye colour   Count   Percentage
Brown            6         54.5
Hazel            2         18.2
Blue             3         27.3
Total           11        100.0

8 Twenty-two students were asked the question, 'How often do you play sport?' with the
possible responses: 'Regularly', 'Sometimes' or 'Rarely'. The distribution of responses is
summarised in the frequency table.
a Write down the information missing from the table.
b Use the information in the frequency table to complete the following report.

                 Frequency
Plays sport   Count   Per cent
Regularly         5       22.7
Sometimes        10
Rarely                    31.8
Total            22

Report
When ____ students were asked the question, 'How often do you play sport?', the dominant
response was 'Sometimes', given by ____% of the students. Of the remaining students,
____% of the students responded that they played sport ____ while ____% said that they
played sport ____.
Example 3
The family sizes of 11 preschool children (including the child itself) are as follows:
3 3 4 4 5 3 2 4 3 5 3
Display the data in the form of a frequency table.
Solution
1 Set up a table as shown. In the data set, the variable family size takes the values 2, 3, 4
and 5. List these values under Family size in some order, here increasing.
2 Count up the number of 2s, 3s, 4s and 5s in the data set. For example, there are five 3s.
Record these values in the Count column.
3 Add the counts to find the total count, 11. Record this value in the Count column opposite
Total.
4 Convert the counts into percentages. Record them in the Per cent column. For example,
$\text{percentage of 3s} = \frac{5}{11} \times 100\% = 45.5\%$
5 Finally, total the percentages and record.

                 Frequency
Family size   Count   Per cent
2                 1        9.1
3                 5       45.5
4                 3       27.3
5                 2       18.2
Total            11      100.1
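The steps in this solution can also be sketched in code. The Python example below is an illustration only, not part of the original text: it counts the family sizes listed in Example 3, converts the counts to percentages rounded to one decimal place, and totals both columns. The variable names are hypothetical.

```python
from collections import Counter

# Family sizes of the 11 preschool children in Example 3.
family_sizes = [3, 3, 4, 4, 5, 3, 2, 4, 3, 5, 3]

counts = Counter(family_sizes)   # steps 1 and 2: list the values and count them
total = sum(counts.values())     # step 3: total count

# Step 4: convert each count to a percentage, rounded to one decimal place.
table = [(size, counts[size], round(100 * counts[size] / total, 1))
         for size in sorted(counts)]

# Step 5: total the (rounded) percentages.
percent_total = round(sum(percent for _, _, percent in table), 1)

print(f"{'Family size':<12}{'Count':>6}{'Per cent':>10}")
for size, count, percent in table:
    print(f"{size:<12}{count:>6}{percent:>10.1f}")
print(f"{'Total':<12}{total:>6}{percent_total:>10.1f}")
```

Because each percentage is rounded before the column is totalled, the total here comes to 100.1 rather than 100.0, which is the rounding effect noted after Example 1.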
Grouping data
Some variables can only take on a limited range of values; for example, the number of children
in a family. Here, it makes sense to list each of these values individually when forming a
frequency distribution.
In other cases, the variable can take a large range of values; for example, age (0–100).
Listing all possible ages would be tedious and would produce a large and unwieldy display. To
solve this problem, we group the data into a small number of convenient intervals. There are
no hard and fast rules for the number of intervals but, usually, between five and fifteen intervals
are used. Usually, the smaller the number of data values, the smaller the number of intervals.
Note that the intervals are defined so that it is quite clear into which interval each data value
falls. For example, you cannot define intervals as 1–5, 5–10, 10–15, 15–20, . . . etc., as you
would not know into which interval to put the values 5, 10, 15 etc.
Guideline for choosing the number of intervals
There are no hard and fast rules for the number of intervals to use but, usually, between five
and fifteen intervals are used.
Example 4
Grouping data
The ages of a sample of 200 people aged from 16 to 72 years are to be recorded. Group the
ages into six equal-sized categories that will cover all of these ages.
Solution
1 Write down the required number of intervals.
  Number of intervals: 6
2 Determine interval width. Ages range from 16 to 72, which covers 57 years. Six intervals
will give intervals of width $\frac{57}{6} = 9.5$. Set the interval width to 10, the nearest
whole number above 9.5.
  Interval width = $\frac{57}{6} = 9.5$: use 10
3 Choose a starting point that ensures that the intervals cover the full range of values.
15 would be a suitable starting point.
  Starting point: 15
4 Write down the intervals.
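The calculation in this solution can be sketched in code. The Python example below is an illustration only: the function name `integer_intervals` is hypothetical, and it assumes whole-number data with each interval written as a (lowest, highest) pair so that no value can fall into two intervals. The starting point of 15 is the one suggested in step 3.

```python
import math


def integer_intervals(lowest, highest, n_intervals, start):
    """Return the interval width and a list of (low, high) whole-number intervals."""
    span = highest - lowest + 1                # 16 to 72 covers 57 years
    width = math.ceil(span / n_intervals)      # 57 / 6 = 9.5, so use 10
    intervals = [(start + i * width, start + (i + 1) * width - 1)
                 for i in range(n_intervals)]
    # Check that the chosen starting point lets the intervals cover all values.
    if intervals[0][0] > lowest or intervals[-1][1] < highest:
        raise ValueError("intervals do not cover the full range of values")
    return width, intervals


width, intervals = integer_intervals(16, 72, n_intervals=6, start=15)
print(width)      # 10
print(intervals)  # [(15, 24), (25, 34), (35, 44), (45, 54), (55, 64), (65, 74)]
```

Other starting points, such as 14 or 13, would also cover the ages 16 to 72; 15 simply gives convenient interval boundaries.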
Once we know how to group data, we can form a frequency distribution for grouped data.
Example 5
The data below give the average hours worked per week in 23 countries.
35.0, 48.0, 45.0, 43.0, 38.2, 50.0, 39.8, 40.7, 40.0, 50.0, 35.4, 38.8,
40.2, 45.0, 45.0, 40.0, 43.0, 48.8, 43.3, 53.1, 35.6, 44.1, 34.8
Form a grouped frequency table with five intervals.
Solution
1 Set up a table as shown. For five intervals and data values ranging between 34.8 and 53.1,
use the intervals: 30.0–34.9, 35.0–39.9, . . . , 50.0–54.9.
2 List these intervals, in ascending order, under Average hours worked.
3 Count the number of countries whose average working hours fall into each of
the intervals. For example, six countries have average working hours between 35.0 and 39.9.
Record these values in the Count column.
4 Add the counts to find the total count, 23. Record this value in the Count column
opposite Total.
5 Convert the counts into percentages. Record these in the Per cent column.
For example, for 35.0–39.9 hours,
$\text{percentage} = \frac{6}{23} \times 100\% = 26.1\%$
6 Finally, total the percentages and record.

Average hours        Frequency
worked            Count   Per cent
30.0–34.9             1        4.3
35.0–39.9             6       26.1
40.0–44.9             8       34.8
45.0–49.9             5       21.7
50.0–54.9             3       13.0
Total                23       99.9
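The grouped frequency table in this example can be checked with a short program. The Python sketch below is an illustration only: it uses the 23 data values from Example 5 and treats each interval as running from its lower boundary up to, but not including, the next lower boundary (so 30.0–34.9 is coded as 30.0 ≤ x < 35.0). The variable names are hypothetical.

```python
# Average hours worked per week in 23 countries (Example 5).
hours = [35.0, 48.0, 45.0, 43.0, 38.2, 50.0, 39.8, 40.7, 40.0, 50.0, 35.4, 38.8,
         40.2, 45.0, 45.0, 40.0, 43.0, 48.8, 43.3, 53.1, 35.6, 44.1, 34.8]

# Lower boundary of each interval, plus the upper limit of the last one.
edges = [30.0, 35.0, 40.0, 45.0, 50.0, 55.0]

counts = [0] * (len(edges) - 1)
for value in hours:
    for i in range(len(counts)):
        if edges[i] <= value < edges[i + 1]:
            counts[i] += 1
            break

total = sum(counts)
percents = [round(100 * count / total, 1) for count in counts]

for i, (count, percent) in enumerate(zip(counts, percents)):
    label = f"{edges[i]:.1f}-{edges[i + 1] - 0.1:.1f}"
    print(f"{label:<12}{count:>5}{percent:>8.1f}")
print(f"{'Total':<12}{total:>5}{round(sum(percents), 1):>8.1f}")
```

The rounded percentages total 99.9 rather than 100.0, again because of rounding.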
The histogram
The frequency histogram, or histogram for short, is a graphical way of presenting the
information in a frequency table for numerical data. Later in the chapter, you will learn about
two other graphical displays for numerical data, the stem plot and the dot plot.
Constructing a histogram from a frequency table
In a frequency histogram:
frequency (count or per cent) is shown on the vertical axis
the values of the variable being displayed are plotted on the horizontal axis
for continuous data, each bar in a histogram corresponds to a data interval. For discrete
data, where there are gaps between values, the intervals start and end halfway between
values. Empty classes or missing discrete values have bars of zero height
the height of the bar gives the frequency (usually the count, but it can equally well be the
percentage).
Example 6
Frequency (count)
1
6
8
5
3
23
Cambridge University Press
Solution
[Histogram: Frequency (vertical axis, 0 to 9) against Average hours worked (horizontal axis, 25 to 60)]
Example 7
Family size   Frequency (count)
2                 1
3                 5
4                 3
5                 2
Total            11
Solution
[Histogram: Frequency (vertical axis, 0 to 5) against Family size (horizontal axis)]
11 12 4 18 25 22 15 17 7 18 14 23 13
15 14 13 12 17 15 18 13 22 16 23 14
Steps
1 Start a new document: Press c and
select New Document (or use / + N).
If prompted to save an existing
document, move cursor to No and press
.
4 Data analysis
a Move cursor onto any column,
will show and the column data will
be displayed as shown opposite.
b To view other column data values
move the cursor to another column.
Note: If you click on a column it will be selected.
To deselect any previously selected columns,
move the cursor to the open area and press
.
Hint: If you accidentally move a column or data
point, press / +
to undo the move.
5 Change the histogram column (bin) width to 4 and the starting point to 2.
a Press / + b to get the contextual menu as shown (below left).
Hint: Pressing / + b with the cursor on the histogram gives you access to a contextual menu
that enables you to do things that relate only to histograms.
d A new histogram is displayed with a column width of 4 and a starting point of 2 but
it no longer fits the viewing window (below left). To solve this problem press
/ + b >Zoom>Zoom-Data to obtain the histogram shown below right.
11 12 4 18 25 22 15 17 7 18 14 23 13
15 14 13 12 17 15 18 13 22 16 23 14
Steps
1 From the application menu screen, locate the built-in Statistics application.
Tap [icon] to open.
Tapping [icon] from the icon panel (just below the touch screen) will
display the application menu if it is not already visible.
Tapping r from the icon panel allows the graph to fill the entire screen. Tap r again to return
to half-screen size.
5 Tapping [icon] from the toolbar places a marker (+) at the top of the first column of the
histogram (see opposite) and tells us that
a the first interval begins at 2 (xc = 2)
b for this interval, the frequency is 1 (Fc = 1).
To find the frequencies and starting points of the other intervals, use the arrows to
move from interval to interval.
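The starting points and frequencies reported by the calculator can be checked with a short program. The Python sketch below is an illustration only and does not reproduce any calculator function: it counts the 25 data values listed above into columns of width 4 starting at 2, with each column covering its starting point up to, but not including, the next starting point. The function name `histogram_counts` is hypothetical, and it assumes that no data value is smaller than the starting point.

```python
data = [11, 12, 4, 18, 25, 22, 15, 17, 7, 18, 14, 23, 13,
        15, 14, 13, 12, 17, 15, 18, 13, 22, 16, 23, 14]


def histogram_counts(values, start, width):
    """Return (starting point, frequency) for each histogram column."""
    n_columns = int((max(values) - start) // width) + 1
    counts = [0] * n_columns
    for value in values:
        counts[int((value - start) // width)] += 1
    return [(start + i * width, counts[i]) for i in range(n_columns)]


for starting_point, frequency in histogram_counts(data, start=2, width=4):
    print(f"column starting at {starting_point:>2}: frequency {frequency}")
```

The first column starts at 2 and has a frequency of 1, which agrees with the values the calculator reports for the first interval.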
Exercise 1C
1 The numbers of occupants in nine cars stopped at a traffic light were:
1 1 2 1 3 1 2 1 3
What is the mode of this data set? What does this tell us?
2 The number of surviving grandparents for 11 preschool children is listed below.
0 4 4 3 2 3 4 4 4 3 3
Form a frequency table to show the distribution of the number of surviving grandparents.
3 a Write down the missing information in the
frequency table.
b How many families had only one child?
c How many families had more than one
child?
d What percentage of families had no
children?
e What percentage of families had fewer
than three children?
No. of children
in family
0
1
2
3
4
Total
Frequency
Count
%
3
10
6
2
21
47.6
28.6
9.5
4 a Salaries of women teaching in a school range from $20 106 to $63 579. Group the salaries
into five equal-sized categories that cover all teaching salaries.
b The number of students in VCE Further Mathematics classes ranges from 6 to 33. Group
the class sizes into six equal-sized categories that cover all Further Mathematics class
sizes.
c The amount of money carried by a sample of 23 students ranges from nothing to $8.75.
Group the amount of money carried by the students into five equal-sized categories that
cover all amounts of money carried by the students.
5 The histogram opposite was formed by recording the
number of words in 30 randomly selected sentences.
[Histogram: Frequency (%) (vertical axis, 0 to 35) against Number of words in sentence (horizontal axis, 5 to 30)]

Population density   Frequency (count)
0–199                   11
200–399                  4
400–599                  4
600–799                  2
800–999                  1
Total                   22

Number of rooms   Frequency (count)
4                     3
5                     0
6                     1
7                     3
8                     4
Total                11

82 78 96 69 71 77 90 64 78 80 68
83 71 78 68 88 88 70 76 86 74
a Use a graphics calculator to construct a histogram so that the first column starts at 63 and
the column width is two.
b For this histogram:
i what is the starting point of the third column?
ii what is the count for the third column? What actual data values does this include?
c Redraw the histogram so that the column width is five and the first column starts at 60.
d For this histogram, what is the count in the interval 65 to <70?
9 The following data values are the numbers of children in the families of 25 VCE
students:
1 6 2 5 5 3 4 1 2 7 3 4 5 3 1 3 2 1 4 4 3 9 4 3 3
a Use a graphics calculator to construct a histogram so that the column width is one and the
first column starts at 0.5.
b For this histogram, what is the starting point for the fourth column and what is the count?
c Redraw the histogram so that the column width is two and the first column starts at 0.
d For this histogram:
i what is the count in the interval from 6 to less than 8?
ii what actual data value(s) does this interval include?
Shape
How is the data distributed? Is the histogram peaked; that is, do some data values tend to occur
much more frequently than others, or is it relatively flat, showing that all values in the
distribution occur with approximately the same frequency?
[Histogram: Frequency (vertical axis, 0 to 10); labels: lower tail, peak, upper tail]
Symmetric distributions
If a histogram is single-peaked, does the histogram region tail off evenly on either side of the
peak? If so, the distribution is said to be symmetric (see Histogram 1).
[Histograms 1 to 4: Frequency (vertical axes, 0 to 10); peaks and direction of skew labelled]
If a histogram tails off to the left we say that it is negatively skewed (Histogram 4). The
distribution of age at death tends to be negatively skewed. Most people die in old age, a few in
middle age and even fewer in childhood.
Outliers
Outliers are any data values that stand out from the main body of data. These are data values
that are atypically high or low. See, for example, Histogram 5, which shows an outlier. In this
case it is a data value that is atypically low compared to the rest of the data values.
[Histogram 5: Frequency (vertical axis, 0 to 10); labels: outlier, main body of data]
Outliers can indicate errors made collecting or processing data; for example, a person's
age recorded as 365. Alternatively, they may indicate data values that are very different
from the rest of the values. For example, compared to her students' ages, a teacher's
age is an outlier.
Centre
For skewed distributions, it is more difficult to estimate the middle of a distribution by eye.
The middle is not halfway between the extremes because, in a skewed distribution, the scores
tend to bunch up at one end. However, if we imagine a cardboard cut-out of the histogram,
the midpoint lies on the line that divides the histogram into two equal areas (Histogram 9).
Using this method, we would estimate the centre of the distribution to lie somewhere
between 35 and 40, but closer to 35, so we might opt for 37. However, remember that
this is only an estimate.
[Histogram 9: Frequency (vertical axis, 0 to 5); horizontal axis 15 to 50; label: line that divides the area of the histogram in half]
Spread
If the histogram is single-peaked, is it narrow? This would indicate that most of the data values
in the distribution are tightly clustered in a small region. Or is the peak broad? This would
indicate that the data values are more widely spread out. Histograms 10 and 11 are both
single-peaked. Histogram 10 has a broad peak, indicating that the data values are not very tightly
clustered about the centre of the distribution. In contrast, Histogram 11 has a narrow peak,
indicating that the data values are tightly clustered around the centre of the distribution.
[Histogram 10 and Histogram 11: Frequency (vertical axes); horizontal axes 0 to 22]
But what do we mean by the spread of a distribution? We will return to this in more detail
later. For a histogram we will take it to be the maximum range of the distribution.
Range
Range = largest value − smallest value
For example, Histogram 10 has a spread (maximum range) of 22 (22 − 0) units, which is
considerably greater than the spread of Histogram 11, which has a spread of 12 (18 − 6) units.
Example 8
[Histogram: Frequency (count) (vertical axis, 0 to 35) against Number of phones (per 1000 people) (horizontal axis, 170 to 1020)]
Solution
a Shape and outliers
b Centre: Count up the frequencies from either end to find the middle interval.
c Spread: Use the maximum range to estimate the spread.
It should be noted that, with grouped data, it is difficult to precisely determine the location of
the centre of a distribution from a histogram. So, when working with grouped data, it is
acceptable to state that the centre of a distribution lies in the interval 170–340. We will learn
how to solve this problem later in the chapter.
If you were using the histogram above to describe the distribution in a form suitable for a
statistical report, you might write as follows.
Report
For the 85 countries, the distribution of the number of phones per 1000 people is positively
skewed. The centre of the distribution lies somewhere in the interval 170–340 phones/1000
people. The spread of the distribution is 1020 phones/1000 people. There are no outliers.
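Counting up the frequencies to find the middle interval can be sketched in code. Because the column heights of the phone histogram are not listed in the text, the Python example below uses the grouped frequency table for average hours worked from Example 5 instead. It is an illustration only, and the function name `middle_interval` is hypothetical.

```python
def middle_interval(intervals, counts):
    """Return the interval that contains the middle data value."""
    n = sum(counts)
    position = (n + 1) / 2   # position of the middle value when counting from one end
    running_total = 0
    for interval, count in zip(intervals, counts):
        running_total += count
        if running_total >= position:
            return interval


# Grouped frequency table from Example 5 (average hours worked, 23 countries).
intervals = ["30.0-34.9", "35.0-39.9", "40.0-44.9", "45.0-49.9", "50.0-54.9"]
counts = [1, 6, 8, 5, 3]

print(middle_interval(intervals, counts))  # 40.0-44.9
```

With 23 values the middle value is the 12th; the running totals are 1, 7 and then 15, so the 12th value falls in the third interval. For an even number of values the two middle values could fall in neighbouring intervals, a case this sketch does not treat separately.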
Exercise 1D
1 Label each of the following histograms as approximately symmetric, positively skewed or
negatively skewed, and identify the following:
i the mode
a
[Histograms A, B, C and D: Frequency (vertical axes)]
2 These three histograms show the marks obtained by a group of students in three subjects.
[Histograms for Subject A, Subject B and Subject C: Frequency (vertical axis, 0 to 10) against Marks (horizontal axis, 2 to 50)]
a Are each of the distributions approximately symmetric or skewed?
b Are there any clear outliers?
c Determine the interval containing the central mark for each of the three subjects.
d In which subject was the spread of marks the least? Use the range to estimate the spread.
e In which subject did the marks vary most? Use the range to estimate the spread.
i the mode(s)
[Histograms A to F: Frequency (vertical axes)]
[Histogram: Frequency (count) (vertical axis, 0 to 6) against Pulse rate (beats per minute) (horizontal axis, 60 to 115)]
Report
For the ____ students, the distribution of pulse rates is ____ with an outlier. The
centre of the distribution lies in the interval ____ beats per minute and the spread of the
distribution is ____ beats per minute. The outlier lies in the interval ____ beats per minute.
Stem Leaf
2 5
13 2
and so on.
To construct a stem plot, enter the stems to the left of a vertical dividing line, and the leaves
for each data point to the right. Usually we first construct an unordered stem plot by
systematically plotting each data point as listed in the data set. From the unordered
stem-and-leaf plot an ordered stem plot is then easily obtained. In an ordered stem plot the
leaves increase in value as they move away from the stem. It is usually the ordered stem plot
that we want, because an ordered stem plot makes it easy to find the key values.
Example 9
0
1
2
3
4
5
0
1
2
3
4
5
0
1
2
3
4
5
3
6
0 3 1 9 1 7
1 2 3 5 7
2 6 0 5 6 6
3 6 0 7
4
5 5
unordered stem plot
0 1 1 3 3 7
1 2 3 5 7
2 0 1 2 5 6
3 0 6 7
4
5 5
ordered stem plot
8 3
7 1 2
8 9
6 6 7
Methods for determining the centre, spread and outliers from a stem plot
Centre (middle)
Count up from either end of the distribution until you find the middle
value; the value that has an equal number of data values either side.
For an odd number of data values, n, the middle value is the $\left(\frac{n+1}{2}\right)$th
value. Thus, the median will be an actual data value.
For an even number of data values, n, the middle value is the $\left(\frac{n+1}{2}\right)$th
value. Thus, the median will lie between two data values.
Spread (range)
Subtract the smallest data value from the largest data value.
Range = largest value − smallest value
Outliers
Data values that stand out from the main body of data are called outliers.
Their values can be read directly from the stem plot.
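The methods for the centre and the spread can be sketched in code. The Python example below is an illustration only: the function names are hypothetical, the first data set is the family sizes from Example 3, and the second data set is made up purely to show the even case. For an even number of values the sketch takes the point halfway between the two middle values, which is the usual convention.

```python
def middle_value(values):
    """Return the median, using the (n + 1)/2 position rule on the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2          # a whole number when n is odd
        return ordered[position - 1]     # lists are indexed from 0
    # n even: (n + 1)/2 falls halfway between positions n/2 and n/2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


family_sizes = [3, 3, 4, 4, 5, 3, 2, 4, 3, 5, 3]   # Example 3: n = 11, middle is the 6th value
print(middle_value(family_sizes), data_range(family_sizes))  # 3 3

even_example = [2, 3, 4, 5]                        # hypothetical data, n = 4
print(middle_value(even_example))                  # 3.5
```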
Example 10
If you were using the stem plot to describe the distribution in a form suitable for a statistical
report, you might write as follows.
Report
For the 23 students, the distribution of marks is approximately symmetric with an outlier.
The centre of the distribution is at 30 marks and the distribution has a spread of 45
marks. The outlier is a mark of 60.
Split stems
In some instances, using the simple process outlined above produces a stem plot that is too
bunched up to give us a good overall picture of the variation in the data. This is often the case
when the data values all have the same first digit or the same one or two first digits. For
example, a group of 17 VCE students recently sat for a statistics test marked out of 20. The
results are as shown below.
2 12 13 18 17 16 12 10 16 14 11 15 16 15 17
Using the process described in Example 10 to form a stem plot, we end up with a
bunched-up plot like the one below.
0 | 2 7 9
1 | 0 1 2 2 3 4 5 5 6 6 6 7 7 8
When this happens, the stem plot scale can be stretched out by splitting the stems. Generally
the stem is split into halves or fifths. For example, for the interval 10–19, the split stem system
works as follows.
Single stem             1 (10–19)
Stem split into halves  1 (10–14)
                        1 (15–19)
Stem split into fifths  1 (10–11)
                        1 (12–13)
                        1 (14–15)
                        1 (16–17)
                        1 (18–19)
In a stem plot with a single stem, the 1 represents the interval 10–19.
In a stem plot with its stem split into halves, the top 1 represents the interval 10–14,
while the bottom 1 represents the interval 15–19.
In a stem plot with its stem split into fifths, the top 1 represents the interval 10–11, the
second 1 represents the interval 12–13, the third 1 represents the interval 14–15, the
fourth 1 represents the interval 16–17, while the bottom 1 represents the interval 18–19.
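The splitting rule can be sketched in code. The Python sketch below is one possible way to do it, not a method given in the text: the function name `split_stem_plot`, its `parts` parameter and the `stem | leaves` layout are assumptions. It is written for whole numbers with one-digit leaves, and the marks are the 17 test marks as they appear in the single-stem plot.

```python
from collections import defaultdict


def split_stem_plot(values, parts=1):
    """Return the rows of an ordered stem plot.

    parts = 1 gives a single stem, 2 splits each stem into halves,
    and 5 splits each stem into fifths.
    """
    if parts not in (1, 2, 5):
        raise ValueError("parts must be 1, 2 or 5")
    width = 10 // parts                      # leaf digits covered by each row
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        rows[(stem, leaf // width)].append(leaf)

    first, last = min(rows), max(rows)
    lines = []
    stem, part = first
    while (stem, part) <= last:              # include empty rows between the ends
        leaves = " ".join(str(leaf) for leaf in rows.get((stem, part), []))
        lines.append(f"{stem} | {leaves}".rstrip())
        part += 1
        if part == parts:                    # move on to the next stem
            stem, part = stem + 1, 0
    return lines


marks = [2, 7, 9, 10, 11, 12, 12, 13, 14, 15, 15, 16, 16, 16, 17, 17, 18]
for parts in (1, 2, 5):
    print("\n".join(split_stem_plot(marks, parts)), end="\n\n")
```

Rows with no leaves are kept on purpose: the gaps are what make a value such as 2 stand out from the main body of the data once the stems are split.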
Comparison of stem plots with different split stems
Using a split stem plot to display the test marks can show features not revealed by a standard
plot. This can be seen in the next plot with the stem split into fifths, indicating that a mark of 2
is an outlier.
Single stem
0 | 2 7 9
1 | 0 1 2 2 3 4 5 5 6 6 6 7 7 8

Stem split into halves
0 | 2
0 | 7 9
1 | 0 1 2 2 3 4
1 | 5 5 6 6 6 7 7 8

Stem split into fifths
0 |
0 | 2
0 |
0 | 7
0 | 9
1 | 0 1
1 | 2 2 3
1 | 4 5 5
1 | 6 6 6 7 7
1 | 8
         Test 1 |   | Test 2
                | 0 | 8
              9 | 1 | 9
    9 8 6 6 5 0 | 2 | 0 4 5 7 8 8
  9 8 7 6 5 5 0 | 3 | 0 3 5 5 6 8 9
8 7 5 3 3 2 2 0 | 4 | 1 2 3 3 4 5
                | 5 | 0
Solution
Report
The distribution of the Test 1 marks is negatively skewed while the distribution of the
Test 2 marks is approximately symmetric. The two distributions have similar centres;
36.5 and 35. The spread of the Test 1 marks is less than the Test 2 marks; 29
compared to 42. There are no outliers.
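The centres and spreads quoted in the report can be checked with a short calculation. The Python sketch below is illustrative: the leaves are entered as we read them from the back-to-back stem plot, which is an assumption about the original layout, and the names `test1`, `test2`, `values` and `summary` are ours.

```python
from statistics import median

# Leaves keyed by stem, as read from the back-to-back plot (assumed reading).
test1 = {1: [9], 2: [0, 5, 6, 6, 8, 9], 3: [0, 5, 5, 6, 7, 8, 9],
         4: [0, 2, 2, 3, 3, 5, 7, 8]}
test2 = {0: [8], 1: [9], 2: [0, 4, 5, 7, 8, 8], 3: [0, 3, 5, 5, 6, 8, 9],
         4: [1, 2, 3, 3, 4, 5], 5: [0]}


def values(plot):
    """Rebuild the data values from stems and leaves."""
    return [10 * stem + leaf for stem, leaves in plot.items() for leaf in leaves]


def summary(plot):
    """Return (centre, spread): the median and the range."""
    data = values(plot)
    return median(data), max(data) - min(data)


print("Test 1:", summary(test1))
print("Test 2:", summary(test2))
```

Under this reading, each test has 22 marks, and the medians and ranges agree with the figures in the report.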
Dot plots
The simplest way to display numerical data is to form a dot plot. A dot plot consists of a
number line with each data point marked by a dot. When several data points have the same
value, the points are stacked on top of each other. Like stem plots, dot plots are a great way of
displaying small data sets and have the advantage of being very quick to construct by hand.
They are best when the data values are relatively close together.
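The stacking idea can be sketched as a plain-text dot plot. The Python sketch below is illustrative only: the function name `dot_plot`, the use of `*` for each dot and the sample ages are assumptions and do not come from the text.

```python
from collections import Counter


def dot_plot(values):
    """Return the rows of a text dot plot, top row first, number line last."""
    counts = Counter(values)
    ticks = list(range(min(values), max(values) + 1))   # the number line
    width = len(str(max(values))) + 1                   # column width per tick
    lines = []
    for level in range(max(counts.values()), 0, -1):    # tallest stack first
        row = "".join(("*" if counts[t] >= level else " ").rjust(width)
                      for t in ticks)
        lines.append(row.rstrip())
    lines.append("".join(str(t).rjust(width) for t in ticks))
    return lines


ages = [18, 19, 19, 20, 20, 20, 22]   # hypothetical ages in years
print("\n".join(dot_plot(ages)))
```

Every whole number between the smallest and largest value gets a place on the number line, so gaps in the data stay visible.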
Example 12
[Dot plot: Age (years), number line from 17 to 30]
Which graph?
One of the issues that you will face is choosing a suitable graph to display a distribution. The
following guidelines might help you in your decision-making. They are guidelines only,
because in some instances there may be more than one suitable graph.
Type of data   Graph                 Qualifications on use
Categorical    Bar chart
               Segmented bar chart
Numerical      Histogram
               Stem plot
               Dot plot
Exercise 1E
1 The data below give the urbanisation rates (%) in 23 countries.
54 99 22 20 31 3 22 9 25 3 56 12
16 9 29 6 28 100 17 9 35 27 12
a Construct an ordered stem plot.
b What advantage does a stem plot have over a histogram?
2 For each of the following stem plots (A, B and C):
a name its shape and note outliers (if any)
b locate the centre of the distribution
[Stem plots A, B and C, each with stems 0 to 6]
Cambridge University Press
3 The data below give the wrist circumference (in cm) of 15 men.
16.9 17.3 19.3 18.5 18.2 18.4 19.9 16.7 17.1 17.6 17.7 16.5 17.0 17.2 17.6
a Construct a stem plot for wrist circumference using:
i stems 16, 17, 18, 19
ii these stems split into halves
b Which stem plot appears to be more appropriate for the data?
c Use the stem plot with split stems to help you complete the report below.
Report
For the ______ men, the distribution of their wrist circumference is ______. The centre of
the distribution is at ______ cm and it has a spread of ______ cm. There are no outliers.
4 The data below give the weight (in kg) of 22 students.
57 58 62 84 64 74 57 55 56 60 75
68 59 72 110 56 69 56 50 60 75 58
a Construct a stem plot for weight using:
i stems 5, 6, 7, 8, 9, 10 and 11
ii these stems split into halves
b Use the stem plot with a split stem to write a brief report on the distribution of the
weights of the students in terms of shape (and outliers), centre and spread. Use the report
from Question 3 as a model.
5 The number of possessions (kicks, marks, handballs, knockouts etc.) recorded for players in a
football game between Carlton and Essendon is shown below.
Carlton
Essendon
10 44 32 44 19 35 11 5 24 28 21 32 21 59 21 12 19 26 23 22 29 34
22 34 36 20 14 25 16 19 32 32
14 29 8 22 21 26 44 19 21 22
a Display the data in the form of an ordered back-to-back stem plot.
b Complete the following report comparing the two distributions in terms of shape (and
outliers), centre and spread.
Report
The distribution of the number of possessions is ______ for both teams. The two
distributions have similar centres, at ______ and ______ possessions, respectively. The spread of
the distribution is less for Carlton, ______ possessions, compared to ______ possessions for
Essendon.
6 The following data give the number of children in the families of 14 VCE students:
1 6 2 5 5 3 4 4 2 7 3 4 3 4
a Construct a dot plot.
b What is the mode?
c What is:
i the centre?
ii the spread?
Review
Frequency table
Categorical data
Bar chart
The mode is the value or group of values that occurs most often
(frequently) in a data set. For example, for the data 2 1 1 3 3 2 5 1 6 1 1
2 1 1, the mode is 1, because it is the data value that occurs most often.
Numerical data
Histogram
Stem plot
Dot plot
A dot plot consists of a number line with each data point marked by a
dot; suitable for small sets of data only.
Describing the
distribution of a
numerical variable
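The definition of the mode in the key ideas above can be checked with a short calculation. The Python sketch below is illustrative: it uses the data set quoted in that definition and the standard-library function `statistics.multimode` (Python 3.8 or later), which returns every value that ties for most frequent. The second data set is hypothetical and is included only to show a tie.

```python
from statistics import multimode

data = [2, 1, 1, 3, 3, 2, 5, 1, 6, 1, 1, 2, 1, 1]   # data quoted in the definition
modes = multimode(data)
print(modes)

# Hypothetical data with a tie: the mode is then a group of values.
tied = [1, 1, 2, 2, 3]
print(multimode(tied))
```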
Skills check
Multiple-choice questions
The following information relates to Questions 1 to 3
A survey collected information about the number of cars owned by a family and the car
size (small, medium, large).
1 The variables Number of cars owned and car Size are:
A both categorical variables
B both numerical variables
C a categorical and a numerical variable respectively
D a numerical and a categorical variable respectively
E neither numerical nor categorical variables
2 To graphically display the information about car size you could use a:
A dot plot
B stem plot
C histogram
D segmented bar chart
E back-to-back stem plot
3 The Number of cars owned is:
A a continuous numerical variable
C a continuous categorical variable
E none of the above
Frequency
Count Percentage
73
70
29.2
19.2
23.6
59
250
4 The percentage of students who said that listening to music was their favourite
leisure activity is:
A 17.5
B 28.0
C 29.2
D 50.0
E 70.0
5 The number of students who said watching TV was their favourite leisure activity
is:
A 19
B 48
C 62
D 125
E 70.0
6 For the students surveyed, the most popular leisure activity is:
A sport
B listening to music
C watching TV
D other
E can't tell
Frequency
D 17
E 28
E 1820
1
1
2
2
3
0
5
3
5
0
2
5
3
7
1
6 9
4
9 9 9
2 4
[Bar chart: Percentage (0 to 60) for the categories Brown, Blonde, Black, Red and Other]
D red
E other
15 The ages of 11 primary school children were collected. The best graph to display
the distribution of ages of these children would be a:
A bar chart
B dot plot
C histogram
D segmented bar chart
E stem plot
Extended-response questions
1 One hundred and twenty-one students were
asked to identify their preferred leisure activity.
The results of the survey are displayed in a
bar chart.
[Bar chart: Percentage (5 to 30) for the leisure activities TV, Music, Movies, Reading, Other and Sport]
2 The number of people killed in natural and non-natural disasters in 1997 by world
region is shown in the table below.
Region         Number killed
Europe         874
Africa         8 327
Asia           10 551
Oceania        457
The Americas   1 581
a Construct a bar chart.
b In which region was the:
i greatest number of people killed?
ii least number of people killed?
Legalise      Frequency
              Count   Per cent
Agree         18
Disagree      26
Don't know    8
Total         52
Report: In response to the question, 'Do you agree that the use of marijuana
should be legalised?', 50% of the 52 students ______. Of the remaining
students, ______% agreed, while ______% said that they ______.
4 The table below gives the distribution of the number of children in 50 families.
a Is the number of children in a
family a numerical or categorical
variable?
b Write down the missing information.
c What is the mode?
d Determine the number of
families with:
i three children
ii two or three children
iii less than three children
e Determine the percentage of
families with:
i six children
ii more than six children
iii less than six children
Number of children
in family
0
1
2
3
4
5
6
7
8
Total
5
6
19
7
10
2
3
0
1
50
4
6
0
2
100
38
14
5 Students were asked how much they spent on entertainment each month. The
results are displayed in the histogram. Use this information to answer the
following questions.
a How many students:
i were surveyed?
ii spent $100–105 per month?
b What is the mode?
c How many students spent $110 or more per month?
d What percentage spent less than $100 per month?
Frequency
Count Per cent
[Histogram: Frequency (0 to 10) against Amount ($), from 90 to 140]
6 This stem plot displays the ages (in years) of a group of women.
Note: 17 | 2 = 17.2 years
17 | 2 3 4
17 | 5 6 6 8 8 9 9
18 | 0 1 3 3 3 4
18 | 5 5 5 5 5 5 6 7 8 8 8 9
19 | 1 2 2 3 3
19 | 8
20 |
20 | 6
a What was the age of the youngest woman?
b In terms of age, one of the women is a possible outlier. What is her age?
c How many women were aged between 17.0 and 17.4 years, inclusive?
d How many women were 19 years old or older?
e What is the modal age category?
f What percentage of women were younger than 20 years old?
g i Name the shape of the distribution of ages, noting outliers.
ii Locate the centre of the distribution.
iii Determine the spread of the distribution.
[Histogram: Frequency (0 to 10) against Waiting time (seconds), from 5 to 55]
8 Use a graphics calculator to construct histograms for the following sets of data.
a Use intervals of width 5 starting at 90.
Monthly expenditure on entertainment (in dollars)
110 115 105
95 114 104
97 130 122
93
65
68
74
73
73
75
71
72
61
67
66
50
66
64
72
74
48
41
44
44
49
48
48
37