1.1. Most students will prefer to work in seconds, to avoid having to work with decimals or
fractions.
1.2. Who? The individuals in the data set are students in a statistics class. What? There are
eight variables: ID (a label, with no units); Exam1, Exam2, Homework, Final, and Project
(in units of points, scaled from 0 to 100); TotalPoints (in points, computed from the other
scores, on a scale of 0 to 900); and Grade (A, B, C, D, and E). Why? The primary purpose
of the data is to assign grades to the students in this class, and (presumably) the variables
are appropriate for this purpose. (The data might also be useful for other purposes.)
1.3. Exam1 = 79, Exam2 = 88, Final = 88.
1.4. For this student, TotalPoints = 2 × 86 + 2 × 82 + 3 × 77 + 2 × 90 + 80 = 827, so the grade is B.
1.5. The cases are apartments. There are five variables: rent (quantitative), cable (categorical),
pets (categorical), bedrooms (quantitative), distance to campus (quantitative).
1.6. (a) To find injuries per worker, divide the rates in Example 1.6 by 100,000 (or, redo the
computations without multiplying by 100,000). For wage and salary workers, there are
0.000034 fatal injuries per worker. For self-employed workers, there are 0.000099 fatal
injuries per worker. (b) These rates are 1/10 the size of those in Example 1.6, or 10,000
times larger than those in part (a): 0.34 fatal injuries per 10,000 wage/salary workers, and
0.99 fatal injuries per 10,000 self-employed workers. (c) The rates in Example 1.6 would
probably be more easily understood by most people, because numbers like 3.4 and 9.9 feel
more familiar. (It might be even better to give rates per million workers: 34 and 99.)
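The rescaling in 1.6 amounts to multiplying by the ratio of the two bases. A minimal sketch follows, using the rates 3.4 and 9.9 per 100,000 quoted above; the function name `rescale` and the dictionary are illustrative choices, not part of the text.

```python
# Fatal injury rates per 100,000 workers, as quoted in the solution to 1.6.
rates_per_100k = {"wage and salary": 3.4, "self-employed": 9.9}

def rescale(rate, old_base, new_base):
    """Convert a rate per old_base workers into a rate per new_base workers."""
    return rate * new_base / old_base

for group, rate in rates_per_100k.items():
    per_worker = rescale(rate, 100_000, 1)
    per_10k = rescale(rate, 100_000, 10_000)
    per_million = rescale(rate, 100_000, 1_000_000)
    print(f"{group}: {per_worker:.6f} per worker, "
          f"{per_10k:.2f} per 10,000, {per_million:.0f} per million")
```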
1.7. Shown are two possible stemplots; the first uses split stems (described on page 11 of the text). The scores are slightly left-skewed; most range from 70 to the low 90s.

  5 | 58
  6 | 0
  6 | 58
  7 | 0023
  7 | 5558
  8 | 00003
  8 | 5557
  9 | 0002233
  9 | 8

  5 | 58
  6 | 058
  7 | 00235558
  8 | 000035557
  9 | 00022338
1.8. Preferences will vary. However, the stemplot in Figure 1.8 shows a bit more detail, which
is useful for comparing the two distributions.
1.9. (a) The stemplot of the altered data is shown on the right. (b) Blank stems
should always be retained (except at the beginning or end of the stemplot),
because the gap in the distribution is an important piece of information about
the data.
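[Stemplot of the altered data, with stems 1, 2, 2, 3, 3, 4, 4, 5, 6 (stems 2, 3, and 4 split). The extracted leaf rows are 5568, 34, 55678, 012233, 8, and 1; which stems are blank is not recoverable.]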
Chapter 1
Looking at Data – Distributions

[Figures: three histograms of first exam scores (frequency on the vertical axis), drawn with different class intervals.]
1.13. Using either a stemplot or histogram, we see that the distribution is left-skewed, centered
near 80, and spread from 55 to 98. (Of course, a histogram would not show the exact values
of the maximum and minimum.)
1.14. (a) The cases are the individual employees. (b) The first four (employee identification
number, last name, first name, and middle initial) are labels. Department and education level
are categorical variables; number of years with the company, salary, and age are quantitative
variables. (c) Column headings in student spreadsheets will vary, as will sample cases.
1.15. A Web search for "city rankings" or "best cities" will yield lots of ideas, such as crime
rates, income, cost of living, entertainment and cultural activities, taxes, climate, and school
system quality. (Students should be encouraged to think carefully about how some of these
might be quantitatively measured.)
Solutions
1.16. Recall that categorical variables place individuals into groups or categories, while
quantitative variables take numerical values for which arithmetic operations ... make sense.
Variables (a), (d), and (e) (age, amount spent on food, and height) are quantitative. The
answers to the other three questions (about dancing, musical instruments, and broccoli) are
categorical variables.
1.18. Student answers will vary. A Web search for "college ranking methodology" gives
some ideas; in a recent year, U.S. News and World Report used 16 measures of academic
excellence, including academic reputation (measured by surveying college and university
administrators), retention rate, graduation rate, class sizes, faculty salaries, student-faculty
ratio, percentage of faculty with highest degree in their fields, quality of entering students
(ACT/SAT scores, high school class rank, enrollment-to-admission ratio), financial resources,
and the percentage of alumni who give to the school.
[Figure: bar graph of percent of respondents by color.]
1.19. For example, blue is by far the most popular choice; 70% of respondents chose 3 of the
10 options (blue, green, and purple).
[Figure: bar graph of percent of respondents by color; axis label "Favorite color".]
1.21. (a) There were 232 total respondents. The table that follows gives the percents; for
example, \(10/232 \doteq 4.31\%\). (b) The bar graph is shown below. (c) For example, 87.5%
of the group were between 19 and 50. (d) The age-group classes do not have equal width:
The first is 18 years wide, the second is 6 years wide, the third is 11 years wide, etc.
Note: In order to produce a histogram from the given data, the bar for the first age
group would have to be three times as wide as the second bar, the third bar would have to
be wider than the second bar by a factor of 11/6, etc. Additionally, if we change a bar's
width by a factor of x, we would need to change that bar's height by a factor of 1/x.
  Age group (years)   Percent
  1 to 18               4.31%
  19 to 24             41.81%
  25 to 35             30.17%
  36 to 50             15.52%
  51 to 69              6.03%
  70 and over           2.16%

[Figure: bar graph of percent by age group.]
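The note in 1.21 about bar widths and heights can be illustrated with a small sketch. It uses the class percents from the table above and assumes that ages are recorded in whole years, so that "1 to 18" spans 18 years; the open-ended "70 and over" class is omitted because it has no definite width. The function name is hypothetical.

```python
# Age classes (lower, upper, percent) from the table in the solution to 1.21.
# The open-ended "70 and over" class is left out: it has no definite width.
classes = [(1, 18, 4.31), (19, 24, 41.81), (25, 35, 30.17),
           (36, 50, 15.52), (51, 69, 6.03)]

def percent_per_year(lower, upper, percent):
    """Bar height that keeps the bar's area proportional to the class percent."""
    width = upper - lower + 1  # whole-year ages, so 1 to 18 spans 18 years
    return percent / width

for lower, upper, percent in classes:
    width = upper - lower + 1
    height = percent_per_year(lower, upper, percent)
    print(f"{lower:2d} to {upper:2d}: width {width:2d}, height {height:.2f}% per year")
```

Dividing by the width is the "factor of 1/x" adjustment described in the note: a bar three times as wide is drawn one-third as tall for the same percent.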
1.22. (a) & (b) The bar graph and pie charts are shown below. (c) A clear majority (76%)
agree or strongly agree that they browse more with the iPhone than with their previous
phone. (d) Student preferences will vary. Some might prefer the pie chart because it is more
familiar.
[Figures: bar graph and pie chart of response percent for the four responses (strongly disagree, mildly disagree, mildly agree, strongly agree).]
[Figure: bar graph of replacement percent by phone type.]
1.24. (a) The weights add to 254.2 million tons, and the percents add to 99.9.
(b) & (c) The bar graph and pie chart are shown below.
[Figures: bar graph and pie chart of percent of total weight by source (paper and paperboard, yard trimmings, food scraps, plastics, metals, rubber/leather/textiles, wood, glass, other).]
1.25. (a) & (b) Both bar graphs are shown below. (c) The ordered bars in the graph from (b)
make it easier to identify those materials that are frequently recycled and those that are not.
(d) Each percent represents part of a different whole. (For example, 2.6% of food scraps are
recycled; 23.7% of glass is recycled, etc.)
[Figures: two bar graphs of percent recycled by material, the second with the bars in order of height.]
[Figures: bar graph by search engine (Google, Yahoo, MSN, Other); bar graphs by type of spam.]
1.28. (a) The bar graph is below. (b) The number of Facebook users trails off rapidly after the
top seven or so. (Of course, this is due in part to the variation in the populations of these
countries. For example, that Norway has nearly half as many Facebook users as France is
remarkable, because the 2008 populations of France and Norway were about 62.3 million
and 4.8 million, respectively.)
[Figure: bar graph of the number of Facebook users by country.]
1.29. (a) Most countries had moderate (single- or double-digit) increases in Facebook usage. Chile (2197%) is an extreme outlier, as are (maybe) Venezuela (683%) and Colombia (246%). (b) In the stemplot on the right, Chile and Venezuela have been omitted, and stems are split five ways. (c) One observation is that, even without the outliers, the distribution is right-skewed. (d) The stemplot can show some of the detail of the low part of the distribution, if the outliers are omitted.
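[Stemplot with stems 0, 1, and 2, each split five ways. The extracted leaf rows are 000, 2333, 4444, 6, 99, 33, and 59; the alignment of leaves with stems is not recoverable.]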
[Figure: bar graph by graduate degree (M.B.A., M.Ed., other M.A., other M.S., M.D., Law, Theology, Ed.D., other Ph.D.).]
[Figures: bar graphs of percent by color for luxury cars and for intermediate cars.]
1.32. This distribution is skewed to the right, meaning that Shakespeare's plays contain many
short words (up to six letters) and fewer very long words. We would probably expect most
authors to have skewed distributions, although the exact shape and spread will vary.
1.33. Shown is the stemplot; as the text suggests, we have trimmed numbers (dropped the last digit) and split stems. 359 mg/dl appears to be
an outlier. Overall, glucose levels are not under control: Only 4 of the
18 had levels in the desired range.
  0 | 799
  1 | 0134444
  1 | 5577
  2 | 0
  2 | 57
  3 |
  3 | 5

  Individual |   | Class
             | 0 | 799
          22 | 1 | 0134444
    99866655 | 1 | 5577
       22222 | 2 | 0
           8 | 2 | 57
             | 3 |
             | 3 | 5
1.35. The distribution is roughly symmetric, centered near 7 (or between 6 and 7), and
spread from 2 to 13.
1.36. (a) Total emissions would almost certainly be higher for very large countries; for example, we would expect that even with great attempts to control emissions, China (with over 1 billion people) would have higher total emissions than the smallest countries in the data set. (b) A stemplot is shown; a histogram would also be appropriate. We see a strong right skew with a peak from 0 to 2 metric tons per person and a smaller peak from 8 to 10. The three highest countries (the United States, Canada, and Australia) appear to be outliers; apart from those countries, the distribution is spread from 0 to 11 metric tons per person.

  0 | 00000000000000011111
  0 | 222233333
  0 | 445
  0 | 6677
  0 | 888999
  1 | 001
  1 |
  1 |
  1 | 67
  1 | 9
1.37. To display the distribution, use either a stemplot or a histogram. DT scores are skewed to the right, centered near 5 or 6, spread from 0 to 18. There are no outliers. We might also note that only 11 of these 264 women (about 4%) scored 15 or higher.

  0 | 000000000000000000000000000000000000011111111111111111111
  0 | 2222222222222222233333333333333333333333
  0 | 444444444444444444445555555555555555555
  0 | 666666666666666666667777777777777
  0 | 888888888888888999999999999999999
  1 | 000000000000111111111
  1 | 22222222222233333333333
  1 | 444444455
  1 | 66666777
  1 | 8
1.38. (a) The first histogram shows two modes: 5 to 5.2 and 5.6 to 5.8. (b) The second histogram
has peaks in locations close to those of the first, but these peaks are much less pronounced,
so they would usually not be viewed as distinct modes. (c) The results will vary with the
software used.
[Figures: two histograms of rainwater pH (frequency on the vertical axis), drawn with different class intervals.]
1.39. Graph (a) is studying time (Question 4); it is reasonable to expect this to be right-skewed
(many students study little or not at all; a few study longer).
Graph (d) is the histogram of student heights (Question 3): One would expect a fair
amount of variation but no particular skewness to such a distribution.
The other two graphs are (b) handedness and (c) gender, unless this was a particularly
unusual class! We would expect that right-handed students should outnumber lefties
substantially. (Roughly 10 to 15% of the population as a whole is left-handed.)
1.40. Sketches will vary. The distribution of coin years would be left-skewed because newer
coins are more common than older coins.
1.41. (a) Not only are most responses multiples of 10; many are multiples of 30 and 60. Most people will round their answers when asked to give an estimate like this; in fact, the most striking answers are ones such as 115, 170, or 230. The students who claimed 360 minutes (6 hours) and 300 minutes (5 hours) may have been exaggerating. (Some students might also consider suspicious the student who claimed to study 0 minutes per night. As a teacher, I can easily believe that such students exist, and I suspect that some of your students might easily accept that claim as well.) (b) The stemplots suggest that women (claim to) study more than men. The approximate centers are 175 minutes for women and 120 minutes for men.

            Women |   | Men
                  | 0 | 033334
               96 | 0 | 66679999
         22222221 | 1 | 2222222
  888888888875555 | 1 | 558
             4440 | 2 | 00344
                  | 2 |
                  | 3 | 0
                6 | 3 |
1.42. The stemplot gives more information than a histogram (since all the
original numbers can be read off the stemplot), but both give the same impression. The distribution is roughly symmetric with one value (4.88) that
is somewhat low. The center of the distribution is between 5.4 and 5.5 (the
median is 5.46, the mean is 5.448); if asked to give a single estimate for the
true density of the earth, something in that range would be the best answer.
  48 | 8
  49 |
  50 | 7
  51 | 0
  52 | 6799
  53 | 04469
  54 | 2467
  55 | 03578
  56 | 12358
  57 | 59
  58 | 5
1.43. (a) There are four variables: GPA, IQ, and self-concept are quantitative, while gender
is categorical. (OBS is not a variable, since it is not really a characteristic of a student.)
(b) Below. (c) The distribution is skewed to the left, with center (median) around 7.8. GPAs
are spread from 0.5 to 10.8, with only 15 below 6. (d) There is more variability among the
boys; in fact, there seems to be a subset of boys with GPAs from 0.5 to 4.9. Ignoring that
group, the two distributions have similar shapes.
   0 | 5
   1 | 8
   2 | 4
   3 | 4689
   4 | 0679
   5 | 1259
   6 | 0112249
   7 | 22333556666666788899
   8 | 0000222223347899
   9 | 002223344556668
  10 | 01678

    Female |    | Male
           |  0 | 5
           |  1 | 8
           |  2 | 4
         4 |  3 | 689
         7 |  4 | 069
       952 |  5 | 1
      4210 |  6 | 129
  98866533 |  7 | 223566666789
    997320 |  8 | 0002222348
     65300 |  9 | 2223445668
       710 | 10 | 68
   7 | 24
   7 | 79
   8 |
   8 | 69
   9 | 0133
   9 | 6778
  10 | 0022333344
  10 | 555666777789
  11 | 0000111122223334444
  11 | 55688999
  12 | 003344
  12 | 677888
  13 | 02
  13 | 6

  2 | 01
  2 | 8
  3 | 0
  3 | 5679
  4 | 02344
  4 | 6799
  5 | 1111223344444
  5 | 556668899
  6 | 00001233344444
  6 | 55666677777899
  7 | 0000111223
  7 |
  8 | 0
[Figure: time plot by year, 1970 to 2005; vertical axis from 140 to 190.]
1.47. The total for the 24 countries was 897 days, so with Suriname, it is 897 + 694 = 1591
days, and the mean is \(\bar{x} = 1591/25 = 63.64\) days.
1.48. The mean score is \(\bar{x} = 821/10 = 82.1\).
1.49. To find the ordered list of times, start with the 24 times in Example 1.23, and add 694 to
the end of the list. The ordered times (with the median, 40, in the 13th position) are
4, 11, 14, 23, 23, 23, 23, 24, 27, 29, 31, 33, 40,
42, 44, 44, 44, 46, 47, 60, 61, 62, 65, 77, 694
The outlier increases the median from 36.5 to 40 days, but the change is much less than the
outlier's effect on the mean.
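The contrast between the mean and the median in 1.47 and 1.49 can be reproduced with Python's standard `statistics` module. The list is the 24 ordered times shown above; the variable names are illustrative.

```python
from statistics import mean, median

# The 24 times listed in the solution to 1.49, before Suriname is added.
times = [4, 11, 14, 23, 23, 23, 23, 24, 27, 29, 31, 33,
         40, 42, 44, 44, 44, 46, 47, 60, 61, 62, 65, 77]
with_outlier = times + [694]

print("mean:  ", mean(times), "->", mean(with_outlier))
print("median:", median(times), "->", median(with_outlier))
```

The single value 694 moves the mean by about 26 days but the median by only 3.5 days.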
1.50. The median of the service times is 103.5 seconds. (This is the average of the 40th and
41st numbers in the sorted list, but for a set of 80 numbers, we assume that most students
will compute the median using software, which does not require that the data be sorted.)
1.51. In order, the scores are:
55, 73, 75, 80, 80, 85, 90, 92, 93, 98
The middle two scores are 80 and 85, so the median is \(M = (80 + 85)/2 = 82.5\).
    …     2     2     3     4     9     9     9    11    19
    …    25    30    35    40    44    48    51    52    54
    …    56    57    59    64    67    68    73    73    75
    …    76    76    77    80    88    89    90   102   103
    …   106   115   116   118   121   126   128   137   138
    …   141   143   148   148   157   178   179   182   199
    …   203   211   225   274   277   289   290   325   367
    …   386   438   465   479   700   700   951  1148  2631
This confirms the five-number summary (1, 54.5, 103.5, 200, and 2631 seconds)
given in Example 1.26. The sum of the 80 numbers is 15,726 seconds, so the mean is
\(\bar{x} = 15{,}726/80 = 196.575\) seconds (the value 197 in the text was rounded).
Note: The most tedious part of this process is sorting the numbers and adding them
all up. Unless you really want to confirm that your students can sort a list of 80 numbers,
consider giving the students the sorted list of times, and checking their ability to identify the
locations of the quartiles.
1.54. The median and quartiles were found earlier; the minimum and maximum are easy to
locate in the ordered list of scores (see the solutions to Exercises 1.51 and 1.52), so the
five-number summary is Min = 55, \(Q_1 = 75\), \(M = 82.5\), \(Q_3 = 92\), Max = 98.
1.55. Use the five-number summary from the solution to Exercise 1.54:
[Figure: boxplot of the exam scores, on an axis running from 50 to 95.]
1.56. The interquartile range is \(\mathrm{IQR} = Q_3 - Q_1 = 92 - 75 = 17\), so the \(1.5 \times \mathrm{IQR}\) rule would
consider as outliers scores outside the range \(Q_1 - 25.5 = 49.5\) to \(Q_3 + 25.5 = 117.5\).
According to this rule, there are no outliers.
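The five-number summary and the 1.5 × IQR rule in 1.54 through 1.56 can be sketched as follows. The quartile convention used here (each quartile is the median of the observations on one side of the overall median's position) is the one that reproduces the values above; library routines such as `statistics.quantiles` use other conventions by default and may give slightly different quartiles. The function name is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, M, Q3, Max, with each quartile taken as the median of one half of the sorted data."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]              # observations below the median's position
    upper = data[len(data) - half:]  # observations above the median's position
    return data[0], median(lower), median(data), median(upper), data[-1]

scores = [55, 73, 75, 80, 80, 85, 90, 92, 93, 98]  # ordered scores from 1.51
low, q1, m, q3, high = five_number_summary(scores)
iqr = q3 - q1
fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outliers = [x for x in scores if x < fences[0] or x > fences[1]]

print((low, q1, m, q3, high), iqr, fences, outliers)
```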
1.57. The variance can be computed from the formula \(s^2 = \frac{1}{n-1} \sum (x_i - \bar{x})^2\); for
example, the first term in the sum would be \((80 - 82.1)^2 = 4.41\). However, in practice,
software or a calculator is the preferred approach; this yields \(s^2 = 1416.9/9 \doteq 157.43\) and
\(s = \sqrt{s^2} \doteq 12.5472\).
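As a check on the formula in 1.57, the following sketch computes the variance directly from the definition, using the ten scores listed in 1.51. Variable names are illustrative.

```python
from math import sqrt

scores = [55, 73, 75, 80, 80, 85, 90, 92, 93, 98]  # ordered scores from 1.51

n = len(scores)
x_bar = sum(scores) / n
squared_deviations = [(x - x_bar) ** 2 for x in scores]
variance = sum(squared_deviations) / (n - 1)  # divide by n - 1, not n
std_dev = sqrt(variance)

print(f"mean = {x_bar}, s^2 = {variance:.2f}, s = {std_dev:.4f}")
```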
\(950/4 = 237.5\) points.
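  Q1     M        Q3       Max
  4589   7558.5   13,416   66,667

[Stemplot with split stems 0 to 6. The extracted leaf rows, in order, are:
  333333333333333333444444444444
  55555555566666677777777778888889
  00001112223333333
  79
  01111233
  559
  114
  5
The alignment of leaves with stems is not recoverable.]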
All points
  0 | 4
  0 |
  1 |
  1 |
  2 |
  2 |
  3 |
  3 | 88
  4 | 11111122222222223334444
  4 | 555555666667777777778889999999999
  5 | 000000011224
  5 | 5666688999999
  6 | 1
  6 | 5

Without O'Doul's
  3 | 88
  4 | 111111
  4 | 2222222222333
  4 | 4444555555
  4 | 66666777777777
  4 | 8889999999999
  5 | 000000011
  5 | 22
  5 | 45
  5 | 6666
  5 | 88999999
  6 | 1
  6 |
  6 | 5
1.63. All of these numbers are given in the table in the solution to the previous exercise.
(a) \(\bar{x}\) changes from 4.76% (with) to 4.81% (without); the median (4.7%) does not change.
(b) \(s\) changes from 0.7523% to 0.5864%; \(Q_1\) changes from 4.3% to 4.35%, while \(Q_3 = 5\%\)
does not change. (c) A low outlier decreases \(\bar{x}\); any kind of outlier increases \(s\). Outliers
have little or no effect on the median and quartiles.
1.64. (a) A stemplot or histogram can be used to display the distribution. Students may report either mean/standard deviation or the five-number summary (in units of calories):

  \(\bar{x}\)   \(s\)     Min   \(Q_1\)   \(M\)     \(Q_3\)   Max
  141.06   27.79   70    113   145.5   157   210

   7 | 0
   8 |
   9 | 4556889
  10 | 2458
  11 | 00000000334
  12 | 08
  13 | 0235558
  14 | 22333444555666788899
  15 | 0012233356777
  16 | 00012336669
  17 | 01459
  18 | 8
  19 | 5
  20 | 00
  21 | 0
1.65. Use a small data set with an odd number of points, so that the median is the middle
number. After deleting the lowest observation, the median will be the average of that middle
number and the next number after it; if that latter number is much larger, the median will
change substantially. For example, start with 0, 1, 2 , 998, 1000; after removing 0, the
median changes from 2 to 500.
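The example in 1.65 can be confirmed in a few lines with the standard `statistics` module; the variable names are illustrative.

```python
from statistics import median

data = [0, 1, 2, 998, 1000]   # the small data set suggested in 1.65
trimmed = data[1:]            # delete the lowest observation

print(median(data), "->", median(trimmed))
```

With four values left, the median becomes the average of 2 and 998, which is why a single deletion moves it so far.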
1.66. Salary distributions (especially in professional sports) tend to be skewed to the right. This
skew makes the mean higher than the median.
1.67. (a) The distribution is left-skewed. While the skew makes the five-number summary preferable, some students might give the mean/standard deviation. In ounces, these statistics are:

  \(\bar{x}\)   \(s\)     Min   \(Q_1\)   \(M\)   \(Q_3\)   Max
  6.456   1.425   3.7   4.95   6.7   7.85   8.2

  3 | 7
  4 | 3
  4 | 7777
  5 | 23
  5 |
  6 | 0033
  6 | 7
  7 | 03
  7 | 668899999
  8 | 2

(b) The numerical summary does not reveal the two weight clusters (visible in a stemplot or histogram). (c) For small potatoes (less than 6 oz), \(n = 8\), \(\bar{x} = 4.662\) oz, and \(s = 0.501\) oz. For large potatoes, \(n = 17\), \(\bar{x} = 7.300\) oz, and \(s = 0.755\) oz. Because there are clearly two groups, it seems appropriate to treat them separately.
1.68. (a) The five-number summary is Min = 2.2 cm, \(Q_1 = 10.95\) cm, \(M = 28.5\) cm, \(Q_3 = 41.9\) cm, Max = 69.3 cm. (b) & (c) The boxplot and histogram are shown below. (Students
might choose different interval widths for the histogram.) (d) Preferences will vary. Both
plots reveal the right-skew of this distribution, but the boxplot does not show the two peaks
visible in the histogram.
[Figures: boxplot and histogram (frequency) of diameter at breast height (cm).]
1.69. (a) The five-number summary is Min = 0 mg/l, \(Q_1 = 0\) mg/l, \(M = 5.085\) mg/l, \(Q_3 = 9.47\) mg/l, Max = 73.2 mg/l. (b) & (c) The boxplot and histogram are shown below.
(Students might choose different interval widths for the histogram.) (d) Preferences will
vary. Both plots reveal the sharp right-skew of this distribution, but because Min = \(Q_1\), the
boxplot looks somewhat strange. The histogram seems to convey the distribution better.
[Figures: boxplot and histogram (frequency) of CRP (mg/l).]
1.70. Answers depend on whether natural (base-e) or common (base-10) logarithms are used. Both sets of answers are shown here. If this exercise is assigned, it would probably be best for the sanity of both instructor and students to specify which logarithm to use.
(a) The five-number summary is:

  Logarithm   Min   \(Q_1\)   \(M\)      \(Q_3\)     Max
  Natural     0     0     1.8048   2.3485   4.3068
  Common      0     0     0.7838   1.0199   1.8704

(The ratio between these answers is roughly \(\ln 10 \doteq 2.3\).)
(b) & (c) The boxplots and histograms are shown below. (Students might choose different
interval widths for the histograms.) (d) As for Exercise 1.69, preferences will vary.
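The remark about the ratio of the two sets of answers follows from the change-of-base identity \(\ln y = (\ln 10)\log_{10} y\). A quick check, using the maximum CRP value of 73.2 mg/l from 1.69 and the transformation log(1 + CRP) named on the axis labels of the figures for this exercise:

```python
from math import log, log10

# Maximum CRP value from the five-number summary in 1.69 (mg/l).
crp_max = 73.2

natural = log(1 + crp_max)    # natural logarithm of (1 + CRP)
common = log10(1 + crp_max)   # base-10 logarithm of (1 + CRP)

print(f"ln(1 + CRP) = {natural:.4f}, log10(1 + CRP) = {common:.4f}")
print(f"ratio = {natural / common:.4f}, ln 10 = {log(10):.4f}")
```

Because the logarithm is increasing, the minimum and maximum transform exactly; quartiles and medians that average two observations need not transform exactly.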
[Figures: boxplots and histograms (frequency) of the natural and base-10 logarithms of (1 + CRP).]
1.71. (a) The five-number summary (in units of µmol/l) is Min = 0.24, \(Q_1 = 0.355\), \(M = 0.76\), \(Q_3 = 1.03\), Max = 1.9. (b) & (c) The boxplot and histogram are shown below.
(Students might choose different interval widths for the histogram.) (d) The distribution is
right-skewed. A histogram (or stemplot) is preferable because it reveals an important feature
not evident from a boxplot: This distribution has two peaks.
[Figures: boxplot and histogram (frequency) of retinol level.]
1.72. The mean and standard deviation for these ratings are \(\bar{x} = 5.9\) and \(s \doteq 3.7719\); the five-number summary is Min = \(Q_1\) = 1, \(M = 6.5\), \(Q_3\) = Max = 10. For a graphical presentation, a stemplot (or histogram) is better than a boxplot because the latter obscures details about the distribution. (With a little thought, one might realize that Min = \(Q_1\) = 1 and \(Q_3\) = Max = 10 means that there are lots of 1s and lots of 10s, but this is much more evident in a stemplot or histogram.)

   1 | 0000000000000000
   2 | 0000
   3 | 0
   4 | 0
   5 | 00000
   6 | 000
   7 | 0
   8 | 000000
   9 | 00000
  10 | 000000000000000000
1.73. The distribution of household net worth would almost surely be strongly skewed to the
right: Most families would generally have accumulated little or modest wealth, but a few
would have become rich. This strong skew pulls the mean to be higher than the median.
1.74. See also the solution to Exercise 1.36. (a) The five-number summary (in units of metric tons per person) is:
Min = 0, \(Q_1 = 0.75\), \(M = 3.2\), \(Q_3 = 7.8\), Max = 19.9
The evidence for the skew is in the large gaps between the higher numbers; that is, the differences \(Q_3 - M\) and \(\text{Max} - Q_3\) are large compared to \(Q_1 - \text{Min}\) and \(M - Q_1\). (b) The IQR is \(Q_3 - Q_1 = 7.05\), so outliers would be less than \(-9.825\) or greater than 18.375. According to this rule, only the United States qualifies as an outlier, but Canada and Australia seem high enough to also include them.

  0 | 00000000000000011111
  0 | 222233333
  0 | 445
  0 | 6677
  0 | 888999
  1 | 001
  1 |
  1 |
  1 | 67
  1 | 9
1.75. The total salary is $690,000, so the mean is \(\bar{x} = \$690{,}000/9 \doteq \$76{,}667\). Six of the nine
employees earn less than the mean. The median is M = $35,000.
1.76. If three individuals earn $0, $0, and $20,000, the reported median is $20,000. If the two
individuals with no income take jobs at $14,000 each, the median decreases to $14,000.
The same thing can happen to the mean: In this example, the mean drops from $20,000 to
$16,000.
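The scenario in 1.76 depends on the reported statistics being computed only over people who have income. The sketch below makes that convention explicit; the function name and the filtering rule are assumptions chosen to match the numbers in the solution.

```python
from statistics import mean, median

def summarize_earners(incomes):
    """Median and mean over people with nonzero income only (the convention assumed in 1.76)."""
    earners = [x for x in incomes if x > 0]
    return median(earners), mean(earners)

before = [0, 0, 20_000]            # two people have no income
after = [14_000, 14_000, 20_000]   # both take jobs at $14,000

print(summarize_earners(before), summarize_earners(after))
```

Everyone is at least as well off afterward, yet both reported summaries fall, because the group being summarized has changed.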
1.77. The total salary is now $825,000, so the new mean is \(\bar{x} = \$825{,}000/9 \doteq \$91{,}667\). The
median is unchanged.
1.78. Details are in the table below.
\[ \bar{x} = \frac{11{,}200}{7} = 1600, \qquad s^2 = \frac{214{,}870}{6} \doteq 35{,}811.67, \qquad s \doteq \sqrt{35{,}811.67} \doteq 189.24 \]

  \(x_i\)      \(x_i - \bar{x}\)   \((x_i - \bar{x})^2\)
  1792      192      36864
  1666       66       4356
  1362     -238      56644
  1614       14        196
  1460     -140      19600
  1867      267      71289
  1439     -161      25921
  11200       0     214870
1.79. The quote describes a distribution with a strong right skew: Lots of years with no losses
to hurricanes ($0), but very high numbers when they do occur. For example, if there is one
hurricane in a 10-year period causing $1 million in damages, the average annual loss for
that period would be $100,000, but that does not adequately represent the cost for the year
of the hurricane. Means are not the appropriate measure of center for skewed distributions.
1.80. (a) \(\bar{x}\) and \(s\) are appropriate for symmetric distributions with no outliers. (b) Both high numbers are flagged as outliers. For women, IQR = 60, so the upper \(1.5 \times \mathrm{IQR}\) limit is 300 minutes. For men, IQR = 90, so the upper \(1.5 \times \mathrm{IQR}\) limit is 285 minutes. The table below shows the effect of removing these outliers.

            Women             Men
            \(\bar{x}\)    \(s\)      \(\bar{x}\)    \(s\)
  Before    165.2   56.5    117.2   74.2
  After     158.4   43.7    110.9   66.9
1.81. (a) & (b) See the table below. In both cases, the mean and median are quite similar.

             \(\bar{x}\)     \(s\)      \(M\)
  pH         5.4256   0.5379   5.44
  Density    5.4479   0.2209   5.46
1.82. See also the solution to Exercise 1.43. (a) The mean of this distribution appears to be higher than 100. (There is no substantial difference between the standard deviations.) (b) The mean and median are quite similar; the mean is slightly smaller due to the slight left skew of the data. (c) In addition to the mean and median, the standard deviation is shown for reference (the exercise did not ask for it).

          \(\bar{x}\)    \(s\)     \(M\)
  IQ      108.9   13.17   110
  GPA     7.447   (2.1)   7.829

Note: Students may be somewhat puzzled by the statement in (b) that the median is close to the mean (when they differ by 1.1), followed by (c), where they differ a bit (when \(M - \bar{x} = 0.382\)). It may be useful to emphasize that we judge the size of such differences relative to the spread of the distribution. For example, we can note that \(1.1/13.17 \doteq 0.08\) for (b), and \(0.382/2.1 \doteq 0.18\) for (c).
1.83. With only two observations, the mean and median are always equal because the median
is halfway between the middle two (in this case, the only two) numbers.
1.84. (a) The mean (green arrow) moves along with the moving point (in fact, it moves in
the same direction as the moving point, at one-third the speed). At the same time, as long
as the moving point remains to the right of the other two, the median (red arrow) points to
the middle point (the rightmost nonmoving point). (b) The mean follows the moving point
as before. When the moving point passes the rightmost fixed point, the median slides along
with it until the moving point passes the leftmost fixed point, then the median stays there.
1.85. (a) There are several different answers, depending on the configuration of the first five
points. Most students will likely assume that the first five points should be distinct (no
repeats), in which case the sixth point must be placed at the median. This is because the
median of 5 (sorted) points is the third, while the median of 6 points is the average of the
third and fourth. If these are to be the same, the third and fourth points of the set of six
must both equal the third point of the set of five.
The diagram below illustrates all of the possibilities; in each case, the arrow shows the
location of the median of the initial five points, and the shaded region (or dot) on the line
indicates where the sixth point can be placed without changing the median. Notice that there
are four cases where the median does not change, regardless of the location of the sixth
point. (The points need not be equally spaced; these diagrams were drawn that way for
convenience.)
(b) Regardless of the configuration of the first five points, if the sixth point is added so as to
leave the median unchanged, then in that (sorted) set of six, the third and fourth points must
be equal. One of these two points will be the middle (fourth) point of the (sorted) set of
seven, no matter where the seventh point is placed.
Note: If you have a student who illustrates all possible cases above, then it is likely that
the student either (1) obtained a copy of this solutions manual, (2) should consider a career
in writing solutions manuals, (3) has too much time on his or her hands, or (4) both 2 and
3 (and perhaps 1) are true.
1.86. The five-number summaries (all in millimeters) are:

            Min     \(Q_1\)      \(M\)       \(Q_3\)       Max
  bihai     46.34   46.71   47.12   48.245   50.26
  red       37.40   38.07   39.16   41.69    43.09
  yellow    34.57   35.45   36.11   36.82    38.13

[Figure: side-by-side boxplots of length (mm) by Heliconia variety (bihai, red, yellow).]

            \(\bar{x}\)       \(s\)
  bihai     47.5975   1.2129
  red       39.7113   1.7988
  yellow    36.1800   0.9753

bihai
  46 | 3466789
  47 | 114
  48 | 0133
  49 |
  50 | 12

red
  37 | 4789
  38 | 0012278
  39 | 167
  40 | 56
  41 | 4699
  42 | 01
  43 | 0

yellow
  34 | 56
  35 | 146
  36 | 0015678
  37 | 01
  38 | 1
(b) Bihai and red appear to be right-skewed (although it is difficult to tell with such small
samples). Skewness would make these distributions unsuitable for \(\bar{x}\) and \(s\).
1.88. (a) The mean is \(\bar{x} = 15\), and the standard deviation is \(s \doteq 5.4365\). (b) The mean is still
15; the new standard deviation is 3.7417. (c) Using the mean as a substitute for missing data
will not change the mean, but it decreases the standard deviation.
1.89. The minimum and maximum are easily determined to be 1 and 12 letters, and the
quartiles and median can be found by adding up the bar heights. For example, the first
two bars have total height 22.3% (less than 25%), and adding the third bar brings the total
to 45%, so \(Q_1\) must equal 3 letters. Continuing this way, we find that the five-number
summary, in units of letters, is:
Min = 1, \(Q_1 = 3\), \(M = 4\), \(Q_3 = 5\), Max = 12
Note that even without the frequency table given in the data file, we could draw the same
conclusion by estimating the heights of the bars in the histogram.
1.90. Because the mean is to be 7, the five numbers must add up to 35. Also, the third number
(in order from smallest to largest) must be 10 because that is the median. Beyond that, there
is some freedom in how the numbers are chosen.
Note: It is likely that many students will interpret "positive numbers" as meaning
"positive integers only," which leads to eight possible solutions, shown below.
1 1 10 10 13
1 3 10 10 11
1 1 10 11 12
1 4 10 10 10
1 2 10 10 12
2 2 10 10 11
1 2 10 11 11
2 3 10 10 10
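The count of eight integer solutions in 1.90 can be confirmed by brute force. This sketch enumerates every nondecreasing list of five positive integers and keeps those with sum 35 and middle value 10; the upper bound of 35 on each value is a safe assumption, since no single positive integer in the list can exceed the total.

```python
from itertools import combinations_with_replacement

# Five positive integers in nondecreasing order with mean 7 (sum 35) and median 10.
solutions = [
    combo
    for combo in combinations_with_replacement(range(1, 36), 5)
    if sum(combo) == 35 and combo[2] == 10
]

for combo in solutions:
    print(*combo)
print(len(solutions), "solutions")
```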
1.91. The simplest approach is to take (at least) six numbers, say \(a, b, c, d, e, f\) in increasing
order. For this set, \(Q_3 = e\); we can cause the mean to be larger than \(e\) by simply choosing
\(f\) to be much larger than \(e\). For example, if all numbers are nonnegative, \(f > 5e\) would
accomplish the goal because then
\[ \bar{x} = \frac{a+b+c+d+e+f}{6} \ge \frac{e+f}{6} > \frac{e+5e}{6} = e. \]
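1.92. The algebra might be a bit of a stretch for some students:
\begin{align*}
\sum (x_i - \bar{x}) &= (x_1 - \bar{x}) + (x_2 - \bar{x}) + (x_3 - \bar{x}) + \cdots + (x_{n-1} - \bar{x}) + (x_n - \bar{x}) \\
&= x_1 - \bar{x} + x_2 - \bar{x} + x_3 - \bar{x} + \cdots + x_{n-1} - \bar{x} + x_n - \bar{x} \\
&= (x_1 + x_2 + x_3 + \cdots + x_{n-1} + x_n) - (\bar{x} + \bar{x} + \bar{x} + \cdots + \bar{x} + \bar{x}) \\
&= (x_1 + x_2 + x_3 + \cdots + x_{n-1} + x_n) - n\bar{x} \\
&= 0, \quad \text{since } n\bar{x} = x_1 + x_2 + \cdots + x_n.
\end{align*}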
first squared deviation \(15^2\), but the other three are only \(5^2\). Our best choice is two at each
extreme, which makes all four squared deviations equal to \(10^2\).
1.94. Answers will vary. Typical calculators will carry only about 12 to 15 digits; for example,
a TI-83 fails (gives s = 0) for 14-digit numbers. Excel (at least the version I checked) also
fails for 14-digit numbers, but it gives s = 262,144 rather than 0. The (very old) version of
Minitab used to prepare these answers fails at 20,000,001 (eight digits), giving s = 2.
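The solution to 1.94 does not say why calculators and software fail on long numbers. One common explanation, offered here as an assumption, is a one-pass "shortcut" variance formula that subtracts two nearly equal, very large quantities. The sketch below compares that shortcut with the definition on three hypothetical 14-digit values in double-precision arithmetic; the function names are illustrative.

```python
def variance_two_pass(data):
    """Sample variance from deviations about the mean (the definition of s^2)."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

def variance_shortcut(data):
    """Sample variance from the one-pass shortcut: (sum of squares - n * mean^2) / (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    return (sum(x * x for x in data) - n * mean * mean) / (n - 1)

# Three hypothetical 14-digit values whose true variance is exactly 1.
data = [10_000_000_000_001.0, 10_000_000_000_002.0, 10_000_000_000_003.0]

print("two-pass:", variance_two_pass(data))
print("shortcut:", variance_shortcut(data))
```

The two-pass version returns 1 for these values, while the shortcut cannot: the squares are near \(10^{26}\), where adjacent double-precision numbers are billions apart, so the small true difference is lost in rounding.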
1.95. The table below reproduces the means and standard deviations from the solution to Exercise 1.87 and shows those values expressed in inches. For each conversion, multiply by 39.37/1000 = 0.03937 (or divide by 25.4; an inch is defined as 25.4 millimeters). For example, for the bihai variety, \(\bar{x} = (47.5975\ \text{mm})(0.03937\ \text{in/mm}) = (47.5975\ \text{mm}) \div (25.4\ \text{mm/in}) \doteq 1.874\) in.

             (in mm)              (in inches)
  Variety    \(\bar{x}\)       \(s\)        \(\bar{x}\)     \(s\)
  bihai      47.5975   1.2129    1.874   0.04775
  red        39.7113   1.7988    1.563   0.07082
  yellow     36.1800   0.9753    1.424   0.03840
1.96. (a) x̄ = 5.4479 and s = 0.2209. (b) The first measurement corresponds to
5.50 × 62.43 = 343.365 pounds per cubic foot. To find x̄_new and s_new, we similarly multiply
by 62.43: x̄_new ≈ 340.11 and s_new ≈ 13.79.
Note: The conversion from cm to feet is included in the multiplication by 62.43; the
step-by-step process of this conversion looks like this:
(1 g/cm³)(0.001 kg/g)(2.2046 lb/kg)(30.48³ cm³/ft³) = 62.43 lb/ft³
1.97. Convert from kilograms to pounds by multiplying by 2.2: x̄ = (2.42 kg)(2.2 lb/kg) ≈
5.32 lb and s = (1.18 kg)(2.2 lb/kg) ≈ 2.60 lb.
1.98. Variance is changed by a factor of 2.54² = 6.4516; generally, for a transformation
x_new = a + bx, the new variance is b² times the old variance.
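The b² rule is easy to confirm numerically. This is a small sketch using made-up measurements in inches (the values are hypothetical, not the exercise's data), converted to centimeters with b = 2.54 and then shifted by an arbitrary a = 10.

```python
from statistics import variance

inches = [60.0, 62.5, 65.0, 70.0]                 # hypothetical data
centimeters = [2.54 * x for x in inches]          # x_new = 0 + 2.54 x
shifted = [10 + 2.54 * x for x in inches]         # x_new = 10 + 2.54 x

ratio = variance(centimeters) / variance(inches)
ratio_shifted = variance(shifted) / variance(inches)

print(round(ratio, 4))          # the factor b^2
print(round(ratio_shifted, 4))  # adding a constant does not change it
```

Both ratios agree with 2.54² up to floating-point rounding, which illustrates that the additive constant a has no effect on the variance.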
1.99. There are 80 service times, so to find the 10% trimmed mean, remove the highest and
lowest eight values (leaving 64). Remove the highest and lowest 16 values (leaving 48) for
the 20% trimmed mean.
The mean and median for the full data set are x̄ = 196.575 and M = 103.5 minutes. The
10% trimmed mean is approximately 127.734, and the 20% trimmed mean is approximately
111.917 minutes. Because the distribution is right-skewed, removing the extremes lowers the mean.
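The trimming procedure can be sketched as follows. The 80 service times are not reproduced in this solution, so the example runs on a small hypothetical right-skewed data set; `trim_count` and `trimmed_mean` are illustrative names, not functions from any library.

```python
from statistics import mean

def trim_count(n, proportion):
    """Number of observations to drop from EACH end."""
    return round(n * proportion)

def trimmed_mean(values, proportion):
    """Mean after removing the given proportion of values from each end."""
    xs = sorted(values)
    k = trim_count(len(xs), proportion)
    return mean(xs[k:len(xs) - k])

# Hypothetical right-skewed data: 1, 2, ..., 19 plus one very large value.
times = list(range(1, 20)) + [1000]

print(mean(times))                 # pulled up by the extreme value
print(trimmed_mean(times, 0.10))   # drops the 2 lowest and 2 highest
print(trim_count(80, 0.10), trim_count(80, 0.20))  # counts for 80 values
```

For 80 observations the counts come out as 8 and 16 per end, matching the solution above.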
1.100. After changing the scale from centimeters to inches, the five-number summary values
change by the same ratio (that is, they are multiplied by 0.39). The shape of the histogram
might change slightly because of the change in class intervals. (a) The five-number
summary (in inches) is Min = 0.858, Q1 = 4.2705, M = 11.115, Q3 = 16.341, Max =
27.027. (b) & (c) The boxplot and histogram are shown below. (Students might choose
different interval widths for the histogram.) (d) As in Exercise 1.56, the histogram reveals
more detail about the shape of the distribution.
[Figures: boxplot and histogram of diameter at breast height (in); histogram vertical axis: Frequency.]
1.101. Take the mean plus or minus two standard deviations: 572 ± 2(51) = 470 to 674.
1.102. Take the mean plus or minus three standard deviations: 572 ± 3(51) = 419 to 725.
1.103. The z-score is z = (620 − 572)/51 ≈ 0.94.
1.104. The z-score is z = (510 − 572)/51 ≈ −1.22. This is negative because an ISTEP score of 510 is
below average; specifically, it is 1.22 standard deviations below the mean.
1.105. Using Table A, the proportion below 620 (z ≈ 0.94) is 0.8264 and the proportion at
or above is 0.1736; these two proportions add to 1. The accompanying graph illustrates this
with a single curve; it conveys essentially the same idea as the “graphical subtraction”
picture shown in Example 1.36.
[Figure: Normal curve with axis marked 419, 470, 521, 572, 623, 674, 725; area 0.8264 to the
left of 620 and 0.1736 to the right.]
1.106. Using Table A, the proportion below 620 (z ≈ 0.94) is 0.8264, and the proportion
below 660 (z ≈ 1.73) is 0.9582. Therefore:
area between 620 and 660 = area left of 660 − area left of 620 = 0.9582 − 0.8264 = 0.1318
The accompanying graph illustrates this with a single curve; it conveys essentially the same
idea as the “graphical subtraction” picture shown in Example 1.37.
[Figure: Normal curve with axis marked 419, 470, 521, 572, 623, 674, 725; the area between
620 and 660 is shaded.]
1.107. Using Table A, this ISTEP score should correspond to a standard score of z ≈ 0.67
(software gives 0.6745), so the ISTEP score (unstandardized) is 572 + 0.67(51) ≈ 606.2
(software: 606.4).
1.108. Using Table A, x should correspond to a standard score of z ≈ −0.84 (software gives
−0.8416), so the ISTEP score (unstandardized) is x = 572 − 0.84(51) ≈ 529.2 (software:
529.1).
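The "software" values quoted in the ISTEP solutions can be reproduced with Python's standard library. This sketch assumes the N(572, 51) model used throughout these exercises; reading the z-values 0.6745 and −0.8416 as the 75th and 20th percentiles is an inference from the numbers, not something stated in the solutions.

```python
from statistics import NormalDist

istep = NormalDist(mu=572, sigma=51)   # model assumed in Exercises 1.101-1.108

z_620 = (620 - 572) / 51                       # standard score of 620
below_620 = istep.cdf(620)                     # proportion below 620
between = istep.cdf(660) - istep.cdf(620)      # proportion between 620 and 660
upper_quartile = istep.inv_cdf(0.75)           # score with z near 0.6745
percentile_20 = istep.inv_cdf(0.20)            # score with z near -0.8416

print(round(z_620, 2), round(below_620, 4), round(between, 4))
print(round(upper_quartile, 1), round(percentile_20, 1))
```

The proportions differ slightly from the Table A answers because the table rounds z to two decimal places before the lookup.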
1.109. Of course, student sketches will not be as neat as the accompanying curves,
but they should have roughly the correct shape. (a) It is easiest to draw the curve
first, and then mark the scale on the axis. (b) Draw a copy of the first curve, with the
peak over 20. (c) The curve has the same shape, but is translated left or right.
[Figure: Normal curves; horizontal axis marked 1, 4, 7, 10, 13, 16, 19, 22, 25, 28.]
1.110. (a) As in the previous exercise, draw the curve first, and then mark the scale on the
axis. (b) In order to have a standard deviation of 1, the curve should be 1/3 as wide, and
three times taller. (c) The curve is centered at the same place (the mean), but its height
and width change. Specifically, increasing the standard deviation makes the curve wider and
shorter; decreasing the standard deviation makes the curve narrower and taller.
[Figure: Normal curves; horizontal axis marked 10, 13, 16, 19.]
1.113. (a) Ranges are given in the table below. In both cases, some of the lower
limits are negative, which does not make sense; this happens because the women's
distribution is skewed, and the men's distribution has an outlier. Contrary to the conventional
wisdom, the men's mean is slightly higher, although the outlier is at least partly responsible
for that. (b) The means suggest that Mexican men and women tend to speak more than people
of the same gender from the United States.

          Women               Men
68%       8489 to 20,919      7158 to 22,886
95%       2274 to 27,134      −706 to 30,750
99.7%     −3941 to 33,349     −8570 to 38,614
68
54
92
75
73
98
64
55
80
70
0.2
1.6
2.2
0.5
0.3
2.8
0.6
1.5
1
0
0.35
0.35 0.65
1.118. The mean and median both equal 0.5; the quartiles are Q1 = 0.25 and Q3 = 0.75.
1.119. (a) Mean is C, median is B (the right skew pulls the mean to the right). (b) Mean A,
median A. (c) Mean A, median B (the left skew pulls the mean to the left).
1.120. Hint: It is best to draw the curve first, then place the numbers below it. Students
may at first make mistakes like drawing a half-circle instead of the correct
“bell-shaped” curve, or being careless about locating the standard deviation.
[Figure: Normal curve; horizontal axis marked 218, 234, 250, 266, 282, 298, 314.]
1.121. (a) The applet shows an area of 0.6826 between −1.000 and 1.000, while the
68–95–99.7 rule rounds this to 0.68. (b) Between −2.000 and 2.000, the applet reports
0.9544 (compared to the rounded 0.95 from the 68–95–99.7 rule). Between −3.000 and
3.000, the applet reports 0.9974 (compared to the rounded 0.997).
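The same areas can be computed directly from the standard Normal distribution. This is a minimal sketch using Python's standard library rather than the applet; the helper name `central_area` is hypothetical, and the last digit can differ from the applet's display by one unit because of rounding.

```python
from statistics import NormalDist

z = NormalDist()  # standard Normal: mean 0, standard deviation 1

def central_area(k):
    """Area under the standard Normal curve between -k and +k."""
    return z.cdf(k) - z.cdf(-k)

areas = {k: central_area(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(k, round(area, 4))
```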
1.122. See the sketch of the curve in the solution to Exercise 1.120. (a) The middle 95% fall
within two standard deviations of the mean: 266 ± 2(16), or 234 to 298 days. (b) The
shortest 2.5% of pregnancies are shorter than 234 days (more than two standard deviations
below the mean).
1.123. (a) 99.7% of horse pregnancies fall within three standard deviations of the mean:
336 ± 3(3), or 327 to 345 days. (b) About 16% are longer than 339 days since 339
days or more corresponds to at least one standard deviation above the mean.
[Figure: Normal curve; horizontal axis marked 327, 330, 333, 336, 339, 342, 345.]
Note: This exercise did not ask for a sketch of the Normal curve, but students should be
encouraged to make such sketches anyway.
1.124. Because the quartiles of any distribution have 50% of observations between them, we
seek to place the flags so that the reported area is 0.5. The closest the applet gets
is an area of 0.5034, between −0.680 and 0.680. Thus, the quartiles of any Normal
distribution are about 0.68 standard deviations above and below the mean.
Note: Table A places the quartiles at about ±0.67;
other statistical software gives ±0.6745.
1.125. The mean and standard deviation are x̄ = 5.4256 and s = 0.5379. About 67.62%
(71/105 ≈ 0.6762) of the pH measurements are in the range x̄ ± s = 4.89 to 5.96. About
95.24% (100/105) are in the range x̄ ± 2s = 4.35 to 6.50. All (100%) are in the range
x̄ ± 3s = 3.81 to 7.04.
1.130. 70 is two standard deviations below the mean (that is, it has standard score z = −2), so
about 2.5% (half of the outer 5%) of adults would have WAIS scores below 70.
1.131. 130 is two standard deviations above the mean (that is, it has standard score z = 2), so
about 2.5% of adults would score at least 130.
1.132. Tonya's score standardizes to z = (1820 − 1509)/321 ≈ 0.9688, while Jermaine's score
corresponds to z = (29 − 21.5)/5.4 ≈ 1.3889. Jermaine's score is higher.
1.133. Jacob's score standardizes to z = (16 − 21.5)/5.4 ≈ −1.0185, while Emily's score
corresponds to z = (1020 − 1509)/321 ≈ −1.5234. Jacob's score is higher.
1.134. Jose's score standardizes to z = (2080 − 1509)/321 ≈ 1.7788, so an equivalent ACT score is
21.5 + 1.7788 × 5.4 ≈ 31.1. (Of course, ACT scores are reported as whole numbers, so this
would presumably be a score of 31.)
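The comparisons across the two tests all follow one pattern: standardize on one scale, then unstandardize on the other. The sketch below assumes the N(1509, 321) model for SAT scores and N(21.5, 5.4) for ACT scores that appear in the calculations above; the function names are illustrative.

```python
SAT_MEAN, SAT_SD = 1509, 321    # parameters used in the solutions above
ACT_MEAN, ACT_SD = 21.5, 5.4

def z_score(x, mean, sd):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

def sat_to_act(sat):
    """ACT score with the same standard score as the given SAT score."""
    return ACT_MEAN + z_score(sat, SAT_MEAN, SAT_SD) * ACT_SD

z_tonya = z_score(1820, SAT_MEAN, SAT_SD)
z_jermaine = z_score(29, ACT_MEAN, ACT_SD)
act_equivalent = sat_to_act(2080)

print(round(z_tonya, 4), round(z_jermaine, 4))
print(round(act_equivalent, 1))
```

Comparing standard scores rather than raw scores is what makes results on the two tests commensurable.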
... z = (30 − 21.5)/5.4 ≈ 1.5741, so an equivalent SAT score is ...
... z = (2090 − 1509)/321 ≈ 1.81, for which Table A gives 0.9649.
... z = (19 − 21.5)/5.4 ≈ −0.4630, for which Table A gives 0.3228.
1.138. 1920 and above: The top 10% corresponds to a standard score of z = 1.2816, which in
turn corresponds to a score of 1509 + 1.2816 × 321 ≈ 1920 on the SAT.
1.139. 1239 and below: The bottom 20% corresponds to a standard score of z = −0.8416,
which in turn corresponds to a score of 1509 − 0.8416 × 321 ≈ 1239 on the SAT.
1.140. The quartiles of a Normal distribution are ±0.6745 standard deviations from the mean,
so for ACT scores, they are 21.5 ± 0.6745 × 5.4 ≈ 17.9 to 25.1.
1.141. The quintiles of the SAT score distribution are 1509 − 0.8416 × 321 = 1239,
1509 − 0.2533 × 321 = 1428, 1509 + 0.2533 × 321 = 1590, and 1509 + 0.8416 × 321 = 1779.
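These cutoffs are all inverse-Normal calculations, so they can be checked in a few lines. The sketch assumes the same N(1509, 321) SAT model used in the solutions above.

```python
from statistics import NormalDist

sat = NormalDist(mu=1509, sigma=321)

top_10_cutoff = sat.inv_cdf(0.90)                         # Exercise 1.138
quintiles = [sat.inv_cdf(p) for p in (0.2, 0.4, 0.6, 0.8)]  # Exercise 1.141

print(round(top_10_cutoff))
print([round(q) for q in quintiles])
```

The first quintile is the same number as the bottom-20% cutoff in Exercise 1.139.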
1.142. For a Normal distribution with mean 55 mg/dl and standard deviation 15.5 mg/dl:
(a) 40 mg/dl standardizes to z = (40 − 55)/15.5 ≈ −0.9677. Using Table A, 16.60% of women fall
below this level (software: 16.66%). (b) 60 mg/dl standardizes to z = (60 − 55)/15.5 ≈ 0.3226.
Using Table A, 37.45% of women fall above this level. (c) Subtract the answers from (a) and
(b) from 100%: Table A gives 45.95% (software: 45.99%), so about 46% of women fall in the
intermediate range.
1.143. For a Normal distribution with mean 46 mg/dl and standard deviation 13.6 mg/dl:
(a) 40 mg/dl standardizes to z = (40 − 46)/13.6 ≈ −0.4412. Using Table A, 33% of men fall below
this level (software: 32.95%). (b) 60 mg/dl standardizes to z = (60 − 46)/13.6 ≈ 1.0294. Using
Table A, 15.15% of men fall above this level. (c) Subtract the answers from (a) and (b) from
100%: Table A gives 51.85% (software: 51.88%), so about 52% of men fall in the intermediate range.
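The three-way split (below 40, above 60, in between) can be reproduced as follows. This sketch assumes the Normal models stated in the two solutions above, N(55, 15.5) for women and N(46, 13.6) for men; the function name is hypothetical.

```python
from statistics import NormalDist

def split_at_40_and_60(dist):
    """Return (proportion below 40, proportion above 60, proportion between)."""
    below = dist.cdf(40)
    above = 1 - dist.cdf(60)
    return below, above, 1 - below - above

women = split_at_40_and_60(NormalDist(mu=55, sigma=15.5))
men = split_at_40_and_60(NormalDist(mu=46, sigma=13.6))

for label, parts in (("women", women), ("men", men)):
    print(label, [round(100 * p, 2) for p in parts])
```

The intermediate proportions agree with the "software" values quoted in part (c) of each solution.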
1.144. (a) About 0.6% of healthy young adults have osteoporosis (the cumulative probability
below a standard score of −2.5 is 0.0062). (b) About 31% of this population of older
women has osteoporosis: The BMD level which is 2.5 standard deviations below the young
adult mean would standardize to −0.5 for these older women, and the cumulative probability
for this standard score is 0.3085.
1.145. (a) About 5.2%: x < 240 corresponds to z < −1.625. Table A gives 5.16% for
−1.63 and 5.26% for −1.62. Software (or averaging the two table values) gives 5.21%.
(b) About 54.7%: 240 < x < 270 corresponds to −1.625 < z < 0.25. The area to the
left of 0.25 is 0.5987; subtracting the answer from part (a) leaves about 54.7%. (c) About
279 days or longer: Searching Table A for 0.80 leads to z > 0.84, which corresponds to
x > 266 + 0.84(16) = 279.44. (Using the software value z > 0.8416 gives x > 279.47.)
1.146. (a) The quartiles for a standard Normal distribution are ±0.6745. (b) For a N(μ, σ)
distribution, Q1 = μ − 0.6745σ and Q3 = μ + 0.6745σ. (c) For human pregnancies,
Q1 = 266 − 0.6745 × 16 ≈ 255.2 and Q3 = 266 + 0.6745 × 16 ≈ 276.8 days.
1.147. (a) As the quartiles for a standard Normal distribution are ±0.6745, we have
IQR = 1.3490. (b) c = 1.3490: For a N(μ, σ) distribution, the quartiles are
Q1 = μ − 0.6745σ and Q3 = μ + 0.6745σ.
1.148. In the previous two exercises, we found that for a N(μ, σ) distribution,
Q1 = μ − 0.6745σ, Q3 = μ + 0.6745σ, and IQR = 1.3490σ. Therefore,
1.5 × IQR = 2.0235σ, and the suspected outliers are below Q1 − 1.5 × IQR = μ − 2.698σ,
and above Q3 + 1.5 × IQR = μ + 2.698σ. The percentage outside of this range is
2 × 0.0035 = 0.70%.
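The chain of calculations in the last three exercises can be verified on the standard Normal distribution (μ = 0, σ = 1), since the general case only rescales it. This is a minimal sketch using Python's standard library.

```python
from statistics import NormalDist

z = NormalDist()                 # standard Normal

q3 = z.inv_cdf(0.75)             # upper quartile; the lower one is -q3
iqr = 2 * q3                     # interquartile range
upper_fence = q3 + 1.5 * iqr     # 1.5 x IQR rule; the lower fence is -upper_fence
outside = 2 * z.cdf(-upper_fence)  # proportion flagged as suspected outliers

print(round(q3, 4), round(iqr, 4), round(upper_fence, 3))
print(round(100 * outside, 2), "percent")
```

So even for perfectly Normal data, roughly 7 observations in 1000 would be flagged by the 1.5 × IQR rule.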
1.149. (a) The first and last deciles for a standard Normal distribution are ±1.2816. (b) For
a N(9.12, 0.15) distribution, the first and last deciles are μ − 1.2816σ ≈ 8.93 and
μ + 1.2816σ ≈ 9.31 ounces.
1.150. The shape of the quantile plot suggests that the data are right-skewed (as was observed
in Exercises 1.36 and 1.74). This can be seen in the flat section in the lower left (these
numbers were less spread out than they should be for Normal data) and the three apparent
outliers (the United States, Canada, and Australia) that deviate from the line in the upper
right; these were much larger than they would be for a Normal distribution.
1.151. (a) The plot is reasonably linear except for the point in the upper right, so this
distribution is roughly Normal, but with a high outlier. (b) The plot is fairly linear, so
the distribution is roughly Normal. (c) The plot curves up to the right (that is, the large
values of this distribution are larger than they would be in a Normal distribution), so the
distribution is skewed to the right.
[Figure: Normal quantile plot of Density against Normal score.]
1.153. (a) All three quantile plots are below; the yellow variety is the nearest to a straight line.
(b) The other two distributions are slightly right-skewed (the lower-left portion of the graph
is somewhat flat); additionally, the bihai variety appears to have a couple of high outliers.
[Figures: Normal quantile plots for H. bihai, H. caribaea red, and H. caribaea yellow;
horizontal axis: Normal score.]
1.154. Shown are a histogram and quantile plot for one sample of 200 simulated N (0, 1)
points. Histograms will vary slightly but should suggest a bell curve. The Normal quantile
plot shows something fairly close to a line but illustrates that, even for actual Normal data,
the tails may deviate slightly from a line.
[Figures: histogram (vertical axis: Frequency) and Normal quantile plot (Simulated values
against Normal score).]
1.155. Shown are a histogram and quantile plot for one sample of 200 simulated uniform data
points. Histograms will vary slightly but should suggest the density curve of Figure 1.34
(but with more variation than students might expect). The Normal quantile plot shows that,
compared to a Normal distribution, the uniform distribution does not extend as low or as
high (not surprising, since all observations are between 0 and 1).
[Figures: histogram (vertical axis: Frequency) and Normal quantile plot (Simulated values
against Normal score).]
               x̄        s       Min   Q1   M      Q3   Max
Hatchback      22.548   3.423   16    20   21.5   25   30
Large sedan    16.571   1.425   13    16   17.0   17   19
[Figure: back-to-back stemplot, Hatchback and Large sedan.]
1.157. (a) The distribution appears to be roughly Normal. (b) One could
justify using either the mean and standard deviation or the five-number
summary:

x̄        s        Min    Q1    M       Q3      Max
15.27%   3.118%   8.2%   13%   15.5%   17.6%   22.8%

(c) For example, binge drinking rates are typically 10% to 20%. Which
states are high, and which are low? One might also note the geographical
distribution of states with high binge-drinking rates: The top six states
(Wisconsin, North Dakota, Iowa, Minnesota, Illinois, and Nebraska) are
all adjacent to one another.
1.158. (a) The accompanying stemplot suggests that there are two groups of
states: the under-23% and over-23% groups. Additionally, while they do
not qualify as outliers, Oklahoma (16.3%) and Vermont (30%) stand out
as notably low and high. (b) One could justify using either the mean and
standard deviation or the five-number summary:

x̄        s        Min     Q1      M       Q3      Max
23.71%   3.517%   16.3%   20.8%   24.3%   26.4%   30%

Neither summary reveals the two groups of states visible in the stemplot.
(c) One could explore the connections (geographical, socioeconomic, etc.)
between the states in the two groups; for example, the top group includes
many northeastern states, while the bottom group includes quite a few
southern states.
[Stemplots for the solutions above: stems 13 to 30 (hatchbacks and large sedans), stems 8 to 22
(Exercise 1.157), and stems 16 to 30 (Exercise 1.158).]
[Figures: bar graphs of Percent by Color (Silver, White, Gray, Black, Blue, Red, Brown, Other)
for North America, South America, Europe, China, South Korea, and Japan.]
Q1 = 3, M = 12.5, Q3 = 34, Max = 86
[Figure: histogram; vertical axis: Frequency.]
s       Min    Q1      M        Q3      Max
22.05   1.32   18.68   43.185   54.94   85.65

0 | 145789
1 | 23488889
2 | 5
3 | 0134467
4 | 124666669
5 | 022345688
6 | 223
7 | 026
8 | 15

1.164. (a) & (b) The graphs are below. Bars are shown in alphabetical order by city name (as
the data were given in the table). (c) For Baltimore, for example, this rate is
5091/651 ≈ 7.82. The complete table is shown below. (d) & (e) Graphs below.
Note that the text does not specify whether the bars should be
ordered by increasing or decreasing rate. (f) Preferences may
vary, but the ordered bars make comparisons easier.

City                Acres of open space per 1000 people
Baltimore           7.82
Boston              8.26
Chicago             4.02
Long Beach          6.25
Los Angeles         8.07
Miami               3.67
Minneapolis         14.87
New York            6.23
Oakland             9.30
Philadelphia        7.04
San Francisco       7.61
Washington, D.C.    13.12

[Figures: bar graphs by city for parts (a) and (b), including Population (thousands), in
alphabetical order; bar graphs of Acres of open space per 1000 people, once in alphabetical
order and once ordered from highest (Minneapolis) to lowest (Miami).]
1.165. The given description is true on the average, but the curves (and a few calculations)
give a more complete picture. For example, a score of about 675 is about the 97.5th
percentile for both genders, so the top boys and girls have very similar scores.
1.166. (a) & (b) Answers will vary. Definitions might be as simple as “free time,” or “time
spent doing something other than studying.” For part (b), it might be good to encourage
students to discuss practical difficulties; for example, if we ask Sally to keep a log of her
activities, the time she spends filling it out presumably reduces her available “leisure time.”
1.167. Shown is a stemplot; a histogram should look similar to this. This distribution is
relatively symmetric apart from one high outlier. Because of the outlier,
the five-number summary (in hours) is preferred:
Min = 22, Q1 = 23.735, M = 24.31, Q3 = 24.845, Max = 28.55
Alternatively, the mean and standard deviation are x̄ = 24.339 and s = 0.9239
hours.
22
22
23
23
24
24
25
25
26
26
27
27
28
28
013
7899
000011222233344444
55566666667777778888888999
00000011111112222222223333333333444444
555555666666666777777888888999999
00001111233344
56666889
2
56
2
5
1.168. Gender and automobile preference are categorical; age and household income are
quantitative.
[Figure: bar graph of Subscribers (millions) for Comcast, AT&T, Road Runner, Verizon,
America Online, EarthLink, Charter, Qwest, Cablevision, United Online, and Other.]
1.170. Women's weights are skewed to the right: This makes the mean higher than the median,
and it is also revealed in the differences M − Q1 = 14.9 lb and Q3 − M = 24.1 lb.
1.171. (a) For car makes (a categorical variable), use either a bar graph or pie chart. For
car age (a quantitative variable), use a histogram, stemplot, or boxplot. (b) Study time is
quantitative, so use a histogram, stemplot, or boxplot. To show change over time, use a time
plot (average hours studied against time). (c) Use a bar graph or pie chart to show radio
station preferences. (d) Use a Normal quantile plot to see whether the measurements follow
a Normal distribution.
[Figure: bar graph of Spam count (0 to 1800) by Account ID: AA, BB, CC, DD, EE, FF, GG, HH,
II, JJ, KK, LL, other.]
1.173. No, and no: It is easy to imagine examples of many different data sets with mean 0 and
standard deviation 1; for example, {−1, 0, 1} and {−2, 0, 0, 0, 0, 0, 0, 0, 2}.
Likewise, for any given five numbers a ≤ b ≤ c ≤ d ≤ e (not all the same), we can
create many data sets with that five-number summary, simply by taking those five numbers
and adding some additional numbers in between them, for example (in increasing order):
10, __, 20, __, __, 30, __, __, 40, __, 50. As long as the number in the first blank is
between 10 and 20, and so on, the five-number summary will be 10, 20, 30, 40, 50.
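Both claims can be checked numerically. In the sketch below, the two fillings of the blanks are arbitrary choices made for illustration, and `five_number_summary` is a hypothetical helper that uses the textbook convention of taking quartiles as medians of the lower and upper halves (excluding the overall median when n is odd).

```python
from statistics import mean, median, stdev

def five_number_summary(values):
    """Min, Q1, M, Q3, Max, with quartiles as medians of the two halves."""
    xs = sorted(values)
    n = len(xs)
    lower = xs[: n // 2]          # observations below the median's position
    upper = xs[(n + 1) // 2 :]    # observations above the median's position
    return (xs[0], median(lower), median(xs), median(upper), xs[-1])

# Two different data sets with mean 0 and standard deviation 1.
set_a = [-1, 0, 1]
set_b = [-2, 0, 0, 0, 0, 0, 0, 0, 2]
print(mean(set_a), stdev(set_a), mean(set_b), stdev(set_b))

# Two different fillings of the blanks; both share one five-number summary.
fill_a = [10, 11, 20, 21, 29, 30, 31, 39, 40, 41, 50]
fill_b = [10, 19, 20, 20, 30, 30, 30, 40, 40, 49, 50]
print(five_number_summary(fill_a), five_number_summary(fill_b))
```

Software packages use several different quartile conventions, so a built-in quantile function may give slightly different Q1 and Q3 for other data sets.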
1.174. The time plot is shown below; because of the great detail in this plot, it is larger than
other plots. Ruth's and McGwire's league-leading years are marked with different symbols.
(a) During World War II (when many baseball players joined the military), the best home
run numbers decline sharply and steadily. (b) Ruth seemed to set a new standard for other
players; after his first league-leading year, he had 10 seasons much higher than anything that
had come before, and home run production has remained near that same level ever since
(even the worst post-Ruth year, 1945, had more home runs than the best pre-Ruth season).
While some might argue that McGwire's numbers also raised the standard, the change is
not nearly as striking, nor did McGwire maintain it for as long as Ruth did. (This is not
necessarily a criticism of McGwire; it instead reflects that in baseball, as in many other
endeavors, rates of improvement tend to decrease over time as we reach the limits of human
ability.)
[Figure: time plot, vertical axis 0 to 70; horizontal axis: Year, 1880 to 2000.]
1.175. Bonds's mean changes from 36.56 to 34.41 home runs (a drop of 2.15),
while his median changes from 35.5 to 34 home runs (a drop of 1.5). This
illustrates that outliers affect the mean more than the median.

1 | 69
2 | 4
2 | 55
3 | 3344
3 | 77
4 | 02
4 | 5669
5 |
5 |
6 |
6 |
7 | 3
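The two means and medians can be recomputed from the stemplot. This sketch assumes the 17 values readable from the stems and leaves (tens and ones), plus a single outlier of 73; that outlier is inferred here from the two stated means (18 × 36.56 minus the sum of the other 17 values), so treat it as an assumption.

```python
from statistics import mean, median

# Values read from the stemplot; 73 is the inferred outlier (see note above).
home_runs = [16, 19, 24, 25, 25, 33, 33, 34, 34,
             37, 37, 40, 42, 45, 46, 46, 49, 73]
without_outlier = [h for h in home_runs if h != 73]

mean_all, median_all = mean(home_runs), median(home_runs)
mean_trimmed, median_trimmed = mean(without_outlier), median(without_outlier)

print(round(mean_all, 2), median_all)
print(round(mean_trimmed, 2), median_trimmed)
```

Dropping one value moves the mean by about 2.15 but the median by only 1.5, which is the resistance property the solution describes.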
1.176. Recall the text's description of the effects of a linear transformation x_new = a + bx: The
mean and standard deviation are each multiplied by b (technically, the standard deviation
is multiplied by |b|, but this problem specifies that b > 0). Additionally, we add a to the
(new) mean, but a does not affect the standard deviation. (a) The desired transformation
is x_new = −40 + 2x; that is, a = −40 and b = 2. (We need b = 2 to double the standard
deviation; as this also doubles the mean, we then subtract 40 to make the new mean 100.)
(b) x_new = −45.4545 + 1.8182x; that is, a = −45 5/11 ≈ −45.4545 and b = 20/11 ≈ 1.8182.
(This choice of b makes the new standard deviation 20 and the new mean 145 5/11; we then
subtract 45.4545 to make the new mean 100.) (c) David's score (2 × 72 − 40 = 104) is
higher within his class than Nancy's score (1.8182 × 78 − 45.4545 ≈ 96.4) is
within her class. (d) A third-grade score of 75 corresponds to a score of 110 from the
N(100, 20) distribution, which has a standard score of z = (110 − 100)/20 = 0.5. (Alternatively,
z = (75 − 70)/10 = 0.5.) A sixth-grade score of 75 corresponds to about 90.9 on the transformed
scale, which has standard score z = (90.9 − 100)/20 ≈ −0.45. (Alternatively,
z = (75 − 80)/11 ≈ −0.45.) Therefore, about 69% of
third graders and 32% of sixth graders score below 75.
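The coefficients a and b follow mechanically from the old and new means and standard deviations. The sketch below assumes third-grade scores are N(70, 10) and sixth-grade scores are N(80, 11), which are the parameters implied by the standard-score calculations in part (d); `linear_rescale` is a hypothetical helper name.

```python
from statistics import NormalDist

def linear_rescale(mean, sd, target_mean=100, target_sd=20):
    """Return (a, b) so that x_new = a + b*x has the target mean and sd."""
    b = target_sd / sd               # fixes the spread
    a = target_mean - b * mean       # then shifts the center
    return a, b

a3, b3 = linear_rescale(70, 10)      # third grade (assumed parameters)
a6, b6 = linear_rescale(80, 11)      # sixth grade (assumed parameters)

david = a3 + b3 * 72
nancy = a6 + b6 * 78

# Part (d): proportion scoring below 75 on the original scales.
third_below_75 = NormalDist(70, 10).cdf(75)
sixth_below_75 = NormalDist(80, 11).cdf(75)

print(a3, b3, round(a6, 4), round(b6, 4))
print(david, round(nancy, 1))
print(round(third_below_75, 2), round(sixth_below_75, 2))
```

Because a linear transformation with b > 0 preserves standard scores, part (d) gives the same proportions whether it is worked on the original or the transformed scale.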
with mean 25 and standard deviation 8/√30 ≈ 1.46, so that about 99.7% of the time, one should
find x̄ between 20.6 and 29.4. Meanwhile, the theoretical distribution of s is nearly Normal
(slightly skewed) with mean ≈ 7.9313 and standard deviation ≈ 1.0458; about
99.7% of the time, s will be between 4.8 and 11.1.
[Stemplots: sample means (stems 24 to 27) and sample standard deviations (stems 7 to 10).]
Note: If we take a sample of size n from a Normal distribution and compute the sample
standard deviation S, then (S/σ)√(n − 1) has a chi distribution with n − 1 degrees of
freedom (which looks like a Normal distribution when n is reasonably large). You can learn
all you would want to know, and more, about this distribution on the Web (for example, at
Wikipedia). One implication of this is that on the average, s underestimates σ; specifically,
the mean of S is
$$\sigma \left( \sqrt{\frac{2}{n-1}} \, \frac{\Gamma(n/2)}{\Gamma(n/2 - 1/2)} \right).$$
The factor in parentheses is always less than 1, but approaches 1 as n approaches infinity.
The proof of this fact is left as an exercise (for the instructor, not for the average student!).
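The factor in parentheses is straightforward to evaluate with the log-gamma function. The sketch below assumes σ = 8 and n = 30, the values used in the solution above, and uses the standard fact that E(S²) = σ² to obtain the standard deviation of S; the function names are hypothetical.

```python
import math

def bias_factor(n):
    """sqrt(2/(n-1)) * Gamma(n/2) / Gamma((n-1)/2), the factor multiplying sigma."""
    log_ratio = math.lgamma(n / 2) - math.lgamma((n - 1) / 2)
    return math.sqrt(2 / (n - 1)) * math.exp(log_ratio)

def mean_and_sd_of_s(sigma, n):
    """Theoretical mean and standard deviation of S for Normal samples of size n."""
    c = bias_factor(n)
    # Var(S) = E(S^2) - E(S)^2 = sigma^2 - (c * sigma)^2
    return sigma * c, sigma * math.sqrt(1 - c * c)

mean_s, sd_s = mean_and_sd_of_s(sigma=8, n=30)
print(round(mean_s, 4), round(sd_s, 4))
print([round(bias_factor(n), 4) for n in (2, 5, 30, 1000)])  # approaches 1
```

Working with `lgamma` rather than `gamma` avoids overflow when n is large.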