

STAT 1012: Statistics for Life Science


2013/14 Term 2
Solution to Practice Problem (Chapter 1)

Problem 1: a) Quantitative (Continuous) Variable, b) Quantitative (Discrete) Variable, c) and
d) are Categorical Variables

Problem 2: 1, 3 and 4 are continuous variables, as they can take any value in an interval,
including values with infinitely many decimal places

Problem 3: a) Median for Men = Median for Women = 0
Mean for Men=(1541+2*43)/8658 = 0.1879, Mean for Women=(2773+2*105)/8739=0.3413
b) In this example, Median=0 only tells us that more than half of the observations are zero.
The mean values, however, are able to tell us the number of times married on average for
each gender.
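
The arithmetic in Problem 3 can be checked with a short Python sketch. It assumes, as the mean formulas above imply, that every respondent was married 0, 1 or 2 times, so the number of zeros is inferred as the remainder. The helper names `value_at_rank` and `mean_and_median` are illustrative only.

```python
def value_at_rank(freq, rank):
    """Return the rank-th smallest observation (1-based) of a frequency table."""
    cumulative = 0
    for value in sorted(freq):
        cumulative += freq[value]
        if rank <= cumulative:
            return value
    raise ValueError("rank exceeds the sample size")


def mean_and_median(freq):
    """Mean and median of data stored as {value: count}."""
    n = sum(freq.values())
    mean = sum(value * count for value, count in freq.items()) / n
    if n % 2 == 1:
        median = value_at_rank(freq, (n + 1) // 2)
    else:
        median = (value_at_rank(freq, n // 2) + value_at_rank(freq, n // 2 + 1)) / 2
    return mean, median


# Counts taken from the solution; the number of zeros is an inferred remainder.
men = {0: 8658 - 1541 - 43, 1: 1541, 2: 43}
women = {0: 8739 - 2773 - 105, 1: 2773, 2: 105}

men_mean, men_median = mean_and_median(men)
women_mean, women_median = mean_and_median(women)
print(round(men_mean, 4), men_median)
print(round(women_mean, 4), women_median)
```

Because well over half of each group sits at 0, both medians are 0, while the means still separate the two groups.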

Problem 4: (b) The transformation $y = 2x^3$ is a strictly increasing function, and therefore the
mode of $y_1, y_2, \ldots, y_{10}$ should be equal to the mode of $x_1, x_2, \ldots, x_{10}$ under the same
transformation. That is, $\text{Mode}_y = 2(\text{Mode}_x)^3 = 2(0) = 0$
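
A strictly increasing transformation is one-to-one, so it keeps the frequency of every value unchanged and simply relabels the most frequent one. The sketch below illustrates this with a hypothetical sample of 10 values whose mode is 0; the sample is made up for illustration and is not the data from the problem sheet.

```python
from statistics import mode


def transform(x):
    """The transformation from Problem 4(b): y = 2x^3."""
    return 2 * x ** 3


# Hypothetical sample (not the course data): 10 values with mode 0.
x = [0, 0, 0, 1, 1, -1, 2, -2, 3, 0]
y = [transform(v) for v in x]

mode_x = mode(x)
mode_y = mode(y)
print(mode_x, mode_y, transform(mode_x))
```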

Problem 5: Data Set A: The mean is not good in the presence of an outlier (70), and the mode (=23) is
not good because it is also the minimum value. The median (=29.5) should therefore be the best
measure of location.
Data Set B: For a distribution which is unimodal and roughly symmetric, the mean,
median and mode should all be good measures of location.
Data Set C: Both the median and the mode (=1) can only show that more students are taking
STAT1012 than not. However, the sample mean (=0.8) shows the proportion of
students taking STAT1012, which is more informative.

Problem 6:
a) A histogram should be used, as the number
of children is a quantitative variable. In a
bar graph, the data values do not have to be
ordered, so the shape of the distribution
cannot be read from it.
c) right-skewed
d) 2 modes: 1 and 2



Problem 7: (c) (i) is not true, with counterexample 20 20 20 20 20 20 21 21 21 21 21.
(ii) is true, as no mode implies one data point at each of 11, 12, ..., 21. Hence median = 16.
(iii) is true. In the extreme case, the data take their maximum values 20 20 20 20 20 20 21
21 21 21 21, with mean = 225/11 ≈ 20.45 < 20.5.
Problem 8: a) $\bar{x} = \left(\sum_{i=1}^{n} x_i\right)/n = (1 \cdot 5 + 2 \cdot 6 + 3 \cdot 4 + 4 \cdot 3 + 5 \cdot 2)/20 = 2.55$, Mode = 2
Median = average of the 10th and 11th smallest observations = (2+2)/2 = 2
b) The distribution is right-skewed because the right-hand tail is longer.
c) $\sum_{i=1}^{n} x_i^2 = 1^2 \cdot 5 + 2^2 \cdot 6 + 3^2 \cdot 4 + 4^2 \cdot 3 + 5^2 \cdot 2 = 163$,
$s = \sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)/(n-1)} = \sqrt{[163 - 20(2.55)^2]/19} = 1.317$
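
The Problem 8 figures can be reproduced from the frequency table implied by the sums (values 1 to 5 with counts 5, 6, 4, 3, 2, so n = 20). The following Python sketch expands that table and compares the shortcut formula for s with the standard library's `statistics.stdev`; the variable names are illustrative.

```python
from statistics import mean, median, mode, stdev

# Frequency table read off the sums in Problem 8: value -> count (n = 20).
freq = {1: 5, 2: 6, 3: 4, 4: 3, 5: 2}

# Expand the table into the 20 raw observations.
data = [value for value, count in freq.items() for _ in range(count)]

n = len(data)
x_bar = mean(data)
sum_sq = sum(v * v for v in data)

# Shortcut formula: s = sqrt((sum of x^2 - n * x_bar^2) / (n - 1)).
s_shortcut = ((sum_sq - n * x_bar ** 2) / (n - 1)) ** 0.5

print(n, x_bar, median(data), mode(data))
print(sum_sq, round(s_shortcut, 3), round(stdev(data), 3))
```

Both routes to s agree, which is a quick way to catch a slip in the hand calculation.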

Problem 9: (d) Both the median and the IQR are insensitive to outliers. However, both the mean
and the standard deviation are very sensitive to outlying values, because every observation,
however extreme, enters their calculation.

Problem 10: (a) IQR = Q3 - Q1 = 245 - 145 = 100, hence the thresholds to define the outliers are
Q3 + 1.5 IQR = 245 + 150 = 395 and Q1 - 1.5 IQR = 145 - 150 = -5
Since all the data are within the two thresholds, there is no outlier in the data.
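Problem 11: a) $\bar{x} = \left(\sum_{i=1}^{10} x_i\right)/10 = 100/10 = 10$, $s = \sqrt{\left(\sum_{i=1}^{10} x_i^2 - 10\bar{x}^2\right)/(10-1)} = \sqrt{[1294 - 10(10)^2]/9} = 5.715$

b) $\bar{x} = \left(\sum_{i=1}^{11} x_i\right)/11 = (100 + 10)/11 = 10$, $s = \sqrt{\left(\sum_{i=1}^{11} x_i^2 - 11\bar{x}^2\right)/(11-1)} = \sqrt{[(1294 + 100) - 11(10)^2]/10} = 5.422$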
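
Problem 11 only needs the running totals n, the sum of x and the sum of x squared, so the effect of adding one observation equal to the mean can be checked directly. A minimal Python sketch, with the hypothetical helper name `mean_sd_from_sums`:

```python
from math import sqrt


def mean_sd_from_sums(n, sum_x, sum_x2):
    """Sample mean and standard deviation from n, sum of x and sum of x^2."""
    x_bar = sum_x / n
    s = sqrt((sum_x2 - n * x_bar ** 2) / (n - 1))
    return x_bar, s


# Part (a): the 10 original observations.
mean_a, s_a = mean_sd_from_sums(10, 100, 1294)

# Part (b): one extra observation equal to 10 is added to both totals.
new_value = 10
mean_b, s_b = mean_sd_from_sums(11, 100 + new_value, 1294 + new_value ** 2)

print(mean_a, round(s_a, 3))
print(mean_b, round(s_b, 3))
```

The mean stays at 10, while s shrinks because the numerator is unchanged and the divisor grows from 9 to 10.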

Problem 12:
Q1: np/100 = (11)(25)/100 = 2.75, which is rounded up to k = 3.
=> Q1 = 3rd smallest observation = 11.
Q3: np/100 = (11)(75)/100 = 8.25, which is rounded up to k = 9.
=> Q3 = 9th smallest observation = 15.
Median = (11+1)/2 th smallest observation
= 6th smallest observation = 13.
IQR = 15 - 11 = 4. The thresholds for the
outliers are therefore
Q3 + 1.5 IQR = 15 + 1.5(4) = 21 and
Q1 - 1.5 IQR = 11 - 1.5(4) = 5
Based on the thresholds, we can see that
the data values 2 and 25 are outliers, with
the smallest and largest non-outlying values
given by 9 and 18.
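
The percentile rule and the 1.5 IQR thresholds used in Problem 12 can be sketched in Python as follows. The full data set is not reproduced in this solution, so the sample below is hypothetical: it was chosen only to match the values quoted above (n = 11, Q1 = 11, median = 13, Q3 = 15, outliers 2 and 25, whisker ends 9 and 18). The function name `percentile` is illustrative and assumes an integer p with 0 < p < 100.

```python
def percentile(sorted_data, p):
    """p-th percentile by the rule used in these solutions.

    k = n*p/100: if k is an integer, average the k-th and (k+1)-th smallest
    observations; otherwise round k up and take that observation.
    """
    n = len(sorted_data)
    k, remainder = divmod(n * p, 100)
    if remainder == 0:
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    return sorted_data[k]  # zero-based index k is the (k+1)-th smallest


# Hypothetical sample consistent with the quoted results (not the course data).
data = sorted([2, 9, 11, 12, 12, 13, 14, 14, 15, 18, 25])

q1, q2, q3 = (percentile(data, p) for p in (25, 50, 75))
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = [v for v in data if v < lower or v > upper]
inside = [v for v in data if lower <= v <= upper]
print(q1, q2, q3, iqr, lower, upper)
print(outliers, inside[0], inside[-1])
```

The integer branch of `percentile` is the case that arises in Problem 20, where np/100 is a whole number and two neighbouring observations are averaged.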


Problem 13: a) is true because it is a square root value. b) is false: s can actually be zero,
when all the data points have the same value. c) is false: in the presence of outliers, s is
not a good measure of spread. The interquartile range (IQR) is a better measure of
spread in that case, as it depends on the middle half of the data points only.


Problem 14: a) Range = 6-0= 6
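b) $\bar{x} = (4 + 6)/8 = 1.25$, $\sum_{i=1}^{n} x_i^2 = 4^2 + 6^2 = 52$, $s = \sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)/(n-1)} = \sqrt{(52 - 8(1.25)^2)/7} = 2.375$
c) Range = 60, $\bar{x} = (4 + 60)/8 = 8$, $\sum_{i=1}^{n} x_i^2 = 4^2 + 60^2 = 3616$,
$s = \sqrt{\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)/(n-1)} = \sqrt{(3616 - 8(8)^2)/7} = 21.058$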

Both the range and standard deviation are sensitive to outliers, with the presence of the
data point 60 increasing the values for both substantially.
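
The sensitivity described in Problem 14 can be seen numerically. The sums above imply eight observations whose only non-zero values are 4 and 6 (or 4 and 60 in part c); the six zeros are inferred from n = 8 and the range 6 - 0 rather than quoted from the problem sheet. A minimal Python sketch under that assumption:

```python
from statistics import mean, stdev


def summarize(data):
    """Return (range, mean, sum of squares, sample standard deviation)."""
    return (
        max(data) - min(data),
        mean(data),
        sum(v * v for v in data),
        stdev(data),
    )


# Data inferred from the sums in the solution (assumption: six zeros).
data_b = [0, 0, 0, 0, 0, 0, 4, 6]    # parts (a) and (b)
data_c = [0, 0, 0, 0, 0, 0, 4, 60]   # part (c): 6 replaced by 60

range_b, mean_b, sumsq_b, s_b = summarize(data_b)
range_c, mean_c, sumsq_c, s_c = summarize(data_c)
print(range_b, mean_b, sumsq_b, round(s_b, 3))
print(range_c, mean_c, sumsq_c, round(s_c, 3))
```

Changing a single value from 6 to 60 multiplies the range by ten and the standard deviation by roughly nine.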

Problem 15: Note that the mean and mode are affected by both translation and rescaling,
while the standard deviation and range are affected by rescaling only. Hence,

$\bar{y} = 10\bar{x} + 2 = 10(10) + 2 = 102$, $\text{mode}_y = 10\,\text{mode}_x + 2 = 10(12) + 2 = 122$
$s_y = 10 s_x = 10(4) = 40$, $\text{range}_y = 10\,\text{range}_x = 10(13) = 130$
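
The rules used in Problem 15 can be checked on a concrete sample. The data below are hypothetical: they were constructed only so that the mean is 10, the mode is 12, the standard deviation is 4 and the range is 13, matching the summary values in the solution. The transformation y = 10x + 2 is the one implied by the formulas above.

```python
from statistics import mean, mode, stdev


def summary(data):
    """Mean, mode, sample standard deviation and range of a sample."""
    return {
        "mean": mean(data),
        "mode": mode(data),
        "sd": stdev(data),
        "range": max(data) - min(data),
    }


# Hypothetical sample with mean 10, mode 12, s = 4 and range 13.
x = [2, 7, 9, 11, 12, 12, 12, 15]
a, b = 10, 2
y = [a * v + b for v in x]

sx, sy = summary(x), summary(y)
print(sx)
print(sy)
```

The shift b moves the mean and the mode but cancels out of the standard deviation and the range, which only pick up the scale factor a.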

Problem 16: The sample mean = 1. In this case, the alternative formula for variance is easier
to calculate, given by $s^2 = \left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)/(n-1) = (3300 - 10[1]^2)/9 = 365.56$. The original formula
for variance, $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2/(n-1)$, is much more tedious to calculate in this problem.
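
For reference, the two variance formulas in Problem 16 agree because the sum of squared deviations expands as follows, using $\sum_{i=1}^{n} x_i = n\bar{x}$:

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 .
\end{align*}
```

Dividing both sides by n - 1 gives the alternative formula for the sample variance.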
Problem 17: $s_y^2 = (-2)^2 s_x^2 = 4(21) = 84$, $\text{Range}_y = |-2|\,\text{Range}_x = 2(10) = 20$.
Problem 18: (b) From the boxplots, IQR is roughly equal to 80-60=20. Hence the length of
the vertical bars cannot be longer than 1.5IQR=30, with (ii) violating the criterion. (i) is a
valid boxplot, with the absence of a vertical line suggesting that the largest 25% of the data
points are of the same value (=80!). (iii) is a valid boxplot with two outliers in the data.
Problem 19: a) Mean = 4.6375, Median = (1.2+1.8)/2 = 1.5
b) Q1 = average of the 2nd and 3rd smallest values = (0.7+1.1)/2 = 0.9
Q3 = average of the 6th and 7th smallest values = (2.3+9.8)/2 = 6.05
IQR = Q3 - Q1 = 5.15 => Lower Threshold = Q1 - 1.5(IQR) = 0.9 - 7.725 = -6.825
Upper Threshold = Q3 + 1.5(IQR) = 6.05 + 7.725 = 13.775
Since 20 is the only data point outside the two thresholds, there is one outlier
(20) in the data set.

Problem 20: a) See the stem-and-leaf plot on the right-hand side.
b) Q1: np/100 = 20(25)/100 = 5, which is an integer.
Q1 = average of the 5th and 6th smallest obs. = 2800+(38+41)/2 = 2839.5
Q3: np/100 = 20(75)/100 = 15, which is an integer.
Q3 = average of the 15th and 16th smallest obs. = (3323+3484)/2 = 3403.5

