
BES Tutorial Sample Solutions, S1/13

WEEK 10 TUTORIAL EXERCISES (To be discussed in the week starting May 13)
1. State whether the normal distribution, t distribution or neither would be used to test hypotheses regarding the population mean in the following situations:
(a) Population normally distributed, \(\sigma^2\) unknown, sample size less than 30.
t distribution.

(b) Population normally distributed, \(\sigma^2\) unknown, sample size greater than 30.
t distribution, although as the sample size gets very large this is effectively the same as using the normal distribution.

(c) Population normally distributed, \(\sigma^2\) known, sample size less than 30.
Normal distribution.

(d) Population not normally distributed, \(\sigma^2\) unknown, sample size greater than 30.
Because the sample size is large, you can invoke the CLT and use the fact that \(s^2\) is a consistent estimator of \(\sigma^2\) to justify using the normal distribution.

(e) Population not normally distributed, \(\sigma^2\) unknown, sample size less than 30.
Here the sampling distribution is unknown, and hence we don't know how to test a hypothesis about \(\mu\) in this circumstance. In practice you could either assume the population is approximately normally distributed and proceed as in (a), or alternatively invoke the CLT and proceed as in (d). How well either of these solutions works ultimately depends on the (unknown) extent of non-normality of the population distribution.

2. Reconsider Question 2 of the Week 9 exercises. In that exercise, a real estate expert claimed the current mean value of houses in a particular area was more than $250,000. A random sample of 150 recent sales prices in the area yielded a sample mean of $265,000, and it is known that house values in the area are approximately normally distributed with a standard deviation of $50,000.
(a) If in fact the population mean house value in the area is $260,000, what is the probability of committing a type II error in performing an upper-tail test of the null hypothesis that the mean house value in the area is $250,000, as in Question 1 part (a) of the Week 9 exercises? What is the power of the test in these circumstances? State in words what the power of the test means.

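Let \(X\) = value of a house in the area (in dollars), with \(\bar{x} = 265{,}000\), \(\sigma = 50{,}000\), \(n = 150\) and \(X \sim N(\mu, \sigma^2)\).
\(H_0: \mu = 250{,}000\); \(H_1: \mu > 250{,}000\).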


Rejection region (\(\alpha = 0.05\), upper tail):
\[ \bar{x} > 250{,}000 + 1.645 \times \frac{50{,}000}{\sqrt{150}} = 256{,}715.68 \]


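Thus the probability of a Type II error (not rejecting \(H_0\) when it is false) is
\[ \beta = P(\bar{X} < 256{,}715.68 \mid \mu = 260{,}000) = P\left(Z < \frac{256{,}715.68 - 260{,}000}{50{,}000/\sqrt{150}}\right) = P(Z < -0.80) = 0.2119 \]
and the power of the test is
\[ 1 - \beta = 0.7881. \]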

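The power of the test is the probability of correctly rejecting the null hypothesis when it is false. Here, if the true mean house value is $260,000, the test rejects \(H_0: \mu = 250{,}000\) with probability 0.7881.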
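The type II error probability and power in (a) can be checked numerically. A minimal Python sketch using the standard library's `statistics.NormalDist` (Python 3.8+), taking the tabled critical value 1.645 as given:

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()              # standard normal distribution

mu0, mu1 = 250_000, 260_000   # mean under H0 and the assumed true mean
sigma, n = 50_000, 150
z_crit = 1.645                # upper-tail critical value for alpha = 0.05 (tables)

se = sigma / sqrt(n)
x_crit = mu0 + z_crit * se    # reject H0 if the sample mean exceeds this value

# Type II error: sample mean falls below x_crit even though mu = mu1.
beta = Z.cdf((x_crit - mu1) / se)
power = 1 - beta

print(f"critical value = {x_crit:,.2f}")
print(f"beta = {beta:.4f}, power = {power:.4f}")
```

Because the worked answer rounds the z value to -0.80 before using the tables, the unrounded calculation gives a slightly smaller \(\beta\) (about 0.21) and a correspondingly slightly larger power.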

(b) Illustrate your answer to part (a) above by showing on a diagram the areas representing the probability of a type II error and the power of the test.

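[Diagram: sampling distributions of \(\bar{X}\) under \(H_0: \mu = 250{,}000\) and under \(\mu = 260{,}000\), with the critical value $256,715.68 marked; under \(\mu = 260{,}000\) the area to the left of the critical value is \(\beta = 1 - \text{power}\).]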

3. A company running an urban rail service wishes to estimate its daily average number of late-running trains on weekdays. For 10 randomly selected weekdays, it finds the following numbers of late-running trains:
32, 10, 9, 18, 25, 15, 14, 18, 22, 16
(a) Assuming the number of late-running trains on a weekday is approximately normally distributed, calculate a 90% confidence interval for the mean number of late-running trains on a weekday.

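Let \(X\) = number of late trains on a weekday, \(X \sim N(\mu, \sigma^2)\), with
\(n = 10\), \(\bar{x} = 17.9\), \(s^2 = 48.32\), \(s = 6.9514\) and \(\alpha = 0.1\).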

Since \(\sigma^2\) is unknown, \(n\) is small and the underlying distribution is normal, we construct the confidence interval using the t distribution with \(n - 1 = 9\) degrees of freedom.

The required interval is
\[ \bar{x} \pm t_{0.05,9}\frac{s}{\sqrt{n}} = 17.9 \pm 1.833 \times \frac{6.9514}{\sqrt{10}} = 17.9 \pm 4.029 = (13.871,\ 21.929) \]
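The interval in (a) can be reproduced from the raw data. A minimal Python sketch; since the standard library has no t quantile function, the tabled value \(t_{0.05,9} = 1.833\) is hard-coded:

```python
from math import sqrt
from statistics import mean, stdev

late = [32, 10, 9, 18, 25, 15, 14, 18, 22, 16]

n = len(late)
x_bar = mean(late)      # sample mean
s = stdev(late)         # sample standard deviation (divisor n - 1)
t_crit = 1.833          # t_{0.05, 9} from tables

half_width = t_crit * s / sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width
print(f"x_bar = {x_bar:.1f}, s = {s:.4f}")
print(f"90% CI: ({lower:.3f}, {upper:.3f})")
```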
(b) If we did not have the assumption of normality, could we still calculate a confidence interval in this example? If not, suggest a way of overcoming this problem.

All else being equal, we could not construct a confidence interval in the same way as in (a), since the t distribution is only valid if the underlying distribution is normal. This problem could be overcome by obtaining a larger sample and then making use of the central limit theorem (replacing \(\sigma\) by \(s\)).

4. Reconsider Question 2 of the Week 8 exercises. Would normality be a good approximation for the population distribution of distance traveled by used passenger cars? (Hint: look at the summary statistics and a histogram.) Do you need to assume normality? Redo the 95% confidence interval for the population mean distance traveled by used passenger cars without assuming a known population standard deviation.

The EXCEL summary statistics and histogram for distance traveled indicate non-normality. The distribution is skewed to the right (skewness 1.528), the median is much less than the mean, and the sample mean is only 1.35 standard deviations from zero, so a normal distribution would put non-negligible probability on negative distances:

Odometer (km)
Mean                    78560.83
Standard Error           5384.86
Median                     67980
Mode                      147000
Standard Deviation      58246.19
Sample Variance       3392618896
Kurtosis                   3.426
Skewness                   1.528
Range                     315597
Minimum                      403
Maximum                   316000
Sum                      9191617
Count                        117

[Figure: Frequency histogram for odometer readings for cars in Anzac Garage data. Horizontal axis: Odometer (kms); vertical axis: Frequency.]

While the population distribution seems non-normal, the sample size (\(n = 117\)) is large enough to invoke the CLT and hence to assume the sample mean is approximately normally distributed.
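The large-sample interval can be reproduced from the EXCEL summary statistics in the table above. A minimal Python sketch, assuming (as the CLT argument justifies) that the normal critical value 1.96 can be used with \(\sigma\) replaced by \(s\):

```python
from math import sqrt

# EXCEL summary statistics for odometer readings (km)
x_bar = 78560.83
s = 58246.19
n = 117
z_crit = 1.96                 # two-sided 95% normal critical value

std_err = s / sqrt(n)         # should match EXCEL's "Standard Error"
margin = z_crit * std_err
lower, upper = x_bar - margin, x_bar + margin
print(f"std error = {std_err:.2f}")
print(f"95% CI: ({lower:,.1f}, {upper:,.1f}) km")
```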

In Question 2 of the Week 8 exercises we assumed \(\sigma\) known, but here we consider the more likely situation where it is unknown, and we replace \(\sigma\) by \(s\) as calculated by EXCEL. The 95% confidence interval is given by

\[ \bar{x} \pm 1.96\frac{s}{\sqrt{n}} = 78{,}561 \pm 1.96 \times \frac{58{,}246}{\sqrt{117}} = 78{,}561 \pm 10{,}554 = (68{,}007,\ 89{,}115) \]

5. It is known that 80% of people suffering from a particular disease are cured by a certain medication. Test the claim of the developers of a new medication that their product is more effective in curing the disease, using a 5% significance level and a random sample of 400 people with the disease where 330 are cured by using the new medication. (Hint: Use the normal approximation and ignore the continuity correction.)
\(H_0: p = 0.8\); \(H_1: p > 0.8\); \(n = 400\), \(\alpha = 0.05\) and \(\hat{p} = 330/400 = 0.825\).


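Since \(np = 400 \times 0.8 = 320\) and \(n(1-p) = 80\) are both large, we can use the normal approximation to the binomial, and under \(H_0\), approximately,
\[ \hat{P} \sim N\left(p, \frac{p(1-p)}{n}\right) = N\left(0.8, \frac{0.8 \times 0.2}{400}\right) \]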

So, ignoring the continuity correction, the empirical significance level or p-value is
\[ P(\hat{P} > 0.825) = P\left(Z > \frac{0.825 - 0.8}{\sqrt{0.8 \times 0.2/400}}\right) = P(Z > 1.25) = 0.1056 \]

Because the p-value exceeds \(\alpha\) (0.1056 > 0.05), we do not reject \(H_0\) and conclude there is not enough evidence to support the developers' claim of a more effective cure.

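(Alternatively, using the rejection region: reject \(H_0\) if \(z > 1.645\), or equivalently if \(\hat{p} > 0.8 + 1.645 \times 0.02 = 0.8329\); since \(\hat{p} = 0.825\) (\(z = 1.25\)) does not fall in this region, \(H_0\) is not rejected.)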
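A minimal Python sketch of the test in Question 5, using `statistics.NormalDist` for the normal approximation with no continuity correction. It uses the exact 95th percentile of \(Z\) (about 1.6449) rather than the tabled 1.645, which does not change the rejection boundary at four decimal places:

```python
from math import sqrt
from statistics import NormalDist

p0, n, cured, alpha = 0.8, 400, 330, 0.05

p_hat = cured / n                       # sample proportion
se0 = sqrt(p0 * (1 - p0) / n)           # standard error of p_hat under H0
z = (p_hat - p0) / se0                  # test statistic, no continuity correction
p_value = 1 - NormalDist().cdf(z)       # upper-tail test

# Equivalent rejection region expressed on the p_hat scale
p_crit = p0 + NormalDist().inv_cdf(1 - alpha) * se0

print(f"z = {z:.2f}, p-value = {p_value:.4f}, critical p_hat = {p_crit:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```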

6. SIA: Crime statistics (Note: You can check your answers in the NSW BOCSAR report on driving under the influence of cannabis used in Question 4, Week 9.)
A recent study of driving under the influence of cannabis reports a confidence interval for the population proportion of people who have ever used cannabis as (0.539, 0.627). This is based on a sample of 502.
(a) What is the sample proportion on which the reported confidence interval is based?


Since the confidence interval for the population proportion is always centered on the point estimate, \(\hat{p}\) is always the midpoint, i.e.
\[ \hat{p} = \frac{0.539 + 0.627}{2} = 0.583 \]

(b) What level of confidence was used in the calculation of the confidence interval?


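Assuming that, approximately, \(\hat{P} \sim N\left(p, \frac{p(1-p)}{n}\right)\), we have (replacing \(p\) by \(\hat{p}\)):
\[ \hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = (0.539,\ 0.627) \]
Thus
\[ z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.627 - 0.583 = 0.044 \]
and
\[ \sqrt{\frac{0.583 \times 0.417}{502}} = 0.0220 \quad\Rightarrow\quad z_{\alpha/2} = \frac{0.044}{0.022} = 2.00, \]
implying \(\alpha/2 = 0.0228\) and hence \(\alpha = 0.0456\) or 4.56%, i.e. a confidence level of about 95.44%.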

(The report indicates \(\alpha = 5\%\), i.e. a 95% confidence level; the difference is due to rounding.)
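Both parts of Question 6 can be checked by backing out \(\hat{p}\) and \(z_{\alpha/2}\) from the reported interval. A minimal Python sketch, assuming the interval was built with the normal-approximation formula \(\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\):

```python
from math import sqrt
from statistics import NormalDist

lower, upper, n = 0.539, 0.627, 502    # reported interval and sample size

p_hat = (lower + upper) / 2            # interval is centred on p_hat
margin = (upper - lower) / 2           # equals z_{alpha/2} * standard error
se = sqrt(p_hat * (1 - p_hat) / n)     # estimated standard error
z_half = margin / se                   # implied z_{alpha/2}

alpha = 2 * (1 - NormalDist().cdf(z_half))
print(f"p_hat = {p_hat:.3f}, z = {z_half:.2f}")
print(f"alpha = {alpha:.4f}, confidence = {1 - alpha:.2%}")
```

Without rounding \(z_{\alpha/2}\) to 2.00, the implied \(\alpha\) is still about 0.0456.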

7