
9231/2 Simon See (019-9251098)

Chapter 7: Inferences Using Normal and t-Distribution

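Recall the unbiased estimators for:

• population mean, 𝜇, is 𝜇̂, where

$$\hat{\mu} = \bar{x} = \frac{\sum x}{n}$$

• population variance, 𝜎², is 𝜎̂², where

$$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) = \frac{1}{n-1}\sum (x - \bar{x})^2 = \frac{n}{n-1} \times s^2$$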
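
As a quick numerical check that the forms of the variance estimator above agree, the estimates can be computed from 𝑛, ∑𝑥 and ∑𝑥². The sketch below is illustrative only: the function name `unbiased_estimates` and the three data values are made up, and 𝑠² is taken to be the sample variance with divisor 𝑛, which is what the last equality requires.

```python
import math

def unbiased_estimates(n, sum_x, sum_x2):
    """Return (estimate of mu, unbiased estimate of sigma^2) from n, sum of x, sum of x^2."""
    mean = sum_x / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, variance

# Hypothetical data, chosen only for illustration.
data = [2.0, 4.0, 6.0]
n = len(data)
mean, variance = unbiased_estimates(n, sum(data), sum(x * x for x in data))

# The same estimate via n/(n-1) * s^2, where s^2 uses divisor n.
s2 = sum((x - mean) ** 2 for x in data) / n
variance_alt = n / (n - 1) * s2

print(mean, variance, variance_alt)
```

Both routes should give the same variance estimate up to floating-point rounding.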

7.1 The 𝑡-Distribution


The 𝑡-distributions are symmetric about zero and have a single parameter 𝜈 (a Greek letter
pronounced “nu”), which is a positive integer.

𝜈 is known as the number of degrees of freedom of the distribution. If, for example, 𝑇 has a
𝑡-distribution with five degrees of freedom, we write 𝑇~𝑡(5).

The diagram below shows two curves, 𝑡(2) and 𝑡(5).

As 𝜈 increases, the 𝑡(𝜈) distribution resembles the standard normal distribution 𝑁(0, 1) more
closely. In fact, when 𝜈 ≥ 30, the 𝑡(𝜈) distribution is very close to the standard normal distribution.

For a random sample of size 𝑛 from a normal population,

$$T = \frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}}$$

follows a 𝑡-distribution with (𝑛 − 1) degrees of freedom.


7.2 Hypothesis Testing of Population Mean with 𝑡-Distribution


We use the 𝑡-distribution for hypothesis testing when:

• the sample is taken from a population with a normal distribution,
• the sample size is small (𝑛 < 30),
• the population variance is unknown.

The test statistic is given by

$$T = \frac{\bar{X} - \mu}{\hat{\sigma}/\sqrt{n}}$$

and 𝑇~𝑡(𝑛 − 1).

Example 7.1
5 readings of the resistance 𝑋, in ohms, of a piece of wire are summarized as ∑𝑥 = 7.6,
∑𝑥² = 11.5538.
If the wire is pure, the resistance is 1.50 ohms. If the wire is impure, its resistance is higher
than 1.50 ohms. Assuming that the resistance can be modelled by a normal variable with
mean 𝜇 and standard deviation 𝜎, calculate
(a) the sample mean 𝑥̅
(b) an unbiased estimator of 𝜎.
Is there any evidence, at the 5% level of significance, that the wire is impure?
Let 𝑋 be the resistance of the wire in ohms.
(a)
$$\bar{x} = \frac{\sum x}{n} = \frac{7.6}{5} = 1.52$$
(b)
$$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) = \frac{1}{5-1}\left(11.5538 - \frac{7.6^2}{5}\right) = 0.00045$$
𝜎̂ = 0.0212

Null and alternative hypotheses:

𝐻0 : 𝜇 = 1.50
𝐻1 : 𝜇 > 1.50

Test statistic:
$$t = \frac{\bar{x} - \mu}{\hat{\sigma}/\sqrt{n}} = \frac{1.52 - 1.50}{0.0212/\sqrt{5}} = 2.109$$

Critical value:
The number of degrees of freedom is 𝜈 = 5 − 1 = 4.
This is a right-tailed test.
So the critical value is $t_{0.95,\,4} = 2.132$.

Since 𝑡 = 2.109 < 2.132, do not reject 𝐻0.

There is not enough evidence, at the 5% significance level, to indicate that the wire is impure.
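
The steps of Example 7.1 can be sketched in Python from the summary statistics alone. This is a minimal illustration, not a prescribed method: the function name `one_sample_t` is hypothetical, and the critical value 2.132 is taken from the tables as quoted in the example rather than computed.

```python
import math

def one_sample_t(n, sum_x, sum_x2, mu0):
    """t statistic for H0: mu = mu0, using the unbiased variance estimate."""
    mean = sum_x / n
    var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return (mean - mu0) / math.sqrt(var_hat / n)

t = one_sample_t(n=5, sum_x=7.6, sum_x2=11.5538, mu0=1.50)
critical = 2.132          # upper 5% point of t(4), read from tables
reject_h0 = t > critical  # right-tailed test

print(round(t, 3), reject_h0)
```

Because no intermediate value is rounded here, the statistic may differ from the hand calculation in the third decimal place; the decision is unaffected.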


Example 7.2
Brilliant fireworks are intended to burn for 40 seconds. A random sample of 15 Brilliant
fireworks is selected. Each firework in the sample is ignited and the burning time, 𝑥 seconds,
is measured. The results are summarized by ∑(𝑥 − 40) = −18 and ∑(𝑥 − 40)² = 90.
Test, at the 10% significance level, whether or not the mean burning time is less than 40
seconds.
$$\bar{x} = \frac{\sum (x - 40)}{n} + 40 = \frac{-18}{15} + 40 = 38.8$$
$$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum (x-40)^2 - \frac{\left(\sum (x-40)\right)^2}{n}\right) = \frac{1}{15-1}\left(90 - \frac{(-18)^2}{15}\right)$$
𝜎̂ = 2.21

Null and alternative hypotheses:

𝐻0 : 𝜇 = 40
𝐻1 : 𝜇 < 40

Test statistic:
$$t = \frac{\bar{x} - \mu}{\hat{\sigma}/\sqrt{n}} = \frac{38.8 - 40}{2.21/\sqrt{15}} = -2.103$$

Critical value:
The number of degrees of freedom is 𝜈 = 15 − 1 = 14.
This is a left-tailed test.
So the critical value is $t_{0.10,\,14} = -1.345$.

Since 𝑡 = −2.103 < −1.345, reject 𝐻0.

There is enough evidence, at the 10% significance level, to indicate that the mean burning time of Brilliant
fireworks is less than 40 seconds.
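
Example 7.2 works with coded data. As a short justification (the symbol 𝑦 = 𝑥 − 40 is introduced here only for illustration), subtracting a constant shifts the mean but leaves every deviation from the mean unchanged, so the variance estimate can be computed from the coded sums directly:

```latex
\[
\bar{x} = 40 + \bar{y}, \qquad x - \bar{x} = y - \bar{y}
\]
\[
\hat{\sigma}^2 = \frac{1}{n-1}\sum (x - \bar{x})^2
             = \frac{1}{n-1}\sum (y - \bar{y})^2
             = \frac{1}{n-1}\left(\sum y^2 - \frac{(\sum y)^2}{n}\right)
             = \frac{1}{14}\left(90 - \frac{(-18)^2}{15}\right) \approx 4.886
\]
```

Taking the square root gives 𝜎̂ ≈ 2.21, the value used in the test statistic.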

Example 7.3
A machine is supposed to produce paper with a mean thickness of 0.05 mm. Eight random
measurements of the paper gave a mean of 0.047 mm with a standard deviation of 0.002
mm. Assuming that the thickness of the paper produced by the machine is normally
distributed, test at the 1% level whether the output from the machine is different from
expected.
Let 𝑋 be the thickness of the paper.
𝑥̅ = 0.047

$$\hat{\sigma}^2 = \frac{n}{n-1} \times s^2 = \frac{8}{7} \times 0.002^2$$
𝜎̂ = 0.00214


Null and alternative hypotheses:

𝐻0 : 𝜇 = 0.05
𝐻1 : 𝜇 ≠ 0.05

Test statistic:
$$t = \frac{\bar{x} - \mu}{\hat{\sigma}/\sqrt{n}} = \frac{0.047 - 0.05}{0.00214/\sqrt{8}} = -3.96$$

Critical value:
The number of degrees of freedom is 𝜈 = 8 − 1 = 7.
This is a two-tailed test, so 0.5% lies in each tail.
So the critical values are $\pm t_{0.995,\,7} = \pm 3.499$.

Since |𝑡| = 3.96 > 3.499, reject 𝐻0.

There is enough evidence, at the 1% significance level, that the output from the machine is
different from expected.
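
When only the sample mean and the sample standard deviation (divisor 𝑛) are given, as in Example 7.3, the same test can be sketched as follows. The function name `t_from_sample_sd` is hypothetical, and the critical value 3.499 is the table value quoted in the example.

```python
import math

def t_from_sample_sd(n, mean, s, mu0):
    """t statistic when s is the sample standard deviation with divisor n."""
    sigma_hat = math.sqrt(n / (n - 1)) * s   # unbiased estimate of sigma
    return (mean - mu0) / (sigma_hat / math.sqrt(n))

t = t_from_sample_sd(n=8, mean=0.047, s=0.002, mu0=0.05)
critical = 3.499               # two-tailed 1% value for t(7), from tables
reject_h0 = abs(t) > critical  # compare |t| with the critical value

print(round(t, 2), reject_h0)
```

Note that 𝜎̂/√𝑛 simplifies to 𝑠/√(𝑛 − 1), so the statistic could equally be computed without forming 𝜎̂ first.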

Exercise 7a
1. An athlete finds that her times for running a race are normally distributed with mean
10.8 seconds. She trains intensively for a week and records her time in the next 5 races.
Her times, in seconds, are 10.70, 10.65, 10.75, 10.80, 10.60.
Is there evidence, at the 5% level, that training intensively has improved her times?
[𝑡 = −2.828, evidence of improved times]
2. It is thought that a normal population has mean 1.6. A random sample of 10
observations gives a mean of 1.49 and standard deviation of 0.3.
Does this provide evidence, at the 5% level, that the population mean is less than 1.6?
[𝑡 = −1.1, no]
3. A random sample of 8 observations of a normal variable gave
∑𝑥 = 36.5, ∑(𝑥 − 𝑥̅)² = 0.74.
Test, at the 5% level, the hypothesis that the mean of the distribution is 4.3 against the
alternative hypothesis that the mean is greater than 4.3.
[𝑡 = 2.284, evidence mean greater than 4.3]
4. A firm of solicitors claims that, on average, interviews with clients last 50 minutes. A
random sample of 15 interviews is chosen, and the time taken for each interview, 𝑥
minutes, is noted. The results are summarized by ∑𝑥 = 746 and ∑𝑥² = 37 180.
Assuming that the time for an interview has a normal distribution, use a 𝑡-test to
determine, at the 5% significance level, whether the firm is overstating the average
interview time.
[𝑡 = −0.435, not overstating]
5. Haemoglobin levels in females may be modelled by a normal distribution with mean
14.2 (grams per decilitre). 10 randomly chosen female students from a college had their
haemoglobin levels, ℎ, measured. The results are summarised by ∑ℎ = 147.9 and ∑ℎ² =
2203.19. Test, at the 5% significance level, whether the mean haemoglobin level of the
female students in the college differs from the mean level of all females.
[𝑡 = 1.410, no evidence of different mean]


7.3 Confidence Interval of Population Mean with 𝑡-Distribution


Consider a sample for which:

• the sample is taken from a population with a normal distribution,
• the sample size is small (𝑛 < 30),
• the population variance is unknown.

The confidence interval for the population mean, 𝜇, is given by

$$\bar{x} \pm t\,\frac{\hat{\sigma}}{\sqrt{n}}$$

where 𝑡 is the value from the 𝑡(𝑛 − 1) distribution such that 𝑃(−𝑡 ≤ 𝑇 ≤ 𝑡) = 𝛼, and 𝛼 is the required confidence level (for example, 0.95).

Example 7.4
The mass, in grams, of a packet of biscuits of a particular brand, follows a normal
distribution with mean 𝜇. Ten packets of biscuits are chosen at random and their masses
noted. The results, in grams, are summarised as ∑𝑥 = 3978.7, ∑𝑥² = 1 583 098.3.
Calculate a 95% confidence interval for 𝜇.
Let 𝑋 be the mass of a packet of biscuits in grams.
$$\bar{x} = \frac{\sum x}{n} = \frac{3978.7}{10} = 397.87$$

$$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right) = \frac{1}{9}\left(1\,583\,098.3 - \frac{3978.7^2}{10}\right)$$
𝜎̂ = 3.213

The number of degrees of freedom is 𝜈 = 10 − 1 = 9.

The 𝑡-value is $t_{0.975,\,9} = 2.262$.

The confidence limits are

$$\bar{x} \pm t\,\frac{\hat{\sigma}}{\sqrt{n}} = 397.87 \pm 2.262 \times \frac{3.213}{\sqrt{10}} = 397.87 \pm 2.298$$

The 95% confidence interval for 𝜇 is (395.6, 400.2).
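
The confidence-interval calculation of Example 7.4 can be sketched in Python from the summary statistics. This is an illustration under stated assumptions: the function name `t_confidence_interval` is hypothetical, ∑𝑥 is taken as 3978.7 (the value used in the working for 𝑥̅), and the 𝑡-value 2.262 is supplied from tables.

```python
import math

def t_confidence_interval(n, sum_x, sum_x2, t_value):
    """Confidence limits mean +/- t * sigma_hat / sqrt(n) from summary statistics."""
    mean = sum_x / n
    var_hat = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    half_width = t_value * math.sqrt(var_hat / n)
    return mean - half_width, mean + half_width

# t_value is the 97.5% point of t(9), read from tables.
lower, upper = t_confidence_interval(n=10, sum_x=3978.7, sum_x2=1583098.3, t_value=2.262)

print(round(lower, 2), round(upper, 2))
```

The width of the interval is simply `upper - lower`, which is twice the half-width.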

Example 7.5
A student, studying the height of a particular plant, knows that it follows a normal
distribution with mean 𝜇 and variance 𝜎 2 , but he does not know the value of either of these
parameters. He selects 15 plants at random, measures their heights and calculates that the
mean height of the sample is 12.2 cm and the standard deviation is 1.4 cm. Using these
values calculate a 90% confidence interval for 𝜇. Calculate also the width of this interval.
Let 𝑋 be the height of a plant in cm.

𝑥̅ = 12.2


$$\hat{\sigma}^2 = \frac{n}{n-1} \times \text{sample variance} = \frac{15}{14} \times 1.4^2 = 2.1$$
𝜎̂ = 1.449

The number of degrees of freedom is 𝜈 = 15 − 1 = 14.

The 𝑡-value is $t_{0.95,\,14} = 1.761$.

The confidence limits are

$$\bar{x} \pm t\,\frac{\hat{\sigma}}{\sqrt{n}} = 12.2 \pm 1.761 \times \frac{1.449}{\sqrt{15}} = 12.2 \pm 0.659$$

The 90% confidence interval for 𝜇 is (11.54, 12.86).

The width of the interval is 2 × 0.659 = 1.318.

Exercise 7b
1. The heights, in metres, of a random sample of 6 policemen from a particular station
were as follows:
1.80, 1.76, 1.79, 1.81, 1.83, 1.79.
Assuming that the heights of policemen from that station are normally distributed with
mean 𝜇, calculate a 95% confidence interval for 𝜇 and state the width of this interval.
[(1.77, 1.82); 0.049]
2. Twenty measurements of 𝑥, the life, in hours of a particular make of candle gave the
following data:
∑𝑥 = 172, ∑𝑥² = 1495.5.
Assuming that the length of life is modelled by a normal distribution with mean 𝜇, find a
98% confidence interval for 𝜇.
[(8.07, 9.13)]
3. A random sample of 8 observations of a normal variable gave
∑𝑥 = 261.2, ∑(𝑥 − 𝑥̅)² = 3.22.
Calculate a 95% confidence interval for the population mean.
If 400 such samples were taken, how many of these would you expect not to include the
population mean?
[(32.08, 33.22); 20]
4. The times, 𝑡 minutes, taken by 18 children in an infant reception class to complete a
jigsaw puzzle were measured. The results are summarized by ∑𝑡 = 75.6 and ∑𝑡² = 338.1.
Stating your assumption, calculate a 95% confidence interval for the population mean
time for children to complete the puzzle.
[Assumes normal, (3.65, 4.75)]
5. The acceleration due to gravity, 𝑔, is determined experimentally. In 5 independent
determinations the values, in m s⁻², are 9.79, 9.82, 9.80, 9.78, 9.84. It may be assumed that
these values are observations from a normal distribution whose mean is 𝑔.
Calculate a 99% confidence interval for 𝑔, giving endpoints to 3 decimal places.
[(9.756, 9.856)]


7.4 Difference of Two Population Means (Hypothesis Testing and Confidence Interval)

This test is used when there are two normal populations 𝑋1 and 𝑋2 with unknown means 𝜇1
and 𝜇2, and we want to test the difference between the means of these populations.

Consider $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$.

The hypotheses might be:

𝐻0 : 𝜇1 − 𝜇2 = ⋯

𝐻1 : 𝜇1 − 𝜇2 > ⋯ or 𝜇1 − 𝜇2 < ⋯ or 𝜇1 − 𝜇2 ≠ ⋯

A random sample of size 𝑛1 is taken from 𝑋1 with sample mean 𝑥̅1, and a random sample of
size 𝑛2 is taken from 𝑋2 with sample mean 𝑥̅2.

The two samples are independent.

The test statistic is based on 𝑋̅1 − 𝑋̅2. The mean and variance of its distribution are given by:

$$E(\bar{X}_1 - \bar{X}_2) = E(\bar{X}_1) - E(\bar{X}_2) = \mu_1 - \mu_2$$

$$\mathrm{Var}(\bar{X}_1 - \bar{X}_2) = \mathrm{Var}(\bar{X}_1) + \mathrm{Var}(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

Type 1: The population variances σ₁² and σ₂² are known

Since

$$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$$

the test statistic is

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

The confidence limits for 𝜇1 − 𝜇2 are

$$(\bar{x}_1 - \bar{x}_2) \pm z\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$


Example 7.6
Due to differences in the environment, the masses of a certain species of small animals are
believed to be greater in Region A than in Region B. It is known that the masses in both
regions are normally distributed, with masses in Region A having standard deviation of 0.04
kg and masses in Region B having a standard deviation of 0.09 kg.
Random samples are taken: 60 animals from Region A had a mean mass of 3.03 kg and 50
animals from Region B had a mean mass of 3.00 kg.
Does this provide evidence, at the 1% level, that the animals of this species in Region A have
a greater mass than those in Region B?
Let 𝑋1 be the mass of an animal in Region A in kg.
Let 𝑋2 be the mass of an animal in Region B in kg.

𝑥̅1 = 3.03 and 𝑥̅2 = 3.00

𝜎1² = 0.04² and 𝜎2² = 0.09²

𝑛1 = 60 and 𝑛2 = 50

Null and alternative hypotheses:

𝐻0 : 𝜇1 − 𝜇2 = 0
𝐻1 : 𝜇1 − 𝜇2 > 0

Test statistic:
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(3.03 - 3.00) - 0}{\sqrt{\dfrac{0.04^2}{60} + \dfrac{0.09^2}{50}}} = 2.184$$

This is a right-tailed test.

The critical value is $z_{0.99} = 2.326$.

Since 𝑧 = 2.184 < 2.326, do not reject 𝐻0.

There is not enough evidence, at the 1% significance level, that the animals in Region A have a
greater mass than those in Region B.
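
The two-sample test with known variances in Example 7.6 can be sketched as below. The function name `two_sample_z` is hypothetical; the critical value is obtained from `statistics.NormalDist` in the Python standard library instead of tables.

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, mean2, var1, var2, n1, n2, diff0=0.0):
    """z statistic for H0: mu1 - mu2 = diff0 when both population variances are known."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return ((mean1 - mean2) - diff0) / standard_error

z = two_sample_z(3.03, 3.00, 0.04 ** 2, 0.09 ** 2, n1=60, n2=50)
critical = NormalDist().inv_cdf(0.99)  # upper 1% point of N(0, 1)
reject_h0 = z > critical               # right-tailed test

print(round(z, 3), round(critical, 3), reject_h0)
```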

Type 2: The populations have a known common variance, where σ² = σ₁² = σ₂²

The test statistic is

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

The confidence limits for 𝜇1 − 𝜇2 are

$$(\bar{x}_1 - \bar{x}_2) \pm z\,\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$


Example 7.7
The same physical fitness test was given to a group of 100 scouts and to a group of 144
guides. The maximum score was 30. The guides obtained a mean score of 26.81 and the
scouts obtained a mean score of 27.53. Assuming that the fitness scores are normally
distributed with a common population standard deviation of 3.48, calculate a 90%
confidence interval for the difference between the mean scores.
Hence test, at the 10% level, whether the scouts and guides have different performance.
Let 𝑋1 be the guide’s score with population mean 𝜇1.
Let 𝑋2 be the scout’s score with population mean 𝜇2.

𝑥̅1 = 26.81 and 𝑥̅2 = 27.53, 𝜎 = 3.48

$z_{0.95} = 1.645$
Confidence limits:
$$(\bar{x}_1 - \bar{x}_2) \pm z\,\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (26.81 - 27.53) \pm 1.645(3.48)\sqrt{\frac{1}{100} + \frac{1}{144}} = -0.72 \pm 0.745$$

The 90% confidence interval for 𝜇1 − 𝜇2 is (−1.465, 0.025).

𝐻0 : 𝜇1 − 𝜇2 = 0
𝐻1 : 𝜇1 − 𝜇2 ≠ 0

Since the 90% confidence interval contains 0, do not reject 𝐻0.

There is not enough evidence, at the 10% level, to indicate that the scouts and guides have different
performance.
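
The link between the confidence interval and the two-tailed test in Example 7.7 can be sketched as follows. The function name `diff_ci_common_sigma` is hypothetical, and the 𝑧-value is taken from `statistics.NormalDist` rather than tables, so the limits may differ from the hand calculation in the last decimal place.

```python
import math
from statistics import NormalDist

def diff_ci_common_sigma(mean1, mean2, sigma, n1, n2, confidence):
    """Confidence limits for mu1 - mu2 when the common sigma is known."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # e.g. 0.95 for a 90% interval
    half_width = z * sigma * math.sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return diff - half_width, diff + half_width

# Guides: mean 26.81 (144 in group); scouts: mean 27.53 (100 in group).
lower, upper = diff_ci_common_sigma(26.81, 27.53, 3.48, n1=144, n2=100, confidence=0.90)

# A two-tailed test at the 10% level rejects H0: mu1 - mu2 = 0
# exactly when 0 lies outside the 90% interval.
reject_h0 = not (lower <= 0 <= upper)

print(round(lower, 3), round(upper, 3), reject_h0)
```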

Type 3: The populations have an unknown common variance σ²

The unbiased estimator 𝜎̂² is used to estimate the unknown common population variance, 𝜎².

This is known as the pooled two-sample estimate of variance.

For two samples with sample sizes 𝑛1 and 𝑛2 and sample variances 𝑠1² and 𝑠2²:

$$\hat{\sigma}^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$$

An alternative form is

$$\hat{\sigma}^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$$

The distribution used for 𝑋̅1 − 𝑋̅2 depends on the sample sizes.


Large samples:

The distribution of 𝑋̅1 − 𝑋̅2 is approximately normal.

Therefore the test statistic is

$$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

The confidence interval is given by

$$(\bar{x}_1 - \bar{x}_2) \pm z\,\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Small samples: (A guideline is 𝑛1 + 𝑛2 < 30)

The standardised difference of the sample means follows a 𝑡-distribution with 𝜈 = 𝑛1 + 𝑛2 − 2 degrees of freedom.

The test statistic is

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

where 𝑇~𝑡(𝑛1 + 𝑛2 − 2)

The confidence interval is given by

$$(\bar{x}_1 - \bar{x}_2) \pm t\,\hat{\sigma}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

Example 7.8
Two statistics teachers, Mr Chalk and Mr Talk, argue about their ability at golf. Mr Chalk
claims that with a number 7 iron he can hit the ball, on average, at least 10 m further
than Mr Talk. They conducted an experiment, measuring the distances for several shots.
Denoting the distance Mr Chalk hits the ball by 𝑥 metres, the following results were
obtained: 𝑛1 = 40, ∑𝑥 = 4080, ∑(𝑥 − 𝑥̅)² = 1132.
Denoting the distance Mr Talk hits the ball by 𝑦 metres, the following results were obtained:
𝑛2 = 35, ∑𝑦 = 3325, ∑(𝑦 − 𝑦̅)² = 1197.
Mr Talk challenges the claim, saying that the difference is actually less than 10 m.
Assuming the populations have a common variance, test, at the 1% level, whether there is
evidence to support Mr Talk.
Let 𝑋 be the distance, in metres, for Mr Chalk with population mean 𝜇1.
Let 𝑌 be the distance, in metres, for Mr Talk with population mean 𝜇2.

The population variances are common and unknown, so we use the pooled estimate of variance.
Since the samples are large enough, we use a 𝑧-test.


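$$\bar{x} = \frac{\sum x}{n_1} = \frac{4080}{40} = 102 \quad \text{and} \quad \bar{y} = \frac{\sum y}{n_2} = \frac{3325}{35} = 95$$

$$\hat{\sigma}^2 = \frac{\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2}{n_1 + n_2 - 2} = \frac{1132 + 1197}{40 + 35 - 2}$$
𝜎̂ = 5.648

Null and alternative hypotheses:

𝐻0 : 𝜇1 − 𝜇2 = 10
𝐻1 : 𝜇1 − 𝜇2 < 10

Test statistic:
$$z = \frac{(\bar{x} - \bar{y}) - (\mu_1 - \mu_2)}{\hat{\sigma}\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{(102 - 95) - 10}{5.648\sqrt{\dfrac{1}{40} + \dfrac{1}{35}}} = -2.29$$

This is a left-tailed test.

The critical value is $z_{0.01} = -2.326$.

Since 𝑧 = −2.29 > −2.326, do not reject 𝐻0.

There is not enough evidence, at the 1% significance level, that Mr Chalk hits the ball less than
10 m further, on average, than Mr Talk.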
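
The pooled-variance calculation in Example 7.8 can be sketched as follows. The function names `pooled_variance` and `pooled_statistic` are hypothetical; the inputs are the sums of squared deviations about each sample mean, as given in the example.

```python
import math

def pooled_variance(ss1, ss2, n1, n2):
    """Pooled unbiased estimate of a common variance from sums of squared deviations."""
    return (ss1 + ss2) / (n1 + n2 - 2)

def pooled_statistic(mean1, mean2, ss1, ss2, n1, n2, diff0=0.0):
    """Test statistic for H0: mu1 - mu2 = diff0 using the pooled estimate."""
    sigma_hat = math.sqrt(pooled_variance(ss1, ss2, n1, n2))
    return ((mean1 - mean2) - diff0) / (sigma_hat * math.sqrt(1 / n1 + 1 / n2))

z = pooled_statistic(4080 / 40, 3325 / 35, 1132, 1197, n1=40, n2=35, diff0=10)
critical = -2.326         # lower 1% point of N(0, 1), as quoted in the example
reject_h0 = z < critical  # left-tailed test

print(round(z, 2), reject_h0)
```

For small samples the same statistic would be compared with a value from 𝑡(𝑛1 + 𝑛2 − 2) instead of the normal distribution.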

Exercise 7c
1. A botanist believes that the moisture content of the soil in the northern half of a large
field is significantly different from that in the southern half. To test this belief he
measures the moisture content at five randomly chosen points in the northern half of
the field and four randomly chosen points in the southern half.
The results are as follows:
Northern half (%) 8.7 9.3 10.1 9.0 10.3
Southern half (%) 7.4 9.1 8.6 8.2
Stating your assumptions, test, at the 5% significance level, whether the mean moisture
content of the southern half of the field is less than that of the northern half.
[Assume two populations are normal with common variance, 𝑡 = 2.444, enough
evidence to show the mean of southern half is less than the northern half]
2. Mr Brown and Mr Green work at the same office and live next door to each other. Each
day they leave for work together but travel by different routes. Mr Brown maintains that
his route is quicker, on average, by at least four minutes. Both men time their journeys
in minutes over a period of ten weeks. The results obtained were:
Mr Brown: 𝑛₁ = 50, 𝑥̅₁ = 21, 𝑠₁² = 10.24
Mr Green: 𝑛₂ = 50, 𝑥̅₂ = 24, 𝑠₂² = 7.84
Assuming that the times are normally distributed and that they have a common
population variance, test at the 5% level whether Mr Brown’s claim can be accepted.
[𝑧 = −1.646, reject Mr Brown’s claim]


3. A random sample of size 100 is taken from a normal population with variance 𝜎₁² = 40.
The sample mean 𝑥̅₁ is 38.3. Another random sample, of size 80, is taken from a normal
population with variance 𝜎₂² = 30. The sample mean 𝑥̅₂ is 40.1. Test, at the 5% level,
whether there is a significant difference in the population means 𝜇1 and 𝜇2 .
[𝑧 = −2.04; evidence of difference]
4. The heights (measured to the nearest centimeter) of a random sample of six policemen
from a certain force in Wales were found to be:
176, 180, 179, 181, 183, 179
The heights (measured to the nearest centimeter) of a random sample of 11 policemen
from a certain force in Scotland gave the following data:
∑𝑦 = 1991, ∑(𝑦 − 𝑦̅)² = 54
Test, at the 5% level, the hypothesis that Welsh policemen are shorter than Scottish
policemen. Assume that the heights of policemen in both forces are normally distributed
and have a common population variance.
[𝑡 = −1.13, not enough evidence]
5. Mr Mean notes the time, in minutes, that it takes him to drive to work in the mornings.
The results are:
𝑛₁ = 8, ∑𝑥₁ = 120, ∑𝑥₁² = 1827
For his return journey in the rush hour, Mr Mean notes that:
𝑛₂ = 10, ∑𝑥₂ = 230, ∑𝑥₂² = 5436
He maintains that, on average, it takes him at least ten minutes longer to drive home.
Assuming that the times of all journeys are normally distributed, use the two-sample 𝑡-
test at the 5% level to test Mr Mean’s claim.
[𝑡 = −1.282, do not reject claim]
6. Hischi and Taschi are two makes of video tapes. They are both advertised as having a
recording time of 3 hours. A sample of 49 Hischi tapes was tested and denoting the
actual recording time by ℎ minutes, the following results were obtained:
∑ℎ = 8673, ∑(ℎ − ℎ̅)² = 12 720
A sample of 81 Taschi tapes was also tested. Denoting the actual recording time by 𝑡
minutes, the results obtained were:
∑𝑡 = 14 904, ∑(𝑡 − 𝑡̅)² = 33 488
If the recording times for the two makes are normally distributed and have a common
variance, show that the unbiased estimate of this common variance is 361. Test whether
there is significant evidence, at the 5% level, of a difference in the mean recording times.
[𝑧 = 2.036, significant]
7. Kapil believes that the carrots he grows in his garden are heavier, on average, than those
grown by his friend Jack. To confirm his belief they both select 8 carrots, chosen at
random from their crops, whose weights 𝑥𝐾 grams and 𝑥𝐽 grams are summarized by
∑𝑥𝐾 = 1510, ∑𝑥𝐾² = 285 351, ∑𝑥𝐽 = 1406, ∑𝑥𝐽² = 247 512
Test, at 5% significance level, whether Kapil’s carrots are heavier, on average, than Jack’s
by more than 10 grams.
[𝑡 = 0.822, no evidence]


7.5 Paired-𝑡 Test


In a case of two dependent samples, two data values—one for each sample—are collected
from the same source (or element) and, hence, these are also called paired or matched
samples.

For example, we may want to make inferences about the mean weight loss for members of
a health club after they have gone through an exercise program for a certain period of time.
To do so, suppose we select a sample of 15 members of this health club and record their
weights before and after the program. In this example, both sets of data are collected from
the same 15 persons, once before and once after the program. Thus, although there are two
samples, they contain the same 15 persons. This is an example of paired (or dependent or
matched) samples.

In paired samples, the difference between the two data values for each element of the two
samples is denoted by 𝑑. This value of 𝑑 is called the paired difference. We then treat all
the values of 𝑑 as one sample and make inferences applying procedures similar to the ones
used for one-sample cases. Note that because each source (or element) gives a pair of
values (one for each of the two data sets), each sample contains the same number of values.
That is, both samples are the same size. Therefore, we denote the (common) sample size by
𝑛, which gives the number of paired difference values denoted by 𝑑. The degrees of
freedom for the paired samples are (𝑛 − 1). Let

𝜇𝑑 = the mean of the paired differences for the population

𝜎𝑑 = the standard deviation of the paired differences for the population (usually unknown)

𝑑̅ = the mean of the paired differences for the sample

𝑠𝑑 = the standard deviation of the paired differences for the sample

𝑛 = the number of paired difference values

The sample mean of the paired differences, 𝑑̅, and the unbiased estimate of their population
variance, 𝜎̂², are calculated as

$$\bar{d} = \frac{\sum d}{n}$$

$$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum d^2 - \frac{(\sum d)^2}{n}\right) = \frac{1}{n-1}\sum (d - \bar{d})^2$$

Given paired samples where 𝐷 = 𝑋 − 𝑌, the population mean of the paired differences is
𝜇𝑑 = 𝜇𝑋 − 𝜇𝑌, and the standardised sample mean of the differences follows a 𝑡-distribution with (𝑛 − 1) degrees of freedom.


For small samples, the test statistic is

$$T = \frac{\bar{D} - (\mu_X - \mu_Y)}{\hat{\sigma}/\sqrt{n}}$$

The confidence interval for the mean paired difference is given by

$$\bar{d} \pm t\,\frac{\hat{\sigma}}{\sqrt{n}}$$

Example 7.9
A company wanted to know if attending a course on “how to be a successful salesperson”
can increase the average sales of its employees. The company sent six of its salespersons to
attend this course. The following table gives the 1-week sales of these salespersons before
and after they attended this course.
Before 12 18 25 9 14 16
After 18 24 24 14 19 20
Using the 1% significance level, can you conclude that the mean weekly sales for all
salespersons increase as a result of attending this course? Assume that the population of
paired differences has a normal distribution.
Let 𝑑 be (Weekly sales after the course) − (Weekly sales before the course)

Before   12   18   25    9   14   16
After    18   24   24   14   19   20
𝑑         6    6   −1    5    5    4
𝑑²       36   36    1   25   25   16

$$\bar{d} = \frac{\sum d}{n} = \frac{25}{6} = 4.17$$
$$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum d^2 - \frac{(\sum d)^2}{n}\right) = \frac{1}{5}\left(139 - \frac{25^2}{6}\right)$$
𝜎̂ = 2.639

Null and alternative hypotheses:

𝐻0 : 𝜇𝑑 = 0
𝐻1 : 𝜇𝑑 > 0

Test statistic:
$$t = \frac{\bar{d} - (\mu_X - \mu_Y)}{\hat{\sigma}/\sqrt{n}} = \frac{4.17 - 0}{2.639/\sqrt{6}} = 3.870$$

This is a right-tailed test with 𝜈 = 6 − 1 = 5 degrees of freedom.

The critical value is $t_{0.99,\,5} = 3.365$.

Since 𝑡 = 3.870 > 3.365, reject 𝐻0.

There is enough evidence, at the 1% significance level, that the mean weekly sales for all
salespersons increase as a result of attending this course.
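
The paired test of Example 7.9 can be sketched from the raw data using only the Python standard library. This is an illustration, not a prescribed method; `statistics.stdev` uses the divisor 𝑛 − 1, so it returns 𝜎̂ directly, and the critical value 3.365 is the table value quoted in the example.

```python
import math
from statistics import mean, stdev

before = [12, 18, 25, 9, 14, 16]
after = [18, 24, 24, 14, 19, 20]

# Paired differences d = after - before, treated as a single sample.
d = [a - b for a, b in zip(after, before)]
n = len(d)

d_bar = mean(d)
sigma_hat = stdev(d)  # divisor n - 1, i.e. the unbiased-variance estimate
t = (d_bar - 0) / (sigma_hat / math.sqrt(n))

critical = 3.365          # upper 1% point of t(5), from tables
reject_h0 = t > critical  # right-tailed test

print(d, round(t, 2), reject_h0)
```

Without intermediate rounding the statistic is about 3.87, in line with the hand calculation.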


For large samples, the test statistic is

$$Z = \frac{\bar{D} - (\mu_X - \mu_Y)}{\hat{\sigma}/\sqrt{n}}$$

The confidence interval for the mean paired difference is given by

$$\bar{d} \pm z\,\frac{\hat{\sigma}}{\sqrt{n}}$$

Example 7.10
To investigate the difference in wear on the front and rear tyres of motorcycles, 50 motorcycles
of the same model were fitted with new tyres of the same brand. After the motorcycles had
been driven for 2000 miles the depth of tread on the front and rear tyres was measured in
mm. For each motorcycle the value of 𝑑 = (depth of front tread − depth of rear tread) was
calculated. The results can be summarized by ∑𝑑 = 4.7 and ∑𝑑² = 0.79. Test, at the 5%
significance level, whether there is a difference in wear on the front and rear tyres.
$$\bar{d} = \frac{\sum d}{n} = \frac{4.7}{50} = 0.094$$

$$\hat{\sigma}^2 = \frac{1}{n-1}\left(\sum d^2 - \frac{(\sum d)^2}{n}\right) = \frac{1}{49}\left(0.79 - \frac{4.7^2}{50}\right)$$
𝜎̂ = 0.0843

Null and alternative hypotheses:

𝐻0 : 𝜇𝑑 = 0
𝐻1 : 𝜇𝑑 ≠ 0

Test statistic:
$$z = \frac{\bar{d} - (\mu_X - \mu_Y)}{\hat{\sigma}/\sqrt{n}} = \frac{0.094 - 0}{0.0843/\sqrt{50}} = 7.88$$

This is a two-tailed test.

The critical values are $\pm z_{0.975} = \pm 1.96$.

Since |𝑧| = 7.88 > 1.96, reject 𝐻0.

There is enough evidence, at the 5% significance level, that there is a difference in wear on the front
and rear tyres.

Exercise 7d
1. Blood pressure data were obtained from a larger set of 97 people with diabetes. The
values of 𝐷 are summarized by ∑𝑑 = 4092, ∑𝑑² = 187 948. Carry out a test of the
hypothesis 𝜇𝐷 > 40 which does not rely on 𝐷 having a normal distribution. Use a 5%
significance level.
[𝑧 = 1.704, do not reject 𝜇𝐷 > 40]


2. Some psychologists believe that the IQ of the first-born child in a family is significantly
greater than the IQ of the last born. In order to investigate this belief, a random sample
of 8 families with more than one child agreed to allow their children’s IQ to be
measured, with the following results.
Family 1 2 3 4 5 6 7 8
IQ of first born 97 121 89 112 138 125 104 114
IQ of last born 101 116 97 108 130 121 101 105
Assuming that the differences have a normal distribution, test the psychologists’ belief
using a 5% significance level.
[𝑡 = 1.279, not enough evidence to support the belief]
3. A person’s systolic blood pressure is a measure of the pressure exerted by the heart
when it contracts and pushes blood around the body. When the heart has just ceased to
contract and is dilating ready for the next contraction, the blood pressure drops and is
called the diastolic pressure.
The following table gives the systolic and diastolic blood pressure (measured in mm of
mercury) of 6 randomly chosen people with diabetes.
Patient 1 2 3 4 5 6
Systolic pressure 141 129 117 115 93 101
Diastolic pressure 83 76 71 59 51 64
Let 𝐷 denote the amount by which the systolic pressure exceeds the diastolic pressure
of a randomly chosen person with diabetes, and let 𝜇𝐷 denote the mean of 𝐷. Assuming that 𝐷
has a normal distribution, test the hypothesis 𝜇𝐷 > 40 at the 5% significance level.
[𝑡 = 2.547; evidence that 𝜇𝐷 > 40]
4. An experiment was carried out to compare the difference in the effects of organic and
chemical fertilisers on potato yields. Eleven plots of land were selected and two seed
potatoes were grown on each plot at a distance of 10 m apart. On one potato an organic
fertiliser was used, and on the other, a chemical fertiliser. The choice of which to use
was decided by tossing a coin. The differences in yields, 𝑑 grams, where 𝑑 =(mass of
organic crop – mass of chemical crop), are summarized by
∑𝑑 = −310 and ∑𝑑² = 208 702
Assuming that the differences have a normal distribution, test, at the 5% significance
level, whether there is a difference between the population mean yields.
[𝑡 = −0.661, no evidence to show difference]

References:

Chambers, J., Crawshaw, J., & Balaam, P. (2001). A Concise Course in Advanced Level
Statistics with Worked Examples: The Core Course for A-level: Nelson Thornes
Limited.
Miller, J. (2005). Cambridge Advanced Mathematics: Statistics 3&4: Cambridge University
Press.
