
The t (Student) Distribution

Applied Statistics and Probability

2014/2015

Ira M. Anjasmara

Program Studi Magister Geomatika


Teknik Geomatika - ITS
Small Samples

Recall the central limit theorem:
the sampling distribution of the mean approaches a normal probability
distribution as the sample size increases; as a rule of thumb, the
approximation is adequate when the sample size is greater than or equal to 30.

Often, our sample size will be less than 30. This means:
we can no longer rely on the normal approximation from the central limit theorem;
we can’t use normal probability tables;
we can’t approximate σ by s, because σ ≈ s holds only for large n, not
for small n.

For small samples we have to consider other methods of estimation and
hypothesis testing.

Applied Statistics and Probability 2/22 The t (Student) Distribution


The t Distribution
The t distribution was developed in the early 1900s by William Gosset, who
worked for the Guinness brewery. He published under the pseudonym
“Student”.

The t distribution is concerned with small samples (n < 30) drawn from a
population that has a normal distribution.

Importantly, we don’t need to know the value of σ.

Unlike the normal distribution, there is a different t distribution for each
sample size. Each t distribution:
has a mean of zero;
is symmetrical about the mean;
has a variance greater than 1 (cf. the standard normal distribution,
where σ² = 1);
is less peaked at the mean, and has thicker tails than the normal
distribution.
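To make the "variance greater than 1" property concrete, the variance of a t distribution has a standard closed form in terms of the degrees of freedom ν (ν = n − 1, a symbol the following slides introduce). It is quoted here as a known result, not something derived in these slides:

```latex
\operatorname{Var}(t) = \frac{\nu}{\nu - 2} \quad (\nu > 2),
\qquad \text{e.g. } \nu = 4:\ \frac{4}{2} = 2,
\qquad \nu = 29:\ \frac{29}{27} \approx 1.07
```

The ratio exceeds 1 for every finite ν > 2 and tends to 1 as ν grows, which is consistent with the thicker tails seen at small sample sizes.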
The t Distribution

As the sample size increases, the t distribution approaches the normal
distribution, until at n = 30, the two are almost visually indistinguishable.

As with all probability density functions, the total area beneath the t-curve
is 1.

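The probability density function for the t distribution is:

$$ f(x, \nu) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)(\pi\nu)^{1/2}} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2} $$

where Γ is the “gamma function”.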
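As a quick check on the density formula, here is a minimal Python sketch that evaluates it with the standard library's gamma function. The name `t_pdf` and the values of ν tried are illustrative choices, not part of the lecture.

```python
import math

def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom, evaluated at x."""
    coeff = math.gamma((nu + 1) / 2) / (math.gamma(nu / 2) * math.sqrt(math.pi * nu))
    return coeff * (1 + x * x / nu) ** (-(nu + 1) / 2)

# The peak at x = 0 rises towards the standard normal peak as nu grows.
normal_peak = 1 / math.sqrt(2 * math.pi)
for nu in (1, 5, 30):
    print(nu, round(t_pdf(0.0, nu), 4), round(normal_peak, 4))
```

Because the formula depends on x only through x², the function is symmetric about zero, matching the properties listed earlier.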



Degrees of Freedom

When using the t distribution, we need to know the degrees of freedom
(ν) of the sample.

Technically, degrees of freedom are the difference between the number of
observations (sample size) and the number of independent parameters
estimated from that sample.
From a sample of size n we estimate the standard deviation, s, and
the mean, x̄.
So, the number of independent parameters estimated = 1 (because
the standard deviation is dependent on the mean).
Hence:
ν = n − 1
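The following sketch illustrates why one degree of freedom is lost: once the mean has been estimated, the deviations from it must sum to zero, so only n − 1 of them are free to vary. The sample values are hypothetical, invented purely for illustration.

```python
import math
import statistics

# Hypothetical measurements, invented purely for illustration.
sample = [4.9, 5.1, 5.0, 5.3, 4.7]

n = len(sample)
x_bar = statistics.mean(sample)

# Deviations from the sample mean always sum to zero, so once n - 1 of
# them are known the last one is fixed: one degree of freedom is used up.
deviations = [x - x_bar for x in sample]

nu = n - 1
s = math.sqrt(sum(d * d for d in deviations) / nu)

print("nu =", nu)
print("sum of deviations =", round(sum(deviations), 12))
print("s =", round(s, 4), "| statistics.stdev:", round(statistics.stdev(sample), 4))
```

Dividing by ν = n − 1 here gives the same value as the standard library's `statistics.stdev`, which uses the same divisor.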



Degrees of Freedom
As mentioned, the number of degrees of freedom dictates the shape of the
t-curve. Here are some curves of different ν:



Probability Estimation

The procedure for estimating probabilities from the t distribution is similar
to normal distribution estimation, but we use a different table.

The t distribution table shows the t-score corresponding to a particular
area (from a choice of 5) in the upper tail:
these scores are shown for different degrees of freedom;
they are usually denoted t_{ν,α}, showing their dependence on ν and α.

Note: as mentioned, as the number of degrees of freedom increases, the t
distribution approaches a normal distribution:
i.e., as ν → ∞, the t-score → the z-score.
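The convergence of the t-score to the z-score can be checked numerically. The sketch below is one possible approach using only the standard library: it integrates the t density with Simpson's rule and finds the critical value by bisection. The helper names (`t_pdf`, `upper_tail`, `t_crit`) are hypothetical, and the step counts are assumptions that are adequate for the moderate t values used here; a statistics library would normally supply these quantities directly.

```python
import math
from statistics import NormalDist

def t_pdf(x, nu):
    # lgamma avoids overflow of the gamma function for large nu
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * nu))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def upper_tail(t0, nu, steps=400):
    """P(T > t0) for t0 >= 0, using Simpson's rule on [0, t0] (steps must be even)."""
    h = t0 / steps
    total = t_pdf(0.0, nu) + t_pdf(t0, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

def t_crit(nu, alpha):
    """t-score leaving area alpha in the upper tail, found by bisection."""
    lo, hi = 0.0, 1.0
    while upper_tail(hi, nu) > alpha:   # widen until the bracket contains the answer
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if upper_tail(mid, nu) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_crit = NormalDist().inv_cdf(0.95)     # z-score for an upper-tail area of 0.05
for nu in (5, 10, 30, 100, 1000):
    print(nu, round(t_crit(nu, 0.05), 3), round(z_crit, 3))
```

Each printed row shows ν, the t-score for an upper-tail area of 0.05, and the fixed z-score for comparison; the t-score shrinks towards the z-score as ν increases.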



The t Table
The tables for the t distribution look something like this:

The numbers in the first column give the degrees of freedom; the numbers
in the first row represent the area of the upper tail.

The numbers in the main body of the table give the t-score corresponding
to those particular values of ν and α, i.e., t_{ν,α}.
The t Table

The highlighted value in the table gives the t-score for 14 degrees of
freedom, and an area in the upper tail of 0.05 (5%), i.e. t_{14,0.05} = 1.761.
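Reading the table amounts to a row-and-column lookup. The sketch below encodes a three-row fragment of a standard t table and retrieves t_{14,0.05}. The five upper-tail areas shown are an assumed (though common) choice, since the slide's table image is not available here, and the names `ALPHAS`, `T_TABLE` and `lookup` are illustrative.

```python
# Upper-tail areas run across the top; degrees of freedom run down the side.
# ASSUMPTION: these five areas are a common layout, not confirmed by the slide.
ALPHAS = (0.10, 0.05, 0.025, 0.01, 0.005)
T_TABLE = {
    13: (1.350, 1.771, 2.160, 2.650, 3.012),
    14: (1.345, 1.761, 2.145, 2.624, 2.977),
    15: (1.341, 1.753, 2.131, 2.602, 2.947),
}

def lookup(nu, alpha):
    """Return t_{nu,alpha} from the fragment; raises if nu or alpha is not tabulated."""
    row = T_TABLE[nu]
    return row[ALPHAS.index(alpha)]

# Print the fragment in the usual table layout, then read off one entry.
print("nu   " + "  ".join(f"{a:<6}" for a in ALPHAS))
for nu, row in T_TABLE.items():
    print(f"{nu:<4} " + "  ".join(f"{t:<6.3f}" for t in row))
print("t_{14,0.05} =", lookup(14, 0.05))
```

The lookup fails for areas that are not column headings, which mirrors the limitation of printed tables: only a handful of upper-tail areas are available.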



Hypothesis Testing

The small sample procedure for hypothesis testing about a population
mean follows the same 8-step procedure as for a normal distribution,
except:
we use t_{ν,α} in place of z_α as the critical value;
we use the following test statistic:

$$ t = \frac{\bar{x} - \mu}{s_{\bar{x}}} $$

where

$$ s_{\bar{x}} = \frac{s}{\sqrt{n}} $$




Example

A manufacturer of calculator batteries claims his batteries last an average
life of 500 hours. From a test of 25 batteries, the mean life was 518 hours
with a standard deviation of 40 hours. At the 0.05 level of significance, is
the average battery life at least 500 hours?

Take 500 hrs as the population mean.

We therefore want to test whether the sample mean from the new data
(518) indicates that this value is too low.
We have: µ = 500, x̄ = 518, s = 40, n = 25, α = 0.05.



Example

Step 1

Formulate alternative hypothesis: Ha : µ > 500
i.e., test whether the true population mean is actually more than the
established value.
Formulate null hypothesis: H0 : µ ≤ 500
i.e., assume the given population mean is correct, and the sample
data are misleading.

Step 2

Determine number of tails.
This is a 1-tailed test, because the alternative hypothesis is directional
(µ > 500), so the rejection region lies in one tail only.



Example

Step 3

Determine level of significance and degrees of freedom:
We are told that the significance level is α = 0.05.
From n = 25, we get ν = 25 − 1 = 24.

Step 4

Determine the critical value of t:
We have a 1-tailed test, so we need to find t_{ν,α} = t_{24,0.05}.
From the t distribution table, we have:
t_{24,0.05} = 1.711



Example

Step 5

Determine the rejection region:
The null hypothesis will be rejected if the sample gives strong enough
evidence that µ > 500, i.e., if the test statistic is large and positive.

Since we are testing µ > 500, we are in the right-hand side of the t curve,
therefore the rejection region is t > 1.711.



Example

Step 6
Determine the test statistic (t-score) from the sample data:

$$ t = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{518 - 500}{40/\sqrt{25}} = 2.25 \qquad (1) $$

Step 7
Compare the test statistic against its critical value: 2.25 > 1.711, therefore
t, and hence x̄, the sample mean, do lie in the rejection region.
Hence, we reject H0 at the 0.05 significance level.

Step 8
Our sample measurement is incompatible with the null hypothesis (µ ≤ 500) at
the 0.05 significance level. The data therefore support the manufacturer's
claim that the average battery life is at least 500 hours.
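The arithmetic of the battery example can be reproduced in a few lines. This is a minimal sketch that takes the critical value 1.711 from the t table, as in Step 4, and does not compute it; the variable names are illustrative.

```python
import math

# Values from the battery example.
mu0, x_bar, s, n, alpha = 500, 518, 40, 25, 0.05

nu = n - 1                        # degrees of freedom (Step 3)
t_critical = 1.711                # t_{24,0.05}, read from the t table (Step 4)

s_xbar = s / math.sqrt(n)         # standard error of the mean
t_stat = (x_bar - mu0) / s_xbar   # test statistic (Step 6)

reject_h0 = t_stat > t_critical   # upper-tail rejection region (Steps 5 and 7)
print(f"nu = {nu}, t = {t_stat:.2f}, critical = {t_critical}, reject H0: {reject_h0}")
```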



Confidence Intervals

As with the normal distribution, we can estimate a confidence interval for
the population mean from a small sample, without knowing the population
standard deviation.

For a significance level α:

$$ \mathrm{CI} = \bar{x} \pm t_{\nu,\alpha/2}\, s_{\bar{x}} $$

Now, because we are using small samples, we can no longer approximate
the population variance by the sample variance, and must use s_x̄ = s/√n in
place of σ_x̄.

t_{ν,α/2} is the critical t value, providing an area of α/2 in the upper tail of a t
distribution with ν = n − 1 degrees of freedom.




Example

An angle is measured 20 times with a mean of 30° 00′ 12.5″ and a standard
deviation of 4.1″. Develop a 95% confidence interval estimate for the true
value of the mean (µ).

At 95% confidence, α = 0.05. Since n = 20, we have ν = 19. From the
tables, t_{ν,α/2} = t_{19,0.025} = 2.093. Therefore,

$$ \mathrm{CI} = 30^\circ\,00'\,12.5'' \pm 2.093 \times \frac{4.1''}{\sqrt{20}} = 30^\circ\,00'\,12.5'' \pm 1.92'' $$

So we are 95% confident that the true angle is within these limits.
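The interval's half-width can be verified with a short calculation. This sketch works in arcseconds and uses the tabulated value t_{19,0.025} = 2.093 from the example; only the seconds part of the mean angle is carried, since degrees and minutes are unaffected. Variable names are illustrative.

```python
import math

n = 20
s_arcsec = 4.1        # sample standard deviation, in arcseconds
t_critical = 2.093    # t_{19,0.025}, read from the t table

# Half-width of the confidence interval: t * s / sqrt(n)
half_width = t_critical * s_arcsec / math.sqrt(n)

mean_seconds = 12.5   # seconds part of the mean angle 30° 00′ 12.5″
lower = mean_seconds - half_width
upper = mean_seconds + half_width

print(f"half-width = {half_width:.2f} arcsec")
print(f"interval: 30° 00′ {lower:.2f}″ to 30° 00′ {upper:.2f}″")
```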



P-Values

In contrast to the normal distribution, determining P-values for the
t distribution is hard to do from tables. This is because of the contrasting
arrangement of the normal and t tables.

Remember that in order to determine a P-value with the normal
distribution, you need to first find the z-score from the data, and then use
the table to find a corresponding probability (upper- or lower-tail area).
That is, the normal tables are arranged so that the probability is given as a
function of z-score.

In contrast, the t tables are arranged with t-score given as a function of
probability (i.e., upper-tail area). Importantly, only a few values of
probability (upper-tail area) are given. So once you have worked out your
t-score from the data, it is almost impossible to work back through the t
table to find the corresponding upper- or lower-tail area.



P-Values

In order to work out P-values from the t distribution, you need to use a
computer program. Fortunately, a search of the Web will provide many
websites to do this, e.g.:

www.tutor-homework.com/statistics tables/statistics tables.html

Alternatively, Microsoft Excel has the function TDIST to work out
P-values for the t distribution, where:

p(t > t0) = TDIST(t0, ν, 1)
p(|t| > t0) = TDIST(t0, ν, 2)

for some non-negative numerical value t0; the last argument is the number of
tails, nt (1 or 2).




Example

A certain quantity has an accepted value of 645. A new experiment of 20


observations finds that the sample mean is 655 with a sample standard
deviation of 20. What is the P -value for these data?

ν = 19

$$ t = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{655 - 645}{20/\sqrt{20}} = 2.236 $$

Using Excel (or the website shown above), we find:

P = p(x̄ ≥ 655) = p(t ≥ 2.236) = TDIST(2.236, 19, 1) = 0.0188
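For readers without Excel, the same P-value can be approximated with the standard library alone. The sketch below integrates the t density numerically with Simpson's rule; `tdist` is a hypothetical helper that only imitates the role of Excel's TDIST for non-negative t0, and the step count is an assumption that is ample for t values of this size.

```python
import math

def t_pdf(x, nu):
    """Density of the t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(math.pi * nu))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))

def upper_tail(t0, nu, steps=400):
    """P(T > t0) for t0 >= 0, using Simpson's rule on [0, t0] (steps must be even)."""
    h = t0 / steps
    total = t_pdf(0.0, nu) + t_pdf(t0, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 - total * h / 3

def tdist(t0, nu, tails):
    """One-tailed (tails=1) or two-tailed (tails=2) area beyond t0 >= 0."""
    return tails * upper_tail(t0, nu)

# Values from the example: accepted value 645, sample mean 655, s = 20, n = 20.
mu, x_bar, s, n = 645, 655, 20, 20
nu = n - 1
t_stat = (x_bar - mu) / (s / math.sqrt(n))
p_value = tdist(t_stat, nu, 1)

print(f"t = {t_stat:.3f}, P = {p_value:.4f}")
```

The two-tailed area is simply double the one-tailed area because the t distribution is symmetric about zero.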



Summary of Means Testing



Although the crucial question you have to ask is “is my sample size less
than 30?”, it can be seen that sometimes this requirement is not enough
for use of the t distribution.

This is because the t distribution requires that the population that the
small sample was drawn from must be normally distributed:
if it is, then you can use the t distribution;
if it isn’t, you can’t, and must increase the sample size to ≥ 30 and
then use the normal distribution.
