
Week 8: Lecture 1

Parametric Confidence Intervals


A. Two Sided Confidence Intervals.

A confidence interval for the population mean $\mu$ is an interval that contains plausible
values of the parameter $\mu$. A confidence interval is associated with a confidence level, which is
usually written as $1 - \alpha$. It measures the probability that the confidence interval actually
contains the unknown population mean. Confidence levels of 90%, 95% and 99% are typically
used, which correspond to $\alpha$ values of 0.1, 0.05 and 0.01 respectively. In the terminology
used when discussing the t table, the lower critical value $t_{\alpha/2}$ is the point at which the
cumulative probability $F(t)$ equals $\alpha/2$.

It is quite straightforward to derive a confidence interval for the population mean $\mu$.
Consider a researcher working with a sample of size $n$. We know from last week's lectures
that the statistic

$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$

has a Student t distribution with $v = n - 1$ degrees of freedom if the random variable $X$ is
normally distributed. If $X$ is not normally distributed, the same distribution still holds
approximately, provided $\bar{x}$ and $s$ are calculated from large samples ($n > 30$), because the
central limit theorem then applies.

We also know from the Student t table that there is only an $(\alpha/2)100\%$ chance of a t value
being less than or equal to $t_{\alpha/2}$ (with $v$ degrees of freedom). This is the critical value
used in hypothesis testing, covered in last week's lecture. Because the t distribution is symmetric
around its mean of zero, there is also an $(\alpha/2)100\%$ chance of a t value being greater than
$t_{1-\alpha/2} = -t_{\alpha/2}$. Thus, there is a $(1-\alpha)100\%$ chance of a t value lying between
$t_{\alpha/2}$ and $t_{1-\alpha/2}$. For example, if $n = 10$ (so $v = 9$), there is a 95% chance of a
t value lying between $-2.26$ and $+2.26$. Expressing this as a formula,

$$P\left(t_{\alpha/2} \le \frac{\bar{x} - \mu}{s/\sqrt{n}} \le t_{1-\alpha/2}\right) = 1 - \alpha$$

With $n = 10$ and $\alpha = 0.05$, this can be written as

$$P\left(-2.26 \le \frac{\bar{x} - \mu}{s/\sqrt{n}} \le 2.26\right) = 0.95$$
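Values such as $\pm 2.26$ are normally read from a t table. The sketch below reproduces that lookup without a statistics library such as SciPy, using only the Python standard library. It works in two steps:

1. Integrate the t density numerically with Simpson's rule (a swapped-in numerical method, not how printed tables are produced).
2. Invert the cumulative probability $F(t)$ by bisection.

The helper names `t_pdf`, `t_cdf` and `t_quantile` are hypothetical.

```python
import math


def t_pdf(x: float, v: int) -> float:
    """Density of the Student t distribution with v degrees of freedom."""
    log_const = (math.lgamma((v + 1) / 2) - math.lgamma(v / 2)
                 - 0.5 * math.log(v * math.pi))
    return math.exp(log_const - (v + 1) / 2 * math.log1p(x * x / v))


def t_cdf(x: float, v: int, steps: int = 2000) -> float:
    """F(x) = P(T <= x), via Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, v) + t_pdf(a, v)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, v)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, v: int) -> float:
    """Value t such that F(t) = p, found by bisection on [-100, 100]."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, v) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    alpha, n = 0.05, 10
    v = n - 1
    # Lower and upper critical values t_{alpha/2} and t_{1-alpha/2}
    print(round(t_quantile(alpha / 2, v), 3), round(t_quantile(1 - alpha / 2, v), 3))
```

For $n = 10$ the two printed values should agree with the tabulated $\mp 2.26$.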

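The inequality

$$\frac{\bar{x} - \mu}{s/\sqrt{n}} \ge t_{\alpha/2}$$

can be written as

$$\mu \le \bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} = \bar{x} + t_{1-\alpha/2}\frac{s}{\sqrt{n}}$$

given that $t_{\alpha/2} = -t_{1-\alpha/2}$ is a negative number. Next, the inequality

$$\frac{\bar{x} - \mu}{s/\sqrt{n}} \le t_{1-\alpha/2}$$

can be written as

$$\mu \ge \bar{x} - t_{1-\alpha/2}\frac{s}{\sqrt{n}} = \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$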

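Thus a $(1-\alpha)100\%$ confidence interval for the population mean $\mu$ is given by

$$P\left(\bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{1-\alpha/2}\frac{s}{\sqrt{n}}\right) = 1 - \alpha$$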

The 95% confidence interval for the unknown value of the population mean $\mu$, for a
researcher who has estimated $\bar{x}$ and $s$ from a sample of size 10, is therefore

$$\left(\bar{x} - 2.26\frac{s}{\sqrt{10}},\ \bar{x} + 2.26\frac{s}{\sqrt{10}}\right)$$

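More generally, a $(1-\alpha)100\%$ confidence interval for the unknown value of the
population mean $\mu$, for a researcher who has estimated $\bar{x}$ and $s$ from a sample of size $n$,
and so $v = n - 1$, is therefore

$$\left(\bar{x} + t_{v,\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{v,1-\alpha/2}\frac{s}{\sqrt{n}}\right)$$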

For $n > 30$, the central limit approximation is good and this confidence interval is
approximately correct irrespective of how the random variable $X$ is distributed. In smaller samples,
the interval is only valid if the assumption that $X$ is normally distributed is also valid.
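The interval above translates directly into a short Python sketch. It rests on two assumptions:

- The raw sample values are available as a list.
- The critical value $t_{v,1-\alpha/2}$ has already been read from a t table, so it is passed in rather than computed.

The function name `t_interval` and the sample values are hypothetical. `statistics.stdev` uses the $n - 1$ divisor, which matches $s$ in the formula.

```python
import math
import statistics


def t_interval(data: list[float], t_upper: float) -> tuple[float, float]:
    """Two-sided (1 - alpha)100% confidence interval for the population mean.

    t_upper is t_{v,1-alpha/2} with v = len(data) - 1, taken from a t table.
    Because the t distribution is symmetric, t_{v,alpha/2} = -t_upper.
    """
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    half_width = t_upper * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical sample of n = 10 measurements; t_{9,0.975} = 2.262 for 95%.
sample = [10, 12, 11, 13, 9, 11, 12, 10, 11, 11]
low, high = t_interval(sample, t_upper=2.262)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```

The interval is centred on $\bar{x}$ and extends $t_{v,1-\alpha/2}\, s/\sqrt{n}$ on each side.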

It follows from the above formula that:

i. The confidence interval length

$$L = 2\,t_{v,1-\alpha/2}\frac{s}{\sqrt{n}}$$

decreases as you work with larger and larger samples (i.e. $L$ is inversely proportional to the
square root of the sample size $n$). So to be more certain about the value of the true mean, you
have to collect more data. (Further, we know from the t table that $t_{v,1-\alpha/2}$ gets larger the
smaller the sample size. However, this is a minor influence compared to the sample size itself.)

ii. We also know from the t table that $t_{v,1-\alpha/2}$ gets larger the larger the value of $1 - \alpha$.
So, the more certain we want to be about the range within which the true population mean lies,
the wider that range has to be.

iii. We have demonstrated in past lectures that, as the sample size increases, the Student t
distribution converges to the standard normal distribution. In practice the two distributions are
quite similar for $n > 30$. That is, for such large samples (i.e. large $v$), $t_{v,\alpha/2} \approx Z_{\alpha/2}$.
Thus, the above formula for a confidence interval can be replaced with

$$\left(\bar{x} + Z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + Z_{1-\alpha/2}\frac{s}{\sqrt{n}}\right)$$

given that $Z_{\alpha/2}$ is negative. Just as importantly, in such a large sample the central limit
theorem ensures that $\bar{x}$ will have an approximately normal distribution. So for $n > 30$, the
confidence interval given above is approximately correct irrespective of whether the random
variable $X$ is normally distributed.
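Points i to iii can be seen together in a small sketch. It uses the large-sample form of point iii, with $Z_{1-\alpha/2}$ taken from the standard library's `statistics.NormalDist`, so it ignores the small change in $t_{v,1-\alpha/2}$ with $n$. The standard deviation $s = 1$ and the sample sizes are hypothetical. The output shows two effects:

- Quadrupling $n$ halves $L$ (point i).
- Raising the confidence level widens the interval (point ii).

```python
import math
from statistics import NormalDist


def z_interval_length(s: float, n: int, confidence: float) -> float:
    """Large-sample interval length L = 2 * z_{1-alpha/2} * s / sqrt(n)."""
    alpha = 1 - confidence
    z_upper = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}; z_{alpha/2} = -z_upper
    return 2 * z_upper * s / math.sqrt(n)


s = 1.0  # hypothetical sample standard deviation
for confidence in (0.90, 0.95, 0.99):
    lengths = [z_interval_length(s, n, confidence) for n in (40, 160, 640)]
    print(confidence, [round(L, 4) for L in lengths])
```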

Many of the above points are well illustrated by taking an example.

Example: Porosities of Battery Plates

Sheet Battery Plates of the Excel file Confidence limits Excel Workings contains all the
workings for this example. Nickel-hydrogen batteries use a nickel plate as an anode. A critical
quality characteristic is the plate's porosity, which controls the interface of the anode with the
potassium hydroxide electrolyte solution. For this battery cell, the manufacturer has set a target
porosity of 80% as measured by a particular test. The sintering process, whereby the plates are
fired at high temperatures, essentially controls the plates' porosity. The production engineers
have expressed concerns that the plates are being over-fired and thus are not sufficiently porous.
They therefore take a random sample of 10 recently produced plates and test their porosity.
The results of this experiment, and the subsequent statistical analysis, are shown in the
following screenshot of the above-mentioned Excel file.

The 95% confidence interval does not contain the value of 80% for porosity. We thus
have evidence to suggest that the true (population) mean porosity is less than 80%, which
supports the contention that the sintering process is over-firing the plates. The plausible values
for the true mean porosity are between 79.02% and 79.40%, meaning we can be 95% confident that
the true mean porosity is within this range. Put differently, we can state that the true mean
porosity is between 79.02% and 79.40% with only a 5% chance of being proved wrong in this
assertion. The process therefore requires adjusting to increase the porosity, on average, by
about 0.8 percentage points: the centre of the interval, 79.21%, is 0.79 below the 80% target.
These conclusions assume that the porosity of the plates under the given operating conditions
of the sintering process is normally distributed, as the sample size is too small to invoke the
central limit theorem.

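The conclusions above can be checked with a few lines of arithmetic on the reported interval alone. Two quantities in this sketch are derived rather than quoted in the text, so they are implied values, not the spreadsheet's own figures:

- The sample mean, recovered as the interval's centre.
- The sample standard deviation, back-calculated from the half-width using $t_{9,0.975} = 2.262$.

```python
import math

lower, upper = 79.02, 79.40  # reported 95% confidence interval for mean porosity (%)
target = 80.0                # manufacturer's target porosity (%)
n, t_upper = 10, 2.262       # sample size and t_{9,0.975}

xbar = (lower + upper) / 2                       # interval centre = sample mean
half_width = (upper - lower) / 2                 # = t_upper * s / sqrt(n)
s_implied = half_width * math.sqrt(n) / t_upper  # back-calculated, not reported
target_is_plausible = lower <= target <= upper

print(f"mean = {xbar:.2f}%, implied s = {s_implied:.3f}%")
print("80% inside interval:", target_is_plausible)
print(f"shortfall from target = {target - xbar:.2f} percentage points")
```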
B. One Sided Confidence Intervals

One-sided confidence intervals can be useful if only an upper bound or a lower bound
on the population mean is of interest. For example, researchers who routinely administer
radioactive tracers to patients need to know the safe distance between themselves and their
patients. It is of little interest to know how far away they can be, but it is of great interest to
know exactly how close they can get before endangering themselves through regular exposure
at close distance.

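A $(1-\alpha)100\%$ one-sided confidence interval for the unknown value of the population mean $\mu$,
for a researcher who has estimated $\bar{x}$ and $s$ from a sample of size $n$ (so $v = n - 1$), is therefore

$$\left(-\infty,\ \bar{x} + t_{v,1-\alpha}\frac{s}{\sqrt{n}}\right)$$

This confidence limit provides an upper bound on the population mean $\mu$. The following
confidence limit provides a lower bound on the population mean $\mu$:

$$\left(\bar{x} + t_{v,\alpha}\frac{s}{\sqrt{n}},\ \infty\right)$$

given that $t_{v,\alpha}$ is negative.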
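A sketch of both one-sided limits, assuming the one-tailed critical value $t_{v,1-\alpha}$ is read from a t table and passed in. Note that the whole of $\alpha$ sits in a single tail here, not $\alpha/2$. The function name and the summary statistics in the usage lines are hypothetical.

```python
import math


def one_sided_bounds(xbar: float, s: float, n: int, t_one_sided: float):
    """Return the (1 - alpha)100% upper-bound and lower-bound intervals for mu.

    t_one_sided is t_{v,1-alpha} with v = n - 1, read from a t table;
    by symmetry t_{v,alpha} = -t_one_sided.
    """
    margin = t_one_sided * s / math.sqrt(n)
    upper_interval = (-math.inf, xbar + margin)  # (-inf, xbar + t_{v,1-alpha} s/sqrt(n))
    lower_interval = (xbar - margin, math.inf)   # (xbar + t_{v,alpha} s/sqrt(n), inf)
    return upper_interval, lower_interval


# Hypothetical summary statistics: n = 16, so v = 15 and t_{15,0.95} = 1.753.
upper_ci, lower_ci = one_sided_bounds(xbar=100.0, s=4.0, n=16, t_one_sided=1.753)
print("95% upper-bound interval:", upper_ci)
print("95% lower-bound interval:", lower_ci)
```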

The above points are well illustrated by taking an example.

Example: Galvanized Coatings.

Sheet Coatings of the Excel file Confidence limits Excel Workings contains all the
workings for this example. Consider a galvanized coating process for large pipes. Standards
call for an average coating weight of at least 91 kg. An experiment is carried out whereby the
weights of a random sample of 30 pipes are measured. The results of this experiment, and the
subsequent statistical analysis, are shown in the following screenshot and in the above-mentioned
Excel file.

A 90% one-sided confidence interval providing a lower bound for $\mu$ is $(93.19, \infty)$. The
required average weight of 91 kg is not within this interval. We thus have evidence, with 90%
confidence, that the true (population) mean weight is not below 93.19 kg. This supports the
contention that the coating process is capable of producing pipes whose average coating weight
is at least 91 kg. The conclusion arrived at using the above procedure requires no distributional
assumptions, as the sample is large enough to invoke the central limit theorem.

C. Comparing Two Population Means.

More typically, research engineers are interested in the characteristics of two or more
distinct populations. For example, do two different levels of paint viscosity produce different
mean coating thicknesses? Do different heat treatments applied to the same steel alloy produce
different average tensile strengths? Do different cement mixes produce concrete with different
mean levels of slump? These questions are easily answered through a straightforward extension
of the above procedures. A confidence interval is formed for each population mean:

- If the intervals overlap, the hypothesis that the two population means are the same cannot be rejected.
- If they don't overlap, the two population means are different, i.e. the hypothesis that the two population means are the same can be rejected.

An example can be used to illustrate this simple extension.
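The overlap rule just described can be written as a small sketch. The helper names and the two pairs of intervals are hypothetical. Each interval is a `(low, high)` tuple, and two closed intervals overlap exactly when each one starts before the other ends.

```python
Interval = tuple[float, float]


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """True if the closed intervals a = (a_low, a_high) and b share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


def compare_means(ci_1: Interval, ci_2: Interval) -> str:
    """Apply the lecture's decision rule to two confidence intervals."""
    if intervals_overlap(ci_1, ci_2):
        return "overlap: cannot reject equal population means"
    return "no overlap: reject equal population means"


# Hypothetical confidence intervals for two populations
print(compare_means((4.1, 4.9), (4.6, 5.3)))
print(compare_means((4.1, 4.9), (5.2, 5.8)))
```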

Example: Cylinder Heads in Liquid-Cooled Aircraft Engines.

Sheet Engines of the Excel file Confidence limits Excel Workings contains all the
workings for this example. Eck Industries Incorporated manufactures cast aluminum cylinder
heads used for liquid-cooled aircraft engines. The wall thicknesses are critical for high-altitude
applications, and so the company frequently carries out thickness tests by sectioning the heads.
This is clearly a destructive and therefore expensive procedure. Sectioning is known to produce
very accurate thickness measurements but has the disadvantage of being slow, leading to
difficulties in shipping orders on time. The tested parts are also scrapped, which adds further to
the cost of the test.

Ultrasound is a much quicker and cheaper alternative approach to measuring thickness,
as it is non-destructive. The company is considering replacing sectioning with ultrasound
testing but needs to be assured that this new test procedure is as accurate as sectioning. The
company therefore undertakes an experiment whereby 18 randomly chosen cylinder heads are
subjected to both measurement procedures. The results of this experiment, and the subsequent
statistical analysis, are shown in the following screenshot.

The 90% confidence intervals for the two testing methods do overlap. There is therefore
no evidence to suggest that the true (population) mean thicknesses associated with each test
method are significantly different from each other. On average, the ultrasound technique
produces a thickness measurement that is not different from that obtained by sectioning.
The company can therefore safely (i.e. with 90% confidence) introduce the non-destructive test
procedure. This will speed up delivery times and remove the scrapping cost associated with the
sectioning technique. This conclusion requires the assumption that the thickness measurements
from the two different test methods are both normally distributed, as the sample size is too small
to invoke the central limit theorem.

D. Calculating Requirement for the Sample Size

Engineers often need to collect data to estimate an unknown parameter of interest within
a given precision. For example, what is the mean force required to break a polymer filament, to
within 100 lb? A reasonable question is therefore how many polymer filaments need to be tested
to obtain this level of precision.

The confidence interval length provides a neat answer to this question. From the above
definition of a confidence interval, the length of that interval is

$$L = 2\,t_{v,1-\alpha/2}\frac{s}{\sqrt{n}}$$

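If a confidence interval with a length no longer than $L_0$ is required, then the required
sample size is found from

$$2\,t_{v,1-\alpha/2}\frac{s}{\sqrt{n}} \le L_0$$

which can be solved for $n$:

$$n \ge 4\left(\frac{t_{v,1-\alpha/2}\,s}{L_0}\right)^2$$

Because $t_{v,1-\alpha/2}$ itself depends on $n$, in practice both it and $s$ are taken from an
initial sample, so the result is an estimate of the sample size needed.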
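The sample-size inequality reduces to a one-line calculation, rounded up to a whole number of units. This is a sketch in which $s$ and $t_{v,1-\alpha/2}$ are assumed to come from an initial (pilot) sample. The function name and the pilot figures are hypothetical.

```python
import math


def required_sample_size(s: float, t_upper: float, max_length: float) -> int:
    """Smallest integer n with n >= 4 * (t_upper * s / L0)**2.

    s and t_upper (t_{v,1-alpha/2}) are assumed to come from an initial sample,
    since t itself depends on the unknown final n.
    """
    return math.ceil(4 * (t_upper * s / max_length) ** 2)


# Hypothetical pilot study: s = 2.0, t_{9,0.975} = 2.262, target length L0 = 1.0
print(required_sample_size(s=2.0, t_upper=2.262, max_length=1.0))
```

Rounding up with `math.ceil` guarantees that the interval length at the returned $n$ does not exceed $L_0$, given the assumed $s$ and $t$.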

Example: Metal Cylinder Production.

With a sample of $n = 60$ metal cylinders, a 99% confidence interval (49.953, 50.045)
has been calculated, with a length of $L = 50.045 - 49.953 = 0.092$ mm. The sample standard
deviation is 0.134 mm. How much additional sampling is required to provide the increased
precision of a confidence interval with a length of just $L_0 = 0.08$ mm at the same confidence
level?
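As a quick consistency check, the length formula with $v = 60 - 1 = 59$ degrees of freedom should reproduce the interval length quoted above. This sketch assumes the table value $t_{59,0.995} \approx 2.662$, which is not stated in the text.

```python
import math

n, s = 60, 0.134            # sample size and sample standard deviation (mm)
t_upper = 2.662             # assumed t_{59,0.995} for a 99% two-sided interval
lower, upper = 49.953, 50.045

length_from_endpoints = upper - lower
length_from_formula = 2 * t_upper * s / math.sqrt(n)
print(f"{length_from_endpoints:.3f} mm vs {length_from_formula:.4f} mm")
```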

Using $t_{v,1-\alpha/2} = t_{59,0.995} \approx 2.662$ from the initial sample ($v = 60 - 1 = 59$),

$$n \ge 4\left(\frac{2.662 \times 0.134}{0.08}\right)^2 \approx 79.5$$

A sample size of at least 80 is required. Therefore, the engineers can anticipate that an
additional sample of at least $80 - 60 = 20$ cylinders is needed to meet this specified goal.
