
HACS Hapuarachchi@2012

Estimation and Hypotheses Testing


Statistical Estimation
Estimation is the form of inferential statistics that consists of estimating a population parameter from the corresponding sample statistic. There are two types of estimation:
Point estimation
Point estimation uses a single sample statistic to produce a single value, which serves as the estimate of a population parameter.
Interval estimation
Interval estimation determines an interval of values within which the population parameter is expected to lie with a given level of confidence.
Estimator and Estimate
An estimator is a statistic used to estimate a population parameter; it is a function of the sample observations.
An estimate is the particular value that the estimator takes on a given set of data.
Suppose that we are interested in estimating the mean $\mu$ of a certain population. Let $X_1, X_2, X_3, X_4, X_5$ be a random sample of size 5 from this population, and let 3, 5, 2, 1, 2 be one observed sample from this population.

Then $\bar{X} = \frac{1}{5}\sum_{i=1}^{5} X_i$ is an estimator of the mean $\mu$, and $\bar{x} = \frac{3+5+2+1+2}{5} = 2.6$ is the estimate of the mean $\mu$.

Examples:

Parameter    Estimator                                           Estimate
$\mu$        $\bar{X} = \frac{1}{5}(X_1+X_2+X_3+X_4+X_5)$        $\bar{x} = 2.6$
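The distinction between estimator and estimate can be sketched in Python. The function stands for the estimator (a rule that applies to any sample), and its value on the observed data is the estimate. The name `sample_mean` is illustrative and does not come from the notes.

```python
def sample_mean(sample):
    """Estimator: a rule that maps any sample to a single number."""
    return sum(sample) / len(sample)


observed = [3, 5, 2, 1, 2]        # the observed sample from the text
estimate = sample_mean(observed)  # estimate: the estimator's value on this data
print(estimate)
```

A different observed sample would give a different estimate, while the estimator itself stays the same.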


Point Estimation
The value of a sample statistic carries evidence about the value of the corresponding population parameter, and a central problem in inferential statistics is the use of sample statistics as estimators of population values.
The use of a single sample value to infer the population value is called point estimation, since a single value (a point in the space of all possible values) is taken as the estimate. Estimation methods:
1. Method of moments
2. Method of maximum likelihood
3. Method of least squares
Properties of Point Estimators
1. Unbiasedness
2. Efficiency
3. Consistency
4. Sufficiency
Unbiasedness
We would obviously like the sampling distribution of an estimator to cluster around the true value of the parameter.
An estimator is unbiased if the mean (the expected value) of its sampling distribution equals the parameter being estimated.
An estimator $\hat{\theta}$ of the parameter $\theta$, computed from the sample, is said to be an unbiased estimator if

$E[\hat{\theta}] = \theta$

For example, consider the mean of a sample as an estimator of the mean of the population. Here,

$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $E[X_i] = \mu$, so

$E[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \mu$

Therefore, the sample mean $\bar{X}$ is an unbiased estimator of the population mean $\mu$.
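A small simulation can illustrate this result. The sketch below assumes a normal population with mean 10 and standard deviation 2 and a sample size of 5. These values are chosen only for illustration and do not come from the notes.

```python
import random
import statistics

random.seed(0)                 # fixed seed so the run is repeatable
mu, sigma, n = 10.0, 2.0, 5    # assumed population mean, sd, and sample size

# Draw many samples of size n and record the mean of each one.
sample_means = [
    statistics.fmean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(20000)
]

# The average of the sample means approximates E[X-bar].
average_of_means = statistics.fmean(sample_means)
print(average_of_means)
```

The printed average should lie close to the assumed value of 10, even though each individual sample mean varies around it.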



Efficiency
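Suppose that two different sample statistics $\hat{\theta}_1$ and $\hat{\theta}_2$ are calculated from the same data, and that both are unbiased estimators of the same population parameter $\theta$.

Let the standard error of $\hat{\theta}_1$ be $\sigma_{\hat{\theta}_1}$ and the standard error of $\hat{\theta}_2$ be $\sigma_{\hat{\theta}_2}$.

If $\sigma_{\hat{\theta}_1} < \sigma_{\hat{\theta}_2}$, then $\hat{\theta}_1$ is relatively more efficient than $\hat{\theta}_2$.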
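As one possible illustration, the sample mean and the sample median are both unbiased estimators of the centre of a normal population, so their standard errors can be compared by simulation. The population values and sample size below are assumptions made for this sketch, not values from the notes.

```python
import random
import statistics

random.seed(1)                # fixed seed so the run is repeatable
mu, sigma, n = 0.0, 1.0, 25   # assumed population mean, sd, and sample size

means, medians = [], []
for _ in range(5000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# The spread of each estimator across samples approximates its standard error.
se_mean = statistics.stdev(means)
se_median = statistics.stdev(medians)
print(se_mean, se_median)
```

Under these assumptions the simulated standard error of the mean comes out smaller, so the sample mean is the relatively more efficient of the two.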
Interval Estimation

A sample statistic will ordinarily not be equal to the true population value because of sampling error.
It is therefore necessary to qualify the estimate in some way that indicates the general magnitude of this error.
Usually this is done by giving a confidence interval, which is an estimated range of values with a given probability of covering the true population value:
$P(L < \theta < U) = 1 - \alpha$

Confidence Interval for a Population Mean


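Confidence Interval for the Mean of a Population with known Variance

Let $X$ be a random variable with $X \sim N(\mu, \sigma^2)$; then $\bar{X} \sim N(\mu, \sigma^2/n)$.
If the population distribution is unknown, then for large $n$, by the central limit theorem (CLT), $\bar{X} \sim N(\mu, \sigma^2/n)$ approximately. Thus, in either case,

$\bar{X} \sim N(\mu, \sigma^2/n)$

Now we have to find $L$ and $U$ such that $P(L < \mu < U) = 1 - \alpha$, where $(1 - \alpha)$ is known as the level of confidence.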


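Since the quantity $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$, then

$\Pr(L < \mu < U) = 1 - \alpha$

$\Pr\left(a < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < b\right) = 1 - \alpha$

$\Pr\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$

$\Pr\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$

Hence $L = \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ and $U = \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.

The $100(1-\alpha)\%$ confidence interval for $\mu$ is $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.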


Interpreting Confidence Intervals

Probabilistic Interpretation
In repeated sampling from a normally distributed population (or when the sample is large for an unknown distribution) with a known variance, $100(1-\alpha)\%$ of all intervals of the form $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ will, in the long run, include the population mean $\mu$.

Practical Interpretation

When sampling from a normally distributed population (or when the sample is large for an unknown distribution) with a known variance, we are $100(1-\alpha)\%$ confident that the single computed interval $\bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$ contains the population mean $\mu$.
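The probabilistic interpretation can be checked with a short simulation. The sketch below assumes a normal population with mean 50 and known standard deviation 10, and samples of size 30; these values are illustrative only. It counts how often the interval built from each sample covers the true mean.

```python
import random
from statistics import NormalDist, fmean

random.seed(2)                           # fixed seed so the run is repeatable
mu, sigma, n = 50.0, 10.0, 30            # assumed population and sample size
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)  # reliability factor, about 1.96
half_width = z * sigma / n ** 0.5        # Z_{alpha/2} * sigma / sqrt(n)

trials = 10000
covered = 0
for _ in range(trials):
    xbar = fmean([random.gauss(mu, sigma) for _ in range(n)])
    if xbar - half_width < mu < xbar + half_width:
        covered += 1

coverage = covered / trials              # proportion of intervals containing mu
print(coverage)
```

The observed proportion should be close to 0.95. Each single interval either contains the mean or does not; the 95% describes the long-run behaviour of the procedure.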

Components of the Confidence Intervals

In general, the $100(1-\alpha)\%$ interval estimate may be expressed as follows:

estimator $\pm$ (reliability factor) $\times$ (standard error)

Note:


When $1 - \alpha = 0.95$, the interval is called the 95% confidence interval. Then the reliability factor $Z_{0.05/2}$ is 1.96.

When $1 - \alpha = 0.99$, the interval is called the 99% confidence interval. Then the reliability factor $Z_{0.01/2}$ is 2.58.
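These reliability factors are quantiles of the standard normal distribution, so they can be reproduced with the Python standard library. The helper name `reliability_factor` is illustrative.

```python
from statistics import NormalDist


def reliability_factor(confidence):
    """Return Z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


print(round(reliability_factor(0.95), 2))  # 1.96
print(round(reliability_factor(0.99), 2))  # 2.58
```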

Example 1:

A physical therapist wished to estimate, with 99% confidence, the mean maximal strength of a particular muscle in a certain group of individuals. He is willing to assume that strength scores are approximately normally distributed with a variance of 144. A sample of 15 subjects who participated in the experiment yielded a mean of 84.3.
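One way to work Example 1 is sketched below, applying the known-variance interval (sample mean plus or minus the reliability factor times the standard error) to the values given. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

xbar, variance, n = 84.3, 144, 15        # values given in Example 1
z = NormalDist().inv_cdf(1 - 0.01 / 2)   # reliability factor for 99%, about 2.58
se = sqrt(variance) / sqrt(n)            # standard error, sigma / sqrt(n)
lower, upper = xbar - z * se, xbar + z * se
print(round(lower, 1), round(upper, 1))
```

With these inputs the 99% interval is roughly 76.3 to 92.3.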
Confidence Interval for the Mean of a Population with unknown Variance

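Let $X$ be a random variable with $X \sim N(\mu, \sigma^2)$, where the variance $\sigma^2$ is unknown. Then

$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$

where $S$ is the sample standard deviation, $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$.

Now we have to find $L$ and $U$ such that $P(L < \mu < U) = 1 - \alpha$, where $(1 - \alpha)$ is known as the level of confidence.

Since the quantity $T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$, then

$\Pr(L < \mu < U) = 1 - \alpha$

$\Pr\left(a < \frac{\bar{X} - \mu}{S/\sqrt{n}} < b\right) = 1 - \alpha$

$\Pr\left(-t_{(n-1),\,\alpha/2} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{(n-1),\,\alpha/2}\right) = 1 - \alpha$

$\Pr\left(\bar{X} - t_{(n-1),\,\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{(n-1),\,\alpha/2}\frac{S}{\sqrt{n}}\right) = 1 - \alpha$

Hence $L = \bar{X} - t_{(n-1),\,\alpha/2}\frac{S}{\sqrt{n}}$ and $U = \bar{X} + t_{(n-1),\,\alpha/2}\frac{S}{\sqrt{n}}$.

The $100(1-\alpha)\%$ confidence interval for $\mu$ is $\bar{X} \pm t_{(n-1),\,\alpha/2}\frac{S}{\sqrt{n}}$.

Example 2:

A sample of 16 ten-year-old girls had a mean weight of 71.5 pounds and a standard deviation of 12 pounds. Assuming normality, find the 90% CI for $\mu$.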
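A possible worked solution to Example 2 follows. The Python standard library has no t quantile function, so the critical value $t_{15,\,0.05} = 1.753$ is taken from a standard t table; treat it as an assumed lookup rather than something computed here.

```python
from math import sqrt

xbar, s, n = 71.5, 12.0, 16   # values given in Example 2
t_crit = 1.753                # t_{15, 0.05} from a standard t table (assumed lookup)
se = s / sqrt(n)              # estimated standard error, S / sqrt(n) = 3.0
lower, upper = xbar - t_crit * se, xbar + t_crit * se
print(round(lower, 2), round(upper, 2))
```

With these inputs the 90% interval is roughly 66.24 to 76.76 pounds.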
Note:

For a large sample size, $t_{(n-1),\,\alpha/2}$ can be approximated by $Z_{\alpha/2}$.


Confidence Interval for a Population Proportion


To estimate a population proportion, a sample is drawn from the population of interest and the sample proportion $p$ is computed. This sample proportion is used as the point estimator of the population proportion.
If $p$ is the proportion of successes in a fixed number $n$ of binomial trials, then for large $n$ [when both $n\pi$ and $n(1-\pi)$ are greater than 5] the distribution of $p$ is approximately

$p \sim N\left(\pi, \frac{\pi(1-\pi)}{n}\right)$

where $\pi$ is the true population proportion.

The standard error of $p$ is $\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$.

Since $\pi$, the parameter (true population proportion) we are trying to estimate, is unknown, we must use $p$ as an estimate. Thus, we estimate $\sigma_p$ by $s_p = \sqrt{\frac{p(1-p)}{n}}$.

The $100(1-\alpha)\%$ confidence interval for $\pi$ is given by

$L = p - Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$ and $U = p + Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$

Example 3:

Suppose 1600 of 2000 union members sampled said they plan to vote for the proposal to merge with the UMA. Using the 0.95 level of confidence, find the interval estimate for the population proportion.
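Example 3 can be worked as sketched below, using the large-sample interval for a proportion (sample proportion plus or minus the reliability factor times its estimated standard error). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

successes, n = 1600, 2000        # values given in Example 3
p = successes / n                # sample proportion, 0.8
z = NormalDist().inv_cdf(0.975)  # reliability factor for 95%, about 1.96
se = sqrt(p * (1 - p) / n)       # estimated standard error of p
lower, upper = p - z * se, p + z * se
print(round(lower, 4), round(upper, 4))
```

With these inputs the 95% interval for the population proportion is roughly 0.78 to 0.82.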
Confidence Interval for the Difference between two Population Means
Here the two samples come from independent populations, and we consider the distribution of the difference between the two sample means.
An unbiased point estimator of the difference between two population means is provided by the difference between the two sample means.
Sampling from Populations with Known Variances
When both populations are normal, or for large samples taken from populations with unknown distributions, the sampling distribution of the difference between the two sample means is:

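$\bar{X}_1 - \bar{X}_2 \sim N\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$

Therefore, the $100(1-\alpha)\%$ confidence interval for $(\mu_1 - \mu_2)$ is

$(\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$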

Example 4:

ABC Company produces a synthetic fiber at two factories located in different parts of the
country. Every effort is made to maintain uniformity of production between the two factories
with respect to the mean breaking strength of the fiber. To determine whether the two
factories are maintaining uniformity of production, the manufacturer selects a sample of 25
specimens from factory 1 and a sample of 16 specimens from factory 2.
The objective is to construct a CI for the difference between the two population means. The mean breaking strength of the sample from factory 1 is 22 pounds. The mean breaking strength of the sample from factory 2 is 20 pounds. The variance in both factories is known to be 10 lb². The populations are normally distributed. The desired confidence coefficient is 0.95.
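A possible worked solution to Example 4 is sketched below, using the known-variance interval for a difference between two means with the values given. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

xbar1, n1 = 22.0, 25             # factory 1: sample mean and sample size
xbar2, n2 = 20.0, 16             # factory 2: sample mean and sample size
var1 = var2 = 10.0               # known population variances (lb^2)

z = NormalDist().inv_cdf(0.975)  # reliability factor for 95%, about 1.96
se = sqrt(var1 / n1 + var2 / n2) # standard error of the difference in means
diff = xbar1 - xbar2
lower, upper = diff - z * se, diff + z * se
print(round(lower, 2), round(upper, 2))
```

With these inputs the 95% interval for the difference in mean breaking strength is roughly 0.02 to 3.98 pounds.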
Sampling from Populations with Unknown Variances
When the population variances are unknown and we wish to find the distribution of the difference between two sample means, we can use the t distribution for the sample mean difference $\bar{X}_1 - \bar{X}_2$.

We must know, or be willing to assume, that the two sampled populations are normally distributed, or the sample sizes should be large.

With regard to the population variances, we distinguish between two situations:


1. Population variances are equal
2. Population variances are not equal
Case I: Assumptions
1. Two populations are normally distributed
2. Populations are independent
3. Population variances are equal

If the assumption of equal variances is justified, the two sample variances that we compute
from two samples may be considered as estimates of the same quantity, the common
variance. We can obtain a pooled estimate of the common variance as follows.
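Pooled variance: $S_p^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}$

Then the standard error of $\bar{X}_1 - \bar{X}_2$ can be given by:

$S_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{S_p^2}{n_1} + \frac{S_p^2}{n_2}} = S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$

a. For small samples

The quantity

$\frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)}{S_{\bar{X}_1-\bar{X}_2}} \sim t_{n_1+n_2-2}$

Thus the $100(1-\alpha)\%$ CI for $(\mu_1-\mu_2)$ is

$(\bar{X}_1-\bar{X}_2) \pm t_{(n_1+n_2-2),\,\alpha/2}\, S_{\bar{X}_1-\bar{X}_2}$

b. For large samples

The quantity

$\frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)}{S_{\bar{X}_1-\bar{X}_2}} \sim N(0, 1)$

Thus the $100(1-\alpha)\%$ CI for $(\mu_1-\mu_2)$ is

$(\bar{X}_1-\bar{X}_2) \pm Z_{\alpha/2}\, S_{\bar{X}_1-\bar{X}_2}$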

Case II: Assumptions

1. Two populations are normally distributed


2. Populations are independent
3. Population variances are unequal
Then the standard error of $\bar{X}_1 - \bar{X}_2$ can be given by:

$S_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$

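a. For small samples

$\frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \sim t_{\nu}$ approximately,

where

$\nu = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{\left(S_1^2/n_1\right)^2}{n_1-1} + \frac{\left(S_2^2/n_2\right)^2}{n_2-1}}$

Then the $100(1-\alpha)\%$ CI for $(\mu_1-\mu_2)$ is

$(\bar{X}_1-\bar{X}_2) \pm t_{\nu,\,\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$

b. For large samples

$\frac{(\bar{X}_1-\bar{X}_2) - (\mu_1-\mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} \sim N(0, 1)$

Then the $100(1-\alpha)\%$ CI for $(\mu_1-\mu_2)$ is

$(\bar{X}_1-\bar{X}_2) \pm Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}$

Example 5: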

Experimenters with Bancroft Chemicals test two types of fertilizer for possible use in the
cultivation of cabbages. They grow the cabbage in two different fields. One of the two
fertilizers is applied in each field. At harvest time they select a random sample of 15
cabbages from the crop grown with fertilizer 1. They randomly select 12 cabbages from the
crop grown with fertilizer 2. The sample mean and variance of the weight of cabbages grown with fertilizer 1 are 44.1 oz and 36 oz². The mean weight computed from the second sample is 31.7 oz and the variance is 44 oz².
The experimenters assume that the two populations of weights are normally distributed. They
also assume that the two population variances are equal. Find a 95% CI for the difference
between mean weights of cabbages grown with two fertilizers.
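One way to work Example 5 under the equal-variance assumption is sketched below: pool the two sample variances, form the standard error, and use a t critical value with $n_1 + n_2 - 2 = 25$ degrees of freedom. The value $t_{25,\,0.025} = 2.060$ is taken from a standard t table and is an assumed lookup, since the Python standard library has no t quantile function.

```python
from math import sqrt

xbar1, var1, n1 = 44.1, 36.0, 15   # fertilizer 1: mean (oz), variance (oz^2), size
xbar2, var2, n2 = 31.7, 44.0, 12   # fertilizer 2: mean (oz), variance (oz^2), size

# Pooled estimate of the common variance.
sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# Standard error of the difference between the two sample means.
se = sqrt(sp2 * (1 / n1 + 1 / n2))

t_crit = 2.060                     # t_{25, 0.025} from a standard t table (assumed lookup)
diff = xbar1 - xbar2
lower, upper = diff - t_crit * se, diff + t_crit * se
print(round(sp2, 2), round(lower, 2), round(upper, 2))
```

With these inputs the pooled variance is 39.52 oz² and the 95% interval for the difference in mean weights is roughly 7.4 to 17.4 oz.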


Confidence Interval for the Difference between Two Population Proportions


We may want to compare, for example, men and women, two age groups, two socioeconomic groups, or two diagnostic groups with respect to the proportion possessing some characteristic of interest.
When n1 and n2 are large and population proportions are not too close to 0 or 1, the central
limit theorem is applied and normal distribution may be employed to obtain confidence
intervals.
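Since, as a rule, the population proportions $\pi_1$ and $\pi_2$ are unknown, the standard error of the estimate usually must be estimated by

$s_{p_1-p_2} = \sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$

Then the confidence interval for the difference between two population proportions is given by:

The $100(1-\alpha)\%$ CI for $\pi_1 - \pi_2$ is $(p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}$

Example 6: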

A simple random sample of 150 industrial firms of Type A shows that 20% of them spend
more than 3% of their total sales on advertising. A similar independent sample of 150
industrial firms of Type B shows that 27% of them spend more than 3% of their total sales on
advertising. Construct a 95% CI for the difference between the proportions of Type A and Type B industrial firms that spend more than 3% of their total sales on advertising.
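A possible worked solution to Example 6 is sketched below, using the large-sample interval for a difference between two proportions with the values given. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p1, n1 = 0.20, 150               # Type A: sample proportion and sample size
p2, n2 = 0.27, 150               # Type B: sample proportion and sample size

z = NormalDist().inv_cdf(0.975)  # reliability factor for 95%, about 1.96
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # estimated standard error
diff = p1 - p2
lower, upper = diff - z * se, diff + z * se
print(round(lower, 4), round(upper, 4))
```

With these inputs the 95% interval for the Type A minus Type B difference is roughly -0.166 to 0.026, an interval that includes zero.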
