
Sampling Distribution

Estimation and Testing of hypothesis


Sampling
The process of selecting a sample from the population
is referred to as sampling, and the number of units in the
sample is called the sample size.
If the sample size is less than 30, the sample is called a
small sample; otherwise it is called a large sample.
The objective of sampling is to estimate characteristics
(parameter values of the distribution) of the whole
population.
The sampling process comprises several stages:
i) Defining the population of concern
ii) Specifying a sampling frame, a set of items or
events possible to measure
iii) Specifying a sampling method for selecting items or
events from the frame
iv) Determining the sample size
v) Implementing the sampling plan
vi) Sampling and data collection
Population: A population can be defined as including
all people or items with the characteristics one wishes to
understand.
Sometimes what defines a population is obvious.
For example, a manufacturer needs to decide whether
a batch of material from production is of high enough quality to
be released to the customer, or should be considered
scrap or rework due to poor quality. In this case the
batch is the population.
Study population: The group from which we actually
draw the sample is known as the study population.
Sample frame: A list of elements from the study
population is called a sample frame.

A sampling frame should have the property that we can identify
every single item and include it in our sample.
The most straightforward type of frame is a list of the
elements of the population with
appropriate contact information.

For example, in an opinion poll, possible sampling frames
include an electoral register and a telephone directory.
Sampling schemes
Simple random sampling (SRS): In simple
random sampling, units are independently
selected one at a time until the desired sample
size is achieved.
In this type of sampling every unit in the
population has an equal probability of being
selected in the sample.
SRS can be vulnerable to sampling error
because the randomness of the selection may
result in a sample that does not reflect the
makeup of the population.

SRS can be further classified as Simple Random
Sampling With Replacement (SRSWR) and Simple
Random Sampling Without Replacement (SRSWOR).

In SRSWR, a selected item is replaced before the next draw
or selection. Thus all the outcomes are independent.
In this case we say that the chosen items $x_1, x_2, \ldots, x_n$
are independent and identically distributed (i.i.d.).

In SRSWOR, a selected item is not put back, so
each draw has one item fewer to choose from. Thus the
outcomes are not independent.
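
The two schemes can be illustrated with a minimal Python sketch. The population of ten unit labels, the sample size, and the seed are hypothetical values chosen only for illustration; `random.choices` draws with replacement and `random.sample` draws without replacement.

```python
import random

population = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]  # hypothetical unit labels
n = 4                                          # hypothetical sample size
rng = random.Random(0)                         # fixed seed so the run is repeatable

# SRSWR: every draw is made from the full population, so repeats are possible.
srswr_sample = rng.choices(population, k=n)

# SRSWOR: a drawn unit is not put back, so all sampled units are distinct.
srswor_sample = rng.sample(population, k=n)

print("with replacement:   ", srswr_sample)
print("without replacement:", srswor_sample)
```

A sample drawn without replacement can never contain the same unit twice, whereas a sample drawn with replacement may.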

Case 1: If N is the size of a finite population and n is the
size of the sample, then under SRSWR there are $N^n$
possible samples.
For example, let {1,2,3} be the population, so N = 3.
If we draw a sample of size n = 2 with replacement, then the
number of samples is $N^n = 3^2 = 9$:
{(1,1),(1,2),(1,3);(2,1),(2,2),(2,3);(3,1),(3,2),(3,3)}

Case 2: Suppose {1,2,3} is the finite population and
n = 2. If we follow SRSWOR, then the number of
samples is $\binom{3}{2} = \frac{3 \cdot 2}{1 \cdot 2} = 3$.
Thus the samples are (1,2), (2,3), and (3,1).
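
The two counts can be checked by enumeration. The sketch below uses the standard library's `itertools`: `product` lists ordered samples with repeats allowed (SRSWR), and `combinations` lists unordered samples without repeats (SRSWOR, counted as in Case 2).

```python
from itertools import combinations, product
from math import comb

population = [1, 2, 3]
N, n = len(population), 2

# SRSWR: ordered samples, the same unit may appear more than once.
srswr_samples = list(product(population, repeat=n))

# SRSWOR: unordered samples, every unit appears at most once.
srswor_samples = list(combinations(population, n))

print(len(srswr_samples), srswr_samples)
print(len(srswor_samples), srswor_samples)
```

`combinations` reports the third sample as (1,3) rather than (3,1); as an unordered sample it is the same one.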

Stratified sampling: SRS does not take into
consideration any information that is known
about the elements of a population and that might
affect the characteristics of interest.
Under SRS, it is also possible
that a particular subgroup of the population
would not be represented, merely by chance.
In stratified sampling we divide the population into H
distinct subgroups, or strata, such that the $h$-th
stratum has size $N_h$. A separate SRS of size $n_h$
is then selected from each stratum.
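
A minimal sketch of stratified sampling in Python follows. The function name `stratified_sample`, the three strata, and the per-stratum sizes are hypothetical; the sizes here are simply chosen proportional to the stratum sizes, which is one common choice and not the only one.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a separate SRS (without replacement) of size sizes[h] from each stratum h."""
    rng = random.Random(seed)
    return {h: rng.sample(units, sizes[h]) for h, units in strata.items()}

# Hypothetical population split into H = 3 strata with N_h = 6, 4, 2.
strata = {
    "A": ["a1", "a2", "a3", "a4", "a5", "a6"],
    "B": ["b1", "b2", "b3", "b4"],
    "C": ["c1", "c2"],
}
sizes = {"A": 3, "B": 2, "C": 1}  # n_h for each stratum

sample = stratified_sample(strata, sizes, seed=1)
for h, units in sample.items():
    print(h, units)
```

Because each stratum is sampled separately, every subgroup is guaranteed to appear in the final sample.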

Sampling Distribution
Consider all possible samples of size n which can be
drawn from a given population. For each sample we
compute some statistic, say the mean or the s.d.,
that is, $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ or $s_1, s_2, \ldots, s_k$. These values vary
from sample to sample, which in turn leads to a distribution
called the sampling distribution.
If the statistic computed is the sample mean $\bar{x}$, the distribution of these
sample means is called the sampling distribution
of the mean. Similarly we may have sampling distributions of the s.d., the proportion,
etc.


Mean and s.d. of sampling distribution
Given a finite population of size N and sample
size n:
Case 1: SRSWOR: If we denote the mean
and s.d. of the sampling distribution of
means by $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$, and the population
mean and s.d. by $\mu$ and $\sigma$ respectively,
then
$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$
Case 2: SRSWR: If the sampling is done
with replacement, or if the population is
infinite, then the above result reduces to

$\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
Sampling distribution of proportions
(Binomial Distribution)

Let p be the proportion of successes in the population. For each
sample of size n, determine the sample proportion of successes.
Thus we obtain a sampling distribution of proportions
whose mean $\mu_p$ and s.d. $\sigma_p$ are given by

$\mu_p = p$ and $\sigma_p = \sqrt{\frac{p(1-p)}{n}}$


Example
Problem: Given the population {1,2,3}, find the mean and variance of the
sampling distribution of the mean for samples of size 2 drawn with and without replacement.

Case 1: With replacement: The possible samples are
(1,1),(1,2),(1,3);(2,1),(2,2),(2,3);(3,1),(3,2),(3,3)
The means of these samples are 1, 1.5, 2, 1.5, 2, 2.5, 2, 2.5, 3
The frequency distribution of these means is:

$\bar{x}$   1.0   1.5   2.0   2.5   3.0
f           1     2     3     2     1
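
$\mu_{\bar{x}} = \frac{1+3+6+5+3}{9} = \frac{18}{9} = 2$

$\sigma^2_{\bar{x}} = \frac{1}{9}\left[1(1-2)^2 + 2(1.5-2)^2 + 3(2-2)^2 + 2(2.5-2)^2 + 1(3-2)^2\right] = \frac{1}{3}$

Thus we have $\mu_{\bar{x}} = 2 = \mu$,

$\sigma^2_{\bar{x}} = \frac{1}{3}$ and $\frac{\sigma^2}{n} = \frac{2/3}{2} = \frac{1}{3}$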

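Case 2: SRSWOR: The samples are (1,2), (2,3), (3,1) and the means of these
samples are 1.5, 2.5, 2.

$\mu_{\bar{x}} = \frac{1.5+2.5+2}{3} = 2$

$\sigma^2_{\bar{x}} = \frac{1}{3}\left[(2-1.5)^2 + (2-2.5)^2 + (2-2)^2\right] = \frac{1}{6}$

$\mu_{\bar{x}} = \mu = 2$

$\frac{\sigma^2}{n} \cdot \frac{N-n}{N-1} = \frac{2/3}{2} \cdot \frac{3-2}{3-1} = \frac{1}{6}$

so $\sigma^2_{\bar{x}} = \frac{1}{6}$, in agreement with the formula.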
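NOTE: In the case of SRSWR, $\mu_{\bar{x}} = \mu = 2$ and Variance $= SE^2$, where $SE = \sigma/\sqrt{n}$;
in the case of SRSWOR, $\mu_{\bar{x}} = \mu = 2$ but Variance $\neq SE^2$ for that same SE,
because the finite population correction factor $(N-n)/(N-1)$ must be included.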
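
The arithmetic of this example can be checked by brute force. The following Python sketch enumerates every sample of size 2 from {1,2,3} under both schemes and compares the variance of the sample means with the two formulas. The helper name `mean_and_variance` is illustrative, and variances use the divisor "number of values", as in the example.

```python
from fractions import Fraction
from itertools import combinations, product

def mean_and_variance(values):
    """Mean and variance with divisor len(values), using exact fractions."""
    values = [Fraction(v) for v in values]
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / len(values)
    return m, var

population = [1, 2, 3]
N, n = len(population), 2
mu, sigma2 = mean_and_variance(population)

# Means of all possible samples under each scheme.
means_wr = [Fraction(sum(s), n) for s in product(population, repeat=n)]
means_wor = [Fraction(sum(s), n) for s in combinations(population, n)]

mu_wr, var_wr = mean_and_variance(means_wr)
mu_wor, var_wor = mean_and_variance(means_wor)

print(mu, sigma2)       # 2 2/3
print(mu_wr, var_wr)    # 2 1/3
print(mu_wor, var_wor)  # 2 1/6
```

Exact fractions are used instead of floats so that the comparison with $\sigma^2/n$ and with the finite population correction holds exactly.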
Example
A population consists of 4 numbers
{3,7,11,15}. Find the mean and s.d. of the
population. Also (a) find the mean and s.d. of the
sampling distribution of means by considering
samples of size 2 with replacement. (b) If N and
n denote respectively the population size and
sample size, and $\sigma$ and $\sigma_{\bar{x}}$ denote respectively
the population s.d. and the s.d. of the sampling distribution of
means without replacement, verify that

$\sigma^2_{\bar{x}} = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}$
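
One way to work this exercise is sketched below. It assumes the variance is taken with divisor N, as in the preceding example, and in part (b) it enumerates the 6 unordered samples of size 2.

```latex
\begin{aligned}
\mu &= \frac{3+7+11+15}{4} = 9, \qquad
\sigma^2 = \frac{(-6)^2+(-2)^2+2^2+6^2}{4} = 20, \qquad
\sigma = \sqrt{20} \approx 4.47 \\
\text{(a) SRSWR, } n = 2:\quad
\mu_{\bar{x}} &= \mu = 9, \qquad
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \sqrt{10} \approx 3.16 \\
\text{(b) SRSWOR, } n = 2:\quad
\bar{x} &\in \{5,\ 7,\ 9,\ 9,\ 11,\ 13\}, \qquad \mu_{\bar{x}} = 9 \\
\sigma^2_{\bar{x}} &= \frac{16+4+0+0+4+16}{6} = \frac{20}{3}
 = \frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}
 = \frac{20}{2}\cdot\frac{4-2}{4-1}
\end{aligned}
```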
Theory of estimation
The theory of estimation is divided into two
groups: point estimation and interval
estimation.
In point estimation, a single statistic
value is used to provide an estimate of the
population parameter, whereas in
interval estimation a probable range is
specified within which the true value of
the parameter might be expected to lie.
Characteristics of a good estimator: A
good estimator is one which is as close to
the true value of the parameter as possible.
The following are the characteristics of a good
estimator:
unbiasedness, consistency, efficiency,
and sufficiency.

Characteristics of a good estimator

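Unbiased: An estimator $\hat{\theta}$ of a parameter $\theta$ is
said to be unbiased if $E(\hat{\theta}) = \theta$.
Consistent: An estimator $\hat{\theta}$ of $\theta$ is
consistent if $\hat{\theta}$ converges to $\theta$ in probability.
Efficiency: Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two unbiased
estimators of a parameter $\theta$.
Then $\hat{\theta}_1$ is said to be more efficient than
$\hat{\theta}_2$ if $\mathrm{var}(\hat{\theta}_1) < \mathrm{var}(\hat{\theta}_2)$.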

Sufficiency: Suppose $\hat{\theta}$ is a statistic
such that, given $\hat{\theta}$, the conditional pdf of
any other statistic $\hat{\theta}^*$ which is not a
function of $\hat{\theta}$ does not involve $\theta$; then $\hat{\theta}$ is
called a sufficient estimator of $\theta$.
The term sufficient is used because
knowledge of $\hat{\theta}$ gives all the information the sample contains
about $\theta$.

Example
A random sample of size 100 has mean
15, and the population variance is 16. Find
the interval estimate of the population
mean with a confidence level of (i) 95%
and (ii) 97%.
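
A sketch of how such an interval might be computed in Python follows. It assumes that the 16 in the problem is the population variance (so the s.d. is 4), that the s.d. is treated as known, and that the normal (z) interval $\bar{x} \pm z \, \sigma/\sqrt{n}$ applies. The function name `z_interval` is illustrative; `statistics.NormalDist` is part of the standard library.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence):
    """Two-sided interval: sample_mean +/- z * sigma / sqrt(n), with sigma known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

sigma = sqrt(16)  # assumption: the problem's 16 is the population variance
intervals = {}
for confidence in (0.95, 0.97):
    intervals[confidence] = z_interval(15, sigma, 100, confidence)
    low, high = intervals[confidence]
    print(f"{confidence:.0%}: ({low:.3f}, {high:.3f})")
```

Under these assumptions the 95% interval is roughly (14.22, 15.78) and the 97% interval roughly (14.13, 15.87); the higher confidence level gives the wider interval.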
Maximum Likelihood method
The most commonly used method for
estimating population parameters.
It consists of maximizing the likelihood (the
probability) of randomly obtaining the observed set of
sample values.
Mathematically, let $x_1, x_2, \ldots, x_n$ be a random
sample of size n from a population with pdf
$p(x, \theta)$, where $\theta$ is the unknown
parameter.
Then an estimate of $\theta$ is obtained by
maximizing the likelihood function
$L = p(x_1, x_2, \ldots, x_n; \theta) = \prod_{i=1}^{n} p(x_i, \theta)$
Using the principle of maxima and minima in
calculus, the maximum likelihood estimator (MLE)
is the solution, if any, of
$\frac{\partial L}{\partial \theta} = 0$ and $\frac{\partial^2 L}{\partial \theta^2} < 0$
Maximum likelihood
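To obtain the MLE of the parameter p from a
sample of size n:
If $x_1, x_2, \ldots, x_n$ is a sample of size n drawn
from a population $p(x_i, \theta)$, $i = 1, 2, \ldots, n$, the
likelihood function based on these
observations can be written as
$L(\theta) = p(x_1, \theta)\, p(x_2, \theta) \cdots p(x_n, \theta) = \prod_{i=1}^{n} p(x_i, \theta)$
For a binomial population in which x successes are observed in n trials,
$L(p) = \binom{n}{x} p^x (1-p)^{n-x}$
$\log L(p) = \log \binom{n}{x} + x \log p + (n-x) \log(1-p)$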
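
Setting the derivative of the binomial log-likelihood with respect to p to zero gives $x/p - (n-x)/(1-p) = 0$, that is, $\hat{p} = x/n$. The Python sketch below checks this numerically by a plain grid search over p. The values n = 20 and x = 7 and the function names are hypothetical, and the grid search is only a sanity check, not a recommended estimation method.

```python
import math

def binomial_log_likelihood(p, n, x):
    """log L(p) = log C(n, x) + x log p + (n - x) log(1 - p)."""
    return math.log(math.comb(n, x)) + x * math.log(p) + (n - x) * math.log(1 - p)

def grid_mle(n, x, steps=10_000):
    """Return the grid point in (0, 1) with the largest log-likelihood."""
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda p: binomial_log_likelihood(p, n, x))

n, x = 20, 7                 # hypothetical data: 7 successes in 20 trials
p_hat_closed_form = x / n    # solution of d(log L)/dp = 0
p_hat_grid = grid_mle(n, x)

print(p_hat_closed_form, p_hat_grid)
```

Maximizing log L rather than L is harmless because the logarithm is increasing, so both have their maximum at the same p.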
Testing of statistical hypothesis
A statistical hypothesis is a statement about the probability law of
a random variable (which can be sampled).
The test of a hypothesis is a decision rule to accept or
reject the hypothesis on the basis of an observed sample of size,
say, n.
Examples of statistical hypotheses:
(i) A normal distribution has a specified mean and variance.
(ii) A normal population has a specified mean (variance is not
specified).
(iii) The distribution is normal (neither mean nor variance is specified).
(iv) Two continuous distributions are identical (no parameter is
specified).
Hypotheses (i) and (ii) are examples of parametric hypotheses,
in which a statement is made regarding the values of one or
more parameters of a distribution. (iii) and (iv) are examples of
non-parametric hypotheses.
Statistical hypothesis.

Let $X_1, X_2, \ldots, X_n$ be a sample of size n of the random
variable X. This is represented by the point $(X_1, X_2, \ldots, X_n)$
in the n-dimensional sample space. Let us denote this point
by $E$. In principle we can always select some region $\omega$
in the sample space such that, when the sampling
distribution is known, it is possible for us to determine the
probability that the sample point $E$ lies in $\omega$.
Any hypothesis concerning $P(E \in \omega)$, the probability that $E$ belongs to $\omega$,
is a statistical hypothesis.

Classification of hypothesis
Simple hypothesis: A hypothesis is said to be simple if it specifies all
the parameters of a distribution.
Composite hypothesis: A hypothesis is said to be composite if it
specifies only a proper subset of the set of parameters of a
distribution.
Suppose the distribution has l parameters and a hypothesis
specifies k of these. The hypothesis is said to be simple if k = l and
composite if k < l.
Geometrically, the l parameters determine a parameter space of
dimension l, and in the case of a simple hypothesis a unique point
of the parameter space is selected.
In the case of a composite hypothesis, the hypothesis selects a
sub-space of the parameter space containing more than one point;
l - k is known as the degrees of freedom of the hypothesis.
Null hypothesis and alternative hypothesis
Null hypothesis ($H_0$): The null hypothesis is a
statement made about the population and is
tested for possible rejection under the
assumption that it is true.
A null hypothesis is denoted by $H_0$.
Example: $H_0: \mu = \mu_0$
Alternative hypothesis ($H_1$): A statement which is
complementary to the null hypothesis.
An alternative hypothesis is denoted by $H_1$.
Example: $H_1: \mu \neq \mu_0$ (two-tailed test)
$H_1: \mu < \mu_0$ (left-tailed test)
$H_1: \mu > \mu_0$ (right-tailed test)

Type I and Type II Errors
In testing of hypothesis, if we reject a hypothesis when it is true, we are
committing an error usually known as a type I error; if we accept it
when it is false, we are committing an error known as a type II error.

Action          $H_0$ true        $H_1$ true
Reject $H_0$    Type I error      Right decision
Accept $H_0$    Right decision    Type II error
Significance level: The probability used as
the criterion for rejection is called the
significance level, denoted by $\alpha$.
The value of the test statistic
corresponding to $\alpha$ is termed the
critical value of the test statistic.
Power of the test: The power of a
statistical testing procedure is defined as
the probability that the test will correctly reject
the null hypothesis, that is, reject it when it is false.
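
The relation between the significance level, the type II error, and the power can be made concrete with a small Python sketch. Everything numeric here is hypothetical: a right-tailed z-test of a mean with known s.d., a null value of 50, one specific alternative of 52, s.d. 6, and n = 36.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical right-tailed z-test: H0: mu = 50 vs H1: mu > 50, sigma known.
mu0, mu1 = 50.0, 52.0   # null value and one illustrative alternative
sigma, n = 6.0, 36
alpha = 0.05            # significance level

se = sigma / sqrt(n)    # standard error of the sample mean
# Reject H0 when the sample mean exceeds this critical value.
critical_mean = mu0 + NormalDist().inv_cdf(1 - alpha) * se

# Type I error: rejecting H0 when the sample mean really follows N(mu0, se).
type_1 = 1 - NormalDist(mu0, se).cdf(critical_mean)
# Type II error: accepting H0 when the sample mean really follows N(mu1, se).
type_2 = NormalDist(mu1, se).cdf(critical_mean)
power = 1 - type_2

print(f"critical mean = {critical_mean:.3f}")
print(f"P(type I) = {type_1:.3f}, P(type II) = {type_2:.3f}, power = {power:.3f}")
```

The probability of a type I error equals the chosen significance level by construction, while the power depends on which alternative value is assumed to be true.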

Critical and acceptance regions
In testing a hypothesis, we divide the sample space into
two mutually disjoint regions, $\omega$ and $\bar{\omega}$. If the sample
point lies in the region $\omega$, we reject the hypothesis. If
the sample point lies in the region $\bar{\omega}$, we accept the
hypothesis.
$\omega$ is called the region of rejection (critical region) and $\bar{\omega}$ is called the
region of acceptance.

[Figure: sample space divided into the acceptance region and the critical region]
Two-tailed and one-tailed tests
Whether a test is two-tailed or one-tailed depends on the type of alternative
hypothesis we choose:
$H_0: \mu = \mu_0$ and
$H_1: \mu \neq \mu_0$ (two-tailed test)
$H_1: \mu < \mu_0$ (left-tailed test)
$H_1: \mu > \mu_0$ (right-tailed test)

If the rejection region lies on both sides of the acceptance region,
then we have a two-tailed test.

If the rejection region lies on only one side (either to the left or to the right) of the
acceptance region, then we have a one-tailed test.
For a given level of significance, the critical value of the statistic differs
between a one-tailed and a two-tailed test.
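
To see how the critical value depends on the kind of test, the sketch below computes critical values of a standard normal (z) statistic. The choice of a z statistic and the function name `critical_values` are assumptions made for illustration; other test statistics would use other distributions.

```python
from statistics import NormalDist

def critical_values(alpha, tail):
    """Critical value(s) of a standard normal statistic at significance level alpha."""
    z = NormalDist()
    if tail == "two":
        c = z.inv_cdf(1 - alpha / 2)   # alpha is split equally between both tails
        return (-c, c)
    if tail == "left":
        return (z.inv_cdf(alpha),)     # all of alpha in the left tail
    if tail == "right":
        return (z.inv_cdf(1 - alpha),) # all of alpha in the right tail
    raise ValueError("tail must be 'two', 'left', or 'right'")

for tail in ("two", "left", "right"):
    values = ", ".join(f"{v:.3f}" for v in critical_values(0.05, tail))
    print(f"{tail}-tailed, alpha = 0.05: {values}")
```

At the 5% level the two-tailed critical values are about -1.960 and 1.960, while a one-tailed test uses about 1.645 in absolute value, because the whole rejection probability sits in a single tail.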



Standard error (SE)
The standard error (SE) of a statistic is the
standard deviation of the sampling distribution
of that statistic.
$SE = \frac{\sigma}{\sqrt{n}}$ (SRSWR, or a large population)
$SE = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$ (SRSWOR, finite population)
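
Both formulas for the standard error of the mean can be wrapped in one small helper. This is a minimal sketch; the function name `standard_error` and its optional `N` argument are illustrative choices, with `N=None` standing for sampling with replacement or a large population.

```python
from math import sqrt

def standard_error(sigma, n, N=None):
    """SE of the sample mean; pass the population size N to apply the
    finite population correction used for sampling without replacement."""
    se = sigma / sqrt(n)
    if N is not None:
        se *= sqrt((N - n) / (N - 1))
    return se

# Population {1, 2, 3}: variance 2/3, samples of size 2.
sigma = sqrt(2 / 3)
print(standard_error(sigma, 2) ** 2)       # close to 1/3 (with replacement)
print(standard_error(sigma, 2, N=3) ** 2)  # close to 1/6 (without replacement)
```

The correction factor is always at most 1, so sampling without replacement never gives a larger standard error than sampling with replacement from the same population.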
