Confidence Interval
A. Ramesh
Department of Management Studies
Indian Institute of Technology Roorkee
Nonrandom Sampling
Not every unit of the population has the same
probability of being included in the sample.
Open to selection bias
Not an appropriate data collection method for most
statistical methods
Also known as non-probability sampling
11 Madhya Pradesh
12 Uttar Pradesh
13 Bihar
14 Rajasthan
15 J & K
16 Tamil Nadu
17 Karnataka
18 Kerala
19 Orissa
20 Manipur
[Table: random number table of single digits, used to select sample units from the numbered frame]
N = 20
n=4
[Figure: groups that are heterogeneous (different) between one another]
Systematic Sampling
Convenient and relatively
easy to administer
Population elements are an
ordered sequence (at least,
conceptually).
The first sample element is
selected randomly from the
first k population elements.
Thereafter, sample elements
are selected at a constant
interval, k, from the ordered
sequence frame.
$k = \dfrac{N}{n}$
where:
n = sample size
N = population size
k = size of selection interval
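The selection rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `systematic_sample` and the frame of 20 numbered units are assumptions made for the example, and the interval is taken as the integer part of N / n.

```python
import random

def systematic_sample(frame, n, rng=None):
    """Pick a random start among the first k elements, then every k-th element."""
    rng = rng or random.Random()
    N = len(frame)
    k = N // n                    # selection interval k = N / n (integer part)
    start = rng.randrange(k)      # random position within the first k elements (0-based)
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 21))        # hypothetical ordered frame with N = 20 units
sample = systematic_sample(frame, n=4, rng=random.Random(0))
print(sample)                     # four units spaced k = 5 apart
```

Only the first element is chosen at random; every later element is fixed by the interval, which is why the method is easy to administer.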
Cluster Sampling
Population is divided into non-overlapping
clusters or areas
Each cluster is a miniature of the
population.
A subset of the clusters is selected randomly
for the sample.
If the number of elements in the subset of
clusters is larger than the desired value of n,
these clusters may be subdivided to form a
new set of clusters and subjected to a
random selection process.
Cluster Sampling
Advantages
More convenient for geographically dispersed
populations
Reduced travel costs to contact sample elements
Simplified administration of the survey
Unavailability of sampling frame prohibits using
other random sampling methods
Disadvantages
Statistically less efficient when the cluster elements
are similar
Costs and problems of statistical analysis are
greater than for simple random sampling
Nonrandom Sampling
Convenience Sampling: Sample elements are selected for
the convenience of the researcher
Judgment Sampling: Sample elements are selected by
the judgment of the researcher
Quota Sampling: Sample elements are selected until the
quota controls are satisfied
Snowball Sampling: Survey subjects are selected based
on referral from other survey respondents
Errors
Sampling Distribution of $\bar{x}$
Process of Inferential Statistics
[Figure: select a random sample from the population, calculate the sample statistic $\bar{x}$, and use it to estimate the population parameter $\mu$]
Distribution of a Small Finite Population
Population Histogram
[Figure: frequency histogram of the population, N = 8; x-axis from 52.5 to 72.5, y-axis frequency from 0 to 3]
No.  Sample   Mean    No.  Sample   Mean    No.  Sample   Mean    No.  Sample   Mean
 1   (54,54)  54.0    17   (59,54)  56.5    33   (64,54)  59.0    49   (69,54)  61.5
 2   (54,55)  54.5    18   (59,55)  57.0    34   (64,55)  59.5    50   (69,55)  62.0
 3   (54,59)  56.5    19   (59,59)  59.0    35   (64,59)  61.5    51   (69,59)  64.0
 4   (54,63)  58.5    20   (59,63)  61.0    36   (64,63)  63.5    52   (69,63)  66.0
 5   (54,64)  59.0    21   (59,64)  61.5    37   (64,64)  64.0    53   (69,64)  66.5
 6   (54,68)  61.0    22   (59,68)  63.5    38   (64,68)  66.0    54   (69,68)  68.5
 7   (54,69)  61.5    23   (59,69)  64.0    39   (64,69)  66.5    55   (69,69)  69.0
 8   (54,70)  62.0    24   (59,70)  64.5    40   (64,70)  67.0    56   (69,70)  69.5
 9   (55,54)  54.5    25   (63,54)  58.5    41   (68,54)  61.0    57   (70,54)  62.0
10   (55,55)  55.0    26   (63,55)  59.0    42   (68,55)  61.5    58   (70,55)  62.5
11   (55,59)  57.0    27   (63,59)  61.0    43   (68,59)  63.5    59   (70,59)  64.5
12   (55,63)  59.0    28   (63,63)  63.0    44   (68,63)  65.5    60   (70,63)  66.5
13   (55,64)  59.5    29   (63,64)  63.5    45   (68,64)  66.0    61   (70,64)  67.0
14   (55,68)  61.5    30   (63,68)  65.5    46   (68,68)  68.0    62   (70,68)  69.0
15   (55,69)  62.0    31   (63,69)  66.0    47   (68,69)  68.5    63   (70,69)  69.5
16   (55,70)  62.5    32   (63,70)  66.5    48   (68,70)  69.0    64   (70,70)  70.0
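The table of samples and their means can be reproduced by enumeration. The sketch below assumes the population consists of the eight values that appear in the sample pairs (54, 55, 59, 63, 64, 68, 69, 70) and that samples of size 2 are drawn with replacement, which matches the 64 ordered pairs listed.

```python
from itertools import product
from statistics import mean, pvariance

population = [54, 55, 59, 63, 64, 68, 69, 70]    # the N = 8 values seen in the sample pairs

# Every ordered sample of size 2 drawn with replacement: 8 * 8 = 64 samples
samples = list(product(population, repeat=2))
sample_means = [mean(s) for s in samples]

mu = mean(population)
print(len(samples))                               # number of possible samples
print(mu, mean(sample_means))                     # population mean vs. mean of the sample means
print(pvariance(population) / 2, pvariance(sample_means))  # sigma^2 / n vs. variance of the means
```

The mean of all sample means equals the population mean, and their variance equals the population variance divided by the sample size, which is the behaviour the sampling distribution of the mean is expected to show.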
[Figure: frequency histogram of the sample means; x-axis from 53.75 to 71.25, y-axis frequency from 0 to 20]
[Figure: Means of 60 Samples (n = 2) from an Exponential Distribution; frequency histogram, x-axis from 0.00 to 4.00]
[Figure: Means of 60 Samples (n = 5) from an Exponential Distribution; frequency histogram, x-axis from 0.00 to 4.00]
[Figure: Means of 60 Samples (n = 2) from a Uniform Distribution; frequency histogram, x-axis from 1.00 to 4.25]
[Figure: Means of 60 Samples (n = 5) from a Uniform Distribution; frequency histogram, x-axis from 1.00 to 4.25]
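Central Limit Theorem (Marquis de Laplace)
For a sufficiently large sample size ($n \ge 30$), the sampling distribution of the sample mean $\bar{x}$ is approximately normal, regardless of the shape of the population distribution.
Distributions of Samples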
Sampling distributions drawn from a
uniformly distributed population start to
look like normal distributions even with a
sample size as small as 2.
If the sample size is large enough they form
nearly perfect normal distributions
http://www.statisticalengineering.com/central_limit_theorem.htm
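A small simulation gives a feel for this behaviour. The sketch below is illustrative only: the uniform range (1 to 4), the exponential rate (1), the 2,000 repetitions, and the seed are all assumptions chosen for the example, not values taken from the slides.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def sample_means(draw, n, reps, rng):
    """Draw `reps` samples of size n and return the mean of each sample."""
    return [mean(draw(rng) for _ in range(n)) for _ in range(reps)]

rng = random.Random(42)                       # fixed seed so the run is repeatable
populations = {
    # name: (function that draws one value, population standard deviation)
    "uniform(1, 4)": (lambda r: r.uniform(1.0, 4.0), 3.0 / sqrt(12)),
    "exponential(rate 1)": (lambda r: r.expovariate(1.0), 1.0),
}

spread = {}
for name, (draw, sigma) in populations.items():
    for n in (2, 5, 30):
        means = sample_means(draw, n, 2000, rng)
        spread[(name, n)] = pstdev(means)
        print(f"{name:20s} n={n:2d}  sd of means={spread[(name, n)]:.3f}  "
              f"sigma/sqrt(n)={sigma / sqrt(n):.3f}")
```

As n grows, the spread of the simulated means shrinks in line with $\sigma/\sqrt{n}$ for both populations; plotting a histogram of `means` would show the shape moving toward a normal curve.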
[Figure: sampling distributions of the sample mean for n = 2, n = 5, and n = 30, drawn from a uniform population and from a normal population]
$\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$
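Example
Population Parameters: $\mu = 85$, $\sigma = 9$
Sample Size: $n = 40$
$P(\bar{X} > 87) = P\left(Z > \dfrac{87 - \mu_{\bar{X}}}{\sigma_{\bar{X}}}\right)$
$= P\left(Z > \dfrac{87 - \mu}{\sigma / \sqrt{n}}\right)$
$= P\left(Z > \dfrac{87 - 85}{9 / \sqrt{40}}\right)$
$= P(Z > 1.41)$
$= .5 - P(0 \le Z \le 1.41)$
$= .5 - .4207$
$= .0793$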
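The tail probability in this example can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of a printed Z table; the variable names are chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 85, 9, 40            # population parameters and sample size from the example
std_error = sigma / sqrt(n)         # standard deviation of the sample mean
z = (87 - mu) / std_error           # Z value for a sample mean of 87
p_tail = 1 - NormalDist().cdf(z)    # P(X-bar > 87)

print(round(std_error, 2), round(z, 2), round(p_tail, 4))
```

The exact probability is very slightly larger than the table answer of .0793, because the table lookup rounds Z to 1.41 before reading the area.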
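Graphic Solution
to Example
[Figure: normal curve of $\bar{X}$ centered at 85 with $\sigma_{\bar{X}} = 9/\sqrt{40} = 1.42$, beside the standard normal curve; on each curve the area between the mean and the cutoff (87, or Z = 1.41) is .4207 and each half of the curve has area .5000, leaving equal tail areas of .0793]
$Z = \dfrac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \dfrac{87 - 85}{9 / \sqrt{40}} = \dfrac{2}{1.42} = 1.41$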
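Modified Z Formula
Finite correction factor: $\sqrt{\dfrac{N - n}{N - 1}}$
$Z = \dfrac{\bar{X} - \mu}{\dfrac{\sigma}{\sqrt{n}} \sqrt{\dfrac{N - n}{N - 1}}}$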
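Sampling Distribution of
Sample Proportion
$\hat{p} = \dfrac{X}{n}$
where:
$\hat{p}$ = sample proportion
$X$ = number of items in the sample that have the characteristic
$n$ = number of items in the sample
Sampling Distribution
Approximately normal if nP > 5 and nQ > 5 (P is the
population proportion and Q = 1 - P).
The mean of the distribution is P.
The standard deviation of the distribution is
$\sigma_{\hat{p}} = \sqrt{\dfrac{PQ}{n}}$
Z formula for the sample proportion:
$Z = \dfrac{\hat{p} - P}{\sqrt{\dfrac{PQ}{n}}}$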
Estimation
Statistical Inference
Statistical inference is the process by which
we acquire information and draw conclusions
about populations from samples.
[Figure: statistics turns data into information; inference runs from a sample statistic to a population parameter]
Estimation
There are two types of inference:
estimation and hypothesis testing;
estimation is introduced first.
The objective of estimation is to determine
the approximate value of a population
parameter on the basis of a sample
statistic.
E.g., the sample mean ($\bar{x}$) is employed to
estimate the population mean ($\mu$).
There are two types of estimators:
Point Estimator
Interval Estimator
Point Estimator
A point estimator draws inferences about a
population by estimating the value of an
unknown parameter using a single value or
point.
Point Estimator
Point probabilities in continuous distributions
are virtually zero.
A point estimator gets closer to the parameter
value as the sample size increases, but a point
estimate by itself does not reflect the effect of a
larger sample size.
Hence we will employ the interval estimator to
estimate population parameters.
Interval Estimator
An interval estimator draws inferences about a
population by estimating the value of an
unknown parameter using an interval.
point estimate
interval estimate
Estimator
An estimator of a population parameter is a sample statistic
used to estimate the parameter. The most commonly used
estimator of the:
Population Parameter                     Sample Statistic
Mean ($\mu$)                    is the   Mean ($\bar{X}$)
Variance ($\sigma^2$)           is the   Variance ($s^2$)
Standard Deviation ($\sigma$)   is the   Standard Deviation ($s$)
Proportion ($p$)                is the   Proportion ($\hat{p}$)
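$Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
will have a standard normal (or
approximately normal) distribution.
$P\left(\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$
The sample mean is in the center of
the interval:
$\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$, that is, $\left(\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right)$
$\bar{x} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ is called the
upper confidence
limit (UCL).
$\bar{x} - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ is called the
lower
confidence
limit (LCL).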
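Graphically
[Figure: the confidence interval for $\mu$, running from $\bar{x} - z_{\alpha/2}\sigma/\sqrt{n}$ to $\bar{x} + z_{\alpha/2}\sigma/\sqrt{n}$ and centered on $\bar{x}$; its width is $2 z_{\alpha/2}\sigma/\sqrt{n}$]
Graphically
[Figure: the actual location of the population mean $\mu$ may be anywhere inside the interval, or outside it]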
$1 - \alpha$    $\alpha$    $\alpha/2$    $z_{\alpha/2}$
0.90            0.10        0.05          1.645
0.95            0.05        0.025         1.96
0.98            0.02        0.01          2.33
0.99            0.01        0.005         2.575
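The tabulated critical values can be recovered from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is an assumption for illustration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z such that the central area under the standard normal curve equals `confidence`."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

critical = {c: z_critical(c) for c in (0.90, 0.95, 0.98, 0.99)}
for c, z in critical.items():
    print(f"{c:.2f}  alpha/2 = {(1 - c) / 2:.3f}  z = {z:.3f}")
```

The computed values agree with the table to two decimal places; printed tables often round the 98% and 99% values to 2.33 and 2.575.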
Interval Width
The width of the confidence interval estimate is a
function of the confidence level, the population
standard deviation, and the sample size:
$\bar{x} \pm z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$
Larger values of $\sigma$ produce wider
confidence intervals.
A higher confidence level (larger $z_{\alpha/2}$) also widens the
interval, while a larger sample size $n$ narrows it.
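Probability Interpretation
of the Level of Confidence
$\text{Prob}\left[\bar{X} - Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}\right] = 1 - \alpha$
[Figure: standard normal curve for 95% confidence; an area of .4750 lies on each side of the mean between Z = -1.96 and Z = 1.96, leaving .025 in each tail]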
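$\bar{X} - Z\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z\dfrac{\sigma}{\sqrt{n}}$
$153 - 1.96\dfrac{46}{\sqrt{85}} \le \mu \le 153 + 1.96\dfrac{46}{\sqrt{85}}$
$153 - 9.78 \le \mu \le 153 + 9.78$
$143.22 \le \mu \le 162.78$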
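The interval above (sample mean 153, $\sigma$ = 46, n = 85, Z = 1.96) can be reproduced with a short helper. The function name `z_interval` is an assumption for illustration; the Z value is passed in as read from the table.

```python
from math import sqrt

def z_interval(xbar, sigma, n, z):
    """Confidence interval for the mean when sigma is known: xbar +/- z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

lower, upper = z_interval(xbar=153, sigma=46, n=85, z=1.96)
print(round(lower, 2), round(upper, 2))
```

The same function covers any known-sigma case by changing the arguments, for example a 90% interval with z = 1.645.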
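Example
$\bar{X} = 10.455$, $\sigma = 7.7$, and $n = 44$.
90% confidence: $Z = 1.645$
$\bar{X} - Z\dfrac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z\dfrac{\sigma}{\sqrt{n}}$
$10.455 - 1.645\dfrac{7.7}{\sqrt{44}} \le \mu \le 10.455 + 1.645\dfrac{7.7}{\sqrt{44}}$
$10.455 - 1.91 \le \mu \le 10.455 + 1.91$
$8.545 \le \mu \le 12.365$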
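Example
$\bar{X} = 34.3$, $\sigma = 8$, $N = 800$, and $n = 50$.
98% confidence: $Z = 2.33$
$\bar{X} - Z\dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N - n}{N - 1}} \le \mu \le \bar{X} + Z\dfrac{\sigma}{\sqrt{n}}\sqrt{\dfrac{N - n}{N - 1}}$
$34.3 - 2.33\dfrac{8}{\sqrt{50}}\sqrt{\dfrac{800 - 50}{800 - 1}} \le \mu \le 34.3 + 2.33\dfrac{8}{\sqrt{50}}\sqrt{\dfrac{800 - 50}{800 - 1}}$
$34.3 - 2.554 \le \mu \le 34.3 + 2.554$
$31.75 \le \mu \le 36.85$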
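For a finite population the margin of error is multiplied by the finite correction factor. The sketch below reworks the example with N = 800 and n = 50; the function name `z_interval_finite` is an assumption for illustration.

```python
from math import sqrt

def z_interval_finite(xbar, sigma, n, N, z):
    """Known-sigma interval for the mean with the finite correction factor applied."""
    fpc = sqrt((N - n) / (N - 1))          # finite correction factor
    margin = z * sigma / sqrt(n) * fpc
    return xbar - margin, xbar + margin

lower, upper = z_interval_finite(xbar=34.3, sigma=8, n=50, N=800, z=2.33)
print(round(lower, 2), round(upper, 2))
```

Because the correction factor is below 1, the interval is slightly narrower than the one the uncorrected formula would give for the same data.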
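$\bar{X} \pm Z_{\alpha/2}\dfrac{S}{\sqrt{n}}$
or
$\bar{X} - Z_{\alpha/2}\dfrac{S}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\dfrac{S}{\sqrt{n}}$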
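Example
$\bar{X} = 85.5$, $S = 19.3$, and $n = 110$.
99% confidence: $Z = 2.575$
$\bar{X} - Z\dfrac{S}{\sqrt{n}} \le \mu \le \bar{X} + Z\dfrac{S}{\sqrt{n}}$
$85.5 - 2.575\dfrac{19.3}{\sqrt{110}} \le \mu \le 85.5 + 2.575\dfrac{19.3}{\sqrt{110}}$
$85.5 - 4.7 \le \mu \le 85.5 + 4.7$
$80.8 \le \mu \le 90.2$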
t-distribution
The t Distribution
Developed by the British statistician William
Gosset
A family of distributions: a unique
distribution for each value of its parameter,
degrees of freedom (d.f.)
Symmetric, unimodal, mean = 0, flatter
than the Z distribution
t formula: $t = \dfrac{\bar{X} - \mu}{S / \sqrt{n}}$
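Degrees of freedom
The number of values we can choose freely.
Example: if n values must have a given mean, n - 1 of them can be chosen freely and the last one is then fixed, so df = n - 1.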
T-table
df     t.100    t.050    t.025    t.010    t.005
1      3.078    6.314    12.706   31.821   63.656
2      1.886    2.920    4.303    6.965    9.925
3      1.638    2.353    3.182    4.541    5.841
4      1.533    2.132    2.776    3.747    4.604
5      1.476    2.015    2.571    3.365    4.032
...
23     1.319    1.714    2.069    2.500    2.807
24     1.318    1.711    2.064    2.492    2.797
25     1.316    1.708    2.060    2.485    2.787
...
29     1.311    1.699    2.045    2.462    2.756
30     1.310    1.697    2.042    2.457    2.750
40     1.303    1.684    2.021    2.423    2.704
60     1.296    1.671    2.000    2.390    2.660
120    1.289    1.658    1.980    2.358    2.617
∞      1.282    1.645    1.960    2.327    2.576
$\bar{X} \pm t\dfrac{S}{\sqrt{n}}$
or
$\bar{X} - t\dfrac{S}{\sqrt{n}} \le \mu \le \bar{X} + t\dfrac{S}{\sqrt{n}}$
$df = n - 1$
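Example
$\bar{X} = 2.14$, $S = 1.29$, $n = 14$, $df = n - 1 = 13$
$\dfrac{\alpha}{2} = \dfrac{1 - .99}{2} = 0.005$
$t_{.005,13} = 3.012$
$\bar{X} - t\dfrac{S}{\sqrt{n}} \le \mu \le \bar{X} + t\dfrac{S}{\sqrt{n}}$
$2.14 - 3.012\dfrac{1.29}{\sqrt{14}} \le \mu \le 2.14 + 3.012\dfrac{1.29}{\sqrt{14}}$
$2.14 - 1.04 \le \mu \le 2.14 + 1.04$
$1.10 \le \mu \le 3.18$
$\text{Prob}[1.10 \le \mu \le 3.18] = 0.99$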
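The small-sample interval in this example can be reproduced as follows. The Python standard library has no t distribution, so this sketch takes the critical value 3.012 (df = 13, upper-tail area .005) directly from the example; the function name `t_interval` is an assumption for illustration.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Interval for the mean when sigma is unknown: xbar +/- t * s / sqrt(n), with df = n - 1."""
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

T_005_DF13 = 3.012                       # table value for df = 13, alpha/2 = .005
lower, upper = t_interval(xbar=2.14, s=1.29, n=14, t_crit=T_005_DF13)
print(round(lower, 2), round(upper, 2))
```

Using the Z value 2.575 in place of the t value would give a narrower interval, which understates the uncertainty for a sample this small.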
Chi-Square distribution
Population Variance
Variance is an inverse measure of the group's
homogeneity.
Variance is an important indicator of total quality in
standardized products and services.
Managers improve processes to reduce variance.
Variance is a measure of financial risk. Variance of
rates of return helps managers assess financial and
capital investment alternatives.
Variability is a reality in global markets. Productivity,
wages, and costs of living vary between regions and
nations.
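$S^2 = \dfrac{\sum (X - \bar{X})^2}{n - 1}$
$\chi^2 = \dfrac{(n - 1)S^2}{\sigma^2}$
degrees of freedom = n - 1
$\dfrac{(n - 1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \dfrac{(n - 1)S^2}{\chi^2_{1 - \alpha/2}}$
$df = n - 1$
$\alpha = 1 - \text{level of confidence}$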
Selected $\chi^2$ Distributions
[Figure: $\chi^2$ density curves for df = 3, df = 5, and df = 10]
$\chi^2$ Table (column heading = area to the right of the value)
df     0.975         0.950         0.100       0.050       0.025
1      9.82068E-04   3.93219E-03   2.70554     3.84146     5.02390
2      0.0506357     0.102586      4.60518     5.99148     7.37778
3      0.2157949     0.351846      6.25139     7.81472     9.34840
4      0.484419      0.710724      7.77943     9.48773     11.14326
5      0.831209      1.145477      9.23635     11.07048    12.83249
6      1.237342      1.63538       10.6446     12.5916     14.4494
7      1.689864      2.16735       12.0170     14.0671     16.0128
8      2.179725      2.73263       13.3616     15.5073     17.5345
9      2.700389      3.32512       14.6837     16.9190     19.0228
10     3.24696       3.94030       15.9872     18.3070     20.4832
...
20     9.59077       10.8508       28.4120     31.4104     34.1696
21     10.28291      11.5913       29.6151     32.6706     35.4789
22     10.9823       12.3380       30.8133     33.9245     36.7807
23     11.6885       13.0905       32.0069     35.1725     38.0756
24     12.4011       13.8484       33.1962     36.4150     39.3641
25     13.1197       14.6114       34.3816     37.6525     40.6465
...
70     48.7575       51.7393       85.5270     90.5313     95.0231
80     57.1532       60.3915       96.5782     101.8795    106.6285
90     65.6466       69.1260       107.5650    113.1452    118.1359
100    74.2219       77.9294       118.4980    124.3421    129.5613
[Figure: $\chi^2$ distribution with df = 5; an area of 0.10 lies to the right of 9.23635]
With df = 5 and $\alpha$ = 0.10, $\chi^2$ = 9.23635
[Figure: $\chi^2$ distribution with df = 7; the table's 0.950 and 0.050 columns give $\chi^2_{.95} = 2.16735$ and $\chi^2_{.05} = 14.0671$, leaving an area of .05 in each tail]
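$\alpha = .10$, $\dfrac{\alpha}{2} = .05$, $1 - \dfrac{\alpha}{2} = .95$
$S^2 = .0022125$, $n = 8$, $df = n - 1 = 7$
$\chi^2_{.05} = 14.0671$, $\chi^2_{.95} = 2.16735$
$\dfrac{(n - 1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \dfrac{(n - 1)S^2}{\chi^2_{1 - \alpha/2}}$
$\dfrac{(8 - 1)(.0022125)}{14.0671} \le \sigma^2 \le \dfrac{(8 - 1)(.0022125)}{2.16735}$
$.001101 \le \sigma^2 \le .007146$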
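$S^2 = 1.2544$, $n = 25$, $df = n - 1 = 24$, $\alpha = .05$
$\dfrac{\alpha}{2} = \dfrac{.05}{2} = .025$, $1 - \dfrac{\alpha}{2} = 1 - \dfrac{.05}{2} = .975$
$\chi^2_{.025} = 39.3641$, $\chi^2_{.975} = 12.4011$
$\dfrac{(n - 1)S^2}{\chi^2_{.025}} \le \sigma^2 \le \dfrac{(n - 1)S^2}{\chi^2_{.975}}$
$\dfrac{(25 - 1)(1.2544)}{39.3641} \le \sigma^2 \le \dfrac{(25 - 1)(1.2544)}{12.4011}$
$0.7648 \le \sigma^2 \le 2.4277$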
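Both variance intervals can be checked with one small helper. The chi-square critical values are taken from the table rather than computed, since the Python standard library has no chi-square distribution; the function and parameter names are assumptions for illustration.

```python
def variance_interval(s2, n, chi2_right_small, chi2_right_large):
    """Interval for the population variance.

    chi2_right_small: table value with area alpha/2 to its right (the larger number).
    chi2_right_large: table value with area 1 - alpha/2 to its right (the smaller number).
    """
    df = n - 1
    return df * s2 / chi2_right_small, df * s2 / chi2_right_large

# 90% interval: S^2 = .0022125, n = 8, df = 7
low90, high90 = variance_interval(0.0022125, 8, 14.0671, 2.16735)
# 95% interval: S^2 = 1.2544, n = 25, df = 24
low95, high95 = variance_interval(1.2544, 25, 39.3641, 12.4011)

print(round(low90, 6), round(high90, 6))
print(round(low95, 4), round(high95, 4))
```

The interval is not symmetric around $S^2$, because the chi-square distribution is skewed to the right.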
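Error of Estimation
(tolerable error)
$E = \bar{X} - \mu$
Estimated Sample
Size
$n = \dfrac{Z_{\alpha/2}^2 \sigma^2}{E^2} = \left(\dfrac{Z_{\alpha/2}\,\sigma}{E}\right)^2$
Estimated $\sigma$
$\sigma \approx \dfrac{1}{4}\,\text{range}$
Example: $E = 1$, $\sigma = 4$, $Z = 1.645$
$n = \dfrac{Z^2 \sigma^2}{E^2} = \dfrac{(1.645)^2 (4)^2}{1^2} = 43.30$ or 44
Example: $E = 2$, $\sigma = 6.25$, $Z = 1.96$
$n = \dfrac{Z^2 \sigma^2}{E^2} = \dfrac{(1.96)^2 (6.25)^2}{2^2} = 37.52$ or 38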
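The two sample-size calculations for estimating a mean can be sketched as below. The function name `sample_size_for_mean` is an assumption for illustration; the result is rounded up, because a fractional observation cannot be collected.

```python
from math import ceil

def sample_size_for_mean(z, sigma, error):
    """Unrounded sample size n = (z * sigma / E)^2 for estimating a mean."""
    return (z * sigma / error) ** 2

n1 = sample_size_for_mean(z=1.645, sigma=4, error=1)
n2 = sample_size_for_mean(z=1.96, sigma=6.25, error=2)
print(round(n1, 2), ceil(n1))
print(round(n2, 2), ceil(n2))
```

Halving the tolerable error E quadruples the required sample size, since E enters the formula squared.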
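$Z = \dfrac{\hat{p} - P}{\sqrt{\dfrac{PQ}{n}}}$
$E = \hat{p} - P$
Estimated Sample
Size
$n = \dfrac{Z^2 PQ}{E^2}$
Example: $P = 0.40$, $Q = 0.60$, $Z = 2.33$, $E = .03$
$n = \dfrac{Z^2 PQ}{E^2} = \dfrac{(2.33)^2 (0.40)(0.60)}{(.03)^2} = 1{,}447.7$ or 1,448
Example: $P = 0.50$, $Q = 0.50$, $Z = 1.645$, $E = .05$
$n = \dfrac{Z^2 PQ}{E^2} = \dfrac{(1.645)^2 (0.50)(0.50)}{(.05)^2} = 270.6$ or 271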
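The proportion sample-size formula can be checked the same way. This sketch assumes a tolerable error of E = .03 in the first example, which is the value consistent with the printed answer of 1,447.7; the function name is an assumption for illustration.

```python
from math import ceil

def sample_size_for_proportion(z, p, error):
    """Unrounded sample size n = z^2 * P * Q / E^2 for estimating a proportion."""
    q = 1 - p
    return z ** 2 * p * q / error ** 2

n1 = sample_size_for_proportion(z=2.33, p=0.40, error=0.03)   # E = .03 is assumed, see above
n2 = sample_size_for_proportion(z=1.645, p=0.50, error=0.05)
print(round(n1, 1), ceil(n1))
print(round(n2, 1), ceil(n2))
```

The product PQ is largest at P = 0.50, so using 0.50 when no prior estimate of P is available gives the most conservative (largest) sample size.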