
Inference, confidence intervals and sample size determination

Applied Marketing (Market Research Methods) Topic 6: Inference, confidence intervals and sample size determination
Dr James Abdey

Overview
Choosing a sample size
Estimation
Sampling distribution of $\bar{X}$
Sampling distribution properties
Sample size and sampling fraction
Central Limit Theorem
Principle of confidence intervals
Construction: CI for $\bar{X}$, variance known
Construction: CI for $\bar{X}$, variance unknown
Choosing sample size
Adjusting the statistically determined sample size
Adjusting for non-response

Overview
Here we consider sample size determination in simple random sampling.
Properties of the sampling distribution are discussed.
We describe the required adjustments to statistically determined sample sizes to account for incidence and completion rates.
Non-response issues in sampling are also covered, with ways of improving response rates.


Choosing a sample size


The question "How big a sample do I need to take?" is a common one when sampling data.
The answer to this depends on the quality of inference that the researcher requires from the data.
In the estimation context this can be expressed in terms of the accuracy of estimation.
If the researcher requires that there should be a 95% chance that the estimation error is no bigger than d units (we refer to d as the tolerance), then this is equivalent to having a 95% confidence interval of width 2d.
Note that d represents the half-width of the confidence interval, since the point estimate is, by construction, at the centre of the confidence interval.


Simple random sampling (SRS)


Recall a simple random sample is a sample selected by a process such that every possible sample (of the same size, n) has the same probability of selection.
The selection process is left to chance, thus eliminating the effect of selection bias.
Due to the random selection mechanism, we do not know (in advance) which sample will occur.
Every population element has a known, non-zero probability of selection in the sample, but no element is certain to appear.


Simple random sampling (SRS) Example


Consider a population of size N = 6 elements: A, B, C, D, E and F.
We consider all possible samples of size n = 2 (without replacement).
There are 15 different, but equally likely, such samples:
AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, EF
Since this is SRS, each sample has a probability of selection of 1/15.
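The enumeration above can be reproduced with a short sketch. It assumes samples are unordered and drawn without replacement, as on the slide; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations

# Population labels from the slide (N = 6).
population = ["A", "B", "C", "D", "E", "F"]
n = 2  # sample size, drawn without replacement

# Every unordered sample of size n; under SRS each one is equally likely.
samples = ["".join(pair) for pair in combinations(population, n)]
selection_probability = Fraction(1, len(samples))

print(len(samples))           # number of possible samples
print(samples)                # the samples themselves
print(selection_probability)  # probability of selecting any one sample
```

The count is the binomial coefficient "6 choose 2", which is why the probability of each sample is 1/15.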


Estimation
A population has particular characteristics of interest, such as the mean, variance etc.
Collectively we refer to these characteristics as parameters.
If we do not have population data, the parameter values will be unknown.
Statistical inference is the process of estimating the (unknown) parameter values using the (known) sample data.
We use a statistic (estimator) calculated from sample observations to provide a point estimate.


Estimation Example
Returning to our example, recall there are 15 different samples of size 2 from a population of size 6.
Suppose the variable of interest is income:

Individual          A  B  C  D  E  F
Income (in 000s)    3  6  4  9  7  7

If we seek the population mean, $\mu$, we will use the sample mean, $\bar{X}$, as our estimator:
$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$

For example, if the observed sample was AB, the sample mean is (3000 + 6000)/2 = 4,500.

Estimation Example
Clearly, different observed samples will lead to different sample means.
Consider $\bar{X}$ for all possible samples (in 000s):

Sample  Values  $\bar{X}$
AB      3, 6    4.5
AC      3, 4    3.5
AD      3, 9    6
AE      3, 7    5
AF      3, 7    5
BC      6, 4    5
BD      6, 9    7.5
BE      6, 7    6.5
BF      6, 7    6.5
CD      4, 9    6.5
CE      4, 7    5.5
CF      4, 7    5.5
DE      9, 7    8
DF      9, 7    8
EF      7, 7    7


So $\bar{X}$ values vary from 3.5 to 8, depending on the sample values.
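As a check on the sample means for this example, here is a minimal sketch that recomputes them from the income data. The dictionary and function names are hypothetical, chosen only for illustration.

```python
from itertools import combinations

# Incomes (in 000s) from the slide; the dictionary name is illustrative.
income = {"A": 3, "B": 6, "C": 4, "D": 9, "E": 7, "F": 7}

def sample_mean(sample):
    """Sample mean of the incomes of the individuals in `sample`."""
    return sum(income[person] for person in sample) / len(sample)

# One sample mean per possible sample of size 2.
sample_means = {"".join(s): sample_mean(s) for s in combinations(income, 2)}

for label, mean in sample_means.items():
    print(label, mean)
print(min(sample_means.values()), max(sample_means.values()))
```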

Sampling distribution of $\bar{X}$
The previous slide showed all possible values of the estimator $\bar{X}$.
Since we have the population data here, we can actually compute the population mean (in 000s):
$\mu = \frac{1}{N} \sum_{i=1}^{N} X_i = \frac{3 + 6 + 4 + 9 + 7 + 7}{6} = 6$


So even with SRS, we obtain some $\bar{X}$ values far from $\mu$.
Here only one sample (AD) results in $\bar{X} = \mu$.

Sampling distribution of $\bar{X}$
Let's now consider the maximum $|\bar{X} - \mu|$:

max $|\bar{X} - \mu|$   Number of samples   Probability
0                       1                   0.067
0.5                     6                   0.400
1                       10                  0.667
1.5                     12                  0.800
2                       14                  0.933
2.5                     15                  1.000

So, for example, there is an 80% chance of being within 1.5 units of $\mu$.
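The cumulative probabilities for the estimation error can be recomputed directly. This is a minimal sketch using exact fractions; `prob_within` is a hypothetical helper name, not something from the slides.

```python
from fractions import Fraction
from itertools import combinations

incomes = [3, 6, 4, 9, 7, 7]               # population values (in 000s)
mu = Fraction(sum(incomes), len(incomes))  # population mean = 6

# Absolute estimation error |x_bar - mu| for each of the 15 samples.
errors = [abs(Fraction(a + b, 2) - mu) for a, b in combinations(incomes, 2)]

def prob_within(d):
    """P(|X_bar - mu| <= d) under SRS, where all samples are equally likely."""
    return Fraction(sum(e <= d for e in errors), len(errors))

for d in [0, Fraction(1, 2), 1, Fraction(3, 2), 2, Fraction(5, 2)]:
    print(float(d), float(prob_within(d)))
```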

Sampling distribution of $\bar{X}$
We now represent this as a frequency distribution.
That is, we record the frequency of each possible value of $\bar{X}$:

$\bar{X}$   Frequency   Relative frequency
3.5         1           1/15 = 0.067
4.5         1           1/15 = 0.067
5.0         3           3/15 = 0.200
5.5         2           2/15 = 0.133
6.0         1           1/15 = 0.067
6.5         3           3/15 = 0.200
7.0         1           1/15 = 0.067
7.5         1           1/15 = 0.067
8.0         2           2/15 = 0.133

This is known as the sampling distribution of $\bar{X}$.


Sampling distribution of $\bar{X}$
The sampling distribution is a central and vital concept in statistics.
It can be used to evaluate how good an estimator is.
Specifically, we care about how close the estimator is to the population parameter of interest.
As we have seen, different samples yield different $\bar{X}$ values, as a consequence of the random sampling procedure.
Hence estimators (of which $\bar{X}$ is an example) are random variables.
So, $\bar{X}$ is our estimator of $\mu$.
The observed value of $\bar{X}$ is a point estimate.


Sampling distribution properties

Like any distribution, we care about a sampling distribution's mean and variance.
Together, these let us assess how good an estimator is.
First, consider the mean: we seek an estimator which does not mislead us systematically.
So the average (mean) value of an estimator, over all possible samples, should be equal to the population parameter.

Sampling distribution properties


Returning to our example:

$\bar{X}$   Frequency   Product
3.5         1           3.5
4.5         1           4.5
5.0         3           15.0
5.5         2           11.0
6.0         1           6.0
6.5         3           19.5
7.0         1           7.0
7.5         1           7.5
8.0         2           16.0
Total       15          90.0

Hence the mean of this sampling distribution is 90/15 = 6.
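The frequency-weighted mean can be verified with a small sketch. It rebuilds the frequency table with `collections.Counter` and uses exact fractions to avoid rounding; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

incomes = [3, 6, 4, 9, 7, 7]  # population values (in 000s)

# Frequency of each possible sample mean over all 15 samples of size 2.
frequency = Counter(Fraction(a + b, 2) for a, b in combinations(incomes, 2))

# Sum of (value x frequency), divided by the number of samples.
total_samples = sum(frequency.values())
expected_value = sum(x * f for x, f in frequency.items()) / total_samples
population_mean = Fraction(sum(incomes), len(incomes))

print(expected_value, population_mean)
```

Both quantities come out equal here, which is the unbiasedness property in this small example.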

Sampling distribution properties


An important difference between a sampling distribution and other distributions is that the values in a sampling distribution are summary measures of whole samples (i.e. statistics/estimators) rather than individual observations.
Formally, the mean of a sampling distribution is called the expected value of the estimator, denoted by $E[\cdot]$.
Hence the expected value of the sample mean is $E[\bar{X}]$.
An unbiased estimator has its expected value equal to the parameter being estimated.
For our example, $E[\bar{X}] = 6 = \mu$.


Sampling distribution properties

Fortunately the sample mean $\bar{X}$ is always an unbiased estimator of $\mu$ in SRS, regardless of:
  the sample size, n
  the distribution of the (parent) population

This is a good illustration of a population parameter, $\mu$, being estimated by its sample counterpart, $\bar{X}$.

Sampling distribution properties


The unbiasedness of an estimator is clearly desirable; however, we also need to take into account the dispersion of the estimator's sampling distribution.
Ideally, the possible values of the estimator should not vary much around the true parameter value.
So, we seek an estimator with a small variance.
Recall the variance is defined to be the mean of the squared deviations about the mean of the distribution.
In the case of sampling distributions, it is referred to as the sampling variance.


Sampling distribution properties


Returning to our example:

$\bar{X}$   $\bar{X} - \mu$   $(\bar{X} - \mu)^2$   Frequency   Product
3.5         -2.5              6.25                  1           6.25
4.5         -1.5              2.25                  1           2.25
5.0         -1.0              1.00                  3           3.00
5.5         -0.5              0.25                  2           0.50
6.0          0.0              0.00                  1           0.00
6.5          0.5              0.25                  3           0.75
7.0          1.0              1.00                  1           1.00
7.5          1.5              2.25                  1           2.25
8.0          2.0              4.00                  2           8.00
Total                                               15          24.00

Hence the sampling variance is 24/15 = 1.6.
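A minimal sketch that recomputes the sampling variance from the 15 sample means, as the mean of squared deviations about the expected value. Exact fractions are used so the result is 8/5 with no rounding; the names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

incomes = [3, 6, 4, 9, 7, 7]  # population values (in 000s)

# All 15 possible sample means for n = 2.
means = [Fraction(a + b, 2) for a, b in combinations(incomes, 2)]

expected_value = sum(means) / len(means)
# Mean of the squared deviations of the sample means about their mean.
sampling_variance = sum((m - expected_value) ** 2 for m in means) / len(means)

print(float(sampling_variance))
```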

Sampling distribution properties

The population itself has a variance: the population variance, $\sigma^2$.

X   $X - \mu$   $(X - \mu)^2$   Frequency   Product
3   -3          9               1           9
6    0          0               1           0
4   -2          4               1           4
9    3          9               1           9
7    1          1               2           2

Hence the population variance is $\sigma^2 = 24/6 = 4$.

Sampling distribution properties


We now consider the relationship between $\sigma^2$ and the sampling variance.
Intuitively, a larger $\sigma^2$ should lead to a larger sampling variance (why?).
For population size N and sample size n,
$\mathrm{Var}(\bar{X}) = \frac{N - n}{N - 1} \times \frac{\sigma^2}{n}$

So for our example,
$\mathrm{Var}(\bar{X}) = \frac{6 - 2}{6 - 1} \times \frac{4}{2} = 1.6$

We use the term standard error to refer to the standard deviation of the sampling distribution,
$\mathrm{S.E.}(\bar{X}) = \sqrt{\mathrm{Var}(\bar{X})} = \sqrt{\frac{N - n}{N - 1} \times \frac{\sigma^2}{n}} = \sigma_{\bar{X}}$
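The formula can be checked against the example with a short sketch. The function name `sampling_variance` is hypothetical; the population variance uses the divisor N, as on the earlier slide.

```python
import math
from fractions import Fraction

def sampling_variance(sigma2, N, n):
    """Var(X_bar) under SRS without replacement: (N - n)/(N - 1) * sigma^2 / n."""
    return Fraction(N - n, N - 1) * Fraction(sigma2) / n

incomes = [3, 6, 4, 9, 7, 7]  # population values (in 000s)
N = len(incomes)
mu = Fraction(sum(incomes), N)
sigma2 = sum((x - mu) ** 2 for x in incomes) / N  # population variance

var_xbar = sampling_variance(sigma2, N, n=2)
standard_error = math.sqrt(var_xbar)  # standard deviation of X_bar

print(float(sigma2), float(var_xbar), round(standard_error, 4))
```

The result agrees with the value obtained by enumerating all 15 samples, so the formula and the brute-force calculation match in this example.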


Sampling distribution properties

Implications:
  as the sample size, n, increases, the sampling variance decreases, i.e. the precision increases
  provided the sampling fraction, n/N, is small, the term $\frac{N - n}{N - 1} \approx 1$, so it can be ignored
  the precision then depends effectively on n only

Note: although greater precision is desirable, data collection costs will rise with n (remember why we sample in the first place!).

Sample size and sampling fraction


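The larger the sample, the less variability there will be between samples.
Frequency of each possible value of $\bar{X}$ for n = 2 and n = 4 (a dash means the value cannot occur):

$\bar{X}$   n = 2   n = 4
3.50        1       -
4.50        1       -
5.00        3       2
5.25        -       1
5.50        2       1
5.75        -       3
6.00        1       1
6.25        -       2
6.50        3       3
6.75        -       1
7.00        1       -
7.25        -       1
7.50        1       -
8.00        2       -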


Sample size and sampling fraction


There is a striking improvement in the precision of the estimator.
The variability has decreased considerably.
The range of possible $\bar{X}$ values goes from 3.5 to 8.0 down to 5.0 to 7.25.
The sampling variance is reduced from 1.6 to 0.4.
Note that precision in statistics refers to the inverse of the sampling variance.
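The comparison between n = 2 and n = 4 can be reproduced by enumerating every sample of each size. This is a sketch only; `sampling_distribution_summary` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import combinations

incomes = [3, 6, 4, 9, 7, 7]  # population values (in 000s)

def sampling_distribution_summary(n):
    """Return (smallest mean, largest mean, sampling variance) for sample size n."""
    means = [Fraction(sum(s), n) for s in combinations(incomes, n)]
    centre = sum(means) / len(means)
    variance = sum((m - centre) ** 2 for m in means) / len(means)
    return min(means), max(means), variance

for n in (2, 4):
    low, high, variance = sampling_distribution_summary(n)
    print(n, float(low), float(high), float(variance))
```

There are 15 possible samples for both sizes, since choosing 4 elements from 6 is the same as choosing the 2 to leave out.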


Sample size and sampling fraction


The factor $\frac{N - n}{N - 1}$ decreases steadily as $n \to N$.
When n = 1 the factor equals 1, and when n = N it equals 0.
When sampling without replacement, increasing n must increase precision since less of the population is left out.
In much practical sampling N is very large (e.g. several million), while n is comparatively small (e.g. at most 1,000, say).
In such cases the factor $\frac{N - n}{N - 1}$ is so close to 1 that its effect becomes negligible, hence
$\mathrm{Var}(\bar{X}) = \frac{N - n}{N - 1} \times \frac{\sigma^2}{n} \approx \frac{\sigma^2}{n}$ for small n/N

Sample size and sampling fraction


n/N is called the sampling fraction.
When N is large, it is the sample size n which is important in determining precision, not the sampling fraction.
Consider two populations: $N_1$ = 3 million and $N_2$ = 200 million, both with the same variance $\sigma^2$.
We sample $n_1 = n_2 = 1,000$ from each population, then
$\sigma^2_{\bar{X}_1} = \frac{N_1 - n_1}{N_1 - 1} \times \frac{\sigma^2}{n_1} = (0.999667) \times \frac{\sigma^2}{1000}$
$\sigma^2_{\bar{X}_2} = \frac{N_2 - n_2}{N_2 - 1} \times \frac{\sigma^2}{n_2} = (0.999995) \times \frac{\sigma^2}{1000}$

So $\sigma^2_{\bar{X}_1} \approx \sigma^2_{\bar{X}_2}$, despite $N_1 \ll N_2$.
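The two correction factors can be recomputed in a couple of lines. The function name below is hypothetical; it simply evaluates (N - n)/(N - 1).

```python
def finite_population_factor(N, n):
    """The factor (N - n)/(N - 1) that multiplies sigma^2 / n under SRS."""
    return (N - n) / (N - 1)

factor_1 = finite_population_factor(3_000_000, 1_000)    # N1 = 3 million
factor_2 = finite_population_factor(200_000_000, 1_000)  # N2 = 200 million

print(round(factor_1, 6), round(factor_2, 6))
```

Both factors are within 0.04% of 1, so the two sampling variances are practically identical even though the sampling fractions differ by a factor of roughly 67.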

Central Limit Theorem


When sampling from (almost) any non-normal distribution, for sufficiently large n, $\bar{X}$:
  1. is approximately normally distributed
  2. has mean $\mu$
  3. has variance $\sigma^2/n$ and standard error $\sigma/\sqrt{n}$

The approximation is reasonable for n at least 30, as a rule-of-thumb.
Though because this is an asymptotic approximation (i.e. as $n \to \infty$), the bigger n is, the better the normal approximation.
Special case: if the population distribution is itself Normal, $\bar{X}$ will have an exact Normal distribution for any sample size n.
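A small simulation can illustrate these three properties. The sketch below makes assumptions that are not in the slides: it draws independent observations from an exponential population with mean 1 and variance 1 (a strongly skewed, effectively infinite population), uses n = 30, and fixes the random seed so the run is reproducible.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

n = 30                 # sample size (the rule-of-thumb threshold)
replications = 20_000  # number of simulated samples

# Assumed population: exponential with rate 1, so mu = 1 and sigma^2 = 1.
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(replications)
]

mean_of_means = statistics.fmean(sample_means)          # should be near mu = 1
variance_of_means = statistics.pvariance(sample_means)  # should be near 1/n

# Share of sample means within 1.96 standard errors of mu (about 95% if
# the normal approximation is reasonable).
standard_error = math.sqrt(1.0 / n)
coverage = sum(abs(m - 1.0) <= 1.96 * standard_error for m in sample_means) / replications

print(mean_of_means, variance_of_means, coverage)
```

The exact numbers depend on the seed, so only approximate agreement with the theoretical values should be expected.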


Central Limit Theorem


Below is the sampling distribution of $\bar{X}$ for small (red) and large (black) n.
As n increases, the sampling variability of $\bar{X}$ decreases.
[Figure: "Sampling Distribution of Sample Mean". Horizontal axis: sample mean; vertical axis: density.]

Central Limit Theorem

Although the shape of the population distribution does not affect the generality of the CLT result, it does affect the speed of convergence of the sampling distribution of $\bar{X}$ to the Normal distribution.
A symmetric population distribution leads to faster convergence as n increases.
In practice, n = 30 is usually adequate to make the Normal approximation reasonable.

Central Limit Theorem

Remember the CLT is based on SRS.
Without probability sampling methods, there is absolutely no basis for the use of the CLT.
This is principally why we insist on probability (random) sampling.
Otherwise the whole structure of statistical inference collapses!

Central Limit Theorem

The CLT also makes the use of the variance more reasonable.
The Normal distribution is completely characterised by its mean and variance.
Hence it is sensible to focus attention on these two characteristics of the sampling distribution.

Principles of confidence intervals

A point estimate is our best guess of an unknown population parameter based on sample data.
But as it's based on a sample, there is some uncertainty/imprecision.
Confidence intervals (CIs) communicate the level of imprecision.

Principles of confidence intervals

Formally, an x% confidence interval covers the unknown parameter with x% probability over repeated samples.
The shorter the confidence interval, the more reliable the estimate.
As we shall see, this is achievable by:
  reducing the level of confidence
  increasing the sample size

We now look at how to construct CIs.

Principles of confidence intervals

The general format (for our purposes) for a confidence interval is
statistic $\pm$ (multiplier coefficient) $\times$ standard error

Alternatively,
estimate $\pm$ margin of error

CI for $\mu$ (variance known)

The point estimate for $\mu$ is calculated using
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$

Assuming the (population) variance $\sigma^2$ is known, the standard error of $\bar{X}$ is
$\mathrm{S.E.}(\bar{X}) = \sigma_{\bar{X}} = \sqrt{\frac{N - n}{N - 1} \times \frac{\sigma^2}{n}} \approx \frac{\sigma}{\sqrt{n}}$

Hence a 95% confidence interval for $\mu$ is
$\bar{X} \pm 1.96 \times \frac{\sigma}{\sqrt{n}} = \left( \bar{X} - 1.96 \times \frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96 \times \frac{\sigma}{\sqrt{n}} \right)$

CI for $\mu$ (variance known)


This is a simple, but important result, forming a useful template.
Note the above interval was for 95% confidence.
Other levels of confidence pose no problem, but require a different multiplier coefficient.
When the variance is known we obtain a multiplier from the standard normal distribution:
  For 90% confidence, use the multiplier 1.645
  For 95% confidence, use the multiplier 1.96 (approximately 2)
  For 99% confidence, use the multiplier 2.576

Hence a 99% confidence interval for $\mu$ is
$\bar{X} \pm 2.576 \times \frac{\sigma}{\sqrt{n}} = \left( \bar{X} - 2.576 \times \frac{\sigma}{\sqrt{n}},\ \bar{X} + 2.576 \times \frac{\sigma}{\sqrt{n}} \right)$
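The multipliers and the interval template can be sketched in Python using the standard library's `statistics.NormalDist`. The function names and the numbers in the usage line (a sample mean of 50, sigma of 10, n of 100) are hypothetical and not from the slides; the finite population factor is ignored, as in the approximation above.

```python
import math
from statistics import NormalDist

def z_multiplier(confidence):
    """Two-sided standard normal multiplier, e.g. 0.95 gives about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def ci_known_variance(x_bar, sigma, n, confidence=0.95):
    """x_bar +/- z * sigma / sqrt(n), ignoring the finite population factor."""
    margin = z_multiplier(confidence) * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

for level in (0.90, 0.95, 0.99):
    print(level, round(z_multiplier(level), 3))

# Hypothetical numbers, not from the slides: x_bar = 50, sigma = 10, n = 100.
print(ci_known_variance(50, 10, 100, confidence=0.95))
```

Raising the confidence level in this sketch widens the interval, which is the trade-off between level of confidence and interval width.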


CI for $\mu$ (variance known)


So we see that a higher level of confidence (a good thing) leads to a larger multiplier coefficient, and hence a wider confidence interval (a bad thing).
Hence, other things equal, we face a trade-off between level of confidence and width of confidence interval.
Since the width of a CI is part-determined by the standard error, by increasing n (costly) we will reduce the standard error, hence shorten the CI (a good thing).


CI for $\mu$ (variance unknown)

Unfortunately, to use the approach just discussed requires knowledge of the population variance, $\sigma^2$.
This is because it is used in the standard error: $\bar{X} \pm z \times \frac{\sigma}{\sqrt{n}}$
In practice, we are unlikely to know $\sigma^2$.
After all, it's a population characteristic, and so if we do not know $\mu$, why would we know $\sigma^2$?

CI for $\mu$ (variance unknown)


Recall the sampling variance of $\bar{X}$ is
$\mathrm{Var}(\bar{X}) = \sigma^2_{\bar{X}} = \frac{N - n}{N - 1} \times \frac{\sigma^2}{n} \approx \frac{\sigma^2}{n}$
But if $\sigma^2$ is unknown we have a problem.
It is not that we are fundamentally interested in $\sigma^2$, only that we need to estimate it because the precision of $\bar{X}$ depends on it.
And there is little point having a point estimate if we know nothing about its precision.


CI for $\mu$ (variance unknown)

Our estimate of $\mathrm{Var}(\bar{X})$ is
$s^2_{\bar{X}} = \frac{N - n}{N} \times \frac{s^2}{n} \approx \frac{s^2}{n}$
where
$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n - 1} \left( \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right)$

Our estimate of the standard error is thus
$s_{\bar{x}} = \sqrt{\frac{N - n}{N} \times \frac{s^2}{n}} \approx \frac{s}{\sqrt{n}}$
since typically $\frac{N - n}{N} \approx 1$ in the social sciences.
Once we have estimated this, we proceed as before to construct a CI using the estimate of the standard error in place of the actual standard error.

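CI for $\mu$ (variance unknown)


So, for a 90% confidence interval we use $\bar{x} \pm 1.645 \times \frac{s}{\sqrt{n}}$
Similarly, for a 95% confidence interval we use $\bar{x} \pm 1.96 \times \frac{s}{\sqrt{n}}$
Finally, for a 99% confidence interval we use $\bar{x} \pm 2.576 \times \frac{s}{\sqrt{n}}$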
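A minimal sketch of the variance-unknown interval, using the sample standard deviation (divisor n - 1) in place of sigma. The function name and the data are hypothetical, made up purely for illustration, and the finite population factor is taken to be 1.

```python
import math
import statistics

def ci_unknown_variance(data, multiplier=1.96):
    """x_bar +/- multiplier * s / sqrt(n), with s estimated from the sample."""
    n = len(data)
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)        # sample standard deviation, n - 1 divisor
    estimated_se = s / math.sqrt(n)   # estimated standard error of the mean
    margin = multiplier * estimated_se
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 36 observations, not from the slides.
data = [4, 5, 6, 7, 8, 6] * 6

low, high = ci_unknown_variance(data)                       # 95% interval
low_99, high_99 = ci_unknown_variance(data, multiplier=2.576)  # 99% interval

print(round(low, 3), round(high, 3))
print(round(low_99, 3), round(high_99, 3))
```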


Choosing sample size


Note the trade-off between accuracy and data cost.

Solution: fix the desired precision and find the smallest $n$ which achieves this.

If we want the sample mean to be within a tolerance $d$ of $\mu$ with a specified probability, then

$$d = z \times \frac{\sigma}{\sqrt{n}} \quad\Longrightarrow\quad n = \frac{z^2\sigma^2}{d^2}$$

$n$ is the minimum sample size required to achieve the desired precision.

$n$ must be an integer, so always round up!

Choosing sample size: Example

A random sample is to be taken from a population with unknown mean $\mu$ and $\sigma = 3$. How big a sample size would be needed if there is to be a 95% chance of $\bar{X}$ being within 1 unit of $\mu$?

The sample size $n$ required for a tolerance of 1 satisfies

$$1 = 1.96 \times \frac{3}{\sqrt{n}} \quad\Longrightarrow\quad n = 34.57 \quad\Longrightarrow\quad n = 35$$


Note that the required sample size in this type of calculation needs to be rounded up from a decimal fraction, since rounding down would result in a value not quite large enough!
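The calculation in the example can be reproduced with a short Python sketch. Here `z` is the standard normal critical value for the chosen confidence level (1.96 for 95%); the function name `required_sample_size` is hypothetical.

```python
import math

def required_sample_size(sigma, tolerance, z=1.96):
    """Smallest integer n for which z * sigma / sqrt(n) <= tolerance."""
    if sigma <= 0 or tolerance <= 0:
        raise ValueError("sigma and tolerance must be positive")
    n_exact = (z * sigma / tolerance) ** 2
    # Always round up: rounding down would leave the tolerance unmet.
    return math.ceil(n_exact)

# The example above: sigma = 3, tolerance d = 1, 95% confidence.
n = required_sample_size(sigma=3, tolerance=1)
print(n)  # 35
```

Because $n$ depends on $1/d^2$, halving the tolerance roughly quadruples the required sample size.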


Adjusting the statistically determined sample size


Incidence rate refers to the rate of occurrence, or the percentage, of persons eligible to participate in the study.

In general, if there are $k$ qualifying factors with an incidence of $Q_1, Q_2, Q_3, \ldots, Q_k$, each expressed as a proportion:

$$\text{Incidence rate} = Q_1 \times Q_2 \times Q_3 \times \cdots \times Q_k$$

The completion rate is the percentage of qualified respondents who complete the interview, enabling researchers to account for anticipated refusals by people who qualify.

$$\text{Initial sample size} = \frac{\text{Final sample size}}{\text{Incidence rate} \times \text{Completion rate}}$$
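A minimal Python sketch of this adjustment is given below. The function name `initial_sample_size` and the rates in the usage example are hypothetical, and rounding the result up to a whole number of contacts is an assumption of this sketch rather than something stated above.

```python
import math

def initial_sample_size(final_sample_size, incidence_factors, completion_rate):
    """Number of initial contacts needed to end up with final_sample_size.

    incidence_factors: proportions Q1, ..., Qk for each qualifying factor.
    completion_rate: proportion of qualified respondents who complete.
    """
    incidence_rate = math.prod(incidence_factors)  # Q1 * Q2 * ... * Qk
    if not (0 < incidence_rate <= 1 and 0 < completion_rate <= 1):
        raise ValueError("rates must be proportions in (0, 1]")
    # Assumption of this sketch: round up so the target is not undershot.
    return math.ceil(final_sample_size / (incidence_rate * completion_rate))

# Hypothetical rates: two qualifying factors (60% and 75%), 80% completion.
contacts = initial_sample_size(35, [0.6, 0.75], 0.8)
print(contacts)  # 98
```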


Adjusting for non-response


Sub-sampling of non-respondents: the researcher contacts a sub-sample of the non-respondents, usually by means of telephone or personal interviews.

In replacement, the non-respondents in the current survey are replaced with non-respondents from an earlier, similar survey. The researcher attempts to contact these non-respondents from the earlier survey and administer the current survey questionnaire to them, possibly by offering a suitable incentive.


Adjusting for non-response


In substitution, the researcher substitutes for non-respondents other elements from the sampling frame that are expected to respond.

The sampling frame is divided into sub-groups that are internally homogeneous in terms of respondent characteristics but heterogeneous in terms of response rates.

These sub-groups are then used to identify substitutes who are similar to particular non-respondents but dissimilar to respondents already in the sample.


Adjusting for non-response


Subjective estimates: when it is no longer feasible to increase the response rate by sub-sampling, replacement, or substitution, it may be possible to arrive at subjective estimates of the nature and effect of non-response bias.

This involves evaluating the likely effects of non-response based on experience and available information.


Adjusting for non-response


Weighting attempts to account for non-response by assigning differential weights to the data depending on the response rates.

For example, in a survey the response rates were 85%, 70% and 40%, respectively, for the high-, medium- and low-income groups.

In analysing the data, these sub-groups are assigned weights inversely proportional to their response rates. That is, the weights assigned would be (100/85), (100/70) and (100/40), respectively, for the high-, medium- and low-income groups.
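The weighting example can be sketched in Python as below. The response rates are those quoted above; the function name `weighted_mean` and the individual responses are hypothetical and serve only to show how the weights shift an estimate.

```python
# Response rates quoted in the text, expressed as proportions.
response_rates = {"high": 0.85, "medium": 0.70, "low": 0.40}

# Weights inversely proportional to response rate: 100/85, 100/70, 100/40.
weights = {group: 1.0 / rate for group, rate in response_rates.items()}

def weighted_mean(respondents, group_weights):
    """respondents: list of (income_group, value) pairs."""
    total_weight = sum(group_weights[group] for group, _ in respondents)
    weighted_sum = sum(group_weights[group] * value for group, value in respondents)
    return weighted_sum / total_weight

# Hypothetical responses (income group, reported value), for illustration only.
respondents = [("high", 10.0), ("high", 12.0), ("medium", 8.0), ("low", 4.0)]

unweighted = sum(value for _, value in respondents) / len(respondents)
weighted = weighted_mean(respondents, weights)
print(unweighted, weighted)
```

In this made-up data the under-represented low-income group reports lower values, so up-weighting it pulls the estimate below the unweighted mean.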


Imputation


Imputation involves imputing, or assigning, the characteristic of interest to the non-respondents based on the similarity of the variables available for both non-respondents and respondents.

For example, a respondent who does not report brand usage may be imputed the usage of a respondent with similar demographic characteristics.
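One simple way to make "similar demographic characteristics" concrete is to copy the missing value from the nearest respondent who did answer. The Python sketch below does this with squared distance on unscaled variables, which is a simplifying assumption; it is one possible approach, not necessarily the procedure intended here. The function name `impute_from_nearest`, the field names and the records are all hypothetical.

```python
def impute_from_nearest(records, target_key, feature_keys):
    """Fill missing target values from the most similar complete record.

    Similarity is squared Euclidean distance on feature_keys (unscaled,
    a simplification). Returns new records; the input is not modified.
    """
    donors = [r for r in records if r[target_key] is not None]
    if not donors:
        raise ValueError("no complete records to impute from")
    completed = []
    for record in records:
        if record[target_key] is None:
            donor = min(
                donors,
                key=lambda d: sum((d[k] - record[k]) ** 2 for k in feature_keys),
            )
            record = {**record, target_key: donor[target_key]}
        completed.append(record)
    return completed

# Hypothetical survey records; None marks unreported brand usage.
records = [
    {"age": 25, "income": 30, "brand_usage": "A"},
    {"age": 52, "income": 75, "brand_usage": "B"},
    {"age": 27, "income": 32, "brand_usage": None},
]
completed = impute_from_nearest(records, "brand_usage", ["age", "income"])
print(completed[2]["brand_usage"])  # A
```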
