Jul 23, 2013

© Attribution Non-Commercial (BY-NC)

Applied Marketing (Market Research Methods) Topic 6: Inference, confidence intervals and sample size determination

Dr James Abdey

Sampling distribution of $\bar{X}$
Sampling distribution properties
Sample size and sampling fraction
Central Limit Theorem
Principle of confidence intervals
Construction: CI for X Variance Known
Choosing sample size
Adjusting the statistically determined sample size
Adjusting for non-response

Overview

Here we consider sample size determination in simple random sampling. Properties of the sampling distribution are discussed. We describe the required adjustments to statistically determined sample sizes to account for incidence and completion rates. Non-response issues in sampling are also covered, with ways of improving response rates.

The question "How big a sample do I need to take?" is a common one when sampling data. The answer depends on the quality of inference that the researcher requires from the data. In the estimation context this can be expressed in terms of the accuracy of estimation. If the researcher requires a 95% chance that the estimation error is no bigger than d units (we refer to d as the tolerance), then this is equivalent to having a 95% confidence interval of width 2d. Note that d represents the half-width of the confidence interval, since the point estimate is, by construction, at the centre of the confidence interval.

Recall that a simple random sample (SRS) is a sample selected by a process such that every possible sample (of the same size, n) has the same probability of selection. The selection process is left to chance, thus eliminating the effect of selection bias. Due to the random selection mechanism, we do not know (in advance) which sample will occur. Every population element has a known, non-zero probability of selection in the sample, but no element is certain to appear.

Consider a population of size N = 6 elements: A, B, C, D, E and F. We consider all possible samples of size n = 2 (without replacement). There are 15 different, but equally likely, such samples: AB, AC, AD, AE, AF, BC, BD, BE, BF, CD, CE, CF, DE, DF, EF. Since this is SRS, each sample has a probability of selection of 1/15.
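The enumeration of the 15 samples can be reproduced with a few lines of Python. This is only an illustrative sketch using the standard library; the variable names are assumptions and do not come from the slides.

```python
from itertools import combinations

# Population labels from the example (N = 6).
population = ["A", "B", "C", "D", "E", "F"]

# All samples of size n = 2 drawn without replacement (order ignored).
samples = ["".join(pair) for pair in combinations(population, 2)]

# Under SRS every possible sample is equally likely.
selection_probability = 1 / len(samples)

print(samples)                 # the 15 samples AB, AC, ..., EF
print(selection_probability)   # 1/15 as a decimal
```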

Estimation

A population has particular characteristics of interest such as the mean, variance etc. Collectively we refer to these characteristics as parameters. If we do not have population data, the parameter values will be unknown. Statistical inference is the process of estimating the (unknown) parameter values using the (known) sample data. We use a statistic (estimator) calculated from sample observations to provide a point estimate.

Estimation Example

Returning to our example, recall there are 15 different samples of size 2 from a population of size 6. Suppose the variable of interest is income:

| Individual | A | B | C | D | E | F |
|---|---|---|---|---|---|---|
| Income (in 000s) | 3 | 6 | 4 | 9 | 7 | 7 |

If we seek the population mean, $\mu$, we will use the sample mean, $\bar{X}$, as our estimator:

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

For example, if the observed sample was AB, the sample mean is (3,000 + 6,000)/2 = 4,500.

Estimation Example

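Clearly, different observed samples will lead to different sample means. Consider $\bar{X}$ for all possible samples (in 000s):

| Sample | Values | $\bar{X}$ | Sample | Values | $\bar{X}$ |
|---|---|---|---|---|---|
| AB | 3, 6 | 4.5 | BF | 6, 7 | 6.5 |
| AC | 3, 4 | 3.5 | CD | 4, 9 | 6.5 |
| AD | 3, 9 | 6.0 | CE | 4, 7 | 5.5 |
| AE | 3, 7 | 5.0 | CF | 4, 7 | 5.5 |
| AF | 3, 7 | 5.0 | DE | 9, 7 | 8.0 |
| BC | 6, 4 | 5.0 | DF | 9, 7 | 8.0 |
| BD | 6, 9 | 7.5 | EF | 7, 7 | 7.0 |
| BE | 6, 7 | 6.5 | | | |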
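As a cross-check on the table of sample means, the following sketch recomputes $\bar{X}$ for every possible sample. The dictionary `income` simply restates the example data; the names are illustrative assumptions.

```python
from itertools import combinations

# Incomes (in 000s) from the example.
income = {"A": 3, "B": 6, "C": 4, "D": 9, "E": 7, "F": 7}

# Sample mean for every possible sample of size 2.
sample_means = {
    a + b: (income[a] + income[b]) / 2
    for a, b in combinations(income, 2)
}

for label, x_bar in sample_means.items():
    print(label, x_bar)
```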

Sampling distribution of $\bar{X}$

The previous slide showed all possible values of the estimator $\bar{X}$. Since we have the population data here, we can actually compute the population mean (in 000s):

$$\mu = \frac{1}{N} \sum_{i=1}^{N} X_i = \frac{3 + 6 + 4 + 9 + 7 + 7}{6} = 6$$

So even with SRS, we obtain some $\bar{X}$ values far from $\mu$. Here only one sample (AD) results in $\bar{X} = \mu$.

Sampling distribution of $\bar{X}$

Let's now consider the maximum estimation error, $\lvert \bar{X} - \mu \rvert$:

| Maximum $\lvert \bar{X} - \mu \rvert$ | Number of samples |
|---|---|
| 0.0 | 1 |
| 0.5 | 6 |
| 1.0 | 10 |
| 1.5 | 12 |
| 2.0 | 14 |
| 2.5 | 15 |

So, for example, there is an 80% chance (12 of the 15 samples) of being within 1.5 units of $\mu$.
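The cumulative counts can be verified by brute force. The sketch below is an illustration under the example's data; the helper name `prob_within` is hypothetical.

```python
from fractions import Fraction
from itertools import combinations

incomes = [3, 6, 4, 9, 7, 7]                 # population values (in 000s)
mu = Fraction(sum(incomes), len(incomes))    # population mean = 6

# Absolute estimation error for each of the 15 samples of size 2.
errors = [abs(Fraction(a + b, 2) - mu) for a, b in combinations(incomes, 2)]

def prob_within(d):
    """P(|X-bar - mu| <= d) under SRS, where all samples are equally likely."""
    return Fraction(sum(e <= d for e in errors), len(errors))

for d in [0, 0.5, 1, 1.5, 2, 2.5]:
    print(d, float(prob_within(d)))
```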

Sampling distribution of $\bar{X}$

We now represent this as a frequency distribution. That is, we record the frequency of each possible value of $\bar{X}$:

| $\bar{X}$ | Frequency | Relative frequency |
|---|---|---|
| 3.5 | 1 | 1/15 = 0.067 |
| 4.5 | 1 | 1/15 = 0.067 |
| 5.0 | 3 | 3/15 = 0.200 |
| 5.5 | 2 | 2/15 = 0.133 |
| 6.0 | 1 | 1/15 = 0.067 |
| 6.5 | 3 | 3/15 = 0.200 |
| 7.0 | 1 | 1/15 = 0.067 |
| 7.5 | 1 | 1/15 = 0.067 |
| 8.0 | 2 | 2/15 = 0.133 |

This is known as the sampling distribution of $\bar{X}$.

Sampling distribution of $\bar{X}$

The sampling distribution is a central and vital concept in statistics. It can be used to evaluate how good an estimator is. Specifically, we care about how close the estimator is to the population parameter of interest. As we have seen, different samples yield different $\bar{X}$ values, as a consequence of the random sampling procedure. Hence estimators (of which $\bar{X}$ is an example) are random variables. So, $\bar{X}$ is our estimator of $\mu$. The observed value of $\bar{X}$, $\bar{x}$, is a point estimate.

Like any distribution, we care about a sampling distribution's mean and variance. Together, these let us assess how good an estimator is. First, consider the mean: we seek an estimator which does not mislead us systematically. So the average (mean) value of an estimator, over all possible samples, should be equal to the population parameter.

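Returning to our example:

| $\bar{X}$ | Frequency | Product |
|---|---|---|
| 3.5 | 1 | 3.5 |
| 4.5 | 1 | 4.5 |
| 5.0 | 3 | 15.0 |
| 5.5 | 2 | 11.0 |
| 6.0 | 1 | 6.0 |
| 6.5 | 3 | 19.5 |
| 7.0 | 1 | 7.0 |
| 7.5 | 1 | 7.5 |
| 8.0 | 2 | 16.0 |
| Total | 15 | 90.0 |

Dividing the total of the products by the number of samples gives the mean of the sampling distribution: 90.0/15 = 6.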

An important difference between a sampling distribution and other distributions is that the values in a sampling distribution are summary measures of whole samples (i.e. statistics/estimators) rather than individual observations. Formally, the mean of a sampling distribution is called the expected value of the estimator, denoted by $E[\cdot]$. Hence the expected value of the sample mean is $E[\bar{X}]$. An unbiased estimator has its expected value equal to the parameter being estimated. For our example, $E[\bar{X}] = 6 = \mu$.
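The unbiasedness claim for this example can be checked numerically by averaging $\bar{X}$ over all 15 equally likely samples. A minimal sketch, with illustrative variable names:

```python
from itertools import combinations
from statistics import mean

incomes = [3, 6, 4, 9, 7, 7]      # population values (in 000s)
mu = mean(incomes)                # population mean

# Expected value of the sample mean: the average of X-bar over all
# equally likely samples of size 2.
all_sample_means = [(a + b) / 2 for a, b in combinations(incomes, 2)]
expected_value = mean(all_sample_means)

print(mu, expected_value)         # both equal 6
```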

Fortunately the sample mean $\bar{X}$ is always an unbiased estimator of $\mu$ in SRS, regardless of:

- the sample size, n
- the distribution of the (parent) population

This is a good illustration of a population parameter, $\mu$, being estimated by its sample counterpart, $\bar{X}$.

The unbiasedness of an estimator is clearly desirable; however, we also need to take into account the dispersion of the estimator's sampling distribution. Ideally, the possible values of the estimator should not vary much around the true parameter value. So, we seek an estimator with a small variance. Recall the variance is defined to be the mean of the squared deviations about the mean of the distribution. In the case of sampling distributions, it is referred to as the sampling variance.

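Returning to our example:

| $\bar{X}$ | $\bar{X} - \mu$ | $(\bar{X} - \mu)^2$ | Frequency | Product |
|---|---|---|---|---|
| 3.5 | -2.5 | 6.25 | 1 | 6.25 |
| 4.5 | -1.5 | 2.25 | 1 | 2.25 |
| 5.0 | -1.0 | 1.00 | 3 | 3.00 |
| 5.5 | -0.5 | 0.25 | 2 | 0.50 |
| 6.0 | 0.0 | 0.00 | 1 | 0.00 |
| 6.5 | 0.5 | 0.25 | 3 | 0.75 |
| 7.0 | 1.0 | 1.00 | 1 | 1.00 |
| 7.5 | 1.5 | 2.25 | 1 | 2.25 |
| 8.0 | 2.0 | 4.00 | 2 | 8.00 |
| Total | | | 15 | 24.00 |

The sampling variance is therefore 24.00/15 = 1.6.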

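The population itself has a variance, the population variance $\sigma^2$:

| X | $X - \mu$ | $(X - \mu)^2$ | Frequency | Product |
|---|---|---|---|---|
| 3 | -3 | 9 | 1 | 9 |
| 6 | 0 | 0 | 1 | 0 |
| 4 | -2 | 4 | 1 | 4 |
| 9 | 3 | 9 | 1 | 9 |
| 7 | 1 | 1 | 2 | 2 |
| Total | | | 6 | 24 |

Hence $\sigma^2 = 24/6 = 4$.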

We now consider the relationship between $\sigma^2$ and the sampling variance. Intuitively, a larger $\sigma^2$ should lead to a larger sampling variance (why?). For population size N and sample size n,

$$\mathrm{Var}(\bar{X}) = \frac{N - n}{N - 1} \cdot \frac{\sigma^2}{n}$$

So for our example,

$$\mathrm{Var}(\bar{X}) = \frac{6 - 2}{6 - 1} \cdot \frac{4}{2} = 1.6$$

We use the term standard error to refer to the standard deviation of the sampling distribution:

$$\mathrm{S.E.}(\bar{X}) = \sqrt{\mathrm{Var}(\bar{X})} = \sqrt{\frac{N - n}{N - 1} \cdot \frac{\sigma^2}{n}} = \sigma_{\bar{X}}$$
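The formula can be compared against a direct enumeration of all samples. The sketch below assumes the example's six incomes and uses `statistics.pvariance` (the divide-by-N variance); variable names are illustrative.

```python
from itertools import combinations
from math import sqrt
from statistics import pvariance

incomes = [3, 6, 4, 9, 7, 7]      # population values (in 000s)
N, n = len(incomes), 2

sigma2 = pvariance(incomes)       # population variance (divides by N) = 4

# Sampling variance from the formula with the finite population factor.
var_formula = (N - n) / (N - 1) * sigma2 / n

# Sampling variance by brute force: variance of X-bar over all 15 samples.
sample_means = [sum(s) / n for s in combinations(incomes, n)]
var_enumerated = pvariance(sample_means)

standard_error = sqrt(var_formula)
print(var_formula, var_enumerated, standard_error)
```

Both routes give a sampling variance of 1.6, so the standard error is roughly 1.26 (in 000s).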

Implications:

- as the sample size, n, increases, the sampling variance decreases, i.e. the precision increases (see note)
- provided the sampling fraction, n/N, is small, the term $\frac{N - n}{N - 1} \approx 1$, so can be ignored
- the precision then depends effectively on n only

Note: although greater precision is desirable, data collection costs will rise with n (remember why we sample in the first place!).

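The larger the sample, the less variability there will be between samples. Frequencies of each possible $\bar{X}$ for n = 2 and n = 4:

| $\bar{X}$ | n = 2 | n = 4 |
|---|---|---|
| 3.50 | 1 | |
| 4.50 | 1 | |
| 5.00 | 3 | 2 |
| 5.25 | | 1 |
| 5.50 | 2 | 1 |
| 5.75 | | 3 |
| 6.00 | 1 | 1 |
| 6.25 | | 2 |
| 6.50 | 3 | 3 |
| 6.75 | | 1 |
| 7.00 | 1 | |
| 7.25 | | 1 |
| 7.50 | 1 | |
| 8.00 | 2 | |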

There is a striking improvement in the precision of the estimator. The variability has decreased considerably. The range of possible $\bar{X}$ values goes from 3.5 to 8.0 down to 5.0 to 7.25. The sampling variance is reduced from 1.6 to 0.4. Note that precision in statistics refers to the inverse of the sampling variance.
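The comparison between n = 2 and n = 4 can be reproduced by enumerating every sample of each size. This is a sketch under the example's data; the function name `sampling_distribution` is an illustrative assumption.

```python
from collections import Counter
from itertools import combinations
from statistics import pvariance

incomes = [3, 6, 4, 9, 7, 7]      # population values (in 000s)

def sampling_distribution(n):
    """Frequency of each possible sample mean for SRS samples of size n."""
    return Counter(sum(s) / n for s in combinations(incomes, n))

for n in (2, 4):
    dist = sampling_distribution(n)
    means = list(dist.elements())     # each mean repeated by its frequency
    print(n, min(means), max(means), pvariance(means))
```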

The factor $\frac{N - n}{N - 1}$ decreases steadily as $n \to N$. When n = 1 the factor equals 1, and when n = N it equals 0. When sampling without replacement, increasing n must increase precision since less of the population is left out. In much practical sampling N is very large (e.g. several million), while n is comparatively small (e.g. at most 1,000, say). In such cases the factor is so close to 1 that its effect becomes negligible, hence

$$\mathrm{Var}(\bar{X}) \approx \frac{\sigma^2}{n}$$

n/N is called the sampling fraction. When N is large, it is the sample size n which is important in determining precision, not the sampling fraction. Consider two populations: $N_1$ = 3 million and $N_2$ = 200 million, both with the same variance $\sigma^2$. We sample $n_1 = n_2 = 1{,}000$ from each population, then

$$\sigma^2_{\bar{X}_1} = \frac{N_1 - n_1}{N_1 - 1} \cdot \frac{\sigma^2}{n_1} \approx 0.999667 \cdot \frac{\sigma^2}{1000}$$

$$\sigma^2_{\bar{X}_2} = \frac{N_2 - n_2}{N_2 - 1} \cdot \frac{\sigma^2}{n_2} \approx 0.999995 \cdot \frac{\sigma^2}{1000}$$

So $\sigma^2_{\bar{X}_1} \approx \sigma^2_{\bar{X}_2}$, despite $N_1 \ll N_2$.
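To see how little the population size matters here, the factor $(N - n)/(N - 1)$ can be evaluated for both populations. A small sketch; the function name is hypothetical.

```python
def finite_population_factor(N, n):
    """The factor (N - n) / (N - 1) that multiplies sigma^2 / n."""
    return (N - n) / (N - 1)

n = 1_000
for N in (3_000_000, 200_000_000):
    # Sampling fraction n/N differs a lot; the factor barely moves.
    print(N, n / N, finite_population_factor(N, n))
```

The sampling fractions differ by a factor of about 67, yet both factors are within 0.04% of 1.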

Central Limit Theorem (CLT)

When sampling from (almost) any non-normal distribution, for sufficiently large n, $\bar{X}$:

1. is approximately normally distributed
2. has mean $\mu$
3. has variance $\sigma^2/n$ and standard error $\sigma/\sqrt{n}$

The approximation is reasonable for n at least 30, as a rule of thumb. Because this is an asymptotic approximation (i.e. as $n \to \infty$), the bigger n is, the better the normal approximation. Special case: if the population distribution is itself Normal, $\bar{X}$ will have an exact Normal distribution for any sample size n.
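A small simulation can illustrate the mean and variance statements of the theorem. The sketch below is an assumption-laden illustration, not part of the slides: it draws independent samples from an Exponential(1) population (mean 1, variance 1, strongly skewed), which corresponds to a very large population where the finite population factor is negligible.

```python
import random
from statistics import mean, pvariance

random.seed(0)  # fixed seed so the run is repeatable

def simulate_sample_means(n, reps=5000):
    """Means of `reps` independent samples of size n from an Exponential(1)
    population (mean 1, variance 1, strongly right-skewed)."""
    return [sum(random.expovariate(1.0) for _ in range(n)) / n
            for _ in range(reps)]

for n in (2, 30):
    means = simulate_sample_means(n)
    # CLT prediction: mean of X-bar near 1, variance of X-bar near 1/n.
    print(n, round(mean(means), 3), round(pvariance(means), 4), round(1 / n, 4))
```

Plotting a histogram of `means` for each n (not shown) would also display the shape moving towards the Normal curve as n grows.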

Below is the sampling distribution of $\bar{X}$ for small (red) and large (black) n. As n increases, the sampling variability of $\bar{X}$ decreases.

[Figure: Sampling Distribution of Sample Mean. Horizontal axis: sample mean; vertical axis: density.]

Although the shape of the population distribution does not affect the generality of the CLT result, it does affect the speed of convergence of the sampling distribution of $\bar{X}$ to the Normal distribution. Obviously a symmetric population distribution would converge faster in n. In practice, n = 30 is usually adequate to make the Normal approximation reasonable.

Remember the CLT is based on SRS. Without probability sampling methods, there is absolutely no basis for the use of the CLT. This is principally why we insist on probability (random) sampling. Otherwise the whole structure of statistical inference collapses!

The CLT also makes the use of the variance more reasonable. The Normal distribution is completely characterised by its mean and variance. Hence it is sensible to focus attention on these two characteristics of the sampling distribution.

A point estimate is our best guess of an unknown population parameter based on sample data. But as it is based on a sample, there is some uncertainty/imprecision. Confidence intervals (CIs) communicate the level of imprecision.

Formally, an x% confidence interval covers the unknown parameter with x% probability over repeated samples. The shorter the confidence interval, the more reliable the estimate. As we shall see, this is achievable by:

- reducing the level of confidence
- increasing the sample size
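The "over repeated samples" interpretation can be illustrated by simulation. The sketch below assumes a Normal population with hypothetical values $\mu = 6$, $\sigma = 2$ and n = 25 (none of which come from the slides) and uses the known-variance interval $\bar{x} \pm z \cdot \sigma/\sqrt{n}$ with the 95% multiplier.

```python
import random
from math import sqrt
from statistics import NormalDist

random.seed(1)                              # fixed seed for repeatability

mu, sigma, n = 6.0, 2.0, 25                 # hypothetical population values
z = NormalDist().inv_cdf(0.975)             # multiplier for 95% confidence
half_width = z * sigma / sqrt(n)

def covers_mu():
    """Draw one sample, build the 95% CI, report whether it contains mu."""
    x_bar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    return x_bar - half_width <= mu <= x_bar + half_width

reps = 4000
coverage = sum(covers_mu() for _ in range(reps)) / reps
print(coverage)    # proportion of intervals covering mu, expected near 0.95
```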

The general format (for our purposes) for a confidence interval is

statistic ± (multiplier coefficient) × standard error

The point estimate for $\mu$ is calculated using

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

With known variance, a 95% confidence interval for $\mu$ is

$$\bar{X} \pm 1.96 \cdot \frac{\sigma}{\sqrt{n}} \qquad (1)$$

This is a simple, but important result, forming a useful template. Note the above interval was for 95% confidence. Other levels of confidence pose no problem, but require a different multiplier coefficient. When the variance is known we obtain the multiplier from the standard normal distribution:

- For 90% confidence, use the multiplier 1.645
- For 95% confidence, use the multiplier 1.96
- For 99% confidence, use the multiplier 2.576

Hence a 99% confidence interval for $\mu$ is

$$\bar{X} \pm 2.576 \cdot \frac{\sigma}{\sqrt{n}} = \left( \bar{X} - 2.576 \cdot \frac{\sigma}{\sqrt{n}}, \; \bar{X} + 2.576 \cdot \frac{\sigma}{\sqrt{n}} \right) \qquad (2)$$
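The multipliers come from the standard normal distribution and can be recovered with Python's `statistics.NormalDist`. The sketch below is illustrative; the function names and the inputs in the final line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def multiplier(confidence):
    """Two-sided standard normal multiplier, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def ci_known_variance(x_bar, sigma, n, confidence=0.95):
    """Interval x_bar +/- z * sigma / sqrt(n) for known sigma."""
    half_width = multiplier(confidence) * sigma / sqrt(n)
    return x_bar - half_width, x_bar + half_width

for level in (0.90, 0.95, 0.99):
    print(level, round(multiplier(level), 3))

# Hypothetical inputs: x_bar = 6, sigma = 2, n = 100.
print(ci_known_variance(6, 2, 100, confidence=0.99))
```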

So we see that a higher level of confidence (a good thing) leads to a larger multiplier coefficient, and hence a wider confidence interval (a bad thing). Hence, other things equal, we face a trade-off between level of confidence and width of confidence interval. Since the width of a CI is part-determined by the standard error, by increasing n (costly) we will reduce the standard error, hence shorten the CI (a good thing).

Unfortunately, to use the approach just discussed requires knowledge of the population variance, $\sigma^2$. This is because it is used in the standard error: $\bar{X} \pm z \cdot \sigma/\sqrt{n}$. In practice, we are unlikely to know $\sigma^2$. After all, it's a population characteristic, and so if we do not know $\mu$, why would we know $\sigma^2$?

Recall the sampling variance of $\bar{X}$ is

$$\mathrm{Var}(\bar{X}) = \sigma^2_{\bar{X}} = \frac{N - n}{N - 1} \cdot \frac{\sigma^2}{n} \approx \frac{\sigma^2}{n}$$

But if $\sigma^2$ is unknown we have a problem. It is not that we are fundamentally interested in $\sigma^2$, only that we need to estimate it because the precision of $\bar{X}$ depends on it. And there is little point having a point estimate if we know nothing about its precision.

Our estimate of $\mathrm{Var}(\bar{X})$ is

$$s^2_{\bar{X}} = \frac{N - n}{N} \cdot \frac{s^2}{n} \approx \frac{s^2}{n}$$

where

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n - 1} \left( \sum_{i=1}^{n} x_i^2 - n \bar{x}^2 \right)$$

The approximation holds because $\frac{N - n}{N} \approx 1$ in the social sciences, since typically $N \gg n$. Once we have estimated this, we proceed as before to construct a CI using the estimate of the standard error in place of the actual standard error.

So, for a 90% confidence interval we use $\bar{x} \pm 1.645 \cdot s/\sqrt{n}$. Similarly, for a 95% confidence interval we use $\bar{x} \pm 1.96 \cdot s/\sqrt{n}$. Finally, for a 99% confidence interval we use $\bar{x} \pm 2.576 \cdot s/\sqrt{n}$.
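The following sketch computes $s^2$, the estimated standard error and the three intervals for a small made-up sample. The data are hypothetical and deliberately tiny for readability; the normal multipliers are used as in the slides, which presumes a reasonably large n and $N \gg n$.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical sample of n = 8 observations (not from the slides).
sample = [4.1, 5.3, 6.0, 5.8, 7.2, 6.4, 5.1, 6.9]
n = len(sample)

x_bar = mean(sample)
s2 = variance(sample)            # sample variance with the n - 1 divisor
est_se = sqrt(s2 / n)            # estimated standard error, assuming N >> n

# The computational shortcut for s^2 gives the same value.
shortcut_s2 = (sum(x * x for x in sample) - n * x_bar ** 2) / (n - 1)

for z in (1.645, 1.96, 2.576):   # 90%, 95%, 99% multipliers
    print(z, (x_bar - z * est_se, x_bar + z * est_se))
```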

Note the trade-off between accuracy and data cost. Solution: fix the desired precision and find the smallest n which achieves this. If we want the sample mean to be within a tolerance d of $\mu$ with a specified probability, then

$$d = z \cdot \frac{\sigma}{\sqrt{n}} \;\Rightarrow\; n = \frac{z^2 \sigma^2}{d^2}$$

n is the minimum sample size required to achieve the desired precision. n must be an integer, so always round up!

A random sample is to be taken from a population with unknown mean $\mu$ and $\sigma = 3$. How big a sample size would be needed if there is to be a 95% chance of $\bar{X}$ being within 1 unit of $\mu$? The sample size n required for a tolerance of 1 satisfies

$$1 = 1.96 \cdot \frac{3}{\sqrt{n}} \;\Rightarrow\; n = 34.57 \;\Rightarrow\; n = 35$$
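The worked example translates directly into a short function. This is a minimal sketch; the function name and default multiplier are illustrative choices.

```python
from math import ceil

def required_sample_size(sigma, d, z=1.96):
    """Smallest integer n with z * sigma / sqrt(n) <= d."""
    return ceil((z * sigma / d) ** 2)

print(required_sample_size(sigma=3, d=1))   # 35
```

Halving the tolerance roughly quadruples the requirement: `required_sample_size(3, 0.5)` gives 139.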

Note that the required sample size in this type of calculation needs to be rounded up from a decimal fraction, since rounding down would result in a value not quite large enough!

Incidence rate refers to the rate of occurrence, or the percentage, of persons eligible to participate in the study. In general, if there are k qualifying factors with an incidence of $Q_1, Q_2, Q_3, \ldots, Q_k$, each expressed as a proportion:

$$\text{Incidence rate} = Q_1 \times Q_2 \times Q_3 \times \cdots \times Q_k$$

The completion rate is the percentage of qualified respondents who complete the interview, enabling researchers to account for anticipated refusals by people who qualify:

$$\text{Initial sample size} = \frac{\text{Final sample size}}{\text{Incidence rate} \times \text{Completion rate}}$$
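As an illustration of the adjustment, the sketch below inflates a statistically determined sample size for incidence and completion. All figures are hypothetical, and rounding up to a whole number of contacts is an assumption in keeping with the earlier "always round up" advice.

```python
from math import ceil, prod

def initial_sample_size(final_n, incidence_factors, completion_rate):
    """Final sample size divided by (incidence rate x completion rate)."""
    incidence_rate = prod(incidence_factors)   # Q1 x Q2 x ... x Qk
    return ceil(final_n / (incidence_rate * completion_rate))

# Hypothetical figures: 35 completed interviews needed, two qualifying
# factors with incidence 0.75 and 0.5, and an 80% completion rate.
print(initial_sample_size(35, [0.75, 0.5], 0.8))   # 117
```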

Sub-sampling of non-respondents: the researcher contacts a sub-sample of the non-respondents, usually by means of telephone or personal interviews. In replacement, the non-respondents in the current survey are replaced with non-respondents from an earlier, similar survey. The researcher attempts to contact these non-respondents from the earlier survey and administer the current survey questionnaire to them, possibly by offering a suitable incentive.

In substitution, the researcher substitutes for non-respondents other elements from the sampling frame that are expected to respond. The sampling frame is divided into sub-groups that are internally homogeneous in terms of respondent characteristics but heterogeneous in terms of response rates. These sub-groups are then used to identify substitutes who are similar to particular non-respondents but dissimilar to respondents already in the sample.

Subjective estimates: when it is no longer feasible to increase the response rate by sub-sampling, replacement, or substitution, it may be possible to arrive at subjective estimates of the nature and effect of non-response bias. This involves evaluating the likely effects of non-response based on experience and available information.

Weighting attempts to account for non-response by assigning differential weights to the data depending on the response rates. For example, in a survey the response rates were 85%, 70% and 40%, respectively, for the high-, medium- and low-income groups. In analysing the data, these sub-groups are assigned weights inversely proportional to their response rates. That is, the weights assigned would be (100/85), (100/70) and (100/40), respectively, for the high-, medium- and low-income groups.
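The weights in this example are simply the reciprocals of the response rates. A brief sketch, with an illustrative dictionary layout:

```python
# Response rates by income group, from the weighting example.
response_rates = {"high": 0.85, "medium": 0.70, "low": 0.40}

# Weights inversely proportional to the response rates: 100/85, 100/70, 100/40.
weights = {group: 1 / rate for group, rate in response_rates.items()}

for group, weight in weights.items():
    print(group, round(weight, 3))
```

Each respondent in the low-income group therefore counts 2.5 times, compensating for that group's lower response rate.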

Imputation

Imputation involves imputing, or assigning, the characteristic of interest to the non-respondents based on the similarity of the variables available for both non-respondents and respondents. For example, a respondent who does not report brand usage may be imputed the usage of a respondent with similar demographic characteristics.
