
EARTH SC \ ENVIR SC \ GEOG 3MB3 STATISTICAL ANALYSIS SECTION 4 INFERENTIAL STATISTICS (contd)

Two-Sample Difference of Means Tests


We may want to form hypotheses comparing two populations: does a significant difference exist between them?

Examples:
1. Two similar cars are introduced at the same time with the same price. After 5 years, have the two cars' values depreciated by the same amount?
2. In China, do we find a significant difference between the number of children born to women in coastal regions (heavy policing of the one-child policy) and in inland regions (weak policing of the one-child policy)?

Slight alterations of the one-sample difference of means Z and t tests allow us to compare two populations.

Two-Sample Difference of Means Z Test


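$$Z = \frac{E(X_1) - E(X_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

Where:
$E(X_1)$ is the mean of sample 1
$E(X_2)$ is the mean of sample 2
$\sigma_1^2$ is the variance of population 1 (the population sample 1 was drawn from)
$\sigma_2^2$ is the variance of population 2 (the population sample 2 was drawn from)
$n_1$ is the size of sample 1
$n_2$ is the size of sample 2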

Like the one-sample Z test, we use the two-sample difference of means Z test when both $n_1$ and $n_2$ are $\geq 30$.
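As a minimal sketch of the Z test described above, the following Python computes the statistic and a two-tailed p-value. The function name `two_sample_z` and all input values are hypothetical, chosen only to illustrate the call; the sketch assumes the population variances are known.

```python
import math
from statistics import NormalDist


def two_sample_z(mean1, mean2, var1, var2, n1, n2):
    """Z statistic for the difference of two sample means (known population variances)."""
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (mean1 - mean2) / standard_error


# Hypothetical inputs, not taken from the slides.
z = two_sample_z(mean1=52.0, mean2=50.0, var1=16.0, var2=9.0, n1=40, n2=36)

# Two-tailed p-value from the standard normal distribution (non-directional HA).
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(z, 3), round(p_two_tailed, 4))
```

For a directional HA, the factor of 2 would be dropped.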

Two-Sample Difference of Means t Test


In most cases we do not know the variances of the populations, so we estimate them from the sample variances ($s^2$) using the two-sample difference of means t test:

$$t = \frac{E(X_1) - E(X_2)}{\sigma_{X_1 - X_2}}$$

where $\sigma_{X_1 - X_2}$ is the standard error of the difference of means.

Though the formulas for the Z and t tests look the same, the denominator of the t test is derived using the sample variances. There are two ways to do this:

1. Assume the population variances are equal ($\sigma_1^2 = \sigma_2^2$) and calculate a weighted average of the two sample variances, called a pooled variance estimate (PVE).
2. Assume the population variances are unequal ($\sigma_1^2 \neq \sigma_2^2$) and substitute the sample variances directly for the population variances, called a separate variance estimate (SVE).

Two-Sample Difference of Means t Test (contd)


Pooled Variance Estimate ($\sigma_1^2 = \sigma_2^2$)

Separate Variance Estimate ($\sigma_1^2 \neq \sigma_2^2$)

Two-Sample Difference of Means t Test Example


A researcher found the mean house price in Dundas and Ancaster from a record of housing sales from 2012.

Town           Mean price   n    s        $s^2$
Ancaster (A)   $462,579     23   30,000   900,000,000
Dundas (D)     $455,891     17   15,000   225,000,000

Is there a significant difference between mean house price in Dundas and Ancaster?

H0: $\mu_A = \mu_D$
HA: $\mu_A > \mu_D$

Two-Sample Difference of Means t Test Example (contd)


We cannot use the Z test because we do not know the variances of the populations from which the two samples were taken; we were given only the sample variances. We assume the population variances to be unequal, so we use the separate variance estimate.

Two-Sample Difference of Means t Test Example (contd)


The t-value of 0.90, according to the t table, corresponds to A = 0.3159. Therefore, the p-value is:

H0: $\mu_A = \mu_D$
HA: $\mu_A > \mu_D$

p-value = 0.5000 - 0.3159 = 0.1841

This is a relatively high p-value. We cannot reject H0 at either $\alpha = 0.10$ or $\alpha = 0.05$.
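The arithmetic for this example can be sketched in Python. This is an illustrative sketch rather than the slides' own calculation: the function name `sve_standard_error` and its `divisor_offset` parameter are hypothetical, and dividing each sample variance by n - 1 is an assumption (many textbooks divide by n). Under that assumption the statistic comes out at about 0.90, the value for which a standard normal table gives A = 0.3159.

```python
import math
from statistics import NormalDist

# Sample statistics from the example (A = Ancaster, D = Dundas).
mean_a, var_a, n_a = 462_579, 900_000_000, 23
mean_d, var_d, n_d = 455_891, 225_000_000, 17


def sve_standard_error(var1, n1, var2, n2, divisor_offset=1):
    """Separate variance estimate of the standard error of the difference.

    divisor_offset=1 divides each variance by n - 1 (ASSUMED convention);
    divisor_offset=0 divides by n (the more common textbook form).
    """
    return math.sqrt(var1 / (n1 - divisor_offset) + var2 / (n2 - divisor_offset))


t_value = (mean_a - mean_d) / sve_standard_error(var_a, n_a, var_d, n_d)

# One-tailed p-value (HA: mu_A > mu_D), approximated with the standard
# normal curve; this approximation is also an assumption.
p_one_tailed = 1 - NormalDist().cdf(t_value)
print(round(t_value, 2), round(p_one_tailed, 4))
```

Calling `sve_standard_error(..., divisor_offset=0)` shows how much the result depends on the divisor convention.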

Two-Sample Difference of Proportions Test


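Used to compare two sample proportions for a difference.
Assumption: The variable being considered is binary (i.e., only 2 types of observation: yes-no, male-female).

$$Z_p = \frac{p_1 - p_2}{\sigma_{p_1 - p_2}}, \qquad \sigma_{p_1 - p_2} = \sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{n_1 + n_2}{n_1 n_2}\right)}, \qquad \hat{p} = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$$

Where:
$p_1$ = proportion of sample 1 in category of focus
$p_2$ = proportion of sample 2 in category of focus
$\hat{p}$ = pooled estimate of the focus category for the population
$\sigma_{p_1 - p_2}$ = standard error of the difference

We define the focus category as one of the two possible responses.
Ex. Proportions in Sample 1: Yes 0.86; No 0.14. If we choose Yes as the focus category, we use 0.86 for calculations.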

Two-Sample Difference of Proportions Example


A sample was taken from a county regarding proposed legislation. Participants were divided into two categories: rural and urban. We want to know if there is a significant difference of opinion between rural and urban citizens on the legislation.

Category   Sample size (n)   Proportion in favour   Proportion against
Urban      79                0.63                   0.37
Rural      44                0.59                   0.41

H0: $p_{urban} = p_{rural}$
HA: $p_{urban} \neq p_{rural}$

Two-Sample Difference of Proportions Example (contd)


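Calculate the pooled estimate: $\hat{p}$ = [79(0.63) + 44(0.59)] / (79 + 44) = 0.6157

Substitute the pooled estimate value into the standard error of the difference equation: $\sqrt{0.6157 \times 0.3843 \times 123 / 3476}$ = 0.0915

Put that expression into the $Z_p$ equation.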

Two-Sample Difference of Proportions Example (contd)


$Z_p$ = -0.43714, which corresponds to A = 0.1700.

p-value = [0.5000 - 0.1700] x 2 = 0.3300 x 2 = 0.6600

H0: $p_{urban} = p_{rural}$
HA: $p_{urban} \neq p_{rural}$

We multiply by 2 because we have a non-directional HA.

This p-value of 0.6600 is very large. We cannot reject the null hypothesis that there is no difference between urban and rural opinions on the new legislation.
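The calculation for this example can be sketched as follows, using the pooled estimate and the standard error of the difference. The function name `two_proportion_z` is hypothetical. Taking urban as sample 1 gives a positive statistic; swapping the order of subtraction flips the sign but leaves the two-tailed p-value unchanged.

```python
import math
from statistics import NormalDist


def two_proportion_z(p1, n1, p2, n2):
    """Z statistic for the difference of two sample proportions."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)            # pooled estimate
    standard_error = math.sqrt(pooled * (1 - pooled) * (n1 + n2) / (n1 * n2))
    return (p1 - p2) / standard_error


# Proportions in favour: urban (n = 79) and rural (n = 44).
z_p = two_proportion_z(p1=0.63, n1=79, p2=0.59, n2=44)

# Non-directional HA, so the tail area is doubled.
p_value = 2 * (1 - NormalDist().cdf(abs(z_p)))
print(round(z_p, 5), round(p_value, 4))
```

The p-value here comes from the exact normal distribution rather than a rounded table lookup, so it can differ from a table-based value in the third decimal place.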

Matched-Pairs Tests
Matched-pairs tests are used to analyze dependent samples.
Dependent samples: Samples that are related; the results of one sample give information about the other sample.
Examples:

1. Two measurements of the same participants' non-commute driving distances, before and after an oil crisis. Did driving distances decrease?
2. A random sample of men and women from the same Mexican villages, used to determine the average male and female life expectancy for these villages. Do male and female life expectancies differ in these villages?

Matched-Pairs Tests (contd)


Each sample observation has two values, which are known as a matched-pair:

In the first example, matched pairs would be formed from each participant's before and after distances. In the second, the life expectancies of men and women from the same village are dependent and constitute a matched pair. This is because people in the same village are affected by the same social, economic, and environmental factors.

Matched-Pairs t Test
A parametric test which compares the mean differences of matched pairs.

$$t = \frac{E(d)}{\sigma_d}$$

Where:
$d_i$ = difference of matched pair i
$E(d)$ = mean of matched-pair differences
$\sigma_d$ = standard error of matched-pair differences
$s_d$ = standard deviation of matched-pair differences

Matched-Pairs t Test Example


Let's perform a matched-pairs t test on the crop yield data.
The average difference, $E(d)$, was found to be 1.5333.
The term $\sum [d_i - E(d)]^2$ was found to be 177.73422.*
*[calculations are very space-consuming]

Matched-Pairs t Test Example (contd)


The t statistic calculated was 1.67.
This corresponds to A = 0.4444.

p = 0.5000 - 0.4444 = 0.0556

This is a relatively low p-value. We reject the null hypothesis at $\alpha = 0.10$, but we cannot reject it at $\alpha = 0.05$ or $\alpha = 0.01$.
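The space-consuming calculations can be sketched in a few lines of Python. The data are the 2011 and 2012 crop yields tabulated in the Wilcoxon matched-pairs signed-ranks example later in this section. The standard error form used here (sample standard deviation with an n - 1 divisor, divided by the square root of n) is an assumption about the convention; it gives a t statistic of about 1.67.

```python
import math

# Crop yields for the 15 matched pairs (2011 = X, 2012 = Y).
yield_2011 = [92, 91, 84, 86, 87, 90, 92, 91, 97, 102, 107, 102, 89, 90, 92]
yield_2012 = [87, 90, 78, 86, 89, 87, 93, 94, 92, 95, 101, 101, 93, 91, 92]

# The t test keeps every pair, including those with a difference of 0.
differences = [x - y for x, y in zip(yield_2011, yield_2012)]
n = len(differences)

mean_diff = sum(differences) / n                               # E(d)
sum_sq_dev = sum((d - mean_diff) ** 2 for d in differences)    # sum of [d_i - E(d)]^2

# ASSUMED convention: s_d uses n - 1, and the standard error is s_d / sqrt(n).
s_d = math.sqrt(sum_sq_dev / (n - 1))
standard_error = s_d / math.sqrt(n)
t_value = mean_diff / standard_error

print(round(mean_diff, 4), round(sum_sq_dev, 4), round(t_value, 2))
```

Dividing the sum of squares by n and the standard deviation by the square root of n - 1 instead yields the same standard error, so the choice between those two conventions does not change the statistic.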

Parametric and Nonparametric Tests


Up to this point, we have made assumptions about the populations we have tested for differences in means and proportions:
Population parameters ($\mu$, $\sigma$, and $\sigma^2$)
Populations are normally distributed with mean $\mu$ and standard deviation $\sigma$

Parametric Tests: Tests that require knowledge of population parameters and make certain assumptions about the population's distribution. Can only be used with interval/ratio scale data.

Parametric and Nonparametric Tests


Nonparametric Tests: Tests that require no knowledge of population parameters and make few assumptions about the population's distribution. Can only be used for data in ordinal form.

Data may only be available in ordinal form. Sometimes, we choose to downgrade interval/ratio data to ordinal data in order to use nonparametric tests. We also use nonparametric tests for non-normally distributed data.
Example on next slide

Non-normally Distributed Data


Turbidity, the measure of haziness or cloudiness of a fluid caused by suspended solids, is a key test of water quality. Turbidity values are generally very high upstream, and drop off downstream.

Figure 1: Water Quality Along a Stream [y-axis: Turbidity; x-axis: Distance downstream from an arbitrarily chosen starting point (km); fitted curves: non-normal distribution, normal distribution]

As you can see, the non-normal distribution (green) fits the data better than the normal distribution (red). We should use a nonparametric test to analyze these data.

Wilcoxon Rank Sum W Test


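A nonparametric test of sample mean difference, which only works for ordinal data. It assumes that the two population distributions have the same shape.

Procedure:
1. Combine the results of the two samples and rank them (starting by assigning the lowest value the rank of 1).
2. If there is a tie, assign the average rank to the tied values (e.g., if the values at ranks 7 through 11 all equal 34.3, assign a rank of 9 to all of them).
3. Put the ranked values back into their original samples.

$$Z_W = \frac{W_i - E(W_i)}{\sigma_W}, \qquad E(W_i) = \frac{n_i (n_1 + n_2 + 1)}{2}, \qquad \sigma_W = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$$

Where:
$W_i$ = sum of ranks for the smaller sample
$E(W_i)$ = mean (expected value) of the rank sum for the smaller sample
$n_i$ = size of the smaller sample
$\sigma_W$ = standard deviation of the rank sum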
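The ranking procedure, including the average-rank rule for ties, can be sketched as below. The function name `average_ranks` and the two small samples are hypothetical and serve only to show how tied values share a rank before the ranks are returned to their original samples.

```python
def average_ranks(values):
    """Rank values from lowest (rank 1); tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Extend `end` across the run of values tied with the one at `start`.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + 1 + end + 1) / 2  # mean of 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


# Hypothetical samples, not taken from the slides.
sample_a = [12.1, 15.4, 15.4, 20.0]
sample_b = [9.8, 15.4, 18.2]

# Step 1-2: combine and rank; step 3: split the ranks back into their samples.
ranks = average_ranks(sample_a + sample_b)
ranks_a, ranks_b = ranks[:len(sample_a)], ranks[len(sample_a):]
print(ranks_a, ranks_b)
```

Here the three values of 15.4 occupy positions 3, 4, and 5, so each receives rank 4.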

Wilcoxon Matched-Pairs Signed-Ranks Test


A nonparametric test comparing matched-pair differences using their absolute differences.
Steps:

1. For all pairs, determine the sign of the difference and the absolute value of the difference.
2. Exclude all pairs with an absolute difference of 0.
3. Order the remaining pairs from smallest absolute difference to largest absolute difference.
4. Rank the pairs, starting with the smallest as 1. Ties receive a rank equal to the average of the ranks they span.

Wilcoxon Matched-Pairs Signed-Ranks Test (contd)


$$Z = \frac{T - \dfrac{n(n+1)}{4}}{\sqrt{\dfrac{n(n+1)(2n+1)}{24}}}$$

Where:
n = number of matched pairs (after excluding pairs with a difference of 0); must be > 10
T = rank sum

There are two possible values for T: $T_p$ (rank sum for positive differences) and $T_n$ (rank sum for negative differences). Which to use depends on how HA is stated.
If HA is non-directional, we choose the smaller of $T_p$ and $T_n$.
If HA is directional, we choose the value of T according to the smaller number of hypothesized differences (i.e., if more differences are expected to be positive, we choose $T_n$).

Wilcoxon Rank Sum W Test Example


The Canadian government decides to present Canada's gross exports for 2013 by dividing the country into 20 geographic regions. Instead of providing exact dollar values for each region, they rank them. Researchers are interested in determining whether there is an appreciable difference in gross exports between Eastern and Western Canada. Regions A-K are classified as Eastern, and L-T as Western.

H0: $R_E = R_W$
HA: $R_E \neq R_W$

Wilcoxon Rank Sum W Test Example (contd)


Listed by region:

Region   Location   Rank
A        East       11
B        East       12
C        East       3
D        East       16
E        East       6
F        East       14
G        East       2
H        East       5
I        East       13
J        East       20
K        East       4
L        West       10
M        West       9
N        West       17
O        West       1
P        West       18
Q        West       7
R        West       19
S        West       8
T        West       15

Sorted by rank:

Region   Location   Rank
O        West       1
G        East       2
C        East       3
K        East       4
H        East       5
E        East       6
Q        West       7
S        West       8
M        West       9
L        West       10
A        East       11
B        East       12
I        East       13
F        East       14
T        West       15
D        East       16
N        West       17
P        West       18
R        West       19
J        East       20

Wilcoxon Rank Sum W Test Example (contd)


We first calculate the sums of the Western and Eastern ranks, respectively.

Sum of Western Ranks
$R_W$ = 10 + 9 + 17 + 1 + 18 + 7 + 19 + 8 + 15 = 104

Sum of Eastern Ranks
$R_E$ = 11 + 12 + 16 + 3 + 6 + 14 + 2 + 5 + 20 + 13 + 4 = 106

** Calculate W using the values from the sample with the smaller sample size (here the Western sample, with 9 regions, so W = 104).

Wilcoxon Rank Sum W Test Example (contd)

The Z-score calculated using the Wilcoxon Rank Sum Test was 0.722. According to the normal table, that Z-score corresponds to A = 0.2642.

p-value = 2(0.5000 - 0.2642) = 0.4716

H0: $R_E = R_W$
HA: $R_E \neq R_W$

This is a very high p-value, so we cannot reject the H0 that there is no appreciable difference in gross exports between the Eastern and Western regions
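The Z-score for this example can be reproduced with the short sketch below, which uses the rank sum of the smaller (Western) sample and the normal approximation for the rank sum. Variable names are hypothetical. The p-value is computed from the exact normal distribution, so it differs slightly from a value read off a rounded table.

```python
import math
from statistics import NormalDist

# Ranks from the example table.
east_ranks = [11, 12, 3, 16, 6, 14, 2, 5, 13, 20, 4]   # regions A-K
west_ranks = [10, 9, 17, 1, 18, 7, 19, 8, 15]          # regions L-T

# W is taken from the sample with the smaller sample size.
smaller, larger = sorted([east_ranks, west_ranks], key=len)
n_s, n_l = len(smaller), len(larger)

w = sum(smaller)
expected_w = n_s * (n_s + n_l + 1) / 2                 # mean of W under H0
sigma_w = math.sqrt(n_s * n_l * (n_s + n_l + 1) / 12)  # standard deviation of W
z_w = (w - expected_w) / sigma_w

# Non-directional HA, so the tail area is doubled.
p_value = 2 * (1 - NormalDist().cdf(abs(z_w)))
print(w, round(z_w, 3), round(p_value, 4))
```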

Wilcoxon Matched-Pairs Signed-Ranks Test Example


A farmer recorded his crop yields for two consecutive years. The average rainfall in the growing season of Year 1 (2011) was much greater than in Year 2 (2012). Is there a significant difference in crop yields between the two years?

H0: $\text{Yield}_{2011} = \text{Yield}_{2012}$
HA: $\text{Yield}_{2011} > \text{Yield}_{2012}$
** We exclude pairs where X - Y = 0.

Crop Yield

i    2011 (X)   2012 (Y)   Sign   |X-Y|
1    92         87         pos    5
2    91         90         pos    1
3    84         78         pos    6
4    86         86         N/A    0
5    87         89         neg    2
6    90         87         pos    3
7    92         93         neg    1
8    91         94         neg    3
9    97         92         pos    5
10   102        95         pos    7
11   107        101        pos    6
12   102        101        pos    1
13   89         93         neg    4
14   90         91         neg    1
15   92         92         N/A    0

Pairs with non-zero differences, ordered and ranked by |X-Y|:

i    2011 (X)   2012 (Y)   Sign   |X-Y|   Rank
2    91         90         pos    1       2.5
7    92         93         neg    1       2.5
14   90         91         neg    1       2.5
12   102        101        pos    1       2.5
5    87         89         neg    2       5
8    91         94         neg    3       6.5
6    90         87         pos    3       6.5
13   89         93         neg    4       8
9    97         92         pos    5       9.5
1    92         87         pos    5       9.5
3    84         78         pos    6       11.5
11   107        101        pos    6       11.5
10   102        95         pos    7       13

The four pairs with |X-Y| = 1 span ranks 1 through 4, so each receives the average rank of 2.5.

Wilcoxon Matched-Pairs Signed-Ranks Test Example (contd)


We now find the sum of ranks for pairs with positive and negative differences.

Sum of Ranks for Positive Differences ($T_p$)
$T_p$ = 2.5 + 2.5 + 6.5 + 9.5 + 9.5 + 11.5 + 11.5 + 13 = 66.5

Sum of Ranks for Negative Differences ($T_n$)
$T_n$ = 2.5 + 2.5 + 5 + 6.5 + 8 = 24.5

Since our HA is directional (the yield of 2011 will be greater than that of 2012), we must determine whether there are more positive or negative differences in our data. We use the rank sum of the group with the smaller number of differences. There are 8 positive and 5 negative differences.

Thus, we use the rank sum for negative differences, $T_n$ = 24.5.

Wilcoxon Matched-Pairs Signed-Ranks Test Example (contd)


With n = 13 and T = 24.5, the Z-score is -1.47, which corresponds to A = 0.4292.

p = 0.5000 - 0.4292 = 0.0708

With this p-value, we reject the null hypothesis at $\alpha = 0.10$, but we fail to reject it at $\alpha = 0.05$ and $\alpha = 0.01$.
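The whole signed-ranks procedure for the crop yield data can be sketched in Python. The helper `avg_rank` is hypothetical, and the Z-score uses the usual normal approximation for T with mean n(n+1)/4 and variance n(n+1)(2n+1)/24. Under the tie rule stated in the steps (ties receive the average of the ranks they span), the four differences of magnitude 1 each receive rank 2.5, so this sketch gives a positive rank sum of 66.5, a negative rank sum of 24.5, and a Z-score of about -1.47.

```python
import math
from statistics import NormalDist

yield_2011 = [92, 91, 84, 86, 87, 90, 92, 91, 97, 102, 107, 102, 89, 90, 92]
yield_2012 = [87, 90, 78, 86, 89, 87, 93, 94, 92, 95, 101, 101, 93, 91, 92]

# Steps 1-2: signed differences, excluding pairs with a difference of 0.
diffs = [x - y for x, y in zip(yield_2011, yield_2012) if x != y]
n = len(diffs)

# Steps 3-4: order the absolute differences and rank them with averaged ties.
abs_sorted = sorted(abs(d) for d in diffs)


def avg_rank(value):
    """Average of the 1-based positions that `value` occupies in abs_sorted."""
    first = abs_sorted.index(value) + 1
    last = first + abs_sorted.count(value) - 1
    return (first + last) / 2


t_pos = sum(avg_rank(abs(d)) for d in diffs if d > 0)
t_neg = sum(avg_rank(abs(d)) for d in diffs if d < 0)

# Directional HA (2011 > 2012): more positive differences are expected,
# so the rank sum for negative differences is used.
t_stat = t_neg
mean_t = n * (n + 1) / 4
sigma_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z_t = (t_stat - mean_t) / sigma_t

# One-tailed p-value: area in the lower tail below z_t.
p_one_tailed = NormalDist().cdf(z_t)
print(n, t_pos, t_neg, round(z_t, 2), round(p_one_tailed, 4))
```

As a quick consistency check, the two rank sums always add up to n(n+1)/2, which is 91 for n = 13.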
