As with the one-sample Z-test, we use the two-sample difference of means Z-test when both n1 ≥ 30 and n2 ≥ 30.
Though the formulas for the Z and t tests look the same, the denominator of the t test is derived from the sample variances. There are two ways to do this:
1. Assume the population variances are equal (σ1² = σ2²) and calculate a weighted average of the two sample variances, called a pooled variance estimate (PVE).
2. Assume the population variances are unequal (σ1² ≠ σ2²) and substitute the sample variances directly for the population variances, called a separate variance estimate (SVE).
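The two approaches can be sketched in Python as follows. This is a minimal illustration, not the slides' own formula sheet: it assumes the common textbook forms of the pooled and separate standard errors with n − 1 sample variances, and the function names and house prices are hypothetical.

```python
import math
import statistics


def pooled_variance_t(sample1, sample2):
    """t statistic using a pooled variance estimate (equal variances assumed)."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)  # sample variance, n - 1 denominator
    var2 = statistics.variance(sample2)
    # Weighted average of the two sample variances
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / std_error
    return t, n1 + n2 - 2  # t statistic and degrees of freedom


def separate_variance_t(sample1, sample2):
    """t statistic using a separate variance estimate (unequal variances)."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)
    var2 = statistics.variance(sample2)
    # Sample variances substituted directly for the population variances
    std_error = math.sqrt(var1 / n1 + var2 / n2)
    return (statistics.mean(sample1) - statistics.mean(sample2)) / std_error


# Hypothetical house prices (thousands of dollars), for illustration only
ancaster = [410, 455, 390, 520, 475, 430]
dundas = [400, 380, 445, 415, 395, 425]

t_pve, df = pooled_variance_t(ancaster, dundas)
t_sve = separate_variance_t(ancaster, dundas)
print(f"PVE: t = {t_pve:.3f}, df = {df}")
print(f"SVE: t = {t_sve:.3f}")
```

When the two samples are the same size, as here, the two standard errors coincide and the t statistics are identical; they diverge when the sample sizes and variances differ. The degrees of freedom for the SVE case are not computed in this sketch.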
Is there a significant difference between mean house prices in Dundas and Ancaster?
H0: μA = μD
HA: μA > μD
Two-Sample Difference of Means t Test Example (cont'd)
H0: μA = μD
HA: μA > μD
The t-value of 0.94, according to the t table, corresponds to A = 0.3159.
Therefore, the one-tailed p-value is: p = 0.5000 − 0.3159 = 0.1841
This is a relatively high p-value. We cannot reject H0 at either α = 0.10 or α = 0.05.
Where:
p1 = proportion of sample 1 in the focus category
p2 = proportion of sample 2 in the focus category
p̂ = pooled estimate of the focus category proportion for the population
We define the focus category as one of the two possible responses.
Ex. Proportions in Sample 1: Yes 0.86; No 0.14. If we choose Yes as the focus category, we use 0.86 in the calculations.
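A difference of proportions test built from these quantities might look like the sketch below. It assumes the usual pooled-proportion standard error and a two-tailed p-value from the normal distribution; only the 0.86 proportion comes from the example above, while the sample sizes, the second proportion, and the function name are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z(p1, n1, p2, n2):
    """Z statistic for the difference between two sample proportions."""
    # Pooled estimate of the focus category proportion
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error


# p1 = 0.86 is the "Yes" proportion of Sample 1; all other values are hypothetical
z = two_proportion_z(p1=0.86, n1=50, p2=0.80, n2=60)
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print(f"Z = {z:.3f}, two-tailed p = {p_two_tailed:.4f}")
```

Choosing "No" as the focus category instead (0.14 and 0.20) only flips the sign of Z, so the two-tailed p-value is unchanged.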
This p-value of 0.6600 is very large. We cannot reject the null hypothesis that there is no difference between urban and rural opinions on the new legislation.
Matched-Pairs Tests
Matched-pairs tests are used to analyze dependent samples.
Dependent samples: samples that are related, so that the results of one sample give information about the other.
Examples:
1. Two measurements of the same participants' non-commute driving distances, before and after an oil crisis. Did driving distances decrease?
2. A random sample of men and women from the same Mexican villages, used to determine the average male and female life expectancy for these villages. Do male and female life expectancies differ in these villages?
In the first example, matched pairs are formed from each participant's before and after distances. In the second, the life expectancies of men and women from the same village are dependent and constitute a matched pair, because people in the same village are affected by the same social, economic, and environmental factors.
Matched-Pairs t Test
A parametric test that evaluates the mean of the differences between matched pairs.
Where:
di = difference of matched pair i
E(d) = mean of matched-pair differences
σd = standard error of matched-pair differences
sd = standard deviation of matched-pair differences
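The quantities defined above combine into a t statistic as sketched here. This assumes the standard form, in which the standard error is the standard deviation of the differences divided by the square root of the number of pairs, with n − 1 degrees of freedom; the function name and the driving distances are hypothetical.

```python
import math
import statistics


def matched_pairs_t(before, after):
    """Return (t, degrees of freedom) for a matched-pairs t test."""
    if len(before) != len(after):
        raise ValueError("Each 'before' value needs a matching 'after' value")
    diffs = [b - a for b, a in zip(before, after)]  # d_i for each pair
    n = len(diffs)
    mean_diff = statistics.mean(diffs)       # mean of matched-pair differences
    sd_diff = statistics.stdev(diffs)        # standard deviation of differences
    std_error = sd_diff / math.sqrt(n)       # standard error of differences
    return mean_diff / std_error, n - 1


# Hypothetical weekly non-commute driving distances (km) for four participants
before_crisis = [10, 12, 14, 16]
after_crisis = [9, 10, 13, 13]

t, df = matched_pairs_t(before_crisis, after_crisis)
print(f"t = {t:.3f}, df = {df}")
```

A positive t here means distances were larger before than after, which is the direction a "did driving distances decrease?" hypothesis expects.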
Matched-Pairs t Test Example (cont'd)
The t statistic calculated was 1.67. This corresponds to A = 0.4444.
p = 0.5000 − 0.4444
p = 0.0556
This is a relatively low p-value. We can reject the null hypothesis at α = 0.10, but not at α = 0.05 or α = 0.01.
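The table lookup and the decision at each significance level can be reproduced in a few lines. This is a minimal sketch, assuming that A is the table area between 0 and the test statistic and that the test is one-tailed; the function names are illustrative.

```python
def one_tailed_p(area):
    """One-tailed p-value given the table area A between 0 and the statistic."""
    return 0.5 - area


def decisions(p_value, alphas=(0.10, 0.05, 0.01)):
    """Map each significance level to True if H0 is rejected at that level."""
    return {alpha: p_value <= alpha for alpha in alphas}


p = one_tailed_p(0.4444)   # A = 0.4444 for t = 1.67 in the example
print(round(p, 4))         # 0.0556
print(decisions(p))        # {0.1: True, 0.05: False, 0.01: False}
```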
Data may only be available in ordinal form. Sometimes we choose to downgrade interval/ratio data to ordinal data in order to use nonparametric tests. We also use nonparametric tests for non-normally distributed data.
[Figure: distribution of turbidity data with a fitted non-normal distribution and a fitted normal distribution]
As you can see, the non-normal distribution in green fits the data better than the normal distribution in red. We should use a nonparametric test to analyze these data.
Where:
Wi = sum of ranks for the smaller sample
E(Wi) = expected (mean) rank sum for the smaller sample
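The rank sum statistic can be sketched as below. The sketch assumes the usual normal approximation, with E(Wi) = ni(n1 + n2 + 1)/2 and standard deviation sqrt(n1·n2(n1 + n2 + 1)/12), and applies no correction for ties; the function names and the example ranks are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1 (smallest); ties share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_sum_z(sample_a, sample_b):
    """Z statistic for the Wilcoxon rank sum W test (normal approximation)."""
    small, large = sorted((sample_a, sample_b), key=len)
    ranks = average_ranks(list(small) + list(large))
    n_small, n_large = len(small), len(large)
    w = sum(ranks[:n_small])                              # W_i
    expected_w = n_small * (n_small + n_large + 1) / 2    # E(W_i)
    sd_w = math.sqrt(n_small * n_large * (n_small + n_large + 1) / 12)
    return (w - expected_w) / sd_w


# Hypothetical ranks for two groups of regions (1 = lowest value)
group_1 = [1, 2, 3]
group_2 = [4, 5, 6, 7]
print(f"Z = {rank_sum_z(group_1, group_2):.3f}")
```

Because W is taken from the smaller sample, a negative Z means the smaller sample tends to hold the lower ranks.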
1. For all pairs, determine the sign of the difference and the absolute value of the difference.
2. Exclude all pairs with an absolute difference of 0.
3. Order the remaining pairs from smallest absolute difference to largest absolute difference.
4. Rank the pairs, starting with the smallest as 1. Ties receive a rank equal to the average of the ranks they span.
There are two possible values for T: Tp (the rank sum for positive differences) and Tn (the rank sum for negative differences). Which one to use depends on how HA is stated. If HA is non-directional, we choose the smaller of Tp and Tn.
If HA is directional, we choose the value of T for the sign that is hypothesized to occur less often (i.e., if more differences are expected to be positive, we choose Tn).
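The ranking steps and the choice of T can be sketched as follows. The function names, the `expected_sign` parameter, and the data are hypothetical and serve only to illustrate the procedure described above.

```python
def signed_rank_sums(x, y):
    """Return (Tp, Tn): rank sums of the positive and negative differences."""
    # Steps 1-2: signed differences, excluding pairs with a difference of 0
    diffs = [a - b for a, b in zip(x, y) if a != b]
    abs_diffs = [abs(d) for d in diffs]

    def rank(value):
        # Steps 3-4: rank by absolute difference; ties get the average rank
        smaller = sum(1 for v in abs_diffs if v < value)
        tied = sum(1 for v in abs_diffs if v == value)
        return smaller + (tied + 1) / 2

    t_pos = sum(rank(abs(d)) for d in diffs if d > 0)
    t_neg = sum(rank(abs(d)) for d in diffs if d < 0)
    return t_pos, t_neg


def choose_t(t_pos, t_neg, expected_sign=None):
    """Pick T: the smaller sum if HA is non-directional, otherwise the sum
    for the sign expected to occur less often under HA."""
    if expected_sign is None:
        return min(t_pos, t_neg)
    return t_neg if expected_sign == "positive" else t_pos


# Hypothetical paired measurements
x = [10, 12, 9, 15, 11]
y = [8, 12, 10, 11, 12]

t_pos, t_neg = signed_rank_sums(x, y)
print(t_pos, t_neg)                                    # 7.0 3.0
print(choose_t(t_pos, t_neg))                          # 3.0
print(choose_t(t_pos, t_neg, expected_sign="positive"))  # 3.0
```

A quick check on any ranking is that Tp + Tn equals n(n + 1)/2, where n is the number of pairs left after excluding zero differences (here 4 pairs, so 10).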
Wilcoxon Rank Sum W Test Example
The Canadian government decides to present Canada's gross exports for 2013 by dividing the country into 20 geographic regions. Instead of providing exact dollar values for each region, it ranks them. Researchers are interested in determining whether there is an appreciable difference in gross exports between Eastern and Western Canada. Groups A-K are classified as Eastern, and groups L-T as Western.
H0: RE = RW
HA: RE ≠ RW
The Z-score calculated using the Wilcoxon Rank Sum Test was 0.722. According to the normal table, that Z-score corresponds to A = 0.2642. Because HA is non-directional, the p-value covers both tails: p = 2 × (0.5000 − 0.2642) = 0.4716
H0: WE = WW
HA: WE ≠ WW
This is a very high p-value, so we cannot reject the H0 that there is no appreciable difference in gross exports between the Eastern and Western regions.
Crop Yield
i     2011 (X)   2012 (Y)   X-Y   Sign   |X-Y|   Rank
2     91         90          1    pos    1       2.5
7     92         93         -1    neg    1       2.5
14    90         91         -1    neg    1       2.5
12    102        101         1    pos    1       2.5
5     87         89         -2    neg    2       5
8     91         94         -3    neg    3       6.5
6     90         87          3    pos    3       6.5
13    89         93         -4    neg    4       8
9     97         92          5    pos    5       9.5
1     92         87          5    pos    5       9.5
3     84         78          6    pos    6       11.5
11    107        101         6    pos    6       11.5
10    102        95          7    pos    7       13
The four pairs with |X-Y| = 1 span ranks 1 to 4, so each receives the average rank of 2.5.
Sum of Ranks for Positive Differences (Tp)
Tp = 2.5 + 2.5 + 6.5 + 9.5 + 9.5 + 11.5 + 11.5 + 13 = 66.5
Sum of Ranks for Negative Differences (Tn)
Tn = 2.5 + 2.5 + 5 + 6.5 + 8 = 24.5
Since our HA is directional (the 2011 yield is hypothesized to be greater than the 2012 yield), we must determine whether there are more positive or negative differences in our data. We use the rank sum of the group with the smaller number of differences. Here there are 8 positive and 5 negative differences, so we use Tn.
With this p-value, we reject the null hypothesis at α = 0.10, but fail to reject it at α = 0.05 and α = 0.01.
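The crop yield example can be recomputed from the raw yields as a rough check on this conclusion. The sketch below assumes the normal approximation for T, with mean n(n + 1)/4 and standard deviation sqrt(n(n + 1)(2n + 1)/24) and no tie correction; the slides may instead have used a table of critical T values. Ties receive average ranks, so the four pairs with |X − Y| = 1 each get rank 2.5.

```python
import math
from statistics import NormalDist

# Crop yields from the example table (the 13 pairs with a nonzero difference)
x_2011 = [91, 92, 90, 102, 87, 91, 90, 89, 97, 92, 84, 107, 102]
y_2012 = [90, 93, 91, 101, 89, 94, 87, 93, 92, 87, 78, 101, 95]

diffs = [x - y for x, y in zip(x_2011, y_2012)]
abs_diffs = [abs(d) for d in diffs]


def rank(value):
    """Rank of an absolute difference; ties share the average rank."""
    smaller = sum(1 for v in abs_diffs if v < value)
    tied = sum(1 for v in abs_diffs if v == value)
    return smaller + (tied + 1) / 2


n = len(diffs)
# Fewer differences are negative (5 of 13), so T is the negative rank sum
t_neg = sum(rank(abs(d)) for d in diffs if d < 0)

expected_t = n * (n + 1) / 4
sd_t = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (t_neg - expected_t) / sd_t
p_one_tailed = NormalDist().cdf(z)  # lower tail: a small T supports HA

print(f"Tn = {t_neg}, Z = {z:.3f}, one-tailed p = {p_one_tailed:.4f}")
```

Under these assumptions the one-tailed p-value comes out at roughly 0.07, which lies between 0.05 and 0.10 and so agrees with rejecting H0 at α = 0.10 only.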