Analysis of Variance

We have learned to perform tests that compare one mean to another. Comparing sample means drawn from two independent populations: INDEPENDENT SAMPLES T-TEST. Comparing two different treatment conditions using one sample of subjects: PAIRED SAMPLES T-TEST.

Vishal Sarin

Comparing three or more means

There are a number of ANOVA procedures, typically distinguished by the number of FACTORS, or independent variables. One-factor ANOVA (one-way ANOVA) indicates that there is a single independent variable.

students.sav
Suppose we are curious about whether students who have to work many hours outside of school to support themselves find that their grades suffer. We could examine this question by comparing the GPAs of students who work various amounts of time outside of school. The one factor in this example is the amount of work, because it defines the different conditions being compared.

In students.sav
One variable, WorkCat, represents work time outside of school (0 hours, 1-19 hours, 20 or more hours). Factor = WorkCat. Dependent variable = GPA.

To draw the boxplots, choose Simple and click Define.


The boxplots show some variation across the groups, with the highest GPAs belonging to the students who worked 1-19 hours. The median GPAs differ slightly among the groups. Why? Could it be because of sampling error?

Should we attribute the observed differences to sampling error, or do they reflect genuine differences among the three populations? Neither the boxplots nor the sample medians offer decisive evidence. ANOVA clarifies some of this ambiguity: it asks how much of the variation is due to sampling error and how much is due to the factor.
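
The split between variation due to the factor and variation due to sampling error can be sketched in a few lines of Python. This is only an illustration: the GPA values are made up and do not come from students.sav, and `one_way_anova` is a hypothetical helper name, not an SPSS function.

```python
from statistics import mean

def one_way_anova(groups):
    """Return (ss_between, ss_within, f_stat) for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation explained by the factor: group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation left to sampling error: observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat

# Hypothetical GPAs for the three work categories (NOT taken from students.sav).
gpa_by_work = [
    [3.0, 2.8, 3.1, 2.9],  # 0 hours
    [3.4, 3.2, 3.5, 3.3],  # 1-19 hours
    [3.0, 3.1, 2.9, 3.2],  # 20 or more hours
]
ss_b, ss_w, f = one_way_anova(gpa_by_work)
print(round(ss_b, 4), round(ss_w, 4), round(f, 3))
```

A large F means the group means are spread out more than sampling error alone would suggest. Turning F into a p-value needs the F distribution, which this sketch leaves to the statistics package.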

$H_0: \mu_1 = \mu_2 = \mu_3$
$H_1$: At least one population mean is different from the others.
But ANOVA requires three conditions:
1. Independent samples
2. Normal populations
3. Homogeneity (or equality) of population variances

We will be able to formally test both the normality and the homogeneity assumptions.

The Kolmogorov-Smirnov test assesses whether there is a significant departure from normality in the population distribution for each of the three groups. Null hypothesis: the population distribution is normal. As the p-values are > 0.05, we do not reject the null hypothesis. Conclusion: the data do not violate the normality assumption. Shapiro-Wilk is used when n < 50.
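
To see what the Kolmogorov-Smirnov statistic measures, here is a rough Python sketch. It computes only the distance between the sample's empirical distribution and a normal curve fitted to the same sample. It does not compute a p-value, so it is not a replacement for the SPSS output. The function name and both samples are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def ks_distance_to_normal(sample):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    fitted = NormalDist(mean(sample), stdev(sample))
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs((i + 1) / n - cdf), abs(cdf - i / n))
    return d

# Hypothetical samples: one evenly spread, one with a single extreme value.
d_even = ks_distance_to_normal([2.8, 2.9, 3.0, 3.1, 3.2])
d_skewed = ks_distance_to_normal([1, 1, 1, 1, 1, 1, 1, 1, 1, 50])
print(round(d_even, 3), round(d_skewed, 3))
```

The skewed sample sits much farther from its fitted normal curve than the evenly spread one. The test asks whether that distance is larger than chance would produce.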

Homogeneity of variance

Levene's test for homogeneity of variances assesses whether the population variances for the groups are significantly different from each other. Null hypothesis: the population variances are equal. As the p-value is > 0.05, we do not reject the null hypothesis. Conclusion: the data do not violate the homogeneity of variance assumption.
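
One common form of Levene's statistic is a one-way ANOVA F computed on the absolute deviations of each observation from its own group mean. The Python sketch below follows that mean-based form. The helper name and the two groups are hypothetical, and no p-value is computed.

```python
from statistics import mean

def levene_statistic(groups):
    """Mean-based Levene statistic: ANOVA F on absolute deviations."""
    # Absolute deviation of each observation from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    all_z = [v for g in z for v in g]
    grand = mean(all_z)
    k, n_total = len(z), len(all_z)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    within = sum((v - mean(g)) ** 2 for g in z for v in g)
    return (between / (k - 1)) / (within / (n_total - k))

# Hypothetical groups with the same mean but visibly different spread.
w = levene_statistic([[1, 3, 5, 7], [3, 4, 4, 5]])
print(round(w, 3))
```

Groups with similar spread give a statistic near zero. The wider the gap in spread, the larger the statistic.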

We can see that the test statistic (F) equals 8.865, with a corresponding p-value reported as 0.000 (that is, p < 0.001). Since the p-value is below 0.05, we reject the null hypothesis. Conclusion: the data provide substantial evidence of at least one significant difference in mean GPA among the three groups of students.

Where are the differences?


From ANOVA we know the means are not all the same. POST-HOC tests tell us which means are different. There are many POST-HOC tests. We will use TUKEY'S HONESTLY SIGNIFICANT DIFFERENCE TEST.
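
The idea behind the pairwise comparisons can be sketched in Python as follows. For every pair of groups we compute the mean difference and a studentized range statistic q (Tukey-Kramer form). The data are hypothetical, `tukey_q_statistics` is an illustrative name, and the critical value of q must still be looked up in a studentized range table or taken from the software. It is not computed here.

```python
from itertools import combinations
from math import sqrt
from statistics import mean

def tukey_q_statistics(groups):
    """Map each pair of group labels to (mean difference, q statistic)."""
    n_total = sum(len(g) for g in groups.values())
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    # Pooled error variance from the one-way ANOVA (mean square within).
    ms_within = ss_within / (n_total - len(groups))
    results = {}
    for (a, ga), (b, gb) in combinations(groups.items(), 2):
        diff = mean(ga) - mean(gb)
        se = sqrt(ms_within / 2 * (1 / len(ga) + 1 / len(gb)))
        results[(a, b)] = (diff, abs(diff) / se)
    return results

# Hypothetical GPAs (NOT taken from students.sav).
pairs = tukey_q_statistics({
    "none": [3.0, 2.8, 3.1, 2.9],
    "some": [3.4, 3.2, 3.5, 3.3],
    "many": [3.0, 3.1, 2.9, 3.2],
})
for pair, (diff, q) in pairs.items():
    print(pair, round(diff, 3), round(q, 3))
```

A pair is declared significantly different when its q exceeds the critical value for the chosen significance level, number of groups and error degrees of freedom.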

Which group had the lowest mean GPA? Which group had the highest mean GPA? Do you think a mean GPA of 3.17 is significantly better than a mean GPA of 3.02?

The group means alone do not tell us decisively whether significant differences exist.

The first line of the table represents the pairwise comparison of the mean GPAs between the none and the some categories of work. The mean difference is listed as -0.28636 and an asterisk (*) is displayed next to it, indicating that this represents a significant difference.

Students who worked some hours (1-19 hrs.) had better GPAs than students who did not work at all.
Students who worked some hours (1-19 hrs.) had comparable GPAs to students who worked many hours (20+ hrs.).
Students who worked many hours had comparable GPAs to students who did not work at all.

TWO WAY ANOVA


In two-way ANOVA (ANOVA II), the variation in the data is attributed to two factors. It is used when we want to consider two independent variables at the same time. The independent variables are called FACTORS.

HYPOTHESIS
It involves three distinct hypotheses:
1. The mean difference between levels of the first factor.
2. The mean difference between levels of the second factor.
3. Any other mean differences that may result from the unique combination of the two factors.

The first two hypotheses concern the MAIN effects. The null hypothesis for a main effect is always that there are no differences among the levels of the factor. The third hypothesis test is called the test for the interaction, as it examines the effect of the combination of the two factors together. The null hypothesis for the interaction is that there is no interaction between the factors.
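
The three hypotheses correspond to three separate pieces of variation. The Python sketch below splits the total variation of a balanced two-factor design into the two main effects, the interaction, and error. It assumes every cell has the same number of observations. The blood pressure values are invented for illustration, and `two_way_sums_of_squares` is a hypothetical helper.

```python
from statistics import mean

def two_way_sums_of_squares(cells):
    """cells[(a, b)] -> list of observations; balanced design assumed."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # observations per cell
    grand = mean(x for obs in cells.values() for x in obs)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    # Main effects: marginal means around the grand mean.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    # Interaction: what the cell means show beyond the two main effects.
    ss_ab = n * sum(
        (mean(cells[(a, b)]) - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    # Error: observations around their own cell mean.
    ss_error = sum((x - mean(obs)) ** 2 for obs in cells.values() for x in obs)
    return {"A": ss_a, "B": ss_b, "AxB": ss_ab, "error": ss_error}

# Hypothetical systolic readings, two per cell (made-up numbers).
ss = two_way_sums_of_squares({
    ("male", "no"): [120, 122], ("male", "yes"): [126, 128],
    ("female", "no"): [112, 114], ("female", "yes"): [118, 120],
})
print(ss)
```

In this made-up data the effect of the second factor is the same for both levels of the first, so the interaction sum of squares is zero. Each F statistic is the corresponding mean square divided by the error mean square.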

BP.SAV
This file is based on medical research into risk factors for developing hypertension. The study asks whether a person's gender, and whether the person had a parent with hypertension, are related to systolic blood pressure. These two independent variables result in four groups of participants: males with and without parental hypertension, and females with and without parental hypertension. Focus variables: systolic blood pressure, gender and parental history.

Univariate ANOVA Output

Levene's test for homogeneity of variances assesses whether the population variances for the groups are significantly different from each other. The Levene statistic (F) has a value of 1.364 and a p-value of 0.255. Do not reject H0. The data do not violate the homogeneity of variance assumption.

See the variable gender and notice that the F statistic for this test of the main effect has a value of 36.501, with a corresponding significance of 0.000. Reject H0 and conclude that there is a significant main effect for the gender factor.

See the variable ph and notice that the F statistic for this test of the main effect has a value of 9.210, with a corresponding significance of 0.003. Reject H0 and conclude that there is a significant main effect for the ph factor also.

To find out whether there is a significant interaction between the gender and parental history factors, see the gender*ph row. We do not reject the null hypothesis, i.e. there is no significant interaction between the two factors.

We have two significant main effects and a non significant interaction. What exactly do these results mean?

A graph will rescue us from this quandary.



Graphs > Chart Builder


Choose Bar and drag the first bar graph icon (simple) to the preview area. Then drag sbpma to the vertical axis and PH to the horizontal axis. Click on the Groups/Point ID tab and choose Columns panel variable. Then drag gender to the panel box. Press OK.

All the bars seem to be of the same height, so the significant main effects are hard to discern. Use the Chart Editor to make a bar chart where the y-axis (sbpma) does not start at zero. It will show the differences clearly.

Double-click the chart and the Chart Editor will appear. According to the data, the minimum sbpma is 118, so we set the y-axis minimum to 115. The new graph will appear.

Now the new bar chart shows the significant main effects much more clearly.

OBSERVATIONS:
1. Males have higher systolic BP during mental arithmetic than females, regardless of parental hypertension.
2. Individuals having a parent with hypertension have higher systolic pressure than those who do not, and this occurs regardless of gender.
3. Therefore, both factors separately, but not their combination, can put one at risk for developing hypertension.