ANOVA (Analysis of Variance) 5

Rift Valley University,
Department of Public Health
Teresa Kisi (MPH in Epidemiology and Biostatistics, Assist. Prof.)

5/15/2016 1
terek7@gmail.com
Types of t-test:
– One-sample t-test: which is used to compare a
single mean to a fixed number or "gold standard".
– Paired t-test: which is used to compare two
means based on samples that are paired in some
way.
– Two-sample t-test: which is used to compare two
population means based on independent samples
from the two populations or groups.

One-sample t-test:
– The One-Sample T Test procedure tests whether the mean of a single
variable differs from a specified constant.
– Assumption: the data are normally distributed.
– If the confidence interval for the mean difference does not contain
the null value (zero), this also indicates that the test is
significant.

One-sample t-test, Example:
• A researcher is planning a psychological intervention study, but
before he proceeds he wants to characterize his participants'
depression levels. He tests each participant on a particular
depression index, where anyone who achieves a score of 4.0 is deemed
to have 'normal' levels of depression.

One-sample t-test, Example cont'd…
• Lower scores indicate less depression and higher scores indicate
greater depression. He has recruited 40 participants to take part in
the study. Depression scores are recorded in the variable dep_score.
He wants to know whether his sample is representative of the normal
population (i.e., whether the mean score differs significantly from
4.0).
One-Sample Test (Test Value = 4)

                                                          95% Confidence Interval
                                                          of the Difference
            t        df   Sig. (2-tailed)   Mean Difference   Lower     Upper
dep_score   -3.347   39   .002              -.26875           -.4312    -.1063
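The link between the t statistic, the mean difference and the confidence interval in the One-Sample Test output can be checked by hand. The sketch below is only an illustration: it backs the standard error out of the reported t value and assumes a two-sided 5% critical value of 2.0227 for 39 degrees of freedom, taken from a t-table rather than from the slides.

```python
# Values reported in the One-Sample Test output (Test Value = 4).
t_stat = -3.347
df = 39
mean_diff = -0.26875

# Assumption: two-sided 5% critical value of t with 39 df, from a t-table.
T_CRIT_39 = 2.0227

# t = mean difference / standard error, so the standard error can be backed out.
std_error = mean_diff / t_stat

ci_lower = mean_diff - T_CRIT_39 * std_error
ci_upper = mean_diff + T_CRIT_39 * std_error

# The test is significant at the 5% level when the interval excludes zero.
excludes_zero = not (ci_lower <= 0.0 <= ci_upper)

print(f"SE = {std_error:.4f}")
print(f"95% CI = ({ci_lower:.4f}, {ci_upper:.4f})")
print("CI excludes zero:", excludes_zero)
```

The limits agree with the reported interval to rounding, and because the interval excludes zero the sample mean differs significantly from 4.0.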
Paired t-test:
• One of the most common experimental designs is the "pre-post"
design.
• A study of this type often consists of two measurements taken on
the same subject, one before and one after the introduction of a
treatment or a stimulus.
• The basic idea is simple. If the treatment had no effect, the
average difference between the measurements is equal to 0 and the
null hypothesis holds.
Paired t-test cont'd…
• On the other hand, if the treatment did have an effect (intended or
unintended!), the average difference is not 0 and the null hypothesis
is rejected.
• The Paired-Samples T Test procedure is used to test the hypothesis
of no difference between the means of two paired variables.

Example (dietstudy)
• A physician is evaluating a new diet for her patients with a family
history of heart disease. To test the effectiveness of this diet, 16
patients are placed on the diet for 6 months.
• Their weights and triglyceride levels are measured before and after
the study, and the physician wants to know if either set of
measurements has changed.

• Use Paired-Samples T Test to determine whether there is a
statistically significant difference between the pre- and post-diet
weights and triglyceride levels of these patients.

• Select Triglyceride (tg0) and Final Triglyceride (tg4) as the first
set of paired variables.
• Select Weight and Final Weight as the second pair.
Paired Samples Test

                                             Paired Differences
                                                                                         95% CI of the Difference
                                             Mean     Std. Deviation   Std. Error Mean   Lower     Upper    t        df   Sig. (2-tailed)
Pair 1  Triglyceride - Final triglyceride    14.063   46.875           11.719            -10.915   39.040   1.200    15   .249
Pair 2  Weight - Final weight                8.063    2.886            .722              6.525     9.600    11.175   15   .000
• Since the significance value for change in weight is less than 0.05,
you can conclude that the average loss of 8.06 pounds per patient is
not due to chance variation, and can be attributed to the diet.
• However, the significance value of 0.249 (greater than 0.10) for
change in triglyceride level shows the diet did not significantly
reduce their triglyceride levels.
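The t statistics and confidence intervals in the Paired Samples Test output follow directly from the mean and standard deviation of the paired differences. The sketch below recomputes them for the 16 patients. The helper name `paired_summary` is hypothetical, and the critical value 2.1314 (two-sided 5%, 15 degrees of freedom) is an assumption taken from a t-table.

```python
import math

N_PATIENTS = 16
# Assumption: two-sided 5% critical value of t with 15 df, from a t-table.
T_CRIT_15 = 2.1314


def paired_summary(mean_diff, sd_diff, n, t_crit):
    """Return (standard error, t statistic, CI lower, CI upper)."""
    se = sd_diff / math.sqrt(n)
    t_stat = mean_diff / se
    return se, t_stat, mean_diff - t_crit * se, mean_diff + t_crit * se


# Mean and standard deviation of the paired differences, as reported in the output.
triglyceride = paired_summary(14.063, 46.875, N_PATIENTS, T_CRIT_15)
weight = paired_summary(8.063, 2.886, N_PATIENTS, T_CRIT_15)

for label, (se, t_stat, lo, hi) in [("Triglyceride", triglyceride), ("Weight", weight)]:
    print(f"{label}: SE = {se:.3f}, t = {t_stat:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The interval for weight lies entirely above zero, while the interval for triglyceride spans zero, which matches the two significance values.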

Two-sample t-test:
• First, there are a number of prerequisites that need to be met:
– Data for both groups must be metric (measured on an interval or
ratio scale).
– The distribution of the relevant variable in each population must
be reasonably Normal.
– The population standard deviations of the two variables concerned
should be approximately the same, but this requirement becomes less
important as sample sizes get larger. You can check this by examining
the two sample standard deviations.
Two-sample t-test: cont'd…
• Example:
• Suppose you want to compare the population mean birth weight of
infants born in a maternity unit with that of infants born at home,
by estimating the difference between them (sample data are given as
two sample test). The two samples were selected independently with no
attempt at matching.

• It is important to remember that a difference between two sample
values does not necessarily mean that there is a difference in the
two population values. Any difference in these sample birth weight
means might simply be due to chance.
• A low significance value for the t test (typically less than 0.05)
indicates that there is a significant difference between the two
group means.
Independent Samples Test (Hospiatlbirth)

                               Levene's Test for
                               Equality of Variances   t-test for Equality of Means
                                                                                                                           95% CI of the Difference
                               F      Sig.             t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower      Upper
Equal variances assumed        .039   .845             -.833   58       .409              -81.96667         98.45150                -279.039   115.10543
Equal variances not assumed                            -.833   57.967   .409              -81.96667         98.45150                -279.041   115.10779

• To find out whether there is a statistically significant difference
between two population means, calculate the 95 percent confidence
interval for the difference and see if it contains zero.

• Since this confidence interval includes zero, you can conclude that
there is no statistically significant difference in population mean
birth weights of infants born in a maternity unit and infants born at
home.
• If the significance value for the Levene test is high (typically
greater than 0.05), use the results that assume equal variances for
both groups.
• If the significance value for the Levene test is low, use the
results that do not assume equal variances for both groups.
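The two decision rules on this slide (choose the output row from Levene's test, then check whether the confidence interval contains zero) can be written as a short sketch. The function names `choose_row` and `ci_contains_zero` are hypothetical. The numbers are those reported in the Independent Samples Test output.

```python
ALPHA = 0.05

# Rows of the Independent Samples Test output, keyed by variance assumption.
rows = {
    "equal variances assumed": {"ci": (-279.039, 115.10543), "sig": 0.409},
    "equal variances not assumed": {"ci": (-279.041, 115.10779), "sig": 0.409},
}
levene_sig = 0.845


def choose_row(levene_p, alpha=ALPHA):
    """Pick the output row according to the Levene significance value."""
    if levene_p > alpha:
        return "equal variances assumed"
    return "equal variances not assumed"


def ci_contains_zero(ci):
    """True when the interval for the mean difference includes zero."""
    lower, upper = ci
    return lower <= 0.0 <= upper


chosen = choose_row(levene_sig)
no_difference = ci_contains_zero(rows[chosen]["ci"])
verdict = "no significant difference" if no_difference else "significant difference"
print(f"{chosen}: {verdict}")
```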
– A t-distribution can be used for testing
hypotheses about differences of means for
independent samples if both populations are
normal and have the same variances.
– However, the usual two-sample t-test cannot be
applied when more complex sets of data
comprising more than two groups are considered.
In this regard, one-way analysis of variance
(ANOVA) is used to compare the means of several
groups.

• Comparison of several means - Analysis of variance
– One-way analysis of variance is used when there is a single way of
classifying individuals, that is, when the subgroups to be compared
are defined by just one factor.
E.g., say you are interested in comparing the blood pressure level of
three groups of patients who take three different treatments. There
is only one grouping (type of treatment administered) that you are
using to define the groups.
– When there are two factors classifying the observations we need
two-way analysis of variance, and so on.
Dependent variable ===> blood pressure (outcome)
Independent variable ===> treatment (factor)

Hypothesis of ANOVA:
– Test the null hypothesis that the population means of two or more
groups are equal (that is, the group means are not significantly
different).
One-Way ANOVA also offers:
– Group-level statistics for the dependent variable
– A test of variance equality
– A plot of group means
– Pairwise multiple comparisons, which describe the nature of the
group differences
– One-way analysis of variance is based on assessing
how much of the overall variation in the data is
attributable to differences between the group
means, and comparing this with the amount
attributable to differences between individuals in
the same group.
– The calculations for one way ANOVA are
expressed in relation to the sum of the
observations in each sample.

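– Suppose we have k samples of observations, with $n_i$ observations
in the ith sample, and let $y_{ij}$ be the jth observation in the ith
sample. We then calculate:
– $\bar{y}_i$ = mean of the observations in the ith group
– $T$ = sum of all observations = $\sum_{i=1}^{k} n_i \bar{y}_i = \sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij}$
– $S$ = sum of squares of all observations = $\sum_{i=1}^{k} \sum_{j=1}^{n_i} y_{ij}^2$
– $N$ = total number of observations = $\sum_{i=1}^{k} n_i$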

• One-way ANOVA partitions the total sum of squares (SST) into two
distinct components:
– The sum of squares due to differences between the group means
(SSB).
– The sum of squares due to differences among the observations within
each group (SSW). This is also called the residual or unexplained sum
of squares.
SST = SSB + SSW

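– SSB = total sum of squared deviations of the group means
$\bar{y}_i$ about the grand mean $\bar{y}$, each weighted by its group
size $n_i$:
$SSB = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2$
– SSW = total sum of squared deviations of each observation $y_{ij}$
(the jth observation in group i) about its group mean:
$SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$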

SST = total sum of squared deviations of each observation $y_{ij}$
(the jth observation in group i) about the grand mean $\bar{y}$:
$SST = SSB + SSW = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$
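The partition of the total sum of squares can be checked numerically from the three definitions. The sketch below uses three small groups of made-up numbers (hypothetical data, not from the slides) and confirms that the between-groups and within-groups sums of squares add up to the total.

```python
from statistics import mean

# Hypothetical data: three small groups, made up for illustration only.
groups = [[4.0, 5.0, 6.0], [7.0, 9.0, 8.0, 8.0], [1.0, 2.0, 3.0]]

all_obs = [y for g in groups for y in g]
grand_mean = mean(all_obs)

# Between groups: group means about the grand mean, weighted by group size.
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within groups: each observation about its own group mean.
ssw = sum((y - mean(g)) ** 2 for g in groups for y in g)
# Total: each observation about the grand mean.
sst = sum((y - grand_mean) ** 2 for y in all_obs)

print(f"SSB = {ssb:.3f}, SSW = {ssw:.3f}, SST = {sst:.3f}")
print("SST equals SSB + SSW:", abs(sst - (ssb + ssw)) < 1e-9)
```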

• The sums of squares for one-way ANOVA are given as follows:

Source of variation            Sum of squares
Between groups (Explained)     $SSB = \sum_{i=1}^{k} n_i \bar{y}_i^2 - T^2/N$
Within groups (Unexplained)    $SSW = S - \sum_{i=1}^{k} n_i \bar{y}_i^2$
Total                          $SST = S - T^2/N$  (= SSB + SSW)
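These computational formulas need only the group totals, T, S and N. As a check, the sketch below applies them to the red cell folate data that appear in Example 1 of this document. The variable names are illustrative.

```python
# Red cell folate levels (µg/l) from Example 1 in this document.
groups = {
    "I": [243, 251, 275, 291, 347, 354, 380, 392],
    "II": [206, 210, 226, 249, 255, 273, 285, 295, 309],
    "III": [241, 258, 270, 293, 328],
}

N = sum(len(g) for g in groups.values())             # total number of observations
T = sum(sum(g) for g in groups.values())             # sum of all observations
S = sum(y * y for g in groups.values() for y in g)   # sum of squares of all observations

# Sum over groups of n_i * mean_i^2, written as (group total)^2 / n_i.
weighted_means_sq = sum(sum(g) ** 2 / len(g) for g in groups.values())

ssb = weighted_means_sq - T ** 2 / N
ssw = S - weighted_means_sq
sst = S - T ** 2 / N

print(f"N = {N}, T = {T}, S = {S}")
print(f"SSB = {ssb:.3f}, SSW = {ssw:.3f}, SST = {sst:.3f}")
```

The three values agree with the sums of squares in the SPSS ANOVA output for Example 1.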
– The significance test for differences between the groups is based
on a comparison of the between-groups and within-groups mean squares.
– If the observed differences between the means of the groups are
simply due to chance variation, the variation between these group
means will be about the same as the variation among individuals
within the same group.

• If there are real differences, the between-groups variation will be
larger. The mean squares are compared using the F-test. This test is
sometimes known as the variance-ratio test.
F = Between-groups mean square / Within-groups mean square
Df between-groups = k-1
Df within-groups = N-k
where:
N is the total number of observations and
k is the number of groups.
The one-way ANOVA table looks like the following:

Source of variation   DF     SS     Mean square     F                                P
Between groups        k-1    SSB    SSB / (k-1)     [SSB / (k-1)] / [SSW / (N-k)]
Within groups         N-k    SSW    SSW / (N-k)
Total                 N-1    SST
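The table layout translates into a few lines of arithmetic. The sketch below fills in the degrees of freedom, mean squares and F from two sums of squares, using the values reported for Example 1 in this document. The function name `anova_table` is hypothetical, and the P column is left out because it requires the F distribution.

```python
def anova_table(ssb, ssw, k, n_total):
    """Fill in the one-way ANOVA table from the two sums of squares."""
    df_between = k - 1
    df_within = n_total - k
    ms_between = ssb / df_between
    ms_within = ssw / df_within
    return {
        "df_between": df_between,
        "df_within": df_within,
        "df_total": n_total - 1,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "sst": ssb + ssw,
        "F": ms_between / ms_within,
    }


# Sums of squares reported for Example 1 (k = 3 groups, N = 22 patients).
table = anova_table(ssb=15515.766, ssw=39716.097, k=3, n_total=22)
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```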

• Assumptions
– The data are normally distributed; that is, the samples have to come
from Normally distributed populations.
– The population value for the standard deviation between individuals
is the same for each group (equal variance).
– Moderate departures from normality and unequal standard deviations
may be safely ignored. Otherwise, transforming the data or using an
assumption-free (non-parametric) method may be useful.
• Example 1
Twenty-two patients undergoing cardiac bypass surgery were randomized
to one of three ventilation groups:
Group I: Patients received a 50% nitrous oxide and 50% oxygen mixture
continuously for 24 hours;
Group II: Patients received a 50% nitrous oxide and 50% oxygen mixture
only during the operation;
Group III: Patients received no nitrous oxide but received 35-50%
oxygen for 24 hours.
– The table below shows red cell folate levels for the three groups
after 24 hours' ventilation. We wish to compare the three groups, and
test the null hypothesis that the three groups have the same mean red
cell folate level.
– Examination of the data does not reveal any obvious outliers, and
the data in each group look like plausible samples from a Normal
distribution. The standard deviation in group I is rather higher than
those in the other groups, but moderate differences in variability
are not a problem.
– Levene's test is useful for assessing the null hypothesis that two
or more samples come from populations with the same variance. Some
computer programs incorporate this test (e.g., SPSS).
– Bartlett's test is used in Stata.

Test of Homogeneity of Variances
Red cell folate levels (µg/l)

Levene Statistic   df1   df2   Sig.
3.823              2     19    .040

The significance value is less than 0.05, suggesting that the
variances for the three groups are not equal and the equal-variance
assumption is not met.
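One common form of the Levene statistic is a one-way ANOVA F computed on the absolute deviations of each observation from its group mean. The sketch below assumes that mean-centred form and applies it to the red cell folate data of Example 1. The function name `levene_statistic` is hypothetical, and the significance value is not computed because it requires the F distribution.

```python
from statistics import mean

# Red cell folate levels (µg/l) from Example 1 in this document.
groups = [
    [243, 251, 275, 291, 347, 354, 380, 392],       # Group I
    [206, 210, 226, 249, 255, 273, 285, 295, 309],  # Group II
    [241, 258, 270, 293, 328],                      # Group III
]


def levene_statistic(samples):
    """Mean-centred Levene statistic: one-way ANOVA F on |y - group mean|."""
    # Absolute deviation of every observation from its own group mean.
    z = [[abs(y - mean(g)) for y in g] for g in samples]
    k = len(z)
    n_total = sum(len(g) for g in z)
    grand = mean([v for g in z for v in g])
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in z)
    ss_within = sum((v - mean(g)) ** 2 for g in z for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


w = levene_statistic(groups)
df1 = len(groups) - 1
df2 = sum(len(g) for g in groups) - len(groups)
print(f"Levene statistic = {w:.3f} on ({df1}, {df2}) df")
```

Under this assumption the statistic agrees with the value in the Test of Homogeneity of Variances output.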
Example 1: Red cell folate levels (μg/l) in three groups of cardiac
bypass patients given different levels of nitrous oxide ventilation
(Amess et al., 1978)

        Group I   Group II   Group III
        (n=8)     (n=9)      (n=5)
        243       206        241
        251       210        258
        275       226        270
        291       249        293
        347       255        328
        354       273
        380       285
        392       295
                  309
Mean    316.6     256.4      278.0
SD      58.7      37.1       33.8
• Hypotheses
– Ho : μ1 = μ2 = μ3, that is, the population means of the three groups
are equal.
– HA : Differences exist between at least some of the group means.

• ANOVA table

ANOVA
Red cell folate levels (µg/l)

                  Sum of Squares   df   Mean Square   F       Sig.
Between Groups    15515.766        2    7757.883      3.711   .044    (explained by the model)
Within Groups     39716.097        19   2090.321                      (unexplained variation)
Total             55231.864        21

Since the P value is less than 0.05, the null hypothesis is rejected.
This is a global test.

• Pairwise comparisons of group means
– One-way ANOVA is an extension of the two-sample t test. When there
are only two groups, the F value, with (1, N-2) degrees of freedom,
will be the square of the corresponding t value. Remember the degrees
of freedom for the two-sample t test is N-2.
– With two groups the interpretation of a significant difference is
reasonably straightforward, but how do we interpret significant
variation among the means of three or more groups?
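The relation F = t² for two groups can be verified directly. The sketch below is an illustration only: it takes groups I and II of the red cell folate data from Example 1 as a two-group case, computes the pooled-variance two-sample t statistic and the one-way ANOVA F statistic, and compares them.

```python
import math
from statistics import mean, variance

# Groups I and II of the red cell folate data (Example 1), used as a two-group case.
a = [243, 251, 275, 291, 347, 354, 380, 392]
b = [206, 210, 226, 249, 255, 273, 285, 295, 309]
n_a, n_b = len(a), len(b)
n_total = n_a + n_b

# Two-sample t statistic with a pooled variance on N-2 degrees of freedom.
pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_total - 2)
t_stat = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# One-way ANOVA F statistic for the same two groups, on (1, N-2) df.
grand_mean = mean(a + b)
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in (a, b))
ssw = sum((y - mean(g)) ** 2 for g in (a, b) for y in g)
f_stat = (ssb / 1) / (ssw / (n_total - 2))

print(f"t = {t_stat:.4f}, t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```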
– Further analysis is required to find out how the
means differ, for example, whether one group
differs from all the others.
– It should be noted that pairwise comparisons should be carried out
only when the overall comparison of groups in the analysis of
variance is significant. This is called post hoc multiple comparison.
– With k groups, there are ½k(k-1) possible pairwise comparisons of
group means.

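If we perform k pairwise comparisons (here k is the number of
comparisons, not the number of groups), then we should multiply the P
value obtained from each test by k; that is, we calculate P' = kP
with the restriction that P' cannot exceed 1. This is the Bonferroni
correction.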
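The adjustment P' = kP, capped at 1, takes only a few lines. In the sketch below the function names `n_pairwise` and `bonferroni` are hypothetical, and the unadjusted P values are made up for illustration.

```python
def n_pairwise(groups):
    """Number of possible pairwise comparisons among `groups` group means."""
    return groups * (groups - 1) // 2


def bonferroni(p_values):
    """Multiply each P value by the number of comparisons, capped at 1."""
    k = len(p_values)
    return [min(1.0, k * p) for p in p_values]


# Hypothetical unadjusted P values for the 3 comparisons among 3 groups.
raw_p = [0.010, 0.200, 0.500]
adjusted = bonferroni(raw_p)

print("comparisons among 3 groups:", n_pairwise(3))
for p, p_adj in zip(raw_p, adjusted):
    print(f"P = {p:.3f} -> P' = {p_adj:.3f}")
```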

The Post Hoc tests are divided into two sets:
– The first set assumes groups with equal variances.
– The second set does not assume that the variances are equal.
Do post hoc tests for Example 1 above using:
– Bonferroni method
– Scheffe method

Post hoc test result (Bonferroni)
Multiple Comparisons
Dependent Variable: Red cell folate levels (µg/l)

                                      Mean Difference                        95% Confidence Interval
             (I) group   (J) group    (I-J)             Std. Error   Sig.    Lower Bound   Upper Bound
Bonferroni   1.00        2.00         60.18056*         22.21594     .042    1.8614        118.4998
                         3.00         38.62500          26.06443     .464    -29.7969      107.0469
             2.00        1.00         -60.18056*        22.21594     .042    -118.4998     -1.8614
                         3.00         -21.55556         25.50141     1.000   -88.4995      45.3884
             3.00        1.00         -38.62500         26.06443     .464    -107.0469     29.7969
                         2.00         21.55556          25.50141     1.000   -45.3884      88.4995
*. The mean difference is significant at the .05 level.

A. Groups I and II
– The p-value = 0.042
– 95% CI = (1.86, 118.50)
B. Groups I and III
– The p-value = 0.464
– 95% CI = (-29.80, 107.05)
C. Groups II and III
– The p-value = 1.00
– 95% CI = (-88.50, 45.39)
Therefore, the main explanation for the difference between the groups
that was identified in the analysis of variance is the difference
between groups I and II.
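The mean differences and standard errors in the Multiple Comparisons output can be reproduced from the raw data. The sketch below assumes each standard error is built from the pooled within-groups mean square, sqrt(MSW × (1/n_i + 1/n_j)). The adjusted significance values and interval limits are not recomputed because they need t-distribution quantiles.

```python
import math
from itertools import combinations
from statistics import mean

# Red cell folate levels (µg/l) from Example 1, keyed by group number.
groups = {
    1: [243, 251, 275, 291, 347, 354, 380, 392],
    2: [206, 210, 226, 249, 255, 273, 285, 295, 309],
    3: [241, 258, 270, 293, 328],
}

n_total = sum(len(g) for g in groups.values())
k = len(groups)

# Pooled within-groups mean square from the one-way ANOVA.
ssw = sum((y - mean(g)) ** 2 for g in groups.values() for y in g)
ms_within = ssw / (n_total - k)

comparisons = {}
for i, j in combinations(groups, 2):
    diff = mean(groups[i]) - mean(groups[j])
    se = math.sqrt(ms_within * (1 / len(groups[i]) + 1 / len(groups[j])))
    comparisons[(i, j)] = (diff, se)
    print(f"Group {i} vs {j}: mean difference = {diff:.5f}, std. error = {se:.5f}")
```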
Post hoc test result (Scheffe)
Multiple Comparisons
Dependent Variable: Red cell folate levels (µg/l)

                                      Mean Difference                        95% Confidence Interval
             (I) group   (J) group    (I-J)             Std. Error   Sig.    Lower Bound   Upper Bound
Scheffe      1.00        2.00         60.18056*         22.21594     .045    1.2192        119.1420
                         3.00         38.62500          26.06443     .354    -30.5503      107.8003
             2.00        1.00         -60.18056*        22.21594     .045    -119.1420     -1.2192
                         3.00         -21.55556         25.50141     .704    -89.2366      46.1255
             3.00        1.00         -38.62500         26.06443     .354    -107.8003     30.5503
                         2.00         21.55556          25.50141     .704    -46.1255      89.2366
*. The mean difference is significant at the .05 level.
