11: Analysis of Variance (ANOVA)

Each possible value of a factor or combination of factors is a treatment.

The F Distribution
Test if each factor has a significant effect on Y:
H0: m1 = m2 = m3 = … = mc
H1: Not all the means are equal
If we cannot reject H0, we conclude that observations within each treatment have the same mean m.

ANOVA Assumptions
Observations on Y are independent
Populations being sampled are normal
Populations being sampled have equal variances

One-factor ANOVA as a Linear Model
yij comes from a population with a common mean (m) plus a treatment effect (Aj) plus random error (eij):
yij = m + Aj + eij (j = 1, 2, …, c and i = 1, 2, …, n)
Random error is assumed to be normally distributed with zero mean and the same variance.
Testing hypotheses:
H0: A1 = A2 = … = Ac = 0
H1: Not all Aj are zero
If H0 is true, then the ANOVA model is: yij = m + eij
The same mean in all groups, or no factor effect.

Decomposition of Variation
One-factor ANOVA: Group and Grand Means
The mean of each group (group mean) and the mean of all observations pooled together (grand mean) are used to decompose the variation.
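As a quick illustration of group and grand means (the group labels and data here are made up), both can be computed in a few lines:

```python
# Hypothetical data: observations for c = 3 treatment groups (made-up numbers).
groups = {
    "A": [4.0, 6.0, 5.0],
    "B": [7.0, 9.0, 8.0],
    "C": [5.0, 5.0, 8.0],
}

# Group mean: the mean of the observations within each treatment group.
group_means = {g: sum(ys) / len(ys) for g, ys in groups.items()}

# Grand mean: the mean of all observations pooled together.
all_ys = [y for ys in groups.values() for y in ys]
grand_mean = sum(all_ys) / len(all_ys)

print(group_means)   # {'A': 5.0, 'B': 8.0, 'C': 6.0}
print(grand_mean)    # 6.333...
```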
F Statistics
The F statistic is the ratio of the variance due to treatments (MSA) to the variance due to error (MSE):
F = MSA / MSE
When F is near zero, there is little difference among treatments and we would not reject H0.
Decision Rule: Reject H0 if F > Fα, otherwise do not reject H0.

Partition of Deviations
For a given observation yij, the following relationship holds:
yij − ȳ = (ȳj − ȳ) + (yij − ȳj)
where ȳ is the grand mean and ȳj is the mean of group j.

Partitioned Sum of Squares
Squaring and summing the deviations partitions the total variation:
SST = SSA + SSE

Hypothesis Testing
SSA and SSE are used to test the hypothesis of equal means by dividing each sum of squares by its degrees of freedom.
These ratios are called Mean Squares (MSA and MSE).

Tukey's Test
Do after rejection of equal means in ANOVA.
Tells which population means are significantly different, e.g.: μ1 = μ2 ≠ μ3.
Tukey's studentized range test is a multiple comparison test.
For c groups, there are c(c – 1)/2 distinct pairs of means to be compared.
Tukey's is a two-tailed test for equality of paired means from c groups compared simultaneously.
The hypotheses are H0: μj = μk versus H1: μj ≠ μk for each pair of groups j and k.
Decision Rule: Reject H0 if Tcalc > Tc,n−c, where Tc,n−c is a critical value of the Tukey test statistic Tcalc for the desired level of significance (5% critical values of the Tukey test statistic are tabulated).

ANOVA Assumption
ANOVA assumes that observations on the response variable are from normally distributed populations that have the same variance.
The one-factor ANOVA test is only slightly affected by inequality of variance when group sizes are equal.
One can test this assumption of homogeneous variances by using Hartley's Fmax Test.
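The one-factor F statistic described above can be computed directly from the sums of squares. A minimal sketch with made-up data (the critical value Fα would normally come from an F table):

```python
# Hypothetical data: n = 3 observations in each of c = 3 treatment groups.
groups = [
    [4.0, 6.0, 5.0],
    [7.0, 9.0, 8.0],
    [5.0, 5.0, 8.0],
]

c = len(groups)                          # number of treatments
n_total = sum(len(g) for g in groups)    # total number of observations
grand = sum(sum(g) for g in groups) / n_total

# SSA: between-treatment sum of squares; SSE: within-treatment (error) sum of squares.
SSA = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
SSE = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)

MSA = SSA / (c - 1)          # mean square due to treatments
MSE = SSE / (n_total - c)    # mean square due to error
F = MSA / MSE                # reject H0 if F exceeds the critical value F_alpha

print(F)                     # about 4.2 for this data
```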
Hartley’s Test Two-factor ANOVA SST = SSA + SSB + SSE
Hypothesis Without Replication SST = Total sum of squared deviations
Two factor A and B may affect Y about the mean
factor A has r levels, factor B has c SSA = Between rows sum of squares
levels (effects of factor A)
all levels of both factors occur, and each SSB = Between columns sum of
The test statistic is the ratio of the
cell contains one observation squares (effects of factor B)
largest sample variance to the smallest
sample variance SSE = Error sum of squares
The decision rule: Linear Model of Two-factor
Reject H if ANOVA with Replication
0
H >H
calc critical
Levente’s Test
Levene’s test is a more robust
alternative to Hartley’s F test.
SST = SSA + SSB + SSI + SSE
Levene’s test does not assume a
SST = Total sum of squared deviations about
normal distribution. the mean
It is based on the distances of the SSA = Between rows sum of squares (effects of
observations from their sample factor A)
medians rather than their sample SSB = Between columns sum of squares
means. (effects of factor B)
Total Sum Squares can now be SSI = Interaction sum of squares (effects of AB)
split into three parts: SSE = Error sum of squares
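As a rough check (the r × c table below is made up), the without-replication decomposition SST = SSA + SSB + SSE can be verified numerically:

```python
# Hypothetical data: r = 3 row levels (factor A), c = 4 column levels (factor B),
# one observation per cell (no replication).
table = [
    [10.0, 12.0, 11.0, 13.0],
    [14.0, 15.0, 13.0, 16.0],
    [ 9.0, 11.0, 10.0, 12.0],
]
r, c = len(table), len(table[0])

grand = sum(sum(row) for row in table) / (r * c)
row_means = [sum(row) / c for row in table]
col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]

SST = sum((y - grand) ** 2 for row in table for y in row)
SSA = c * sum((m - grand) ** 2 for m in row_means)    # factor A (rows)
SSB = r * sum((m - grand) ** 2 for m in col_means)    # factor B (columns)
SSE = sum((table[i][j] - row_means[i] - col_means[j] + grand) ** 2
          for i in range(r) for j in range(c))

print(SST, SSA + SSB + SSE)   # the two totals agree
```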
12: Simple Linear Regression

Regression analysis is used to:
Predict the value of a dependent variable based on the value of independent variable(s)
Explain the impact of changes in an independent variable on the dependent variable

Linear Regression Model
yi = β0 + β1xi + εi
Where β0 and β1 are the population model coefficients and εi is a random error term.

Least Squares Estimators
b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared differences between y and ŷ:
min SSE = min Σei² = min Σ(yi − ŷi)² = min Σ[yi − (b0 + b1xi)]²
Differential calculus is used to obtain the coefficient estimators b0 and b1 that minimize SSE.

Coefficient of Determination, R²
The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable.
The coefficient of determination is also called R-squared and is denoted as R².
R² = SSR / SST = 1 − SSE / SST
0 ≤ R² ≤ 1
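A minimal least-squares fit on made-up data, using the usual closed-form estimators obtained from setting the SSE derivatives to zero, and computing R² from the formula above:

```python
# Hypothetical data (made up); y is roughly linear in x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form least-squares estimators b1 and b0.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

y_hat = [b0 + b1 * x for x in xs]
SSE = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # unexplained variation
SST = sum((y - y_bar) ** 2 for y in ys)                # total variation
R2 = 1 - SSE / SST

print(b0, b1, R2)   # b0 ≈ 0.15, b1 ≈ 1.95, R2 close to 1
```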
Measures of Variation
σ̂² = se² = Σei² / (n − 2) = SSE / (n − 2)
Division by n – 2 instead of n – 1 is
because the simple regression model