Example 1: A marketing research firm tests the effectiveness of three new flavorings
for a leading beverage using a sample of 30 people, divided randomly into three groups
of 10 people each. Group 1 tastes flavor 1, group 2 tastes flavor 2 and group 3 tastes
flavor 3. Each person is then given a questionnaire which evaluates how enjoyable the
beverage was. The scores are as in Figure 1. Determine whether there is a perceived
significant difference between the three flavorings.
Our null hypothesis is that any difference between the three flavors is due to chance.
H0: μ1 = μ2 = μ3
We interrupt the analysis of this example to give some background, after which we will
resume the analysis.
Definition 1: Suppose we have k samples, which we will call groups (or treatments);
these are the columns in our analysis (corresponding to the 3 flavors in the above
example). We will use the index j for these. Each group consists of a sample of size nj.
The sample elements are the rows in the analysis. We will use the index i for these.
We will use the abbreviation x̄j for the mean of the jth group sample (called the group
mean) and x̄ for the mean of the total sample (called the total or grand mean).
SST is the sum of squares for the total sample, i.e. the sum of the squared deviations
from the grand mean. SSW is the sum of squares within the groups, i.e. the sum over all
groups of the squared deviations of the data elements from their group means. SSB is the
sum of the squares between the group sample means, i.e. the weighted sum of the squared
deviations of the group means from the grand mean.
Where n = n1 + ⋯ + nk, x̄j = (Σi xij)/nj and x̄ = (Σj Σi xij)/n, and so
SST = Σj Σi (xij – x̄)²   SSW = Σj Σi (xij – x̄j)²   SSB = Σj nj (x̄j – x̄)²
Summarizing: dfT = n – 1, dfW = n – k, dfB = k – 1, and MST = SST/dfT, MSW = SSW/dfW,
MSB = SSB/dfB.
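As a concrete illustration of these definitions, here is a short Python sketch (the data is hypothetical, not the scores from Figure 1):

```python
# Hypothetical scores for k = 3 groups of 5 (not the data from Figure 1),
# used only to illustrate the definitions of SST, SSW and SSB above.
groups = [
    [13, 17, 19, 11, 20],
    [12, 8, 6, 16, 12],
    [7, 19, 15, 14, 10],
]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)
group_means = [sum(g) / len(g) for g in groups]

# SST: squared deviations of every observation from the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_scores)

# SSW: squared deviations of each observation from its own group mean
ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

# SSB: squared deviations of the group means from the grand mean,
# weighted by the group sizes
ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

print(round(sst, 4), round(ssw + ssb, 4))  # the two values agree
```

Note that sst comes out equal to ssw + ssb, which is exactly the decomposition used later in the text (Property 2).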
Observation: Clearly MST is the variance for the total sample. MSW is the weighted
average of the group sample variances (using the group df as the weights). MSB is the
size-weighted variance of the group means about the grand mean, i.e. MSB =
(Σj nj (x̄j – x̄)²)/(k – 1).
Property 2: SST = SSW + SSB, and correspondingly dfT = dfW + dfB.
Definition 2: Using the terminology from Definition 1, we define the structural model
as follows. First we estimate the group means from the total mean: μj = μ + αj, where αj
denotes the effect of the jth group (i.e. the departure of the jth group mean from the total
mean). We have a similar estimate for the sample: x̄j = x̄ + aj.
Similarly, we can represent each element in the sample as xij = μ + αj + εij where εij
denotes the error for the ith element in the jth group. As before we have the sample
version xij = x̄ + aj + eij where eij is the counterpart to εij in the sample.
Also εij = xij – (μ + αj) = xij – μj and similarly, eij = xij – (x̄ + aj) = xij – x̄j.
Observation: Since Σi eij = Σi (xij – x̄j) = nj x̄j – nj x̄j = 0, it follows that
Σj Σi eij = 0, and so ēj = 0 for any j, as well as ē = 0.
If all the groups are equal in size, say nj = m for all j, then x̄ = (Σj x̄j)/k,
i.e. the mean of the group means is the total mean. Also Σj aj = Σj (x̄j – x̄) = 0.
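These identities are easy to confirm numerically; a minimal Python check on hypothetical equal-size groups:

```python
# Checks the structural-model identities above on hypothetical data
# (k = 3 groups of m = 4 observations each; equal sizes).
groups = [
    [10.0, 12.0, 9.0, 13.0],
    [14.0, 15.0, 13.0, 18.0],
    [8.0, 7.0, 11.0, 10.0],
]

k = len(groups)
all_x = [x for g in groups for x in g]
grand_mean = sum(all_x) / len(all_x)
group_means = [sum(g) / len(g) for g in groups]

# The residuals e_ij = x_ij - x̄_j sum to zero within every group ...
for g, m in zip(groups, group_means):
    assert abs(sum(x - m for x in g)) < 1e-9

# ... with equal group sizes the mean of the group means is the total mean ...
assert abs(sum(group_means) / k - grand_mean) < 1e-9

# ... and the group effects a_j = x̄_j - x̄ sum to zero
assert abs(sum(m - grand_mean for m in group_means)) < 1e-9
```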
Property 3: SSW = Σj Σi eij², i.e. SSW is the sum of the squared error terms eij.
Observation: MSB is a measure of variability of the group means around the total mean.
MSW is a measure of the variability of each group around its mean, and, by Property 3,
can be considered a measure of the total variability due to error. For this reason, we will
sometimes replace MSW, SSW and dfW by MSE, SSE and dfE.
In fact, SSE = Σj Σi eij² = SSW, dfE = n – k = dfW and MSE = SSE/dfE = MSW.
If the null hypothesis is true then MSW and MSB are both measures of the same error and
so we should expect F = MSB / MSW to be around 1. If the null hypothesis is false we
expect that F > 1 since MSB will estimate the same quantity as MSW plus group effects.
In conclusion, if the null hypothesis is true, and so the population means μj for the k
groups are equal, then any variability of the group means around the total mean is due to
chance and can also be considered error.
Thus the null hypothesis becomes equivalent to H0: σB = σW (or in the one-tail test, H0:
σB ≤ σW). We can therefore use the F-test (see Two Sample Hypothesis Testing of
Variances) to determine whether or not to reject the null hypothesis.
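As a sketch of this F test in Python (toy hypothetical data chosen so every intermediate value can be checked by hand; this is an illustration, not the Excel workflow used in the examples):

```python
# A minimal pure-Python sketch of the F ratio described above.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]   # hypothetical toy data

k = len(groups)                      # number of groups: 3
n = sum(len(g) for g in groups)      # total sample size: 9
grand_mean = sum(x for g in groups for x in g) / n      # 3.0
group_means = [sum(g) / len(g) for g in groups]         # 2, 3, 4

# SSB = 3*(2-3)^2 + 3*(3-3)^2 + 3*(4-3)^2 = 6, with dfB = k - 1 = 2
ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
# SSW = 2 + 2 + 2 = 6, with dfW = n - k = 6
ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

f = (ssb / (k - 1)) / (ssw / (n - k))   # F = MSB / MSW
print(f)  # -> 3.0
```

Since F = 3.0 is below Excel's 5% critical value FINV(.05, 2, 6) ≈ 5.14, the null hypothesis would not be rejected on this toy data.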
Returning to Example 1: under the null hypothesis the three group means are equal, and
as we can see from Figure 2, the group variances are roughly the same. Thus we can
apply Theorem 1. To
calculate F we first calculate SSB and SSW. Per Definition 1, SSW is the sum of the group
SSj (located in cells J7:J9). E.g. SS1 (in cell J7) can be calculated by the formula
=DEVSQ(A4:A13). SSW (in cell F14) can therefore be calculated by the formula
=SUM(J7:J9).
The formula =DEVSQ(A4:C13) can be used to calculate SST (in cell F15), and then per
Property 2, SSB = SST – SSW = 492.8 – 415.4 = 77.4. By Definition 1, dfT = n – 1 = 30 –
1 = 29, dfB = k – 1 = 3 – 1 = 2 and dfW = n – k = 30 – 3 = 27. Each SS value can be
divided by the corresponding df value to obtain the MS values in cells H13:H15. F is
then MSB / MSW = 38.7/15.4 ≈ 2.5. We now test F as we did in Two Sample Hypothesis
Testing of Variances, namely via the p-value (p-value = FDIST(2.5, 2, 27) > .05 = α) or
via the critical value (F = 2.5 < FINV(.05, 2, 27) = Fcrit).
Either of these shows that we can’t reject the null hypothesis that all the means are
equal.
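The arithmetic of this step can be reproduced with a few lines of Python, starting from the SS values reported above:

```python
# SS values from Example 1 (SST = 492.8 = SSB + SSW)
ssb, ssw = 77.4, 415.4
df_b, df_w = 2, 27           # dfB = k - 1 = 2, dfW = n - k = 30 - 3 = 27

msb = ssb / df_b             # 38.7
msw = ssw / df_w             # about 15.4
f = msb / msw                # about 2.5, matching the text
print(round(msb, 1), round(msw, 1), round(f, 2))  # -> 38.7 15.4 2.52
```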
As explained above, the null hypothesis can be expressed by H0: σB ≤ σW, and so the
appropriate F test is a one-tail test, which is exactly what FDIST and FINV provide.
We can also calculate SSB as the sum of the squared deviations of the group means from
the grand mean, where each squared deviation is weighted by the group size. Since all
the groups have the same size, this can be expressed as =DEVSQ(H7:H9)*F7.
SSB can also be calculated as =DEVSQ(G7:G9)/F7, applied to the group totals. This works
as long as all the groups have the same size.
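Both shortcut formulas are easy to verify with a small Python sketch (devsq here mirrors Excel's DEVSQ; the data is hypothetical):

```python
def devsq(values):
    """Sum of squared deviations from the mean (like Excel's DEVSQ)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

# Hypothetical equal-size groups (m = 3 elements each)
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
m = len(groups[0])

group_means = [sum(g) / len(g) for g in groups]    # 2, 3, 4
group_totals = [sum(g) for g in groups]            # 6, 9, 12
grand_mean = sum(x for g in groups for x in g) / (m * len(groups))

# SSB from the definition: size-weighted squared deviations of the means
ssb = sum(m * (gm - grand_mean) ** 2 for gm in group_means)

assert abs(ssb - devsq(group_means) * m) < 1e-9    # =DEVSQ(means)*size
assert abs(ssb - devsq(group_totals) / m) < 1e-9   # =DEVSQ(totals)/size
print(ssb)  # -> 6.0
```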
Excel Data Analysis Tool: Excel’s Anova: Single Factor data analysis tool can also be
used to perform analysis of variance. We show the output for this tool in Example 2
below.
The Real Statistics Resource Pack also contains a similar supplemental data analysis
tool which provides additional information. We show how to use this tool in Example 1
of Confidence Interval for ANOVA.
Example 2: A school district uses four different methods of teaching their students how
to read and wants to find out if there is any significant difference between the reading
scores achieved using the four methods. It creates a sample of 8 students for each of the
four methods. The reading scores achieved by the participants in each group are as
follows:
Figure 3 – Data and output from Anova: Single Factor data analysis tool
This time the p-value = .04466 < .05 = α, and so we reject the null hypothesis, and
conclude that there are significant differences between the methods (i.e. all four
methods don’t have the same mean).
Note that although the variances are not the same, as we will see shortly, they are close
enough to use ANOVA.
Observation: In both ANOVA examples, all the group sizes were equal. This doesn’t
have to be the case, as we see from the following example.
Example 3: Repeat the analysis from Example 2, where the last participant in group 1 and
the last two participants in group 4 left the study before their reading tests were
recorded.
Using Excel’s data analysis tool we see that p-value = .07276 > .05 = α, and so we
cannot reject the null hypothesis; i.e. we cannot conclude that there is a significant
difference between the means of the four methods.
From Figure 6, we see that we obtain a value for MSW in Example 3 of 177.1655, which
is the same value that we obtained in Figure 5.
We first find the total mean (the value in cell P10 of Figure 7), which can be calculated
either as =AVERAGE(A4:D11) from Figure 5 or =SUMPRODUCT(O6:O9,P6:P9)/O10
from Figure 7. We then calculate the square of the deviation of each group mean from
the total mean. E.g. for group 1, this value (located in cell Q6) is given by =(P6-P10)^2.
Finally, SSB can now be calculated as =SUMPRODUCT(O6:O9,Q6:Q9).
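The same SUMPRODUCT-style computation can be sketched in Python for unequal group sizes (hypothetical data, not the scores from Figure 7):

```python
# Hypothetical reading scores with unequal group sizes
groups = [
    [72, 80, 91, 66],        # n1 = 4
    [85, 79, 94, 88, 90],    # n2 = 5
    [61, 70, 75],            # n3 = 3
]

sizes = [len(g) for g in groups]
n = sum(sizes)
group_means = [sum(g) / len(g) for g in groups]

# Total mean as the size-weighted average of the group means,
# like =SUMPRODUCT(O6:O9,P6:P9)/O10
total_mean = sum(nj * m for nj, m in zip(sizes, group_means)) / n

# SSB as the size-weighted sum of squared deviations of the group means,
# like =SUMPRODUCT(O6:O9,Q6:Q9)
ssb = sum(nj * (m - total_mean) ** 2 for nj, m in zip(sizes, group_means))

# Sanity check via SST - SSW (Property 2)
sst = sum((x - total_mean) ** 2 for g in groups for x in g)
ssw = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
assert abs(ssb - (sst - ssw)) < 1e-6
```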
Real Statistics Functions: The Real Statistics Resource Pack contains the following
supplemental functions for the data in range R1:
Here b is an optional argument. When b = True (default) then the columns denote the
groups/treatments, while when b = False, the rows denote the groups/treatments. This
argument is not relevant for SSTot, dfTot and MSTot (since the result is the same in
either case).
Real Statistics Data Analysis Tool: As mentioned above, the Real Statistics Resource
Pack also contains the Single Factor Anova and Follow-up Tests data analysis tool,
which is illustrated in Examples 1 and 2 of Confidence Interval for ANOVA.