
ANOVA (Analysis of Variance)

- Allows a researcher to compare two or more populations of interval or ratio data.


- Determines whether differences exist between population means.
- Two common forms are one-way analysis of variance (completely randomized design) and two-way analysis of variance (two-factor analysis of variance).
- It is a statistical procedure that tests whether differences exist between two or more population means.
- Analysis of variance tests whether there is enough statistical evidence to show that the null hypothesis is false.
- If the null hypothesis is true, the population means are equal, so we should expect the sample means to be close to one another.

Assumptions of one-way ANOVA


- The samples are randomly selected and independently assigned to groups
- Populations should have approximately equal standard deviations
- Population distributions are normal

Assumptions of two-way ANOVA


- The samples are randomly selected with repeated measures in factor B.
- Populations should have approximately equal standard deviations
- Population distributions are normal
- Population covariances are equal

Procedure for Analysis of Variance:


1. Set up the Hypothesis
2. Set the level of significance
3. Calculate the mean of each group, the grand mean, the sums of squares, and the mean squares, then calculate the value of the F-test
4. Calculate the degrees of freedom and determine the critical value of F
5. Statistical decision for hypothesis testing
6. State the conclusion

A. Total Sum of Squares (SST)


- The statistic that measures the total variation in all the data

B. Sum of Squares Between groups (SSB)


- The statistic that measures the variation attributed to the differences between the
treatment means (or between-treatment variations)

C. Sum of Squares Within Groups (SSW)


- Measures the variation within samples (or within-treatment variation/error)

D. Mean Squares Between Groups (MSB)


- Is computed by dividing SSB by the number of groups minus 1

E. Mean Squares within Groups (MSW)


- Is determined by dividing SSW by the total sample size (labeled n) minus the number of groups.

F. Test Statistic
- The test statistic is defined as the ratio of the two mean squares:

$$F = \frac{\text{estimated population variance based on variation among the sample means}}{\text{estimated population variance based on variation within samples}} = \frac{MSB}{MSW}$$

where: MSB = mean square between groups,


MSW = mean square within groups.
F = Fcomputed (or F ratio).

The degrees of freedom for the F test are dfb = c - 1 and dfw = n - c, where c is the number of groups and n is the total sample size.
The sample sizes need not be equal in all groups.
The F test to compare means is always right-tailed.
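
The definitions A to F above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `one_way_anova` and the toy data are assumptions introduced here, and the data were chosen only so the arithmetic is easy to follow by hand.

```python
def one_way_anova(groups):
    """Return (SSB, SSW, MSB, MSW, F) for a list of samples (lists of numbers)."""
    n = sum(len(g) for g in groups)               # total sample size
    c = len(groups)                               # number of groups
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]     # one mean per group
    # Between-groups variation: group size times squared distance to the grand mean.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-groups variation: squared distance of each value to its own group mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (c - 1)                           # dfb = c - 1
    msw = ssw / (n - c)                           # dfw = n - c
    return ssb, ssw, msb, msw, msb / msw


# Hypothetical data (not from the handout): three groups of three values.
toy = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
ssb, ssw, msb, msw, f = one_way_anova(toy)
print(f"SSB={ssb:.2f} SSW={ssw:.2f} MSB={msb:.2f} MSW={msw:.2f} F={f:.2f}")
```

The groups may have different sizes; nothing in the function assumes a balanced design.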

Example 1: Performing these calculations for the foreign freighter example yields the
following. Determine the computed F value. Compare it with the critical F value from the table and
decide whether to reject the null hypothesis using α = .05.

NUMBER OF FOREIGN FREIGHTERS PER DAY

No.     1 (Long Beach)   2 (Houston)   3 (New York)   4 (New Orleans)
1       5                2             8              3
2       7                3             4              5
3       4                5             6              3
4       2                4             7              4
5                        6             9              2
6                                      8
Total   18               20            42             17
Mean    4.5              4             7              3.4
Solution:
1. State the Hypothesis
H0: µ1 = µ2 = µ3 = µ4
H1: At least one of the population means differs from the others
2. Determine the level of significance
α = .05
3. Determine the degrees of freedom and the critical value of F
Dfb= k-1 = 4-1 = 3
Dfw= N-k = 20-4 = 16
DfT = N-1 = 20-1 = 19
Fcritical = 3.24

4. Compute for the F-test.

Mean 1 = 4.5
Mean 2 = 4
Mean 3 = 7
Mean 4 = 3.4
Grand Mean = 4.85

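Compute SSB, SSW, and SST:

SSB = 4(4.5 - 4.85)² + 5(4 - 4.85)² + 6(7 - 4.85)² + 5(3.4 - 4.85)² = 42.35
SSW = 13.00 + 10.00 + 16.00 + 5.20 = 44.20 (sum of squared deviations of each observation from its group mean)
SST = SSB + SSW = 86.55

MSB = 42.35 / 3 = 14.12
MSW = 44.20 / 16 = 2.76
Fcomputed = 14.12 / 2.76 = 5.12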

ANOVA Table

Source of Variation   Df   SS      MS      F
Between               3    42.35   14.12   5.12
Error                 16   44.20   2.76
Total                 19   86.55

5. Decision Rule:
Since the computed F-value of 5.12 is greater than the critical F-value of 3.24 at the .05 level of
significance, reject the null hypothesis and accept the alternative hypothesis.
6. Conclusion: Since the null hypothesis is rejected, we conclude that there is enough evidence that the
mean number of foreign freighters per day is not the same at all four ports.
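
The sums of squares in Example 1 can be checked with a short script. This is a sketch under stated assumptions: the variable names are invented here, and the critical value 3.24 is copied from the solution above rather than computed. Carrying full precision gives F of about 5.11; the 5.12 in the table comes from dividing the already rounded mean squares.

```python
# Freighter counts per port, read from the Example 1 table (unequal group sizes).
ports = {
    "Long Beach": [5, 7, 4, 2],
    "Houston": [2, 3, 5, 4, 6],
    "New York": [8, 4, 6, 7, 9, 8],
    "New Orleans": [3, 5, 3, 4, 2],
}

all_values = [x for sample in ports.values() for x in sample]
N = len(all_values)                      # 20 observations
k = len(ports)                           # 4 groups
grand_mean = sum(all_values) / N

ssb = 0.0
ssw = 0.0
for sample in ports.values():
    mean = sum(sample) / len(sample)
    ssb += len(sample) * (mean - grand_mean) ** 2
    ssw += sum((x - mean) ** 2 for x in sample)

# SST computed directly, as a check that SST = SSB + SSW.
sst = sum((x - grand_mean) ** 2 for x in all_values)

msb = ssb / (k - 1)
msw = ssw / (N - k)
f_computed = msb / msw
f_critical = 3.24                        # table value used in the solution (alpha = .05, df = 3 and 16)

print(f"grand mean = {grand_mean:.2f}")
print(f"SSB = {ssb:.2f}, SSW = {ssw:.2f}, SST = {sst:.2f}")
print(f"MSB = {msb:.2f}, MSW = {msw:.2f}, F = {f_computed:.2f}")
print("reject H0" if f_computed > f_critical else "fail to reject H0")
```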

Requirement #1: https://www.chegg.com/homework-help/apple-juice-manufacturer-developed-new-product-liquid-concen-chapter-15.a-problem-6e-solution-9781285425450-exc

Answer the following problem:

Example #2: An apple juice manufacturer has developed a new product, a liquid concentrate
that, when mixed with water, produces 1 liter of apple juice. The product has several attractive
features. First, it is more convenient than canned juice, which is the way apple juice is currently
sold. Second, because the apple juice that is sold in cans is made from concentrate, the quality of
the new product is at least as high as that of canned apple juice. Third, the cost of the new
product is slightly lower than that of canned apple juice. The marketing manager must decide
how to market the new product. She can create advertising that emphasizes convenience, quality,
or price. To facilitate a decision, she conducts an experiment in three different small cities. In
one city, she launches the product with advertising stressing the convenience of the concentrate
(e.g., easy to carry from store to home and takes up less room in the freezer). In the second city,
the advertisements emphasize the quality of the product (“average” shoppers are depicted
discussing how good the apple juice tastes). Advertising that highlights the relatively low cost of
the liquid concentrate is used in the third city. The number of packages sold weekly is recorded
for the 20 weeks following the beginning of the campaign. These data are listed in the
accompanying table. The marketing manager wants to know whether differences in sales exist
between the three advertising strategies. (We assume that except for the type of advertising, the
three cities are identical.)

City 1 (Convenience)   City 2 (Quality)   City 3 (Price)
529 804 672
658 630 531
793 774 443
514 717 596
663 679 602
719 604 502
711 620 659
606 697 689
461 706 675
529 615 512
498 492 691
663 719 733
604 787 698
495 699 776
486 572 561
557 523 572
353 584 469
557 634 581
542 580 679
614 624 532
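
One way to approach Example #2 is to run the same one-way ANOVA arithmetic on the three columns of weekly sales. The script below is a hedged sketch rather than a worked answer: the variable names are assumptions, the data are copied from the table as printed above, and the resulting F still has to be compared with the critical value from an F table for 2 and 57 degrees of freedom at the chosen significance level.

```python
# Weekly sales for 20 weeks in each city, copied column by column from the table.
city1 = [529, 658, 793, 514, 663, 719, 711, 606, 461, 529,
         498, 663, 604, 495, 486, 557, 353, 557, 542, 614]   # convenience
city2 = [804, 630, 774, 717, 679, 604, 620, 697, 706, 615,
         492, 719, 787, 699, 572, 523, 584, 634, 580, 624]   # quality
city3 = [672, 531, 443, 596, 602, 502, 659, 689, 675, 512,
         691, 733, 698, 776, 561, 572, 469, 581, 679, 532]   # price

groups = [city1, city2, city3]
N = sum(len(g) for g in groups)
k = len(groups)
grand_mean = sum(sum(g) for g in groups) / N
means = [sum(g) / len(g) for g in groups]

ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

df_between = k - 1
df_within = N - k
f = (ssb / df_between) / (ssw / df_within)

for label, m in zip(("Convenience", "Quality", "Price"), means):
    print(f"{label:12s} mean = {m:.2f}")
print(f"df = ({df_between}, {df_within}), F = {f:.2f}")
```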

Requirements #2:

Prepare five (5) problems with solution on the following topics, please indicate your
sources. No duplication allowed.

1. Normal Distribution (Area of Normal Curve and Z-Value)

Find the area under the standard normal distribution curve to the left of z = -1.02.

Solution:

The area to the left of z = -1.02 is the same as the area to the right of z = 1.02.

P(0 < z < 1.02) = 0.3461

P(z < -1.02) = 0.5000 – P(0 < z < 1.02)

P(z < -1.02) = 0.5000 – 0.3461

P(z < -1.02) = 0.1539

Hence, the area is 0.1539 or 15.39%

Find P(0<Z≤1.5)

Solution:

This problem asks for the area between 0 and a z value to the right of 0, so we
only need to look up the z value in the table. Find 1.5 in the left column and 0.00 in the top row. The
value where the row and column meet in the table is the answer, 0.4332.

Hence, the area is 0.4332 or 43.32%.
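
Both table lookups above can be cross-checked numerically. The sketch below assumes only the Python standard library; the helper name `phi` is introduced here for the standard normal cumulative distribution function, written in terms of `math.erf`.

```python
import math


def phi(z):
    """Standard normal CDF: area under the curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


left_of_minus_1_02 = phi(-1.02)        # area to the left of z = -1.02
between_0_and_1_5 = phi(1.5) - phi(0)  # P(0 < Z <= 1.5)

print(f"P(Z < -1.02)     = {left_of_minus_1_02:.4f}")
print(f"P(0 < Z <= 1.5)  = {between_0_and_1_5:.4f}")
```

Rounded to four decimals, the results agree with the table values 0.1539 and 0.4332.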

2. Hypothesis Testing (t-test)

Boys of a certain age are known to have a mean weight of μ = 85 pounds. A complaint is
made that the boys living in a municipal children's home are underfed. As one bit of evidence, n
= 25 boys (of the same age) are weighed and found to have a mean weight of x̄ = 80.94 pounds.
It is known that the population standard deviation σ is 11 pounds. Based on the available data,
what should be concluded concerning the complaint?

Solution:

Step 1: The null hypothesis is H0: μ = 85

Step 2: The alternative hypothesis is HA: μ < 85.

Step 3: At the α = 0.05 level of significance with df = 25 - 1 = 24, the critical value for this left-tailed test is t = -1.711.

Step 4: The value of the test statistic is:

$$t = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{80.94 - 85}{11 / \sqrt{25}} = -1.85$$

Step 5: Decision: The critical region approach tells us to reject the null hypothesis at the α =
0.05 level, since -1.85 < -1.711.
Step 6: Conclusion: Since we reject the null hypothesis, we conclude that there is
enough evidence to support the complaint that the mean weight of the boys living in the municipal
children's home is below 85 pounds.
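
The test statistic for this problem can be recomputed as follows. This is a minimal sketch: the variable names are assumptions, and the critical value is not computed but taken from a standard t table. Because the alternative hypothesis is one-sided (μ < 85), the sketch uses the one-tailed table value 1.711 for α = 0.05 and df = 24.

```python
import math

mu0 = 85.0       # hypothesized mean weight (pounds)
xbar = 80.94     # sample mean
sd = 11.0        # standard deviation given in the problem
n = 25           # sample size

standard_error = sd / math.sqrt(n)
t_stat = (xbar - mu0) / standard_error

t_crit = 1.711   # one-tailed t table value, alpha = 0.05, df = n - 1 = 24
reject_h0 = t_stat < -t_crit   # left-tailed test: reject when t falls below -t_crit

print(f"standard error = {standard_error:.2f}")
print(f"t = {t_stat:.2f}")
print("reject H0" if reject_h0 else "fail to reject H0")
```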

3. Hypothesis Testing (z-test)

A principal at a certain school claims that the students in his school are above average
intelligence. A random sample of thirty students' IQ scores has a mean of 112.5. Is there
sufficient evidence to support the principal’s claim? The mean population IQ is 100 with a
standard deviation of 15.

Solution:

Step 1: State the Null hypothesis: The accepted fact is that the population mean is 100, so: H0:
μ=100.

Step 2: State the Alternate Hypothesis: The claim is that the students have above average IQ
scores, so:
H1: μ > 100.
The fact that we are looking for scores “greater than” a certain point means that this is a one-
tailed test.

Step 3: State the level of significance. If you aren’t given a significance level, use 5% (0.05).

Step 4: Find the rejection region area from the z-table. An upper-tail area of .05 corresponds to a z-score of
1.645.

Step 5: Find the test statistic using this formula:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

For this set of data: z = (112.5 - 100) / (15/√30) = 4.56.
Step 6: Decision: The test statistic exceeds the critical value (4.56 > 1.645), so we can reject the null hypothesis.

Step 7: Conclusion: Since we reject the null hypothesis, we can conclude that there is enough
evidence to support the claim that the students in his school are above average intelligence.
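
The z statistic and its one-tailed p-value can be sketched with the standard library alone. The variable names and the helper `phi` (standard normal CDF via `math.erf`) are assumptions introduced for this illustration; the critical value 1.645 is the table value quoted in Step 4.

```python
import math


def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mu0 = 100.0      # population mean IQ
sigma = 15.0     # population standard deviation
xbar = 112.5     # sample mean
n = 30           # sample size

z = (xbar - mu0) / (sigma / math.sqrt(n))
p_value = 1.0 - phi(z)        # upper-tail area, since H1 is mu > 100
z_crit = 1.645                # table value for an upper-tail area of 0.05

print(f"z = {z:.2f}")
print(f"one-tailed p-value = {p_value:.2e}")
print("reject H0" if z > z_crit else "fail to reject H0")
```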

4. Correlation and Regression

Find the value of the correlation coefficient from the following table:

Subject Age x Glucose Level y


1 43 99
2 21 65
3 25 79
4 42 75
5 57 87
6 59 81

Solution:

Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².

Subject Age x Glucose Level y xy x² y²
1 43 99
2 21 65
3 25 79
4 42 75
5 57 87
6 59 81

Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be 43 × 99
= 4,257.

Subject Age x Glucose Level y xy x² y²
1 43 99 4257
2 21 65 1365
3 25 79 1975
4 42 75 3150
5 57 87 4959
6 59 81 4779
Step 3: Take the square of the numbers in the x column, and put the result in the x²
column.

Subject Age x Glucose Level y xy x² y²
1 43 99 4257 1849
2 21 65 1365 441
3 25 79 1975 625
4 42 75 3150 1764
5 57 87 4959 3249
6 59 81 4779 3481

Step 4: Take the square of the numbers in the y column, and put the result in the y²
column.

Subject Age x Glucose Level y xy x² y²
1 43 99 4257 1849 9801
2 21 65 1365 441 4225
3 25 79 1975 625 6241
4 42 75 3150 1764 5625
5 57 87 4959 3249 7569
6 59 81 4779 3481 6561

Step 5: Add up all of the numbers in the columns and put the result at the bottom of the
column. The Greek letter sigma (Σ) is a short way of saying “sum of.”

Subject Age x Glucose Level y xy x² y²
1 43 99 4257 1849 9801
2 21 65 1365 441 4225
3 25 79 1975 625 6241
4 42 75 3150 1764 5625
5 57 87 4959 3249 7569
6 59 81 4779 3481 6561
Σ 247 486 20485 11409 40022
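Step 6: Use the following correlation coefficient formula.

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

The answer is: 2868 / 5413.27 = 0.529809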

From our table:

- Σx = 247
- Σy = 486
- Σxy = 20,485
- Σx² = 11,409
- Σy² = 40,022
- n is the sample size, in our case = 6

The correlation coefficient =

$$r = \frac{6(20{,}485) - (247 \times 486)}{\sqrt{\left[6(11{,}409) - 247^2\right]\left[6(40{,}022) - 486^2\right]}} = \frac{2868}{5413.27} = 0.5298$$

The range of the correlation coefficient is from -1 to 1. Our result is 0.5298, which
means the variables have a moderate positive correlation.
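
The column sums and the coefficient can be reproduced directly from the age and glucose data. The following is a sketch that mirrors Steps 1 to 6 with the same computational formula; the variable names are assumptions made for this illustration.

```python
import math

age = [43, 21, 25, 42, 57, 59]          # x
glucose = [99, 65, 79, 75, 87, 81]      # y
n = len(age)

# Column totals from the chart in Step 5.
sum_x = sum(age)
sum_y = sum(glucose)
sum_xy = sum(x * y for x, y in zip(age, glucose))
sum_x2 = sum(x * x for x in age)
sum_y2 = sum(y * y for y in glucose)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(f"sums: x={sum_x}, y={sum_y}, xy={sum_xy}, x^2={sum_x2}, y^2={sum_y2}")
print(f"r = {numerator} / {denominator:.2f} = {r:.4f}")
```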

Regression

You have to examine the relationship between the age and price for used cars sold in the last year
by a car dealership company.

Here is the table of the data:

Car Age (in years) Price (in dollars)


4 6300
4 5800
5 5700
5 4500
7 4500
7 4200
8 4100
9 3100
10 2100
11 2500
12 2200

Solution:

Now, we see that we have a negative relationship between the car price (Y) and car age(X) – as
car age increases, price decreases.

When we use the simple linear regression equation, we have the following results:

Y = β0 + β1X

Y = 7836 – 502.4*X

Let’s use the data from the table and create our scatter plot and linear regression line:

[Diagram 1: scatter plot of price against car age with the fitted regression line]

Result Interpretation:

With an estimated slope of -502.4, we can conclude that the average car price decreases by about $502.4
for each year a car increases in age.
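
The fitted line can be reproduced with the ordinary least-squares formulas, slope = Sxy / Sxx and intercept = ȳ - slope · x̄. This is a sketch assuming the data table above; the variable names and the example prediction for a 6-year-old car are introduced here only for illustration.

```python
car_age = [4, 4, 5, 5, 7, 7, 8, 9, 10, 11, 12]                            # X, years
price = [6300, 5800, 5700, 4500, 4500, 4200, 4100, 3100, 2100, 2500, 2200]  # Y, dollars

n = len(car_age)
mean_x = sum(car_age) / n
mean_y = sum(price) / n

# Sums of cross-products and squares about the means.
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(car_age, price))
s_xx = sum((x - mean_x) ** 2 for x in car_age)

slope = s_xy / s_xx                   # estimate of beta1
intercept = mean_y - slope * mean_x   # estimate of beta0

print(f"Y = {intercept:.0f} + ({slope:.1f}) * X")

# Hypothetical use of the fitted line: predicted price of a 6-year-old car.
predicted_6 = intercept + slope * 6
print(f"predicted price at age 6: {predicted_6:.0f}")
```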

5. Chi-Square Test

6. Analysis of Variance
A clinical trial is run to compare weight loss programs and participants are randomly
assigned to one of the comparison programs and are counseled on the details of the assigned
program. Participants follow the assigned program for 8 weeks. The outcome of interest is
weight loss, defined as the difference in weight measured at the start of the study (baseline) and
weight measured at the end of the study (8 weeks), and measured in pounds.  

Three popular weight loss programs are considered. The first is a low calorie diet. The
second is a low fat diet and the third is a low carbohydrate diet. For comparison purposes, a
fourth group is considered as a control group. Participants in the fourth group are told that they
are participating in a study of healthy behaviors with weight loss only one component of interest.
The control group is included here to assess the placebo effect (i.e., weight loss due to simply
participating in the study). A total of twenty patients agree to participate in the study and are
randomly assigned to one of the four diet groups. Weights are measured at baseline and patients
are counseled on the proper implementation of the assigned diet (with the exception of the
control group). After 8 weeks, each patient's weight is again measured and the difference in
weights is computed by subtracting the 8 week weight from the baseline weight. Positive
differences indicate weight losses and negative differences indicate weight gains. For
interpretation purposes, we refer to the differences in weights as weight losses and the observed
weight losses are shown below.

Low Calorie Low Fat Low Carbohydrate Control


8 2 3 2
9 4 5 2
6 3 4 -1
7 5 2 0
3 1 3 3

Is there a statistically significant difference in the mean weight loss among the four diets? We
will run the ANOVA using the six-step procedure.

Step 1. Set up hypotheses.

H0: μ1 = μ2 = μ3 = μ4 (means are all equal)

H1: Means are not all equal

Step 2. Set up the level of significance.

α = 0.05

Step 3. Select the test statistic and determine the critical value.

The test statistic is the F statistic for ANOVA, F = MSB/MSE.

The appropriate critical value can be found in a table of probabilities for the F distribution. In
order to determine the critical value of F we need degrees of freedom, df1 = k-1 and df2 = N-k. In
this example, df1 = k-1 = 4-1 = 3 and df2 = N-k = 20-4 = 16. The critical value is 3.24.

Step 4. Compute the test statistic.

To organize our computations we complete the ANOVA table. In order to compute the sums of
squares we must first compute the sample means for each group and the overall mean based on
the total sample.  

             Low Calorie   Low Fat   Low Carbohydrate   Control
n            5             5         5                  5
Group mean   6.6           3.0       3.4                1.2

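If we pool all N = 20 observations, the overall mean is X̄ = 71/20 = 3.55, which is rounded to 3.6 in the calculations that follow.

We can now compute

$$SSB = \sum_j n_j \left(\bar{X}_j - \bar{X}\right)^2$$

So, in this case:

$$SSB = 5(6.6 - 3.6)^2 + 5(3.0 - 3.6)^2 + 5(3.4 - 3.6)^2 + 5(1.2 - 3.6)^2 = 45.0 + 1.8 + 0.2 + 28.8 = 75.8$$

Next we compute,

$$SSE = \sum_j \sum_i \left(X_{ij} - \bar{X}_j\right)^2$$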

SSE requires computing the squared differences between each observation and its group mean.
We will compute SSE in parts. For the participants in the low calorie diet:  

Low Calorie   (X - 6.6)   (X - 6.6)²
8             1.4         2.0
9             2.4         5.8
6             -0.6        0.4
7             0.4         0.2
3             -3.6        13.0
Totals        0           21.4

Thus, Σ(X - 6.6)² = 21.4.

For the participants in the low fat diet:  

Low Fat   (X - 3.0)   (X - 3.0)²
2         -1.0        1.0
4         1.0         1.0
3         0.0         0.0
5         2.0         4.0
1         -2.0        4.0
Totals    0           10.0

Thus, Σ(X - 3.0)² = 10.0.

For the participants in the low carbohydrate diet:  

Low Carbohydrate   (X - 3.4)   (X - 3.4)²
3                  -0.4        0.2
5                  1.6         2.6
4                  0.6         0.4
2                  -1.4        2.0
3                  -0.4        0.2
Totals             0           5.4

Thus, Σ(X - 3.4)² = 5.4.

For the participants in the control group:

Control   (X - 1.2)   (X - 1.2)²
2         0.8         0.6
2         0.8         0.6
-1        -2.2        4.8
0         -1.2        1.4
3         1.8         3.2
Totals    0           10.6

Thus, Σ(X - 1.2)² = 10.6.

Therefore, SSE = 21.4 + 10.0 + 5.4 + 10.6 = 47.4.

We can now construct the ANOVA table.

Source of Variation   Sums of Squares (SS)   Degrees of Freedom (df)   Mean Squares (MS)   F
Between Treatment     75.8                   4-1=3                     75.8/3=25.3         25.3/3.0=8.43
Error (or Residual)   47.4                   20-4=16                   47.4/16=3.0
Total                 123.2                  20-1=19

Step 5. Decision.

We reject H0 because 8.43 > 3.24. We have statistically significant evidence at α = 0.05 to
show that there is a difference in mean weight loss among the four diets.

Step 6. Conclusion.

Since the null hypothesis has been rejected, we conclude that there is
evidence of a significant difference in mean weight loss among the four diets
considered.
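
The weight-loss ANOVA can also be recomputed without intermediate rounding. The sketch below assumes the data table above; the variable names are introduced here, and the critical value 3.24 is the table value quoted earlier. With full precision the overall mean is 3.55, so the sums of squares come out as 75.75 and 47.2 and F is about 8.56. These differ slightly from the hand-rounded 75.8, 47.4 and 8.43, and the decision is the same.

```python
diets = {
    "Low Calorie": [8, 9, 6, 7, 3],
    "Low Fat": [2, 4, 3, 5, 1],
    "Low Carbohydrate": [3, 5, 4, 2, 3],
    "Control": [2, 2, -1, 0, 3],
}

values = [x for sample in diets.values() for x in sample]
N = len(values)                               # 20 participants
k = len(diets)                                # 4 groups
overall_mean = sum(values) / N                # unrounded overall mean

group_means = {name: sum(s) / len(s) for name, s in diets.items()}

# Between-treatment and error sums of squares.
ssb = sum(len(s) * (group_means[name] - overall_mean) ** 2 for name, s in diets.items())
sse = sum((x - group_means[name]) ** 2 for name, s in diets.items() for x in s)

msb = ssb / (k - 1)
mse = sse / (N - k)
f_stat = msb / mse
f_critical = 3.24                             # table value for df = 3 and 16, alpha = 0.05

print(f"overall mean = {overall_mean:.2f}")
print(f"SSB = {ssb:.2f}, SSE = {sse:.2f}, SST = {ssb + sse:.2f}")
print(f"MSB = {msb:.2f}, MSE = {mse:.2f}, F = {f_stat:.2f}")
print("reject H0" if f_stat > f_critical else "fail to reject H0")
```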

Sources:

- http://intellspot.com/linear-regression-examples/
- https://www.statisticshowto.datasciencecentral.com/probability-and-ss/correlation-coefficient-formula/
- https://newonlinecourses.science.psu.edu/stat414/node/269/
- https://www.statisticshowto.datasciencecentral.com/probability-and-statistics/hypothesis-testing/
- https://courses.lumenlearning.com/boundless-statistics/chapter/the-normal-curve/
- https://www.thoughtco.com/calculate-probabilities-standard-normal-distribution-table-3126378
- http://sphweb.bumc.bu.edu/otlt/MPH-Modules/BS/BS704_HypothesisTesting-ANOVA/BS704_HypothesisTesting-Anova_print.html
