
Chapter 10 Introduction to Hypothesis Testing

Repeated Measures Design


Also known as replicated measures or correlated-groups design.
- There are paired scores in the conditions.
- Differences between the paired conditions are analysed.
- Can be within-subjects, where subjects serve as their own controls, or between-subjects, where we can have identical twins or matched subjects (matched on some criteria).

Type 1 error: Rejecting the null hypothesis when the null hypothesis is true.

Type 2 error: Failing to reject the null hypothesis when the null hypothesis is false (i.e. there is a real effect).

                  State of Reality
Decision          | H0 is true         | H0 is false
Retain H0         | Correct decision   | Type 2 error = β
Reject H0         | Type 1 error = α   | Correct decision = Power
Sum of columns    | 1                  | 1

Alpha level (α): the level to which the scientist wishes to limit the probability of making a Type 1 error. Set at the beginning of the experiment.
- The more stringent the alpha level, the lower the probability of making a Type 1 error, BUT
- the more stringent the alpha level, the higher the probability of making a Type 2 error.
- The probability of making a Type 1 error decreases greatly with independent replication.

Evaluating the results


- It is incorrect to evaluate the specific outcome against how probable it is if the null hypothesis were true.
- Rather, we should determine the probability of getting an outcome as extreme as or more extreme than the specific outcome. That is why most experiments are two-tailed.
- The evaluation should always be two-tailed unless the experimenter will retain H0 when results are extreme in the direction opposite to the predicted direction. E.g. new formula brand X is claimed to make babies smarter: if we would still reject the null hypothesis when the results show that babies are less smart after consuming formula X, the test must be two-tailed.
- Never switch the alternative hypothesis after seeing the data, e.g. from a directional one-tailed test (increase) to a non-directional two-tailed test. If α = 0.05, this inflates the Type 1 error to 0.075.

Sign Test
Step 1: Count the pluses and the minuses.
Step 2: Evaluate the probability of getting an outcome as extreme as or more extreme
than what was observed, assuming the null hypothesis is true.
Use the binomial distribution.
Since we assume 𝐻0 is true, P = 0.5
Step 3: Compare this probability against 𝛼, if it is smaller than 𝛼, then we reject the null
hypothesis.
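These three steps can be sketched in Python, computing the binomial probabilities directly with `math.comb`; the data and the helper name `sign_test_p_value` are hypothetical, not from the text:

```python
import math

def sign_test_p_value(pluses, n, two_tailed=True):
    """Sign-test p-value under H0: P = 0.5.

    Sums binomial probabilities for outcomes as extreme as or more
    extreme than the observed number of pluses.
    """
    k = max(pluses, n - pluses)          # the more extreme tail count
    # P(X >= k) under Binomial(n, 0.5)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail) if two_tailed else tail

# Hypothetical data: 9 of 10 difference scores are pluses.
p = sign_test_p_value(9, 10)
alpha = 0.05
reject_h0 = p < alpha
```

Here p = 2 × (C(10,9) + C(10,10)) / 2¹⁰ = 22/1024 ≈ 0.0215 < 0.05, so H0 would be rejected.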

Size of effect
We must not confuse 'statistically significant' with 'important'! A real effect may exist but be too small to be important (e.g. a drug helps you lose weight, but only by 10 calories a day).

Chapter 11: Power

Power is:
- A measure of the sensitivity of the experiment to detect a real effect of the IV.
- The probability that the results of an experiment will allow rejection of the null hypothesis if the IV has a real effect.
- Ranges from 0 to 1 (since it is a probability).
- Useful to determine when:
  o initially designing the experiment
  o interpreting the results of experiments that fail to detect any real effect of the IV

𝑃𝑛𝑢𝑙𝑙 is the probability of getting a plus with any subject in the sample of the experiment
when the IV has no effect.
- 𝑃𝑛𝑢𝑙𝑙 is always 0.5
𝑃𝑟𝑒𝑎𝑙 is the probability of getting a plus with any subject in the sample of the experiment
when the IV has a real effect.
- In reality, you don't know P_real, but we can estimate it using pilot work or other research. We then design an experiment with power high enough to detect that size of effect.
- It is also the proportion of pluses in the population if the experiment were done
on the entire population and the independent variable has a real effect.

Power varies with…


- Varies directly with N.
- Varies directly with size of real effect.
o When the size of the effect approaches that predicted by the null
hypothesis, power gets very low.
- Varies directly with alpha level.
o As alpha gets more stringent, power decreases.
o So as Type 1 error decreases, Type 2 error increases (since power
decreases and 𝛽 = 1 − 𝑃𝑜𝑤𝑒𝑟). To increase power and decrease Type 2
error, we can increase the sample size N.
- 𝛽 + 𝑃𝑜𝑤𝑒𝑟 = 1

Interpreting non-significant results


Failure to reject 𝐻0 may be because:
o 𝐻0 is indeed true.
o 𝐻0 is false, but the experiment was of low power.
It is due to the second reason that we can never say that we accept 𝐻0 when the results
are non-significant.

Calculation of power
Step 1: Assume the null hypothesis is true. Using 𝑃𝑛𝑢𝑙𝑙 = 0.5, determine the possible
sample outcomes in the experiment that allow 𝐻0 to be rejected.
o If the test is two-tailed, start with both ends of N.
o If the test is one-tailed, start with the end of N in your predicted direction.

Step 2: For the level of 𝑃𝑟𝑒𝑎𝑙 under consideration, determine the probability of getting
the sample outcomes in Step 1.
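A minimal sketch of both steps for a two-tailed sign test, assuming hypothetical values N = 10 and P_real = 0.9:

```python
import math

def binom_pmf(k, n, p):
    # Binomial probability P(X = k) for n trials with success probability p.
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def sign_test_power(n, p_real, alpha=0.05):
    """Power of a two-tailed sign test.

    Step 1: find the outcomes (numbers of pluses) that reject H0,
            using P_null = 0.5.
    Step 2: sum the probabilities of those outcomes under P_real.
    """
    def p_value(k):                       # two-tailed p-value for k pluses
        extreme = max(k, n - k)
        tail = sum(math.comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
        return min(1.0, 2 * tail)

    reject = [k for k in range(n + 1) if p_value(k) < alpha]      # Step 1
    return sum(binom_pmf(k, n, p_real) for k in reject)           # Step 2

power = sign_test_power(n=10, p_real=0.9)
beta = 1 - power
```

For these values the rejection region is {0, 1, 9, 10}, giving power ≈ 0.736 and β ≈ 0.264.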

Chapter 12: Sampling Distributions, SDM and the Normal (Z) Test

Sampling distribution gives


1. All the values that the statistic can take
2. The probability of getting each value under the assumption that it resulted from
chance alone

How to do the Z test:


1. Calculate the test statistic:

   z_obt = (X̄_obt − μ) / σ_X̄ = (X̄_obt − μ) / (σ / √N)

2. Find z_crit given α, which can be either one-tailed or two-tailed.
3. Evaluate the statistic:
   If |z_obt| ≥ |z_crit|, reject H0.

Note: if the test is two-tailed, quote z_crit as ±.


Unlike the t test and other tests, the z test does not require degrees of freedom.
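A minimal numeric sketch, with hypothetical values (μ = 100, σ = 15, N = 36, X̄_obt = 105) and the familiar two-tailed critical value z_crit = ±1.96 for α = 0.05:

```python
import math

# Hypothetical example: population with mu = 100, sigma = 15 (known);
# sample of N = 36 with mean 105.
mu, sigma, n, x_bar = 100, 15, 36, 105

z_obt = (x_bar - mu) / (sigma / math.sqrt(n))   # = 5 / 2.5 = 2.0
z_crit = 1.96                                   # two-tailed, alpha = 0.05
reject_h0 = abs(z_obt) >= z_crit
```

Since |2.0| ≥ 1.96, H0 would be rejected for this sample.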

Conditions under which the Z-test is appropriate


1. The parameters of the null-hypothesis population are known (i.e. μ and σ are known).
2. Single sample mean (X̄_obt).
3. The sampling distribution of the mean is normally distributed, which holds when:
   a. N ≥ 30, or
   b. the null-hypothesis population is normally distributed.

Chapter 13: Single sample T-test

Main difference: Sometimes 𝜎 is unknown. In that case, we estimate it with s, the sample
standard deviation.

t_obt = (X̄_obt − μ) / s_X̄ = (X̄_obt − μ) / (s / √N)
Degrees of Freedom
- The number of scores that are free to vary in calculating a statistic.
- For the sample s.d., since the sum of deviations about the mean must equal zero, only N − 1 of the deviation scores are free to take on any value.
- The mean has N degrees of freedom, since even after we know N − 1 of the scores, the Nth score can still take on any value.
- The t distribution varies uniquely with degrees of freedom.

T vs Z distributions
- As df increases, the t distribution approaches the normal curve.
- Hence, when df → ∞, the t distribution is identical to the z distribution.
- At any df other than ∞, the t distribution has more extreme t values than the z distribution, because the tails of the t distribution are elevated relative to the z distribution.
- The t test is less powerful than the z test: for any alpha level, t_crit is larger than z_crit, so t_obt must be larger to reject the null hypothesis.

How to do the T test:


1. Calculate the test statistic:

   t_obt = (X̄_obt − μ) / s_X̄ = (X̄_obt − μ) / (s / √N) = (X̄_obt − μ) / √(SS / (N(N − 1)))

2. Find t_crit given α, which can be either one-tailed or two-tailed.
   If α is two-tailed, supply the ± sign for t_crit.
3. Find the degrees of freedom: df = N − 1.
4. Evaluate the statistic:
   If |t_obt| ≥ |t_crit|, reject H0.
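The same steps in Python on a hypothetical sample; the critical value 2.262 is the table value for df = 9, α = 0.05 two-tailed:

```python
import math
import statistics

# Hypothetical sample; mu is the value specified by H0.
scores = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
mu = 12

n = len(scores)
x_bar = statistics.mean(scores)           # sample mean
s = statistics.stdev(scores)              # sample s.d. (N - 1 denominator)
t_obt = (x_bar - mu) / (s / math.sqrt(n))

df = n - 1                                # = 9
t_crit = 2.262                            # table value: alpha = 0.05 two-tailed, df = 9
reject_h0 = abs(t_obt) >= t_crit
```

Here X̄ = 13.5, s/√N = 0.5, so t_obt = 3.0 ≥ 2.262 and H0 would be rejected.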

Conditions under which the t-test is appropriate


1. μ is specified and σ is unknown.
2. Single sample mean (X̄_obt).
3. The sampling distribution of the mean is normally distributed, which holds when:
   a. N ≥ 30, or
   b. the null-hypothesis population is normally distributed.

Size of effect using Cohen’s d


Cohen's d measures the magnitude of the real effect, using the relationship between the size of the real effect and the size of the mean difference.

d = |mean difference| / population standard deviation

d̂ = |X̄_obt − μ| / s

Value of d̂     Interpretation of d̂
0–0.20         Small effect
0.21–0.79      Medium effect
≥ 0.80         Large effect

Confidence Interval
- A range of values that probably contains the population value.
- A 95% CI is an interval such that the probability is 0.95 that the interval contains the population value.
- WRONG: the probability is 0.95 that the population mean lies within the interval.
- The larger the interval, the more confidence we have that it contains the population mean. That is why a 99% CI is wider than a 95% CI.
- 95% CI:
  μ_lower = X̄_obt − s_X̄ · t_0.025
  μ_upper = X̄_obt + s_X̄ · t_0.025
- FYI: if the test is one-tailed, then the 95% CI is (−∞, X̄_obt + s_X̄ · t_0.05] or [X̄_obt − s_X̄ · t_0.05, ∞).
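A sketch of the two-tailed 95% CI computation on a hypothetical sample (t_0.025 = 2.262 is the table value for df = 9):

```python
import math
import statistics

# 95% CI for mu using a hypothetical sample, as in a single-sample t test.
scores = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
n = len(scores)
x_bar = statistics.mean(scores)
s_xbar = statistics.stdev(scores) / math.sqrt(n)   # estimated standard error

t_025 = 2.262            # table value for df = n - 1 = 9, two-tailed alpha = 0.05
mu_lower = x_bar - s_xbar * t_025
mu_upper = x_bar + s_xbar * t_025
```

With X̄ = 13.5 and s_X̄ = 0.5, the interval is 13.5 ± 1.131, i.e. (12.369, 14.631).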

Pagano Chapter 14: Correlated groups T-test & Independent groups T-test.

The above are also known as two-condition experiments.

Advantages of two-sample T-tests over single sample T-test:


1. No need to specify population parameter, 𝜇, which is not always known.
2. Correct scientific methodology does not often allow an investigator to use previously
acquired population data when conducting an experiment. There may be differences in
data collection and other confounds.
3. Eliminates 3rd variable explanations. We can be sure that it was the experimental
technique, and not the extra stimulation/attention that came in conjunction with the
technique.

Correlated groups design


- The same subjects are used in both conditions, OR pairs of subjects matched on some characteristics serve in the two conditions.
- Tests the assumption that the difference scores are a random sample from a population of difference scores that has a mean of zero.

Correlated groups design vs Sign Test:


The sign test has lower power because it ignores the magnitude of the difference scores. The correlated-groups test uses both the magnitude and the direction of the difference scores.

Null hypothesis: the sample with D̄_obt = x is a random sample from a population of difference scores where μ_D = 0.

   H0: μ_D = 0

Alternative hypothesis: the sample with D̄_obt = x is a random sample from a population of difference scores where μ_D ≠ 0.

   H1: μ_D ≠ 0

Calculating the t-statistic for correlated groups:


t_obt = (D̄_obt − μ_D) / (s_D / √N) = (D̄_obt − μ_D) / √(SS_D / (N(N − 1)))

SS_D = ΣD² − (ΣD)² / N

where D = each individual difference score,
D̄_obt = mean of the sample difference scores,
μ_D = mean of the population of difference scores = 0,
s_D = standard deviation of the sample difference scores,
N = number of difference scores,
SS_D = sum of squares of the sample difference scores.

Evaluating the statistic:


If |𝑡𝑜𝑏𝑡 | ≥|𝑡𝑐𝑟𝑖𝑡 |, reject 𝐻0 .

How to find 𝑡𝑐𝑟𝑖𝑡 :

Use df = N − 1 with the relevant α, usually α = 0.05, two-tailed.
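The correlated-groups t test can be sketched in Python on hypothetical paired data (t_crit = 2.776 is the table value for df = 4, α = 0.05 two-tailed):

```python
import math

# Hypothetical paired scores (e.g. before/after for the same subjects).
before = [20, 18, 25, 22, 19]
after  = [23, 19, 28, 26, 20]

d = [a - b for a, b in zip(after, before)]        # difference scores
n = len(d)
d_bar = sum(d) / n
ss_d = sum(x ** 2 for x in d) - sum(d) ** 2 / n   # SS_D = sum D^2 - (sum D)^2 / N

mu_d = 0                                          # value under H0
t_obt = (d_bar - mu_d) / math.sqrt(ss_d / (n * (n - 1)))

df = n - 1
t_crit = 2.776            # table value: alpha = 0.05 two-tailed, df = 4
reject_h0 = abs(t_obt) >= t_crit
```

Here D̄ = 2.4, SS_D = 7.2, so t_obt = 4.0 ≥ 2.776 and H0 would be rejected.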

Size of effect using Cohen’s d:


Since we don't know the population standard deviation, we use s_D to estimate it.

d̂ = |D̄_obt| / s_D

|D̄_obt| is the absolute value of the mean of the sample difference scores.


Value of d̂     Interpretation of d̂
0–0.20         Small effect
0.21–0.79      Medium effect
≥ 0.80         Large effect

T-test for Correlated groups vs Sign Test:


- The sign test has lower power, meaning a higher chance of making a Type 2 error (i.e. retaining H0 when H0 is false).
- We want to use the more powerful test, so all else being equal, we use the t test for correlated groups.

Assumptions underlying the t-test for correlated groups:


1. The sampling distribution of D̄ is normally distributed.
2. This holds when N ≥ 30 (assuming the population shape does not differ greatly from normality), OR when the population scores are known to be normally distributed.

Independent groups design

- Subjects are randomly selected from the population.
- Subjects are randomly assigned to either the experimental or the control condition.
- A between-subjects design.
- The t test for independent groups computes the mean of each group and analyses the difference between the two group means to determine whether chance alone is a reasonable explanation of that difference.

Null Hypothesis:
𝐻0 : 𝜇1 = 𝜇2

Alternative Hypothesis:

𝐻1 : 𝜇1 ≠ 𝜇2

Sampling distribution of the difference between sample means:


1. If the populations from which the samples are taken are normal, then the sampling distribution of the difference between sample means is also normal.

2. μ_(X̄1−X̄2) = μ1 − μ2

   where μ_(X̄1−X̄2) is the mean of the sampling distribution of the difference between sample means.

3. σ_(X̄1−X̄2) = √(σ²_X̄1 + σ²_X̄2)

   where σ_(X̄1−X̄2) is the standard deviation of the sampling distribution of the difference between sample means,
   σ²_X̄1 = variance of the sampling distribution of the mean for samples of size n1 taken from the first population,
   σ²_X̄2 = variance of the sampling distribution of the mean for samples of size n2 taken from the second population.

Assuming the equality of variances,


σ_(X̄1−X̄2) = √(σ²_X̄1 + σ²_X̄2) = √(σ1²/n1 + σ2²/n2) = √(σ²(1/n1 + 1/n2))

We can estimate σ² with s_w², a weighted average of the sample variances s1² and s2², using the degrees of freedom as the weights:

s_w² = (SS1 + SS2) / (n1 + n2 − 2)
Evaluating the test statistic:

t_obt = (X̄1 − X̄2) / √(s_w²(1/n1 + 1/n2)) = (X̄1 − X̄2) / √( ((SS1 + SS2)/(n1 + n2 − 2)) (1/n1 + 1/n2) )

Recall that SS1 = ΣX1² − (ΣX1)² / n1

How to find 𝑡𝑐𝑟𝑖𝑡 :


Use df = N − 2 (since one degree of freedom is lost each time a standard deviation is calculated) with the relevant α, usually α = 0.05, two-tailed.

Note: The t distribution varies both with N and degrees of freedom, but it varies
uniquely only with degrees of freedom. Hence, the t distribution corresponding to 13 df
is the same whether it is derived from the single sample situation with N = 14 or in the
two-sample situation with N = 15.

Special case for when 𝑛1 = 𝑛2 ,


t_obt = (X̄1 − X̄2) / √((SS1 + SS2) / (n(n − 1)))
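An illustrative computation on hypothetical groups; 2.306 is the table t_crit for df = 8, α = 0.05 two-tailed:

```python
import math

def ss(xs):
    # SS = sum X^2 - (sum X)^2 / n
    return sum(x ** 2 for x in xs) - sum(xs) ** 2 / len(xs)

# Hypothetical experimental and control groups.
group1 = [10, 12, 14, 11, 13]
group2 = [8, 9, 11, 10, 7]

n1, n2 = len(group1), len(group2)
m1 = sum(group1) / n1
m2 = sum(group2) / n2

sw2 = (ss(group1) + ss(group2)) / (n1 + n2 - 2)   # pooled variance estimate
t_obt = (m1 - m2) / math.sqrt(sw2 * (1 / n1 + 1 / n2))

df = n1 + n2 - 2                                  # = 8
t_crit = 2.306                                    # table value: alpha = 0.05 two-tailed, df = 8
reject_h0 = abs(t_obt) >= t_crit
```

Here X̄1 = 12, X̄2 = 9, s_w² = 2.5, so t_obt = 3.0 ≥ 2.306 and H0 would be rejected.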

Assumptions underlying the t-test for independent groups:


1. The sampling distribution of X̄1 − X̄2 is normally distributed. This means the populations from which the samples were taken should be normally distributed.
2. There is homogeneity of variance. The t-test assumes that the IV affects the means of
the population but not their standard deviation. Hence, the t test for independent groups
also assumes that the variances of the two populations are equal.

What if the assumptions of the t-test are violated:

- The t test is a robust test, meaning that it is relatively insensitive to violations of its underlying mathematical assumptions.
- In particular, it is relatively insensitive to violations of the normality and homogeneity-of-variance assumptions, depending on sample size and the type and magnitude of the violation.
- If n1 = n2 and each sample size is ≥ 30, the t test for independent groups may be used without appreciable error despite moderate violation of the normality and/or homogeneity-of-variance assumptions.

Size of effect using Cohen’s d:


Since we don’t know the population standard deviation, we will use sw to estimate it.
d̂ = |X̄1 − X̄2| / √(s_w²)

Value of d̂     Interpretation of d̂
0–0.20         Small effect
0.21–0.79      Medium effect
≥ 0.80         Large effect

Power of the t Test


The greater the effect of the independent variable, the higher the power of the t test.
Hence, we have to use the level of IV that the experimenter believes is the most effective
to maximise the chances of detecting its effect.
Also, given meagre resources: if the IV has a large enough effect, the test can still have enough power to detect the effect even if the sample size is small.
The larger t_obt is, the further |t_obt| lies beyond |t_crit|, making rejection of H0 more likely.
- Increasing sample size increases the power of the t test.
- High sample variability decreases power.

Comparing the power of the correlated groups t test and the independent groups t test:

Correlated groups t test:   t_obt = (D̄_obt − μ_D) / √(SS_D / (N(N − 1)))

Independent groups t test:  t_obt = (X̄1 − X̄2) / √((SS1 + SS2) / (n(n − 1)))

Since 𝑆𝑆𝐷 measures the variability of the difference scores, it is much smaller than 𝑆𝑆1 +
𝑆𝑆2 which are measures of the variability of the raw scores. Hence, generally, the
correlated groups t-test is more powerful.

However, the independent groups design is more efficient from a df perspective (i.e. you get more degrees of freedom for the same total sample size). As df increases, t_crit decreases, making it easier to reject H0.
Also, there are situations in which you cannot use a correlated groups design: for example, if your IV is men vs women, if the effect of the first condition persists for too long, or in learning experiments (where learning is irreversible). The experimenter also may not know which variables are important for matching so as to produce a high correlation.

Confidence Interval

95% CI:

μ_lower = (X̄1 − X̄2) − s_(X̄1−X̄2) · t_0.025
μ_upper = (X̄1 − X̄2) + s_(X̄1−X̄2) · t_0.025

where s_(X̄1−X̄2) = √( ((SS1 + SS2)/(n1 + n2 − 2)) (1/n1 + 1/n2) )

Interpretation: we are 95% confident that the interval μ_lower to μ_upper contains the real effect of the IV. (In the textbook example, the real effect of the IV is to cause between μ_lower and μ_upper more matings than the placebo.)
If the CI contains 0, the effect is not significant.

Pagano Chapter 15: The One-Way ANOVA

With t-test and z-test, we have been using the mean as the basic statistic for evaluating
the null hypothesis.

However, it is also possible to use the variance of the data for hypothesis testing. For
this, we use the F-test.

Generating the Sampling distribution of F:


1. Take all possible samples of sizes n1 and n2 from the same population.
2. Estimate the population variance using the sample variances s1² and s2².
3. Calculate F_obt for all possible combinations of s1² and s2².
4. Calculate p(F) for each different value of F_obt.

Hence, the sampling distribution of F gives all the possible F values along with the p(F) for
each value, assuming sampling is random from the population.

Generally,
F_obt = (variance estimate 1 of σ²) / (variance estimate 2 of σ²)

and has one df value each for the numerator and the denominator: df1 = n1 − 1, df2 = n2 − 1.

Several characteristics of the F-distribution:


- The F statistic is never negative: it is a ratio of variance estimates, which are non-negative.
- The F distribution is positively skewed.
- The median F value ≈ 1.
- Unlike t_crit values, F_crit values are one-tailed.

F test is appropriate when:

- There are more than 2 levels of the IV.
- Using multiple t-test comparisons would increase the probability of a Type 1 error. (We can be wrong 5% of the time in the first pairwise comparison, and also 5% of the time in the second pairwise comparison, so the total Type 1 error across the experiment is more than 5%.)

Overview of the One-way ANOVA:


- The F test allows us to make one overall comparison that tells us whether there is a significant difference between any of the group means.
- Used in both independent groups and repeated measures designs.
- One-way ANOVA, independent groups design = simple randomized-group design = single-factor experiment, independent groups design.
- Null hypothesis: the different conditions are all equally effective; the scores in each group are random samples from populations with the same mean value.
  H0: μ1 = μ2 = μ3 = … = μk
- Alternative hypothesis: at least one condition affects the DV differently from the other conditions; the samples are random samples from populations where not all the population means are equal. (Non-directional.)
- Note: ANOVA assumes that the variances of the populations from which the samples are taken are equal: σ1² = σ2² = … = σk².

𝑆𝑆𝑡𝑜𝑡𝑎𝑙 = 𝑆𝑆𝑤𝑖𝑡ℎ𝑖𝑛 + 𝑆𝑆𝑏𝑒𝑡𝑤𝑒𝑒𝑛

F_obt = (between-groups variance estimate) / (within-groups variance estimate) = MS_between / MS_within

𝑀𝑆𝑏𝑒𝑡𝑤𝑒𝑒𝑛 increases with the magnitude of the IV’s effect, whereas the 𝑀𝑆𝑤𝑖𝑡ℎ𝑖𝑛 is
unaffected. Thus, the larger the F-ratio, the more unreasonable the null hypothesis
becomes.

If 𝐹𝑜𝑏𝑡 ≥ 𝐹𝑐𝑟𝑖𝑡 , reject 𝐻0 . Otherwise, retain 𝐻0 .

MS_within = s_w² for a two-group experiment. Recall that s_w² is a weighted average of s1² and s2²:

s_w² = (SS1 + SS2) / (n1 + n2 − 2)

For k groups:

MS_within = (SS1 + SS2 + … + SSk) / (n1 + n2 + … + nk − k)
          = (SS1 + SS2 + … + SSk) / (N − k)
          = SS_within / df_within

SS_within can be computed as

SS_within = Σ_all scores X² − [ (ΣX1)²/n1 + (ΣX2)²/n2 + (ΣX3)²/n3 + … + (ΣXk)²/nk ]

Looking at MS_between:

MS_between = estimate of σ² = n·σ_X̄², which can be estimated as

MS_between = n·s_X̄² = n [ Σ(X̄ − X̄_G)² / (k − 1) ] = SS_between / df_between

where X̄ ranges over the group means and X̄_G is the grand mean.

SS_between = [ (ΣX1)²/n1 + (ΣX2)²/n2 + (ΣX3)²/n3 + … + (ΣXk)²/nk ] − (Σ_all scores X)² / N

F_obt = MS_between / MS_within = (σ² + independent variable effects) / σ²
We note that F ratio increases with the effect of the IV. Another way of saying this is that
𝑀𝑆𝑏𝑒𝑡𝑤𝑒𝑒𝑛 is really an estimate of 𝜎 2 plus the effects of the independent variable,
whereas 𝑀𝑆𝑤𝑖𝑡ℎ𝑖𝑛 is just an estimate of 𝜎 2 . This is because 𝑀𝑆𝑤𝑖𝑡ℎ𝑖𝑛 provides us with an
estimate of the inherent variability of the scores themselves.
The larger 𝐹𝑜𝑏𝑡 becomes, the more reasonable it is that the IV has had a real effect. If
𝐹𝑜𝑏𝑡 is less than or equal to 1, it is obvious that the treatment has not had a significant
effect.

Degrees of freedom:
The critical value is obtained using a set of two dfs, one in the numerator and one in the
denominator.
df_numerator = df_between;  df_denominator = df_within

𝑑𝑓𝑏𝑒𝑡𝑤𝑒𝑒𝑛 + 𝑑𝑓𝑤𝑖𝑡ℎ𝑖𝑛 = 𝑑𝑓𝑡𝑜𝑡𝑎𝑙

Steps to solving with one-way ANOVA:


1. State null hypothesis and alternative hypothesis
2. Calculate 𝑆𝑆𝑏𝑒𝑡𝑤𝑒𝑒𝑛
3. Calculate 𝑆𝑆𝑤𝑖𝑡ℎ𝑖𝑛
4. Calculate 𝑆𝑆𝑡𝑜𝑡𝑎𝑙 . Check if 𝑆𝑆𝑡𝑜𝑡𝑎𝑙 = 𝑆𝑆𝑏𝑒𝑡𝑤𝑒𝑒𝑛 + 𝑆𝑆𝑤𝑖𝑡ℎ𝑖𝑛 . (optional)
5. Calculate the degrees of freedom:
   df_within = N − k
   df_between = k − 1
   df_total = N − 1
6. Calculate MS_within = SS_within / df_within
7. Calculate MS_between = SS_between / df_between
8. Calculate F_obt = MS_between / MS_within
9. Evaluate F_obt. Find F_crit using α = 0.05, df_between and df_within.
   If F_obt ≥ F_crit, reject H0. Otherwise, retain H0.
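The full sequence of steps, sketched in Python on hypothetical data for k = 3 groups (the function name `anova_oneway` is not from the text):

```python
def anova_oneway(*groups):
    """One-way independent-groups ANOVA via the computational SS formulas."""
    k = len(groups)
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)
    grand = sum(all_scores) ** 2 / n_total          # (sum of all X)^2 / N

    between_terms = sum(sum(g) ** 2 / len(g) for g in groups)
    ss_between = between_terms - grand
    ss_within = sum(x ** 2 for x in all_scores) - between_terms
    ss_total = sum(x ** 2 for x in all_scores) - grand
    assert abs(ss_total - (ss_between + ss_within)) < 1e-9   # Step 4 check

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between / ms_within, df_between, df_within

# Hypothetical scores for k = 3 conditions.
f_obt, df_b, df_w = anova_oneway([2, 3, 4], [5, 6, 7], [9, 10, 11])
# F_crit(2, 6) at alpha = 0.05 is about 5.14 (table value), so H0 would be rejected.
```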

Relationship between ANOVA and t-test for independent groups:


When there are only 2 conditions, both can be used. In fact, it can be shown algebraically
that 𝑡 2 = 𝐹.

Assumptions underlying the Analysis of Variance:


1. The populations from which the samples were taken are normally distributed.
2. The samples are drawn from populations with equal variances (the homogeneity-of-variance assumption).
Note: like the t test, the ANOVA is a robust test. It is minimally affected by violations of population normality, and it is relatively insensitive to violations of homogeneity of variance, provided the samples are of equal size.

Size of effect:
We can use eta-squared (η²) to determine the size of effect in a one-way, independent groups design. It is conceptually very similar to R², as it also provides an estimate of the proportion of the total variability of Y that is accounted for by X.

Disadvantages of η²:
- A more biased estimate compared to omega-squared.
- The biased estimate is larger than the true size of the effect.

How to calculate 𝜂 2 :

𝑆𝑆𝑏𝑒𝑡𝑤𝑒𝑒𝑛
𝜂2 =
𝑆𝑆𝑡𝑜𝑡𝑎𝑙

Interpretation: e.g. if η² is 0.79, the conditions (the different levels of the IV) account for 79% of the variance in Y.

Cohen's criteria:
η² (proportion of variance accounted for)    Interpretation
0.01–0.05                                    Small effect
0.06–0.13                                    Medium effect
≥ 0.14                                       Large effect

Power of the ANOVA:


1. Increases in N and n result in increased power.
2. The larger the real effect of the IV, the higher the power.
3. Increases in within-group variability result in decreases in power.
* Same as t test.

Multiple Comparisons
A significant F value tells us that at least one condition differs from at least one of the
others. In addition, we are also interested in determining which of the conditions differ
from each other. Hence, we need to make multiple comparisons between pairs of group
means.

A Priori or Planned Comparisons:


- Planned in advance of the experiment, based on predictions from theory and prior research.
- May be directional or non-directional.
- We do not correct for the higher probability of Type 1 error that arises from multiple comparisons (unlike post hoc comparisons).

We employ the t test for independent groups. So when comparing conditions 1 and 2,
we use the following equation:

t_obt = (X̄1 − X̄2) / √(MS_within (1/n1 + 1/n2))

We use MS_within instead of s_w² as it is a better estimate, being based on 3 or more groups instead of just 2.

When n1 = n2, the general t equation simplifies to:

t_obt = (X̄1 − X̄2) / √(2·MS_within / n)

To find t_crit, we use the degrees of freedom for MS_within: df_within = N − k.

A Posteriori or Post Hoc Comparisons:


- The investigator decides which groups to compare after looking at the data.
- Must correct for inflated Type 1 error, since the comparisons are unplanned.

Tukey HSD:
One of the methods used for post hoc comparisons.
- Maintains the Type 1 error rate at α while controlling for all possible comparisons between pairs of means (vs the Scheffé test, which controls for all possible comparisons of any kind).
- Uses the Q statistic: the studentized range distribution.

Q_obt = (X̄i − X̄j) / √(MS_within / n)

where X̄i is the larger of the two means being compared and X̄j is the smaller.
- Since the smaller mean is always subtracted from the larger mean, Q_obt is always positive.
- To get Q_crit, we must know the df, k, and the alpha level. The degrees of freedom are those associated with MS_within.

Decision rule:
If 𝑄𝑜𝑏𝑡 ≥ 𝑄𝑐𝑟𝑖𝑡 , reject 𝐻0 . Otherwise, retain 𝐻0 .
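A sketch of Q_obt for hypothetical means; the Q_crit value 4.34 is an approximate studentized-range table value for k = 3, df = 6, α = 0.05:

```python
import math

# Hypothetical: three group means from an ANOVA with n = 3 per group
# and MS_within = 1.0 on df_within = 6.
means = [3.0, 6.0, 10.0]
ms_within, n = 1.0, 3

def q_obt(mi, mj):
    # Larger mean minus smaller mean, so Q_obt is always positive.
    return (max(mi, mj) - min(mi, mj)) / math.sqrt(ms_within / n)

q_12 = q_obt(means[0], means[1])   # conditions 1 vs 2
q_13 = q_obt(means[0], means[2])   # conditions 1 vs 3

# Approximate studentized-range table value for k = 3, df = 6, alpha = 0.05.
q_crit = 4.34
reject_12 = q_12 >= q_crit
```

Here Q_obt for conditions 1 vs 2 is 3/√(1/3) ≈ 5.20 ≥ 4.34, so that pair of means differs significantly.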

Additional notes on planned comparisons vs post hoc comparisons:


- Planned comparisons are more powerful than post hoc comparisons.
- Planned comparisons can be directional, but post hoc comparisons must be non-directional.
- Planned comparisons should be relatively few and should flow meaningfully and logically from the experimental design.

Chapter 16: Two-way ANOVA

The two-way analysis of variance allows us, in one experiment, to evaluate the effects of two IVs and the interaction between them.
- So a two-way ANOVA is essentially two one-way ANOVAs plus the interaction between the two independent variables.

Factorial experiment:
Main effect: The effect of factor A and the effect of factor B are called main effects.
Interaction effect: The effect of one factor is not the same at all levels of the other factor.

                        Variable B1 (light)   Variable B2 (heavy)
Variable A1 (morning)   A1B1                  A1B2
Variable A2 (evening)   A2B1                  A2B2

To test whether the main effect of variable A is significant, we calculate:
   F_obt = MS_rows / MS_within-cells

To test whether the main effect of variable B is significant, we calculate:
   F_obt = MS_columns / MS_within-cells

To test whether the interaction effect is significant, we calculate:
   F_obt = MS_interaction / MS_within-cells

Recall that from the One-way ANOVA, we can partition the total sum of squares into:

𝑆𝑆𝑡𝑜𝑡𝑎𝑙 = 𝑆𝑆𝑤𝑖𝑡ℎ𝑖𝑛 + 𝑆𝑆𝑏𝑒𝑡𝑤𝑒𝑒𝑛

Same logic for the 2-way ANOVA, we can partition the 𝑆𝑆𝑡𝑜𝑡𝑎𝑙 :

𝑆𝑆𝑡𝑜𝑡𝑎𝑙 = 𝑆𝑆𝑤𝑖𝑡ℎ𝑖𝑛 𝑐𝑒𝑙𝑙𝑠 + 𝑆𝑆𝑟𝑜𝑤𝑠 + 𝑆𝑆𝑐𝑜𝑙𝑢𝑚𝑛𝑠 + 𝑆𝑆𝑖𝑛𝑡𝑒𝑟𝑎𝑐𝑡𝑖𝑜𝑛

𝑀𝑆𝑤𝑖𝑡ℎ𝑖𝑛 𝑐𝑒𝑙𝑙 , the within-cells variance estimate


- Corresponds to MS_within in a one-way ANOVA.
- An independent estimate of σ².
- A measure of the inherent variability of the scores within each cell; hence it is unaffected by the effects of factors A and B.

MS_within-cells = SS_within-cells / df_within-cells

Conceptually,
SS_within-cells = SS11 + SS12 + … + SSrc

Computationally,
SS_within-cells = Σ_all scores X² − [ (Σ_cell 11 X)² + (Σ_cell 12 X)² + … + (Σ_cell rc X)² ] / n_cell

df_within-cells = rc(n − 1)

𝑀𝑆𝑟𝑜𝑤𝑠 , the row variance estimate


- Based on the differences between the row means.
- Analogous to the between-groups variance estimate (MS_between).
- An estimate of σ² plus the effect of factor A.

MS_rows = SS_rows / df_rows

Computational equation for the row sum of squares:

SS_rows = [ (Σ_row 1 X)² + (Σ_row 2 X)² + … + (Σ_row r X)² ] / n_row − (Σ_all scores X)² / N

df_rows = r − 1 (row degrees of freedom)

𝑀𝑆𝑐𝑜𝑙𝑢𝑚𝑛𝑠 , the column variance estimate


- The column variance estimate is an estimate of σ² plus the effects of factor B.
- If the levels of factor B have no differential effect, then the population column means are equal (μ_b1 = μ_b2 = μ_b3 = … = μ_bc), and MS_columns is an estimate of σ² alone.

MS_columns = SS_columns / df_columns

SS_columns = [ (Σ_column 1 X)² + (Σ_column 2 X)² + … + (Σ_column c X)² ] / n_column − (Σ_all scores X)² / N

df_columns = c − 1 (column degrees of freedom)

𝑀𝑆𝑖𝑛𝑡𝑒𝑟𝑎𝑐𝑡𝑖𝑜𝑛 , the interaction variance estimate

- An interaction exists when the effect of the combined action of the variables differs from what would be predicted from the individual effects of the variables.
- An estimate of σ² plus the interaction of A and B.
- If there is no interaction after any main effects are removed, then the population cell means are equal, and differences among cell means must be due to random sampling from identical populations. In this case, MS_interaction is an estimate of σ² alone.

MS_interaction = SS_interaction / df_interaction

SS_interaction = [ (Σ_cell 11 X)² + (Σ_cell 12 X)² + … + (Σ_cell rc X)² ] / n_cell − (Σ_all scores X)² / N − SS_rows − SS_columns

df_interaction = (r − 1)(c − 1)

Solving a problem with 2-way ANOVA:


1. State the null hypotheses:
Main effect of A: the time of day when exercise is done does not affect nighttime sleep; the population row means for morning and evening exercise, averaged over the different levels of exercise intensity, are equal. (μ_a1 = μ_a2)
Main effect of B: the different levels of exercise intensity have the same effect on nighttime sleep; the population column means, averaged over the times of day, are equal. (μ_b1 = μ_b2 = μ_b3)
Interaction effect: there is no interaction between time of day and exercise intensity; with any main effects removed, the population cell means are equal. (μ_a1b1 = μ_a1b2 = μ_a1b3 = μ_a2b1 = μ_a2b2 = μ_a2b3)

2. Calculate the row sum of squares, 𝑆𝑆𝑟𝑜𝑤𝑠


3. Calculate the column sum of squares, 𝑆𝑆𝑐𝑜𝑙𝑢𝑚𝑛
4. Calculate the interaction sum of squares, 𝑆𝑆𝑖𝑛𝑡𝑒𝑟𝑎𝑐𝑡𝑖𝑜𝑛
5. Calculate the within sum of squares, 𝑆𝑆𝑤𝑖𝑡ℎ𝑖𝑛−𝑐𝑒𝑙𝑙𝑠
6. Calculate the total sum of squares, 𝑆𝑆𝑡𝑜𝑡𝑎𝑙
7. Calculate the degrees of freedom for each variance estimate:
𝑑𝑓𝑟𝑜𝑤𝑠, 𝑑𝑓𝑐𝑜𝑙𝑢𝑚𝑛𝑠 , 𝑑𝑓𝑖𝑛𝑡𝑒𝑟𝑎𝑐𝑡𝑖𝑜𝑛, 𝑑𝑓𝑤𝑖𝑡ℎ𝑖𝑛−𝑐𝑒𝑙𝑙𝑠
check against 𝑑𝑓𝑡𝑜𝑡𝑎𝑙

8. Calculate the variance estimates 𝑀𝑆𝑟𝑜𝑤𝑠, 𝑀𝑆𝑐𝑜𝑙𝑢𝑚𝑛𝑠, 𝑀𝑆𝑖𝑛𝑡𝑒𝑟𝑎𝑐𝑡𝑖𝑜𝑛, 𝑀𝑆𝑤𝑖𝑡ℎ𝑖𝑛−𝑐𝑒𝑙𝑙𝑠,

9. Calculate the F ratios for the row effect, the column effect, and the interaction effect.
10. Evaluate the F_obt values. If F_obt ≥ F_crit, reject the corresponding null hypothesis.
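The SS partition and the step sequence can be checked numerically on a small hypothetical 2 × 2 design with n = 2 scores per cell (equal cell sizes, as the computational formulas assume):

```python
# Hypothetical 2 x 2 factorial with n = 2 scores per cell.
# cells[i][j] holds the scores for row i (factor A), column j (factor B).
cells = [[[3, 5], [8, 10]],
         [[4, 6], [5, 7]]]

r, c, n = 2, 2, 2
all_scores = [x for row in cells for cell in row for x in cell]
N = len(all_scores)
grand = sum(all_scores) ** 2 / N                       # (sum of all X)^2 / N

cell_terms = sum(sum(cell) ** 2 / n for row in cells for cell in row)
ss_within = sum(x ** 2 for x in all_scores) - cell_terms

row_sums = [sum(x for cell in row for x in cell) for row in cells]
ss_rows = sum(s ** 2 / (c * n) for s in row_sums) - grand

col_sums = [sum(cells[i][j][k] for i in range(r) for k in range(n))
            for j in range(c)]
ss_columns = sum(s ** 2 / (r * n) for s in col_sums) - grand

ss_interaction = cell_terms - grand - ss_rows - ss_columns
ss_total = sum(x ** 2 for x in all_scores) - grand

# The partition SS_total = SS_within + SS_rows + SS_columns + SS_interaction.
partition_ok = abs(ss_total - (ss_within + ss_rows + ss_columns
                               + ss_interaction)) < 1e-9
```

For these scores: SS_rows = 2, SS_columns = 18, SS_interaction = 8, SS_within = 8, summing to SS_total = 36.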

Note: if there is a significant interaction effect, caution must be exercised in interpreting the main effects. A significant main effect might be due solely to the independent variable having an effect at only one level of the other variable and no effect at the other levels.

To resolve this issue, it is necessary to statistically compare the mean differences at each level of the other variable: we evaluate pairs of row means or column means (multiple comparisons).

Assumptions underlying the two-way ANOVA:


1. The populations from which the samples were taken are normally distributed.
2. The population variances for each of the cells are equal. This is the homogeneity of
variance assumption.
*The two-way ANOVA is robust with respect to violations of these assumptions, provided the samples are of equal size.

Chapter 6 and 7: Correlation and Linear Regression

Correlation: Finding out whether a relationship exists and determining its magnitude
and direction.

Regression: Using the relationship for prediction.

Uses of correlation:
- For prediction purposes.
- A first step towards showing that two variables are causally related.
- Assessing "test-retest reliability".

Correlation coefficient:
- Also known as r; expresses quantitatively the magnitude and direction of the relationship.
- How to tell if a linear relationship has a high r? Look at the scatterplot: the closer the points are to the regression line, the higher the magnitude of the correlation coefficient and the more accurate the prediction.
- The X and Y values can be transformed into standardised z scores. If the paired scores have the same z values, then they occupy the same relative positions within their own distributions, and the correlation is perfect (r = 1).
- Pearson r is a measure of the extent to which paired scores occupy the same or opposite positions within their own distributions.
- Only calculate r where the data are of interval or ratio scaling.

Second interpretation for r:


Pearson r can also be interpreted in terms of the variability of Y accounted for by X.

∑(𝑌𝑖 − 𝑌̅)² = ∑(𝑌𝑖 − 𝑌′)² + ∑(𝑌′ − 𝑌̅)²


Total variability of Y = Variability of Prediction Errors + Variability of Y accounted for by X

r = √( ∑(𝑌′ − 𝑌̅)² / ∑(𝑌𝑖 − 𝑌̅)² )
  = √( Variability of Y accounted for by X / Total variability of Y )
  = √( Proportion of the total variability of Y that is accounted for by X )

r² = proportion of the total variability of Y that is accounted for by X
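Both interpretations of r can be checked numerically. Below is a minimal sketch with made-up paired scores (the data are illustrative, not from the text): r is computed as the mean of the z-score cross-products, and r² is verified against the ratio of explained to total variability of Y.

```python
import math

# Illustrative paired scores (hypothetical, not from the notes)
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
N = len(X)
mx, my = sum(X) / N, sum(Y) / N

# First interpretation: r as the mean of z-score cross-products
# (population SDs, i.e. dividing by N)
sx = math.sqrt(sum((x - mx) ** 2 for x in X) / N)
sy = math.sqrt(sum((y - my) ** 2 for y in Y) / N)
r = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(X, Y)) / N

# Second interpretation: r^2 = variability of Y accounted for by X
# over the total variability of Y, using the least-squares predictions Y'
b = sum((x - mx) * (y - my) for x, y in zip(X, Y)) / sum((x - mx) ** 2 for x in X)
a = my - b * mx
explained = sum((b * x + a - my) ** 2 for x in X)   # sum of (Y' - mean Y)^2
total = sum((y - my) ** 2 for y in Y)               # sum of (Y - mean Y)^2

print(round(r, 4), round(explained / total, 4))  # → 0.7746 0.6
```

The two routes agree: r² equals the explained-over-total ratio, as the second interpretation claims.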

Other notes on correlation


 Effect of range restriction on correlation
o Restricting the range of either of the variables will have the effect of
lowering the correlation.
o One should check that range restriction is not responsible for a low
value of the correlation.
 Effect of Extreme scores
o Extreme scores can drastically alter the magnitude of r
 Correlation does not imply causation. When two variables are correlated,
there are 4 possible explanations:
o Correlation between X and Y is spurious
o X causes Y
o Y causes X (reverse causality)
o A third variable is the cause of the correlation between X and Y

How to draw the best-fit line?


The least squares regression line is the prediction line that minimizes the total error of
prediction, according to the least squares criterion of ∑(𝑌 − 𝑌 ′ )2 .

Constructing the Least-squares regression line:


𝑌′ = 𝑏𝑌 𝑋 + 𝑎𝑌
Step 1: Calculate the slope, 𝑏𝑌 = ∑(𝑋 − 𝑋̅)(𝑌 − 𝑌̅) / ∑(𝑋 − 𝑋̅)²
Step 2: Calculate the intercept, 𝑎𝑌 = 𝑌̅ − 𝑏𝑌 𝑋̅
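The two construction steps can be sketched in code. A hedged example with hypothetical scores (not from the text), using the deviation-score form of the slope:

```python
# Hypothetical paired scores for illustration
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
N = len(X)
mean_x = sum(X) / N
mean_y = sum(Y) / N

# Step 1: slope b_Y = sum of deviation cross-products / sum of squared X deviations
b_y = (sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
       / sum((x - mean_x) ** 2 for x in X))

# Step 2: intercept a_Y = mean(Y) - b_Y * mean(X)
a_y = mean_y - b_y * mean_x

def predict(x):
    """Predicted score Y' = b_Y * x + a_Y."""
    return b_y * x + a_y

print(round(b_y, 4), round(a_y, 4), round(predict(3.0), 4))  # → 0.6 2.2 4.0
```

Note that the line passes through (X̄, Ȳ): predicting at the mean of X returns the mean of Y, which follows directly from Step 2.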

Quantifying Prediction Errors: Standard Error of Estimate


Standard error of estimate gives us a measure of the average deviation of the prediction
errors about the regression line.

𝑠𝑌|𝑋 = √( ∑(𝑌 − 𝑌′)² / (𝑁 − 2) )
We divide by N − 2 because calculation of the S.E. involves fitting the data to a straight
line. Doing so requires estimating two parameters, the slope and the intercept, leaving
the deviations about the line with N − 2 degrees of freedom.

 For the S.E. to be meaningful, we must assume that the variability of Y remains
constant as we go from one X score to the next (homoscedasticity).
 Graphically, about 68% of the points will fall within ±1 S.E. of the regression line.
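As a sketch, the standard error of estimate for a small made-up data set (illustrative values, not from the text), dividing by N − 2 as the formula above requires:

```python
import math

# Hypothetical paired scores for illustration
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]
N = len(X)
mean_x = sum(X) / N
mean_y = sum(Y) / N

# Least-squares slope and intercept (as in the construction steps above)
b_y = (sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))
       / sum((x - mean_x) ** 2 for x in X))
a_y = mean_y - b_y * mean_x

# Prediction errors Y - Y' about the regression line
errors = [y - (b_y * x + a_y) for x, y in zip(X, Y)]

# Divide by N - 2: two parameters (slope and intercept) were estimated
s_yx = math.sqrt(sum(e ** 2 for e in errors) / (N - 2))
print(round(s_yx, 4))  # → 0.8944
```

Roughly 68% of the points should lie within ±s_yx of the line, provided the homoscedasticity and normality assumptions hold.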

Considerations:
 The basic computation group must be representative of the prediction group. (This
can be achieved through random sampling of the population.)
 The linear regression equation is properly used only for the range of the variable
over which the relationship was computed. E.g. if the regression of GPA on IQ was
computed for IQ scores of 110–138, it cannot be used to predict the GPA of
someone with an IQ of 140.

Chapter 17: Chi-squared and Other non-parametric tests

Parametric vs non-parametric tests:

Parametric tests:
 Depend considerably on population characteristics, or parameters, for their use.
E.g. the z test requires us to specify the mean and SD of the null-hypothesis
population, and requires that population scores be normally distributed when N
is small.
 More powerful than non-parametric tests. E.g. the sign test has less power than
the t test for correlated groups. Always use parametric tests if the data meet the
assumptions of the test.
 Good for ordinal, interval or ratio data.
 Can test 3, 4 or more variables and their interactions.

Non-parametric tests:
 Do not depend on knowing population distributions (“distribution-free” tests).
E.g. the chi-squared and sign tests.
 Good for nominal (categorical) data.
 No comparable non-parametric technique exists for testing several variables and
their interactions.

Conducting the chi-squared test:

Step 1: Calculate the expected frequency for each cell (𝑓𝑒 )


Step 2: Calculate 𝜒²𝑜𝑏𝑡 = ∑ (𝑓𝑜 − 𝑓𝑒)² / 𝑓𝑒
Step 3: Obtain 𝜒²𝑐𝑟𝑖𝑡 , where df = k − 1 and k is the number of groups
Step 4: Decision rule: if 𝜒²𝑜𝑏𝑡 ≥ 𝜒²𝑐𝑟𝑖𝑡 , reject 𝐻0 .

Note: Since the differences are squared, their direction does not matter; the
chi-squared test is therefore a non-directional test.
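The four steps can be sketched with hypothetical counts (the observed frequencies are made up; 7.815 is the tabled 𝜒²𝑐𝑟𝑖𝑡 for df = 3 at α = .05):

```python
# Hypothetical observed frequencies for k = 4 groups
observed = [20, 30, 25, 25]
n = sum(observed)
k = len(observed)

# Step 1: expected frequencies under H0 of equal proportions
expected = [n / k] * k

# Step 2: chi-squared obtained = sum of (f_o - f_e)^2 / f_e
chi2_obt = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

# Steps 3-4: compare against chi2_crit for df = k - 1 = 3 at alpha = .05
chi2_crit = 7.815
print(chi2_obt, chi2_obt >= chi2_crit)  # → 2.0 False
```

Here 𝜒²𝑜𝑏𝑡 = 2.0 < 7.815, so with these illustrative counts we would retain 𝐻0.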

The chi-squared test can also be conducted to determine whether two categorical
variables are independent or related.

Null hypothesis: Variable A and Variable B are independent.

Step 1: Calculate the expected frequency for each cell (𝑓𝑒 ). If we do not know the
population proportions, we can estimate them from the sample: multiply the marginals
for that cell (the row total and the column total) and divide by N.
Step 2: Calculate 𝜒²𝑜𝑏𝑡 = ∑ (𝑓𝑜 − 𝑓𝑒)² / 𝑓𝑒
Step 3: Obtain 𝜒²𝑐𝑟𝑖𝑡 , where df = (r − 1)(c − 1) for contingency tables
Step 4: Decision rule: if 𝜒²𝑜𝑏𝑡 ≥ 𝜒²𝑐𝑟𝑖𝑡 , reject 𝐻0 .

Conclusion (if 𝐻0 is rejected): Variable A and Variable B are related.
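A sketch of the independence test for a hypothetical 2 × 2 table (made-up counts; 3.841 is the tabled 𝜒²𝑐𝑟𝑖𝑡 for df = 1 at α = .05), with 𝑓𝑒 computed from the marginals:

```python
# Hypothetical 2 x 2 contingency table of observed frequencies
table = [[30, 20],
         [10, 40]]
N = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

# Step 1: f_e = (row total * column total) / N for each cell
# Step 2: accumulate (f_o - f_e)^2 / f_e
chi2_obt = 0.0
for i, row in enumerate(table):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / N
        chi2_obt += (fo - fe) ** 2 / fe

# Steps 3-4: df = (r - 1)(c - 1) = 1; compare with chi2_crit at alpha = .05
chi2_crit = 3.841
print(round(chi2_obt, 3), chi2_obt >= chi2_crit)  # → 16.667 True
```

With these illustrative counts 𝜒²𝑜𝑏𝑡 exceeds 𝜒²𝑐𝑟𝑖𝑡, so we would reject 𝐻0 and conclude the two variables are related.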

Assumptions underlying 𝜒 2 :
1. There is independence between each observation recorded in the contingency
table. (i.e. each subject can have only one entry in the table)
2. The expected frequency in each cell is at least 5 when r or c is greater than 2,
and at least 10 when both r and c equal 2 (a 2 × 2 table). If these minimums are
not met, Fisher’s exact probability test should be used instead.
Note: Chi-squared can also be used with ordinal, interval and ratio data, but the data
must first be reduced to mutually exclusive categories.
