Z = 0.81/0.494 = 1.64; p = .10

Confidence interval

                 Smoker (E)   Nonsmoker (~E)
Stroke (D)           15             35        50
No Stroke (~D)        8             42        50

95% CI for ln(OR): ln(2.25) ± 1.96(0.494) = 0.81 ± 0.97 = (-0.16, 1.78)
95% CI for OR: (e^-0.16, e^1.78) = (0.85, 5.92)

Final answer: 2.25 (0.85, 5.92)
Practice problem:

Suppose the following data were collected in a case-control study of brain tumor and cell phone usage:

                          Brain tumor   No brain tumor
Own a cell phone               20             60
Don't own a cell phone         10             40

Is there sufficient evidence for an association between cell phones and brain tumor?
Answer

1. What is your null hypothesis?
Null hypothesis: OR = 1.0; lnOR = 0
Alternative hypothesis: OR ≠ 1.0; lnOR ≠ 0

2. What is your null distribution?
lnOR ~ N(0, σ²); σ = SD(lnOR) = sqrt(1/20 + 1/60 + 1/10 + 1/40) = .44

3. Empirical evidence: OR = (20*40)/(60*10) = 800/600 = 1.33
lnOR = .288

4. Z = (.288 - 0)/.44 = .65
p-value = P(Z > .65 or Z < -.65) = .26*2 = .52

TWO-SIDED TEST: it would be just as extreme if the sample lnOR were .65 standard deviations or more below the null mean.

5. Not enough evidence to reject the null hypothesis of no association.
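The arithmetic in steps 2-4 can be checked with a few lines of Python. This is just a sketch (the rest of the deck uses SAS); the counts are the cell-phone table above.

```python
import math

# 2x2 table from the practice problem: a=20, b=60, c=10, d=40
a, b, c, d = 20, 60, 10, 40

OR = (a * d) / (b * c)                       # 800/600 = 1.33
ln_or = math.log(OR)                         # ~0.288
se = math.sqrt(1/a + 1/b + 1/c + 1/d)        # ~0.44
z = (ln_or - 0) / se                         # ~0.65

# Two-sided p-value from the standard normal CDF (Phi via erf)
phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
p = 2 * (1 - phi(abs(z)))                    # ~0.51, close to the slide's .26*2 = .52

print(round(OR, 2), round(z, 2), round(p, 2))
```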
Key measures of relative risk: 95% CIs for OR and RR:

For an odds ratio, 95% confidence limits:

( OR * exp(-1.96*sqrt(1/a + 1/b + 1/c + 1/d)), OR * exp(+1.96*sqrt(1/a + 1/b + 1/c + 1/d)) )

For a risk ratio, 95% confidence limits:

( RR * exp(-1.96*sqrt((1 - a/(a+b))/a + (1 - c/(c+d))/c)), RR * exp(+1.96*sqrt((1 - a/(a+b))/a + (1 - c/(c+d))/c)) )
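As a sketch of the odds-ratio confidence limits above (plain Python, no stats library), using the smoking/stroke counts from the earlier slide, which should reproduce the final answer of 2.25 (0.85, 5.92):

```python
import math

# 2x2 table: a=15, b=35, c=8, d=42 (smoking/stroke case-control example)
a, b, c, d = 15, 35, 8, 42

or_hat = (a * d) / (b * c)                    # 2.25
se_ln = math.sqrt(1/a + 1/b + 1/c + 1/d)      # ~0.494

lo = or_hat * math.exp(-1.96 * se_ln)         # ~0.85
hi = or_hat * math.exp(+1.96 * se_ln)         # ~5.92

print(round(or_hat, 2), round(lo, 2), round(hi, 2))
```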
Continuous outcome (means)

Outcome variable: Continuous (e.g., pain scale, cognitive function)

Are the observations independent or correlated?

Independent:
- T-test: compares means between two independent groups
- ANOVA: compares means between more than two independent groups
- Pearson's correlation coefficient (linear correlation): shows linear correlation between two continuous variables
- Linear regression: multivariate regression technique used when the outcome is continuous; gives slopes

Correlated:
- Paired t-test: compares means between two related groups (e.g., the same subjects before and after)
- Repeated-measures ANOVA: compares changes over time in the means of two or more groups (repeated measurements)
- Mixed models/GEE modeling: multivariate regression techniques to compare changes over time between two or more groups; gives rate of change over time

Alternatives if the normality assumption is violated (and small sample size), non-parametric statistics:
- Wilcoxon sign-rank test: non-parametric alternative to the paired t-test
- Wilcoxon rank-sum test (= Mann-Whitney U test): non-parametric alternative to the t-test
- Kruskal-Wallis test: non-parametric alternative to ANOVA
- Spearman rank correlation coefficient: non-parametric alternative to Pearson's correlation coefficient
The two-sample t-test

Is the difference in means that we observe between two groups more than we'd expect to see based on chance alone?
The standard error of the difference of two means

σ(X̄ - Ȳ) = sqrt(σx²/n + σy²/m)

**First add the variances and then take the square root of the sum to get the standard error.

Just plug in the sample standard deviations for each group.

Case 1: unpooled variance

Question: What are your degrees of freedom here?
Answer: Not obvious!

Case 1: t-test, unpooled variances

T = (X̄ - Ȳ) / sqrt(sx²/n + sy²/m) ~ t_v

It is complicated to figure out the degrees of freedom here! A good approximation is given by the harmonic mean of the sample sizes (or SAS will tell you!):

df ≈ 2 / (1/n + 1/m)
Case 2: pooled variance
If you assume that the standard deviation of the characteristic
(e.g., IQ) is the same in both groups, you can pool all the data
to estimate a common standard deviation. This maximizes your
degrees of freedom (and thus your power).
Pooling variances:

sx² = Σ(xi - x̄)²/(n - 1), so (n - 1)sx² = Σ(xi - x̄)²
sy² = Σ(yi - ȳ)²/(m - 1), so (m - 1)sy² = Σ(yi - ȳ)²

sp² = [Σ(xi - x̄)² + Σ(yi - ȳ)²] / (n + m - 2) = [(n - 1)sx² + (m - 1)sy²] / (n + m - 2)

Degrees of Freedom!

Estimated standard error (using pooled variance estimate)

sqrt(sp²/n + sp²/m) ≈ σ(X̄ - Ȳ)

where sp² = [Σ(xi - x̄)² + Σ(yi - ȳ)²] / (n + m - 2)

The degrees of freedom are n + m - 2.
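A quick numeric check of the pooling identity above, on made-up illustrative data (both ways of computing sp² should agree exactly):

```python
# Illustrative (made-up) data for the pooled-variance identity
x = [10.0, 12.0, 9.0, 11.0, 13.0]   # group 1 (n = 5)
y = [8.0, 9.5, 10.5, 7.0]           # group 2 (m = 4)

n, m = len(x), len(y)
xbar = sum(x) / n
ybar = sum(y) / m

sx2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
sy2 = sum((yi - ybar) ** 2 for yi in y) / (m - 1)

# (n-1)sx^2 + (m-1)sy^2, divided by n+m-2 ...
sp2_formula = ((n - 1) * sx2 + (m - 1) * sy2) / (n + m - 2)

# ... equals pooling all squared deviations and dividing by n+m-2
ssq = sum((xi - xbar) ** 2 for xi in x) + sum((yi - ybar) ** 2 for yi in y)
sp2_direct = ssq / (n + m - 2)

print(sp2_formula, sp2_direct)
```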
Case 2: t-test, pooled variances

T = (X̄ - Ȳ) / sqrt(sp²/n + sp²/m) ~ t_(n+m-2)

sp² = [(n - 1)sx² + (m - 1)sy²] / (n + m - 2)

Alternate calculation formula: t-test, pooled variance

T = (X̄ - Ȳ) / sqrt(sp²(n + m)/nm) ~ t_(n+m-2)

since sp²/n + sp²/m = sp²(1/n + 1/m) = sp²(m/nm + n/nm) = sp²(n + m)/nm
Pooled vs. unpooled variance

Rule of thumb: Use pooled unless you have a reason not to.
Pooled gives you more degrees of freedom.
Pooled has an extra assumption: variances are equal between the two groups.
SAS automatically tests this assumption for you ("Equality of Variances" test). If p<.05, this suggests unequal variances, and it is better to use the unpooled t-test.
Example: two-sample t-test

In 1980, some researchers reported that men have more mathematical ability than women, as evidenced by the 1979 SATs, where a sample of 30 random male adolescents had a mean score ± standard deviation of 436 ± 77 and 30 random female adolescents scored lower: 416 ± 81 (genders were similar in educational backgrounds, socioeconomic status, and age). Do you agree with the authors' conclusions?
Data Summary

                  n    Sample Mean   Sample Standard Deviation
Group 1: women    30       416                 81
Group 2: men      30       436                 77
Two-sample t-test

1. Define your hypotheses (null, alternative)

H0: µ(male) - µ(female) math SAT = 0
Ha: µ(male) - µ(female) math SAT ≠ 0 [two-sided]
Two-sample t-test

2. Specify your null distribution:
F and M have similar standard deviations/variances, so make a pooled estimate of variance:

sp² = [(n - 1)sm² + (m - 1)sf²] / (n + m - 2) = [(29)77² + (29)81²] / 58 = 6245

M̄ - F̄ ~ T58(0, sqrt(6245/30 + 6245/30))

sqrt(6245/30 + 6245/30) = 20.4
Two-sample t-test

3. Observed difference in our experiment = 20 points

Two-sample t-test

4. Calculate the p-value of what you observed

T58 = (20 - 0)/20.4 = .98

data _null_;
pval=(1-probt(.98, 58))*2;
put pval;
run;
0.3311563454
5. Do not reject null! No evidence that men are better
in math ;)
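The same calculation can be sketched in Python (the slide uses SAS's probt; here the t tail area is approximated by numerically integrating the t density with Simpson's rule, so no stats library is needed):

```python
import math

# SAT example: pooled two-sample t-test
n, m = 30, 30
mean_m, sd_m = 436.0, 77.0
mean_f, sd_f = 416.0, 81.0

df = n + m - 2                                       # 58
sp2 = ((n - 1) * sd_m**2 + (m - 1) * sd_f**2) / df   # 6245
se = math.sqrt(sp2 / n + sp2 / m)                    # ~20.4
t = (mean_m - mean_f) / se                           # ~0.98

def t_pdf(x, v):
    # density of the t distribution with v degrees of freedom
    c = math.exp(math.lgamma((v + 1) / 2) - math.lgamma(v / 2)) / math.sqrt(v * math.pi)
    return c * (1 + x * x / v) ** (-(v + 1) / 2)

def t_upper_tail(x, v, hi=60.0, steps=20000):
    # P(T > x) via composite Simpson's rule (tail beyond hi is negligible)
    h = (hi - x) / steps
    s = t_pdf(x, v) + t_pdf(hi, v)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(x + i * h, v)
    return s * h / 3

p_two_sided = 2 * t_upper_tail(t, df)                # ~0.331, matching SAS probt
print(round(t, 2), round(p_two_sided, 3))
```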
Example 2: Difference in means

Example: Rosenthal, R. and Jacobson, L. (1966) Teachers' expectancies: Determinants of pupils' I.Q. gains. Psychological Reports, 19, 115-118.

The Experiment
(note: exact numbers have been altered)

Grade 3 at Oak School were given an IQ test at the beginning of the academic year (n=90).
Classroom teachers were given a list of names of students in their classes who had supposedly scored in the top 20 percent; these students were identified as "academic bloomers" (n=18).
BUT: the children on the teachers' lists had actually been randomly assigned to the list.
At the end of the year, the same I.Q. test was re-administered.
Example 2

Statistical question: Do students in the treatment group have more improvement in IQ than students in the control group?

What will we actually compare?
One-year change in IQ score in the treatment group vs. one-year change in IQ score in the control group.

Results:

                      Academic bloomers (n=18)   Controls (n=72)
Change in IQ score:         12.2 (2.0)                8.2 (2.0)

12.2 points vs. 8.2 points: difference = 4 points.

The standard deviation of change scores was 2.0 in both groups. This affects statistical significance.

What does a 4-point difference mean?
Before we perform any formal statistical
analysis on these data, we already have
a lot of information.
Look at the basic numbers first; THEN
consider statistical significance as a
secondary guide.
Is the association statistically significant?

This 4-point difference could reflect a true effect, or it could be a fluke.
The question: is a 4-point difference bigger or smaller than the expected sampling variability?
Hypothesis testing

Null hypothesis: There is no difference between academic bloomers and normal students (= the difference is 0).
Step 1: Assume the null hypothesis.
Hypothesis Testing

Step 2: Predict the sampling variability assuming the null hypothesis is true.
These predictions can be made by mathematical theory or by computer simulation.
Hypothesis Testing

Step 2: Predict the sampling variability assuming the null hypothesis is true (math theory):

sp² = 4.0

"gifted" - "control" ~ T88(0, sqrt(4/18 + 4/72) = 0.52)
Hypothesis Testing

Step 2: Predict the sampling variability assuming the null hypothesis is true (computer simulation):

In computer simulation, you simulate taking repeated samples of the same size from the same population and observe the sampling variability.
I used computer simulation to take 1000 samples of 18 treated and 72 controls.
Computer Simulation Results
Standard error is
about 0.52
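A minimal version of that simulation can be sketched in Python (an illustration only; the null population is assumed here to be normal change scores with SD 2.0):

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def mean(v):
    return sum(v) / len(v)

# Draw 18 "treated" and 72 "controls" from the SAME null population
# (change scores with mean 0, SD 2.0) 1000 times, and record the
# difference in group means each time.
diffs = []
for _ in range(1000):
    treated = [random.gauss(0, 2.0) for _ in range(18)]
    controls = [random.gauss(0, 2.0) for _ in range(72)]
    diffs.append(mean(treated) - mean(controls))

# The spread of the simulated differences approximates the standard error
mu = mean(diffs)
sd = (sum((d - mu) ** 2 for d in diffs) / (len(diffs) - 1)) ** 0.5
print(round(sd, 2))   # near the theoretical 0.52
```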
3. Empirical data

Observed difference in our experiment = 12.2 - 8.2 = 4.0

4. P-value

A t-curve with 88 df's has slightly wider cutoffs for 95% area (t=1.99) than a normal curve (Z=1.96).

t88 = (12.2 - 8.2)/.52 = 4/.52 ≈ 8

p-value < .0001
If we ran this study 1000 times, we wouldn't expect to get even 1 result as big as a difference of 4 (under the null hypothesis).
Visually
5. Reject null!

Conclusion: I.Q. scores can bias expectancies in the teachers' minds and cause them to unintentionally treat "bright" students differently from those seen as less bright.
Confidence interval (more information!!)

95% CI for the difference: 4.0 ± 1.99(.52) = (3.0, 5.0)

A t-curve with 88 df's has slightly wider cutoffs for 95% area (t=1.99) than a normal curve (Z=1.96).
What if our standard deviation had been higher?

The standard deviations for change scores in treatment and control were each 2.0. What if change scores had been much more variable, say a standard deviation of 10.0 (for both)?

With a std. dev. in change scores of 2.0, the standard error is 0.52. With a std. dev. in change scores of 10.0, the standard error is 2.58.

With a std. dev. of 10.0: LESS STATISTICAL POWER!

With a standard error of 2.58: if we ran this study 1000 times, we would expect to get a difference of ≥ +4.0 or ≤ -4.0 about 12% of the time. P-value = .12
Don't forget: The paired t-test

Did the control group in the previous experiment improve at all during the year?

Do not apply a two-sample t-test to answer this question!
After - Before yields a single sample of differences.
This is a within-group rather than a between-group comparison.
Continuous outcome (means): roadmap table repeated from the earlier slide.
Data Summary

                  n    Sample Mean   Sample Standard Deviation
Group 1: Change   72      +8.2                 2.0

Did the control group in the previous experiment improve at all during the year?

t71 = (8.2 - 0)/(2.0/sqrt(72)) = 8.2/.24 ≈ 35

p-value < .0001
Normality assumption of t-test

If the distribution of the trait is normal, it is fine to use a t-test.
But if the underlying distribution is not normal and the sample size is small, the Central Limit Theorem takes some time to kick in and you cannot use the t-test. (Rule of thumb: you need n>30 per group if the distribution is moderately skewed; n>100 if the distribution is really skewed.)

Note: the t-test is very robust against violations of the normality assumption!
Alternative tests when normality
is violated: Nonparametric tests
Continuous outcome (means): roadmap table repeated; the relevant column here is the non-parametric one (Wilcoxon sign-rank, Wilcoxon rank-sum/Mann-Whitney U, Kruskal-Wallis, Spearman rank correlation).
Non-parametric tests

t-tests require your outcome variable to be normally distributed (or close enough), for small samples.
Non-parametric tests are based on RANKS instead of means and standard deviations (= population parameters).
Example: non-parametric tests

10 dieters following the Atkins diet vs. 10 dieters following Jenny Craig.

Hypothetical RESULTS:
Atkins group loses an average of 34.5 lbs.
J. Craig group loses an average of 18.5 lbs.
Conclusion: Atkins is better?

Example: non-parametric tests

BUT, take a closer look at the individual data...

Atkins, change in weight (lbs):
+4, +3, 0, -3, -4, -5, -11, -14, -15, -300

J. Craig, change in weight (lbs):
-8, -10, -12, -16, -18, -20, -21, -24, -26, -30
[Histograms of weight change (percent of dieters vs. pounds changed) for the Jenny Craig and Atkins groups. The Jenny Craig changes range from about -30 to +20 lbs; the Atkins changes include an extreme outlier near -300 lbs.]
t-test inappropriate

Comparing the mean weight loss of the two groups is not appropriate here.
The distributions do not appear to be normally distributed.
Moreover, there is an extreme outlier (this outlier influences the mean a great deal).
Wilcoxon rank-sum test

RANK the values, 1 being the least weight loss and 20 being the most weight loss.

Atkins:
+4, +3, 0, -3, -4, -5, -11, -14, -15, -300
ranks: 1, 2, 3, 4, 5, 6, 9, 11, 12, 20

J. Craig:
-8, -10, -12, -16, -18, -20, -21, -24, -26, -30
ranks: 7, 8, 10, 13, 14, 15, 16, 17, 18, 19

Wilcoxon rank-sum test

Sum of Atkins' ranks:
1 + 2 + 3 + 4 + 5 + 6 + 9 + 11 + 12 + 20 = 73
Sum of Jenny Craig's ranks:
7 + 8 + 10 + 13 + 14 + 15 + 16 + 17 + 18 + 19 = 137

Jenny Craig clearly ranked higher!
P-value* (from computer) = .018

*For details of the statistical test, see the appendix of these slides.
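The ranking step above can be sketched in Python (no ties occur in these particular data, so a simple sort suffices; real rank-sum code must average tied ranks):

```python
# Diet example: rank 1 = least weight loss (largest change score),
# rank 20 = most weight loss
atkins = [+4, +3, 0, -3, -4, -5, -11, -14, -15, -300]
jcraig = [-8, -10, -12, -16, -18, -20, -21, -24, -26, -30]

# Sort all values from least to most weight loss (descending change score)
combined = sorted(atkins + jcraig, reverse=True)
rank = {value: i + 1 for i, value in enumerate(combined)}  # no ties here

t_atkins = sum(rank[v] for v in atkins)   # 73
t_jcraig = sum(rank[v] for v in jcraig)   # 137
print(t_atkins, t_jcraig)
```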
Binary or categorical outcomes (proportions)

Outcome variable: Binary or categorical (e.g., fracture, yes/no)

Are the observations correlated?

Independent:
- Chi-square test: compares proportions between two or more groups
- Relative risks: odds ratios or risk ratios
- Logistic regression: multivariate technique used when outcome is binary; gives multivariate-adjusted odds ratios

Correlated:
- McNemar's chi-square test: compares binary outcome between two correlated groups (e.g., before and after)
- Conditional logistic regression: multivariate regression technique for a binary outcome when groups are correlated (e.g., matched data)
- GEE modeling: multivariate regression technique for a binary outcome when groups are correlated (e.g., repeated measures)

Alternative to the chi-square test if sparse cells:
- Fisher's exact test: compares proportions between independent groups when there are sparse data (some cells <5)
- McNemar's exact test: compares proportions between correlated groups when there are sparse data (some cells <5)
Difference in proportions (special case of chi-square test)

Standard error of the difference of two proportions:

sqrt( p1(1-p1)/n1 + p2(1-p2)/n2 )   or   sqrt( p(1-p)/n1 + p(1-p)/n2 ),

where p = (n1*p1 + n2*p2)/(n1 + n2)  (just the average proportion)

p1 = proportion in group 1
p2 = proportion in group 2
n1 = number in group 1
n2 = number in group 2

Recall, the variance of a proportion is p(1-p)/n.
Use the average (or pooled) proportion in the standard error formula, because under the null hypothesis, groups have equal proportions.
Follows a normal because the binomial can be approximated with a normal.
Recall case-control example:

                 Smoker (E)   Nonsmoker (~E)
Stroke (D)           15             35        50
No Stroke (~D)        8             42        50

Absolute risk: Difference in proportions exposed

P(E/D) - P(E/~D) = 15/50 - 8/50 = 30% - 16% = 14%
Difference in proportions exposed

                 Smoker (E)   Nonsmoker (~E)
Stroke (D)           15             35        50
No Stroke (~D)        8             42        50

Z = (14% - 0%) / sqrt(.23*.77/50 + .23*.77/50) = .14/.084 = 1.67

95% CI: 0.14 ± 1.96*.084 = -0.03 to .31
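A sketch of this two-proportion z-test in Python (using the pooled proportion .23 in the standard error, as the slide does; the unrounded z is about 1.66, which the slide rounds to 1.67):

```python
import math

# Smoking/stroke example: 15/50 exposed among cases, 8/50 among controls
x1, n1 = 15, 50
x2, n2 = 8, 50

p1, p2 = x1 / n1, x2 / n2            # .30 and .16
p_pool = (x1 + x2) / (n1 + n2)       # .23
se = math.sqrt(p_pool * (1 - p_pool) * (1/n1 + 1/n2))   # ~.084

z = (p1 - p2) / se                   # ~1.66
phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
p_two_sided = 2 * (1 - phi(abs(z)))  # ~.10

# Slide uses the same (pooled) SE for the CI as well
ci = (p1 - p2 - 1.96 * se, p1 - p2 + 1.96 * se)  # close to the slide's -0.03 to .31
print(round(z, 2), round(p_two_sided, 2))
```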
Example 2: Difference in proportions

Research question: Are antidepressants a risk factor for suicide attempts in children and adolescents?

Example modified from: "Antidepressant Drug Therapy and Suicide in Severely Depressed Children and Adults"; Olfson et al. Arch Gen Psychiatry. 2006;63:865-872.

Example 2: Difference in Proportions

Design: Case-control study
Methods: Researchers used Medicaid records to compare prescription histories between 263 children and teenagers (6-18 years) who had attempted suicide and 1241 controls who had never attempted suicide (all subjects suffered from depression).
Statistical question: Is a history of use of antidepressants more common among cases than controls?
Example 2

Statistical question: Is a history of use of antidepressants more common among suicide-attempt cases than controls?

What will we actually compare?
Proportion of cases who used antidepressants in the past vs. proportion of controls who did.

Results:

                     No. (%) of cases (n=263)   No. (%) of controls (n=1241)
Any antidepressant
drug ever                  120 (46%)                   448 (36%)

46% vs. 36%: difference = 10%
Is the association statistically
significant?
This 10% difference could reflect a true
association or it could be a fluke in this
particular sample.
The question: is 10% bigger or smaller
than the expected sampling variability?
Hypothesis testing

Null hypothesis: There is no association between antidepressant use and suicide attempts in the target population (= the difference is 0%).

Step 1: Assume the null hypothesis.

Hypothesis Testing

Step 2: Predict the sampling variability assuming the null hypothesis is true:

p̂(cases) - p̂(controls) ~ N(0, sqrt( (568/1504)(1 - 568/1504)/263 + (568/1504)(1 - 568/1504)/1241 ) = .033)

Also: Computer Simulation Results. Standard error is about 3.3%.
Hypothesis Testing

Step 3: Do an experiment
We observed a difference of 10% between cases and controls.

Hypothesis Testing

Step 4: Calculate a p-value

Z = .10/.033 = 3.0; p = .003

P-value from our simulation: when we ran this study 1000 times, we got 1 result as big or bigger than 10%. We also got 3 results as small or smaller than -10%.
P-value

From our simulation, we estimate the p-value to be: 4/1000, or .004.

Step 5: Reject or do not reject the null hypothesis.

Here we reject the null.
Alternative hypothesis: There is an association between antidepressant use and suicide in the target population.
What would a lack of statistical significance mean?

If this study had sampled only 50 cases and 50 controls, the sampling variability would have been much higher, as shown in this computer simulation...

With 263 cases and 1241 controls, the standard error is about 3.3%. With only 50 cases and 50 controls, the standard error is about 10%.

With only 50 cases and 50 controls (standard error about 10%): if we ran this study 1000 times, we would expect to get values of 10% or higher 170 times (or 17% of the time).

Two-tailed p-value = 17% x 2 = 34%
Practice problem

An August 2003 research article in Developmental and Behavioral Pediatrics reported the following about a sample of UK kids: when given a choice of a non-branded chocolate cereal vs. CoCo Pops, 97% (36) of 37 girls and 71% (27) of 38 boys preferred the CoCo Pops. Is this evidence that girls are more likely to choose brand-named products?
Answer

1. Hypotheses:
H0: p(girls) - p(boys) = 0
Ha: p(girls) - p(boys) ≠ 0 [two-sided]

2. Null distribution of the difference of two proportions:
The null says the p's are equal, so estimate the standard error using the overall observed proportion (63/75 = .84):

p̂f - p̂m ~ N(0, sqrt( (63/75)(1 - 63/75)/37 + (63/75)(1 - 63/75)/38 )) = N(0, sqrt( .84(.16)/37 + .84(.16)/38 )) = N(0, .085)

3. Observed difference in our experiment = .97 - .71 = .26

4. Calculate the p-value of what you observed:

Z = (.26 - 0)/.085 = 3.06

data _null_;
pval=(1-probnorm(3.06))*2;
put pval;
run;
0.0022133699

5. The p-value is sufficiently low for us to reject the null; there does appear to be a difference in gender preferences here.
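The same check can be sketched in Python (with unrounded proportions the z comes out near 3.1 rather than 3.06, which used the rounded .26/.085; either way the null is rejected):

```python
import math

# CoCo Pops practice problem: 36/37 girls vs. 27/38 boys
p_g, n_g = 36/37, 37
p_b, n_b = 27/38, 38
p_pool = (36 + 27) / (37 + 38)                           # 63/75 = .84

se = math.sqrt(p_pool * (1 - p_pool) * (1/n_g + 1/n_b))  # ~.085
z = (p_g - p_b) / se                                     # ~3.1 (slide rounds to 3.06)
phi = lambda x: 0.5 * (1 + math.erf(x / math.sqrt(2)))
p_value = 2 * (1 - phi(z))                               # ~.002
print(round(z, 2), round(p_value, 4))
```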
Key two-sample hypothesis tests

Test for H0: µx - µy = 0 (σ² unknown, but roughly equal):

t(n+m-2) = (x̄ - ȳ) / sqrt(sp²/n + sp²/m); sp² = [(n - 1)sx² + (m - 1)sy²] / (n + m - 2)

Test for H0: p1 - p2 = 0:

Z = (p̂1 - p̂2) / sqrt( p̂(1-p̂)/n1 + p̂(1-p̂)/n2 ); p̂ = (n1p̂1 + n2p̂2)/(n1 + n2)
Corresponding confidence intervals

For a difference in means, 2 independent samples (σ²'s unknown but roughly equal):

(x̄ - ȳ) ± t(n+m-2, α/2) * sqrt(sp²/n + sp²/m)

For a difference in proportions, 2 independent samples:

(p̂1 - p̂2) ± Z(α/2) * sqrt( p̂1(1-p̂1)/n1 + p̂2(1-p̂2)/n2 )
Appendix: details of rank-sum test

Wilcoxon Rank-sum test

Rank all of the observations in order from 1 to n.
T1 is the sum of the ranks from the smaller population (n1).
T2 is the sum of the ranks from the larger population (n2).

U1 = n1n2 + n1(n1+1)/2 - T1
U2 = n1n2 + n2(n2+1)/2 - T2
U0 = min(U1, U2)

For small samples (n1, n2 ≤ 10), find P(U ≤ U0) in Mann-Whitney U tables, with n2 = the bigger of the 2 populations.

For larger samples:

Z0 = (U0 - n1n2/2) / sqrt( n1n2(n1 + n2 + 1)/12 )
Example
For example, if team 1 and team 2 (two gymnastic
teams) are competing, and the judges rank all the
individuals in the competition, how can you tell if
team 1 has done significantly better than team 2 or
vice versa?
Answer

Intuition: under the null hypothesis of no difference between the two groups...

If n1 = n2, the sums T1 and T2 should be equal.
But if n1 ≠ n2, then T2 (n2 = bigger group) should automatically be bigger. But how much bigger under the null?

For example, if team 1 has 3 people and team 2 has 10, we could rank all 13 participants from 1 to 13 on individual performance. If team 1 (X) and team 2 don't differ in talent, the ranks ought to be spread evenly among the two groups, e.g.:

1 2 X 4 5 6 X 8 9 10 X 12 13 (exactly even distribution if team 1 ranks 3rd, 7th, and 11th)
T1 = sum of ranks of group 1 (smaller)
T2 = sum of ranks of group 2 (larger)

T1 + T2 = Σ(i = 1 to n1+n2) i = (n1 + n2)(n1 + n2 + 1)/2
        = [(n1 + n2)² + (n1 + n2)] / 2
        = [n1² + 2n1n2 + n2² + n1 + n2] / 2
        = n1(n1+1)/2 + n2(n2+1)/2 + n1n2

Remember this? (The sum of the integers from 1 to k is k(k+1)/2.)
Σ(i = 1 to n1) i = n1(n1+1)/2 = sum of within-group ranks for the smaller group.
Σ(i = 1 to n2) i = n2(n2+1)/2 = sum of within-group ranks for the larger group.

e.g., here: T1 + T2 = Σ(i = 1 to 13) i = (13)(14)/2 = 91 = 6 + 55 + 30

Take-home point:

T1 + T2 = n1(n1+1)/2 + n2(n2+1)/2 + n1n2

Σ(i = 1 to 10) i = (10)(11)/2 = 55
Σ(i = 1 to 3) i = (3)(4)/2 = 6
55 - 6 = 49

T1 = 3 + 7 + 11 = 21
T2 = 1 + 2 + 4 + 5 + 6 + 8 + 9 + 10 + 12 + 13 = 70
70 - 21 = 49. Magic!

The difference between the sums of the within-group ranks of the two individual groups (55 - 6) is 49.
The difference between the sums of the ranks of the two groups (70 - 21) is also equal to 49 if ranks are evenly interspersed (null is true).
It turns out that, if the null hypothesis is true, the difference between the larger-group sum of ranks and the smaller-group sum of ranks is exactly equal to the difference between the within-group rank sums:

Under the null: T2 - T1 = n2(n2+1)/2 - n1(n1+1)/2   [from slide 23]

And their sum: T1 + T2 = n1(n1+1)/2 + n2(n2+1)/2 + n1n2   [from slide 24]

Define new statistics:

U1 = n1n2 + n1(n1+1)/2 - T1
U2 = n1n2 + n2(n2+1)/2 - T2

Here, under the null:
U1 = 30 + 6 - 21 = 15
U2 = 30 + 55 - 70 = 15
U1 + U2 = 30
Under the null hypothesis, U1 should equal U2:

E(U2 - U1) = E[ (n2(n2+1)/2 - T2) - (n1(n1+1)/2 - T1) ] = 0

The U's should be equal to each other and will equal n1n2/2:

U1 + U2 = n1n2
Under the null hypothesis, U1 = U2 = U0
E(U1 + U2) = 2E(U0) = n1n2
E(U1 = U2 = U0) = n1n2/2
So, the test statistic here is not quite the difference in the sum-of-ranks of the 2 groups.
It's the smaller observed U value: U0.

For small n's, take U0 and get the p-value directly from a U table.

For large enough n's (>10 per group):

Z0 = (U0 - E(U0)) / sqrt(Var(U0)) = (U0 - n1n2/2) / sqrt(Var(U0))

E(U0) = n1n2/2
Var(U0) = n1n2(n1 + n2 + 1)/12
Add observed data to the example

Example: If the girls on the two gymnastics teams were ranked as follows:
Team 1: 1, 5, 7. Observed T1 = 13
Team 2: 2, 3, 4, 6, 8, 9, 10, 11, 12, 13. Observed T2 = 78

Are the teams significantly different?
Total sum of ranks = 13*14/2 = 91; n1n2 = 3*10 = 30

Under the null hypothesis: expect U1 - U2 = 0 and U1 + U2 = 30 (each should equal about 15 under the null), and U0 = 15.

U1 = 30 + 6 - 13 = 23
U2 = 30 + 55 - 78 = 7
U0 = 7

Not quite statistically significant in the U table: p = .1084 (see attached) x 2 for a two-tailed test.
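The U computation above can be sketched as a small Python function:

```python
# U statistics from the appendix formulas:
# U1 = n1n2 + n1(n1+1)/2 - T1, U2 = n1n2 + n2(n2+1)/2 - T2
def mann_whitney_u(t1, n1, t2, n2):
    # T1/n1 come from the smaller group, T2/n2 from the larger group
    u1 = n1 * n2 + n1 * (n1 + 1) // 2 - t1
    u2 = n1 * n2 + n2 * (n2 + 1) // 2 - t2
    return u1, u2, min(u1, u2)

# Gymnastics example: Team 1 ranks 1, 5, 7 (T1 = 13); Team 2 ranks sum to 78
u1, u2, u0 = mann_whitney_u(t1=13, n1=3, t2=78, n2=10)
print(u1, u2, u0)   # 23 7 7
```

Note that U1 + U2 = 23 + 7 = 30 = n1n2, as the derivation requires.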
Example problem 2

A study was done to compare the Atkins Diet (low-carb) vs. Jenny Craig (low-cal, low-fat). The following weight changes were obtained; note they are very skewed because someone lost 100 pounds; the mean loss for Atkins is going to look higher because of the bozo, but does that mean the diet is better overall? Conduct a Mann-Whitney U test to compare ranks.

Atkins   Jenny Craig
-100         -11
  -8         -15
  -4          -5
  +5          +6
  +8         -20
  +2
Answer

Corresponding ranks (lower rank = more weight loss!):

Atkins   Jenny Craig
   1          4
   5          3
   7          6
   9         10
  11          2
   8

Sum of ranks for JC = 25 (n1 = 5)
Sum of ranks for Atkins = 41 (n2 = 6)

n1n2 = 5*6 = 30

Under the null hypothesis: expect U1 - U2 = 0 and U1 + U2 = 30, and U0 = 15.

U1 = 30 + 15 - 25 = 20
U2 = 30 + 21 - 41 = 10
U0 = 10; n1 = 5, n2 = 6

Go to the Mann-Whitney chart: p = .2143 x 2 = .42