
Stat 101 Fall 2013

Final Exam Review Session


Solutions
1. For each of the situations described below, select the inference technique that you believe is the most
applicable. If it is a statistical hypothesis test, state the null and alternative hypotheses. (Define all terms
specific to the example, rather than just giving a response in general terms such as $\mu_1 = \mu_2$). Do not go
into details of the computations required.

(a) A biologist wants to determine whether the cavity size of nests is different across 9 different species of
rodents.
One-way ANOVA F-test
$H_0: \mu_1 = \mu_2 = \cdots = \mu_9$
$H_A$: at least one pair $\mu_i \neq \mu_j$

(b) A biologist wants to determine whether the cavity size of nests is different between birds and rodents,
and whether the species within those types are different as well (6 types of birds and 9 types of rodents).
This one's complicated, but it is likely a pair of two-way ANOVA F-tests: one for birds vs. rodents
$H_0: \mu_{birds} = \mu_{rodents}$
$H_A: \mu_{birds} \neq \mu_{rodents}$
and one for species
$H_0: \mu_1 = \mu_2 = \cdots = \mu_{15}$
$H_A$: at least one pair $\mu_i \neq \mu_j$

(c) A Harvard student is interested in determining whether students are fans of Miley Cyrus or not. She
wants to know if this is related to gender.
Two-sample proportion z-test
$H_0: p_{males} = p_{females}$
$H_A: p_{males} \neq p_{females}$

(d) A Harvard student is interested in determining whether students are fans of Miley Cyrus or not. She
wants to know if this is related to concentration.
$\chi^2$ test for independence of two categorical variables
H0: being a fan of Miley is independent of concentration
HA: being a fan of Miley is dependent on concentration

(e) A Harvard student is interested in determining whether students are fans of Miley Cyrus or not. She
wants to know if this is related to GPA.
Logistic regression z-test for $\beta_1$
$H_0: \beta_1 = 0$
$H_A: \beta_1 \neq 0$

(f) Patients who went through hip replacement surgery were asked on two surveys, once before and once
after surgery, whether or not they had pain in their hip while sitting. You want to know if the average
response changed after surgery.
Paired proportion test (McNemar's)
$H_0: p_{before} = p_{after}$
$H_A: p_{before} \neq p_{after}$

(g) A Harvard student is interested in determining how much money recent Harvard graduates make in
their first job after graduation. He wants to determine if this is related to gender.
Two-sample t-test
$H_0: \mu_{female} = \mu_{male}$
$H_A: \mu_{female} \neq \mu_{male}$

(h) A Harvard student is interested in determining how much money recent Harvard graduates make in
their first job after graduation. He wants to determine if this is related to concentration at Harvard.
One-way ANOVA F-test
$H_0: \mu_{econ} = \mu_{statistics} = \cdots = \mu_{psychology}$
$H_A$: at least one pair $\mu_i \neq \mu_j$

(i) A Harvard student is interested in determining how much money recent Harvard graduates make in
their first job after graduation. He wants to determine if this is related to GPA at Harvard.
Simple regression t-test
$H_0: \beta_1 = 0$
$H_A: \beta_1 \neq 0$

(j) A Harvard student is interested in determining how much money recent Harvard graduates make in
their first job after graduation. He wants to determine if this is related to gender, concentration, and GPA
all at once.
Multiple regression (F-test most likely)
$H_0: \beta_1 = \beta_2 = \beta_3 = 0$ (there may be lots of $\beta$s for the binary variables for all the concentrations)
$H_A$: at least one $\beta_j \neq 0$

2. Kevin's dog (a mixed-breed Akita named Rio) often barks when people are at the front door. If the
person at the front door is a stranger, Rio barks 90% of the time. If the person at the front door is Kevin's
friend, Rio barks only 20% of the time. About 75% of people who come to the front door are Kevin's
friends. (Note: for this problem, everyone is either Kevin's friend or a stranger.)

(a) What is the probability that Rio barks at the next person at the front door?

Start by defining the two random phenomena (who's at the door and whether Rio barks).

Let F be the event that the person at the door is a friend. Let B be the event that Rio barks. Then
this question is asking for the overall P(B), which can be solved as follows:
$$
\begin{aligned}
P(B) &= P(B \text{ and } F) + P(B \text{ and } F^C) \\
&= P(B \mid F)\,P(F) + P(B \mid F^C)\,P(F^C) \\
&= 0.20(0.75) + 0.90(0.25) = 0.375
\end{aligned}
$$
(b) If Rio is barking at someone at the front door, what is the probability that person is Kevin's friend?

$$
P(F \mid B) = \frac{P(B \text{ and } F)}{P(B)} = \frac{0.20(0.75)}{0.375} = \frac{0.15}{0.375} = 0.40
$$
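As a quick numerical check of parts (a) and (b), here is a minimal Python sketch that hard-codes the three probabilities given in the problem. The function and variable names are illustrative only.

```python
# Probabilities given in the problem (F = friend at the door, B = Rio barks).
P_F = 0.75           # P(F)
P_B_GIVEN_F = 0.20   # P(B | F)
P_B_GIVEN_FC = 0.90  # P(B | F^C), i.e. a stranger


def prob_bark(p_f: float, p_b_f: float, p_b_fc: float) -> float:
    """Law of total probability: P(B) = P(B|F)P(F) + P(B|F^C)P(F^C)."""
    return p_b_f * p_f + p_b_fc * (1 - p_f)


def prob_friend_given_bark(p_f: float, p_b_f: float, p_b_fc: float) -> float:
    """Bayes' rule: P(F|B) = P(B and F) / P(B)."""
    return p_b_f * p_f / prob_bark(p_f, p_b_f, p_b_fc)


p_b = prob_bark(P_F, P_B_GIVEN_F, P_B_GIVEN_FC)
p_f_b = prob_friend_given_bark(P_F, P_B_GIVEN_F, P_B_GIVEN_FC)
print(f"P(B) = {p_b:.3f}, P(F|B) = {p_f_b:.2f}")
```

Changing `P_F` shows how strongly the answer to (b) depends on the share of visitors who are friends.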

3. In the survey given in lecture we measured the following variables:


looks - the percent of Harvard students that a student thinks are better looking than him or her
relationship - a binary variable indicating whether the student is in a significant relationship
(relationship = 1) or single (relationship = 0)
female - a binary variable indicating whether the student is female (female = 1) or male (female = 0)

(a) To the right is the histogram of the response variable, looks. Comment on the plot.

The histogram looks moderately right-skewed. We may want to consider transforming this variable;
however, the skew is not overwhelming, so we should wait to see if the residuals show this skewness as well
(if it were extremely right-skewed, then doing a log-transform right away would make sense).

(b) Based on this model, what is the estimated mean of looks for women?

From the model with just female as a predictor:


$\hat{y} = b_0 + b_1x_1 = 24.414 + 4.824(1) = 29.238$. So women report that 29.2% of the Harvard population is
better looking than themselves, on average.

(c) Is there a significant difference in average looks between men and women? How do you know?

No, there is not a significant difference in the average looks of women and men. This can be determined
because the t-statistic (t = 1.468) for the slope coefficient (which here estimates the difference in average
looks between the two groups) is not significant (p = 0.144).
(d) What is the interpretation of the coefficient for female in this model?

Here, the coefficient of 4.769 can be interpreted as the expected difference in looks comparing women to
men when relationship status is unchanged.

(e) Kevin is in a relationship. What is the estimated value of looks for Kevin?

$\hat{y} = a + b_1x_1 + b_2x_2 = 23.413 + 4.769(0) + 2.640(1) = 26.053$

(f) Based on this model, what are the estimated models for the effect of relationship status on looks for
men? For women?

For men:
$\hat{y} = a + b_1x_1 + b_2x_2 + b_3(x_1 x_2) = 21.167 + 9.542(0) + 8.561(x_2) - 12.238(0 \cdot x_2) = 21.167 + 8.561x_2$

For women:
$$
\begin{aligned}
\hat{y} &= a + b_1x_1 + b_2x_2 + b_3(x_1 x_2) = 21.167 + 9.542(1) + 8.561(x_2) - 12.238(1 \cdot x_2) \\
&= (21.167 + 9.542) + (8.561 - 12.238)(x_2) = 30.709 - 3.677x_2
\end{aligned}
$$

Notice, the slope relating relationship to looks for women is negative (-3.677) while it is positive for
men (8.561).
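The same algebra as a small Python sketch: fixing $x_1 = 0$ or $x_1 = 1$ in the interaction model gives each sex its own intercept and slope on relationship. The coefficients are those quoted in part (f), with $b_3 = -12.238$ (the sign implied by the women's slope $8.561 - 12.238$); the function names are hypothetical.

```python
# Interaction model from part (f); x1 = female (1 = woman, 0 = man), x2 = relationship (1 = in one).
A, B1, B2, B3 = 21.167, 9.542, 8.561, -12.238


def predicted_looks(female: int, relationship: int) -> float:
    """y_hat = a + b1*x1 + b2*x2 + b3*(x1*x2)."""
    return A + B1 * female + B2 * relationship + B3 * female * relationship


def line_for_sex(female: int) -> tuple[float, float]:
    """Intercept and slope on x2 after fixing x1 = female."""
    return A + B1 * female, B2 + B3 * female


for label, female in [("men", 0), ("women", 1)]:
    intercept, slope = line_for_sex(female)
    print(f"{label}: y_hat = {intercept:.3f} + ({slope:.3f})x2")
```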
(g) Is there evidence of a significant difference in the effect of relationship on looks between the sexes?

This is simply a test to see if the interaction term is different from zero. Here we find that the t-statistic
(t = -1.826) is not significant since the p-value (p = 0.070) is greater than 0.05. We cannot conclude that the
effect of relationship status on perceived looks differs between the sexes (but we are pretty darn close).
(h) Briefly comment on the model's assumptions based on these 2 graphs.

There are 4 assumptions in linear regression:

(1) Residuals are normally distributed: the histogram of the residuals on the left looks moderately
right-skewed. A transformation may be helpful.

(2) Residuals have constant variance: the graph on the right shows the residuals ($e_i$) vs. fitted
values ($\hat{y}_i$). Here the graph may show slightly more spread on the right side of the
graph than on the left side.

(3) Linear relationships: we look at the residuals vs. fitted values for this one. Here we see that the
residuals seem to be centered at zero no matter where you are along the x-axis (aka, no curvature).

(4) Independence of observations: we cannot check this via the graphs. Based on the sampling design
(hopefully a good random sample), it seems reasonable to treat this assumption as satisfied.

(i) Propose a transformation on the data to try to fix any of the issues you mentioned in part (h) above.

Because the histogram of the residuals is right-skewed, it's a pretty good bet that the
distribution of looks is right-skewed as well. We could consider taking the log of the variable looks to fix
that (and it very well may fix the constant variance problem as well).
4. Based on the same survey, we want to try to determine if one's opinion of one's own looks depends on class
year. Below is the relevant SPSS output:

a) If a regression model was fit to predict looks with sophomore, junior, and senior dummy variables as
the predictors, what would be the formula for the estimated regression model?

This set-up means that freshmen comprise the reference group, so we know the intercept for this model
will be a = 26.964. All of the slope estimates will be relative to that value (so the coefficient for
sophomores will be b1 = 26.156 - 26.964 = -0.808):

$\hat{y} = a + b_1x_1 + b_2x_2 + b_3x_3 = 26.964 - 0.808x_1 - 0.297x_2 + 2.194x_3$

Where x1 is the indicator/dummy variable for sophomores, x2 is the indicator/dummy variable for juniors,
and x3 is the indicator/dummy variable for seniors.
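With reference-group (dummy) coding, the intercept is the reference-group mean and each slope is that group's mean minus the reference mean. A minimal Python sketch, assuming the class-year means implied by the equation above (26.964 plus each coefficient); the helper name is hypothetical.

```python
# Reference-group (dummy) coding with freshmen as the reference group.
# Means other than the freshman mean are implied by the fitted equation (26.964 + coefficient).
group_means = {
    "freshman": 26.964,
    "sophomore": 26.156,
    "junior": 26.667,
    "senior": 29.158,
}


def dummy_coefficients(means: dict[str, float], reference: str) -> dict[str, float]:
    """Intercept = reference mean; each slope = group mean - reference mean."""
    ref = means[reference]
    coefs = {"intercept": ref}
    for group, mean in means.items():
        if group != reference:
            coefs[group] = mean - ref
    return coefs


coefs = dummy_coefficients(group_means, "freshman")
for name, value in coefs.items():
    print(f"{name:>9}: {value:+.3f}")
```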

b) Calculate and write out the ANOVA table for this dataset.

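i) $SSB = SSM = \sum_{i=1}^{g} n_i(\bar{x}_i - \bar{x})^2$
$\quad = 28(26.964 - 26.725)^2 + 30(26.667 - 26.725)^2 + 19(29.158 - 26.725)^2 + 90(26.156 - 26.725)^2 = 143.3$

ii) $df_M = g - 1 = 4 - 1 = 3$

iii) $MSM = \frac{SSM}{g - 1} = \frac{143.3}{3} = 47.77$

iv) $SSW = SSE = \sum_{i=1}^{g} (n_i - 1)s_i^2 = (27)17.6162^2 + (29)20.2712^2 + (18)20.9345^2 + (89)22.9616^2 = 75108.13$

v) $df_E = n - g = 167 - 4 = 163$

vi) $MSE = \frac{SSE}{n - g} = \frac{75108.13}{163} = 460.79$

vii) $F = \frac{MSM}{MSE} = \frac{47.77}{460.79} = 0.1037$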
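The hand calculation can be reproduced from the per-group summaries. This is a minimal Python sketch, assuming the (n, mean, SD) triples used in steps i) and iv); it computes the grand mean exactly rather than using the rounded 26.725, so SSM may differ in the last decimal place. It stops at the F-statistic, since the p-value needs an F-distribution routine (for example SciPy's `scipy.stats.f.sf`).

```python
# Group summaries (n_i, mean_i, sd_i) as used in the hand calculation above.
groups = [
    (28, 26.964, 17.6162),
    (30, 26.667, 20.2712),
    (19, 29.158, 20.9345),
    (90, 26.156, 22.9616),
]


def one_way_anova(summaries):
    """One-way ANOVA table from per-group (n, mean, sd) summaries."""
    g = len(summaries)
    n = sum(ni for ni, _, _ in summaries)
    grand_mean = sum(ni * m for ni, m, _ in summaries) / n
    ssm = sum(ni * (m - grand_mean) ** 2 for ni, m, _ in summaries)  # between groups
    sse = sum((ni - 1) * s ** 2 for ni, _, s in summaries)           # within groups
    df_m, df_e = g - 1, n - g
    msm, mse = ssm / df_m, sse / df_e
    return {
        "Model": (ssm, df_m, msm),
        "Error": (sse, df_e, mse),
        "Total": (ssm + sse, n - 1, None),
        "F": msm / mse,
    }


table = one_way_anova(groups)
print(f"{'Source':<6} {'SS':>10} {'df':>4} {'MS':>8}")
for source in ("Model", "Error", "Total"):
    ss, df, ms = table[source]
    ms_text = f"{ms:.2f}" if ms is not None else ""
    print(f"{source:<6} {ss:10.2f} {df:4d} {ms_text:>8}")
print(f"F = {table['F']:.4f}")
```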
Here's the ANOVA table from SPSS to double-check:

c) Is there evidence of a difference among the 4 class years? Perform a formal hypothesis test.

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$
$H_A$: at least one pair $\mu_i \neq \mu_j$
F = 0.104
p-value = 0.958
Since the p-value > 0.05, we cannot reject the null hypothesis. It looks like the mean of looks may
truly be the same in the 4 class years.

d) Ignoring your results in part (c) above, perform a formal hypothesis test to determine whether freshmen
have different perceived looks than the other 3 class years combined.

This can be determined based on a contrast test:

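$H_0: \mu_{fr} - \tfrac{1}{3}(\mu_{so} + \mu_{jr} + \mu_{sr}) = 0$
$H_A: \mu_{fr} - \tfrac{1}{3}(\mu_{so} + \mu_{jr} + \mu_{sr}) \neq 0$

where the $a$'s are the contrast coefficients and $s_w = \sqrt{MSE} = \sqrt{460.785}$:

$$
t = \frac{\bar{x}_{fr} - \tfrac{1}{3}(\bar{x}_{so} + \bar{x}_{jr} + \bar{x}_{sr}) - 0}
{s_w\sqrt{\frac{a_{fr}^2}{n_{fr}} + \frac{a_{so}^2}{n_{so}} + \frac{a_{jr}^2}{n_{jr}} + \frac{a_{sr}^2}{n_{sr}}}}
= \frac{(1)(26.964) - \tfrac{1}{3}(26.667) - \tfrac{1}{3}(29.158) - \tfrac{1}{3}(26.156)}
{\sqrt{460.785}\sqrt{\frac{1^2}{28} + \frac{(1/3)^2}{30} + \frac{(1/3)^2}{19} + \frac{(1/3)^2}{90}}}
\approx -0.078
$$

(df = n - g = 167 - 4 = 163)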
p-value > 2(0.25) = 0.50
Since p-value > 0.05, we cannot reject the null hypothesis (not surprising given the results of the
F-statistic). It looks like freshmen may have the same mean looks as the other 3 groups combined.
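A minimal Python sketch of the contrast t-statistic, assuming the means and sample sizes in the order used above and MSE = 460.785 as the pooled variance $s_w^2$; the helper name is hypothetical. It gives $t \approx -0.078$ on 163 degrees of freedom.

```python
import math

# Order: freshmen, sophomores, juniors, seniors (as in the contrast above).
n = [28, 30, 19, 90]
means = [26.964, 26.667, 29.158, 26.156]
weights = [1, -1 / 3, -1 / 3, -1 / 3]  # freshmen vs. the average of the other three years
mse = 460.785                          # pooled variance s_w^2 (MSE from part b)
df = sum(n) - len(n)                   # n - g = 167 - 4 = 163


def contrast_t(weights, means, n, mse):
    """t = sum(a_i * xbar_i) / (s_w * sqrt(sum(a_i^2 / n_i))), testing a contrast of 0."""
    estimate = sum(a * m for a, m in zip(weights, means))
    se = math.sqrt(mse) * math.sqrt(sum(a ** 2 / ni for a, ni in zip(weights, n)))
    return estimate / se


t = contrast_t(weights, means, n, mse)
print(f"t = {t:.3f} on {df} df")
```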

5. Below is the SPSS output for a logistic regression model trying to determine what is related to whether
a student is in a relationship or not. The predictors are whether or not the individual is female, the fastest
(s)he has ever driven, and how many foreign countries (s)he has ever visited (n = 172).
(a) Briefly summarize the model results above.

From the output above, we see that female is negatively related to relationship status but is not significant
while adjusting for the other 2 variables in the model (p = 0.0811). Both countries and fastest_drive are
positively related to relationship status while controlling for the other factors in the model, but
countries is not significant (p = 0.0731); fastest_drive is the only significant predictor (p = 0.0206).

(b) What is the estimated chance that an individual is in a relationship if she is female, has visited 5 other
countries, and has driven 100mph? What if she thinks she has never driven (fastest_drive = 0)?

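$$
\hat{p} = \frac{e^{a + b_1x_1 + b_2x_2 + b_3x_3}}{1 + e^{a + b_1x_1 + b_2x_2 + b_3x_3}}
= \frac{e^{-2.529 - 0.63802(1) + 0.01711(100) + 0.04711(5)}}{1 + e^{-2.529 - 0.63802(1) + 0.01711(100) + 0.04711(5)}}
= \frac{e^{-1.220}}{1 + e^{-1.220}} = 0.228 = 22.8\%
$$

$$
\hat{p} = \frac{e^{a + b_1x_1 + b_2x_2 + b_3x_3}}{1 + e^{a + b_1x_1 + b_2x_2 + b_3x_3}}
= \frac{e^{-2.529 - 0.63802(1) + 0.01711(0) + 0.04711(5)}}{1 + e^{-2.529 - 0.63802(1) + 0.01711(0) + 0.04711(5)}}
= \frac{e^{-2.931}}{1 + e^{-2.931}} = 0.0506 = 5.06\%
$$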
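The same predictions as a minimal Python sketch of the inverse-logit $e^{\eta}/(1 + e^{\eta})$. The coefficients are the ones in the calculation above, with the negative intercept and female coefficient implied by the exponents $-1.220$ and $-2.931$; the function name is hypothetical.

```python
import math

# Coefficients from the fitted model: intercept, female, fastest_drive (mph), countries.
A, B_FEMALE, B_FAST, B_COUNTRIES = -2.529, -0.63802, 0.01711, 0.04711


def prob_relationship(female: int, fastest_drive: float, countries: int) -> float:
    """Inverse-logit of the linear predictor: e^eta / (1 + e^eta)."""
    eta = A + B_FEMALE * female + B_FAST * fastest_drive + B_COUNTRIES * countries
    return math.exp(eta) / (1 + math.exp(eta))


p_fast = prob_relationship(female=1, fastest_drive=100, countries=5)
p_never = prob_relationship(female=1, fastest_drive=0, countries=5)
print(f"{p_fast:.3f}  {p_never:.4f}")
```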
(c) What is the estimated odds ratio for fastest_drive in this model?

$OR = e^{b_2} = e^{0.01711} = 1.017$

(d) Interpret what this odds ratio means.

For every extra mile per hour in someone's fastest drive, their odds of being in a relationship go up by a
multiplicative factor of 1.017 (a 1.7% increase), controlling for the other two x-variables in the model.

(e) Is there evidence of an effect of fastest_drive on whether a student is in a relationship or not?

Yes, there is significant evidence that fastest_drive is related to relationship status (p = 0.0206). In fact,
people who [report that they] drive faster tend to [report to] be in a relationship more often, controlling for
sex and the number of countries they have visited.
