

Question 1 (15 Marks)


Post-traumatic stress disorder (or PTSD) is a severe anxiety disorder that
can develop after exposure to a traumatic event. Diagnostic symptoms for PTSD
include re-experiencing the original trauma(s) through flashbacks or nightmares.
A psychologist gathered data from a sample of Vietnam War veterans in Australia.
The veterans had all been diagnosed with PTSD at some stage following their
service in the Vietnam War. All subjects completed a General Health
Questionnaire with 28 questions.
There were 107 males in the sample.
The variables considered are described below:
Variable     Description
Subject_Id   Identification number of subject
Age          Age in years
GHQ-Total    Total score for all four sub-sections (GHQ-A, GHQ-B, GHQ-C and GHQ-D)
Combat       Number of days exposed to combat
PTSD         1: Early Onset PTSD, 2: Late Onset PTSD

a) Identify the following variables as Continuous, Ordinal or Nominal.

Variable     Type
Age:
GHQ-Total:
Combat:
PTSD:

b) Is this an experimental or observational study? Explain.

c) What is the target population for this study?


Question 1 continued
d) Why may the results from this study not be used to draw conclusions
about the population of all people with PTSD in Australia?

e) If you were to carry out a similar study about Australian veterans of the
current war in Afghanistan, describe how you might obtain a
representative sample of 200 subjects from the population of interest.

f) What type of display may be used to investigate the relation between
GHQ-Total and Age in the target population?

g) If we wanted to investigate whether we can predict general health
(GHQ-Total) from age, what would be the dependent variable? Explain.
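For part e), one standard way to obtain a representative sample is a simple random sample drawn from a complete sampling frame of the population of interest. The sketch below is illustrative only. It assumes a hypothetical frame of veteran IDs, and the frame size and the ID format are made up. `random.Random.sample` draws units without replacement, so every subset of 200 veterans is equally likely to be selected.

```python
import random

# Hypothetical sampling frame: one ID per veteran in the population of interest.
# In practice the frame would come from a complete register; this size is made up.
FRAME_SIZE = 30_000  # hypothetical
frame = [f"VET{i:05d}" for i in range(1, FRAME_SIZE + 1)]


def simple_random_sample(units, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # seeded generator makes the draw reproducible
    return rng.sample(units, n)


sample = simple_random_sample(frame, 200, seed=2024)
print(sample[:5])
```

The sample is only as representative as the frame. Veterans missing from the register cannot be selected, whatever random mechanism is used.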


Question 2 (9 Marks)
Total scores on the General Health Questionnaire (GHQ-Total) in the general
adult population are known to be normally distributed with a mean of 19 and
standard deviation of 2.51. Answer the following questions showing your
working.
a) What is the probability that an individual from the general adult
population would obtain a score which is higher than 22 on the
GHQ-Total?

b) What is the probability that the GHQ-Total scores of five randomly
selected individuals have an average score which is lower than 18?

c) What is the upper quartile of the distribution of GHQ-Total scores in the
general adult population?
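The working for parts a) to c) can be checked with Python's `statistics.NormalDist` (Python 3.8+). Part a) standardises 22 as z = (22 - 19)/2.51, which is about 1.20. Part b) uses the fact that the mean of 5 independent normal scores is normal with standard deviation 2.51/sqrt(5). Part c) is the 75th percentile. This sketch is a check on hand calculations with z tables, not a replacement for them.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA = 19, 2.51          # GHQ-Total in the general adult population
ghq = NormalDist(MU, SIGMA)

# a) P(X > 22) = 1 - P(X <= 22)
p_above_22 = 1 - ghq.cdf(22)

# b) The mean of n = 5 independent scores is Normal(MU, SIGMA / sqrt(n))
n = 5
ghq_mean = NormalDist(MU, SIGMA / sqrt(n))
p_mean_below_18 = ghq_mean.cdf(18)

# c) The upper quartile is the 75th percentile of the distribution
upper_quartile = ghq.inv_cdf(0.75)

print(f"a) {p_above_22:.4f}")
print(f"b) {p_mean_below_18:.4f}")
print(f"c) {upper_quartile:.2f}")
```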


Question 3 (9 Marks)
Research Question: Is the Ecological Footprint for countries in the Americas,
Central Asia and Europe less than 3.0 on average?
According to the World Wildlife Fund (WWF), a country's Ecological Footprint is
the sum of all the cropland, grazing land, forest and fishing grounds required to
produce the food, fibre and timber it consumes, to absorb the wastes emitted
when it uses energy and to provide space for its infrastructure.
The average Ecological Footprint (measured in hectares per person) for a
random sample of 25 countries from the Americas, Central Asia and Europe was
found to be 2.832 with a standard deviation of 1.281.
A histogram and a boxplot for the Ecological Footprints for these countries are
given below.
[Figure: Histogram of Eco Footprint (ha per person) (Frequency vs Eco Footprint, 0.0 to 4.8) and Boxplot of Eco Footprint (ha per person).]

Source: Watkins et al., Statistics: From Data to Decision, Second Edition (2011), Wiley (adapted).

Use the above information to test the claim that the average Ecological Footprint
for countries in the Americas, Central Asia and Europe is less than 3.0 hectares
per person. For any calculations, show your working.
Hypothesis Test:
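The one-sample t statistic can be computed from the summary statistics as a check on the working. The sketch assumes the hypotheses H0: mu = 3.0 versus H1: mu < 3.0. It takes the critical value t(0.05, 24) = 1.711 from a standard t table, because Python's standard library has no t distribution. If SciPy is installed, `scipy.stats.t.cdf(t_stat, df)` would give the lower-tail p-value instead. The approximate-normality assumption should still be judged from the histogram and boxplot.

```python
from math import sqrt

# Summary statistics from the question
n, xbar, s = 25, 2.832, 1.281
mu0 = 3.0  # hypothesised mean under H0

# H0: mu = 3.0 vs H1: mu < 3.0 (one-sided, lower tail)
se = s / sqrt(n)
t_stat = (xbar - mu0) / se
df = n - 1

# Lower-tail critical value -t_{0.05, 24} = -1.711, taken from a standard t table
t_crit = -1.711
reject_h0 = t_stat < t_crit

print(f"SE = {se:.4f}, t = {t_stat:.3f} on {df} df, reject H0: {reject_h0}")
```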


Question 4 (8 Marks)
Research Question: Is a new diet effective in reducing cholesterol?
Fifteen people began a new diet to reduce their cholesterol levels. The table
below shows the cholesterol readings for these fifteen people both before the
new diet and again one month after the diet began. The differences between the
two readings are given as well.
Dieter   Before   After   Differences (Before - After)
  1        255     197      58
  2        230     225       5
  3        290     250      40
  4        242     215      27
  5        300     270      30
  6        250     235      15
  7        215     190      25
  8        230     220      10
  9        225     200      25
 10        219     203      16
 11        236     223      13
 12        240     220      20
 13        215     180      35
 14        217     195      22
 15        231     235      -4

Stem-and-Leaf Display: Differences (Before - After)

Stem-and-leaf of Differences (Before - After)  N = 15
Leaf Unit = 1.0

  1   -0  4
  2    0  5
  6    1  0356
 (5)   2  02557
  4    3  05
  2    4  0
  1    5  8

Test of mu = 0 vs > 0

                                                      95% Lower
Variable                       N   Mean  StDev  SE Mean   Bound  T      P
Differences (Before - After)  15  22.47  15.05     3.89   15.62  *  0.000

Use the stem-and-leaf plot for the differences and the Minitab output (which
gives results based on a one-sample t-test for the differences) to answer the
following questions.
a) State the null and the alternative hypothesis being tested in the Minitab
output.

b) Comment on the normality assumption for the differences.

c) Calculate the test statistic. Show your working.

d) Do you reject or not reject the null hypothesis stated in part a? Give a
reason for your answer and clearly state a conclusion.
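The test statistic for part c) can be reproduced from the 15 differences in the table. This is a paired t-test carried out as a one-sample t-test on the differences, with t = (mean - 0)/SE and SE = s/sqrt(n). Using the rounded Minitab values gives 22.47/3.89, which is about 5.78, the same to two decimals.

```python
from math import sqrt
from statistics import mean, stdev

# Differences (Before - After) for the 15 dieters, from the table above
differences = [58, 5, 40, 27, 30, 15, 25, 10, 25, 16, 13, 20, 35, 22, -4]

n = len(differences)
d_bar = mean(differences)
s_d = stdev(differences)        # sample standard deviation (n - 1 divisor)
se = s_d / sqrt(n)

# H0: mu_d = 0 vs H1: mu_d > 0, so the statistic is t = (d_bar - 0) / SE
t_stat = (d_bar - 0) / se
print(f"n = {n}, mean = {d_bar:.2f}, StDev = {s_d:.2f}, SE = {se:.2f}, t = {t_stat:.2f}")
```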


Question 5 (8 Marks)
Two samples of female students participated in an experiment to investigate
alternative treatments for the eating disorder, bulimia. One sample consisted of
11 students known to suffer from bulimia; the other sample consisted of 14
students with normal eating habits. Each student completed a questionnaire
from which a fear of negative evaluation (FNE) score was produced. The higher
the score, the greater the fear of negative evaluation. A summary table and
a histogram for the results from this experiment are presented below:
[Figure: Histogram of FNE score, panel variable Eating habits (panels Bulimic and Normal); Frequency vs FNE score (5 to 25).]

                              n    Mean   StDev
Normal eating habits (N)     14   14.14    5.29
Bulimic eating habits (B)    11   17.82    4.92

Degrees of freedom = 22

Source: McClave, J.T. and T. Sincich, Statistics, Eleventh Edition (2009), Pearson (adapted).

a) Calculate a two-sided 95% confidence interval for the difference between
the population means of the FNE scores for students with bulimic eating
habits (B) and students with normal eating habits (N). Interpret the
results.

b) Based on your results in the previous part, what conclusion can you make
regarding the following hypotheses?
$H_0: \mu_B = \mu_N$
$H_1: \mu_B \neq \mu_N$

c) What assumptions are required for the interval of part a to be statistically
valid? Are these assumptions reasonably satisfied? Explain.
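The interval in part a) can be checked with an unpooled (Welch) standard error. The sketch assumes the unpooled method because it is consistent with the 22 degrees of freedom shown in the output: the Welch-Satterthwaite value is about 22.3, whereas a pooled test would have 11 + 14 - 2 = 23. The critical value t(0.025, 22) = 2.074 is taken from a standard t table.

```python
from math import sqrt

# Summary statistics (B = bulimic, N = normal eating habits)
n_b, mean_b, sd_b = 11, 17.82, 4.92
n_n, mean_n, sd_n = 14, 14.14, 5.29

diff = mean_b - mean_n
var_b, var_n = sd_b**2 / n_b, sd_n**2 / n_n
se = sqrt(var_b + var_n)   # unpooled (Welch) standard error of the difference

# Welch-Satterthwaite degrees of freedom; rounding down gives 22
df = (var_b + var_n) ** 2 / (var_b**2 / (n_b - 1) + var_n**2 / (n_n - 1))

# t_{0.025, 22} = 2.074 from a standard t table
t_crit = 2.074
lower, upper = diff - t_crit * se, diff + t_crit * se
print(f"diff = {diff:.2f}, SE = {se:.4f}, df = {df:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

For part b), the rule is to check whether 0 lies inside the interval. A two-sided 95% interval corresponds to a two-sided test at the 5% level.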


Question 6 (11 Marks)


Research Question: Do more than 50% of car crashes occur within 8km of
home?
A large insurance company conducted a study into car crashes. It found that out
of a random selection of 2200 car crashes, 1144 of them occurred within 8km of
home.
Source: Triola, M.F., Elementary Statistics, Third Edition (2007), Pearson (adapted).

a) Use a z-test for a proportion to test the claim that more than 50% of car
crashes occur within 8km of home.
Hypothesis Test

b) The lower 95% confidence bound is found to be 0.5025. Interpret the
meaning of this confidence bound in the context of this problem.
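Both parts can be checked with the standard normal distribution. The test statistic uses the null proportion p0 = 0.5 in its standard error. The one-sided lower 95% bound uses the sample proportion and z(0.95), which is about 1.645. This reproduces the 0.5025 quoted in part b). The large-sample condition holds comfortably, since n*p0 = n*(1 - p0) = 1100.

```python
from math import sqrt
from statistics import NormalDist

n, x = 2200, 1144
p0 = 0.5
p_hat = x / n

# H0: p = 0.5 vs H1: p > 0.5; the SE uses p0 under H0
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)

# One-sided lower 95% confidence bound; the SE uses p_hat
z_95 = NormalDist().inv_cdf(0.95)
lower_bound = p_hat - z_95 * sqrt(p_hat * (1 - p_hat) / n)

print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.4f}, lower bound = {lower_bound:.4f}")
```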


Question 7 (8 marks)
Recall the data used in Assignment 1. The data, which are described below, were
recorded on 1151 AIDS patients between 1996 and 1997. These patients were
recruited to take part in a clinical trial to compare survival times for AIDS
patients treated with a standard two-drug regimen with survival times for
patients treated with a new three-drug regimen. For this problem, we will only
use the variables Time (which has been categorised here) and Treatment.
Variable Name   Variable Description
ID              Subject ID
Time            Time to AIDS diagnosis or death: 0 = Less than 6 months,
                1 = 6 months or longer
Treatment       Treatment: 0 = Two-drug treatment regime,
                1 = Three-drug treatment regime

Source: Hosmer, D.W. and Lemeshow, S. and May, S. (2008), Applied Survival Analysis: Regression Modeling of
Time to Event Data: Second Edition, John Wiley and Sons Inc., New York, NY

The following Minitab output was obtained from the AIDS study described above.
Use this output to answer the following questions.
Rows: Treatment   Columns: Time

          <6 months   6 months+      All
2 Drug          166         411      577
              150.4       426.6    577.0
             1.6201      0.5711
3 Drug          134         440      574
              149.6       424.4    574.0
             1.6285      0.5741
All             300         851     1151
              300.0       851.0   1151.0

Cell Contents: Count
               Expected count
               Contribution to Chi-square

[Figure: Chart of Time, Treatment. Clustered bar chart of Count (0 to 500) by Time (< 6 months, 6 months+) and Treatment (2 Drug, 3 Drug).]

a) Comment on the clustered bar chart.


Question 7 continued
b) Use the output on the previous page to carry out an appropriate
hypothesis test to answer the research question below.
Research Question: Is there an association between the time to diagnosis or
death and the treatment regime prescribed to AIDS patients?
Hypothesis Test
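The chi-square test of independence can be reproduced from the observed counts in the table. The sketch recomputes each expected count as (row total x column total)/grand total and sums the cell contributions, which should match the Contribution to Chi-square values above. For the p-value, it uses the exact fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so only the standard library is needed.

```python
from math import sqrt
from statistics import NormalDist

# Observed counts: rows = treatment (2 Drug, 3 Drug), columns = time (<6 months, 6 months+)
observed = [[166, 411],
            [134, 440]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / grand
        chi2 += (obs - expected) ** 2 / expected   # cell contribution

df = (len(observed) - 1) * (len(observed[0]) - 1)

# For df = 1, P(chi-square > c) = 2 * (1 - Phi(sqrt(c)))
p_value = 2 * (1 - NormalDist().cdf(sqrt(chi2)))
print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value:.4f}")
```

All expected counts are well above 5, so the usual large-sample condition for this test is satisfied.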


Question 8 (8 marks)
An AIDS specialist in 1997 claimed that 5% of patients presenting with AIDS
were aged under 25 years and 10% were aged over 50 years, with the
remainder aged between 25 and 50 years. Of the 1151 patients in the study
described in the previous question, 34 patients were aged under 25 years and 113
patients were aged over 50 years. Assuming the sample is representative of
AIDS patients in 1997, carry out an appropriate hypothesis test to test the claim
made by the AIDS specialist.
Research Question: Were 5% of AIDS patients in 1997 aged under 25, 10%
aged over 50 years, and the remainder aged between 25 and 50 years?

Hypothesis Test
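The claim can be tested with a chi-square goodness-of-fit test. The sketch takes the middle category (25 to 50 years) as the remainder, 1151 - 34 - 113 = 1004 patients, with a claimed proportion of 1 - 0.05 - 0.10 = 0.85. With 3 categories there are 2 degrees of freedom. For 2 degrees of freedom the chi-square upper-tail probability is exactly exp(-x/2), so no statistical tables or third-party libraries are needed.

```python
from math import exp

# Observed counts: under 25, 25 to 50, over 50 (the middle group is the remainder)
n = 1151
under_25, over_50 = 34, 113
observed = [under_25, n - under_25 - over_50, over_50]
claimed = [0.05, 0.85, 0.10]   # proportions claimed by the specialist

expected = [n * p for p in claimed]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# For df = 2, the chi-square upper-tail probability is exactly exp(-x / 2)
p_value = exp(-chi2 / 2)

expected_str = ", ".join(f"{e:.2f}" for e in expected)
print(f"observed = {observed}, expected = [{expected_str}]")
print(f"chi-square = {chi2:.3f}, df = {df}, p-value = {p_value:.4f}")
```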


Question 9 (24 marks)


Research Question: What variables are useful to predict oxygen consumption
during exercise?
During exercise, your body uses large amounts of oxygen. It is both difficult and
expensive to measure the volume of oxygen used. A study was undertaken to
determine whether the amount of oxygen used during exercise could be
predicted from other variables which were easier to measure. A random sample
of males enrolled in a physical fitness course was selected for the study. Each of
the 31 males in the study was asked to run 2.4km and the following information
was recorded:
Variable Name   Variable Description
Oxygen          Oxygen consumption (ml per kg bodyweight per minute)
Age             Age (years)
Runtime         Time to run 2.4km (minutes)
RestPulse       Heart rate while resting (beats per minute)
RunPulse        Heart rate while running (beats per minute, measured at the same
                time as oxygen consumption)

Age, running time, resting pulse rate and running pulse rate were all
investigated as possible determinants of oxygen consumption.
The following descriptive statistics were obtained:

Variable     N     Mean   StDev
Oxygen      31    47.38    5.33
Age         31    47.68    5.21
Runtime     31    10.59    1.39
RestPulse   31    53.45    7.62
RunPulse    31   169.95   10.25

The following Minitab outputs were obtained to investigate the relationship
between Oxygen consumption and Age.

[Figure: Scatterplot of Oxygen vs Age (Oxygen 40 to 60; Age 40 to 56).]

The regression equation is
Oxygen = 62.2 - 0.311 Age

Predictor     Coef   SE Coef      T      P
Constant    62.221     8.670   7.18  0.000
Age        -0.3114    0.1808  -1.72  0.096

R-Sq = 9.3%

Question 9 continued
a) Describe the target population for this study.

b) Use the information on the previous page to:

   i.   Calculate the correlation coefficient to assess the linear relation
        between Oxygen and Age.

   ii.  Comment on the linear relationship (or lack thereof) between
        Oxygen and Age.

   iii. Give the best possible prediction for the oxygen consumption
        during exercise for a male in the target population who is aged
        42 years.
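For part b), the correlation can be recovered from the output in two equivalent ways. One is r = ±sqrt(R-Sq), taking the sign of the slope. The other is r = slope x s_x/s_y, using the standard deviations from the descriptive statistics. The two agree up to rounding. The sketch also evaluates the fitted line at Age = 42. Whether that fitted value or simply the sample mean of Oxygen (47.38) is the "best possible" prediction depends on whether Age is judged a useful predictor, given the slope's p-value of 0.096.

```python
from math import copysign, sqrt

# From the Oxygen vs Age output and the descriptive statistics
r_sq = 0.093            # R-Sq = 9.3%
intercept = 62.221
slope = -0.3114         # coefficient of Age
sd_age, sd_oxygen = 5.21, 5.33

# i) r takes the sign of the slope: r = sign(b) * sqrt(R-Sq) = b * s_x / s_y
r_from_rsq = copysign(sqrt(r_sq), slope)
r_from_slope = slope * sd_age / sd_oxygen

# iii) Fitted value from the least-squares line at Age = 42
pred_age_42 = intercept + slope * 42

print(f"r = {r_from_rsq:.3f} (via R-Sq), {r_from_slope:.3f} (via slope)")
print(f"fitted Oxygen at Age 42 = {pred_age_42:.2f}")
```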

Now we will consider running time as a predictor of oxygen consumption. The
following Minitab outputs were obtained to investigate the relationship between
Oxygen and RunTime. The test statistic and p-value for RunTime have
deliberately been replaced by a *. Use the output below to answer the
questions on the following page.
[Figure: Scatterplot of Oxygen vs RunTime (Oxygen 40 to 60; RunTime 10 to 14).]

The regression equation is
Oxygen = 82.4 - 3.31 RunTime

Predictor     Coef   SE Coef      T      P
Constant    82.422     3.855  21.38  0.000
RunTime    -3.3106    0.3612      *      *

R-Sq = 74.3%


Question 9 continued
c) Use the scatterplot provided to comment on the relation between oxygen
consumption and running time.

d) Write down the goodness of fit statistic and clearly explain what this value
means in relation to oxygen consumption and running time.

e) Predict the oxygen consumption for a male in the target population who
runs for 10 minutes.

f) Use the output above to carry out an appropriate hypothesis test to
determine whether running time is a useful predictor of oxygen
consumption.
Hypothesis Test:
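For parts e) and f), the fitted value and the t statistic for the slope can be computed from the coefficient table. The sketch assumes a two-sided test of H0: beta1 = 0 versus H1: beta1 != 0 with n - 2 = 29 degrees of freedom. It takes the critical value t(0.025, 29) = 2.045 from a standard t table rather than computing an exact p-value.

```python
# From the Oxygen vs RunTime output
intercept, slope = 82.422, -3.3106
se_slope = 0.3612
n = 31

# e) Fitted oxygen consumption for a 10-minute run
pred_10 = intercept + slope * 10

# f) H0: beta_1 = 0 vs H1: beta_1 != 0, with n - 2 degrees of freedom
t_stat = (slope - 0) / se_slope
df = n - 2
t_crit = 2.045          # t_{0.025, 29} from a standard t table
reject_h0 = abs(t_stat) > t_crit

print(f"prediction = {pred_10:.2f}, t = {t_stat:.2f} on {df} df, reject H0: {reject_h0}")
```

Running times near 10 minutes fall within the observed range of RunTime (about 10 to 14 minutes on the scatterplot), so the prediction in part e) is an interpolation rather than an extrapolation.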


Question 9 continued
Finally, we will consider resting pulse rates and running pulse rates as predictors
of oxygen consumption. The following Minitab outputs were obtained to
investigate these relations.
[Figure: Scatterplot of Oxygen vs RestPulse (left; Oxygen 40 to 60, RestPulse 40 to 70) and Scatterplot of Oxygen vs RunPulse (right; Oxygen 40 to 60, RunPulse 140 to 190).]

The regression equation is
Oxygen = 62.3 - 0.279 RestPulse

Predictor      Coef   SE Coef      T      P
Constant     62.300     6.425   9.70  0.000
RestPulse   -0.2792    0.1190  -2.35  0.026

R-Sq = 15.9%

The regression equation is
Oxygen = 82.5 - 0.207 RunPulse

Predictor      Coef   SE Coef      T      P
Constant      82.46     15.04   5.48  0.000
RunPulse   -0.20680   0.08552  -2.42  0.022

R-Sq = 20.3%

g) Circle the observation in the plot on the right-hand side that represents a
male with an average running pulse rate of 170 beats per minute and
oxygen consumption of 60.055 ml per kg bodyweight per minute. This male had
the largest residual. Calculate the residual for this observation.

h) Which is the better predictor of oxygen consumption during exercise,
resting pulse rate or running pulse rate? You must give a reason for your
answer.
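For parts g) and h), the residual is the observed value minus the fitted value from the RunPulse equation, evaluated at RunPulse = 170. The comparison of predictors uses only the R-Sq and slope p-values printed in the two outputs. This sketch applies a single criterion (higher R-Sq) and is not a full model comparison.

```python
# From the Oxygen vs RunPulse output
intercept, slope = 82.46, -0.20680
run_pulse, observed_oxygen = 170, 60.055

# g) residual = observed - fitted
fitted = intercept + slope * run_pulse
residual = observed_oxygen - fitted

# h) Compare the two single-predictor fits using the values reported in the output
fits = {
    "RestPulse": {"r_sq": 0.159, "p": 0.026},
    "RunPulse": {"r_sq": 0.203, "p": 0.022},
}
better = max(fits, key=lambda name: fits[name]["r_sq"])

print(f"fitted = {fitted:.3f}, residual = {residual:.3f}, higher R-Sq: {better}")
```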


Question 9 continued
i) Write a thorough summary statement of your findings in regard to the
relation between oxygen consumption and each of the four predictors you
have investigated. Your summary should explain which predictors are
useful and which are not. Your summary should also explain which, if
any, of the predictors gives the best predictions of oxygen consumption
and give a reason for your choice. Your summary only needs to be one
paragraph.
