ST3241

NATIONAL UNIVERSITY OF SINGAPORE

FINAL EXAMINATION

ST3241 CATEGORICAL DATA ANALYSIS I

(Semester 2: AY 2008–2009)

April 2009 — Time Allowed : 2 Hours

INSTRUCTIONS TO CANDIDATES

1. This examination paper contains THREE (3) questions and comprises FIFTEEN
(15) printed pages.

2. Answer ALL questions. The paper carries a TOTAL of 60 marks.

3. Read the questions CAREFULLY, and clearly label any important quantities
that you define.

4. All NOTATION in this paper is the same as that in lecture notes.

5. This is an OPEN BOOK examination.

Matriculation No:

Question        1     2     3     Total
Max. marks     17    25    18        60
Marks scored


1. [17 marks] Suppose we are interested in whether grass fields sprayed with one
of 4 different insecticides, T = k for k = A, B, C, D, have insect problems. It is
believed that the size S of a grass field, either small or large, may also affect this
response. The following logit model M is fitted:

$$\operatorname{logit}(\pi) = \alpha + \beta_i^S + \beta_k^T,$$

for i = small, large and k = A, B, C, D, where $\pi = \pi(i, k)$ is the probability that a
grass field of size i treated with insecticide k is free of any insect problems. The
following is part of the SAS output from proc logistic for fitting the above model.

_____________________________________________________________________________

Class Level Information

Class    Value    Design Variables

T        A          1     0     0
         B          0     1     0
         C          0     0     1
         D         -1    -1    -1

S        large      1
         small     -1

Model Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Deviance and Pearson Goodness-of-Fit Statistics

Criterion       Value     DF     Value/DF     Pr > ChiSq

Deviance       7.9645
Pearson        7.9348

Number of unique profiles: 8

Model Fit Statistics

                                Intercept
                Intercept          and
Criterion         Only          Covariates

AIC             5127.626         3993.880
SC              5133.920         4025.351
-2 Log L        5125.626         3983.880

Testing Global Null Hypothesis: BETA=0

Test                  Chi-Square     DF     Pr > ChiSq

Likelihood Ratio       1141.7459      4         <.0001
Score                  1052.5743      4         <.0001
Wald                    846.9566      4         <.0001

Type 3 Analysis of Effects

                        Wald
Effect     DF     Chi-Square     Pr > ChiSq

T                   833.2472
S                    50.0683         <.0001

Analysis of Maximum Likelihood Estimates

                                   Standard          Wald
Parameter          DF   Estimate      Error    Chi-Square    Pr > ChiSq

Intercept           1    -0.9442     0.0437      466.5444        <.0001
T         A         1     1.3915     0.0638      475.7982        <.0001
T         B         1     1.1323     0.0630      322.5619        <.0001
T         C         1    -1.2289     0.0853      207.3704        <.0001
S         large     1    -0.2777     0.0392       50.0683        <.0001
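
Under the effect (sum-to-zero) coding shown in the Class Level Information table, a level whose design values are all -1 has no row of its own in the estimates table; its parameter is minus the sum of the listed parameters for that factor. The following is a minimal Python sketch of that bookkeeping, using only the estimates printed above. The helper name `fitted_probability` is hypothetical and introduced purely for illustration.

```python
import math

# Estimates copied from the "Analysis of Maximum Likelihood Estimates" table.
intercept = -0.9442
beta_T = {"A": 1.3915, "B": 1.1323, "C": -1.2289}
beta_S = {"large": -0.2777}

# Effect coding: the parameters of each factor sum to zero, so the
# level coded -1 is minus the sum of the listed levels.
beta_T["D"] = -sum(beta_T.values())
beta_S["small"] = -beta_S["large"]

def fitted_probability(size, insecticide):
    """Estimated P(field is free of insect problems) under model M."""
    eta = intercept + beta_S[size] + beta_T[insecticide]
    return 1.0 / (1.0 + math.exp(-eta))

# One row per insecticide: fitted probability for large and small fields.
for k in "ABCD":
    print(k,
          round(fitted_probability("large", k), 4),
          round(fitted_probability("small", k), 4))
```

Because π is defined as the probability of being free of insect problems, the probability of an insect problem for a given field is one minus the value returned by `fitted_probability`.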


(a) Construct a Wald confidence interval for $\exp(\beta_A^T - \beta_C^T)$. Interpret.

[Hint: $\operatorname{Cov}(\hat\beta_A^T, \hat\beta_C^T) = -0.00188$.]
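
The quantity in part (a) is the odds ratio of a field being free of insect problems under insecticide A versus insecticide C, at a fixed field size. A Wald interval for it can be sketched as follows: build the interval for the difference of the two estimates, using the variance of a difference, and then exponentiate the endpoints. This is a sketch only; the 95% level is an assumption, since the question does not fix one.

```python
import math
from statistics import NormalDist

# Estimates and standard errors from the SAS output; covariance from the hint.
b_A, se_A = 1.3915, 0.0638
b_C, se_C = -1.2289, 0.0853
cov_AC = -0.00188

level = 0.95  # assumed; the question does not state a confidence level
z = NormalDist().inv_cdf(1 - (1 - level) / 2)

diff = b_A - b_C
# Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y)
se_diff = math.sqrt(se_A**2 + se_C**2 - 2 * cov_AC)

lower, upper = diff - z * se_diff, diff + z * se_diff
odds_ratio_ci = (math.exp(lower), math.exp(upper))
print(round(diff, 4), round(se_diff, 4), odds_ratio_ci)
```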

(b) Estimate the probability that there is an insect problem in a small grass field
treated with insecticide D.

(c) Test at the 1% significance level whether the response and the kind of
insecticide are conditionally independent. What is your conclusion?

(d) Write the saturated model and find its maximized log-likelihood.

(e) Construct a model goodness-of-fit test, and interpret.
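
Parts (d) and (e) both rest on the relationship between the deviance and the two maximized log-likelihoods, Deviance = -2 (L_M - L_saturated). The sketch below illustrates that arithmetic with the figures printed in the SAS output. It assumes 5 fitted parameters (intercept, three for T, one for S) against 8 unique profiles, and it uses a closed-form upper-tail formula that is valid only for a chi-square distribution with 3 degrees of freedom; the function name `chi2_sf_3df` is hypothetical.

```python
import math

deviance = 7.9645             # from the goodness-of-fit table
neg2_loglik_model = 3983.880  # "-2 Log L", intercept and covariates
n_profiles = 8                # number of unique profiles
n_params = 5                  # intercept + 3 (T) + 1 (S), assumed count

loglik_model = -neg2_loglik_model / 2
# Deviance = -2 * (loglik_model - loglik_saturated)
loglik_saturated = loglik_model + deviance / 2
df = n_profiles - n_params

def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with exactly 3 df."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))

p_value = chi2_sf_3df(deviance)
print(df, round(loglik_saturated, 4), round(p_value, 4))
```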


2. [25 marks] Based on the “horseshoe crab” data displayed on pp. 76-77 of the
textbook or p. 64 of the lecture notes, we are interested in the probability ($\pi$)
that satellites are present near a female horseshoe crab, using two explanatory
variables: spine condition (S) and weight in kg (Wt). S has 3 levels, namely
“both good” (S = 1), “one worn or broken” (S = 2), and “both worn or broken”
(S = 3). Given below is part of the SAS output from proc genmod for fitting the
logistic regression model

$$\operatorname{logit}(\pi) = \alpha + \beta_1\,\mathrm{Wt} + \beta_2 s_2 + \beta_3 s_3 \tag{1}$$

where $\pi = \pi(\mathrm{Wt}, s_2, s_3)$,

$$s_2 = \begin{cases} 1, & \text{if } S = 2 \\ 0, & \text{otherwise} \end{cases}
\qquad\text{and}\qquad
s_3 = \begin{cases} 1, & \text{if } S = 3 \\ 0, & \text{otherwise.} \end{cases}$$
______________________________________________________________________________

Estimated Covariance Matrix

            Prm1        Prm2        Prm3        Prm4

Prm1     1.01461    -0.35315    -0.27096    -0.20893
Prm2    -0.35315     0.14578     0.04617     0.02057
Prm3    -0.27096     0.04617     0.47104     0.16562
Prm4    -0.20893     0.02057     0.16562     0.20444

Criteria For Assessing Goodness Of Fit

Criterion            DF        Value     Value/DF

Deviance            169     195.3327       1.1558
Log Likelihood              -97.6663

Analysis Of Parameter Estimates

                               Standard   Likelihood Ratio 95%     Chi-
Parameter    DF    Estimate       Error     Confidence Limits    Square    Pr > ChiSq

Intercept     1     -3.6527      1.0073    -5.7215    -1.7489     13.15        0.0003
Wt            1      1.7881
s2            1
s3            1      0.0708      0.4522    -0.8372     0.9498      0.02        0.8755


(a) Suppose that the prediction equation of model (1) says that with equal weight,
the estimated odds of presence of satellites for a crab with one spine worn or
broken is 0.741 times that for a crab with both spines in good condition. State
the equation.

(b) Construct a 95% confidence interval for the effect of weight on presence of satel-
lites. Interpret.

(c) (i) Based on the available SAS output, is it possible to carry out the
likelihood-ratio test for the effect of spine condition (S), given that Wt is
in model (1)? Give your answer as “Yes” or “No”.

(ii) If your answer in (i) is “Yes”, carry out the test; or
if your answer in (i) is “No”, describe the test and state clearly what
additional information is needed.

(d) Show that the asymptotic standard error of the sample logit, logit(π̂(2.55, 0, 0))
in model (1), is given by 0.4018 (up to 4 decimal places).
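
The standard error in part (d) is that of a linear combination of the estimates, so its variance is the quadratic form x' Σ x in the estimated covariance matrix. A minimal Python sketch of that computation follows. It assumes the ordering Prm1 = Intercept, Prm2 = Wt, Prm3 = s2, Prm4 = s3, which the SAS excerpt does not state explicitly, although the diagonal entries for Prm1 and Prm4 agree with the printed standard errors of the intercept and s3.

```python
import math

# Estimated covariance matrix from the SAS output.
# Assumed ordering: Prm1 = Intercept, Prm2 = Wt, Prm3 = s2, Prm4 = s3.
cov = [
    [ 1.01461, -0.35315, -0.27096, -0.20893],
    [-0.35315,  0.14578,  0.04617,  0.02057],
    [-0.27096,  0.04617,  0.47104,  0.16562],
    [-0.20893,  0.02057,  0.16562,  0.20444],
]

x = [1.0, 2.55, 0.0, 0.0]  # (1, Wt, s2, s3) for Wt = 2.55, both spines good

# Var(x' beta_hat) = x' Cov x
variance = sum(x[i] * cov[i][j] * x[j] for i in range(4) for j in range(4))
se_logit = math.sqrt(variance)
print(round(se_logit, 4))
```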

(e) Construct a 95% confidence interval for the probability of presence of satellites
residing nearby a female crab with both spines in good condition and weight
2.55kg to test whether the chance is equal to one-half.
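
One common route to an interval for a probability is to build a Wald interval on the logit scale and then map both endpoints back through the inverse logit. The sketch below follows that route, taking the intercept and Wt estimates from the SAS output and the standard error 0.4018 stated in part (d). The helper `expit` is defined locally and is not part of the original material.

```python
import math
from statistics import NormalDist

alpha_hat = -3.6527   # intercept estimate from the SAS output
beta_wt_hat = 1.7881  # Wt estimate from the SAS output
se_logit = 0.4018     # standard error stated in part (d)

weight = 2.55
z = NormalDist().inv_cdf(0.975)  # two-sided 95%

def expit(eta):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-eta))

eta_hat = alpha_hat + beta_wt_hat * weight  # s2 = s3 = 0 for "both good"
logit_ci = (eta_hat - z * se_logit, eta_hat + z * se_logit)
prob_ci = (expit(logit_ci[0]), expit(logit_ci[1]))
print(round(expit(eta_hat), 4), prob_ci)
```

Checking whether one-half lies inside `prob_ci` is equivalent to checking whether 0 lies inside `logit_ci`.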

(f) Among a group of 5 female crabs, all with both spines in good condition and
the same weight w kg, estimate, in terms of w, the chance that exactly 3 of the
5 have no satellite. [Hint: Assume independence between all the crabs.]
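
With independent crabs sharing the same covariates, the number without a satellite is binomial with 5 trials and success probability 1 - π̂(w, 0, 0). A minimal sketch of that calculation as a function of w is shown below, using the intercept and Wt estimates from the SAS output. The function names are hypothetical, and the weight used in the final line is an arbitrary illustration rather than a value from the paper.

```python
import math

alpha_hat = -3.6527   # intercept estimate from the SAS output
beta_wt_hat = 1.7881  # Wt estimate from the SAS output

def prob_satellite(w):
    """Estimated P(satellites present) at weight w kg, both spines good."""
    return 1.0 / (1.0 + math.exp(-(alpha_hat + beta_wt_hat * w)))

def prob_three_of_five_without(w):
    """Binomial probability that exactly 3 of 5 independent crabs have no satellite."""
    p_none = 1.0 - prob_satellite(w)
    return math.comb(5, 3) * p_none**3 * (1.0 - p_none)**2

print(prob_three_of_five_without(2.0))  # 2.0 kg is an arbitrary example weight
```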

(g) Is it valid to use the deviance of value 195.3327 from the SAS output to perform an
overall goodness-of-fit test of the model (1) against the saturated model? Explain.

(h) In the likelihood-ratio test of $H_0 : \beta_3 = 0$, there is not enough evidence to
reject $H_0$ at the 1% significance level.
(i) Write the resulting model, and
(ii) describe the association between spine condition and presence of satellites in
terms of the parameters in your written model for (h)(i).


3. [18 marks] Suppose we are interested in whether wafers treated with one of 4
different lubricants, A, B, C and D, have imperfections. The numbers of imperfections
in a sample of 4000 wafers, denoted by $y_1, y_2, \ldots, y_{4000}$, are such that the sample
proportion of “good” wafers (i.e., without imperfections) is $p_A = 0.554$ among the
1000 wafers treated with A, $p_B = 0.546$ among the 1000 wafers treated with B,
$p_C = 0.105$ among the 1000 wafers treated with C, and $p_D$ among the 1000 wafers
treated with D. Suppose that the following logit model is fitted:

$$\operatorname{logit}(\pi) = \beta_C T_C + \beta_D T_D, \tag{2}$$

where $\pi = \pi_k$, k = A, B, C, D, is the probability that there is no imperfection in a
wafer treated with lubricant k,

$$T_C = \begin{cases} 1 & \text{if a wafer is treated with lubricant C} \\ 0 & \text{otherwise,} \end{cases}$$

and $T_D$ is a similarly defined indicator variable for whether a wafer is treated
with lubricant D.

(a) Fill in the following table to summarize the binary data:

                      Wafer
Lubricant     Good     Bad     Total
A
B
C
D
Total         1304

(b) Estimate the parameters in model (2).
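
The counts for part (a) follow from the stated proportions and group sizes, and model (2) can then be examined group by group. The sketch below is one possible way to set this up. It assumes that the 1304 in the table is the total number of good wafers, a reading consistent with the exponent of α in the likelihood of part (d). Since model (2) has no intercept, it fixes the probabilities for A and B at 0.5 and gives C and D one free parameter each.

```python
import math

n_per_group = 1000
total_good = 1304  # assumed to be the "Good" column total
p_good = {"A": 0.554, "B": 0.546, "C": 0.105}

good = {k: round(p * n_per_group) for k, p in p_good.items()}
# The D count follows from the stated total of good wafers.
good["D"] = total_good - sum(good.values())
bad = {k: n_per_group - g for k, g in good.items()}

# Model (2): logit(pi_A) = logit(pi_B) = 0, so both are fixed at 0.5.
# C and D each have their own parameter, so the fitted probabilities
# for C and D equal the sample proportions.
beta_C_hat = math.log(good["C"] / bad["C"])
beta_D_hat = math.log(good["D"] / bad["D"])

for k in "ABCD":
    print(k, good[k], bad[k], n_per_group)
print(round(beta_C_hat, 4), round(beta_D_hat, 4))
```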

Suppose that an alternative model is also of interest,

$$\operatorname{logit}(\pi) = \alpha + \beta T_{AB}, \tag{3}$$

where

$$T_{AB} = \begin{cases} 1 & \text{if a wafer is treated with lubricant A or B} \\ 0 & \text{otherwise.} \end{cases}$$

Answer parts (c)–(e) based on model (3).

(c) Write down $\pi_k$, for k = A, B, C, D.

(d) Show that the likelihood function of the data under model (3) is given by

$$\frac{e^{1304\alpha + 1100\beta}}{\left[(1 + e^{\alpha})(1 + e^{\alpha+\beta})\right]^{2000}}.$$
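
As a numerical sanity check on the expression in part (d), the log of the stated likelihood can be compared with the sum of the Bernoulli log-likelihood terms over the four lubricant groups. The sketch below does this at arbitrary trial values of α and β. The good/bad counts are an assumed reading of the data (proportions times 1000, with D chosen so that the good wafers total 1304), and the function names are hypothetical.

```python
import math

# (good, bad) counts per lubricant; assumed reading of the table in part (a).
counts = {"A": (554, 446), "B": (546, 454), "C": (105, 895), "D": (99, 901)}

def loglik_bernoulli(alpha, beta):
    """Sum of Bernoulli log-likelihood terms under model (3)."""
    total = 0.0
    for k, (good, bad) in counts.items():
        eta = alpha + beta * (1 if k in ("A", "B") else 0)
        pi = 1.0 / (1.0 + math.exp(-eta))
        total += good * math.log(pi) + bad * math.log(1.0 - pi)
    return total

def loglik_closed_form(alpha, beta):
    """Log of exp(1304a + 1100b) / [(1 + e^a)(1 + e^(a+b))]^2000."""
    return (1304 * alpha + 1100 * beta
            - 2000 * (math.log1p(math.exp(alpha))
                      + math.log1p(math.exp(alpha + beta))))

a, b = -1.3, 0.9  # arbitrary trial values, not estimates
print(loglik_bernoulli(a, b), loglik_closed_form(a, b))
```

The two functions should agree at any (α, β), because 1100 of the good wafers come from groups A and B, and each of the two linear predictors applies to 2000 wafers.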

(e) Is it possible to find the fitted equation? Why? If “Yes”, find it.

(f) Suppose that the maximized log-likelihoods of both models (2) and (3), respec-
tively denoted by L(2) and L(3) , are given. Describe how to choose between the
two models.


[END OF PAPER]
