
STAT 431

Homework # 3
Fall 2016

Submitted by:
Sandarva Murti Sharma
Graduate Student
Civil Engineering Department
Nov 28, 2016
Problem 1: The production manager of a large investment casting firm is studying
different methods to increase productivity in the workforce of the company. The
process engineer and personnel in the human resource department develop three new
incentive plans (B, C, D) for which they will design a study to compare the incentive
plans with the current plan (plan A). Twenty workers are randomly assigned to each
of the four plans. The response variable is the total number of units produced by each
worker during one month on the incentive plans. Use the information given above to
fill in the following ANOVA table.

Source   SS        DF   MS   F   p value
Plan     236758    ?    ?    ?   ?
Error    2972125   ?    ?
Total    ?         ?

Answer:
Source   SS        DF   MS         F      p value
Plan     236758    3    78919.33   2.02   0.12
Error    2972125   76   39106.91
Total    3208883   79
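
The entries in the table follow from the two sums of squares and the design (4 plans, 20 workers per plan). A minimal R sketch of that arithmetic is given below; the variable names are chosen here for illustration only.

```r
# Sums of squares given in the problem statement
ss_plan  <- 236758
ss_error <- 2972125

n_plans <- 4             # plans A, B, C, D
n_total <- n_plans * 20  # 20 workers per plan

df_plan  <- n_plans - 1          # treatment degrees of freedom
df_error <- n_total - n_plans    # error degrees of freedom
ms_plan  <- ss_plan / df_plan
ms_error <- ss_error / df_error
f_stat   <- ms_plan / ms_error
p_value  <- pf(f_stat, df_plan, df_error, lower.tail = FALSE)

anova_tab <- data.frame(
  Source  = c("Plan", "Error", "Total"),
  SS      = c(ss_plan, ss_error, ss_plan + ss_error),
  DF      = c(df_plan, df_error, n_total - 1),
  MS      = c(ms_plan, ms_error, NA),
  F_value = c(f_stat, NA, NA),
  p_value = c(p_value, NA, NA)
)
print(anova_tab, digits = 6)
```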

Problem 2: Researchers conducted an experiment to compare the average oral body
temperature for persons taking one of nine different medications often prescribed for
high blood pressure. The researchers were concerned that the effect of the drug may
be different depending on the severity of the patient's high blood pressure disorder.
Patients with high blood pressure who satisfy the study's entrance criteria were
classified into one of the three levels of severity of the blood pressure disorder. The
patients were then randomly assigned to receive one of the nine medications. Each
patient in the study was given the assigned medication at 6am of the designated study
day. Temperatures were taken at hourly intervals beginning at 8am and continuing
for 10 hours. During this time, the patients were not allowed to do any physical
activity and had to lie in bed. To eliminate the variability of temperature readings
within a day, the average of the hourly determination was the recorded response for
each patient. The data set is in file medication.TXT.

1. (3 points) Identify the design of this experiment and briefly explain your reasoning.
Answer:
The design is a completely randomized design with two factors (a 3 x 9 factorial treatment
structure). There are two factors of interest, severity (3 levels) and medication (9 levels),
and no blocking factor.
2. (3 points) Produce a profile plot with appropriate labels and a caption. Use the R code
provided with this assignment and fill in R commands where necessary. Briefly describe
the features of the profile plot.
Answer:

There are three severity levels and nine medications. The profiles for the three severity
levels are not parallel, which suggests an interaction between severity and medication.
The mean temperature also differs across the nine medications.
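
A profile plot of this kind can be drawn with base R's interaction.plot(). The sketch below is not the R code supplied with the assignment; it wraps the call in a hypothetical helper, profile_plot(), and demonstrates it on R's built-in warpbreaks data as a stand-in. For the homework data the column names Temperature, Severity, and Medication are taken from the lm() call in part 6, and the data frame name med is assumed to be already loaded.

```r
# Hypothetical helper: draws a profile plot and returns the cell means it displays
profile_plot <- function(dat, response, trace, xfac, ...) {
  cell_means <- tapply(dat[[response]],
                       list(dat[[xfac]], dat[[trace]]), mean)
  interaction.plot(x.factor = dat[[xfac]],
                   trace.factor = dat[[trace]],
                   response = dat[[response]],
                   fun = mean, type = "b",
                   xlab = xfac, ylab = paste("Mean", response),
                   trace.label = trace, ...)
  invisible(cell_means)
}

# Stand-in demonstration on a built-in data set (not the medication data)
demo_means <- profile_plot(warpbreaks, "breaks", "wool", "tension")

# For the homework data, assuming `med` has been read from medication.TXT:
# profile_plot(med, "Temperature", "Severity", "Medication")
```

Roughly parallel traces suggest little interaction; crossing or diverging traces suggest that the effect of one factor depends on the level of the other.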

3. (15 points) Construct the ANOVA table with numbers filled in (can use anova() in R to
generate this table or calculate by hand). Include in the table the p values for relevant F
statistics.
Answer:
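4. (3 points) Write down the ANOVA model, using $\alpha$ and $\beta$ to represent the two
variables, respectively.
Answer:
$y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}$
$\mu$: overall mean (reference)
$\alpha_i$: additional effect of the ith level of the first factor of interest (severity)
$\beta_j$: additional effect of the jth level of the second factor of interest (medication)
$(\alpha\beta)_{ij}$: interaction effect of the ith severity level and the jth medication
$\varepsilon_{ijk}$: random error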

5. (6 points) Use the ANOVA table to answer the following hypothesis testing questions:
(a) Does the effect of the drug depend on the severity of the patient's high blood pressure
disorder at significance level of $\alpha = 0.05$? Write down the null hypothesis using math
symbols from part 4. Is the conclusion supported by the profile plot?
Answer:
$H_0: (\alpha\beta)_{11} = (\alpha\beta)_{12} = \dots = (\alpha\beta)_{19} = (\alpha\beta)_{21} = \dots = (\alpha\beta)_{39} = 0$
The p value for the interaction is 0.22, which is greater than 0.05, so we fail to reject the
null hypothesis. At significance level $\alpha = 0.05$ there is not enough evidence that the
effect of the drug depends on the severity of the patient's high blood pressure disorder.
The profile plot does not clearly support this conclusion: the profiles appear to interact,
but the F test does not find that interaction statistically significant.

(b) Do different medications have different effects on the average oral body temperature at
significance level of $\alpha = 0.05$? Write down the null hypothesis using math symbols from
part 4. Is the conclusion supported by the profile plot?
Answer:
$H_0: \beta_1 = \beta_2 = \dots = \beta_9 = 0$
The p value for the medication main effect is less than 0.05, so we reject the null hypothesis
and conclude that the medications have different effects on the average oral body temperature
at significance level $\alpha = 0.05$. The profile plot supports this conclusion.

(c) Does the severity of high blood pressure have an impact on the average oral body
temperature at significance level of $\alpha = 0.05$? Write down the null hypothesis using math
symbols from part 4. Is the conclusion supported by the profile plot?
Answer:
$H_0: \alpha_1 = \alpha_2 = \alpha_3 = 0$
The p value for severity is 0.0009, which is less than 0.05, so we reject the null
hypothesis and conclude that the severity of high blood pressure has an impact on the
average oral body temperature at significance level $\alpha = 0.05$.
6. (10 points; 2 points per question) Use a general linear model to estimate the effect sizes
of different medications and severity levels. Break down your answer into the following
parts:
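(a) Write down the linear model before estimating the coefficients. Use $\beta$'s to represent the
coefficients. Use dummy variables to represent each level. Specify the baseline.
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + \beta_7 X_7 + \beta_8 X_8 + \beta_9 X_9 + \beta_{10} X_{10}$
$\quad + \beta_{11} X_1 X_3 + \beta_{12} X_2 X_3 + \beta_{13} X_1 X_4 + \beta_{14} X_2 X_4 + \beta_{15} X_1 X_5 + \beta_{16} X_2 X_5 + \beta_{17} X_1 X_6 + \beta_{18} X_2 X_6$
$\quad + \beta_{19} X_1 X_7 + \beta_{20} X_2 X_7 + \beta_{21} X_1 X_8 + \beta_{22} X_2 X_8 + \beta_{23} X_1 X_9 + \beta_{24} X_2 X_9 + \beta_{25} X_1 X_{10} + \beta_{26} X_2 X_{10} + \varepsilon$

$X_1$: 1 if severity 2, otherwise 0
$X_2$: 1 if severity 3, otherwise 0
$X_3$: 1 if medication B, otherwise 0
$X_4$: 1 if medication C, otherwise 0
$X_5$: 1 if medication D, otherwise 0
$X_6$: 1 if medication E, otherwise 0
$X_7$: 1 if medication F, otherwise 0
$X_8$: 1 if medication G, otherwise 0
$X_9$: 1 if medication H, otherwise 0
$X_{10}$: 1 if medication I, otherwise 0
$\varepsilon$: random error
The baseline is severity 1 with medication A, matching the coefficient names in the R output of
part (b); $\beta_0$ is the mean response at the baseline.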
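
One way to check the dummy coding is to let R build the design matrix for one observation per cell. The sketch below assumes Severity is coded 1 to 3 and Medication A to I (labels inferred from the coefficient names in the lm() output) and that R's default treatment contrasts are in use.

```r
# One row per severity x medication cell; labels are assumptions, see above
design <- expand.grid(Severity   = factor(1:3),
                      Medication = factor(LETTERS[1:9]))

# Design matrix for main effects plus interaction, default treatment contrasts
X <- model.matrix(~ Severity * Medication, data = design)

ncol(X)              # number of coefficients in the model
colnames(X)[1:11]    # intercept, severity dummies, medication dummies
X[1, X[1, ] != 0]    # baseline cell: only the intercept column is non-zero
```

With treatment contrasts the first level of each factor is absorbed into the intercept, so the dummy variables that appear are for severity 2 and 3 and for medications B to I.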
(b) Report the coefficient estimates and their p values.
Call:
lm(formula = Temperature ~ Severity + Medication + Severity *
Medication, data = med)

Residuals:
Min 1Q Median 3Q Max
-0.36 -0.09 0.00 0.10 0.64

Coefficients:
                       Estimate Std. Error  t value Pr(>|t|)
(Intercept)            97.48000    0.07008 1390.995  < 2e-16 ***
Severity2              -0.02000    0.09911   -0.202 0.840451
Severity3               0.18000    0.09911    1.816 0.072111 .
MedicationB             0.44000    0.09911    4.440 2.18e-05 ***
MedicationC             0.44000    0.09911    4.440 2.18e-05 ***
MedicationD            -0.04000    0.09911   -0.404 0.687302
MedicationE             0.32000    0.09911    3.229 0.001647 **
MedicationF             0.32000    0.09911    3.229 0.001647 **
MedicationG            -0.04000    0.09911   -0.404 0.687302
MedicationH             0.46000    0.09911    4.641 9.78e-06 ***
MedicationI             0.36000    0.09911    3.632 0.000431 ***
Severity2:MedicationB  -0.20000    0.14016   -1.427 0.156478
Severity3:MedicationB  -0.32000    0.14016   -2.283 0.024380 *
Severity2:MedicationC  -0.06000    0.14016   -0.428 0.669441
Severity3:MedicationC  -0.04000    0.14016   -0.285 0.775891
Severity2:MedicationD   0.12000    0.14016    0.856 0.393798
Severity3:MedicationD   0.16000    0.14016    1.142 0.256160
Severity2:MedicationE  -0.02000    0.14016   -0.143 0.886797
Severity3:MedicationE  -0.08000    0.14016   -0.571 0.569333
Severity2:MedicationF   0.06000    0.14016    0.428 0.669441
Severity3:MedicationF  -0.10000    0.14016   -0.713 0.477089
Severity2:MedicationG   0.08000    0.14016    0.571 0.569333
Severity3:MedicationG  -0.06000    0.14016   -0.428 0.669441
Severity2:MedicationH  -0.02000    0.14016   -0.143 0.886797
Severity3:MedicationH  -0.26000    0.14016   -1.855 0.066318 .
Severity2:MedicationI   0.06000    0.14016    0.428 0.669441
Severity3:MedicationI  -0.02000    0.14016   -0.143 0.886797
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.1567 on 108 degrees of freedom
Multiple R-squared: 0.6226, Adjusted R-squared: 0.5318
F-statistic: 6.854 on 26 and 108 DF, p-value: 2.684e-13
(c) Apply the Bonferroni correction and decide which coefficient estimates are significant at
$\alpha_E = 0.05$.
With 26 coefficients (excluding the intercept), each p value is compared with
$\alpha_E / 26 = 0.05 / 26 = 0.001923$.

Coefficient             p value        0.05/26
(Intercept)             < 2e-16  ***   0.001923
Severity2               0.840451       0.001923
Severity3               0.072111       0.001923
MedicationB             2.18e-05 ***   0.001923
MedicationC             2.18e-05 ***   0.001923
MedicationD             0.687302       0.001923
MedicationE             0.001647 **    0.001923
MedicationF             0.001647 **    0.001923
MedicationG             0.687302       0.001923
MedicationH             9.78e-06 ***   0.001923
MedicationI             0.000431 ***   0.001923
Severity2:MedicationB   0.156478       0.001923
Severity3:MedicationB   0.024380 *     0.001923
Severity2:MedicationC   0.669441       0.001923
Severity3:MedicationC   0.775891       0.001923
Severity2:MedicationD   0.393798       0.001923
Severity3:MedicationD   0.256160       0.001923
Severity2:MedicationE   0.886797       0.001923
Severity3:MedicationE   0.569333       0.001923
Severity2:MedicationF   0.669441       0.001923
Severity3:MedicationF   0.477089       0.001923
Severity2:MedicationG   0.569333       0.001923
Severity3:MedicationG   0.669441       0.001923
Severity2:MedicationH   0.886797       0.001923
Severity3:MedicationH   0.066318       0.001923
Severity2:MedicationI   0.669441       0.001923
Severity3:MedicationI   0.886797       0.001923

The coefficients that remain significant after the Bonferroni correction are:
Intercept
Medication B
Medication C
Medication E
Medication F
Medication H
Medication I
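
The comparison can be reproduced in R from the p values printed in part (b). The sketch below types those p values in by hand (the intercept, reported as < 2e-16, is left out and is significant under any threshold); the vector and variable names are chosen here for illustration.

```r
# p values copied from the lm() summary in part (b), intercept excluded
p_values <- c(
  "Severity2" = 0.840451, "Severity3" = 0.072111,
  "MedicationB" = 2.18e-05, "MedicationC" = 2.18e-05,
  "MedicationD" = 0.687302, "MedicationE" = 0.001647,
  "MedicationF" = 0.001647, "MedicationG" = 0.687302,
  "MedicationH" = 9.78e-06, "MedicationI" = 0.000431,
  "Severity2:MedicationB" = 0.156478, "Severity3:MedicationB" = 0.024380,
  "Severity2:MedicationC" = 0.669441, "Severity3:MedicationC" = 0.775891,
  "Severity2:MedicationD" = 0.393798, "Severity3:MedicationD" = 0.256160,
  "Severity2:MedicationE" = 0.886797, "Severity3:MedicationE" = 0.569333,
  "Severity2:MedicationF" = 0.669441, "Severity3:MedicationF" = 0.477089,
  "Severity2:MedicationG" = 0.569333, "Severity3:MedicationG" = 0.669441,
  "Severity2:MedicationH" = 0.886797, "Severity3:MedicationH" = 0.066318,
  "Severity2:MedicationI" = 0.669441, "Severity3:MedicationI" = 0.886797
)

alpha_e   <- 0.05
threshold <- alpha_e / length(p_values)   # Bonferroni per-test level

# Coefficients whose raw p value falls below the per-test level
significant <- names(p_values)[p_values < threshold]

# Equivalent route: inflate the p values and compare with alpha_e
significant_adj <- names(which(p.adjust(p_values, method = "bonferroni") < alpha_e))

print(threshold)
print(significant)
```

Both routes give the same set, because multiplying each p value by the number of tests is equivalent to dividing the significance level by it.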
(d) Report the fitted linear model. You may either retain only the significant predictors, or
include all the predictors and specify which of them have a significant coefficient.
Answer:
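The fitted linear model, retaining only the coefficients that are significant after the
Bonferroni correction, is
$\hat{Y} = \hat\beta_0 + \hat\beta_3 X_3 + \hat\beta_4 X_4 + \hat\beta_6 X_6 + \hat\beta_7 X_7 + \hat\beta_9 X_9 + \hat\beta_{10} X_{10}$
With the estimates reported in part (b), this is
$\hat{Y} = 97.48 + 0.44 X_3 + 0.44 X_4 + 0.32 X_6 + 0.32 X_7 + 0.46 X_9 + 0.36 X_{10}$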

(e) Generate appropriate plots to check the normality and constant variance assumptions.

The normal Q-Q plot shows that the residuals are approximately normally distributed, with
observation 45 standing out as a possible outlier. In the residuals vs fitted plot the residuals
are scattered roughly evenly around the zero line, which is consistent with constant variance.
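
Plots of this kind come from the standard diagnostics for an lm object. The sketch below wraps them in a hypothetical helper, check_assumptions(), and runs it on a model fitted to R's built-in warpbreaks data as a stand-in, since the medication data are not reproduced here. The Shapiro-Wilk test is an optional extra that is not part of the original answer.

```r
# Hypothetical helper: residuals-vs-fitted and normal Q-Q plots side by side
check_assumptions <- function(fit) {
  old_par <- par(mfrow = c(1, 2))
  on.exit(par(old_par))
  plot(fit, which = 1)   # residuals vs fitted: look for even spread around 0
  plot(fit, which = 2)   # normal Q-Q: look for points close to the line
  invisible(shapiro.test(residuals(fit)))  # optional formal normality check
}

# Stand-in two-factor model on a built-in data set (not the medication data)
fit_demo <- lm(breaks ~ wool * tension, data = warpbreaks)
sw <- check_assumptions(fit_demo)
print(sw)

# For the homework model, assuming it was stored as `med.lm`:
# check_assumptions(med.lm)
```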
Problem 3: A quality control engineer is considering implementing a workshop to
instruct workers on the principles of total quality management (TQM). The program
would be quite expensive to implement across the whole corporation; hence the
engineer has designed a study to evaluate which of four types of workshops would be
most effective. The response variable will be the increase in productivity of the
worker after participating in the workshop. Since the effectiveness of the workshop
may depend on the worker's preconceived attitude concerning TQM, the workers are
given an examination to determine their attitude prior to taking the workshop. Their
attitudes are classified into five groups. There are four workers in each group, and
the type of workshop is randomly assigned to the workers within each group. The
increases in productivity are given in workshop.TXT. No workshop-attitude
combination is expected to have extra effect on productivity.

(1) (3 points) Identify the design of this experiment and briefly explain your
reasoning.
The design is a randomized complete block design. The four workshops are the levels of the
factor of interest, attitude (five levels) is the blocking factor, and the workshops are
randomly assigned to the workers within each attitude group.

(2) (3 points) Write down the ANOVA model, using $\alpha$ and $\beta$ to represent the two
variables.
$y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$
$\mu$: overall mean (reference)
$\alpha_i$: additional effect of the ith level of the factor of interest (workshop)
$\beta_j$: additional effect of the jth level of the block (attitude)
$\varepsilon_{ij}$: random error

(3) (13 points) Construct the ANOVA table with appropriate numbers (Can use
anova() in R to generate this table). Include in the table the p values for
relevant F statistics.

> work.lm <- lm(Productivity ~ Attitude + Workshop, data = work)
> anova(work.lm)
Analysis of Variance Table

Response: Productivity
          Df  Sum Sq Mean Sq F value    Pr(>F)
Attitude   1 2160.90 2160.90 47.6459 5.039e-06 ***
Workshop   3  922.55  307.52  6.7805  0.004142 **
Residuals 15  680.30   45.35
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
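
In the table above Attitude has 1 degree of freedom and the residuals have 15, which is the pattern lm() produces when a variable coded 1 to 5 enters the model as a number rather than as a factor. A five-level block would normally take 4 degrees of freedom and leave 12 for error. The sketch below illustrates the difference on made-up data: the Productivity values are simulated placeholders, not the workshop.TXT data, and the way Attitude is coded in that file is an assumption.

```r
set.seed(431)  # arbitrary seed so the toy example is reproducible

# Toy layout: 5 attitude groups x 4 workshops, one worker per cell
toy <- expand.grid(Attitude = 1:5, Workshop = c("A", "B", "C", "D"))
# Simulated response, for illustration only
toy$Productivity <- 50 + 5 * toy$Attitude + rnorm(nrow(toy), sd = 5)

# Attitude treated as a number: a single slope, 1 degree of freedom
fit_numeric <- lm(Productivity ~ Attitude + Workshop, data = toy)

# Attitude treated as a block with 5 levels: 4 degrees of freedom
fit_factor <- lm(Productivity ~ factor(Attitude) + Workshop, data = toy)

df_numeric <- anova(fit_numeric)$Df
df_factor  <- anova(fit_factor)$Df
print(df_numeric)
print(df_factor)
```

If the real data are coded the same way, refitting with factor(Attitude) would give the block its full 4 degrees of freedom; the F statistics and p values would then differ from those shown above.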
(4) (6 points) Generate a profile plot with appropriate labels and a caption.
Modify the R code for Question 2. Briefly describe the features in this profile
plot.

The plot shows five attitude levels across four workshops. The profiles are roughly parallel,
which indicates no interaction, except for the profiles of attitude 1 and attitude 2.
Productivity appears to increase from the first workshop to the last.

(5) (10 points; 2 points per question) Use a general linear model to estimate the
effect sizes of different workshops and attitudes. Break down your answer into
the following parts:
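(a) Write down the linear model before estimating the coefficients. Use $\beta$'s to
represent the coefficients. Use dummy variables to represent each level.
Specify the baseline.
The linear model is
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_5 + \beta_6 X_6 + \beta_7 X_7 + \varepsilon$
$X_1$ to $X_4$: indicators for attitudes 2 to 5 (1 if the worker is in that group, otherwise 0)
$X_5$ to $X_7$: indicators for workshops B to D (1 if the worker took that workshop, otherwise 0)
The baseline is attitude 1 with the first workshop (workshop A).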
(b) Report the coefficient estimates and their p values.

(c) Apply the Bonferroni correction and decide which coefficient estimates are
significant at $\alpha_E = 0.05$.
With 7 coefficients, the Bonferroni-corrected significance level is $0.05/7 = 0.00714$.
The intercept and the coefficients for attitude 3, attitude 4, attitude 5, workshop C, and
workshop D are significant at this level.

(d) Report the fitted linear model. You may either retain only the significant
predictors, or include all the predictors and specify which of them have a
significant coefficient.
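The fitted linear model with only the significant coefficients is
$\hat{Y} = \hat\beta_0 + \hat\beta_2 X_2 + \hat\beta_3 X_3 + \hat\beta_4 X_4 + \hat\beta_6 X_6 + \hat\beta_7 X_7$
$\hat{Y} = \hat\beta_0 + 5.0 X_2 + 10.25 X_3 + 32.25 X_4 + 7.8 X_6 + 18.2 X_7$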
(e) Generate appropriate plots to check the normality and constant variance
assumptions.
Problem 4: Exercise 16.18 (p 1037) A study was designed to evaluate whether socioeconomic
factors had an effect on verbalization skills of young children. Four socioeconomic
classes were defined and 20 children under the age of six were selected for the study. The
research hypothesis was that the mean verbalization skills would be different for the four
classes. The researchers determined that for young children there may be significant
gains in verbalization skills over only a few months. Thus, they decided to record the
exact age (in months) of each child. The verbalization skills (measured by testing) were
determined for each child. The data set is provided in verbal.TXT.

(1) (3 points) What is the experimental design? Why?


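The experimental design is a completely randomized design with one factor (socioeconomic
class, four levels) and one covariate (age in months). Age is recorded because verbalization
skills can change over a few months, so the analysis is a one-way ANOVA adjusted for a covariate.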

(2) (2 points) Generate a scatterplot of the data. Use different colors to distinguish
different socioeconomic classes (can use the R code provided with this assignment).
Provide an informative figure caption. Does the plot suggest a linear relationship in
each socioeconomic class?

Yes, the plot suggests a linear relationship in each socioeconomic class, although there
are some deviations from linearity in class 3.
(3) (3 points) Write down the general linear model for the data (NOT the fitted model).
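Use $\beta$'s to represent the coefficients. Use dummy variables to represent each level.
Specify the baseline. Define the math symbols.

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \beta_5 X_1 X_2 + \beta_6 X_1 X_3 + \beta_7 X_1 X_4 + \varepsilon$

$\beta_0$: intercept
$\beta_1, \dots, \beta_7$: coefficients
$X_1$: age (in months)
$X_2$: 1 if class 2, otherwise 0
$X_3$: 1 if class 3, otherwise 0
$X_4$: 1 if class 4, otherwise 0
$\varepsilon$: random error
The baseline is class 1 ($X_2 = X_3 = X_4 = 0$).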

(4) Use the computer output on p1038-1039 to answer the following questions. Note that
there are a few typos in the computer output:
on p1039 the first time Model III appears. This should be Model II.
also on p1039, X2 (CD) appears twice. It should be X2 (C1).
(a) (2 points) Test whether the lines across the socioeconomic classes are parallel at
significance level of $\alpha = 0.05$. Describe the null and alternative hypotheses. Provide the
observed test statistic, and p value.
$H_0: \beta_5 = \beta_6 = \beta_7 = 0$
$H_a$: at least one of them is non-zero
F test: $F = \frac{(\mathrm{SSE}_2 - \mathrm{SSE}_1)/(t-1)}{\mathrm{SSE}_1/(N-2t)} = \frac{(3316.83 - 3180.73)/(4-1)}{3180.73/(80 - 2 \cdot 4)} = 1.0269$
p-value from R: 0.386
$\alpha = 0.05$
Since the p-value is greater than the significance level of 0.05, we fail to reject the null
hypothesis. There is no evidence that the slopes differ, so the lines across the socioeconomic
classes can be treated as parallel at significance level $\alpha = 0.05$.

(b) (2 points) Are there significant differences in the mean verbalization scores for the four
groups at significance level of $\alpha = 0.05$? Describe the null and alternative hypotheses.
Provide the observed test statistic, and p value.
$H_0: \beta_2 = \beta_3 = \beta_4 = 0$
$H_a$: at least one of them is non-zero
F test: $F = \frac{(\mathrm{SSE}_3 - \mathrm{SSE}_2)/(t-1)}{\mathrm{SSE}_2/(N-t-1)} = \frac{(8724.79 - 3316.83)/(4-1)}{3316.83/(80-4-1)} = 40.762$
p-value from R: 9.817987e-16
$\alpha = 0.05$
Since the p value is less than the significance level of 0.05, we reject the null hypothesis and
conclude that there are significant differences in the mean verbalization scores for the four
groups at significance level $\alpha = 0.05$.
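
Both tests in parts (a) and (b) compare a reduced model with a fuller one through their error sums of squares. A minimal R sketch of that calculation is given below, using the SSE values quoted in the answers; the helper name partial_f() and the variable names are chosen here for illustration.

```r
# Hypothetical helper: F statistic and p value for a nested-model comparison
partial_f <- function(sse_reduced, sse_full, df_num, df_den) {
  f <- ((sse_reduced - sse_full) / df_num) / (sse_full / df_den)
  c(F = f, p = pf(f, df_num, df_den, lower.tail = FALSE))
}

n_obs     <- 80   # total number of children
n_classes <- 4    # socioeconomic classes

sse1 <- 3180.73   # Model I: separate slopes and intercepts
sse2 <- 3316.83   # Model II: common slope, separate intercepts
sse3 <- 8724.79   # Model III: common slope and intercept

# (a) parallel lines: Model II vs Model I
test_parallel <- partial_f(sse2, sse1, n_classes - 1, n_obs - 2 * n_classes)

# (b) class differences: Model III vs Model II
test_classes <- partial_f(sse3, sse2, n_classes - 1, n_obs - n_classes - 1)

print(test_parallel)
print(test_classes)
```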

(c) (3 points) What is the fitted linear model for each socioeconomic class? You may either
retain only the significant predictors, or include all the predictors and specify which of
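them have a significant coefficient. Use $\alpha_E = 0.05$.

The fitted linear model for each class is:

Class 1: $\hat{Y} = \hat\beta_0 + \hat\beta_1 X_1 = 37.197 + 0.274 X_1$
Class 2: $\hat{Y} = \hat\beta_0 + \hat\beta_2 + \hat\beta_1 X_1 = 37.197 + (-22.490) + 0.274 X_1$
Class 3: $\hat{Y} = \hat\beta_0 + \hat\beta_3 + \hat\beta_1 X_1 = 37.197 + (-15.95) + 0.274 X_1$
Class 4: $\hat{Y} = \hat\beta_0 + \hat\beta_4 + \hat\beta_1 X_1 = 37.197 + (-14.784) + 0.274 X_1$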

(d) (2 points) Estimate the mean verbalization score in each socioeconomic class.
The mean verbalization scores (from Model II), evaluated at $X_1 = 28.95$ months, are:

Class 2: (37.197 - 22.49) + (0.274 × 28.95) = 22.66
Class 3: (37.197 - 15.95) + (0.274 × 28.95) = 29.20
Class 4: (37.197 - 14.784) + (0.274 × 28.95) = 30.37
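
The class intercepts and the estimated means can be recomputed in R from the coefficients quoted in parts (c) and (d). The sketch below uses the rounded values as printed, so its results for classes 2 to 4 come out about 0.02 lower than the figures reported above; a likely explanation, offered here only as a guess, is that the reported figures used an unrounded slope. The same inputs also give a value for class 1, which is not listed above.

```r
# Coefficients as printed in part (c) (rounded)
b0    <- 37.197   # intercept, baseline class 1
b_age <- 0.274    # common slope for age

# Intercept shifts relative to class 1
shift <- c(class1 = 0, class2 = -22.490, class3 = -15.95, class4 = -14.784)

# Class-specific intercepts of the parallel lines
intercepts <- b0 + shift

# Estimated mean score in each class at the age used in part (d)
age0         <- 28.95
fitted_means <- intercepts + b_age * age0

print(round(intercepts, 3))
print(round(fitted_means, 2))
```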