
HOMEWORK 2 PART A

(DUE BY THE END OF CLASS ON SEP 28)

1. Linear regression with one independent variable (35 points)


The focal point of an agricultural research study was the relationship between
when a crop is planted and the amount of crop harvested. If a crop is planted too
early or too late, farmers may fail to obtain optimal yield and hence not make a
profit. An ideal date for planting is set by the researchers, and the farmers then
record the number of days either before or after the designated date. The data are in
the file crop.txt. The column Days is the deviation (in days) from the ideal planting
date, and the column Yield is the yield (in bushels per acre) of a wheat crop.
We can use the R function lm() to perform the regression. The R code for this
analysis is provided below:
# read in data (tab-delimited file with a header row)
crop <- read.delim("crop.txt", header = TRUE)
# print data
crop
# linear regression of Yield (response) on Days (predictor)
crop.lm <- lm(Yield ~ Days, data = crop)
# print summary of linear regression
summary(crop.lm)
The partial output of the regression analysis is as follows:
Call:
lm(formula = Yield ~ Days, data = crop)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  45.5176     1.1090  41.044  < 2e-16 ***
Days         -0.4029     0.1091  -3.693  0.00167 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.795 on 18 degrees of freedom
Multiple R-squared: 0.431
F-statistic: 13.64 on 1 and 18 DF,  p-value: 0.001666
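
As a rough sketch (not the full derivation from the raw data), several of the
printed values can be cross-checked against one another in R. The code uses only
numbers that appear in the output; the variable names are made up for
illustration. It relies on two identities that hold for simple linear regression:
the F statistic equals the square of the slope's t value, and R-squared equals
F / (F + residual df).

```r
# Values copied from the printed regression output
est_int  <- 45.5176   # intercept estimate
se_int   <- 1.1090    # intercept standard error
est_days <- -0.4029   # slope estimate for Days
se_days  <- 0.1091    # slope standard error
df_res   <- 18        # residual degrees of freedom

# t value = estimate / standard error
t_int  <- est_int / se_int
t_days <- est_days / se_days

# Two-sided p-value from the t distribution with df_res degrees of freedom
p_days <- 2 * pt(-abs(t_days), df = df_res)

# With one predictor, F = t^2 on (1, df_res) degrees of freedom
f_stat <- t_days^2
p_f    <- pf(f_stat, df1 = 1, df2 = df_res, lower.tail = FALSE)

# R-squared recovered from F and the residual degrees of freedom
r2 <- f_stat / (f_stat + df_res)

print(round(c(t_int = t_int, t_days = t_days, f_stat = f_stat, r2 = r2), 3))
print(signif(c(p_days = p_days, p_f = p_f), 4))
```

Because the inputs are rounded to four decimals, the results may differ from the
printed output in the last digit. The residual standard error and the coefficient
estimates themselves cannot be recovered this way; they require the data in
crop.txt.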

(1) (15 points; 1 per value) Reproduce all the numbers in this output. For each
value, first give the formula in mathematical symbols, and then the formula
with the numbers plugged in. For the p-value calculations, you may need the R
functions pt() and pf(). If R functions are used, include the R command lines.
(2) (5 points) Is there a significant linear relationship between Yield and Days?
Use the numbers given in the output to perform hypothesis testing, and
provide the five parts of hypothesis testing.
(3) (3 points) If the actual planting date is delayed by 5 days, what yield do you
expect? Also provide a 95% CI.
(4) (6 points; 3 per plot) Use the R function plot() to draw two plots: one is
the scatterplot of yield versus days, and the other the residuals versus yield.
Provide the plots in your answer. Describe briefly (no more than 3 sentences)
the pattern in each plot.
(5) (4 points) Use the R function qqnorm() to check the normality assumption.
Provide the plot, briefly describe the pattern, and explain whether the plot
supports or does not support the normality assumption.
(6) (2 points) Explain briefly what you think about the linear fit.
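
One possible way to organize the R commands for questions (3) to (5) is sketched
below. It is a starting point under stated assumptions, not a model answer: the
helper name fit_and_check is made up for illustration, crop.txt is assumed to be
tab-delimited with columns Days and Yield as described above, and the interval
shown is a confidence interval for the mean yield (use interval = "prediction"
if an interval for a single new field is intended).

```r
# Hypothetical helper (name invented here): fits Yield ~ Days and returns
# the fit together with the estimated yield at a given number of days.
fit_and_check <- function(crop, new_days = 5, make_plots = TRUE) {
  fit <- lm(Yield ~ Days, data = crop)

  # Expected yield at Days = new_days, with a 95% confidence interval
  # for the mean response
  pred <- predict(fit, newdata = data.frame(Days = new_days),
                  interval = "confidence", level = 0.95)

  if (make_plots) {
    # Scatterplot of yield versus days, with the fitted line added
    plot(crop$Days, crop$Yield, xlab = "Days", ylab = "Yield")
    abline(fit)

    # Residuals versus yield, as worded in question (4)
    plot(crop$Yield, resid(fit), xlab = "Yield", ylab = "Residuals")
    abline(h = 0, lty = 2)

    # Normal Q-Q plot of the residuals, with a reference line
    qqnorm(resid(fit))
    qqline(resid(fit))
  }

  list(fit = fit, prediction = pred)
}

# Runs only if the data file is present in the working directory
if (file.exists("crop.txt")) {
  crop <- read.delim("crop.txt", header = TRUE)
  out <- fit_and_check(crop)
  print(out$prediction)
}
```

The returned prediction is a one-row matrix with columns fit, lwr and upr, which
are the point estimate and the lower and upper interval limits.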
