STQS 2233/ STQS 2234: STATISTICAL MODELLING

ASSIGNMENT 1

1. Consider the data below.


i    1   2   3   4   5   6   7
X    5   7   4   8   6   3   9
Y   13  20  10  22  15   8  25

$\sum_{i=1}^{7} x_i y_i = 760$, $\sum_{i=1}^{7} x_i = 42$, $\sum_{i=1}^{7} y_i = 113$,
$\sum_{i=1}^{7} x_i^2 = 280$, $\sum_{i=1}^{7} y_i^2 = 2067$, $SSE = 2.714$

From the above data,

a) Determine the least squares estimates of the intercept, $\beta_0$, and the slope, $\beta_1$.
b) Determine the least squares estimate of the residual variance, $\hat{\sigma}^2$.
c) Test $\beta_0$ at the 90% confidence level, and state the fitted regression model.
d) What assumptions must hold for the fitted model to be valid?
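
As a check on hand calculations for parts (a) to (c), the summary statistics above can be plugged into the usual simple linear regression formulas. The following is a minimal sketch; all variable names are illustrative assumptions, and the critical value for the test in part (c) still has to be read from a t table with n - 2 = 5 degrees of freedom.

```python
import math

# Summary statistics quoted in Question 1 (n = 7 observations).
n = 7
sum_x, sum_y = 42, 113
sum_xy, sum_x2, sum_y2 = 760, 280, 2067
sse = 2.714  # given in the question

# Corrected sums of squares and cross-products.
s_xx = sum_x2 - sum_x**2 / n
s_xy = sum_xy - sum_x * sum_y / n
s_yy = sum_y2 - sum_y**2 / n

# Least squares estimates of the slope and the intercept.
b1 = s_xy / s_xx
b0 = sum_y / n - b1 * sum_x / n

# Cross-check of the quoted SSE using SSE = S_yy - b1 * S_xy.
sse_check = s_yy - b1 * s_xy

# Residual variance estimate with n - 2 degrees of freedom.
sigma2_hat = sse / (n - 2)

# Standard error and t statistic for the intercept (part c).
se_b0 = math.sqrt(sigma2_hat * (1 / n + (sum_x / n) ** 2 / s_xx))
t_b0 = b0 / se_b0

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"SSE check = {sse_check:.3f}, sigma^2 hat = {sigma2_hat:.4f}")
print(f"t statistic for b0 = {t_b0:.3f}")
```

The cross-check reproduces the quoted SSE to three decimals, which suggests the summary sums are consistent with the data table.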

2. It is widely believed that students who spend more hours studying for an examination will
get a higher score (%). The data below show the scores of 15 randomly selected students and
the time they spent studying before a particular examination.

i 1 2 3 4 5 6 7 8
Score 42 44 51 48 51 54 57 54
Hours 0 0 1 1 2 2 3 3

i 9 10 11 12 13 14 15
Score 57 63 61 69 70 70 70
Hours 4 4 5 5 6 6 7

$\sum X_i^2 = 231$, $\sum X_i = 49$, $\sum X_i Y_i = 3102$,
$\sum Y_i^2 = 50687$, $\sum Y_i = 861$, $MSE = 6.5$

Assuming that a simple linear regression model fits the data, answer the following questions.
a) What is the estimated change in the score for each additional hour of studying time?
b) What is the estimated regression function?
c) On the average, what is the score of a student who did not study at all for the
examination?
d) Obtain the 95% confidence interval for the score of this student. If you are going to sit for
the same examination, are you confident that you can obtain at least 50% if you don’t
study for it?
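
Parts (a) to (d) can be checked numerically from the summary statistics. The sketch below is illustrative only: the function name `interval_at` and the constant `T_CRIT_13` are assumptions, and the critical value 2.160 is the upper 2.5% point of the t distribution with 13 degrees of freedom taken from a standard t table. Because "the score of this student" can be read either as a mean response or as a single new observation, the function offers both intervals.

```python
import math

# Summary statistics quoted in Question 2 (n = 15 students).
n = 15
sum_x, sum_y = 49, 861
sum_xy, sum_x2 = 3102, 231
mse = 6.5  # given in the question

s_xx = sum_x2 - sum_x**2 / n
s_xy = sum_xy - sum_x * sum_y / n
x_bar, y_bar = sum_x / n, sum_y / n

b1 = s_xy / s_xx          # estimated change in score per extra hour
b0 = y_bar - b1 * x_bar   # estimated mean score at zero hours


def interval_at(x0, t_crit, individual=False):
    """Interval for the response at x0.

    individual=False: confidence interval for the mean response.
    individual=True: prediction interval for a single new student.
    """
    extra = 1.0 if individual else 0.0
    se = math.sqrt(mse * (extra + 1 / n + (x0 - x_bar) ** 2 / s_xx))
    centre = b0 + b1 * x0
    return centre - t_crit * se, centre + t_crit * se


# Assumed table value: t(0.025; 13 degrees of freedom).
T_CRIT_13 = 2.160

print(f"fitted line: y = {b0:.3f} + {b1:.3f} x")
print("95% CI for mean score at 0 hours:", interval_at(0, T_CRIT_13))
print("95% PI for one student at 0 hours:", interval_at(0, T_CRIT_13, individual=True))
```

Comparing the lower and upper limits with 50% then answers the last part of (d) under whichever interpretation is intended.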

3. The table below presents data for 24 houses sold in Erie, Pennsylvania.

No.  Sale price of the house/1000, y  Taxes (local, school, county)/1000, x
1 25.9 4.9176
2 29.5 5.0208
3 27.9 4.5429
4 25.9 4.5573
5 29.9 5.0597
6 29.9 3.8910
7 30.9 5.8980
8 28.9 5.6039
9 35.9 5.8282
10 31.5 5.3003
11 31.0 6.2712
12 30.9 5.9592
13 30.0 5.0500
14 36.9 8.2464
15 41.9 6.6969
16 40.5 7.7841
17 43.9 9.0384
18 37.5 5.9894
19 37.9 7.5422
20 44.5 8.7951
21 37.9 6.0831
22 38.9 8.3607
23 36.9 8.1400
24 45.8 9.1416
(Source: “Prediction, Linear Regression and Minimum Sum of Relative Errors,” by S.C. Narula
and J.F. Wellington, Technometrics, 19, 1977. Also see “Letter to the Editor,” Technometrics,
22, 1980.)

a. Fit a simple linear regression model relating selling price of the house to the current
taxes.
b. Test for significance of regression.
c. What percent of the total variability in selling price is explained by this model?
d. Find a 95% confidence interval on $\beta_1$.
e. Find a 95% CI on the mean selling price of a house for which the current taxes are $750.
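
For parts (a) to (c), the fit can be computed directly from the raw data in the table. The sketch below uses only the standard formulas for simple linear regression; the function name `fit_simple_linear` and the dictionary keys are illustrative assumptions. The F statistic it returns has to be compared with a tabulated F(1, 22) critical value to complete part (b).

```python
import math

# Data from the table in Question 3 (both variables in thousands of dollars).
y = [25.9, 29.5, 27.9, 25.9, 29.9, 29.9, 30.9, 28.9, 35.9, 31.5, 31.0, 30.9,
     30.0, 36.9, 41.9, 40.5, 43.9, 37.5, 37.9, 44.5, 37.9, 38.9, 36.9, 45.8]
x = [4.9176, 5.0208, 4.5429, 4.5573, 5.0597, 3.8910, 5.8980, 5.6039,
     5.8282, 5.3003, 6.2712, 5.9592, 5.0500, 8.2464, 6.6969, 7.7841,
     9.0384, 5.9894, 7.5422, 8.7951, 6.0831, 8.3607, 8.1400, 9.1416]


def fit_simple_linear(x, y):
    """Least squares fit of y = b0 + b1 * x with basic ANOVA quantities."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)

    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    ssr = b1 * s_xy            # regression sum of squares
    sse = s_yy - ssr           # residual sum of squares
    mse = sse / (n - 2)
    return {
        "b0": b0,
        "b1": b1,
        "mse": mse,
        "se_b1": math.sqrt(mse / s_xx),  # standard error of the slope
        "f_stat": ssr / mse,             # test for significance of regression
        "r2": ssr / s_yy,                # share of variability explained
    }


fit = fit_simple_linear(x, y)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

The value `se_b1` is the ingredient needed for the interval in part (d), together with a t critical value on 22 degrees of freedom.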
4. Consider the data shown below. Construct a scatter diagram and suggest an appropriate
form for the regression model. Fit this model to the data and conduct the standard tests of
model adequacy.

x 10 15 18 12 9 8 11 6
y 0.17 0.13 0.09 0.15 0.2 0.21 0.18 0.24

5. A manager of a company wants to study the relationship between the size of a crystal
determined by its weight (gm) and the number of days it takes for the crystal to grow to its
final size. The data is listed below.

i           1     2     3     4     5     6     7     8     9     10     11     12     13     14
Weight (y)  0.08  1.12  4.43  4.98  4.92  7.18  5.57  8.40  8.81  10.81  11.16  10.12  13.12  15.04
Time (x)    2     2     6     8     10    12    14    16    18    20     22     24     26     28

Assume that a linear regression model through the origin is appropriate.

a. Estimate the regression function.


b. Estimate $\beta_1$ with a 95% CI. Interpret your interval.
c. Plot the fitted regression line and the data. Does the linear regression function appear to
be a good fit?
d. Obtain the residuals, $e_i$. Do they sum to zero? Plot the residuals against the fitted values,
$\hat{y}_i$. What conclusion can be drawn from your plot?
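
For a model through the origin, the slope estimate is $b_1 = \sum x_i y_i / \sum x_i^2$ and the error variance is estimated on n - 1 degrees of freedom. The sketch below applies these formulas to the data exactly as printed in the table; the names `days`, `weight` and `T_CRIT_13` are illustrative assumptions, and 2.160 is the tabulated upper 2.5% point of t with 13 degrees of freedom.

```python
import math

# Data as printed in Question 5.
weight = [0.08, 1.12, 4.43, 4.98, 4.92, 7.18, 5.57,
          8.40, 8.81, 10.81, 11.16, 10.12, 13.12, 15.04]
days = [2, 2, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28]

n = len(days)
sum_x2 = sum(x * x for x in days)
sum_xy = sum(x * y for x, y in zip(days, weight))

# Slope of the regression line through the origin.
b1 = sum_xy / sum_x2

fitted = [b1 * x for x in days]
residuals = [y - f for y, f in zip(weight, fitted)]

# Only one parameter is estimated, so the residual df is n - 1.
mse = sum(e * e for e in residuals) / (n - 1)
se_b1 = math.sqrt(mse / sum_x2)

# Assumed table value: t(0.025; 13 degrees of freedom).
T_CRIT_13 = 2.160
ci_b1 = (b1 - T_CRIT_13 * se_b1, b1 + T_CRIT_13 * se_b1)

print(f"b1 = {b1:.4f}, 95% CI = ({ci_b1[0]:.4f}, {ci_b1[1]:.4f})")
print(f"sum of residuals = {sum(residuals):.4f}")
print(f"sum of x_i * e_i = {sum(x * e for x, e in zip(days, residuals)):.2e}")
```

The last two printed lines bear on part (d): the normal equation for this model forces $\sum x_i e_i = 0$, but nothing forces the residuals themselves to sum to zero.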

6. An accountant wishes to predict direct labor cost (y) on the basis of the batch size (x) of a
product produced in a job shop. Data for 12 production runs are given below:

Direct labor cost, y (× $100) 71 663 381 138 861 145 493 548 251 1024 435 772

Batch size, x 5 62 35 12 83 14 46 52 23 100 41 75

a. Construct a scatter plot of y versus x. Discuss whether the plot suggests that a simple
linear regression model is appropriate.
b. Explain the meaning of $\mu_{y|x=60} = \beta_0 + \beta_1(60)$.
c. Explain the meaning of the slope parameter $\beta_1$ and the intercept $\beta_0$.
d. Verify that $b_0 = 18.4875$ and $b_1 = 10.14626$ by using the formulas.
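
The verification in part (d) can be cross-checked numerically. The following minimal sketch applies the least squares formulas to the twelve production runs; the variable names are illustrative assumptions.

```python
# Data from Question 6: direct labor cost (in $100) and batch size.
cost = [71, 663, 381, 138, 861, 145, 493, 548, 251, 1024, 435, 772]
batch = [5, 62, 35, 12, 83, 14, 46, 52, 23, 100, 41, 75]

n = len(batch)
sum_x = sum(batch)
sum_y = sum(cost)
sum_x2 = sum(x * x for x in batch)
sum_xy = sum(x * y for x, y in zip(batch, cost))

# Corrected sums of squares and cross-products.
s_xx = sum_x2 - sum_x**2 / n
s_xy = sum_xy - sum_x * sum_y / n

# Least squares estimates of the slope and the intercept.
b1 = s_xy / s_xx
b0 = sum_y / n - b1 * sum_x / n

# Point estimate of the mean labor cost at batch size 60 (part b).
mean_at_60 = b0 + b1 * 60

print(f"b0 = {b0:.4f}, b1 = {b1:.5f}")
print(f"estimated mean cost at x = 60: {mean_at_60:.2f} (x $100)")
```

The printed estimates should agree with the values quoted in part (d) to the stated number of decimals.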