(SEMESTER 2) 2010
S = 279.892
R-Sq = 95.9%
R-Sq(adj) = 92.6%
(b) Full estimated regression equation:
Y = 518.1 + 186.74 X1 + 42.14 X2 + 97.71 X3 - 40.85 X4 + ε, ε ~ N(0, 279.892²)

(c) Significance of the predictor variables:
Variable X4 (Morale) should be removed first, since it has the largest p-value. Its p-value of 0.417 (41.7%) is greater than 0.10 (10%), so we have no evidence against the null hypothesis H0: β4 = 0. We therefore retain H0 and conclude that β4 is the least significant coefficient in the model: Morale is not an important predictor of production. We remove X4 (Morale) from the model and re-fit it using only Shifts, Bonus and Overtime.

Minitab output:
Regression Analysis: Production versus Shifts, Bonus, Overtime
The regression equation is
Production = 250 + 201 Shifts + 36.5 Bonus + 101 Overtime

Predictor   Coef      SE Coef   T       P
Constant    250.1     318.4     0.79    0.462
Shifts      201.40    46.05     4.37    0.005
Bonus       36.49     13.26     2.75    0.033
Overtime    100.986   9.569     10.55   0.000

S = 274.725   R-Sq = 95.2%   R-Sq(adj) = 92.8%
Variables X1, X2 and X3 all have p-values of less than 0.05, so we reject the hypotheses H0: β1 = 0, H0: β2 = 0 and H0: β3 = 0. Thus, the number of shifts worked, the bonus rates paid, and the average hours of overtime are important predictors of production.
Final regression model: Y = 250.1 + 201.40 X1 + 36.49 X2 + 100.986 X3 + ε, ε ~ N(0, 274.725²)
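As a quick check on the reduced model, each T value in the Minitab output should equal Coef divided by SE Coef. The sketch below recomputes them and compares each against 2.447, the two-sided 5% critical value of the t distribution with 6 degrees of freedom. That critical value is taken from standard tables and is an assumption added here, not part of the original output.

```python
# Coefficients and standard errors copied from the reduced-model Minitab output.
fit = {
    "Constant": (250.1, 318.4),
    "Shifts": (201.40, 46.05),
    "Bonus": (36.49, 13.26),
    "Overtime": (100.986, 9.569),
}

# Assumption: two-sided 5% critical value of t with 6 residual degrees of
# freedom, taken from standard t tables (not from the Minitab output).
T_CRIT = 2.447


def t_statistics(table):
    """Return T = Coef / SE Coef for every row of a coefficient table."""
    return {name: coef / se for name, (coef, se) in table.items()}


t_stats = t_statistics(fit)
significant = {name: abs(t) > T_CRIT for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name:<9} T = {t:6.2f}  significant at 5%: {significant[name]}")
```

Only the constant falls below the critical value, which agrees with the p-values reported for Shifts, Bonus and Overtime.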
(d) The R² statistic: R² = 95.2%
R² has fallen only slightly, from 95.9% to 95.2%, after removing one predictor variable from the full regression model. 95.2% of the variation in production is explained by the variation in the number of shifts worked, the bonus rates paid, and the average hours of overtime.
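The reported R-Sq and R-Sq(adj) values can be reproduced from the sums of squares in the Minitab output for the reduced model. The sketch below uses the usual adjusted R² formula, 1 - (1 - R²)(n - 1)/(n - p - 1), with n = 10 inferred from the total of 9 degrees of freedom. The function name is illustrative only.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictor variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


N = 10  # total DF is 9 in the ANOVA table, so there are 10 observations

# Sums of squares from the reduced-model ANOVA table.
SS_REGRESSION = 9036966
SS_TOTAL = 9489810

r2_reduced = SS_REGRESSION / SS_TOTAL
adj_reduced = adjusted_r2(r2_reduced, N, 3)  # Shifts, Bonus, Overtime
adj_full = adjusted_r2(0.959, N, 4)          # full model, R-Sq = 95.9%

print(f"Reduced model: R-Sq = {r2_reduced:.1%}, R-Sq(adj) = {adj_reduced:.1%}")
print(f"Full model:    R-Sq = 95.9%, R-Sq(adj) = {adj_full:.1%}")
```

Although R² drops slightly when Morale is removed, the adjusted R² rises from 92.6% to 92.8%, which supports keeping the smaller model.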
Analysis of Variance

Source           DF   SS        MS        F       P
Regression       3    9036966   3012322   39.91   0.000
Residual Error   6    452844    75474
Total            9    9489810

Source     DF
Shifts     1
Bonus      1
Overtime   1
An overall test of the model:
H0: β1 = β2 = β3 = 0
H1: At least one of β1, β2 and β3 is not zero
F = 39.91
The p-value is 0.000, which is less than 0.01 (1%), so we have strong evidence against H0. We reject H0 in favour of H1: some, or all, of the predictor variables used in the fit are useful in predicting production, and hence this model is useful for prediction.
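The F statistic in the overall test is the ratio of the regression mean square to the residual mean square. A minimal sketch using the ANOVA figures above (variable names are illustrative):

```python
# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 9036966, 3
ss_error, df_error = 452844, 6

ms_regression = ss_regression / df_regression  # mean square for regression
ms_error = ss_error / df_error                 # mean square for residual error

f_stat = ms_regression / ms_error  # overall F statistic
s = ms_error ** 0.5                # residual standard deviation S

print(f"MS(Regression) = {ms_regression:.0f}")
print(f"MS(Error)      = {ms_error:.0f}")
print(f"F = {f_stat:.2f}, S = {s:.3f}")
```

The square root of the residual mean square also recovers the value of S reported for the reduced model.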
(f) If X1 = 6, X2 = 22 and X3 = 15:
Y = 250.1 + 201.40 X1 + 36.49 X2 + 100.986 X3
Y = 250.1 + (201.40 × 6) + (36.49 × 22) + (100.986 × 15)
Y = 3776.07
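The prediction in (f) is a direct substitution into the final regression equation. The sketch below wraps it in a small function so other settings can be tried. The function and argument names are hypothetical.

```python
# Coefficients of the final (reduced) regression model.
INTERCEPT = 250.1
B_SHIFTS = 201.40
B_BONUS = 36.49
B_OVERTIME = 100.986


def predict_production(shifts, bonus, overtime):
    """Point prediction of production from the fitted regression equation."""
    return INTERCEPT + B_SHIFTS * shifts + B_BONUS * bonus + B_OVERTIME * overtime


prediction = predict_production(shifts=6, bonus=22, overtime=15)
print(f"Predicted production: {prediction:.2f}")
```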
(b) P-value = 0.031. Since the p-value is smaller than 0.05, we have moderate evidence against the null hypothesis H0, and so we reject H0 in favour of the alternative H1: it appears that the number of hours worked is an important predictor of obesity.
(c) X = 40: P(obese | X = 40) = 0.79793. There is a 0.79793 chance of a worker being obese if he usually works a 40 hour week.
5. Pennie Enterprises (a) Scatterplot of average weekly productivity against the length of service
[Figure: Scatterplot of Productivity against Length of Service; x-axis: Length of Service (months)]
Comment: There is a non-linear relationship between productivity and length of service. The relationship is negative and curved rather than a straight line. (b) Minitab output:
Regression Analysis: Productivity versus Service, ServiceSquared
The regression equation is
Productivity = 398 - 7.31 Service + 0.0414 ServiceSquared

Predictor        Coef       SE Coef    T       P
Constant         398.49     19.59      20.34   0.000
Service          -7.3072    0.9099     -8.03   0.000
ServiceSquared   0.041411   0.007696   5.38    0.000

S = 29.6329   R-Sq = 93.7%   R-Sq(adj) = 92.6%
(c) (i) Both X = Service and X² = Service² are important predictor variables in the model, since both β1 and β2 have small p-values, and so we would reject the null hypotheses H0: β1 = 0 and H0: β2 = 0. Therefore, X and X² are both significant predictors of Y (productivity).
(ii) R² = 93.7%. 93.7% of the variation in productivity (Y) is explained by the variation in the length of service (X) and service² (X²).
(d) Histogram of residuals
[Figure: Histogram of residuals; x-axis: Residual; y-axis: Frequency]
[Figure: Plot of residuals (response is Productivity); y-axis: Residual]
Comment: The quadratic regression model is only valid if the random term is Normally distributed with mean zero and constant variance σ². The histogram of residuals does not peak at about zero, shows irregular peaks around its highest point, and is not bell-shaped, so the assumption of Normality does not seem plausible. The plot of residuals, however, shows no pattern, just a random scatter of points, so the assumption that the residuals have constant variance is plausible. Since the assumptions regarding the residuals are not all adequately verified, the quadratic regression model is invalid.
(e) A Scatterplot of average weekly productivity (Y) against length of service (X) with fitted quadratic regression line
[Figure: Fitted Line Plot; x-axis: Length of Service (months)]
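The fitted quadratic from part (b) can be evaluated directly to see the shape of the curve drawn in the fitted line plot. The sketch below uses the full-precision coefficients from the Minitab output. The turning point at -b1 / (2 b2) is a standard property of a quadratic and is an addition here, not part of the original answer.

```python
# Coefficients from the Minitab output for the quadratic model.
B0 = 398.49     # Constant
B1 = -7.3072    # Service
B2 = 0.041411   # ServiceSquared


def fitted_productivity(service_months):
    """Fitted average weekly productivity for a given length of service."""
    return B0 + B1 * service_months + B2 * service_months ** 2


# Length of service at which the fitted curve stops decreasing.
turning_point = -B1 / (2 * B2)

for months in (0, 20, 40, 60, 80, 100, 120):
    print(f"{months:>3} months: fitted productivity = {fitted_productivity(months):7.2f}")
print(f"Fitted curve reaches its minimum at about {turning_point:.1f} months")
```

Under this fit, productivity falls steeply at first and flattens out towards the upper end of the observed range of service, which is consistent with the curvature noted in part (a).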
[Figure: Time series plot of Revenue* by quarter, Q3 2001 to Q3 2009]
Comment: There is an obvious positive trend, with clear and regular seasonal variation. Revenue peaks in the fourth quarter (October to December) of every year. There is no obvious outlier.
R-Sq = 100.0%
R-Sq(adj) = 100.0%
Regression equation: Y* = 50.712 + 3.481T + ε, ε ~ N(0, 0.244117²)
Significance of the slope: Yes, the slope term β1 is significant. Its p-value of 0.000 is less than 0.05, so we reject the null hypothesis H0: β1 = 0 in favour of the alternative hypothesis H1: β1 ≠ 0.
4.
[Figure: Time Series Plot of Revenue, MA, Trend; x-axis: Quarter and Year, Q3 2000 to Q3 2009; y-axis: Data]
[Figure: Time series plot of Residuals; x-axis: Quarter and Year, Q3 2000 to Q3 2009; y-axis: Residuals]
7. Partial Autocorrelation
[Figure: Partial Autocorrelation Function for Residuals (with 5% significance limits for the partial autocorrelations); x-axis: Lag, 1 to 10; y-axis: Partial Autocorrelation]
8. AR(1)
12. Forecast revenue in April-June 2010 (time-point 40)
Y* = 50.712 + 3.481T + ε, ε ~ N(0, 0.244117²)
i. Estimate of the trend: 50.712 + (3.481 × 40) = 189.952
ii. Full forecast: 0.017677 + 189.952 + (-26.3280) = 163.642, or about 163,642 Australian dollars
iii. 95% Confidence Interval for forecasted revenue in April-June 2010
Lower: -0.478117 + 189.952 + (-26.3280) = 163.146
Upper: 0.513472 + 189.952 + (-26.3280) = 164.137
i.e. about (163,146, 164,137) Australian dollars.
13. Forecast revenue in July-September 2010 (time-point 41)
Y* = 50.712 + 3.481T + ε, ε ~ N(0, 0.244117²)
i. Estimate of the trend: 50.712 + (3.481 × 41) = 193.433
ii. Full forecast: 0.025440 + 193.433 + (-28.2082) = 165.250, or about 165,250 Australian dollars
iii. 95% Confidence Interval for forecasted revenue in July-September 2010
Lower: -0.546370 + 193.433 + (-28.2082) = 164.678
Upper: 0.597250 + 193.433 + (-28.2082) = 165.822
i.e. about (164,678, 165,822) Australian dollars.
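Both forecasts are built the same way: a small residual term, plus the trend value at the time-point, plus a seasonal term. The sketch below reproduces the arithmetic for time-points 40 and 41. Reading the first term as the forecast from the AR(1) model of the residuals and the last term as the seasonal component is an assumption based on the surrounding questions, and the names used are illustrative.

```python
def trend(t):
    """Trend value from the fitted regression Y* = 50.712 + 3.481 T."""
    return 50.712 + 3.481 * t


def full_forecast(residual_term, t, seasonal_term):
    """Residual term + trend + seasonal term, as in the worked answers."""
    return residual_term + trend(t) + seasonal_term


# Values copied from the worked answers. "point", "lower" and "upper" are the
# residual terms (assumed to come from the AR(1) fit); "seasonal" is the
# seasonal term for that quarter.
inputs = {
    40: {"point": 0.017677, "lower": -0.478117, "upper": 0.513472, "seasonal": -26.3280},
    41: {"point": 0.025440, "lower": -0.546370, "upper": 0.597250, "seasonal": -28.2082},
}

results = {}
for t, v in inputs.items():
    results[t] = {
        key: round(full_forecast(v[key], t, v["seasonal"]), 3)
        for key in ("point", "lower", "upper")
    }
    print(f"T = {t}: trend = {trend(t):.3f}, forecast = {results[t]['point']:.3f}, "
          f"interval = ({results[t]['lower']:.3f}, {results[t]['upper']:.3f})")
```

Because the trend and seasonal terms are treated as fixed, the width of each interval comes entirely from the residual term.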
14.
[Figure: Histogram of residuals (response is Residuals); x-axis: Residual; y-axis: Frequency]
[Figure: Residuals Versus Fits; x-axis: Fitted Value; y-axis: Residual]
Comment: The histogram of residuals does not appear to be normally distributed. It peaks at roughly zero, but it does not tail off smoothly on both sides. Hence, the assumption of Normality is not plausible. The plot of residuals against fitted values, however, shows no pattern, just a random scatter of points, so the assumption that the residuals have constant variance is plausible. Since the assumptions regarding the residuals are not all adequately verified, the fitted model is invalid.
[Figure: X-bar chart; centre line X-double-bar = 5, LCL = 4.8994]
Comment: It appears that chance variation occurs at samples 6 and 7: these samples fall below the lower control limit, but the samples that follow return within the control limits. However, samples 21, 22, 23 and 24 appear to show assignable variation, as four consecutive samples are out of control; a level shift seems to occur here.
4.
X-bar Chart for Dulux Paint tin-filling Process
[Figure: X-bar chart (2-Sigma Control Limits); +2SL = 5.0671, centre line X-double-bar = 5, -2SL = 4.9329]
Comment: The centre line (mean = 5) is the same in the X-bar charts with 3-sigma and with 2-sigma control limits. However, the lower and upper control limits have moved closer to the centre line, so the band between the control limits is narrower than with 3-sigma control limits. The 2-sigma control limits cause more samples to appear out of control, with a greater chance of false alarms.
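The effect of narrowing the limits can be checked numerically. The sketch below backs out the standard deviation of the sample mean from the 2-sigma limit on the chart, rebuilds the 2-sigma and 3-sigma limits from it, and computes the probability that an in-control sample mean falls outside each pair of limits. The false-alarm calculation assumes the sample means are Normally distributed, which is an assumption added here, and the function names are illustrative.

```python
import math

CENTRE = 5.0        # centre line of the X-bar chart
UPPER_2SL = 5.0671  # +2SL read from the 2-sigma chart

# Standard deviation of the sample mean implied by the 2-sigma limit.
sigma_xbar = (UPPER_2SL - CENTRE) / 2


def control_limits(centre, sigma, k):
    """Lower and upper k-sigma control limits around the centre line."""
    return centre - k * sigma, centre + k * sigma


def false_alarm_rate(k):
    """P(|Z| > k) for a standard Normal Z: chance of a false alarm per sample."""
    return math.erfc(k / math.sqrt(2))


lcl2, ucl2 = control_limits(CENTRE, sigma_xbar, 2)
lcl3, ucl3 = control_limits(CENTRE, sigma_xbar, 3)

print(f"2-sigma limits: ({lcl2:.4f}, {ucl2:.4f}), false alarm rate = {false_alarm_rate(2):.2%}")
print(f"3-sigma limits: ({lcl3:.4f}, {ucl3:.4f}), false alarm rate = {false_alarm_rate(3):.2%}")
```

The rebuilt 3-sigma lower limit agrees with the LCL of 4.8994 on the earlier chart up to rounding, and the false-alarm rate per sample is far higher with 2-sigma limits than with 3-sigma limits.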
8.
X-bar Chart for Dulux Paint tin-filling Process
[Figure: X-bar chart; UCL = 5.0817, centre line X-double-bar = 4.9613, LCL = 4.8408]
9. The X-bar chart using X-double-bar and S shows a much higher UCL and a much lower LCL than when we assumed μ and σ were known, and it does not show the chance variation which appeared in the X-bar chart with known μ and σ. The X-bar chart using X-double-bar and S is more reliable because it uses estimates from the data; it is unrealistic to assume that we know μ and σ.