Statistics Practical Solutions - Correlations and Regression

ACE2013: STATISTICS FOR MARKETING AND MANAGEMENT
(SEMESTER 2) 2010
NAME: ATIQAH ISMAIL BSC MARKETING (N500)
PRACTICAL 1: CORRELATION AND REGRESSION

3. Atkins Service Corporation (a) Multiple linear regression analysis Minitab output:
Regression Analysis: Production versus Shifts, Bonus, Overtime, Morale
The regression equation is
Production = 518 + 187 Shifts + 42.1 Bonus + 97.7 Overtime - 40.8 Morale

Predictor    Coef  SE Coef      T      P
Constant    518.1    444.2   1.17  0.296
Shifts     186.74    49.76   3.75  0.013
Bonus       42.14    14.95   2.82  0.037
Overtime    97.71    10.43   9.37  0.000
Morale     -40.85    46.23  -0.88  0.417

S = 279.892
R-Sq = 95.9%
R-Sq(adj) = 92.6%
(b) Full estimated regression equation:
Y = 518.1 + 186.74 X1 + 42.14 X2 + 97.71 X3 - 40.85 X4 + ε, ε ~ N(0, 279.892²)
where X1 = Shifts, X2 = Bonus, X3 = Overtime and X4 = Morale.
(c) Significance of the predictor variables: Variable X4 (Morale) has the largest p-value. Its p-value of 0.417 (or 41.7%) is greater than 0.10 (or 10%), so we have no evidence against the null hypothesis H0: β4 = 0. We therefore retain H0 and conclude that β4 is the least significant coefficient in the model: Morale is not an important predictor of production. We remove X4 (Morale) from the model and re-fit it using only Shifts, Bonus and Overtime. Minitab output:
Regression Analysis: Production versus Shifts, Bonus, Overtime
The regression equation is
Production = 250 + 201 Shifts + 36.5 Bonus + 101 Overtime

Predictor     Coef  SE Coef      T      P
Constant     250.1    318.4   0.79  0.462
Shifts      201.40    46.05   4.37  0.005
Bonus        36.49    13.26   2.75  0.033
Overtime   100.986    9.569  10.55  0.000

S = 274.725
R-Sq = 95.2%
R-Sq(adj) = 92.8%
Variables X1, X2 and X3 all have p-values of less than 0.05, so we reject each of the hypotheses
H0: β1 = 0, H0: β2 = 0 and H0: β3 = 0.
Thus, the number of shifts worked, the bonus rates paid and the average hours of overtime are important predictors of production. Final regression model:
Y = 250.1 + 201.40 X1 + 36.49 X2 + 100.986 X3 + ε, ε ~ N(0, 274.725²)
(d) The R² statistic: R² = 95.2%. R² has fallen only slightly, from 95.9% to 95.2%, after removing one predictor variable from the full regression model, while R²(adj) has risen from 92.6% to 92.8%. 95.2% of the variation in production is explained by the variation in the number of shifts worked, the bonus rates paid and the average hours of overtime.
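The R-Sq and R-Sq(adj) figures quoted in (d) can be checked by hand. The sketch below is only an illustration: it takes the sums of squares from the Analysis of Variance table in part (e), assumes n = 10 observations (because Total DF = 9), and uses the standard adjusted R² formula. The function and variable names are my own and are not part of the Minitab output.

```python
# Values copied from the Minitab ANOVA table for the reduced model
# (Shifts, Bonus, Overtime). n = 10 is inferred from Total DF = 9.
SS_REGRESSION = 9036966
SS_TOTAL = 9489810
N_OBSERVATIONS = 10


def adjusted_r_squared(r_squared: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), with p predictors."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Reduced model: three predictors.
r_sq_reduced = SS_REGRESSION / SS_TOTAL
r_sq_adj_reduced = adjusted_r_squared(r_sq_reduced, N_OBSERVATIONS, p=3)

# Full model: four predictors, using the rounded R-Sq of 95.9% from the output,
# so the result is only approximate.
r_sq_adj_full = adjusted_r_squared(0.959, N_OBSERVATIONS, p=4)

print(f"Reduced model: R-Sq = {r_sq_reduced:.1%}, R-Sq(adj) = {r_sq_adj_reduced:.1%}")
print(f"Full model:    R-Sq(adj) is roughly {r_sq_adj_full:.3f}")
```

Because the adjustment penalises each extra predictor, dropping Morale lowers R² slightly but raises R²(adj).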
(e) F-test Minitab output:

Regression Analysis: Production versus Shifts, Bonus, Overtime
The regression equation is
Production = 250 + 201 Shifts + 36.5 Bonus + 101 Overtime

Predictor     Coef  SE Coef      T      P
Constant     250.1    318.4   0.79  0.462
Shifts      201.40    46.05   4.37  0.005
Bonus        36.49    13.26   2.75  0.033
Overtime   100.986    9.569  10.55  0.000

S = 274.725
R-Sq = 95.2%
R-Sq(adj) = 92.8%

Analysis of Variance
Source          DF       SS       MS      F      P
Regression       3  9036966  3012322  39.91  0.000
Residual Error   6   452844    75474
Total            9  9489810

Source    DF   Seq SS
Shifts     1   290858
Bonus      1   340149
Overtime   1  8405958

An overall test of the model:
H0: β1 = β2 = β3 = 0
H1: At least one of β1, β2 and β3 is not zero
F = 39.91. The p-value is 0.000 (to three decimal places), which is less than 0.01 (or 1%). We therefore have strong evidence against H0 and reject it in favour of H1: some, or all, of the predictor variables used in the fit are useful in predicting production, hence this model is useful for prediction.
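As a rough check on the F-test, the F statistic and S can be rebuilt from the sums of squares and degrees of freedom in the Analysis of Variance table. This is a minimal sketch using only the reported figures; the variable names are illustrative, and the p-value is not recomputed here because that would need an F-distribution routine.

```python
import math

# Figures copied from the Minitab Analysis of Variance table.
ss_regression, df_regression = 9036966, 3
ss_error, df_error = 452844, 6

# Mean squares are sums of squares divided by their degrees of freedom.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error

# Overall F statistic and the residual standard deviation S.
f_statistic = ms_regression / ms_error
s = math.sqrt(ms_error)

print(f"MS(Regression) = {ms_regression:.0f}, MS(Error) = {ms_error:.0f}")
print(f"F = {f_statistic:.2f}, S = {s:.3f}")
```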
(f) If X1 = 6, X2 = 22 and X3 = 15:
Y = 250.1 + 201.40 X1 + 36.49 X2 + 100.986 X3
Y = 250.1 + (201.40 × 6) + (36.49 × 22) + (100.986 × 15)
Y = 3776.07
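The prediction in (f) is a direct substitution into the final regression equation. A small sketch of that arithmetic follows; the function name `predict_production` is hypothetical and the coefficients are the ones reported for the reduced model.

```python
# Coefficients of the final model (Shifts, Bonus, Overtime) from the Minitab output.
CONSTANT = 250.1
COEF_SHIFTS = 201.40
COEF_BONUS = 36.49
COEF_OVERTIME = 100.986


def predict_production(shifts: float, bonus: float, overtime: float) -> float:
    """Point prediction of production from the fitted regression equation."""
    return (CONSTANT
            + COEF_SHIFTS * shifts
            + COEF_BONUS * bonus
            + COEF_OVERTIME * overtime)


prediction = predict_production(shifts=6, bonus=22, overtime=15)
print(round(prediction, 2))
```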
PRACTICAL 2: FURTHER TOPICS IN REGRESSION

4. Obesity (a) Minitab output:
Logistic Regression Table
                                                             95% CI
Predictor         Coef   SE Coef      Z      P  Odds Ratio  Lower  Upper
Constant      -8.19369   4.17538  -1.96  0.050
Hours worked  0.239177  0.110792   2.16  0.031        1.27   1.02   1.58

Estimated model: P(obese | X) = exp(-8.19369 + 0.239177 X) / [1 + exp(-8.19369 + 0.239177 X)], where X = hours worked.
(b) P-value = 0.031. Since the p-value for the slope β1 is small (less than 0.05), we have moderate evidence against the null hypothesis H0: β1 = 0, and so we reject H0 in favour of the alternative H1: β1 ≠ 0. It appears that the number of hours worked is an important predictor of obesity.
(c) X = 40: P(obese | X = 40) = exp(-8.19369 + 0.239177 × 40) / [1 + exp(-8.19369 + 0.239177 × 40)] = 0.79793. There is a 0.79793 chance of a worker being obese if he usually works a 40-hour week.
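The probability in (c) can be reproduced from the two coefficients in the logistic regression table. The sketch below assumes the usual logistic form, P = 1 / (1 + exp(-(b0 + b1 X))), which gives back the reported value of 0.79793; the function name is illustrative only.

```python
import math

# Coefficients from the Minitab logistic regression table.
INTERCEPT = -8.19369
SLOPE_HOURS = 0.239177


def probability_obese(hours_worked: float) -> float:
    """Estimated probability of obesity for a given number of hours worked."""
    linear_predictor = INTERCEPT + SLOPE_HOURS * hours_worked
    return 1 / (1 + math.exp(-linear_predictor))


# The odds ratio for one extra hour worked is exp(slope).
odds_ratio = math.exp(SLOPE_HOURS)

print(f"P(obese | X = 40) = {probability_obese(40):.5f}")
print(f"Odds ratio = {odds_ratio:.2f}")
```

The odds ratio of about 1.27 means each additional hour worked multiplies the estimated odds of obesity by roughly 1.27.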
5. Pennie Enterprises (a) Scatterplot of average weekly productivity against the length of service
[Figure: Scatterplot of Productivity (tens of units) against Length of Service (months)]
Comment: There is a non-linear relationship between productivity and length of service. Productivity tends to fall as length of service increases, and the pattern is curved rather than a straight line.
(b) Minitab output:
Regression Analysis: Productivity versus Service, ServiceSquared
The regression equation is
Productivity = 398 - 7.31 Service + 0.0414 ServiceSquared

Predictor           Coef   SE Coef      T      P
Constant          398.49     19.59  20.34  0.000
Service          -7.3072    0.9099  -8.03  0.000
ServiceSquared  0.041411  0.007696   5.38  0.000

S = 29.6329
R-Sq = 93.7%
R-Sq(adj) = 92.6%
Full estimated regression equation:
Y = 398.49 - 7.3072 X + 0.041411 X² + ε, ε ~ N(0, 29.6329²)
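As a quick check on the output above, each T value is the coefficient divided by its standard error. The sketch below recomputes the three ratios from the reported Coef and SE Coef columns; the dictionary layout is just an illustrative choice.

```python
# (Coef, SE Coef) pairs copied from the Minitab output for the quadratic model.
estimates = {
    "Constant": (398.49, 19.59),
    "Service": (-7.3072, 0.9099),
    "ServiceSquared": (0.041411, 0.007696),
}

# T = coefficient / standard error for each term.
t_ratios = {name: coef / se for name, (coef, se) in estimates.items()}

for name, t in t_ratios.items():
    print(f"{name:<15} T = {t:.2f}")
```

Large absolute T values, as here, correspond to the small p-values in the P column.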
(c) (i) Both X = Service and X² = ServiceSquared are important predictor variables in the model: both β1 and β2 have small p-values (0.000), so we reject the null hypotheses H0: β1 = 0 and H0: β2 = 0. Therefore, X and X² are both significant predictors of Y (productivity).
(ii) R² = 93.7%: 93.7% of the variation in productivity (Y) is explained by the variation in the length of service (X) and its square (X²).
(d) Histogram of residuals
[Figure: Histogram of the residuals (response is Productivity); axes: Residual, Frequency]
Plot of the residuals against the fitted values
[Figure: Versus Fits (response is Productivity); axes: Fitted Value, Residual]
Comment: The quadratic regression model is only valid if the random term ε is Normally distributed with mean zero and constant variance σ². The histogram of residuals does not peak at about zero and has irregular peaks rather than a bell shape, so the assumption of Normality does not seem plausible. However, the plot of residuals against fitted values shows no pattern, just a random scatter of points, so the assumption of constant variance is plausible. Since the assumptions regarding the residuals are not all adequately verified, the quadratic regression model cannot be regarded as valid.
(e) A Scatterplot of average weekly productivity (Y) against length of service (X) with fitted quadratic regression line
[Figure: Fitted Line Plot of Productivity (tens of units) against Length of Service (months). Productivity = 398.5 - 7.307 Service + 0.04141 Service**2; S = 29.6329, R-Sq = 93.7%, R-Sq(adj) = 92.6%]
PRACTICAL 4: FURTHER TIME SERIES AND FORECASTING

1. Time Series Plot of Revenue
[Figure: Time Series Plot of Revenue by quarter, 2000 to 2009. Revenue in thousands of Australian dollars.]
Comment: There is an obvious positive trend, with clear and regular seasonal variation. Revenue seems to peak in the 4th quarter (around October to December) every year. There is no obvious outlier.
3. Regression table produced by Minitab:

Regression Analysis: MA versus Time index
The regression equation is
MA = 50.7 + 3.48 Time index

35 cases used, 4 cases contain missing values

Predictor      Coef  SE Coef       T      P
Constant    50.7124   0.0915  553.99  0.000
Time index  3.48142  0.00409  852.10  0.000

S = 0.244117
R-Sq = 100.0%
R-Sq(adj) = 100.0%
Regression equation: Y* = 50.712 + 3.481T + ε, ε ~ N(0, 0.244117²)
Significance of the slope: Yes, the slope term β1 is significant. Its p-value is 0.000, which is less than 0.05, so we reject the null hypothesis H0: β1 = 0 in favour of the alternative hypothesis H1: β1 ≠ 0.
4. Time Series Plot of Revenue, MA, Trend
[Figure: Time series plot of Revenue, MA and Trend by quarter, 2000 to 2009]
5. Seasonal Effects Minitab output:

Seasonal Indices
Period     Index
1        22.7682
2       -26.3280
3       -28.2082
4        31.7679
6. Residual Series Time series plot of the residual series:

[Figure: Time Series Plot of Residuals by quarter, 2000 to 2009]
7. Partial Autocorrelation
[Figure: Partial Autocorrelation Function for Residuals (with 5% significance limits for the partial autocorrelations), lags 1 to 10]
8. AR(1)
9. Final Estimates of Parameters Minitab output:

Final Estimates of Parameters
Type         Coef  SE Coef     T      P
AR 1       0.5746   0.1344  4.27  0.000
Constant  0.01528  0.04050  0.38  0.708
Mean      0.03592  0.09520
10. Estimated model: Y_t = 0.03592 + 0.5746 (Y_{t-1} - 0.03592) + ε_t, where Y_t is the residual series
11. Forecasted values Minitab output:

Forecasts from period 39
                           95% Limits
Period  Forecast      Lower     Upper  Actual
40      0.017677  -0.478117  0.513472
41      0.025440  -0.546370  0.597250
42      0.029900  -0.564876  0.624677
43      0.032463  -0.569703  0.634630
44      0.033936  -0.570651  0.638522
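The forecasts for periods 41 to 44 follow from the estimated AR(1) model by repeated substitution: each forecast is pulled back towards the mean by a factor of 0.5746. The sketch below starts from the reported period 40 forecast, since the last observed residual is not shown in this document, and uses the rounded estimates from the table, so it agrees with Minitab only to about four decimal places. The function name is hypothetical.

```python
# Rounded estimates from "Final Estimates of Parameters".
AR1_COEFFICIENT = 0.5746
SERIES_MEAN = 0.03592

# Constant and mean are linked by constant = mean * (1 - AR coefficient).
implied_constant = SERIES_MEAN * (1 - AR1_COEFFICIENT)


def ar1_forecasts(start_value: float, steps: int) -> list[float]:
    """Iterate forecast(t+1) = mean + phi * (forecast(t) - mean)."""
    forecasts = []
    current = start_value
    for _ in range(steps):
        current = SERIES_MEAN + AR1_COEFFICIENT * (current - SERIES_MEAN)
        forecasts.append(current)
    return forecasts


# Start from the reported forecast for period 40 and roll forward to 41..44.
later_forecasts = ar1_forecasts(start_value=0.017677, steps=4)
for period, value in zip(range(41, 45), later_forecasts):
    print(period, round(value, 4))
```

The forecasts approach the series mean of 0.03592 as the horizon grows, which is the expected behaviour of a stationary AR(1) model.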
12. Forecast revenue in April-June 2010 (time-point 40)
i. Estimate of the trend: Y* = 50.712 + 3.481T + ε, ε ~ N(0, 0.244117²). At T = 40: 50.712 + (3.481 × 40) = 189.952
ii. Full forecast (residual forecast + trend + seasonal index): 0.017677 + 189.952 + (-26.3280) = 163.642, or about 163,642 Australian dollars
iii. 95% Confidence Interval for forecasted revenue in April-June 2010:
Lower: -0.478117 + 189.952 + (-26.3280) = 163.146
Upper: 0.513472 + 189.952 + (-26.3280) = 164.137
i.e. about (163,146, 164,137) Australian dollars.
13. Forecast revenue in July-September 2010 (time-point 41)
i. Estimate of the trend: Y* = 50.712 + 3.481T + ε, ε ~ N(0, 0.244117²). At T = 41: 50.712 + (3.481 × 41) = 193.433
ii. Full forecast (residual forecast + trend + seasonal index): 0.025440 + 193.433 + (-28.2082) = 165.250, or about 165,250 Australian dollars
iii. 95% Confidence Interval for forecasted revenue in July-September 2010:
Lower: -0.546370 + 193.433 + (-28.2082) = 164.678
Upper: 0.597250 + 193.433 + (-28.2082) = 165.822
i.e. about (164,678, 165,822) Australian dollars.
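The forecasts in items 12 and 13 add three pieces: the trend at the time-point, the seasonal index used in the solution, and the AR(1) residual forecast with its limits. The sketch below simply repeats that arithmetic with the figures quoted above; the names and the dictionary layout are my own, and the pairing of seasonal index to time-point is taken from the solution as given.

```python
def trend(time_index: int) -> float:
    """Fitted trend line Y* = 50.712 + 3.481 T (revenue in thousands of AUD)."""
    return 50.712 + 3.481 * time_index


# Seasonal indices as paired with each time-point in the solution.
SEASONAL_INDEX = {40: -26.3280, 41: -28.2082}

# (forecast, lower 95% limit, upper 95% limit) of the residual series.
RESIDUAL_FORECAST = {
    40: (0.017677, -0.478117, 0.513472),
    41: (0.025440, -0.546370, 0.597250),
}


def revenue_forecast(time_index: int) -> tuple[float, float, float]:
    """Return (point forecast, lower, upper) for revenue, rounded to 3 d.p."""
    base = trend(time_index) + SEASONAL_INDEX[time_index]
    point, lower, upper = RESIDUAL_FORECAST[time_index]
    return (round(base + point, 3), round(base + lower, 3), round(base + upper, 3))


for t in (40, 41):
    print(t, revenue_forecast(t))
```

Note that the limits here reflect only the uncertainty in the residual forecast, as in the solution.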
14.
[Figure: Histogram (response is Residuals); axes: Residual, Frequency]
[Figure: Versus Fits (response is Residuals); axes: Fitted Value, Residual]
Comment: The histogram of residuals does not look Normally distributed. It peaks at roughly zero, but it does not tail off smoothly on both sides. Hence, the assumption of Normality is not plausible. However, the plot of residuals against fitted values shows no pattern, just a random scatter of points, so the assumption of constant variance is plausible. Since the assumptions regarding the residuals are not all adequately verified, the fitted time series model cannot be regarded as valid.
PRACTICAL 5: STATISTICAL PROCESS CONTROL

3. X-bar Chart for Dulux Paint tin-filling Process
[Figure: X-bar chart of Sample Mean (litres) against Sample. Centre line = 5, UCL = 5.1006, LCL = 4.8994; out-of-control points are flagged.]
Comment: Samples 6 and 7 fall below the lower control limit, but the samples that follow return within the control limits, so this appears to be chance variation. However, samples 21, 22, 23 and 24 appear to show assignable variation: four consecutive samples are out of control, and a level shift seems to have occurred.
4. X-bar Chart with 2-Sigma Control Limits
[Figure: X-bar chart of Sample Mean (litres) against Sample with 2-sigma control limits. Centre line = 5, +2SL = 5.0671, -2SL = 4.9329; out-of-control points are flagged.]
Comment: The centre line (mean = 5) is the same in the X-bar charts with 3-sigma and 2-sigma control limits. However, the upper and lower control limits have moved closer together, so the region within the control limits is narrower than with 3-sigma limits. The 2-sigma control limits lead to more samples appearing out of control, with a greater chance of a false alarm.
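The two sets of limits are consistent with each other: both are the centre line plus or minus a multiple of the standard error of the sample mean. The sketch below backs out that standard error from the 3-sigma limits shown on the first chart and rebuilds the 2-sigma limits from it. It assumes the limits are symmetric about the centre line of 5; the function name is illustrative.

```python
CENTRE_LINE = 5.0
UCL_3_SIGMA = 5.1006  # upper 3-sigma limit read from the first X-bar chart

# Standard error of the sample mean implied by the 3-sigma limit.
sigma_xbar = (UCL_3_SIGMA - CENTRE_LINE) / 3


def control_limits(k: float) -> tuple[float, float]:
    """Lower and upper k-sigma limits for the sample mean, rounded to 4 d.p."""
    half_width = k * sigma_xbar
    return (round(CENTRE_LINE - half_width, 4), round(CENTRE_LINE + half_width, 4))


print("3-sigma limits:", control_limits(3))
print("2-sigma limits:", control_limits(2))
```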
6. Probability of a false alarm 2-sigma control limits: Minitab output:

Cumulative Distribution Function
Normal with mean = 0 and standard deviation = 1
 x  P( X <= x )
-2    0.0227501
Probability of a false alarm: 2 × 0.0227501 = 0.0455002
2.5-sigma control limits: Minitab output:

Cumulative Distribution Function
Normal with mean = 0 and standard deviation = 1
   x  P( X <= x )
-2.5    0.0062097
Probability of a false alarm: 2 × 0.0062097 = 0.0124194
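The false-alarm probabilities above are twice the lower-tail area of the standard Normal distribution at -k, because an in-control sample mean can fall outside either limit. A minimal sketch using only the Python standard library follows; it assumes Normally distributed sample means, and the function names are my own.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """P(Z <= z) for a standard Normal variable, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def false_alarm_probability(k: float) -> float:
    """Chance that an in-control sample mean falls outside k-sigma limits."""
    return 2 * standard_normal_cdf(-k)


for k in (2, 2.5, 3):
    print(f"{k}-sigma limits: {false_alarm_probability(k):.7f}")
```

Widening the limits from 2 to 2.5 sigma cuts the false-alarm probability from about 4.6% to about 1.2%, at the cost of slower detection of real shifts.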
8.
[Figure: X-bar chart of Sample Mean (litres) against Sample. Centre line = 4.9613, UCL = 5.0817, LCL = 4.8408; out-of-control points are flagged.]
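9. The X-bar chart using X-double-bar and S has wider control limits, centred on a lower mean (centre line 4.9613, UCL = 5.0817, LCL = 4.8408), than the chart that assumed μ and σ were known (centre line 5, UCL = 5.1006, LCL = 4.8994). In particular, its LCL is much lower, and it did not show the chance variation which appeared in the X-bar chart with known μ and σ. The X-bar chart using X-double-bar and S is more reliable because it uses estimates from the data; it is unrealistic to assume that we know μ and σ.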