
# University of Waterloo

## Statistics 372 Fall 2005

Midterm: Tuesday, November 1st, 1-2:20 pm

Student ID number: _______________________

Aids permitted: calculator

Tables of the Gaussian distribution and control chart constants are provided

Time permitted: 80 minutes

Instructions:

1. Check that your quiz has a total of 5 pages.

2. Answer all questions in the space provided. Use the back of the preceding page if necessary,
indicating clearly that you have done so.

| Question | Mark | Possible |
|---|---|---|
| 1 | | 8 |
| 2 | | 5 |
| 3 | | 9 |
| 4 | | 8 |
| TOTAL | | 30 |

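Control Chart Constants

| r (sub-group size) | $c_4$ | $A_3$ | $B_3$ | $B_4$ |
|---|---|---|---|---|
| 2 | .7979 | 2.659 | - | 3.267 |
| 3 | .8862 | 1.954 | - | 2.568 |
| 4 | .9213 | 1.628 | - | 2.266 |
| 5 | .9400 | 1.427 | - | 2.089 |
| 6 | .9515 | 1.287 | 0.030 | 1.970 |
| 7 | .9594 | 1.182 | 0.118 | 1.882 |
| 8 | .9650 | 1.099 | 0.185 | 1.815 |
| 9 | .9693 | 1.032 | 0.239 | 1.761 |
| 10 | .9727 | 0.975 | 0.284 | 1.716 |

Constants for Xbar and s Charts

Xbar chart: $\bar{\bar{x}} \pm A_3 \bar{s}$; s chart: $(B_3 \bar{s},\ B_4 \bar{s})$

X chart: $\bar{y} \pm 2.66\,\bar{r}$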
1. Briefly answer each of the following unrelated questions.

a) Explain the purpose of phase I and II in control charting. [2 marks]

In phase I, data are collected from a (hopefully) in-control process. The data are used to determine
control limits and set up a control chart.
In phase II, we use the control chart produced in phase I for ongoing monitoring of the process. The
control limits are extended into the future, and more data are collected from the process and added to the
control chart.

b) Suppose we fit the following regression model to a time series of quarterly sales data:

$$Y = \beta_0 + \beta_1 Q_1 + \beta_2 Q_2 + \beta_3 Q_3 + R,$$

where $Q_1 = 1$ if first quarter and 0 otherwise, $Q_2 = 1$ if second quarter and 0 otherwise, $Q_3 = 1$ if third quarter and 0 otherwise, and $R \sim G(0, \sigma)$ independent. Interpret the model parameter $\beta_2$. [2 marks]

$\beta_2$ represents the expected difference in average sales in the second quarter compared to the level in
Q4 (the baseline quarter with this model).
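One way to check this interpretation is to compare the model's expected value in the second and fourth quarters, using $E[R] = 0$. The step below is a short worked sketch; the coefficient names $\beta_0, \dots, \beta_3$ are assumed notation, since the original symbols did not survive extraction.

```latex
\begin{aligned}
E[Y \mid \text{second quarter}] &= \beta_0 + \beta_2 && (Q_1 = 0,\ Q_2 = 1,\ Q_3 = 0) \\
E[Y \mid \text{fourth quarter}] &= \beta_0 && (Q_1 = Q_2 = Q_3 = 0) \\
E[Y \mid \text{second quarter}] - E[Y \mid \text{fourth quarter}] &= \beta_2
\end{aligned}
```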

c) For an AR(1) model, i.e. $Y_t - \mu = \phi (Y_{t-1} - \mu) + A_t$, where $A_t \sim G(0, \sigma)$, derive the lag k autocorrelation,
denoted $\rho_k$. [4 marks]

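With the restriction on the parameter $|\phi| < 1$ we have

$$
\begin{aligned}
Cov(Y_t, Y_{t-k}) &= Cov(Y_t - \mu,\ Y_{t-k} - \mu) \\
&= Cov(A_t + \phi A_{t-1} + \phi^2 A_{t-2} + \dots,\ A_{t-k} + \phi A_{t-k-1} + \phi^2 A_{t-k-2} + \dots) \\
&= \phi^k Cov(A_{t-k}, A_{t-k}) + \phi^{k+2} Cov(A_{t-k-1}, A_{t-k-1}) + \dots \\
&= \frac{\phi^k \sigma^2}{1 - \phi^2}, \quad k = 1, 2, \dots
\end{aligned}
$$

and similarly $Var(Y_t) = \dfrac{\sigma^2}{1 - \phi^2}$.

So the lag k autocorrelation, denoted $\rho_k$, between $Y_t$ and $Y_{t-k}$ is

$$\rho_k = \frac{Cov(Y_t, Y_{t-k})}{\sqrt{Var(Y_t)\,Var(Y_{t-k})}} = \phi^k, \quad k = 1, 2, \dots$$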
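The result can be checked numerically by summing the covariance terms directly. The sketch below is only an illustration: the name `ar1_autocov`, the symbol `phi` for the AR(1) parameter, and the values 0.6 and 1.5 are assumptions chosen for the example, not part of the exam.

```python
def ar1_autocov(phi, sigma, k, n_terms=500):
    """Lag-k autocovariance of an AR(1) process from its moving-average form.

    Y_t - mu = sum_j phi^j A_{t-j}, so only matching noise terms contribute:
    Cov(Y_t, Y_{t-k}) = sigma^2 * sum_j phi^(j+k) * phi^j (truncated here).
    """
    return sigma ** 2 * sum(phi ** (j + k) * phi ** j for j in range(n_terms))


# Hypothetical parameter values, chosen only for illustration.
phi, sigma = 0.6, 1.5

variance = ar1_autocov(phi, sigma, 0)
for k in range(1, 5):
    rho_k = ar1_autocov(phi, sigma, k) / variance
    # The two printed values should agree: the numeric ratio and phi^k.
    print(k, round(rho_k, 6), round(phi ** k, 6))
```

The truncated sum converges quickly because $|\phi| < 1$, which is the same restriction the derivation relies on.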

2. Assume that the quality characteristic of interest X follows a Gaussian (Normal) distribution with mean
$\mu$ and standard deviation $\sigma$ (assume $\mu$ and $\sigma$ are known). We monitor the process using an $\bar{X}$
control chart with subgroups of size 5.

a) If the process mean shifts up to $\mu + \sigma$, what is the probability of detecting this magnitude of shift with
one subgroup? [2 marks]

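The control limits for the $\bar{X}$ control chart are set at $\mu \pm 3\sigma/\sqrt{5}$.
Denoting the plotted subgroup average as $Y$ and assuming the process mean has shifted, we have
$Y \sim G(\mu + \sigma,\ \sigma/\sqrt{5})$. Then the chance of a signal is

$$p = \Pr\left(Y > \mu + 3\sigma/\sqrt{5}\right) + \Pr\left(Y < \mu - 3\sigma/\sqrt{5}\right).$$

Standardizing, the probability is

$$p = \Pr\left(Z > \frac{3\sigma/\sqrt{5} - \sigma}{\sigma/\sqrt{5}}\right) + \Pr\left(Z < \frac{-3\sigma/\sqrt{5} - \sigma}{\sigma/\sqrt{5}}\right) = \Pr\left(Z > 3 - \sqrt{5}\right) + \Pr\left(Z < -3 - \sqrt{5}\right),$$

where $Z \sim G(0, 1)$. Using the Gaussian tables, p = 0.222.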
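The table lookup can be reproduced with the standard normal CDF. The following is a minimal sketch, assuming the shift is one process standard deviation and the subgroup size is 5; the function names are hypothetical.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """Standard Gaussian CDF, written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def detection_probability(shift_in_sigma, subgroup_size):
    """Chance that one subgroup average falls outside the 3-sigma limits.

    After a mean shift of shift_in_sigma * sigma, the standardized limits
    move from +/-3 to (+/-3 - shift_in_sigma * sqrt(subgroup_size)).
    """
    d = shift_in_sigma * sqrt(subgroup_size)
    above_upper = 1.0 - std_normal_cdf(3.0 - d)
    below_lower = std_normal_cdf(-3.0 - d)
    return above_upper + below_lower


p = detection_probability(1.0, 5)
print(round(p, 3))
```

Almost all of the probability comes from the upper limit; the term for the lower limit is negligible for an upward shift.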

b) We may supplement the $\bar{X}$ chart in 2 a) with a runs rule that generates a signal if we see 2 out of 3
points in a row falling in zone A, where a point falls in zone A if it lies within the control limits and in
the upper or lower third of the in-control region (see figure).

[Figure: $\bar{X}$ chart showing the centreline, the upper and lower control limits, and the two zone A bands.]

With this runs rule what is the chance of a false alarm? [3 marks]

Prob(falling in zone A) = q = $2\Pr\left(\mu + 2\sigma/\sqrt{5} < Y < \mu + 3\sigma/\sqrt{5}\right)$. In control, $Y \sim G(\mu,\ \sigma/\sqrt{5})$. So,
Prob(falling in zone A) = q = $2\Pr(2 < Z < 3)$ = 0.0428, since $Z \sim G(0, 1)$.
To get 2 out of 3 in a row in zone A the possibilities are: AA (signals before the next observation), NAA,
or ANA, where N denotes a point that is not in zone A. So the chance of a false alarm (i.e. a signal when the process is in control) is
$q^2 + 2q^2(1 - q)$ = 0.005.
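The two numbers in this answer can be checked with a few lines of arithmetic. This is a sketch under the same assumptions as the solution (independent in-control points, zone A between 2 and 3 standard errors from the centreline on either side); the variable names are illustrative.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """Standard Gaussian CDF, written in terms of the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Probability that one in-control point lands in either zone A band.
q = 2.0 * (std_normal_cdf(3.0) - std_normal_cdf(2.0))

# Patterns that give 2 of 3 consecutive points in zone A:
# AA, then NAA and ANA (N = not in zone A).
p_aa = q * q
p_naa = (1.0 - q) * q * q
p_ana = q * (1.0 - q) * q
false_alarm = p_aa + p_naa + p_ana

print(round(q, 4), round(false_alarm, 3))
```

Summing the three patterns gives the same expression as $q^2 + 2q^2(1 - q)$.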
3. At a delivery firm 40 drivers deliver packages. The data collected over the last year are:

Driver 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Mistakes 6 1 0 14 0 2 18 2 5 13 1 4 6 5
Driver 15 16 17 18 19 20 21 22 23 24 25 26 27 28
Mistakes 0 0 1 3 15 24 3 4 1 2 3 22 4 8
Driver 29 30 31 32 33 34 35 36 37 38 39 40
Mistakes 2 6 8 0 9 20 9 0 3 14 1 1

Some data summaries of the number of mistakes you may find useful include:
sd(mistakes) = 6.57
average moving range = 7.2, and directly from R
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.00 1.00 3.50 6.00 8.25 24.00

[Figure: plot of mistakes (vertical axis, -20 to 30) against Index (10 to 40).]

a) Add appropriate control limits to the above plot. Explain your choice of control chart and show your
calculations. [3 marks]

Notice that the given data are not a time series, so the given order is arbitrary. Also, the number of
mistakes is a count. As a result, the best choice is probably a control chart based on a Poisson
assumption. As in the exercises, using ±3 sigma limits we get control limits at $\bar{c} \pm 3\sqrt{\bar{c}} = 6 \pm 3\sqrt{6}$, where $\bar{c}$ is the
average number of mistakes. Since fewer than zero mistakes is not possible, we use control limits at 0 and
13.3.

Without realizing the above, an X chart for individuals is the best choice. The control limits (derived from the
data summary) would be $6 \pm 2.66(7.2)$, which gives (0, 25.2) once the negative lower limit is replaced by 0.
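Both sets of limits can be recomputed from the data table in the question. The sketch below assumes the drivers are listed in the order 1 to 40 as given; the variable names are illustrative, and the list of flagged drivers is simply those above the Poisson-based upper limit.

```python
from math import sqrt

# Mistakes for drivers 1..40, copied from the table in the question.
mistakes = [6, 1, 0, 14, 0, 2, 18, 2, 5, 13, 1, 4, 6, 5,
            0, 0, 1, 3, 15, 24, 3, 4, 1, 2, 3, 22, 4, 8,
            2, 6, 8, 0, 9, 20, 9, 0, 3, 14, 1, 1]

# Poisson-based chart: centre c_bar, limits c_bar +/- 3*sqrt(c_bar), floored at 0.
c_bar = sum(mistakes) / len(mistakes)
c_lcl = max(0.0, c_bar - 3.0 * sqrt(c_bar))
c_ucl = c_bar + 3.0 * sqrt(c_bar)

# Individuals (X) chart: centre c_bar, limits c_bar +/- 2.66 * average moving range.
moving_ranges = [abs(b - a) for a, b in zip(mistakes, mistakes[1:])]
mr_bar = sum(moving_ranges) / len(moving_ranges)
x_lcl = max(0.0, c_bar - 2.66 * mr_bar)
x_ucl = c_bar + 2.66 * mr_bar

# Drivers (1-indexed) whose counts exceed the Poisson-based upper limit.
flagged = [i + 1 for i, m in enumerate(mistakes) if m > c_ucl]

print(round(c_lcl, 1), round(c_ucl, 1))
print(round(x_lcl, 1), round(x_ucl, 1))
print(flagged)
```

The two charts lead to quite different conclusions: several drivers exceed the Poisson-based upper limit, while none exceeds the individuals-chart limit.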

b) The manager in charge of this operation has been issuing a disciplinary citation to drivers for each
mistake. What do you think of this manager's approach? [2 marks]

The manager's approach is poor. The manager is overreacting to natural variation in the system
whose causes are not under the control of the individual drivers. Punishing them for problems that are
likely not their fault makes no sense and will upset the drivers.

c) Explain how the control chart you created could help this manager analyze the performance of this group
of drivers. [2 marks]

Using the control chart the number of mistakes for each driver is put into some larger context. The
manager should concentrate attention on any drivers that fall outside the control limits. Otherwise to
improve the overall process s/he must address the system.

d) In this context explain what would be considered a special cause. [2 marks]

A special cause is an input that has a large effect on the performance of one (or a few) drivers but not
all the drivers. The special cause must act within a driver and not across all drivers.

4. In a machining process, problems occurred due to excess variation in the measurement system for the
diameter of a precision ground shaft. To explore the measurement system further the team conducted an
investigation where the diameter of the same (master) shaft was measured each hour for four days giving
a total of 32 diameter measurements. Since the true diameter of the master shaft is (assumed) known the
results are recorded as the measurement error (i.e. observed diameter minus true diameter). A plot of
the 32 measurement errors (denoted $y_t$) by time is given below (along with a table of the data).

Day 1 Day 2 Day 3 Day 4
0.18 -0.55 -1.53 0.49
0.21 -1.15 -1.43 0.05
-0.36 -1.94 -1.71 -0.02
0.26 -1.86 -1.27 0.32
0.39 -2.53 -1.23 -0.16
-0.13 -2.46 -1.26 0.16
-0.27 -1.59 -0.75 0.47
-0.35 -2.01 -0.65 0.22

The team could not find the cause of the pattern (they speculated the cause was in the environment).
They decided to improve the measurement system using a feedback controller. This required forecasting
the measurement error (for the master shaft) in the future and then recalibrating the measurement system
to compensate for large predicted measurement errors.

a) Given the observed data would you recommend forecasting with a regression model or a smoothing
method (e.g. moving average or exponentially weighted moving average - EWMA)? Explain. [2 marks]

There is no clear trend or seasonal pattern in the plot that is likely to continue into the future. As such, a
regression model is not well suited to these data. Forecasting with a smoothing method is
preferred.

b) Suppose the team decided to use a feedback controller based on EWMA forecasts with the smoothing
parameter alpha equal to 0.2. In other words, the predicted measurement error at time t+1 (denoted
$\hat{y}_{t+1}$) made at time t is $\hat{y}_{t+1} = 0.2\,y_t + 0.8\,\hat{y}_t$. Using this approach, explain how you could determine an
approximate prediction interval for your prediction of the measurement error at time t+1. Note a
numerical answer is not required. [3 marks]

Use the EWMA to forecast on the existing data and calculate the mean squared error

$$MSE = \sqrt{\frac{\sum_{i=1}^{32} (y_i - \hat{y}_i)^2}{32}},$$

where $\hat{y}_i$ is the EWMA forecast for time i. Then an approximate prediction
interval for time t+1 is $\hat{y}_{t+1} \pm 2\,MSE$. [Note: in the notes MSE was defined with a square root.]
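The procedure can be sketched on the 32 measurement errors from the table. Two things here are assumptions rather than part of the exam: the starting forecast is set equal to the first observation, and the table columns are read as consecutive days (Day 1 first). The function name is hypothetical.

```python
from math import sqrt

# Measurement errors in time order: Day 1, Day 2, Day 3, Day 4 (8 readings each).
errors = [0.18, 0.21, -0.36, 0.26, 0.39, -0.13, -0.27, -0.35,
          -0.55, -1.15, -1.94, -1.86, -2.53, -2.46, -1.59, -2.01,
          -1.53, -1.43, -1.71, -1.27, -1.23, -1.26, -0.75, -0.65,
          0.49, 0.05, -0.02, 0.32, -0.16, 0.16, 0.47, 0.22]

ALPHA = 0.2


def ewma_forecasts(observations, alpha, initial):
    """Return one-step-ahead forecasts; entry i is the forecast of observations[i].

    The final extra entry is the forecast for the next, not yet observed, time.
    """
    forecasts = [initial]
    for obs in observations:
        forecasts.append(alpha * obs + (1.0 - alpha) * forecasts[-1])
    return forecasts


# Assumption: start the recursion at the first observed value.
forecasts = ewma_forecasts(errors, ALPHA, initial=errors[0])

# Root of the average squared one-step-ahead forecast error on the existing data.
squared_errors = [(y - f) ** 2 for y, f in zip(errors, forecasts[:-1])]
rmse = sqrt(sum(squared_errors) / len(errors))

next_forecast = forecasts[-1]
interval = (next_forecast - 2.0 * rmse, next_forecast + 2.0 * rmse)
print(round(next_forecast, 3), tuple(round(v, 3) for v in interval))
```

A different starting value would change the early forecasts somewhat, but its influence decays geometrically at rate 0.8 per step.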

c) Another possible forecasting approach is to fit an AR(1) model, i.e. $Y_t - \mu = \phi (Y_{t-1} - \mu) + A_t$, where
$A_t \sim G(0, \sigma)$ independent. Fitting the model in R gives
ar1 intercept
0.8758 -0.4260
s.e. 0.0771 0.5341
sigma^2 estimated as 0.1899: log likelihood = -19.55, aic = 45.1

Use the above results to derive a prediction for $y_{33}$ and $y_{34}$. [3 marks]

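prediction for $y_{33}$: To make a forecast we take the expected value of the model. Using the AR(1) model
and the R results:
$\hat{y}_{33} = \hat{\mu} + \hat{\phi}(y_{32} - \hat{\mu}) + 0$ = -0.426 + 0.876(0.22 + 0.426) = 0.14

prediction for $y_{34}$:
$\hat{y}_{34} = \hat{\mu} + \hat{\phi}(\hat{y}_{33} - \hat{\mu}) + 0$ = -0.426 + 0.876(0.14 + 0.426) = 0.07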
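The two forecasts follow from applying the same recursion twice. The sketch below uses the estimates from the R output in the question and the last observed value 0.22 from the data table; treating R's `intercept` as the estimated process mean follows the solution above, and the function name is hypothetical.

```python
# Estimates copied from the R output quoted in the question.
PHI_HAT = 0.8758   # ar1 coefficient
MU_HAT = -0.4260   # reported as "intercept"; used here as the process mean
Y_32 = 0.22        # last observed measurement error (Day 4, final reading)


def ar1_forecast(previous_value, mu, phi):
    """Expected next value of the AR(1) model; the noise term has mean 0."""
    return mu + phi * (previous_value - mu)


y33_hat = ar1_forecast(Y_32, MU_HAT, PHI_HAT)
# The second step plugs in the first forecast, since y_33 is not yet observed.
y34_hat = ar1_forecast(y33_hat, MU_HAT, PHI_HAT)

print(round(y33_hat, 2), round(y34_hat, 2))
```

Repeating the step further would pull the forecasts steadily toward the estimated mean of -0.426.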
[Figure: plot of measurement error (vertical axis, -2 to 0) against hour (0 to 30).]