Minitab 16

© All Rights Reserved

54 vues

Minitab 16

© All Rights Reserved

- Real Statistics Examples Workbook
- PiecewiseRegression
- DMMethodsandModels_ChapterSummaries
- ISOM201 Class Notes
- DK5985_ch03
- Moore 16
- Objective Week1 6 July 2012
- epi.
- Lecture4 Module2 Anova 1
- cse546wi12LogisticRegression
- Process Capability Analysis in Minitab_Manual
- 24 Taguchi Design
- 8 Tsm Chapter08
- Beer
- Macroeconomics Project
- TREATMENT BY ALTERNATIVE METHODS OF REGRESSION GAS CHROMATOGRAPHIC RETENTION INDICES OF 35 PYRAZINES
- Final Cheat Sheet Nadya
- 14-20
- DS_5
- Ppt-17

Vous êtes sur la page 1sur 60

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

2

Regression

Regression Overview, 2-2

I Regression, 2-3

I Stepwise Regression, 2-14

I Best Subsets Regression, 2-20

I Fitted Line Plot, 2-24

I Residual Plots, 2-27

Logistic Regression Overview, 2-29

I Binary Logistic Regression, 2-33

I Ordinal Logistic Regression, 2-44

I Nominal Logistic Regression, 2-51

See also,

I Resistant Line, Chapter 8

I Regression with Life Data, Chapter 16

STREGRSN.MK5 Page 1 Friday, December 17, 1999 12:23 PM

Chapter 2 Regression Overview

2-2 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Regression Overview

Regression analysis is used to investigate and model the relationship between a

response variable and one or more predictors. MINITAB provides various least-squares

and logistic regression procedures.

I Use least squares regression when your response variable is continuous.

I Use logistic regression when your response variable is categorical.

Both least squares and logistic regression methods estimate parameters in the model so that the

fit of the model is optimized. Least squares minimizes the sum of squared errors to obtain

parameter estimates, whereas MINITABs logistic regression commands obtain maximum

likelihood estimates of the parameters. See Logistic Regression Overview on page 2-29 for

more information about logistic regression.

Use the table below to assist in selecting a procedure:

Use to

response

type

estimation

method

Regression

(page 2-3)

perform simple or multiple regression: fit a model,

store regression statistics, examine residual diagnostics,

generate point estimates, generate prediction and

confidence intervals, and perform lack-of-fit tests

continuous least squares

Stepwise

(page 2-14)

perform stepwise, forward selection, or backward

elimination which add or remove variables from a

model in order to identify a useful subset of predictors

continuous least squares

Best Subsets

(page 2-20)

identify subsets the predictors based on the maximum

R

2

criterion

continuous least squares

Fitted Line Plot

(page 2-24)

perform linear and polynomial regression with a single

predictor and plot a regression line through the data;

on the actual or log

10

scale

continuous least squares

Residual Plots

(page 2-27)

generate a set of residual plots to use for residual

analysis: normal score plot, a chart of individual

residuals, a histogram of residuals, and a plot of fits

versus residuals

continuous least squares

Binary Logistic

(page 2-33)

perform logistic regression on a response with only two

possible values, such as presence or absence

categorical maximum

likelihood

Ordinal Logistic

(page 2-44)

perform logistic regression on a response with three or

more possible values that have a natural order, such as

none, mild, or severe

categorical maximum

likelihood

Nominal

Logistic

(page 2-51)

perform logistic regression on a response with three or

more possible values that have no natural order, such

as sweet, salty, or sour

categorical maximum

likelihood

STREGRSN.MK5 Page 2 Friday, December 17, 1999 12:23 PM

Regression Regression

MINITAB Users Guide 2 2-3

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Regression

You can use Regression to perform simple and multiple regression using the method of

least squares. Use this procedure for fitting general least squares models, storing

regression statistics, examining residual diagnostics, generating point estimates,

generating prediction and confidence intervals, and performing lack-of-fit tests.

You can also use this command to fit polynomial regression models. However, if you

want to fit a polynomial regression model with a single predictor, you may find it more

advantageous to use Fitted Line Plot (page 2-24).

Data

Enter response and predictor variables in numeric columns of equal length so that each

row in your worksheet contains measurements on one observation or subject.

MINITAB omits all observations that contain missing values in the response or in the

predictors, from calculations of the regression equation and the ANOVA table items.

h To do a linear regression

1 Choose Stat Regression Regression.

2 In Response, enter the column containing the response (Y) variable.

3 In Predictors, enter the columns containing the predictor (X) variables.

4 If you like, use one or more of the options listed below, then click OK.

Options

Graphs subdialog box

I draw five different residual plots for regular, standardized, or deleted residualssee

Choosing a residual type on page 2-5. Available residual plots include a:

STREGRSN.MK5 Page 3 Friday, December 17, 1999 12:23 PM

Chapter 2 Regression

2-4 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

histogram.

normal probability plot.

plot of residuals versus the fitted values ( ).

plot of residuals versus data order. The row number for each data point is shown

on the x-axis (for example, 1 2 3 4 n).

separate plot for the residuals versus each specified column.

For a discussion, see Residual plots on page 2-6.

Results subdialog box

I display the following in the Session window:

no output

the estimated regression equation, table of coefficients, s, R

2

, and the analysis of

variance table

the default output, which includes the above output plus the sequential sums of

squares and the fits and residuals of unusual observations

the default output, plus the full table of fits and residuals

Options subdialog box

I perform weighted regressionsee Weighted regression on page 2-6

I exclude the intercept term from the regression by unchecking Fit Interceptsee

Regression through the origin on page 2-7

I display the variance inflation factor (VIFa measure of multicollinearity effect)

associated with each predictorsee Variance inflation factor on page 2-7

I display the Durbin-Watson statistic which detects autocorrelation in the residuals

see Detecting autocorrelation in residuals on page 2-7

I display the PRESS statistic and adjusted R-squared

I perform a pure error lack-of-fit test for testing model adequacy when there are

predictor replicatessee Testing lack-of-fit on page 2-8

I perform a data subsetting lack-of-fit test to test the model adequacysee Testing

lack-of-fit on page 2-8

I predict the response, confidence interval, and prediction interval for new

observationssee Prediction of new observations on page 2-9

Storage subdialog box

I store the coefficients, fits, and regular, standardized, and deleted residualssee

Choosing a residual type on page 2-5.

I store the leverages, Cooks distances, and DFITS, for identifying outlierssee

Identifying outliers on page 2-9.

Y

Regression Regression

MINITAB Users Guide 2 2-5

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

I store the mean square error, the (XX)

-1

matrix, and the R matrix of the QR or

Cholesky decomposition. (The variance-covariance matrix of the coefficients is

MSE*(XX)

-1

.) See Help for information on these matrices.

Residual analysis and regression diagnostics

Regression analysis usually does not end when a regression model has been fit. You also

can examine residual plots and other regression diagnostics to assess if the residuals

appear random and normally distributed. MINITAB provides a number of residual plots

through the Graphs subdialog box. Alternatively, after fits and residuals are stored, you

can use Stat Regression Residual Plots to obtain four plots within a single graph

window.

MINITAB also produces regression diagnostics for identifying outliers or unusual

observations. These observations may have a significant influence upon the regression

results. See Identifying outliers on page 2-9. You might check unusual observations to see

if they are correct. If so, you can try to determine why they are unusual and consider

what effect they have on the regression equation. You might wish to examine how

sensitive the regression results are to the outliers being present. Outliers can suggest

inadequacies in the model or a need for additional information.

Choosing a residual type

You can calculate three types of residuals. Use the table below to help you choose which

type you would like to plot:

Residual type Choose when you want to Calculation

regular examine residuals in the original scale of the data response fit

standardized use a rule of thumb for identifying observations that are

not fit well by the model. A standardized residual greater

than 2, in absolute value, might be considered to be

large. MINITAB displays these observations in a table of

unusual observations, labeled with an R.

(residual) / (standard

deviation of the

residual)

Studentized identify observations that are not fit well by the model.

Removing observations can affect the variance estimate

and also can affect parameter estimates. A large absolute

Studentized residual may indicate that including the

observation in the model increases the error variance or

that it has a large affect upon the parameter estimates, or

both.

(residual) / (standard

deviation of the

residual). The i

th

studentized residual is

computed with the i

th

observation removed.

STREGRSN.MK5 Page 5 Friday, December 17, 1999 12:23 PM

Chapter 2 Regression

2-6 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Residual plots

MINITAB generates residual plots that you can use to examine the goodness of model fit.

You can choose the following residual plots:

I Normal plot of residuals. The points in this plot should generally form a straight line

if the residuals are normally distributed. If the points on the plot depart from a

straight line, the normality assumption may be invalid. To perform a statistical test for

normality, use Stat Basic Statistics Normality Test (page 1-43).

I Histogram of residuals. This plot should resemble a normal (bell-shaped)

distribution with a mean of zero. Substantial clusters of points away from zero may

indicate that factors other than those in the model may be influencing your results.

I Residuals versus fits. This plot should show a random pattern of residuals on both

sides of 0. There should not be any recognizable patterns in the residual plot. The

following may indicate error that is not random:

a series of increasing or decreasing points

a predominance of positive residuals, or a predominance of negative residuals

patterns such as increasing residuals with increasing fits

I Residuals versus order. This is a plot of all residuals in the order that the data was

collected and can be used to find non-random error, especially of time-related

effects.

I Residuals versus other variables. This is a plot of all residuals versus another

variable. Commonly, you might use a predictor or a variable left out of the model

and see if there is a pattern that you may wish to fit.

If certain residual values are of concern, you can brush your graph to identify these

values. See the Brushing Graphs chapter in MINITAB Users Guide 1 for more information.

Weighted regression

Weighted least squares regression is a method for dealing with observations that have

nonconstant variances. If the variances are not constant, observations with

I large variances should be given relatively small weights

I small variances should be given relatively large weights

The usual choice of weights is the inverse of pure error variance in the response.

h To perform weighted regression

1 Choose Stat Regression Regression Options.

2 In Weights, enter the column containing the weights. The weights must be greater

than or equal to zero. Click OK in each dialog box.

STREGRSN.MK5 Page 6 Friday, December 17, 1999 12:23 PM

Regression Regression

MINITAB Users Guide 2 2-7

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

If there are n observations in the data set, MINITAB forms an n n matrix W with the

column of weights as its diagonal and zeros elsewhere. MINITAB calculates the regression

coefficients by (XWX)

-1

(XWY). This is equivalent to minimizing a weighted error sum

of squares,

, where w

i

is the weight.

Regression through the origin

By default, the y-intercept term (also called the constant) is included in equation. Thus,

MINITAB fits the model

However, if the response at X = 0 is naturally zero, a model without an intercept can

make sense. If so, Uncheck Fit Intercept in the Options subdialog box, and the

0

term

will be omitted. Thus, MINITAB fits the model

Because it is difficult to interpret the R

2

when the constant is omitted, the R

2

is not

printed. If you wish to compare fits of models with and without intercepts, compare

mean square errors and examine residual plots.

Variance inflation factor

The variance inflation factor (VIF) is used to detect whether one predictor has a strong

linear association with the remaining predictors (the presence of multicollinearity

among the predictors). VIF measures how much the variance of an estimated regression

coefficient increases if your predictors are correlated (multicollinear).

VIF = 1 indicates no relation; VIF > 1, otherwise. The largest VIF among all predictors is

often used as an indicator of severe multicollinearity. Montgomery and Peck [21]

suggest that when VIF is greater than 5-10, then the regression coefficients are poorly

estimated. You should consider the options to break up the multicollinearity: collecting

additional data, deleting predictors, using different predictors, or an alternative to least

square regression. For additional information, see [3], [21].

Detecting autocorrelation in residuals

In linear regression, it is assumed that the residuals are independent (that is, they are

not autocorrelated) of each other. If the independence assumption were violated, some

model fitting results would be questionable. For example, positive correlation between

w

i

y fit ( )

2

[ ]

Y

0

1

+ X

1

2

X

2

k

X

k

+ + + + =

Y

1

X

1

2

X

2

k

X

k

+ + + + =

STREGRSN.MK5 Page 7 Friday, December 17, 1999 12:23 PM

Chapter 2 Regression

2-8 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

error terms tends to inflate the t-values for coefficients. Because of that, checking model

assumptions after fitting a model is an important part of regression analysis.

MINITAB provides two methods to check this assumption:

I A graph of residuals versus data order (1 2 3 4 n) can provide a means to visually

inspect residuals for autocorrelation.

I The Durbin-Watson statistic tests for the presence of autocorrelation in regression

residuals by determining whether or not the correlation between two adjacent error

terms is zero. The test is based upon an assumption that errors are generated by a

first-order autoregressive process. If there are missing observations, these are omitted

from the calculations, and only the nonmissing observations are used.

To reach a conclusion from the test, you will need to compare the displayed statistic

with lower and upper bounds in a table. If D > upper bound, no correlation; if D <

lower bound, positive correlation; if D is in between the two bounds, the test is

inconclusive. For additional information, see [4], [22].

Testing lack-of-fit

MINITAB provides two lack-of-fit tests so you can determine whether or not the

regression model adequately fits your data. The pure error lack-of-fit test requires

replicates; the data subsetting lack-of-fit test does not require replicates.

I Pure error lack-of-fit testIf your predictors contain replicates (repeated x values

with one predictor or repeated combinations of x values with multiple predictors),

MINITAB can calculate a pure error test for lack-of-fit. The error term will be

partitioned into pure error (error within replicates) and a lack-of-fit error. The F-test

can be used to test if you have chosen an adequate regression model. For additional

information, see [9], [22], [29].

I Data subsetting lack-of-fit testMINITAB also performs a lack-of-fit test that does

not require replicates but involves subsetting the data, and attempts to identify the

nature of any lack-of-fit. This test is nonstandard, but it can provide information

about the lack-of-fit relative to each variable. See [6] and Help for more information.

MINITAB performs 2k+1 hypothesis tests, where k is the number of predictors, and

then combines them using Bonferroni inequalities to give an overall significance level

of 0.1. A message is printed out for each test for which there is evidence of lack-of-fit.

For each predictor, a curvature test and an interaction test are performed by

comparing the fit above and below the predictor mean using indicator variables. A

test can also be performed by fitting the model to the central portion of the data

and then comparing the error sums of squares of that central data portion to the

error sums of squares of all the data.

STREGRSN.MK5 Page 8 Friday, December 17, 1999 12:23 PM

Regression Regression

MINITAB Users Guide 2 2-9

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Prediction of new observations

If you have new predictor (X) values and you wish to know what the response

would be using the regression equation, then use Prediction intervals for new

observations in the Options subdialog box. Enter constants or columns

containing the new x values, one for each predictor. Columns must be of equal

length. If you enter a constant and a column(s), MINITAB will assume that you

want predicted values for all combinations of constant and column values. You

can change the confidence level from the default 95%, and you can also store

the printed values: fits, standard errors of fits, confidence limits, and prediction

limits. If you use prediction with weights, see Help for obtaining correct results.

Identifying outliers

In addition to graphs, you can store three additional measures for the purpose

of identifying outliers, or unusual observations that can have a significant

influence upon the regression. The measures are leverages, Cooks distance,

and DFITS:

I Leverages are the diagonals of the hat matrix, H = X (XX)

-1

X, where X is

the design matrix. Note that h

i

depends only on the predictors; it does not

involve the response Y. Many people consider h

i

to be large enough to merit

checking if it is more than 2p/n or 3p/n, where p is the number of predictors

(including one for the constant). MINITAB displays these in a table of unusual

observations with high leverage. Those with leverage over 3p/n or 0.99,

whichever is smallest, are marked with an X and those with leverage greater

than 5p/n are marked with XX.

I Cooks distance combines leverages and Studentized residuals into one

overall measure of how unusual the predictor values and response are for

each observation. Large values signify unusual observations. Geometrically,

Cooks distance is a measure of the distance between coefficients calculated

with and without the i

th

observation. Cook [7] and Weisberg [29] suggest

checking observations with Cooks distance > F (.50, p, np), where F is a

value from an F-distribution.

I DFITS, like Cooks distance, combines the leverage and the Studentized

residual into one overall measure of how unusual an observation is. DFITS

(also called DFFITS) is the difference between the fitted values calculated

with and without the i

th

observation, and scaled by stdev (

i

). Belseley, Kuh,

and Welsch [3] suggest that observations with DFITS > 2 should be

considered as unusual. See Help for more details on these measures.

e Example of performing a simple linear regression

You are a manufacturer who wishes to easily obtain a quality measure on a

product, but the procedure is expensive. However, there is a quick-and-dirty

way of doing the same thing that is much less expensive but also is slightly less

precise. You examine the relationship between the two scores to see if you can

Y

p n

STREGRSN.MK5 Page 9 Friday, December 17, 1999 12:23 PM

Chapter 2 Regression

2-10 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

predict the desired score (Score2) from the score that is easy to obtain (Score1). You

also obtain a prediction interval for an observation with Score1 being 8.2.

1 Open the worksheet EXH_REGR.MTW.

2 Choose Stat Regression Regression.

3 In Response, enter Score2. In Predictors, enter Score1.

4 Click Options.

5 In Predict intervals for new observations, type 8.2. Click OK in each dialog box.

Session

window

output

Regression Analysis: Score2 versus Score1

The regression equation is

Score2 = 1.12 + 0.218 Score1

Predictor Coef SE Coef T P

Constant 1.1177 0.1093 10.23 0.000

Score1 0.21767 0.01740 12.51 0.000

S = 0.1274 R-Sq = 95.7% R-Sq(adj) = 95.1%

Analysis of Variance

Source DF SS MS F P

Regression 1 2.5419 2.5419 156.56 0.000

Residual Error 7 0.1136 0.0162

Total 8 2.6556

Unusual Observations

Obs Score1 Score2 Fit SE Fit Residual St Resid

9 7.50 2.5000 2.7502 0.0519 -0.2502 -2.15R

R denotes an observation with a large standardized residual

Predicted Values for New Observations

New Obs Fit SE Fit 95.0% CI 95.0% PI

1 2.9026 0.0597 ( 2.7614, 3.0439) ( 2.5697, 3.2356)

Values of Predictors for New Observations

New Obs Score1

1 8.20

Interpreting the results

The regression procedure fits the model

Y

0

1

X + + =

STREGRSN.MK5 Page 10 Friday, December 17, 1999 12:23 PM

Regression Regression

MINITAB Users Guide 2 2-11

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

where Y is the response, X is the predictor,

0

and

1

are the regression coefficients, and

is an error term having a normal distribution with mean of zero and standard

deviation . MINITAB estimates

0

by b

0

,

1

by b

1

, and by s. The fitted equation is then

where is called the predicted or fitted value. In this example, b

0

is 1.12 and b

1

is

0.218.

Table of Coefficients. The first table in the output gives the estimated coefficients, b

0

and b

1

, along with their standard errors. In addition, a t-value that tests whether the

null hypothesis of the coefficient is equal to zero and the corresponding p-value is

given. In this example, the p-values that are used to test whether the constant and slope

are equal to zero are printed as 0.000, because MINITAB rounds these values to three

decimal points. These p-values are actually less than 0.0005. These values indicate that

there is sufficient evidence that the coefficients are not zero for likely Type I error rates

( levels).

S = 0.1274. This is an estimate of , the estimated standard deviation about the

regression line. Note that

R-Sq = 95.7%. This is R

2

, also called the coefficient of determination. Note that R

2

=

Correlation (Y, )

2

. Also,

R

2

= (SS Regression) / (SS Total)

The R

2

value is the proportion of variability in the Y variable (in this example, Score2)

accounted for by the predictors (in this example, Score1).

R-Sq(adj) = 95.1%. This is R

2

adjusted for degrees of freedom. If a variable is added to

an equation, R

2

will get larger even if the added variable is of no real value. To

compensate for this, MINITAB also prints R-Sq (adj), which is an approximately unbiased

estimate of the population R

2

that is calculated by the formula

converted to a percent, where p is the number of coefficients fit in the regression

equation (2 in our example). In the same notation, the usual R

2

is

Analysis of Variance. This table contains sums of squares (abbreviated SS). SS

Regression is sometimes written SS (Regression | b

0

) and sometimes called SS Model. SS

Error is sometimes written as SS Residual, SSE, or RSS. MS Error is often written as MSE.

SS Total is the total sum of squares corrected for the mean. Use the analysis of variance

Y

b

0

b

1

X + =

Y

s

2

MSError =

Y

R

2

adj ( ) 1

SS Error n p ( )

SS Total n 1 ( )

-------------------------------------------- =

R

2

1

SS Error

SS Total

-------------------- =

STREGRSN.MK5 Page 11 Friday, December 17, 1999 12:23 PM

Chapter 2 Regression

2-12 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

table to assess the overall fit. The F-test is a test of the hypothesis H

0

: All regression

coefficients, excepting

0

, are zero.

Unusual Observations. Unusual observations are marked with an X if the predictor is

unusual (large leverage), and they are marked with an R if the response is unusual (large

standardized residual). See Choosing a residual type on page 2-5 and Identifying outliers

on page 2-9. The default is to print only unusual observations. You can choose to print a

full table of fitted values by selecting this option in the Results subdialog box.

The Fit or fitted Y value is sometimes called predicted Y value or . SE Fit is the

(estimated) standard error of the fitted value. St Resid is the standardized residual.

Predicted Values. The interval displayed under 95% CI is the confidence interval for the

population mean of all responses (Score2) that correspond to the given value of the

predictor (Score1 = 8.2). The interval displayed under 95% PI is the prediction interval

for an individual observation taken at Score1 = 8.2. The confidence interval is

appropriate for the data used in the regression. If you have new observations, use the

prediction interval. See Prediction of new observations on page 2-9.

Regression analysis would not be complete without examining residual patterns. The

following multiple regression example and residual plots procedure provide additional

information about regression analysis.

e Example of a multiple regression

As part of a test of solar thermal energy, you measure the total heat flux from homes.

You wish to examine whether total heat flux (Heatflux) can be predicted by insulation,

by the position of the focal points in the east, south, and north directions, and by the

time of day. Data are from [21], page 486. You found, using best subsets regression on

page 2-23, that the best two-predictor model included the variables North and South

and the best three-predictor added the variable East. You would like to evaluate the

three-predictor model using multiple regression.

1 Open the worksheet EXH_REGR.MTW.

2 Choose Stat Regression Regression.

3 In Response, enter Heatflux.

4 In Predictors, enter North South East. Click OK.

Y

Regression Regression

MINITAB Users Guide 2 2-13

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Session

window

output

Regression Analysis: HeatFlux versus North, South, East

The regression equation is

HeatFlux = 389 - 24.1 North + 5.32 South + 2.12 East

Predictor Coef SE Coef T P

Constant 389.17 66.09 5.89 0.000

North -24.132 1.869 -12.92 0.000

South 5.3185 0.9629 5.52 0.000

East 2.125 1.214 1.75 0.092

S = 8.598 R-Sq = 87.4% R-Sq(adj) = 85.9%

Analysis of Variance

Source DF SS MS F P

Regression 3 12833.9 4278.0 57.87 0.000

Residual Error 25 1848.1 73.9

Total 28 14681.9

Source DF Seq SS

North 1 10578.7

South 1 2028.9

East 1 226.3

Unusual Observations

Obs North HeatFlux Fit SE Fit Residual St Resid

4 17.5 230.70 210.20 5.03 20.50 2.94R

22 17.6 254.50 237.16 4.24 17.34 2.32R

R denotes an observation with a large standardized residual

Interpreting the results

MINITAB fits the regression model

where Y is the response, X

1

, X

2

, and X

3

are the predictors,

0

,

1

,

2

, and

3

are the

regression coefficients, and is an error term having a normal distribution with mean of

0 and standard deviation .

The multiple regression output is similar to the simple regression output, but it also

includes the sequential sums of squares. Sequential sums of squares differ from

t-statistics. T-statistics test the null hypothesis that each coefficient is zero, given that all

other variables are present in the model. The sequential sums of squares are the unique

sums of squares of the current variable, given the sums of squares of any previously

entered variables.

For example, in the sequential sums of squares column of the Analysis of Variance table,

the value for North (10578.7) is the sums of squares for North; the value for South

(2028.9) is the unique sums of squares for South given the sums of squares for North;

Y

0

1

X

1

2

X

2

3

X

3

e + + + + =

STREGRSN.MK5 Page 13 Friday, December 17, 1999 12:23 PM

Chapter 2 Stepwise Regression

2-14 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

and the value for East (226.3) is the unique sums of squares for East given the sums of

squares of North and South.

The first line in the sequential sums of squares table gives SS (b

1

| b

0

), or the reduction

in SS Error due to fitting the b

1

term (an equivalent is to use X1 as a predictor),

assuming that you have already fit b

0

. The next line gives SS (b

2

| b

0

, b

1

), or the

reduction in SS Error due to fitting the b

2

term, assuming that you have already fit the

terms b

0

and b

1

. The next line is SS (b

3

| b

0

, b

1

, b

2

), and so on. If you want a different

sequence, say SS (b

2

| b

0

, b

3

), then repeat the regression procedure and enter X3 first,

then X2. MINITAB does not print p-values for the sequential sums of squares. Except for

the last sequential sums of squares, the mean square error should not be used to test

the significance of these terms.

In this example, t-test p-values of less than 0.0005 indicate that there is significant

evidence that the coefficients of variables North and South are not zero. The coefficient

of the variable East, however, has an t-test p-value of 0.092. If the evidence for the

coefficient not being zero appears insufficient and if it adds little to the prediction, you

may choose the more parsimonious model with predictors North and South. Make this

decision only after examining the residuals. In the residual plots example on page 2-28,

you examine the residuals from the model with predictors North and South.

(Alternatively, you could have used the graphs available in the Graphs subdialog box.)

Stepwise Regression

Stepwise regression removes and adds variables to the regression model for the purpose

of identifying a useful subset of the predictors. MINITAB provides three commonly used

procedures: standard stepwise regression (adds and removes variables), forward

selection (adds variables), and backward elimination (removes variables).

Data

Enter response and predictor variables in the worksheet in numeric columns of equal

length so that each row in your worksheet contains measurements on one observation

or subject. MINITAB automatically omits rows with missing values from the calculations.

STREGRSN.MK5 Page 14 Friday, December 17, 1999 12:23 PM

Stepwise Regression Regression

MINITAB Users Guide 2 2-15

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

h To do a stepwise regression

1 Choose Stat Regression Stepwise.

2 In Response, enter the numeric column containing the response (Y) data.

3 In Predictors, enter the numeric columns containing the predictor (X) variables.

4 If you like, use one or more of the options listed below, then click OK.

Options

Stepwise dialog box

I By entering variables in Predictors to include in every model, you can designate a

set of predictor variables that cannot be removed from the model, even when their

p-values are less than the Alpha to enter value.

Method subdialog box

I perform standard stepwise regression (adds and removes variables), forward

selection (adds variables), or backward elimination (removes variables).

I when you choose the Stepwise method, you can enter a starting set of predictor

variables in Enter. These variables are removed if their p-values are greater than the

Alpha to enter value. If you want keep variables in the model regardless of their

p-values, enter them in Predictors to include in every model in the main dialog

box. See Stepwise regression (default) on page 2-16.

I when you choose the Stepwise or Forward selection method, you can set the value of

the for entering a new variable in the model in Alpha to enter. See Stepwise

regression (default) and Forward selection on page 2-17.

I when you choose the Stepwise or Backward elimination method, you can set the

value of for removing a variable from the model in Alpha to remove. See Stepwise

regression (default) and Backward elimination on page 2-17.

STREGRSN.MK5 Page 15 Friday, December 17, 1999 12:23 PM

Chapter 2 Stepwise Regression

2-16 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Options subdialog box

I display the next best alternate predictors up to the number requested. If a new

predictor is entered into the model, MINITAB displays the predictor which was the

second best choice, the third best choice, and so on, up to the requested number.

I set the number of steps between pauses. See User intervention on page 2-17.

I exclude the intercept term from the regression by unchecking Fit Intercept. See

Regression through the origin on page 2-7.

Method

MINITAB provides three commonly used procedures: standard stepwise regression (page

2-16), forward selection (page 2-17), and backward elimination (page 2-17)

Stepwise regression (default)

The basic method of stepwise regression is to calculate an F-statistic for each variable in

the model. Suppose the model contains X1, , Xj. Then the F-statistic for Xi is

with 1 and n j 1 degrees of freedom.

MINITAB then determines the corresponding p-value. If the p-value for any variable is

greater than Alpha to remove, the variable with the largest p-value is removed from

the model. The regression equation is calculated for this smaller model, the results are

printed, and the procedure proceeds to a new step.

If no variable can be removed, the procedure attempts to add a variable. An F-statistic is

calculated for each variable not yet in the model. Suppose the model, at this stage,

contains X1, , Xp. Then the F-statistic for a new variable, X

j + 1

is

MINITAB then determines the corresponding p-value. The variable with the smallest

p-value is then added, provided its p-value is smaller than Alpha to enter. Adding this

variable is equivalent to choosing the variable with the largest partial correlation or to

choosing the variable that most effectively reduces the error SS. The regression equation

is then calculated, results are displayed, and the procedure goes to a new step.

When no more variables can be entered into or removed from the model, the stepwise

procedure ends.

SSE X1 X i 1 ( ) X i 1 + ( ) Xj , , , , , [ ] SSE X1 Xj , , [ ]

MSE X1 Xj , , [ ]

-------------------------------------------------------------------------------------------------------------------------------------------------

SSE X1 Xj , , [ ] SSE X1 Xj X

j 1 +

, , , [ ]

MSE X1 Xj X

j 1 +

, , , [ ]

---------------------------------------------------------------------------------------------------------- -

STREGRSN.MK5 Page 16 Friday, December 17, 1999 12:23 PM

Stepwise Regression Regression

MINITAB Users Guide 2 2-17

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Forward selection

This procedure adds variables to the model using the same method as the stepwise

procedure. Once added, however, a variable is never removed. The forward selection

procedure ends when none of the candidate variables have a p-value smaller than

Alpha to enter.

Backward elimination

This procedure starts with the model that contains all the predictors and then removes

variables, one at a time, using the same method as the stepwise procedure. No variable,

however, can re-enter the model. The backward elimination procedure ends when none

of the variables included the model have a p-value greater than Alpha to remove.

User intervention

Stepwise proceeds automatically by steps and then pauses. You can set the number of

steps between pauses in the Options subdialog box.

The number of steps can start at one with the default and maximum determined by the

output width. Set a smaller value if you wish to intervene more often. You must check

Editor Enable Commands in order to intervene and use the procedure interactively.

If you do not, the procedure will run to completion without pausing.

At the pause, MINITAB displays a MORE? prompt. At this prompt, you can continue the

display of steps, terminate the procedure, or intervene by typing a subcommand.

To Type

display another page of steps (or until

no more predictors can enter or leave

the model)

YES

terminate the procedure NO

enter a set of variables ENTER CC

remove a set of variables REMOVE CC

force a set of variables to be in model FORCE CC

display the next best alternate predictors BEST K

set the number of steps between pauses STEPS K

change F to enter FENTER K

change F to remove FREMOVE K

change to enter AENTER K

change to remove AREMOVE K

STREGRSN.MK5 Page 17 Friday, December 17, 1999 12:23 PM

Chapter 2 Stepwise Regression

2-18 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Use of variable selection procedures

Variable selection procedures can be a valuable tool in data analysis, particularly in the

early stages of building a model. At the same time, these procedures present certain

dangers. Here are some considerations:

I Since the procedures automatically snoop through many models, the model

selected may fit the data too well. That is, the procedure can look at many

variables and select ones which, by pure chance, happen to fit well.

I The three automatic procedures are heuristic algorithms, which often work very well

but which may not select the model with the highest R

2

value (for a given number of

predictors).

I Automatic procedures cannot take into account special knowledge the analyst may

have about the data. Therefore, the model selected may not be the best from a

practical point of view.

e Example of a stepwise regression

Students in an introductory statistics course participated in a simple experiment. Each

student recorded his or her height, weight, gender, smoking preference, usual activity

level, and resting pulse. They all flipped coins, and those whose coins came up heads

ran in place for one minute. Afterward, the entire class recorded their pulses once more.

You wish to find the best predictors for the second pulse rate.

1 Open the worksheet PULSE.MTW.

2 Press c+M to make the Session window active.

3 Check Editor Enable Commands.

4 Choose Stat Regression Stepwise.

5 In Response, enter Pulse2.

6 In Predictors, enter Pulse1 RanWeight.

7 Click Options.

8 In Number of steps between pauses, enter 2. Click OK in each dialog box.

9 At the first More? prompt, type Yes.

10 At the second More? prompt, type No.

STREGRSN.MK5 Page 18 Friday, December 17, 1999 12:23 PM

Stepwise Regression Regression

MINITAB Users Guide 2 2-19

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Session

window

output

Stepwise Regression: Pulse2 versus Pulse1, Ran, ...

F-to-Enter: 4 F-to-Remove: 4

Response is Pulse2 on 6 predictors, with N = 92

Step 1 2

Constant 10.28 44.48

Pulse1 0.957 0.912

T-Value 7.42 9.74

P-Value 0.000 0.000

Ran -19.1

T-Value -9.05

P-Value 0.000

S 13.5 9.82

R-Sq 37.97 67.71

R-Sq(adj) 37.28 66.98

C-p 103.2 13.5

More? (Yes, No, Subcommand, or Help)

Step 3

Constant 42.62

Pulse1 0.812

T-Value 8.88

P-Value 0.000

Ran -20.1

T-Value -10.09

P-Value 0.000

Sex 7.8

T-Value 3.74

P-Value 0.000

S 9.18

R-Sq 72.14

R-Sq(adj) 71.19

C-p 1.9

More? (Yes, No, Subcommand, or Help)

Interpreting the results

This example uses six predictors. You requested that MINITAB do two steps of the

automatic stepwise procedure, display the results, and allow you to intervene.

The first page of output gives results for the first two steps. In step 1, the variable

Pulse1 entered the model; in step 2, the variable Ran entered. No variables were

removed on either of the first two steps. For each model, MINITAB displays the constant

STREGRSN.MK5 Page 19 Friday, December 17, 1999 12:23 PM

Chapter 2 Best Subsets Regression

2-20 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

term, the coefficient and its t-value for each variable in the model, S (square root of

MSE), and R

2

.

Because you answered YES at the MORE? prompt, the automatic procedure continued

for one more step, adding the variable Sex. At this point, no more variables could enter

or leave, so the automatic procedure stopped and again allowed you to intervene.

Because you do not want to intervene, you typed NO.

The stepwise output is designed to present a concise summary of a number of fitted

models. If you want more information on any of the models, you can use the regression

procedure (page 2-3).

Best Subsets Regression

Best subsets regression generates regression models using the maximum R

2

criterion by

first examining all one-predictor regression models and then selecting the two models

giving the largest R

2

. MINITAB displays information on these models, examines all

two-predictor models, selects the two models with the largest R

2

, and displays

information on these two models. This process continues until the model contains all

predictors.

Data

Enter response and predictor variables in the worksheet in numeric columns of equal

length so that each row in your worksheet contains measurements on one unit or

subject. MINITAB automatically omits rows with missing values from all models.

When there are a large number of predictors, there are many subsets. There is a limit to

the number of free predictors (variables which may optionally enter the model). If you

have m free predictors and force q predictors to be in the model, then (m q) must be

less than or equal to 31.

In general, best subsets regression can take a long time when there are 15 or more free

predictors. The length of time, however, varies with the data set. If you have to analyze

a very large data set, consider forcing certain predictors to be in the model, thereby

decreasing the number of free variables.

STREGRSN.MK5 Page 20 Friday, December 17, 1999 12:23 PM

Best Subsets Regression Regression

MINITAB Users Guide 2 2-21

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

h To do a best subsets regression

1 Choose Stat Regression Best Subsets.

2 In Response, enter the numeric column containing the response (Y) data.

3 In Free predictors, enter up to 31 numeric columns containing the predictor (X)

variables. MINITAB can include any of these free predictors if they meet the specified

criteria (for example, greater than F to enter).

4 If you like, use one or more of the options listed below, then click OK.

Options

Best Subsets Regression dialog box

I force a set of predictors to be in the model by entering these variables in Predictors

in all models. These variables cannot be removed, even when their F-statistics are

less than the F to Enter value.

Options subdialog box

I display information for a specified number of free variables in the models. You can

give a minimum and maximum. For example, if you specify 3 as the minimum and 6

as the maximum, MINITAB will display the best 3, 4, 5, and 6 variable models (this

number does not include variables forced into the model).

I display information about the best models of each variable number. You can

specify the number of best models to display. For example, if you specify 3,

MINITAB will display the best, second best, and third best models for each number of

variables. You can enter a value from 1 to 5 (the default is 2).

I exclude the intercept term from the regression by unchecking Fit Interceptsee

Regression through the origin on page 2-7.

STREGRSN.MK5 Page 21 Friday, December 17, 1999 12:23 PM

Chapter 2 Best Subsets Regression

2-22 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Using the best subsets regression procedure

The best subsets regression procedure can be used to select a group of likely models for

further analysis. The general method is to select the smallest subset that fulfills certain

statistical criteria. The reason that you would use a subset of variables rather than a full

set is because the subset model may actually estimate the regression coefficients and

predict future responses with smaller variance than the full model using all predictors

[15].

The statistics R

2

, adjusted R

2

, C

p

, and s (square root of MSE) are calculated by the best

subsets procedure and can be used as comparison criteria.

Typically, you would only consider subsets that provide the largest R

2

value. However,

R

2

always increases with the size of the subset. For example, the best 5-predictor model

will always have a higher R

2

than the best 4-predictor model. Therefore, R

2

is most

useful when comparing models of the same size. When comparing models with the

same number of predictors, choosing the model with the highest R

2

is equivalent to

choosing the model with the smallest SSE.

Use adjusted R

2

and C

p

to compare models with different numbers of predictors. In this

case, choosing the model with the highest adjusted R

2

is equivalent to choosing the

model with the smallest mean square error (MSE). If adjusted R

2

is negative (usually

when there is a large number of predictors and small R

2

) then MINITAB sets the adjusted

R

2

to zero.

The C

p

statistic is given by the formula

where SSE

p

is SSE for the best model with p parameters (including the intercept, if it is

in the equation), and MSE

m

is the mean square error for the model with all m

predictors.

In general, look for models where C

p

is small and close to p. If the model is adequate

(i.e., fits the data well), then the expected value of C

p

is approximately equal to p (the

number of parameters in the model). A small value of C

p

indicates that the model is

relatively precise (has small variance) in estimating the true regression coefficients and

predicting future responses. This precision will not improve much by adding more

predictors. Models with considerable lack-of-fit have values of C

p

larger than p. See [15]

for additional information on C

p

.

Exercise caution when using variable selection procedures such as best subsets (and

stepwise regression). These procedures are automatic and therefore do not consider the

practical importance of any of the predictors. In addition, anytime you fit a model to

data, the goodness of the fit comes from two basic sources:

I fitting the underlying structure of the data (a structure that will appear in other data

sets gathered in the same way)

I fitting the peculiarities of the one particular data set you analyze

C

p

SSE

p

MSE

m

---------------- - n 2p ( ) =

STREGRSN.MK5 Page 22 Friday, December 17, 1999 12:23 PM

Best Subsets Regression Regression

MINITAB Users Guide 2 2-23

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Unfortunately, when you search through many models to find the best, as you do in

best subsets regression, a good fit is often chosen largely for the second reason. There

are two ways that you can verify a model obtained by a variable selection procedure.

You can

I verify the model using a new set of data.

I take the original data set and randomly divide it into two parts. Then use the variable

selection procedure on one part to select a model and verify the fit using the second

part.

e Example of best subsets regression

Total heat flux is measured as part of a solar thermal energy test. You wish to see how

total heat flux is predicted by other variables: insolation, the position of the focal points

in the east, south, and north directions, and the time of day. Data are from

Montgomery and Peck [21], page 486.

1 Open the worksheet EXH_REGR.

2 Choose Stat Regression Best Subsets.

3 In Response, enter Heatflux.

4 In Free Predictors, enter Insolation-Time. Click OK.

Session

window

output

Best Subsets Regression: HeatFlux versus Insolation, East, ...

Response is HeatFlux

I

n

s

o S N

l E o o T

a a u r i

t s t t m

Vars R-Sq R-Sq(adj) C-p S i t h h e

1 72.1 71.0 38.5 12.328 X

1 39.4 37.1 112.7 18.154 X

2 85.9 84.8 9.1 8.9321 X X

2 82.0 80.6 17.8 10.076 X X

3 87.4 85.9 7.6 8.5978 X X X

3 86.5 84.9 9.7 8.9110 X X X

4 89.1 87.3 5.8 8.1698 X X X X

4 88.0 86.0 8.2 8.5550 X X X X

5 89.9 87.7 6.0 8.0390 X X X X X

Interpreting the results

Each line of the output represents a different model. Vars is the number of variables or

predictors in the model. The statistics R

2

, adjusted R

2

, C

p

, and s are displayed next (R

2

STREGRSN.MK5 Page 23 Friday, December 17, 1999 12:23 PM

Chapter 2 Fitted Line Plot

2-24 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

and adjusted R

2

are converted to percentages). Predictors that are present in the model

are indicated by an X.

In this example, the best one-predictor model uses North (R

2

adj = 71.0) and the

second-best one-predictor model uses Insolation (R

2

adj = 37.1). Moving from the best

one-predictor model to the best two-predictor model increased the adjusted R

2

from

71.0 to 84.8. R

2

usually increases slightly as more predictors are added even when the

new predictors do not improve the model. The best two-predictor model might be

considered as the minimum fit. The multiple regression example on page 2-12 and the

residual plots example on page 2-28 indicate that adding the variable East does not

improve the fit of the model.

Fitted Line Plot

This procedure performs regression with linear and polynomial (second or third order)

terms, if requested, of a single predictor variable and plots a regression line through the

data, on the actual or log

10

scale. Polynomial regression is one method for modeling

curvature in the relationship between a response variable (Y) and a predictor variable

(X) by extending the simple linear regression model to include X

2

and X

3

as predictors.

Data

Enter your response and single predictor variables in the worksheet in numeric columns

of equal length so that each row in your worksheet contains measurements on one unit

or subject. MINITAB automatically omits rows with missing values from the calculations.

h To do a fitted line plot

1 Choose Stat Regression Fitted Line Plot.

2 In Response (Y), enter the numeric column containing the response data.

3 In Predictor (X), enter the numeric column containing the predictor variable.

4 If you like, use one or more of the options listed below, then click OK.

STREGRSN.MK5 Page 24 Friday, December 17, 1999 12:23 PM

Fitted Line Plot Regression

MINITAB Users Guide 2 2-25

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Options

Fitted Line Plot dialog box

I choose a linear (default), quadratic, or cubic regression model to automatically

include all lower order terms. See Polynomial regression model choices on page 2-25.

Options subdialog box

I transform the y-variable by log

10

Y. You can also choose to display the y-scale in the

log

10

scale.

I transform the x-variable by log

10

X. You can also choose to display the plot x scale in

the log

10

scale. If you use this option with polynomials of order greater than one,

then the polynomial regression will be based on powers of the log

10

X.

I display confidence bands and prediction bands about the regression line. You can

also change the confidence level from the default of 95%.

I replace the default title with your own title.

Storage subdialog box

I store the residuals, fits, and regression model coefficients (b

0

, b

1

, b

2

, up to b

3

down

the column, where b

i

is the coefficient of the i

th

power of the predictor or

transformed predictor).

I store the scaled residuals and scaled fits when using the y-variable transformation,

log

10

Y.

Polynomial regression model choices

You can fit the following linear, quadratic, or cubic regression models:

Another way of modeling curvature is to generate additional models by using the log

10

of X and/or Y for linear, quadratic, and cubic models. In addition, taking the log

10

of Y

may be used to reduce right-skewness or nonconstant variance of residuals.

Model type Order Statistical model

linear first

quadratic second

cubic third

Y

0

1

+ X + =

Y

0

1

+ X

2

X

2

+ + =

Y

0

1

+ X

2

X

2

3

X

3

+ + + =

STREGRSN.MK5 Page 25 Friday, December 17, 1999 12:23 PM

Chapter 2 Fitted Line Plot

2-26 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

e Example of plotting a fitted regression line

You are studying the relationship between a particular machine setting and the amount

of energy consumed. This relationship is known to have considerable curvature, and

you believe that a log transformation of the response variable will produce a more

symmetric error distribution. You choose to model the relationship between the

machine setting and the amount of energy consumed with a quadratic model.

1 Open the worksheet EXH_REGR.MTW.

2 Choose Stat Regression Fitted Line Plot.

3 In Response (Y), enter EnergyConsumption.

4 In Predictor (X), enter MachineSetting.

5 Under Type of Regression Model, choose Quadratic.

6 Click Options. Check Logten of Y, Display logscale for Y variable, Display

confidence bands, and Display prediction bands. Click OK in each dialog box.

Session

window

output

Polynomial Regression Analysis: EnergyConsum versus MachineSetti

The regression equation is

log(EnergyConsum) = 7.06962 - 0.698628 MachineSetti

+ 0.0173974 MachineSetti**2

S = 0.167696 R-Sq = 93.1 % R-Sq(adj) = 91.1 %

Analysis of Variance

Source DF SS MS F P

Regression 2 2.65326 1.32663 47.1743 0.000

Error 7 0.19685 0.02812

Total 9 2.85012

Source DF Seq SS F P

Linear 1 0.03688 0.1049 0.754

Quadratic 1 2.61638 93.0370 0.000

Graph

window

output

STREGRSN.MK5 Page 26 Friday, December 17, 1999 12:23 PM

Residual Plots Regression

MINITAB Users Guide 2 2-27

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Interpreting the results

The quadratic model (p-value = 0.000, or actually p-value < 0.0005) appears to provide

a good fit to the data. The R

2

indicates that machine setting accounts for 93.1% of the

variation in log

10

of the energy consumed. A visual inspection of the plot reveals that

the data are evenly spread about the regression line, implying no systematic lack-of-fit.

The lines labeled CI are the 95% confidence limits for the log

10

of energy consumed.

The lines labeled PI are the 95% prediction limits for new observations.

Residual Plots

You can generate a set of plots to use for residual analysis by storing fits and residuals

using another procedure, such as regression, and then using the Residual Plots

procedure to produce a normal score plot, a chart of individual residuals, a histogram of

residuals, and a plot of fits versus residuals, all on the same graph.

Data

You must save a column of residuals and a column of fits from another MINITAB

procedure. MINITAB automatically omits rows with missing values from the calculations.

h To display the residual plots

1 Choose Stat Regression Residual Plots.

2 In Fits, enter the column containing stored fits.

3 In Residuals, enter the column containing the stored residuals.

4 If you like, use the option listed below, then click OK.

Options

You can replace the default title with your own title.

STREGRSN.MK5 Page 27 Friday, December 17, 1999 12:23 PM

Chapter 2 Residual Plots

2-28 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

e Example of residual plots

You examine the residuals from the best two-predictor model of the best subsets

regression example on page 2-23. You determined in the multiple regression example

on page 2-12 that adding the third variable from the best three-predictor model may

not add appreciably to the fit. You now examine residual patterns from the best

two-predictor model to further examine goodness-of-fit.

Step 1: Store the residuals and fits from a regression analysis

1 Open the worksheet EXH_REGR.MTW.

2 Choose Stat Regression Regression.

3 In Response, enter Heatflux.

4 In Predictors, enter South North.

5 Click Storage. Check Fits and Standardized residuals.

6 Click OK in each dialog box.

Step 2: Generate the residual plots

1 Choose Stat Regression Residual Plots.

2 In Fits, enter the column containing the stored fits.

3 In Residuals, enter the column containing the stored residuals. Click OK.

Session

window

output

Residual Plots

TEST 1. One point more than 3.00 sigmas from center line.

Test Failed at points: 22

TEST 2. 9 points in a row on same side of center line.

Test Failed at points: 16

Graph

window

output

STREGRSN.MK5 Page 28 Friday, December 17, 1999 12:23 PM

Logistic Regression Overview Regression

MINITAB Users Guide 2 2-29

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Interpreting the results

The residuals plots procedure generates four plots in one graph window. The normal

plot shows an approximately linear pattern that is consistent with a normal distribution.

Similarly, the histogram exhibits a pattern that is consistent with a sample from a

normal distribution. However, the I Chart (a control chart of individual observations)

reveals that one point labeled with a 1the twenty-second valueis outside the three

sigma limits, and another point labeled with a 2 is flagged because it is the ninth in a

row on the same side of the mean.

The plot of residuals versus fits shows that the fit tends to be better for higher predicted

values. Investigation shows that the highest residual coincides with the highest value of

the variable East. Including East in the model and repeating the residual plots procedure

showed that no points are flagged as unusual (not shown). The contribution to the fit

by the variable East may warrant further investigation.

Logistic Regression Overview

Both logistic regression and least squares regression investigate the relationship

between a response variable and one or more predictors. A practical difference between

them is that logistic regression techniques are used with categorical response variables,

and linear regression techniques are used with continuous response variables.

MINITAB provides three logistic regression procedures that you can use to assess the

relationship between one or more predictor variables and a categorical response

variable of the following types:

Both logistic and least squares regression methods estimate parameters in the model so that the

fit of the model is optimized. Least squares minimizes the sum of squared errors to obtain

parameter estimates, whereas logistic regression obtains maximum likelihood estimates of the

parameters using an iterative-reweighted least squares algorithm [19].

Tip You can identify points in the plots using the brushing capabilities. See the Brushing

Graphs chapter in MINITAB Users Guide 1.

Variable

type

Number of

categories Characteristics Examples

Binary 2 two levels success, failure

yes, no

Ordinal 3 or more natural ordering

of the levels

none, mild, severe

fine, medium, coarse

Nominal 3 or more no natural ordering

of the levels

blue, black, red, yellow

sunny, rainy, cloudy

STREGRSN.MK5 Page 29 Friday, December 17, 1999 12:23 PM

Chapter 2 Logistic Regression Overview

2-30 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

How to specify the model terms

The logistic regression procedures can fit models with:

I up to 9 factors and up to 50 covariates

I crossed or nested factorssee Crossed vs. nested factors on page 3-19

I covariates that are crossed with each other or with factors, or nested within factors

Model continuous predictors as covariates and categorical predictors as factors. Here

are some examples. A is a factor and X is a covariate.

The model for logistic regression is a generalization of the model used in MINITABs

general linear model (GLM) procedure. Any model fit by GLM can also be fit by the

logistic regression procedures. For a discussion of specifying models in general, see

Specifying the model terms on page 3-20 and Specifying reduced models on page 3-21. In

the logistic regression commands, MINITAB assumes any variable in the model is a

covariate unless the variable is specified as a factor. In contrast, GLM assumes that any

variable in the model is a factor unless the variable is specified as a covariate.

Model restrictions

Logistic regression models in MINITAB have the restrictions as GLM models:

I There must be enough data to estimate all the terms in your model, so that the

model is full rank. MINITAB will automatically determine if your model is full rank and

display a message. In most cases, eliminating some unimportant high order

interactions in your model should solve your problem.

I The model must be hierarchical. In a hierarchical model, if an interaction term is

included, all lower order interactions and main effects that comprise the interaction

term must appear in the model.

Model terms

A X AX fits a full model with a covariate crossed with a factor

A | X an alternative way to specify the previous model

A X XX fits a model with a covariate crossed with itself making a squared term

A X(A) fits a model with a covariate nested within a factor

STREGRSN.MK5 Page 30 Friday, December 17, 1999 12:23 PM

Logistic Regression Overview Regression

MINITAB Users Guide 2 2-31

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Reference levels for factors

MINITAB needs to assign one factor level as the reference level, meaning that the

interpretation of the estimated coefficients is relative to this level. MINITAB designates

the reference level based on the data type:

I For numeric factors, the reference level is the level with the least numeric value.

I For date/time factors, the reference level is the level with the earliest date/time.

I For text factors, the reference level is the level that is first in alphabetical order.

You can change the default reference level in the Options subdialog box.

For more information, Interpreting the parameter estimates relative to the event and the

reference levels on page 2-39.

Logistic regression creates a set of design variables for each factor in the model. If there

are k levels, there will be k1 design variables and the reference level will be coded as 0.

Here are two examples of the default coding scheme:

Reference event for the response variable

MINITAB needs to designate one of the response values as the reference event. MINITAB

defines the reference event based on the data type:

I For numeric factors, the reference event is the greatest numeric value.

I For date/time factors, the reference event is the most recent date/time.

I For text factors, the reference event is the last in alphabetical order.

You can change the default reference event in the Options subdialog box.

For more information, Interpreting the parameter estimates relative to the event and the

reference levels on page 2-39.

Note If you have defined a value order for a text factor, the default rule above does not

apply. MINITAB designates the first value in the defined order as the reference value. See

Ordering Text Categories in the Manipulating Data chapter in MINITAB Users Guide 1.

Factor A with 4 levels

(1 2 3 4)

Factor B with 3 levels

(Temp Pressure Humidity)

reference

level

A1 A2 A3 reference

level

B1 B2

1 0 0 0 Humidity 0 0

2 1 0 0 Pressure 1 0

3 0 1 0 Temp 0 1

4 0 0 1

Note If you have defined a value order for a text factor, the default rule above does not

apply. MINITAB designates the last value in the defined order as the reference event. See

Ordering Text Categories in the Manipulating Data chapter in MINITAB Users Guide 1.

STREGRSN.MK5 Page 31 Friday, December 17, 1999 12:23 PM

Chapter 2 Logistic Regression Overview

2-32 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Worksheet structure

Data used for input to the logistic regression procedures may be arranged in two

different ways in your worksheet: as raw (categorical) data, or as frequency (collapsed)

data. For binary logistic regression, there are three additional ways to arrange the data

in your worksheet: as successes and trials, as successes and failures, or as failures and

trials. These ways are illustrated here for the same data.

The response entered as raw data or as frequency data

C1 C2 C3 C4

Response Factor Covar

0 1 12

1 1 12

1 1 12

.

.

.

.

.

.

.

.

.

1 1 12

0 2 12

1 2 12

.

.

.

.

.

.

.

.

.

1 2 12

.

.

.

.

.

.

.

.

.

C1 C2 C3 C4

Response Count Factor Covar

0 1 1 12

1 19 1 12

0 1 2 12

1 19 2 12

0 5 1 24

1 15 1 24

0 4 2 24

1 16 2 24

0 7 1 50

1 13 1 50

0 8 2 50

1 12 2 50

0 11 1 125

1 2 1 125

0 9 2 125

1 11 2 125

0 19 1 200

1 1 1 200

0 18 2 200

1 2 2 200

Raw Data: one row for each

observation

Frequency Data: one row for each

combination of factor and

1

19

1

19

STREGRSN.MK5 Page 32 Friday, December 17, 1999 12:23 PM

Binary Logistic Regression Regression

MINITAB Users Guide 2 2-33

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

The binary response entered as the number of successes, failures, or trials

Enter one row for each combination of factor and covariate.

Use caution when viewing large regression coefficients

If the absolute value of the regression coefficient is large, exercise caution in judging the

p-value of the test. When the absolute regression coefficients are large, their calculated

standard errors can be too large, leading you to conclude that they are not significant

[13]. If you have one or more large absolute regression coefficients for the factor(s) and/

or covariate(s), the best test is to perform logistic regression both with and without

these terms and make a conclusion based upon the change in the log-likelihood.

If you do test the significance of model terms in this way, your test statistic will be

2(log-likelihood from reduced model log-likelihood from full model). To compute

the p-value for this test, choose Calc Probability Distributions Chi-square. In

Degrees of freedom, enter the model degrees of freedom from full model model

degrees of freedom from reduced model, where the model degrees of freedom are the

number of estimated coefficients. Check Input constant, and enter the test statistic

from above. Store the answer in a constant, say k1, and then calculate the p-value as

1 k1 using Calc Calculator.

Binary Logistic Regression

Use binary logistic regression to perform logistic regression on a binary response

variable. A binary variable only has two possible values, such as presence or absence of a

particular disease. A model with one or more predictors is fit using an

iterative-reweighted least squares algorithm to obtain maximum likelihood estimates of

the parameters [19].

Successes and Trials

C1 C2 C3 C4

S T Factor Covar

19 20 1 12

19 20 2 12

15 20 1 24

16 20 2 24

13 20 1 50

12 20 2 50

9 20 1 125

11 20 2 125

1 20 1 200

2 20 2 200

Successes and Failures

C1 C2 C3 C4

S F Factor Covar

19 1 1 12

19 1 2 12

15 5 1 24

16 4 2 24

13 7 1 50

12 8 2 50

9 11 1 125

11 9 2 125

1 19 1 200

2 18 2 200

Failures and Trials

C1 C2 C3 C4

F T Factor Covar

1 20 1 12

1 20 2 12

5 20 1 24

4 20 2 24

7 20 1 50

8 20 2 50

11 20 1 125

9 20 2 125

19 20 1 200

18 20 2 200

STREGRSN.MK5 Page 33 Friday, December 17, 1999 12:23 PM

Chapter 2 Binary Logistic Regression

2-34 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Binary logistic regression has also been used to classify observations into one of two

categories, and it may give fewer classification errors than discriminant analysis for some

cases [10], [23].

Data

Your data must be arranged in your worksheet in one of five ways: as raw data, as

frequency data, as successes and trials, as successes and failures, or as failures and trials.

See Worksheet structure on page 2-32.

Factors, covariates, and response data can be numeric, text, or date/time. The reference

level and the reference event depend on the data type. See Reference levels for factors on

page 2-31 and Reference event for the response variable on page 2-31 for details.

The predictors may either be factors (nominal variables) or covariates (continuous

variables). Factors may be crossed or nested. Covariates may be crossed with each other

or with factors, or nested within factors.

The model can include up to 9 factors and 50 covariates. Unless you specify a predictor

in the model as a factor, the predictor is assumed to be a covariate. Model continuous

predictors as covariates and categorical predictors as factors. See How to specify the

model terms on page 2-30 for more information.

MINITAB automatically omits observations with missing values from all calculations.

h To do a binary logistic regression

1 Choose Stat Regression Binary Logistic Regression.

2 Do one of the following:

I If your data is in raw form, choose Response and enter the column containing the

response variable.

STREGRSN.MK5 Page 34 Friday, December 17, 1999 12:23 PM

Binary Logistic Regression Regression

MINITAB Users Guide 2 2-35

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

I If your data is in frequency form, choose Response and enter the column

containing the response variable. In Frequency, enter the column containing the

count or frequency variable.

I If your data is in success-trial, success-failure, or failure-trial form, choose Success

with Trial, Success with Failure, or Failure with Trial, and enter the respective

columns in the accompanying boxes.

See Worksheet structure on page 2-32.

3 In Model, enter the model terms. See How to specify the model terms on page 2-30.

4 If you like, use one or more of the options listed below, then click OK.

Options

Binary Logistic Regression dialog box

I include categorical variables (factors) in the model

Graphics subdialog box

I plot delta Pearson

2

, delta deviance, delta based on standardized Pearson

residuals, and delta based on Pearson residuals versus:

the estimated event probability for each distinct factor/covariate pattern

the leverage for each distinct factor/covariate pattern

See Regression diagnostics and residual analysis on page 2-38.

Options subdialog box

I specify the link function: logit (the default), normit (also called probit), or gompit

(also called complementary log-log)see Link functions on page 2-46

I change the reference event of the response or the reference levels for the factors

see Interpreting the parameter estimates relative to the event and the reference levels on

page 2-39

I specify initial values for model parameters or parameter estimates for a validation

modelsee Entering initial values for parameter estimates on page 2-37

I change the maximum number of iterations for reaching convergence (the default is

20)

I change the number of groups for the Hosmer-Lemeshow goodness-of-fit test from

the default of 10see Groups for the Hosmer-Lemeshow goodness-of-fit test on page

2-38

STREGRSN.MK5 Page 35 Friday, December 17, 1999 12:23 PM

Chapter 2 Binary Logistic Regression

2-36 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Results subdialog box

I display the following in the Session window:

no output.

basic information on response, parameter estimates, the log-likelihood, and the

test for all slopes being zero.

the default output, which includes the above output plus three goodness-of-fit

tests (Pearson, deviance, and Hosmer-Lemeshow), a table of observed and

expected frequencies, and measures of association.

the default, along with factor level values, and tests for terms with more than 1

degree of freedom. If you choose the logit link function, MINITAB also prints two

Brown goodness-of-fit tests.

I display the log-likelihood at each iteration of the parameter estimation process.

Storage subdialog box

I store the following diagnostic measures:

Pearson, standardized Pearson, and deviance residuals

changes (delta) in: Pearson

2

, the deviance statistic, and the estimated regression

coefficients based on either standardized Pearson or Pearson residuals when the

respective factor/covariate patterns are removed

leverages, the diagonals of the hat matrix

See Regression diagnostics and residual analysis on page 2-38.

I store the following characteristics of the estimated regression equation:

predicted probabilities of success

estimated model coefficients, their standard errors, and the variance-covariance

matrix of the estimated coefficients

the log-likelihood for the last maximum likelihood iteration

I store the following aggregated data:

the number of occurrences for each factor/covariate pattern

the number of trials for each factor/covariate pattern

Link functions

MINITAB provides three link functionslogit (the default), normit (also called probit),

and gompit (also called complementary log-log)allowing you to fit a broad class of

binary response models. These are the inverse of the cumulative logistic distribution

function (logit), the inverse of the cumulative standard normal distribution function

(normit), and the inverse of the Gompertz distribution function (gompit). This class of

models is defined by:

STREGRSN.MK5 Page 36 Friday, December 17, 1999 12:23 PM

Binary Logistic Regression Regression

MINITAB Users Guide 2 2-37

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

g(

j

) = , where

The link function is the inverse of a distribution function. The link functions and their

corresponding distributions are summarized below (pi in the variance is 3.14159):

You want to choose a link function that results in a good fit to your data.

Goodness-of-fit statistics can be used to compare fits using different link functions.

Certain link functions may be used for historical reasons or because they have a special

meaning in a discipline.

An advantage of the logit link function is that it provides an estimate of the odds ratios.

For a comparison of link functions, see [19].

Entering initial values for parameter estimates

There are several scenarios for which you might enter values for parameter estimates.

For example, you may wish to give starting estimates so that the algorithm converges to

a solution, or you may wish to validate a model with an independent sample.

I ConvergenceThe maximum likelihood solution may not converge if the starting

estimates are not in the neighborhood of the true solution. If the algorithm does not

converge to a solution, you can specify what you think are good starting values for

parameter estimates in Starting estimates for algorithm in the Options subdialog

box.

I ValidationYou may also wish to validate the model with an independent sample.

Typically, this is done by splitting the data into two subsets. Use the first set to

estimate and store the coefficients. If you enter these estimates in Estimates for

validation model in the Options subdialog box, MINITAB will use these values as the

parameter estimates rather than calculating new parameter estimates. Then, you can

assess the model fit for the independent sample.

j

= the probability of a response for the j

th

factor/covariate pattern

g(

j

) = the link function (described below)

0

= the intercept

= a vector of predictor variables associated with the j

th

factor/covariate pattern

= a vector of unknown coefficients associated with the predictors

Name Link function Distribution Mean Variance

logit g(

j

) = log

e

(

j

/ (1

j

)) logistic 0 pi

2

/ 3

normit g(

j

) =

-1

(

j

) normal 0 1

gompit g(

j

) = log

e

(log

e

(1

j

)) Gompertz (Euler constant) pi

2

/ 6

0

x

j

+

x

j

STREGRSN.MK5 Page 37 Friday, December 17, 1999 12:23 PM

Chapter 2 Binary Logistic Regression

2-38 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

In both cases, enter a column with the first entry being the constant estimate, and the

remaining entries corresponding to the model terms in the order in which they appear

in the Model box or the output.

Groups for the Hosmer-Lemeshow goodness-of-fit test

The Hosmer-Lemeshow statistic is the chi-square goodness-of-fit statistic from a 2 (the

number of groups) table. The default number of groups is 10. This may work for a large

number of problems but if the number of distinct factor/covariate patterns is small or

large you may wish to adjust the number of groups. Hosmer and Lemeshow suggest

using a minimum of six groups. See [16] for details.

Regression diagnostics and residual analysis

Following any modeling procedure, it is a good idea to assess the validity of your

model. Logistic regression has a collection of diagnostic plots, goodness-of-fits tests,

and other diagnostic measures to do this. These residuals and diagnostic statistics allow

you to identify factor/covariate patterns that are either poorly fit by the model, have a

strong influence upon the estimated parameters, or which have a high leverage.

MINITAB provides different options for each of these, as listed in the following table (See

Help for computational details). Hosmer and Lemeshow [16] suggest that you interpret

these diagnostics jointly to understand any potential problems with the model.

To identify Use... Which measures...

poorly fit factor/

covariate patterns

Pearson residual the difference between the actual and

predicted observation

standardized

Pearson residual

the difference between the actual and

predicted observation, but standardized to

have = 1

deviance residual deviance residuals, a component of deviance

2

delta chi-square changes in the Pearson

2

when the j

th

factor/

covariate pattern is removed

delta deviance changes in the deviance

when the j

th

factor/

covariate pattern is removed

factor/covariate

patterns with a

strong influence

on parameter

estimates

delta beta changes in the coefficients

when the j

th

factor/

covariate pattern is removedbased on

Pearson residuals

delta beta based

on standardized

Pearson residuals

changes in the coefficients when the j

th

factor/

covariate pattern is removedbased on

standardized Pearson residuals

factor/covariate

patterns with a

large leverage

leverage (Hi) leverages of the j

th

factor/covariate pattern, a

measure of how unusual predictor values are

STREGRSN.MK5 Page 38 Friday, December 17, 1999 12:23 PM

Binary Logistic Regression Regression

MINITAB Users Guide 2 2-39

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

The graphs available in the Graphs subdialog box allow you to visualize some of these

diagnostics jointly; you can plot a measure useful for identifying poorly fit factor/

covariate patterns (delta chi-square or delta deviance) or a measure useful for

identifying a factor/covariate pattern with a strong influence on parameter estimates

(one of the delta beta statistics) versus either the estimated event probability or

leverage. The estimated event probability is the probability of the event, given the data

and model. Leverages are used to assess how unusual the predictor values are (see

Identifying outliers on page 2-9). See [16] for a further discussion of diagnostic plots. You

can use MINITABs graph brushing capabilities to identify points. See the Brushing Graphs

chapter in MINITAB Users Guide 1 for more information.

Interpreting the parameter estimates relative to the event and the

reference levels

The interpretation of the parameter estimates depends on: the link function (see Link

functions on page 2-36), reference event (see Reference event for the response variable on

page 2-31), and reference factor levels (see Reference levels for factors on page 2-31). A

parameter estimate associated with a predictor (factor or covariate) represents the

change in the link function for each unit change in the predictor, while all other

predictors are held constant. A unit change in a factor refers to a comparison of a

certain level to the reference level.

The logit link provides the most natural interpretation of the parameter estimates and is

therefore the default link in MINITAB. A summary of the interpretation follows:

I The odds of a reference event is the ratio of P(event) to P(not event). The parameter

estimate of a predictor (factor or covariate) is the estimated change in the log of

P(event)/P(not event) for each unit change in the predictor, assuming the other

predictors remain constant.

I The parameter estimates can also be used to calculate the odds ratio, or the ratio

between two odds. Exponentiating the parameter estimate of a factor yields the ratio

of P(event)/P(not event) for a certain factor level compared to the reference level.

The odds ratios at different values of the covariate can be constructed relative to

zero. In the covariate case, it may be more meaningful to interpret the odds and not

the odds ratio. Note that a parameter estimate of zero or an odds ratio of one both

imply the same thingthe factor or covariate has no effect.

You can change the event or reference levels in the Options subdialog box if you wish to

change how you view the parameter estimates. To change the event, specify the new

event value in the Event box. To change the reference level for a factor, specify the

factor variable followed by the new reference level in the Reference factor level box.

You can specify reference levels for more than one factor at the same time. If the levels

are text or date/time, enclose them in double quotes.

STREGRSN.MK5 Page 39 Friday, December 17, 1999 12:23 PM

Chapter 2 Binary Logistic Regression

2-40 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

e Example of a binary logistic regression

You are a researcher who is interested in understanding the effect of smoking and

weight upon resting pulse rate. Because you have categorized the responsepulse

rateinto low and high, a binary logistic regression analysis is appropriate to

investigate the effects of smoking and weight upon pulse rate.

1 Open the worksheet EXH_REGR.MTW.

2 Choose Stat Regression Binary Logistic Regression.

3 In Response, enter RestingPulse. In Model, enter Smokes Weight. In Factors

(optional), enter Smokes.

4 Click Graphs. Check Delta chi-square vs probability and Delta chi-square vs

leverage. Click OK.

5 Click Results. Choose In addition, list of factor level values, tests for terms with

more than 1 degree of freedom, and 2 additional goodness-of-fit tests. Click OK

in each dialog box.

Session

window

output

Binary Logistic Regression: RestingPulse versus Smokes, Weight

Link Function: Logit

Response Information

Variable Value Count

RestingP Low 70 (Event)

High 22

Total 92

Factor Information

Factor Levels Values

Smokes 2 No Yes

Logistic Regression Table

Odds 95% CI

Predictor Coef SE Coef Z P Ratio Lower Upper

Constant -1.987 1.679 -1.18 0.237

Smokes

Yes -1.1930 0.5530 -2.16 0.031 0.30 0.10 0.90

Weight 0.02502 0.01226 2.04 0.041 1.03 1.00 1.05

Log-Likelihood = -46.820

Test that all slopes are zero: G = 7.574, DF = 2, P-Value = 0.023

Goodness-of-Fit Tests

Method Chi-Square DF P

Pearson 40.848 47 0.724

Deviance 51.201 47 0.312

Hosmer-Lemeshow 4.745 8 0.784

Brown:

General Alternative 0.905 2 0.636

Symmetric Alternative 0.463 1 0.496

A

B

C

E

D

STREGRSN.MK5 Page 40 Friday, December 17, 1999 12:23 PM

Binary Logistic Regression Regression

MINITAB Users Guide 2 2-41

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Graph

window

output

Table of Observed and Expected Frequencies:

(See Hosmer-Lemeshow Test for the Pearson Chi-Square Statistic)

Group

Value 1 2 3 4 5 6 7 8 9 10 Total

Low

Obs 4 6 6 8 8 6 8 12 10 2 70

Exp 4.4 6.4 6.3 6.6 6.9 7.2 8.3 12.9 9.1 1.9

High

Obs 5 4 3 1 1 3 2 3 0 0 22

Exp 4.6 3.6 2.7 2.4 2.1 1.8 1.7 2.1 0.9 0.1

Total 9 10 9 9 9 9 10 15 10 2 92

Measures of Association:

(Between the Response Variable and Predicted Probabilities)

Pairs Number Percent Summary Measures

Concordant 1045 67.9% Somers D 0.38

Discordant 461 29.9% Goodman-Kruskal Gamma 0.39

Ties 34 2.2% Kendalls Tau-a 0.14

Total 1540 100.0%

G

F

STREGRSN.MK5 Page 41 Friday, December 17, 1999 12:23 PM

Chapter 2 Binary Logistic Regression

2-42 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Interpretation of results

The Session window output contains the following seven parts:

a. Response Informationdisplays the number of missing observations and the number of

observations that fall into each of the two response categories. The response value that has

been designated as the reference event is the first entry under Value and labeled as the event.

In this case, the reference event is low pulse rate (see Reference event for the response variable

on page 2-31).

B Factor Informationdisplays all the factors in the model, the number of levels for each

factor, and the factor level values. The factor level that has been designated as the reference

level is first entry under Values, the subject does not smoke (see Reference levels for factors on

page 2-31).

C Logistic Regression Tableshows the estimated coefficients (parameter estimates), standard

error of the coefficients, z-values, and p-values. When you use the logit link function, you

also see the odds ratio and a 95% confidence interval for the odds ratio.

I From the output, you can see that both Smokes (z = 2.16, p = 0.031) and Weight

(z = 2.04, p = 0.041) have p-values less than 0.05, indicating that there is

sufficient evidence that the parameters are not zero using a significance level of

= 0.05.

I The coefficient of -1.193 for Smokes represents the estimated change in the log of

P(low pulse)/P(high pulse) when the subject smokes compared to when he/she

does not smoke, with the covariate Weight held constant. The coefficient of

0.0250 for Weight is the estimated change in the log of P(low pulse)/P(high pulse)

with a 1 unit (lb) increase in Weight, with the factor Smokes held constant.

I Although there is evidence that the parameter of Weight is not zero, the odds ratio

is very close to one (1.03), indicating that a one pound increase in weight

minimally effects a persons resting pulse rate. A more meaningful difference

would be found if you compared subjects with a larger weight difference (for

example, if the weight unit is 10 pounds, the odds ratio becomes 1.28, indicating

that the odds of a subject having a low pulse increases by 1.28 times with each 10

pound increase in weight).

I For Smokes, the negative coefficient of -1.193 and the odds ratio of 0.30 indicate

that subjects who smoke tend to have a higher resting pulse rate than subjects

who do not smoke. Given that subjects have the same weight, the odds ratio can

be interpreted as the odds of non-smokers in the sample having a low pulse being

30% of the odds of smokers having a low pulse.

D Next, the last Log-Likelihood from the maximum likelihood iterations is displayed along with

the statistic G. This statistic tests the null hypothesis that all the coefficients associated with

predictors equal zero versus these coefficients not all being equal to zero. In this example, G

= 7.574, with a p-value of 0.023, indicating that there is sufficient evidence that at least one of

the coefficients is different from zero, given that your accepted level is greater than 0.023.

STREGRSN.MK5 Page 42 Friday, December 17, 1999 12:23 PM

Binary Logistic Regression Regression

MINITAB Users Guide 2 2-43

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

I Note that for factors with more than 1 degree of freedom, MINITAB performs a

multiple degrees of freedom test with a null hypothesis that all the coefficients

associated with the factor are equal to 0 versus them not all being equal to 0. This

example does not have a factor with more than 1 degree of freedom.

E Goodness-of-Fit Testsdisplays Pearson, deviance, and Hosmer-Lemeshow goodness-of-fit

tests. In addition, two Brown testsgeneral alternative and symmetric alternativeare

displayed because you have chosen the logit link function and the selected option in the

Results subdialog box. The goodness-of-fit tests, with p-values ranging from 0.312 to 0.724,

indicate that there is insufficient evidence to claim that the model does not fit the data

adequately. If the p-value is less than your accepted level, the test would reject the null

hypothesis of an adequate fit.

F Table of Observed and Expected Frequenciesallows you to see how well the model fits

the data by comparing the observed and expected frequencies. There is insufficient evidence

that the model does not fit the data well, as the observed and expected frequencies are

similar. This supports the conclusions made by the Goodness of Fit Tests.

G Measures of Associationdisplay a table of the number and percentage of concordant,

discordant, and tied pairs, as well as common rank correlation statistics. These values

measure the association between the observed responses and the predicted probabilities.

I The table of concordant, discordant, and tied pairs is calculated by pairing the

observations with different response values. Here, you have 70 individuals with a

low pulse and 22 with a high pulse, resulting in 70 22 = 1540 pairs with different

response values. Based on the model, a pair is concordant if the individual with a

low pulse rate has a higher probability of having a low pulse, discordant if the

opposite is true, and tied if the probabilities are equal. In this example, 67.9% of

pairs are concordant and 29.9% are discordant. You can use these values as a

comparative measure of prediction, for example in comparing fits with different

sets of predictors or with different link functions.

I Somers D, Goodman-Kruskal Gamma, and Kendalls Tau-a are summaries of the

table of concordant and discordant pairs. These measures most likely lie between

0 and 1 where larger values indicate that the model has a better predictive ability.

In this example, the measure range from 0.14 to 0.39 which implies less than

desirable predictive ability.

PlotsIn the example, you chose two diagnostic plotsdelta Pearson

2

versus the

estimated event probability and delta Pearson

2

versus the leverage. Delta Pearson

2

for the j

th

factor/covariate pattern is the change in the Pearson

2

when all observations

with that factor/covariate pattern are omitted. These two graphs indicate that two

observations are not well fit by the model (high delta

2

). A high delta

2

can be caused

by a high leverage and/or a high Pearson residual. In this case, a high Pearson residual

caused the large delta

2

, because the leverages are less than 0.1. Hosmer and

Lemeshow indicate that delta

2

or delta deviance greater than 3.84 is large.

If you choose Editor Brush, brush these points, and then click on them, they will be

identified as data values 31 and 66. These are individuals with a high resting pulse, who

STREGRSN.MK5 Page 43 Friday, December 17, 1999 12:23 PM

Chapter 2 Ordinal Logistic Regression

2-44 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

do not smoke, and who have smaller than average weights (Weight = 116, 136

pounds). You might further investigate these cases to see why the model did not fit

them well.

Ordinal Logistic Regression

Use ordinal logistic regression to perform logistic regression on an ordinal response

variable. Ordinal variables are categorical variables that have three or more possible

levels with a natural ordering, such as strongly disagree, disagree, neutral, agree, and

strongly agree. A model with one or more predictors is fit using an iterative-reweighted

least squares algorithm to obtain maximum likelihood estimates of the parameters [19].

Parallel regression lines are assumed, and therefore, a single slope is calculated for each

covariate. In situations where this assumption is not valid, nominal logistic regression,

which generates separate logit functions, is more appropriate.

Data

Your data may be arranged in one of two ways: as raw data or as frequency data. See

Worksheet structure on page 2-32.

Factors, covariates, and response data can be numeric, text, or date/time. The reference

level and the reference event depend on the data type. See Reference levels for factors on

page 2-31 and Reference event for the response variable on page 2-31 for details.

The predictors may either be factors (nominal variables) or covariates (continuous

variables). Factors may be crossed or nested. Covariates may be crossed with each other

or with factors, or nested within factors.

The model can include up to 9 factors and 50 covariates. Unless you specify a predictor

in the model as a factor, the predictor is assumed to be a covariate. Model continuous

predictors as covariates and categorical predictors as factors. See How to specify the

model terms on page 2-30 for more information.

MINITAB automatically omits observations with missing values from all calculations.

STREGRSN.MK5 Page 44 Friday, December 17, 1999 12:23 PM

Ordinal Logistic Regression Regression

MINITAB Users Guide 2 2-45

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

h To do an ordinal logistic regression

1 Choose Stat Regression Ordinal Logistic Regression.

2 Do one of the following:

I If you have raw response data, in Response, enter the numeric column containing

the response data.

I If you have frequency data, in Response, enter the numeric column containing

the response values. Then, in Frequency, enter the variable containing the counts.

See Worksheet structure on page 2-32.

3 In Model, enter the model terms. See How to specify the model terms on page 2-30.

4 If you like, use one or more of the options listed below, then click OK.

Options

Ordinal Logistic Regression dialog box

I include categorical variables (factors) in the model

Options subdialog box

I specify the link function: logit (the default), normit (also called probit), or gompit

(also called complementary log-log)see Link functions on page 2-46

I specify the order of the response values or change the reference levels for the

factorssee Interpreting the parameter estimates relative to the order of response values

and the reference levels on page 2-47

I specify initial values for model parameters or parameter estimates for a validation

modelsee Entering initial values for parameter estimates on page 2-37

I change the maximum number of iterations for reaching convergence (default is 20)

STREGRSN.MK5 Page 45 Friday, December 17, 1999 12:23 PM

Chapter 2 Ordinal Logistic Regression

2-46 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Results subdialog box

I display the following in the Session window:

no output

information on response, parameter estimates, the log-likelihood, and the test for

all slopes being zero

the default output, which includes the above output plus two goodness-of-fit tests

(Pearson and deviance), and measures of association

the default output plus factor level values, and tests for terms with more than one

degree of freedom

I display the log-likelihood at each iteration of the parameter estimation process

Storage subdialog box

I store the following characteristics of the estimated regression equation:

estimated model coefficients, their standard errors, and the variance-covariance

matrix of the estimated coefficients.

the log-likelihood for the last maximum likelihood iteration.

I store the following event probabilities and/or aggregated data:

the number of trials for each factor/covariate pattern.

event probabilities, cumulative event probabilities, and number of occurrences for

each factor/covariate pattern. Note that the number of events (distinct response

values) must be specified to store these values.

Link functions

MINITAB provides three link functionslogit (the default), normit (also called probit),

and gompit (also called complementary log-log)allowing you to fit a broad class of

ordinal response models. These are the inverse of the cumulative logistic distribution

function (logit), the inverse of the cumulative standard normal distribution function

(normit), and the inverse of the Gompertz distribution function (gompit). This class of

models is defined by:

g(

ij

) =

where

k = the number of distinct values of the response or the number of possible events

ij

= the cumulative probability up to and including event i for the j

th

factor/

covariate pattern

g(

ij

) = the link function (described below)

= the constant associated with the i

th

event

i

x

j

+ , i=1, ..., k-1

i

STREGRSN.MK5 Page 46 Friday, December 17, 1999 12:23 PM

Ordinal Logistic Regression Regression

MINITAB Users Guide 2 2-47

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

The link function is the inverse of a distribution function. The link functions and their

corresponding distributions are summarized below ( in the variance is 3.14159):

You want to choose a link function that results in a good fit to your data.

Goodness-of-fit statistics can be used to compare the fits using different link functions.

Certain link functions may be used for historical reasons or because they have a special

meaning in a discipline.

An advantage of the logit link function is that it provides an estimate of the odds ratios.

The logit link function is the default. For a comparison of link functions, see [19].

Interpreting the parameter estimates relative to the order of response

values and the reference levels

The interpretation of the parameter estimates depends on: the link function (see Link

functions on page 2-36), reference event (see Reference event for the response variable on

page 2-31), and reference factor levels (see Reference levels for factors on page 2-31). A

parameter estimate associated with a predictor (factor or covariate) represents the

change in the link function for each unit change in the predictor, while all other

predictors are held constant. A unit change in a factor refers to a comparison of a

certain level to the reference level.

The logit link provides the most natural interpretation of the estimated coefficients and

is therefore the default link in MINITAB. A summary of the interpretation follows:

I The odds of a reference event is the ratio of P(event) to P(not event). The estimated

coefficient of a predictor (factor or covariate) is the estimated change in the log of

P(event)/P(not event) for each unit change in the predictor, assuming the other

predictors remain constant.

I The estimated coefficient can also be used to calculate the odds ratio, or the ratio

between two odds. Exponentiating the parameter estimate of a factor yields the ratio

of P(event)/P(not event) for a certain factor level compared to the reference level.

The odds ratios at different values of the covariate can be constructed relative to

zero. In the covariate case, it may be more meaningful to interpret the odds and not

= a vector of predictor variables associated with the j

th

factor/covariate pattern

= a vector of coefficients associated with the predictors

Name Link function Distribution Mean Variance

logit g(

ij

) = log

e

(

ij

/ (1

ij

)) logistic 0 pi

2

/ 3

normit g(

ij

) =

-1

(

ij

) normal 0 1

gompit g(

ij

) = log

e

(log

e

(1

ij

)) Gompertz (Euler constant) pi

2

/ 6

x

j

STREGRSN.MK5 Page 47 Friday, December 17, 1999 12:23 PM

Chapter 2 Ordinal Logistic Regression

2-48 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

the odds ratio. Note that a coefficient of zero or an odds ratio of one both imply the

same thingthe factor or covariate has no effect.

You can change the order of response values or the reference level in the Options

subdialog box if you wish to change how you view the parameter estimates. If your

responses were coded Low, Medium, and High, rather than 1, 2, 3, the default

alphabetical ordering of the responses would be improper. To change the order of

response values, specify the new order in the Order of the response values box. To

order as Low, Medium, and High, enter these values in this order, each enclosed in

double quotes. To change the reference level for a factor, specify the factor variable and

the new reference level in the Reference factor level box. You can specify reference

levels for more than one factor at the same time. If the levels are text or date/time,

enclose them in double quotes.

e Example of an ordinal logistic regression

Suppose you are a field biologist and you believe that the adult population of

salamanders in the Northeast has gotten smaller over the past few years. You would like

to determine whether any association exists between the length of time a hatched

salamander survives and level of water toxicity, as well as whether there is a regional

effect. Survival time is coded as 1 if it is less than 10 days, 2 if it is equal to 10 to 30

days, and 3 if it is equal to 31 to 60 days.

1 Open the worksheet EXH_REGR.MTW.

2 Choose Stat Regression Ordinal Logistic Regression.

3 In Response, enter Survival. In Model, enter Region ToxicLevel. In Factors (optional),

enter Region.

4 Click Results. Choose In addition, list of factor level values, and tests for terms

with more than 1 degree of freedom. Click OK in each dialog box.

STREGRSN.MK5 Page 48 Friday, December 17, 1999 12:23 PM

Ordinal Logistic Regression Regression

MINITAB Users Guide 2 2-49

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Session

window

output

Interpreting the results

The Session window contains the following five parts:

a. Response Informationdisplays the number of observations that fall into each of the

response categories and the number of missing observations. The ordered response values,

from lowest to highest, are shown. Here, you use the default coding scheme, which orders

Ordinal Logistic Regression: Survival versus Region, ToxicLevel

Link Function: Logit

Response Information

Variable Value Count

Survival 1 15

2 46

3 12

Total 73

Factor Information

Factor Levels Values

Region 2 1 2

Logistic Regression Table

Odds 95% CI

Predictor Coef SE Coef Z P Ratio Lower Upper

Const(1) -7.043 1.680 -4.19 0.000

Const(2) -3.523 1.471 -2.39 0.017

Region

2 0.2015 0.4962 0.41 0.685 1.22 0.46 3.23

ToxicLev 0.12129 0.03405 3.56 0.000 1.13 1.06 1.21

Log-likelihood = -59.290

Test that all slopes are zero: G = 14.713, DF = 2, P-Value = 0.001

Goodness-of-Fit Tests

Method Chi-Square DF P

Pearson 122.799 122 0.463

Deviance 100.898 122 0.918

Measures of Association:

(Between the Response Variable and Predicted Probabilities)

Pairs Number Percent Summary Measures

Concordant 1127 79.3% Somers D 0.59

Discordant 288 20.3% Goodman-Kruskal Gamma 0.59

Ties 7 0.5% Kendalls Tau-a 0.32

Total 1422 100.0%

D

A

B

C

E

STREGRSN.MK5 Page 49 Friday, December 17, 1999 12:23 PM

Chapter 2 Ordinal Logistic Regression

2-50 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

the values from lowest to highest: 1 is less than 10 days, 2 is equal to 10 to 30 days, and 3 is

equal to 31 to 60 days (see Reference event for the response variable on page 2-31).

B Factor Informationdisplays all the factors in the model, the number of levels for each

factor, and the factor level values. The factor level that has been designated as the reference

level is the first entry under Values, region 1 (see Reference levels for factors on page 2-31).

C Logistic Regression Tableshows the estimated coefficients (parameter estimates), standard

error of the coefficients, z-values, and p-values. When you use the logit link function,

MINITAB displays the calculated odds ratio and a 95% confidence interval for the odds ratio.

I The values labeled Const(1) and Const(2) are estimated intercepts for the logits of

the cumulative probabilities of survival for less than 10 days, and for 10 to 30 days,

respectively. Because the cumulative probability for the last response value is 1,

there is no need to estimate an intercept for 31 to 60 days.

I The coefficient of 0.2015 for Region is the estimated change in the logit of the

cumulative survival time probability when the region is 2 compared to region

being 1, with the covariate ToxicLevel held constant. Because the p-value for this

parameter estimate is 0.685, there is insufficient evidence to conclude that Region

has an effect upon survival time.

I There is one parameter estimated for each covariate, which gives parallel lines for

the factor levels. Here, the estimated coefficient for the single covariate,

ToxicLevel, is 0.121, with a p-value of less than 0.0005. The p-value indicates that

there is sufficient evidence to conclude that the toxic level affects survival. The

positive coefficient, and an odds ratio that is greater than one, indicates that

higher toxic levels tend to be associated with lower values of survival.

I Next, MINITAB displays the last Log-Likelihood from the maximum likelihood

iterations along with the statistic G. This statistics tests the null hypothesis that all

the coefficients associated with predictors equal to 0 versus them not all being

equal to 0. In this example, G = 14.713 with a p-value of 0.001, indicating that

there is sufficient evidence to conclude that at least one of the coefficients is

different from zero.

D Goodness-of-Fit Testsdisplays both Pearson and deviance goodness-of-fit tests. In our

example, the p-value for the Pearson test is 0.463, and the p-value for the deviance test is

0.918, indicating that there is insufficient evidence to claim that the model does not fit the

data adequately. If the p-value is less than your selected level, the tests rejects the null

hypothesis of an adequate fit.

E Measures of Associationdisplays a table of the number and percentage of concordant,

discordant, and tied pairs, as well as common rank correlation statistics. These values

measure the association between the observed responses and the predicted probabilities.

I The table of concordant, discordant, and tied pairs is calculated by pairing the

observations with different response values. Here, you have fifteen 1s, fourty-six

2s, and twelve 3s, resulting in (15 46) + (15 12) + (46 12) = 1422 pairs of

different response values. Pairs involving the lowest coded response value (the 1-2

STREGRSN.MK5 Page 50 Friday, December 17, 1999 12:23 PM

Nominal Logistic Regression Regression

MINITAB Users Guide 2 2-51

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

and 1-3 value pairs in the example) are concordant if the cumulative probability

up to the lowest response value (here 1) is greater for the observation with the

lowest value. This pattern applies to other value pairs. Pairs involving responses

coded as 2 and 3 in this example are concordant if the cumulative probability up

to 2 is greater for the observation coded as 2. The pair is discordant if the opposite

is true and tied if the cumulative probabilities are equal. In this example, 79.3% of

pairs are concordant, 20.3% are discordant, and 0.5% are ties. You can use these

values as a comparative measure of prediction (for example, when evaluating

predictors and different link functions).

I Somers D, Goodman-Kruskal Gamma, and Kendalls Tau-a are summaries of the

table of concordant and discordant pairs. The numbers have the same numerator

(the number of concordant pairs minus the number of discordant pairs). The

denominators are the total number of pairs with Somers D, the total number of

pairs excepting ties with Goodman-Kruskal Gamma, and the number of all

possible observation pairs for Kendalls Tau-a. These measures most likely lie

between 0 and 1, where larger values indicate that the model has a better

predictive ability.

Nominal Logistic Regression

Use nominal logistic regression to perform logistic regression on a nominal response

variable using an iterative-reweighted least squares algorithm to obtain maximum

likelihood estimates of the parameters [19]. Nominal variables are categorical variables

that have three or more possible levels with no natural ordering. For example, the levels

in a food tasting study may include crunchy, mushy, and crispy.

Data

Your data may be arranged in one of two ways: as raw data or as frequency data. See

Worksheet structure on page 2-32.

Factors, covariates, and response data can be numeric, text, or date/time. The reference

level and the reference event depend on the data type. See Reference levels for factors on

page 2-31 and Reference event for the response variable on page 2-31 for details.

The predictors may either be factors (nominal variables) or covariates (continuous

variables). Factors may be crossed or nested. Covariates may be crossed with each other

or with factors, or nested within factors.

The model can include up to 9 factors and 50 covariates. Unless you specify a predictor

in the model as a factor, the predictor is assumed to be a covariate. Model continuous

predictors as covariates and categorical predictors as factors. See How to specify the

model terms on page 2-30 for more information.

STREGRSN.MK5 Page 51 Friday, December 17, 1999 12:23 PM

Chapter 2 Nominal Logistic Regression

2-52 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MINITAB automatically omits observations with missing values from all calculations.

h To do a nominal logistic regression

1 Choose Stat Regression Nominal Logistic Regression.

2 Do one of the following:

I If you have raw response data, in Response, enter the numeric column containing

the response data.

I If you have frequency data, in Response, enter the numeric column containing

the response values. Then, in Frequency, enter the variable containing the counts.

See Worksheet structure on page 2-32.

3 In Model, enter the model terms. See How to specify the model terms on page 2-30.

4 If you like, use one or more of the options listed below, then click OK.

Options

Nominal Logistic Regression dialog box

I define the factors (categorical variables) in the model

Options subdialog box

I change the reference event of the response or the reference levels for the factors

see Interpreting the parameter estimates relative to the reference event and reference

levels on page 2-54

I specify initial values for model parameters or parameter estimates for a validation

modelsee Entering initial values for parameter estimates on page 2-37

I change the maximum number of iterations for reaching convergence from the

default of 20

STREGRSN.MK5 Page 52 Friday, December 17, 1999 12:23 PM

Nominal Logistic Regression Regression

MINITAB Users Guide 2 2-53

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Results subdialog box

I display the following in the Session window:

no output

basic information on response, parameter estimates, the log-likelihood, and the

test for all slopes being zero

the default output, which includes the above output plus two goodness-of-fit tests

(Pearson and deviance)

the default output, plus factor level values and tests for terms with more than one

degree of freedom

I display the log-likelihood at each iteration of the parameter estimation process

Storage subdialog box

I store characteristics of the estimated regression equation:

estimated model coefficients, their standard errors, and the variance-covariance

matrix of the estimated coefficients.

the log-likelihood for the last maximum likelihood iteration.

I store event probabilities and/or aggregated data:

the number of trials for each factor/covariate pattern.

event probabilities and number of occurrences for each factor/covariate pattern.

Note that the number of events (distinct response values) must be specified to

store these values.

STREGRSN.MK5 Page 53 Friday, December 17, 1999 12:23 PM

Chapter 2 Nominal Logistic Regression

2-54 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Nominal logistic regression model

The model fit by nominal logistic regression is defined by:

, where

See [16] for additional discussion.

Interpreting the parameter estimates relative to the reference event

and reference levels

The interpretation of the parameter estimates depends upon the designated reference

event (see Reference event for the response variable on page 2-31) and reference factor

levels (see Reference levels for factors on page 2-31). A parameter estimate associated

with a predictor represents the change in the particular logit for each unit change in the

predictor, assuming that all other factors and covariates are held constant. A one unit

change in a factor refers to a comparison of a certain level to the reference level.

If there are k distinct response values, MINITAB estimates k1 sets of parameter estimates.

These are the estimated differences in log odds or logits of levels of the response

variable relative to the reference event. Each set contains a constant and coefficients for

the factors and the covariates. Note that these sets of parameter estimates give

nonparallel lines for the response value. The interpretation of the parameter estimates is

as follows:

I The coefficient of a predictor (factor or covariate) is the estimated change in the log

of P(response level)/P(reference event) for each unit change in the predictor,

assuming the other predictors remain constant.

I The coefficient can also be used to calculate the odds ratio, or the ratio between two

odds. Exponentiating the parameter estimate of a factor yields the ratio of P(response

level)/P(reference event) for a certain factor level compared to the reference level.

The odds ratios at different values of the covariate can be constructed relative to

zero. In the covariate case, it may be more meaningful to interpret the odds and not

k = the number of distinct values of the response or the number of

possible events

ij

= the probability of the i

th

event for the j

th

factor/covariate pattern

(

1j

is the probability of the reference event for the j

th

factor/covariate

pattern)

b

i0

= the intercept for the (i1)

st

logit function

= a vector of predictor variables for the j

th

factor/covariate pattern

b

i

= a vector of unknown coefficients associated with the predictors for the

(i1)

st

logit function

log

e

i j

1j

-------

i 0

x

j

+

i

, i=2, ..., k =

x

j

STREGRSN.MK5 Page 54 Friday, December 17, 1999 12:23 PM

Nominal Logistic Regression Regression

MINITAB Users Guide 2 2-55

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

the odds ratio. Note that a coefficient of zero or an odds ratio of one both imply the

same thingthe factor or covariate has no effect.

You can change the reference event or reference levels in the Options subdialog box if

you wish to change how you view the parameter estimates. To change the event,

specify the new event value in the Reference event box. To change the reference

event, give the new level. To change the reference level for a factor, specify the factor

variable followed by the new reference level in the Reference factor level box. You can

specify reference levels for more than one factor at the same time. If the levels are text

or date/time, enclose them in double quotes.

e Example of a nominal logistic regression

Suppose you are a grade school curriculum director interested in what children identify

as their favorite subject and how this subject is associated with their age or the teaching

method employed. Thirty children, 10 to 13 years old, had classroom instruction in

science, math, and language arts that employed either lecture or discussion techniques.

At the end of the school year, they were asked to identify their favorite subject. You use

nominal logistic regression because the response is categorical but possesses no implicit

categorical ordering.

1 Open the worksheet EXH_REGR.MTW.

2 Choose Stat Regression Nominal Logistic Regression.

3 In Response, enter Subject. In Model, enter TeachingMethod Age. In Factors

(optional), enter TeachingMethod.

4 Click Results. Choose In addition, list of factor level values, tests for terms with

more than 1 degree of freedom. Click OK in each dialog box.

STREGRSN.MK5 Page 55 Friday, December 17, 1999 12:23 PM

Chapter 2 Nominal Logistic Regression

2-56 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

Session

window

output

Interpreting the results

The Session window output contains the following five parts:

a. Response Informationdisplays the number of observations that fall into each of the

response categories (science, math, and language arts), as well as the number of missing

observations. The response value that has been designated as the reference event is the first

entry under Value and labeled as the reference event. Here, the default coding scheme

defines the reference event as science using reverse alphabetical order.

B Factor Informationdisplays all the factors in the model, the number of levels for each

factor, and the factor level values. The factor level that has been designated as the reference

Nominal Logistic Regression: Subject versus TeachingMethod, Age

Response Information

Variable Value Count

Subject science 10 (Reference Event)

math 11

arts 9

Total 30

Factor Information

Factor Levels Values

Teaching 2 discuss lecture

Logistic Regression Table

Odds 95% CI

Predictor Coef SE Coef Z P Ratio Lower Upper

Logit 1: (math/science)

Constant -1.123 4.564 -0.25 0.806

Teaching

lecture -0.5631 0.9376 -0.60 0.548 0.57 0.09 3.58

Age 0.1247 0.4011 0.31 0.756 1.13 0.52 2.49

Logit 2: (arts/science)

Constant -13.848 7.243 -1.91 0.056

Teaching

lecture 2.770 1.372 2.02 0.044 15.96 1.08 234.91

Age 1.0135 0.5845 1.73 0.083 2.76 0.88 8.66

Log-likelihood = -26.446

Test that all slopes are zero: G = 12.825, DF = 4, P-Value = 0.012

Goodness-of-Fit Tests

Method Chi-Square DF P

Pearson 6.953 10 0.730

Deviance 7.886 10 0.640

A

B

C

D

E

STREGRSN.MK5 Page 56 Friday, December 17, 1999 12:23 PM

Nominal Logistic Regression Regression

MINITAB Users Guide 2 2-57

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

level is the first entry under Values. Here, the default coding scheme defines the reference

level as discussion using alphabetical order.

C Logistic Regression Tableshows the estimated coefficients (parameter estimates), standard

error of the coefficients, z-values, and p-values. MINITAB also displays the odds ratio and a

95% confidence interval for the odds ratio. The coefficient associated with a predictor is the

estimated change in the logit

with a one unit change in the predictor, assuming that all other

factors and covariates are the same.

I If there are k distinct response values, MINITAB estimates k1 sets of parameter

estimates, here labeled as Logit(1) and Logit(2). These are the estimated

differences in log odds or logits of math and language arts, respectively, compared

to science as the reference event. Each set contains a constant and coefficients for

the factors (here, teaching method) and the covariates (here, age). The

TeachingMethod coefficient is the estimated change in the logit when

TeachingMethod is lecture compared to the teaching method being discussion,

with Age held constant. The Age coefficient is the estimated change in the logit

with a one year increase in age with teaching method held constant. These sets of

parameter estimates give nonparallel lines for the response values.

I The first set of estimated logits, labeled Logit(1), are the parameter estimates of

the change in logits of math relative to the reference event, science. The p-values

of 0.548 and 0.756 for TeachingMethod and Age, respectively, indicate that there

is insufficient evidence to conclude that a change in teaching method from

discussion to lecture or a change in age affected the choice of math as favorite

subject compared to science.

I The second set of estimated logits, labeled Logit(2), are the parameter estimates of

the change in logits of language arts relative to the reference event, science. The

p-values of 0.044 and 0.083 for TeachingMethod and Age, respectively, indicate

that there is sufficient evidence, if the p-values are less than your acceptable

level, to conclude that a change in teaching method from discussion to lecture or

a change in age affected the choice of language arts as the favorite subject

compared to science. The positive coefficient for teaching method indicates

students given a lecture style of teaching tend to prefer language arts over science

compared to students given a discussion style of teaching. The estimated odds

ratio of 15.96 implies that the odds of choosing language arts over science is

about 16 times higher for these students when the teaching method changes

from discussion to lecture. The positive coefficient associated with age indicates

that students tend to like language arts over science as they become older.

D Next, MINITAB displays the last Log-Likelihood from the maximum likelihood iterations

along with the statistic G. G is the difference in 2 log-likelihood for a model that only has

the constant terms and the fitted model shown in the Logistic Regression Table. G is the test

statistic for testing the null hypothesis that all the coefficients associated with predictors being

equal to 0 versus them not all being equal to 0. G = 12.825 with a p-value of 0.012, which

indicates that at = 0.05 there is sufficient evidence for at least one coefficient being

different from 0.

STREGRSN.MK5 Page 57 Friday, December 17, 1999 12:23 PM

Chapter 2 References

2-58 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

E Goodness-of-Fit Testsdisplays Pearson and deviance goodness-of-fit tests. The p-value for

the Pearson test is 0.730 and the p-value for the deviance test is 0.640, indicating that there is

insufficient evidence for the model not fitting the data adequately. If the p-value is less than

your selected level, the test would indicate sufficient evidence for an inadequate fit.

References

[1] A. Agresti (1984). Analysis of Ordinal Categorical Data, John Wiley & Sons, Inc.

[2] A. Agresti (1990). Categorical Data Analysis, John Wiley & Sons, Inc.

[3] D.A. Belsley, E. Kuh, and R.E. Welsch (1980). Regression Diagnostics, John Wiley &

Sons, Inc.

[4] A. Bhargava (1989). Missing Observations and the Use of the Durbin-Watson

Statistic, Biometrika 76, 4, pp.828831.

[5] C.C. Brown (1982). On a Goodness of fit Test for the Logistic Model Based on

Score Statistics, Communications in Statistics, 11, pp.10871105.

[6] D.A. Burn and T.A. Ryan, Jr. (1983). A Diagnostic Test for Lack of Fit in Regression

Models, ASA 1983 Proceedings of the Statistical Computing Section, pp.286290.

[7] R.D. Cook (1977). Detection of Influential Observations in Linear Regression,

Technometrics 19, pp.1518.

[8] R.D. Cook and S. Weisberg (1982). Residuals and Influence in Regression, Chapman

and Hall.

[9] N.R. Draper and H. Smith (1981). Applied Regression Analysis, Second Edition, John

Wiley & Sons, Inc.

[10] S.E. Fienberg (1987). The Analysis of Cross-Classified Categorical Data. The MIT

Press.

[11] M.J. Garside (1971). Some Computational Procedures for the Best Subset

Problem, Applied Statistics 20, pp.815.

[12] James H. Goodnight (1979). A Tutorial on the Sweep Operator, The American

Statistician 33, pp.149158.

[13] W.W. Hauck and A. Donner (1977). Walds test as applied to hypotheses in logit

analysis. Journal of the American Statistical Association 72, 851-853.

[14] D.C. Hoaglin and R.E. Welsch (1978). The Hat Matrix in Regression and ANOVA,

The American Statistician 32, pp.1722, and Corrigenda 32, p.146.

[15] R.R. Hocking (1976). A Biometrics Invited Paper: The Analysis and Selection of

Variables in Linear Regression, Biometrics 32, pp.149.

STREGRSN.MK5 Page 58 Friday, December 17, 1999 12:23 PM

References Regression

MINITAB Users Guide 2 2-59

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

[16] D.W. Hosmer and S. Lemeshow (1989). Applied Logistic Regression, John Wiley &

Sons, Inc.

[17] LINPACK (1979). Linpack Users Guide by J.J. Dongarra, J.R. Bunch, C.B. Moler, and

G.W. Stewart, Society for Industrial and Applied Mathematics, Philadelphia, PA.

[18] J.H. Maindonald (1984). Statistical Computation, John Wiley & Sons, Inc.

[19] P. McCullagh and J.A. Nelder (1992). Generalized Linear Models, Chapman & Hall.

[20] W. Miller (1978). Performing Armchair Roundoff Analysis of Statistical

Algorithms, Communications in Statistics, pp.243255.

[21] D.C. Montgomery and E.A. Peck (1982). Introduction to Linear Regression Analysis.

John Wiley & Sons.

[22] J. Neter, W. Wasserman, and M. Kutner (1985). Applied Linear Statistical Models,

Richard D. Irwin, Inc.

[23] S.J. Press and S. Wilson (1978). Choosing Between Logistic Regression and

Discriminant Analysis, Journal of the American Statistical Association 73, 699-705.

[24] M. Schatzoff, R. Tsao, and S. Fienberg (1968). Efficient Calculation of All Possible

Regressions, Technometrics 10, pp.769779.

[25] G.W. Stewart (1973). Introduction to Matrix Computations, Academic Press.

[26] R.A. Thisted (1988). Elements of Statistical Computing: Numerical Computation,

Chapman & Hall.

[27] P. Velleman and R. Welsch (1981). Efficient Computation of Regression

Diagnostics, The American Statistician 35, pp.234242.

[28] P.F. Velleman, J. Seaman, and I.E. Allen (1977). Evaluating Package Regression

Routines, ASA 1977 Proceedings of the Statistical Computing Section.

[29] S. Weisberg (1980). Applied Linear Regression, John Wiley & Sons, Inc.

Acknowledgments

We are very grateful for help in the design of the regression algorithm from W. Miller

and from G.W. Stewart and for useful suggestions from. P.F. Velleman and S. Weisberg

and many others.

STREGRSN.MK5 Page 59 Friday, December 17, 1999 12:23 PM

Chapter 2 References

2-60 MINITAB Users Guide 2

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

MEET MTB UGUIDE 1 SC QREF UGUIDE 2 INDEX CONTENTS HOW TO USE

STREGRSN.MK5 Page 60 Friday, December 17, 1999 12:23 PM

- Real Statistics Examples WorkbookTransféré parRaúl Araque
- PiecewiseRegressionTransféré parLuz Gibbons
- DMMethodsandModels_ChapterSummariesTransféré parJosé Guilherme Macedo Vieira
- ISOM201 Class NotesTransféré parMam'Zell Nitouse
- DK5985_ch03Transféré parcegarcia
- Moore 16Transféré parankuriitb
- Objective Week1 6 July 2012Transféré parPrakhar Gupta
- epi.Transféré parTanaya Wahengbam
- Lecture4 Module2 Anova 1Transféré parApam Benjamin
- cse546wi12LogisticRegressionTransféré parChrisRoby
- Process Capability Analysis in Minitab_ManualTransféré parChandrasekar Muthukumar
- 24 Taguchi DesignTransféré parafonsopilar
- 8 Tsm Chapter08Transféré pardeepakbvs
- BeerTransféré parMark Cliffton Badlon
- Macroeconomics ProjectTransféré parKaushik_n
- TREATMENT BY ALTERNATIVE METHODS OF REGRESSION GAS CHROMATOGRAPHIC RETENTION INDICES OF 35 PYRAZINESTransféré parijsc
- Final Cheat Sheet NadyaTransféré paross3150
- 14-20Transféré pardelustrous
- DS_5Transféré parsandeeptirukoti
- Ppt-17Transféré parmayankpec
- Correlation for Solution Gas -Oil RatioTransféré parandrea
- LAB Human5Transféré parmmmm
- Sample Data Sets for Linear Regression1Transféré parThảo Nguyễn
- 40 Interview Questions on Machine Learning_AnalyticsVidhyaTransféré parKaleab Tekle
- Eco 173 Regression Part iTransféré parNadim Islam
- Applied Regression II (Qixuan Chen) P8110 - Syllabus 2016Transféré parsykim657
- Chapter Simple Linear Regression 1Transféré parMobasher Messi
- An application of the modified Holmberg–PerssonTransféré parManuel Durant Rodriguez
- Forecasting FX RatesTransféré parBer Nanke
- PGP09168_AT&TTransféré parReshma Majumder

- 140425_3. Analysis of VarianceTransféré parchaiyan_05
- CapabilityTransféré pardioruiz
- 24 Taguchi DesignTransféré parafonsopilar
- 2.3.4_Variable_and_Attribute_MSA.pdfTransféré parAlpha Samad
- 140528_six_sigma-Good Normal Distribution GraphTransféré parchaiyan_05
- 140425 2. RegressionTransféré parchaiyan_05
- 140425_1. Basic StatisticsTransféré parchaiyan_05

- 201520_ARTH_207_08Transféré parJay Holland
- 00028740Transféré parFelipe Timóteo
- retecetTransféré parapi-395535298
- The 7 wastes explainedTransféré parChitrakant Deshmukh
- Ethics in Marketing Mini-case Studies[1]Transféré parHarsimranSingh
- 001Transféré parLord Aumar
- Evening Office Lighting Visual Comfort vs. Energy Efficiency vs. PerformanceTransféré parJehan Dhanushke
- Sample Lab ReportTransféré parmsg366809
- Silver Anvil Case AnalysisTransféré parAmanda Chesser
- BLUMENSTOCK_Charity and Reciprocity in Mobile Phone-Based Giving_Evidence From RwandaTransféré parDetteDeCastro
- preview of people uncw edu-nottingh myers-briggs survey pdfTransféré parapi-272869416
- Stats 3Transféré parjaramara
- Tugas Clinical Appraisal dr. M. Haryadi.docxTransféré parRizha Zhetira
- GENDER DIFFERENCES IN COPING BEHAVIOURTransféré parDaniarSiahaan
- The Elements in ReviewTransféré parMr. Kleier
- Main InitialsTransféré parAnagha Bhosale
- Monorail Project ProposalTransféré parAsmawi Mohamad
- thesisTransféré parIristhea Genson Santos
- Assignment Guideline-MPU1223 April2017Transféré parReen Zaza Zareen
- TIG DayTransféré parMohamed Ibrahim
- Bcec3201 What is Building AuditTransféré parazrief14
- The 5As of ActualizationTransféré parSenenAgileo
- 9709_s10_qp_63Transféré parAbrar Jahin
- COating Standard IMOTransféré par070708
- NDT Training for the Oil & Gas IndustryTransféré parmicktunx
- Geographical Skills GlossaryTransféré parDavid Drake
- Sampling TechniquesTransféré parShantiram Dahal
- I m Not Returning to Google After MaternityTransféré parLaw&Crime
- GolfTransféré parBeo Grad
- Ciggarete WarningTransféré parAnonymous dkEBxwVJO

## Bien plus que des documents.

Découvrez tout ce que Scribd a à offrir, dont les livres et les livres audio des principaux éditeurs.

Annulez à tout moment.