
What is Multiple Linear Regression?

Multiple linear regression is the most common form of linear regression analysis. As a predictive analysis, multiple linear regression is used to explain the relationship between one continuous dependent variable and two or more independent variables. The independent variables can be continuous or categorical (dummy coded as appropriate).
Questions Answered:
Do age and IQ scores effectively predict GPA?
How much of the variance in BMI scores is explained by weight, height, and age?
Assumptions:
Data must be normally distributed.
A linear relationship is assumed between the dependent variable and the independent variables.
The residuals are homoscedastic and approximately rectangular-shaped.
Absence of multicollinearity is assumed in the model, so that the independent variables are not too highly correlated.
At the center of the multiple linear regression analysis is the task of fitting a single line through a scatter plot. More specifically, the multiple linear regression fits a line through a multi-dimensional cloud of data points. The simplest form has one dependent and two independent variables; the general form of the multiple linear regression is defined as

yi = β0 + β1 xi1 + β2 xi2 + ... + βp xip + εi,   for i = 1, ..., n.

Sometimes the dependent variable is also called the endogenous variable, prognostic variable or regressand. The independent variables are also called exogenous variables, predictor variables or regressors.
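To make the general form concrete, here is a minimal sketch (not part of the original text; the data below are synthetic and the variable names arbitrary) that fits a model with two independent variables by ordinary least squares using NumPy:

```python
# Illustrative sketch: fitting y_i = b0 + b1*x_i1 + b2*x_i2 + e_i by least squares.
# The data are made up purely for demonstration.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)                     # first independent variable
x2 = rng.normal(size=n)                     # second independent variable
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), x1, x2])   # design matrix with an intercept column
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares estimates b0, b1, b2
print("estimated coefficients:", b)
```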

There are three major uses for multiple linear regression analysis: (1) causal analysis, (2) forecasting an effect, and (3) trend forecasting. Unlike correlation analysis, which focuses on the strength of the relationship between two or more variables, regression analysis assumes a dependence or causal relationship between one or more independent variables and one dependent variable.
Firstly, it might be used to identify the strength of the effect that the independent variables have on a dependent variable. Typical questions are: what is the strength of the relationship between dose and effect, sales and marketing spend, or age and income?
Secondly, it can be used to forecast effects or impacts of changes. That is, multiple linear regression analysis helps us to understand how much the dependent variable will change when we change the independent variables. A typical question is: how much additional Y do I get for one additional unit of X?
Thirdly, multiple linear regression analysis predicts trends and future values. The multiple linear regression analysis can be used to get point estimates. Typical questions are: what will the price of gold be in 6 months from now? What is the total effort for a task X?
When selecting the model for the multiple linear regression analysis, another important consideration is the model fit. Adding independent variables to a multiple linear regression model will always explain a bit more variance (typically expressed as R²), but that by itself does not make the model better; adding variables without theoretical justification risks overfitting.

MULTIPLE REGRESSION USING THE DATA ANALYSIS ADD-IN


This requires the Data Analysis Add-in: see Excel 2007: Access and Activating the
Data Analysis Add-in
The data used are in carsdata.xls
We then create a new variable in cells C2:C6, cubed household size, as a regressor.
Then in cell C1 give the heading CUBED HH SIZE.
(It turns out that for these data squared HH SIZE has a coefficient of exactly 0.0, so the cube is used instead.)
The spreadsheet cells A1:C6 should look like:

We have a regression with an intercept and the regressors HH SIZE and CUBED HH SIZE.
The population regression model is: y = β1 + β2 x2 + β3 x3 + u
It is assumed that the error u is independent with constant variance
(homoskedastic) - see EXCEL LIMITATIONS at the bottom.
We wish to estimate the regression line:

y = b1 + b2 x2 + b3 x3

We do this using the Data analysis Add-in and Regression.

The only change over one-variable regression is to include more than one column in
the Input X Range.
Note, however, that the regressors need to be in contiguous columns (here columns
B and C).
If this is not the case in the original data, then columns need to be copied to get the
regressors in contiguous columns.
Hitting OK we obtain

The regression output has three components:

Regression statistics table

ANOVA table

Regression coefficients table.

INTERPRET REGRESSION STATISTICS TABLE


This is the following output. Of greatest interest is R Square.

Statistic            Value       Explanation
Multiple R           0.895828    R = square root of R²
R Square             0.802508    R²
Adjusted R Square    0.605016    Adjusted R², used if more than one x variable
Standard Error       0.444401    Sample estimate of the standard deviation of the error u
Observations         5           Number of observations used in the regression (n)

The above gives the overall goodness-of-fit measures:


R² = 0.8025
Correlation between y and y-hat is 0.8958 (when squared this gives 0.8025).
Adjusted R² = R² - (1-R²)(k-1)/(n-k) = 0.8025 - 0.1975×2/2 = 0.6050.
The standard error here refers to the estimated standard deviation of the error term u.
It is sometimes called the standard error of the regression. It equals sqrt(SSE/(n-k)).
It is not to be confused with the standard error of y itself (from descriptive statistics) or with the standard errors of the regression coefficients given below.
R² = 0.8025 means that 80.25% of the variation of yi around ybar (its mean) is explained by the regressors x2i and x3i.
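For readers who want to verify these goodness-of-fit figures outside Excel, here is a short sketch that recomputes them from the sums of squares reported in the ANOVA table that follows (all numbers are taken from this example's output):

```python
# Sketch: reproducing R-squared, adjusted R-squared and the standard error of the
# regression from the ANOVA sums of squares quoted in the text.
from math import sqrt

n, k = 5, 3                                   # observations, regressors incl. intercept
reg_ss, resid_ss, total_ss = 1.6050, 0.3950, 2.0

r2 = 1 - resid_ss / total_ss                  # 0.8025
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)     # 0.6050
std_err = sqrt(resid_ss / (n - k))            # 0.4444, std. error of the regression

print(round(r2, 4), round(adj_r2, 4), round(std_err, 4))
```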
INTERPRET ANOVA TABLE
An ANOVA table is given. This is often skipped.

             df      SS        MS        F         Significance F
Regression    2      1.6050    0.8025    4.0635    0.1975
Residual      2      0.3950    0.1975
Total         4      2.0

The ANOVA (analysis of variance) table splits the sum of squares into its
components.
Total sums of squares
= Residual (or error) sum of squares + Regression (or explained) sum of squares.
Thus Σi (yi - ybar)² = Σi (yi - yhati)² + Σi (yhati - ybar)²
where yhati is the value of yi predicted from the regression line
and ybar is the sample mean of y.
For example:
R² = 1 - Residual SS / Total SS (general formula for R²)
= 1 - 0.3950 / 2.0
(from data in the ANOVA table)
= 0.8025
(which equals R² given in the Regression Statistics table).
The column labeled F gives the overall F-test of H0: β2 = 0 and β3 = 0 versus Ha: at least one of β2 and β3 does not equal zero.
Aside: Excel computes F as:
F = [Regression SS/(k-1)] / [Residual SS/(n-k)] = [1.6050/2] / [.39498/2] = 4.0635.
The column labeled Significance F has the associated P-value.
Since 0.1975 > 0.05, we do not reject H0 at significance level 0.05.

Note: Significance F in general = FDIST(F, k-1, n-k) where k is the number of regressors including the intercept.
Here FDIST(4.0635, 2, 2) = 0.1975.
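The same p-value can be checked with SciPy's F distribution; a sketch using the statistics quoted above:

```python
# Sketch: the Excel FDIST computation expressed as a SciPy call.
from scipy.stats import f

F_stat, df1, df2 = 4.0635, 2, 2       # df1 = k-1, df2 = n-k
p_value = f.sf(F_stat, df1, df2)      # survival function = P(F > F_stat)
print(round(p_value, 4))              # approximately 0.1975
```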
INTERPRET REGRESSION COEFFICIENTS TABLE
The regression output of most interest is the following table of coefficients and
associated output:

                 Coefficient   St. error   t Stat    P-value   Lower 95%   Upper 95%
Intercept          0.89655      0.76440    1.1729    0.3616    -2.3924      4.1855
HH SIZE            0.33647      0.42270    0.7960    0.5095    -1.4823      2.1552
CUBED HH SIZE      0.00209      0.01311    0.1594    0.8880    -0.0543      0.0585

Let βj denote the population coefficient of the jth regressor (intercept, HH SIZE and CUBED HH SIZE).
Then

Column "Coefficient" gives the least squares estimates of βj.

Column "Standard error" gives the standard errors (i.e. the estimated standard deviations) of the least squares estimates bj of βj.

Column "t Stat" gives the computed t-statistic for H0: βj = 0 against Ha: βj ≠ 0.
This is the coefficient divided by the standard error. It is compared to a t with (n-k) degrees of freedom, where here n = 5 and k = 3.

Column "P-value" gives the p-value for the test of H0: βj = 0 against Ha: βj ≠ 0.
This equals Pr{|t| > t-Stat} where t is a t-distributed random variable with n-k degrees of freedom and t-Stat is the computed value of the t-statistic given in the previous column.
Note that this p-value is for a two-sided test. For a one-sided test divide this p-value by 2 (also checking the sign of the t-Stat).

Columns "Lower 95%" and "Upper 95%" define a 95% confidence interval for βj.
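As an optional cross-check, the following sketch recomputes the t Stat, P-value and 95% limits from the Coefficient and St. error columns above using SciPy (nothing here comes from Excel itself):

```python
# Sketch: rebuilding the last four columns of the coefficients table from the
# first two, using a t distribution with n - k = 2 degrees of freedom.
from scipy.stats import t

df = 2
rows = {                        # name: (coefficient, standard error)
    "Intercept":     (0.89655, 0.76440),
    "HH SIZE":       (0.33647, 0.42270),
    "CUBED HH SIZE": (0.00209, 0.01311),
}
t_crit = t.ppf(0.975, df)       # about 4.303
for name, (b, se) in rows.items():
    t_stat = b / se
    p = 2 * t.sf(abs(t_stat), df)              # two-sided p-value
    lo, hi = b - t_crit * se, b + t_crit * se  # 95% confidence limits
    print(f"{name}: t={t_stat:.4f} p={p:.4f} CI=({lo:.4f}, {hi:.4f})")
```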

A simple summary of the above output is that the fitted line is

y = 0.8966 + 0.3365 x + 0.0021 z

where x = HH SIZE and z = CUBED HH SIZE (that is, z = x³).

CONFIDENCE INTERVALS FOR SLOPE COEFFICIENTS


The 95% confidence interval for slope coefficient β2 is, from the Excel output, (-1.4823, 2.1552).
Excel computes this as
b2 ± t.025(2) × se(b2)
= 0.33647 ± TINV(0.05, 2) × 0.42270
= 0.33647 ± 4.303 × 0.42270
= 0.33647 ± 1.8189
= (-1.4823, 2.1552).
Other confidence intervals can be obtained.
For example, to find 99% confidence intervals: in the Regression dialog box (in the
Data Analysis Add-in),
check the Confidence Level box and set the level to 99%.
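Outside Excel, the 99% interval can be sketched with SciPy's t distribution; the 99% two-tailed critical value with 2 degrees of freedom is about 9.925, so the interval is considerably wider:

```python
# Sketch: 99% confidence interval for the HH SIZE coefficient, the equivalent of
# setting the Confidence Level box to 99% (TINV(0.01, 2) in Excel terms).
from scipy.stats import t

b2, se_b2, df = 0.33647, 0.42270, 2
t_crit = t.ppf(0.995, df)                       # about 9.925
print(b2 - t_crit * se_b2, b2 + t_crit * se_b2)
```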

TEST HYPOTHESIS OF ZERO SLOPE COEFFICIENT ("TEST OF STATISTICAL SIGNIFICANCE")
The coefficient of HH SIZE has estimated standard error of 0.4227, t-statistic of 0.7960 and p-value of 0.5095.
It is therefore statistically insignificant at significance level α = .05, as p > 0.05.
The coefficient of CUBED HH SIZE has estimated standard error of 0.0131, t-statistic of 0.1594 and p-value of 0.8880.
It is therefore statistically insignificant at significance level α = .05, as p > 0.05.
There are 5 observations and 3 regressors (intercept, HH SIZE, and CUBED HH SIZE) so we use t(5-3) = t(2).
For example, for HH SIZE, p = TDIST(0.796, 2, 2) = 0.5095.
TEST HYPOTHESIS ON A REGRESSION PARAMETER
Here we test whether HH SIZE has coefficient β2 = 1.0.
Example: H0: β2 = 1.0 against Ha: β2 ≠ 1.0 at significance level α = .05.
Then
t = (b2 - hypothesized value of β2) / (standard error of b2)
= (0.33647 - 1.0) / 0.42270
= -1.569.
Using the p-value approach

p-value = TDIST(1.569, 2, 2) = 0.257. [Here n=5 and k=3 so n-k=2].

Do not reject the null hypothesis at level .05 since the p-value is > 0.05.

Using the critical value approach

We computed t = -1.569

The critical value is t_.025(2) = TINV(0.05,2) = 4.303. [Here n=5 and k=3 so
n-k=2].

So do not reject the null hypothesis at level .05 since |t| = |-1.569| = 1.569 < 4.303.
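The same test can be reproduced without Excel; a sketch using the estimate and standard error quoted above:

```python
# Sketch: test of H0: beta2 = 1.0 for HH SIZE, using SciPy instead of TDIST/TINV.
from scipy.stats import t

b2, se_b2, df = 0.33647, 0.42270, 2
t_stat = (b2 - 1.0) / se_b2                    # about -1.569
p_value = 2 * t.sf(abs(t_stat), df)            # about 0.257, two-sided
t_crit = t.ppf(0.975, df)                      # about 4.303
print(t_stat, p_value, abs(t_stat) < t_crit)   # True: do not reject H0 at level .05
```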

OVERALL TEST OF SIGNIFICANCE OF THE REGRESSION PARAMETERS


We test H0: β2 = 0 and β3 = 0 versus Ha: at least one of β2 and β3 does not equal zero.
From the ANOVA table the F-test statistic is 4.0635 with p-value of 0.1975.
Since the p-value is not less than 0.05 we do not reject the null hypothesis that the
regression parameters are zero at significance level 0.05.
Conclude that the parameters are jointly statistically insignificant at significance
level 0.05.
Note: Significance F in general = FDIST(F, k-1, n-k) where k is the number of regressors including the intercept.
Here FDIST(4.0635, 2, 2) = 0.1975.
PREDICTED VALUE OF Y GIVEN REGRESSORS
Consider the case where x = 4, in which case CUBED HH SIZE = x^3 = 4^3 = 64.
yhat = b1 + b2 x2 + b3 x3 = 0.89655 + 0.33647 × 4 + 0.00209 × 64 ≈ 2.376
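The arithmetic can be checked with a few lines of Python (a sketch using the coefficient estimates from the table above):

```python
# Sketch: point prediction at HH SIZE = 4, so CUBED HH SIZE = 64.
b1, b2, b3 = 0.89655, 0.33647, 0.00209
x = 4
yhat = b1 + b2 * x + b3 * x ** 3
print(yhat)    # about 2.376
```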
EXCEL LIMITATIONS
Excel restricts the number of regressors (only up to 16 regressors ??).
Excel requires that all the regressor variables be in adjoining columns.
You may need to move columns to ensure this.
e.g. If the regressors are in columns B and D you need to copy at least one of
columns B and D so that they are adjacent to each other.
Excel standard errors, t-statistics and p-values are based on the assumption that the error is independent with constant variance (homoskedastic).
Excel does not provide alternatives, such as heteroskedastic-robust or autocorrelation-robust standard errors, t-statistics and p-values.

More specialized software such as STATA, EVIEWS, SAS, LIMDEP, PC-TSP, ... is
needed.
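As one illustration of what such software offers, here is a sketch of heteroskedasticity-robust standard errors in Python's statsmodels package; the data below are synthetic, and "HC1" is one of several robust covariance options statsmodels provides:

```python
# Sketch: OLS with White/HC1 heteroskedasticity-robust standard errors.
# The data are made up; the error variance deliberately grows with x.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
x = rng.uniform(1, 6, size=40)
y = 1.0 + 0.3 * x + rng.normal(scale=0.2 * x, size=40)

X = sm.add_constant(np.column_stack([x, x ** 3]))   # intercept, x, cubed x
fit = sm.OLS(y, X).fit(cov_type="HC1")               # robust standard errors
print(fit.summary())
```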

Lesson 5: Multiple Linear Regression


Introduction
In this lesson, we make our first (and last?!) major jump in the course. We move
from the simple linear regression model with one predictor to the multiple linear
regression model with two or more predictors. That is, we use the adjective "simple"
to denote that our model has only predictor, and we use the adjective "multiple" to
indicate that our model has at least two predictors.
In the multiple regression setting, because of the potentially large number of
predictors, it is more efficient to use matrices to define the regression model and
the subsequent analyses. This lesson considers some of the more important
multiple regression formulas in matrix form. If you're unsure about any of this, it
may be a good time to take a look at this Matrix Algebra Review.
The good news!
The good news is that everything you learned about the simple linear regression
model extends with at most minor modification to the multiple linear
regression model. Think about it: you don't have to forget all of that good stuff
you learned! In particular:

The models have similar "LINE" assumptions. The only real difference is that
whereas in simple linear regression we think of the distribution of errors at a fixed
value of the single predictor, with multiple linear regression we have to think of the
distribution of errors at a fixed set of values for all the predictors. All of the model
checking procedures we learned earlier are useful in the multiple linear regression
framework, although the process becomes more involved since we now have
multiple predictors. We'll explore this issue further in Lesson 7.
The use and interpretation of r2 (which we'll denote R2 in the context of
multiple linear regression) remains the same. However, with multiple linear
regression we can also make use of an "adjusted" R2 value, which is useful for
model building purposes. We'll explore this measure further in Lesson 10.
With a minor generalization of the degrees of freedom, we use t-tests and t-intervals for the regression slope coefficients to assess whether a predictor is significantly linearly related to the response, after controlling for the effects of all the other predictors in the model.
With a minor generalization of the degrees of freedom, we use prediction
intervals for predicting an individual response and confidence intervals for
estimating the mean response. We'll explore these further in Lesson 7.
Learning objectives and outcomes
Upon completion of this lesson, you should be able to do the following:

Know how to calculate a confidence interval for a single slope parameter in the multiple regression setting.

Be able to interpret the coefficients of a multiple regression model.


Understand what the scope of the model is in the multiple regression model.
Understand the calculation and interpretation of R2 in a multiple regression
setting.
Understand the calculation and use of adjusted R2 in a multiple regression
setting.
5.2 - Example on Underground Air Quality
What are the breathing habits of baby birds that live in underground burrows?

Some mammals burrow into the ground to live.


Scientists have found that the quality of the air in these burrows is not as good as
the air aboveground. In fact, some mammals change the way that they breathe in
order to accommodate living in the poor air quality conditions underground.

Some researchers (Colby, et al, 1987) wanted to find out if nestling bank swallows,
which live in underground burrows, also alter how they breathe. The researchers
conducted a randomized experiment on n = 120 nestling bank swallows. In an
underground burrow, they varied the percentage of oxygen at four different levels
(13%, 15%, 17%, and 19%) and the percentage of carbon dioxide at five different
levels (0%, 3%, 4.5%, 6%, and 9%). Under each of the resulting 5 × 4 = 20
experimental conditions, the researchers observed the total volume of air breathed
per minute for each of 6 nestling bank swallows. In this way, they obtained the
following data (babybirds.txt) on the n = 120 nestling bank swallows:
Response (y): percentage increase in "minute ventilation," (Vent), i.e., total
volume of air breathed per minute.
Potential predictor (x1): percentage of oxygen (O2) in the air the baby birds
breathe.
Potential predictor (x2): percentage of carbon dioxide (CO2) in the air the
baby birds breathe.
Here's a scatter plot matrix of the resulting data obtained by the researchers:

What does this particular scatter plot matrix tell us? Do you buy into the following
statements?

There doesn't appear to be a substantial relationship between minute


ventilation (Vent) and percentage of oxygen (O2).
The relationship between minute ventilation (Vent) and percentage of carbon
dioxide (CO2) appears to be curved and with increasing error variance.
The plot between percentage of oxygen (O2) and percentage of carbon
dioxide (CO2) is the classical appearance of a scatter plot for the experimental
conditions. The plot suggests that there is no correlation at all between the two
variables. You should be able to observe from the plot the 4 levels of O2 and the 5
levels of CO2 that make up the 5 × 4 = 20 experimental conditions.
When we have one response variable and only two predictor variables, we have
another sometimes useful plot at our disposal, namely a "three-dimensional
scatter plot:"
If we added the estimated regression equation to the plot, what one word do you
think describes what it would look like? Click the "Draw Plane" button in the above
animation to draw the plot of the estimated regression equation for this data. Does
it make sense that it looks like a "plane?" Incidentally, it is still important to
remember that the plane depicted in the plot is just an estimate of the actual plane
in the population that we are trying to study.
Here is a reasonable "first-order" model with two quantitative predictors that
we could consider when trying to summarize the trend in the data:
yi = (β0 + β1 xi1 + β2 xi2) + εi
where:

yi is percentage of minute ventilation of nestling bank swallow i

xi1 is percentage of oxygen exposed to nestling bank swallow i


xi2 is percentage of carbon dioxide exposed to nestling bank swallow i
and the independent error terms εi follow a normal distribution with mean 0 and equal variance σ².
The adjective "first-order" is used to characterize a model in which the highest
power on all of the predictor terms is one. In this case, the power on xi1, although
typically not shown, is one. And, the power on xi2 is also one, although not shown.
Therefore, the model we formulated can be classified as a "first-order model." An
example of a second-order model would be y = β0 + β1x + β2x² + ε.
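To show how such a first-order model would be fitted in practice, here is a sketch using Python's statsmodels rather than Minitab; the file name babybirds.txt comes from the text, but the column names Vent, O2 and CO2 and the whitespace-delimited layout are assumptions about the file, not something stated here:

```python
# Sketch: fitting E(Vent) = beta0 + beta1*O2 + beta2*CO2 with statsmodels.
import pandas as pd
import statsmodels.formula.api as smf

birds = pd.read_csv("babybirds.txt", sep=r"\s+")   # assumed whitespace-delimited
model = smf.ols("Vent ~ O2 + CO2", data=birds).fit()
print(model.summary())    # R-squared, t-tests for each slope, overall F-test
```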
Do you have your research questions ready? How about the following set of
questions? (Do the procedures that appear in parentheses seem appropriate in
answering the research question?)

Is oxygen related to minute ventilation, after taking into account carbon


dioxide? (Conduct a hypothesis test for testing whether the O2 slope parameter is
0.)
Is carbon dioxide related to minute ventilation, after taking into account
oxygen? (Conduct a hypothesis test for testing whether the CO2 slope parameter is
0.)
What is the mean minute ventilation of all nestling bank swallows whose
breathing air is comprised of 15% oxygen and 5% carbon dioxide? (Calculate and
interpret a confidence interval for the mean response.)
Here's the output we obtain when we ask Minitab to estimate the multiple
regression model we formulated above:

What do we learn from the Minitab output?

Only 26.82% of the variation in minute ventilation is reduced by taking into


account the percentages of oxygen and carbon dioxide.
The P-values for the t-tests appearing in the table of estimates suggest that
the slope parameter for carbon dioxide level (P < 0.001) is significantly different
from 0, while the slope parameter for oxygen level (P = 0.408) is not. Does this
conclusion appear consistent with the above scatter plot matrix and the three-dimensional plot? Yes!
The P-value for the analysis of variance F-test (P < 0.001) suggests that the
model containing oxygen and carbon dioxide levels is more useful in predicting
minute ventilation than not taking into account the two predictors. (Again, the F-test
does not tell us that the model with the two predictors is the best model! For one
thing, we have performed no model checking yet!)
5.3 - The Multiple Linear Regression Model
Notation for the Population Model
A population model for a multiple regression model that relates a y-variable
to p -1 predictor variables is written as
yi = β0 + β1 xi,1 + β2 xi,2 + ⋯ + βp−1 xi,p−1 + εi.

We assume that the errors εi have a normal distribution with mean 0 and constant variance σ². These are the same assumptions that we used in simple regression with one x-variable.
The subscript i refers to the ith individual or unit in the population. In the notation for the x-variables, the subscript following i simply denotes which x-variable it is.
Estimates of the Model Parameters
The estimates of the coefficients are the values that minimize the sum of
squared errors for the sample. The exact formula for this is given in the next section
on matrix notation.
The letter b is used to represent a sample estimate of a coefficient.
Thus b0 is the sample estimate of β0, b1 is the sample estimate of β1, and so on.
MSE = SSE/(n − p) estimates σ², the variance of the errors. In the formula, n = sample size, p = number of coefficients in the model (including the intercept) and SSE = sum of squared errors. Notice that for simple linear regression p = 2. Thus, we get the formula for MSE that we introduced in the context of one predictor.
S = √MSE estimates σ and is known as the regression standard error or the residual standard error.
In the case of two predictors, the estimated regression equation yields a
plane (as opposed to a line in the simple linear regression setting). For more than
two predictors, the estimated regression equation yields a hyperplane.
Interpretation of the Model Parameters
Each coefficient represents the change in the mean response, E(y), per
unit increase in the associated predictor variable when all the other predictors are
held constant.
For example, β1 represents the change in the mean response, E(y), per unit increase in x1 when x2, x3, ..., xp−1 are held constant.
The intercept term, β0, represents the mean response, E(y), when the predictors x1, x2, ..., xp−1 are all zero (which may or may not have any practical meaning).
Predicted Values and Residuals
A predicted value is calculated as ŷi = b0 + b1 xi,1 + b2 xi,2 + ⋯ + bp−1 xi,p−1, where the b values come from statistical software and the x-values are specified by us.
A residual (error) term is calculated as ei = yi − ŷi, the difference between an actual and a predicted value of y.
A plot of residuals versus predicted values ideally should resemble a horizontal random band. Departures from this form indicate difficulties with the model and/or data.
Other residual analyses can be done exactly as we did in simple regression.
For instance, we might wish to examine a normal probability plot (NPP) of the
residuals. Additional plots to consider are plots of residuals versus each x-variable
separately. This might help us identify sources of curvature or nonconstant variance.
We'll explore this further in Lesson 7.
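As a concrete illustration, here is a sketch (with simulated data, since no dataset is given at this point) that computes fitted values, residuals, MSE and S, and draws the residuals-versus-fitted plot described above:

```python
# Sketch: fitted values, residuals, MSE = SSE/(n - p), S = sqrt(MSE), and the
# residuals-versus-fitted plot. X, y and b stand in for whatever model you fit.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
n, p = 60, 3                                    # p = number of coefficients incl. intercept
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ b                                   # predicted values
e = y - y_hat                                   # residuals
mse = (e @ e) / (n - p)                         # estimates sigma^2
s = np.sqrt(mse)                                # regression (residual) standard error
print("MSE:", mse, "S:", s)

plt.scatter(y_hat, e)                           # ideally a horizontal random band
plt.axhline(0, linestyle="--")
plt.xlabel("fitted values")
plt.ylabel("residuals")
plt.show()
```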
ANOVA Table
Source        df       SS      MS                     F
Regression    p − 1    SSR     MSR = SSR/(p − 1)      MSR/MSE
Error         n − p    SSE     MSE = SSE/(n − p)
Total         n − 1    SSTO

Coefficient of Determination, R-squared, and Adjusted R-squared


As in simple linear regression, R² = SSR/SSTO = 1 − SSE/SSTO, and represents the proportion of variation in y (about its mean) "explained" by the multiple linear regression model with predictors x1, x2, ....
Adjusted R² = 1 − ((n − 1)/(n − p))(1 − R²), and, while it has no practical interpretation, is useful for model building purposes (simply stated, when comparing two models used to predict the same response variable, we generally prefer the model with the higher value of adjusted R²; see Lesson 10 for more details).
Significance Testing of Each Variable
Within a multiple regression model, we may want to know whether a particular x-variable is making a useful contribution to the model. That is, given the presence of the other x-variables in the model, does a particular x-variable help us predict or explain the y-variable? For instance, suppose that we have three x-variables in the model. The general structure of the model could be
y = β0 + β1x1 + β2x2 + β3x3 + ε.
As an example, to determine whether variable x1 is a useful predictor variable in this model, we could test
H0: β1 = 0 versus HA: β1 ≠ 0.
If the null hypothesis above were the case, then a change in the value of x1 would not change y, so y and x1 are not linearly related. Also, we would still be left with variables x2 and x3 being present in the model. When we cannot reject the null hypothesis above, we should say that we do not need variable x1 in the model given that variables x2 and x3 will remain in the model. In general, the interpretation of a slope in multiple regression can be tricky. Correlations among the predictors can change the slope values dramatically from what they would be in separate simple regressions.
To carry out the test, statistical software will report p-values for all coefficients in
the model. Each p-value will be based on a t-statistic calculated as
t = (sample coefficient − hypothesized value) / standard error of coefficient.
For our example above, the t-statistic is:
t = (b1 − 0)/se(b1) = b1/se(b1).
Note that the hypothesized value is usually just 0, so this portion of the formula is
often omitted.
Multiple linear regression, in contrast to simple linear regression, involves multiple predictors and so testing each variable can quickly become complicated. For example, suppose we apply two separate tests for two predictors, say x1 and x2, and both tests have high p-values. One test suggests x1 is not needed in a model with all the other predictors included, while the other test suggests x2 is not needed in a model with all the other predictors included. But, this doesn't necessarily mean that both x1 and x2 are not needed in a model with all the other predictors included. It may well turn out that we would do better to omit either x1 or x2 from the model, but not both. How then do we determine what to do? We'll explore this issue further in Lesson 6.
5.4 - A Matrix Formulation of the Multiple Regression Model
Note: This portion of the lesson is most important for those students who will
continue studying statistics after taking Stat 501. We will only rarely use the
material within the remainder of this course. It is, however, particularly important
for students who plan on taking Stat 502, 503, 504, or 505.
A matrix formulation of the multiple regression model
In the multiple regression setting, because of the potentially large number of
predictors, it is more efficient to use matrices to define the regression model and
the subsequent analyses. Here, we review basic matrix algebra, as well as learn
some of the more important multiple regression formulas in matrix form.
As always, let's start with the simple case first. Consider the following simple linear
regression function:
yi = β0 + β1 xi + εi    for i = 1, ..., n
If we actually let i = 1, ..., n, we see that we obtain n equations:
y1 = β0 + β1 x1 + ε1
y2 = β0 + β1 x2 + ε2
⋮
yn = β0 + β1 xn + εn
Well, that's a pretty inefficient way of writing it all out! As you can see, there is a
pattern that emerges. By taking advantage of this pattern, we can instead formulate
the above simple linear regression function in matrix notation:

\[ \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} \epsilon_1 \\ \epsilon_2 \\ \vdots \\ \epsilon_n \end{bmatrix} \]

That is, instead of writing out the n equations, using matrix notation our simple linear regression function reduces to a short and simple statement:
Y = Xβ + ε
Now, what does this statement mean? Well, here's the answer:

X is an n × 2 matrix.

Y is an n × 1 column vector, β is a 2 × 1 column vector, and ε is an n × 1 column vector.

The matrix X and vector β are multiplied together using the techniques of matrix multiplication.

And, the vector Xβ is added to the vector ε using the techniques of matrix addition.
Now, that might not mean anything to you, if you've never studied matrix algebra
or if you have and you forgot it all! So, let's start with a quick and basic review.
Definition of a matrix
An r c matrix is a rectangular array of symbols or numbers arranged in r rows
and c columns. A matrix is almost always denoted by a single capital letter in
boldface type.
Here are three examples of simple matrices. The matrix A is a 2 × 2 square matrix containing numbers:
A = \begin{bmatrix} 1 & 2 \\ 6 & 3 \end{bmatrix}
The matrix B is a 5 × 3 matrix containing numbers:
B = \begin{bmatrix} 1 & 80 & 3.4 \\ 1 & 92 & 3.1 \\ 1 & 65 & 2.5 \\ 1 & 71 & 2.8 \\ 1 & 40 & 1.9 \end{bmatrix}
And, the matrix X is a 6 × 3 matrix containing a column of 1's and two columns of various x variables:
X = \begin{bmatrix} 1 & x_{11} & x_{12} \\ 1 & x_{21} & x_{22} \\ 1 & x_{31} & x_{32} \\ 1 & x_{41} & x_{42} \\ 1 & x_{51} & x_{52} \\ 1 & x_{61} & x_{62} \end{bmatrix}
Definition of a vector and a scalar
A column vector is an r × 1 matrix, that is, a matrix with only one column. A vector is almost always denoted by a single lowercase letter in boldface type. The following vector q is a 3 × 1 column vector containing numbers:
q = \begin{bmatrix} 2 \\ 5 \\ 8 \end{bmatrix}
A row vector is a 1 × c matrix, that is, a matrix with only one row. The vector h is a 1 × 4 row vector containing numbers:
h = \begin{bmatrix} 21 & 46 & 32 & 90 \end{bmatrix}
A 1 × 1 "matrix" is called a scalar, but it's just an ordinary number, such as 29 or σ².
Matrix multiplication
Recall that Xβ, which appears in the regression function:
Y = Xβ + ε

is an example of matrix multiplication. Now, there are some restrictions: you can't just multiply any two old matrices together. Two matrices can be multiplied together only if the number of columns of the first matrix equals the number of rows of the second matrix. Then, when you multiply the two matrices:
the number of rows of the resulting matrix equals the number of rows of the
first matrix, and

the number of columns of the resulting matrix equals the number of columns
of the second matrix.
For example, if A is a 2 × 3 matrix and B is a 3 × 5 matrix, then the matrix multiplication AB is possible. The resulting matrix C = AB has 2 rows and 5 columns. That is, C is a 2 × 5 matrix. Note that the matrix multiplication BA is not possible.
For another example, if X is an n × p matrix and β is a p × 1 column vector, then the matrix multiplication Xβ is possible. The resulting matrix Xβ has n rows and 1 column. That is, Xβ is an n × 1 column vector.
Okay, now that we know when we can multiply two matrices together, how do we
do it? Here's the basic rule for multiplying A by B to get C = AB:
The entry in the ith row and jth column of C is the inner product (that is, element-by-element products added together) of the ith row of A with the jth column of B.
For example:
C = AB = \begin{bmatrix} 1 & 9 & 7 \\ 8 & 1 & 2 \end{bmatrix} \begin{bmatrix} 3 & 2 & 1 & 5 \\ 5 & 4 & 7 & 3 \\ 6 & 9 & 6 & 8 \end{bmatrix} = \begin{bmatrix} 90 & 101 & 106 & 88 \\ 41 & 38 & 27 & 59 \end{bmatrix}
That is, the entry in the first row and first column of C, denoted c11, is obtained by:
c11 = 1(3) + 9(5) + 7(6) = 90
And, the entry in the first row and second column of C, denoted c12, is obtained by:
c12 = 1(2) + 9(4) + 7(9) = 101
And, the entry in the second row and third column of C, denoted c23, is obtained by:
c23 = 8(1) + 1(7) + 2(6) = 27
You might convince yourself that the remaining five elements of C have been
obtained correctly.
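You can verify the whole product at once with NumPy; a quick sketch:

```python
# Sketch: checking the matrix multiplication example above.
import numpy as np

A = np.array([[1, 9, 7],
              [8, 1, 2]])
B = np.array([[3, 2, 1, 5],
              [5, 4, 7, 3],
              [6, 9, 6, 8]])
C = A @ B
print(C)    # [[90 101 106 88], [41 38 27 59]]
```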
Matrix addition
Recall that Xβ + ε, which appears in the regression function:
Y = Xβ + ε

is an example of matrix addition. Again, there are some restrictions: you can't just add any two old matrices together. Two matrices can be added together
only if they have the same number of rows and columns. Then, to add two
matrices, simply add the corresponding elements of the two matrices. That is:
Add the entry in the first row, first column of the first matrix with the entry in
the first row, first column of the second matrix.
Add the entry in the first row, second column of the first matrix with the entry
in the first row, second column of the second matrix.
And, so on.
For example:
C = A + B = \begin{bmatrix} 2 & 4 & 1 \\ 1 & 8 & 7 \\ 3 & 5 & 6 \end{bmatrix} + \begin{bmatrix} 7 & 5 & 2 \\ 9 & 3 & 1 \\ 2 & 1 & 8 \end{bmatrix} = \begin{bmatrix} 9 & 9 & 3 \\ 10 & 11 & 8 \\ 5 & 6 & 14 \end{bmatrix}

That is, the entry in the first row and first column of C, denoted c11, is obtained by:
c11 = 2 + 7 = 9
And, the entry in the first row and second column of C, denoted c12, is obtained by:
c12 = 4 + 5 = 9
You might convince yourself that the remaining seven elements of C have been
obtained correctly.

Least squares estimates in matrix notation


Here's the punchline: the p × 1 vector containing the estimates of the p parameters of the regression function can be shown to equal:
b = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{p-1} \end{bmatrix} = (X'X)^{-1} X'Y
where:

(X'X)-1 is the inverse of the X'X matrix, and


X' is the transpose of the X matrix.
As before, that might not mean anything to you, if you've never studied matrix
algebra or if you have and you forgot it all! So, let's go off and review inverses
and transposes of matrices.
Definition of the transpose of a matrix
The transpose of a matrix A is a matrix, denoted A' or A^T, whose rows are the columns of A and whose columns are the rows of A, all in the same order. For example, the transpose of the 3 × 2 matrix A:
A = \begin{bmatrix} 1 & 5 \\ 4 & 8 \\ 7 & 9 \end{bmatrix}
is the 2 × 3 matrix A':
A' = A^T = \begin{bmatrix} 1 & 4 & 7 \\ 5 & 8 & 9 \end{bmatrix}
And, since the X matrix in the simple linear regression setting is:
X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
the X'X matrix in the simple linear regression setting must be:
X'X = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix} \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix} = \begin{bmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{bmatrix}
Definition of the identity matrix
The square n × n identity matrix, denoted I_n, is a matrix with 1's on the diagonal and 0's elsewhere. For example, the 2 × 2 identity matrix is:
I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}
The identity matrix plays the same role as the number 1 in ordinary arithmetic:
\begin{bmatrix} 9 & 7 \\ 4 & 6 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 9 & 7 \\ 4 & 6 \end{bmatrix}

That is, when you multiply a matrix by the identity, you get the same matrix back.
Definition of the inverse of a matrix
The inverse A⁻¹ of a square (!!) matrix A is the unique matrix such that:
A⁻¹A = I = AA⁻¹
That is, the inverse of A is the matrix A-1 that you have to multiply A by in order to
obtain the identity matrix I. Note that I am not just trying to be cute by including (!!)
in that first sentence. The inverse only exists for square matrices!
Now, finding inverses is a really messy venture. The good news is that we'll always
let computers find the inverses for us. In fact, we won't even know that Minitab is
finding inverses behind the scenes!
An example
Ugh! All of these definitions! Let's take a look at an example just to convince
ourselves that, yes, indeed the least squares estimates are obtained by the
following matrix formula:
b = \begin{bmatrix} b_0 \\ b_1 \\ \vdots \\ b_{p-1} \end{bmatrix} = (X'X)^{-1} X'Y

Let's consider the data in soapsuds.txt, in which


the height of suds (y = suds) in a standard dishpan was recorded for various
amounts of soap (x = soap, in grams) (Draper and Smith, 1998, p. 108). Using
Minitab to fit the simple linear regression model to these data, we obtain:

Let's see if we can obtain the same answer using the above matrix formula. We previously showed that:
X'X = \begin{bmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{bmatrix}
Using the calculator function in Minitab, we can easily calculate some parts of this
formula:

That is, the 2 × 2 matrix X'X is:

X'X = \begin{bmatrix} 7 & 38.5 \\ 38.5 & 218.75 \end{bmatrix}
And, the 2 × 1 column vector X'Y is:
X'Y = \begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{bmatrix} = \begin{bmatrix} 347 \\ 1975 \end{bmatrix}
So, we've determined X'X and X'Y. Now, all we need to do is to find the inverse (X'X)⁻¹. As mentioned before, it is very messy to determine inverses by hand. Letting computer software do the dirty work for us, it can be shown that the inverse of X'X is:
(X'X)^{-1} = \begin{bmatrix} 4.4643 & -0.78571 \\ -0.78571 & 0.14286 \end{bmatrix}
And so, putting all of our work together, we obtain the least squares estimates:
b = (X'X)^{-1} X'Y = \begin{bmatrix} 4.4643 & -0.78571 \\ -0.78571 & 0.14286 \end{bmatrix} \begin{bmatrix} 347 \\ 1975 \end{bmatrix} = \begin{bmatrix} -2.67 \\ 9.51 \end{bmatrix}
That is, the estimated intercept is b0 = -2.67 and the estimated slope is b1 = 9.51.
Aha! Our estimates are the same as those reported by Minitab:

within rounding error!
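The same computation is a few lines in NumPy; a sketch using the X'X and X'Y values worked out above:

```python
# Sketch: least squares estimates b = (X'X)^(-1) X'Y for the soapsuds example.
import numpy as np

XtX = np.array([[7.0, 38.5],
                [38.5, 218.75]])
XtY = np.array([347.0, 1975.0])

b = np.linalg.inv(XtX) @ XtY    # equivalently np.linalg.solve(XtX, XtY)
print(b)                        # roughly [-2.68, 9.50]; the -2.67 and 9.51 above
                                # differ only because the inverse was rounded there
```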


Further Matrix Results for Multiple Linear Regression
Chapter 5 and the first six sections of Chapter 6 in the course textbook contain
further discussion of the matrix formulation of linear regression, including matrix
notation for fitted values, residuals, sums of squares, and inferences about
regression parameters. One important matrix that appears in many formulas is the
so-called "hat matrix," H=X(XX)1XH=X(XX)1X, since it puts the hat on YY!
Linear dependence
There is just one more really critical topic that we should address here, and that is
linear dependence. We say that the columns of the matrix A:
A = \begin{bmatrix} 1 & 2 & 4 & 1 \\ 2 & 1 & 8 & 6 \\ 3 & 6 & 12 & 3 \end{bmatrix}

are linearly dependent, since (at least) one of the columns can be written as a linear combination of another, namely the third column is 4 × the first column. If none of the columns can be written as a linear combination of the other columns, then we say the columns are linearly independent.
Unfortunately, linear dependence is not always obvious. For example, the columns
in the following matrix A:
A = \begin{bmatrix} 1 & 4 & 1 \\ 2 & 3 & 1 \\ 3 & 2 & 1 \end{bmatrix}
are linearly dependent, because the first column plus the second column equals 5 × the third column.
Now, why should we care about linear dependence? Because the inverse of a
square matrix exists only if the columns are linearly independent. Since the vector
of regression estimates b depends on (X'X)-1, the parameter estimates b0, b1, and so
on cannot be uniquely determined if some of the columns of X are linearly
dependent! That is, if the columns of your X matrix (that is, two or more of your predictor variables) are linearly dependent (or nearly so), you will run into trouble
when trying to estimate the regression equation.
For example, suppose for some strange reason we multiplied the predictor variable soap by 2 in the dataset soapsuds.txt. That is, we'd have two predictor variables, say soap1 (which is the original soap) and soap2 (which is 2 × the original soap):

If we tried to regress y = suds on x1 = soap1 and x2 = soap2, we see that Minitab


spits out trouble:

In short, the first moral of the story is "don't collect your data in such a way that the
predictor variables are perfectly correlated." And, the second moral of the story is "if
your software package reports an error message concerning high correlation among
your predictor variables, then think about linear dependence and how to get rid of
it."
