
Multiple Regression Analysis

The principles of Simple Regression Analysis can be extended to two or more explanatory variables.
With two explanatory variables we get an equation
Y = α + β1X1 + β2X2. It is customary to write it as Y = β0 + β1X1 + β2X2.
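To make the two-variable equation concrete, here is a minimal sketch of how β0, β1 and β2 might be estimated by ordinary least squares. It assumes NumPy is available. The data are made up and noise-free (generated from Y = 2 + 3X1 - 1.5X2), so they are not taken from the slides.

```python
import numpy as np

# Hypothetical, noise-free data generated from Y = 2 + 3*X1 - 1.5*X2.
# These numbers are illustrative only and do not come from the slides.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 2.0 + 3.0 * x1 - 1.5 * x2

# Design matrix: a column of ones (for the intercept b0), then X1 and X2.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares: choose coefficients minimising ||X @ coef - y||^2.
coef, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coef

print(f"Y = {b0:.2f} + {b1:.2f} X1 + {b2:.2f} X2")
```

Because the data contain no noise, the fit recovers the generating coefficients. With real data the estimates would carry sampling error.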

As an example, if a hypotensive agent is administered prior to surgery, the time taken for blood pressure to recover to its normal value will depend on the dose of the hypotensive and the blood pressure during surgery.

This can be modelled as Recovery time = log dose – Surgery B.P.


Recovery time for Blood Pressure and dose of hypotensive

The scatter plot shows a linear relationship. Blood Pressure takes longer to come back to normal value the larger the dose of the hypotensive.

There are many outliers because of individual variability of subjects and because of different types of surgical operations.

[Figure: scatter plot of RecvTime against Logdose, with regression line and 95% CI]
RecvTime = -14.2576 + 8.00772 Logdose
S = 14.7103   R-Sq = 15.5 %   R-Sq(adj) = 13.8 %

Recovery time for Blood Pressure and lowest Blood Pressure reading during surgery

The lower the blood pressure achieved during surgery, the longer the time for it to reach normal value during recovery from anaesthesia.

[Figure: scatter plot of RecvTime against Bpsurg, with regression line and 95% CI]
RecvTime = 34.4692 - 0.183546 Bpsurg
S = 15.9386   R-Sq = 0.8 %   R-Sq(adj) = 0.0 %

Multiple Regression Analysis

The effect of the two explanatory variables acting jointly is described by the equation
Recov. Time = 22.3 + 10.6 Log dose – 0.740 Surg. B.P.

As noted on the scatter plots, several observations were outliers or had larger than expected X values.
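The fitted equation can be read as a prediction rule. The sketch below plugs in hypothetical inputs (log dose 4.5, surgical B.P. 70, chosen only because they lie within the plotted axis ranges) to show how each coefficient acts while the other variable is held fixed. The function name is an assumption made for illustration.

```python
def predicted_recovery_time(log_dose, surgery_bp):
    """Evaluate Recov. Time = 22.3 + 10.6 Log dose - 0.740 Surg. B.P."""
    return 22.3 + 10.6 * log_dose - 0.740 * surgery_bp

# Hypothetical patient values, not taken from the data set.
baseline = predicted_recovery_time(4.5, 70.0)
higher_dose = predicted_recovery_time(5.5, 70.0)   # log dose +1, same B.P.
lower_bp = predicted_recovery_time(4.5, 60.0)      # B.P. -10, same log dose

print(round(baseline, 1))                # 18.2
print(round(higher_dose - baseline, 1))  # 10.6 (the log dose coefficient)
print(round(lower_bp - baseline, 1))     # 7.4  (10 x 0.740)
```

Each partial regression coefficient is the change in predicted recovery time per unit change in its own variable, with the other variable unchanged.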
Categorical Explanatory Variables
• Binary variables are coded 0, 1. For example a variable x1 (Gender) is coded male = 0, female = 1. Then in the regression equation Y = β0 + β1x1 + β2x2, when x1 = 1 the value of Y is what is obtained for females, and when x1 = 0 the value of Y is what is obtained for males.
If we have a nominal variable with more than two categories we have to create a number of new dummy (also called indicator) binary variables, one fewer than the number of categories.
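As an illustration of dummy coding, the sketch below turns a nominal variable with three made-up categories ("A", "B", "C") into two 0/1 indicator variables, treating "A" as the reference category. The function name and the data are assumptions for the example and are not from the slides.

```python
def dummy_code(values, reference):
    """Return one 0/1 indicator list per category other than the reference."""
    levels = sorted(set(values) - {reference})
    return {level: [1 if v == level else 0 for v in values] for level in levels}

# Hypothetical nominal variable with three categories.
category = ["A", "B", "C", "A"]
dummies = dummy_code(category, reference="A")

print(dummies)  # {'B': [0, 1, 0, 0], 'C': [0, 0, 1, 0]}
```

Subjects in the reference category score 0 on every indicator, so each dummy coefficient is interpreted relative to that category.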
How many Explanatory Variables?
• As a rule of thumb, multiple regression analysis should not be performed if the total number of variables is greater than the number of subjects ÷ 10.
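The rule of thumb amounts to a one-line check. The helper names below are hypothetical. Whether the "total number of variables" counts the response as well as the explanatory variables is not stated in the slides, so the caller has to decide what to count.

```python
def max_variables(n_subjects):
    """Largest whole number of variables allowed by the subjects / 10 rule."""
    return n_subjects // 10

def too_many_variables(n_variables, n_subjects):
    """True when the number of variables exceeds subjects / 10."""
    return n_variables > n_subjects / 10

print(max_variables(61))          # 6
print(too_many_variables(3, 25))  # True, since 3 > 2.5
```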
Analysis
In the computer output look for:
• Adjusted R². It represents the proportion of variability of Y explained by the X's. R² is adjusted so that models with different numbers of variables can be compared.
• The F-test in the ANOVA table. A significant F indicates a linear relationship between Y and at least one of the X's.
• The t-test of each partial regression coefficient. A significant t indicates that the variable in question influences the Y response while controlling for the other explanatory variables.
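The slides do not state the formulas behind these quantities. The sketch below assumes the standard textbook definitions, adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) and F = (R²/p) / ((1 - R²)/(n - p - 1)), where n is the number of subjects and p the number of explanatory variables. It is checked against the mortality and water hardness output reported in these slides (R-Sq = 42.9 %, 61 towns, one explanatory variable).

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n subjects and p explanatory variables."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def overall_f(r2, n, p):
    """Overall F statistic of the regression, computed from R-squared."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

# Mortality vs. calcium: R-Sq = 42.9 %, n = 61 towns, p = 1.
adj = adjusted_r2(0.429, 61, 1)
print(round(adj, 3))  # 0.419, matching the reported R-Sq(adj) = 41.9 %

f_stat = overall_f(0.429, 61, 1)  # compare with an F(p, n - p - 1) table
```

The adjustment penalises each extra explanatory variable, which is why adjusted R² rather than R² is used to compare models of different sizes.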
Usefulness of Scatter Plots - I
• The scatter plot illustrates the relationship between water hardness and mortality in 61 large towns in England and Wales.
• The regression line indicates an inverse relationship between water hardness and mortality rates.

[Figure: Mortality and Water Hardness. Scatter plot of Mortal against Calcium, with regression line and 95% CI]
Mortal = 1676.36 - 3.22609 Calcium
S = 143.029   R-Sq = 42.9 %   R-Sq(adj) = 41.9 %

Usefulness of Scatter Plots - II
[Figure: two scatter plots, each with regression line and 95% CI]
Mortality and Water Hardness in Towns in the North (MortalN against CalciumN)
MortalN = 1692.31 - 1.93134 CalciumN
S = 129.209   R-Sq = 13.6 %   R-Sq(adj) = 11.0 %
Mortality and Water Hardness in Towns in the South (MortalS against CalciumS)
MortalS = 1522.82 - 2.09272 CalciumS
S = 114.297   R-Sq = 36.3 %   R-Sq(adj) = 33.6 %


• The inverse relationship between water hardness and mortality is still maintained in both regions.
• But for towns in the North the regression line is less steep than for towns in the South, indicating that other causes of mortality are stronger in the North compared to the South.
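To see what the two fitted lines imply side by side, the sketch below evaluates the North and South equations at a few calcium values. The values 20, 40 and 60 are arbitrary illustrations within both plotted ranges, and the function names are assumptions.

```python
def mortality_north(calcium):
    """MortalN = 1692.31 - 1.93134 CalciumN"""
    return 1692.31 - 1.93134 * calcium

def mortality_south(calcium):
    """MortalS = 1522.82 - 2.09272 CalciumS"""
    return 1522.82 - 2.09272 * calcium

# Arbitrary calcium values chosen for illustration.
for calcium in (20, 40, 60):
    north = mortality_north(calcium)
    south = mortality_south(calcium)
    print(f"calcium={calcium}: North={north:.1f}, South={south:.1f}, "
          f"gap={north - south:.1f}")

gap_at_40 = mortality_north(40) - mortality_south(40)
```

At the same water hardness the North line predicts higher mortality than the South line, and the gap widens slightly as calcium increases because the North slope is shallower.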
