
SW388R7
Data Analysis & Computers II

Hierarchical Multiple Regression

Slide 1

Differences between hierarchical and standard multiple regression
Sample problem
Steps in hierarchical multiple regression

Differences between standard and hierarchical multiple regression

Slide 2

Standard multiple regression is used to evaluate the relationship between a set of independent variables and a dependent variable.

Hierarchical regression is used to evaluate the relationship between a set of independent variables and the dependent variable, controlling for or taking into account the impact of a different set of independent variables on the dependent variable.

For example, a research hypothesis might state that there are differences between the average salary for male employees and female employees, even after we take into account differences between education levels and prior work experience.

In hierarchical regression, the independent variables are entered into the analysis in a sequence of blocks, or groups that may contain one or more variables. In the example above, education and work experience would be entered in the first block and sex would be entered in the second block.
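
The block-by-block entry described above can be sketched outside SPSS. The following is a minimal illustration in plain Python, not the SPSS procedure itself: the data values, the variable names, and the helper functions `solve` and `r_squared` are all invented for this sketch. It fits the controls first, then controls plus predictor, and reports the change in R².

```python
# Hypothetical toy data: every number below is invented for illustration only.
education = [12, 16, 14, 18, 12, 16, 20, 14, 12, 18]    # years of schooling
experience = [5, 3, 10, 2, 20, 8, 4, 15, 1, 12]         # years of prior work
sex = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]                    # dichotomous, coded 0/1
salary = [30, 38, 41, 44, 47, 43, 55, 46, 27, 56]       # dependent variable


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def r_squared(y, predictors):
    """R-squared of a least squares fit of y on the predictors plus an intercept."""
    rows = [[1.0] + [float(p[i]) for p in predictors] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


block1 = [education, experience]   # control variables, entered first
block2 = [sex]                     # predictor variable, entered second

r2_block1 = r_squared(salary, block1)
r2_both = r_squared(salary, block1 + block2)
r2_change = r2_both - r2_block1
print(f"R2 block 1 = {r2_block1:.3f}, R2 blocks 1+2 = {r2_both:.3f}, change = {r2_change:.3f}")
```

Because the second model contains every variable of the first, its R² can never be lower; the question the hierarchical analysis asks is whether the increase is large enough to be statistically significant.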

Differences in statistical results

Slide 3

SPSS shows the statistical results (Model Summary, ANOVA, Coefficients, etc.) as each block of variables is entered into the analysis.

In addition (if requested), SPSS prints and tests the key statistic used in evaluating the hierarchical hypothesis: the change in R² for each additional block of variables.

The null hypothesis for the addition of each block of variables to the analysis is that the change in R² (the contribution to the explanation of the variance in the dependent variable) is zero.

If the null hypothesis is rejected, our interpretation is that the variables in block 2 had a relationship to the dependent variable, after controlling for the relationship of the block 1 variables to the dependent variable.

Variations in hierarchical regression - 1

Slide 4

A hierarchical regression can have as many blocks as there are independent variables, i.e. the analyst can state a hypothesis that specifies an exact order of entry for variables.

A more common hierarchical regression specifies two blocks of variables: a set of control variables entered in the first block and a set of predictor variables entered in the second block.

Control variables are often demographics which are thought to make a difference in scores on the dependent variable. Predictors are the variables whose effect our research question is really interested in, but whose effect we want to separate out from that of the control variables.

Variations in hierarchical regression - 2

Slide 5

Support for a hierarchical hypothesis would be expected to require statistical significance for the addition of each block of variables.

However, many times, we want to exclude the effect of blocks of variables previously entered into the analysis, whether or not a previous block was statistically significant. The analysis aims to obtain the best indicator of the effect of the predictor variables. The statistical significance of previously entered variables is not interpreted.

The latter strategy is the one that we will employ in our problems.

Differences in solving hierarchical regression problems

Slide 6

R² change, i.e. the increase in R² when the predictor variables are added to the analysis, is interpreted rather than the overall R² for the model with all variables entered.

In the interpretation of individual relationships, the relationship between the predictors and the dependent variable is presented.

Similarly, in the validation analysis, we are only concerned with verifying the significance of the predictor variables. Differences in control variables are ignored.

Slide 7

A hierarchical regression problem

The problem asks us to examine the feasibility of doing multiple regression to evaluate the relationships among these variables. The inclusion of the "controlling for" phrase indicates that this is a hierarchical multiple regression problem.

Multiple regression is feasible if the dependent variable is metric and the independent variables (both predictors and controls) are metric or dichotomous, and the available data is sufficient to satisfy the sample size requirements.

Slide 8

Level of measurement - answer

Hierarchical multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous.

"Spouse's highest academic degree" [spdeg] is ordinal, satisfying the metric level of measurement requirement for the dependent variable, if we follow the convention of treating ordinal level variables as metric. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Age" [age] is interval, satisfying the metric or dichotomous level of measurement requirement for independent variables.

"Highest academic degree" [degree] is ordinal, satisfying the metric or dichotomous level of measurement requirement for independent variables, if we follow the convention of treating ordinal level variables as metric. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Sex" [sex] is dichotomous, satisfying the metric or dichotomous level of measurement requirement for independent variables.

True with caution is the correct answer.

Slide 9

Sample size - question

The second question asks about the sample size requirements for multiple regression.

To answer this question, we will run the initial or baseline multiple regression to obtain some basic data about the problem and solution.

Slide 10

The baseline regression - 1

After we check for violations of assumptions and outliers, we will decide whether to interpret the model that includes the transformed variables and omits outliers (the revised model), or the model that uses the untransformed variables and includes all cases, outliers among them (the baseline model).

In order to make this decision, we run the baseline regression before we examine assumptions and outliers, and record the R² for the baseline model. If using transformations and omitting outliers substantially improves the analysis (a 2% increase in R²), we interpret the revised model. If the increase is smaller, we interpret the baseline model.

To run the baseline model, select Regression | Linear from the Analyze menu.

Slide 11

The baseline regression - 2

First, move the dependent variable spdeg to the Dependent text box.

Second, move the control independent variables, age and sex, to the Independent(s) list box.

Third, select the method for entering the variables into the analysis from the drop down Method menu. In this example, we accept the default of Enter for direct entry of all variables in the first block, which will force the controls into the regression.

Fourth, click on the Next button to tell SPSS to add another block of variables to the regression analysis.

Slide 12

The baseline regression - 3

SPSS identifies that we will now be adding variables to a second block.

First, move the predictor independent variable degree to the Independent(s) list box for block 2.

Second, click on the Statistics button to specify the statistics options that we want.

Slide 13

The baseline regression - 4

First, mark the checkbox for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit, Descriptives, and R squared change. The R squared change statistic will tell us whether or not the variables added after the controls have a relationship to the dependent variable.

Third, mark the Durbin-Watson statistic on the Residuals panel.

Fourth, mark the Collinearity diagnostics to get tolerance values for testing multicollinearity.

Fifth, click on the Continue button to close the dialog box.

Slide 14

The baseline regression - 5

Click on the OK button to request the regression output.

Slide 15

R² for the baseline model

The R² of 0.281 is the benchmark that we will use to evaluate the utility of transformations and the elimination of outliers.

Prior to any transformations of variables to satisfy the assumptions of multiple regression or the removal of outliers, the proportion of variance in the dependent variable explained by the independent variables (R²) was 28.1%. The relationship is statistically significant, though we would not stop if it were not significant because the lack of significance may be a consequence of violation of assumptions or the inclusion of outliers.

Slide 16

Sample size evidence and answer

Descriptive Statistics

                          Mean     Std. Deviation   N
SPOUSES HIGHEST DEGREE    1.78     1.281            136
AGE OF RESPONDENT         45.80    14.534           136
RESPONDENTS SEX           1.60     .491             136
RS HIGHEST DEGREE         1.65     1.220            136

Hierarchical multiple regression requires that the minimum ratio of valid cases to independent variables be at least 5 to 1. The ratio of valid cases (136) to number of independent variables (3) was 45.3 to 1, which was equal to or greater than the minimum ratio. The requirement for a minimum ratio of cases to independent variables was satisfied.

In addition, the ratio of 45.3 to 1 satisfied the preferred ratio of 15 cases per independent variable.

The answer to the question is true.
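
The ratio check above is simple arithmetic; as a small sketch (the function name `cases_per_variable` is made up for this illustration), it can be written as:

```python
def cases_per_variable(valid_cases, n_independent):
    """Ratio of valid cases to independent variables (controls plus predictors)."""
    return valid_cases / n_independent


ratio = cases_per_variable(136, 3)     # counts taken from the slide above
meets_minimum = ratio >= 5             # minimum ratio of 5 to 1
meets_preferred = ratio >= 15          # preferred ratio of 15 to 1
print(f"{ratio:.1f} to 1, minimum met: {meets_minimum}, preferred met: {meets_preferred}")
# prints: 45.3 to 1, minimum met: True, preferred met: True
```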

Slide 17

Assumption of normality for the dependent variable - question

Having satisfied the level of measurement and sample size requirements, we turn our attention to conformity with three of the assumptions of multiple regression: normality, linearity, and homoscedasticity.

First, we will evaluate the assumption of normality for the dependent variable.

Slide 18

Run the script to test normality

First, move the variables to the list boxes based on the role that the variable plays in the analysis and its level of measurement.

Second, click on the Normality option button to request that SPSS produce the output needed to evaluate the assumption of normality.

Third, mark the checkboxes for the transformations that we want to test in evaluating the assumption.

Fourth, click on the OK button to produce the output.

Slide 19

Normality of the dependent variable: spouse's highest degree

Descriptives: SPOUSES HIGHEST DEGREE

                                              Statistic   Std. Error
Mean                                          1.78        .110
95% Confidence Interval for Mean, Lower Bound 1.56
95% Confidence Interval for Mean, Upper Bound 2.00
5% Trimmed Mean                               1.75
Median                                        1.00
Variance                                      1.640
Std. Deviation                                1.281
Minimum                                       0
Maximum                                       4
Range                                         4
Interquartile Range                           2.00
Skewness                                      .573        .208
Kurtosis                                      -1.051      .413

The dependent variable "spouse's highest academic degree" [spdeg] did not satisfy the criteria for a normal distribution. The skewness of the distribution (0.573) was between -1.0 and +1.0, but the kurtosis of the distribution (-1.051) fell outside the range from -1.0 to +1.0.

The answer to the question is false.
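
The rule of thumb applied above (both skewness and kurtosis between -1.0 and +1.0) can be stated as a one-line check. This is only a sketch of the decision rule used in these slides; the function name `satisfies_normality` is an assumption of this example.

```python
def satisfies_normality(skewness, kurtosis, limit=1.0):
    """Rule of thumb from the slides: both statistics must lie within +/- limit."""
    return -limit <= skewness <= limit and -limit <= kurtosis <= limit


# Values reported for spouse's highest academic degree [spdeg].
print(satisfies_normality(0.573, -1.051))   # prints: False
```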

Slide 20

Normality of the transformed dependent variable: spouse's highest degree

The "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" satisfied the criteria for a normal distribution. The skewness of the distribution (-0.091) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.678) was between -1.0 and +1.0.

The "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" was substituted for "spouse's highest academic degree" [spdeg] in the analysis.
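
For readers who want to see what the transformation LG10(1+SPDEG) does to the coded values, here is a small Python equivalent. The function name `lg_spdeg` is invented for this sketch; adding 1 before taking the logarithm keeps a code of 0 defined, since the log of 0 is not.

```python
import math


def lg_spdeg(spdeg):
    """Python counterpart of the SPSS expression LG10(1 + SPDEG)."""
    return math.log10(1 + spdeg)


# spdeg ranges from 0 to 4 according to the Descriptives table.
for code in range(5):
    print(code, round(lg_spdeg(code), 3))
```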

Slide 21

Normality of the control variable: age

Next, we will evaluate the assumption of normality for the control variable, age.

Slide 22

Normality of the control variable: age

Descriptives: AGE OF RESPONDENT

                                              Statistic   Std. Error
Mean                                          45.99       1.023
95% Confidence Interval for Mean, Lower Bound 43.98
95% Confidence Interval for Mean, Upper Bound 48.00
5% Trimmed Mean                               45.31
Median                                        43.50
Variance                                      282.465
Std. Deviation                                16.807
Minimum                                       19
Maximum                                       89
Range                                         70
Interquartile Range                           24.00
Skewness                                      .595        .148
Kurtosis                                      -.351       .295

The independent variable "age" [age] satisfied the criteria for a normal distribution. The skewness of the distribution (0.595) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.351) was between -1.0 and +1.0.

Slide 23

Normality of the predictor variable: highest academic degree

Next, we will evaluate the assumption of normality for the predictor variable, highest academic degree.

Slide 24

Normality of the predictor variable: respondent's highest academic degree

Descriptives: RS HIGHEST DEGREE

                                              Statistic   Std. Error
Mean                                          1.41        .071
95% Confidence Interval for Mean, Lower Bound 1.27
95% Confidence Interval for Mean, Upper Bound 1.55
5% Trimmed Mean                               1.35
Median                                        1.00
Variance                                      1.341
Std. Deviation                                1.158
Minimum                                       0
Maximum                                       4
Range                                         4
Interquartile Range                           1.00
Skewness                                      .948        .149
Kurtosis                                      -.051       .297

The independent variable "highest academic degree" [degree] satisfied the criteria for a normal distribution. The skewness of the distribution (0.948) was between -1.0 and +1.0 and the kurtosis of the distribution (-0.051) was between -1.0 and +1.0.

Slide 25

Assumption of linearity for spouse's degree and respondent's degree - question

The metric independent variables satisfied the criteria for normality, but the dependent variable did not.

However, the logarithmic transformation of "spouse's highest academic degree" produced a variable that was normally distributed and will be tested as a substitute in the analysis.

The script for linearity will support our using the transformed dependent variable without having to add it to the data set.

Slide 26

Run the script to test linearity

When the linearity option is selected, a default set of transformations to test is marked.

First, click on the Linearity option button to request that SPSS produce the output needed to evaluate the assumption of linearity.

Second, since we have decided to use the log transformation of the dependent variable, we mark the check box for the Logarithmic transformation and clear the check box for the Untransformed version of the dependent variable.

Third, click on the OK button to produce the output.

Slide 27

Linearity test: spouse's highest degree and respondent's highest academic degree

The correlation between "highest academic degree" and logarithmic transformation of "spouse's highest academic degree" was statistically significant (r=.519, p<0.001). A linear relationship exists between these variables.
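
The linearity check above rests on a Pearson correlation and its significance test. The sketch below shows the two formulas in plain Python; the function names and the toy data are invented for illustration and do not reproduce the r=.519 reported from the course data set.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic with n - 2 degrees of freedom for testing that the correlation is zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical toy data, invented for this sketch.
degree = [0, 1, 1, 2, 3, 4, 1, 3]
lgspdeg = [math.log10(1 + s) for s in [0, 1, 2, 2, 3, 4, 1, 2]]

r = pearson_r(degree, lgspdeg)
print(f"r = {r:.3f}, t = {t_for_r(r, len(degree)):.3f}")
```

A significant correlation with the untransformed or a transformed variable is what the script treats as evidence of a linear relationship; the p-value itself would come from the t distribution, which is left out here to keep the sketch dependency-free.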

Slide 28

Linearity test: spouse's highest degree and respondent's age

The assessment of the linear relationship between logarithmic transformation of "spouse's highest academic degree" [LGSPDEG=LG10(1+SPDEG)] and "age" [age] indicated that the relationship was weak, rather than nonlinear. Neither the correlation between logarithmic transformation of "spouse's highest academic degree" and "age" nor the correlations with the transformations were statistically significant.

The correlation between "age" and logarithmic transformation of "spouse's highest academic degree" was not statistically significant (r=.009, p=0.921). The correlations for the transformations were: the logarithmic transformation (r=.061, p=0.482); the square root transformation (r=.034, p=0.692); the inverse transformation (r=.112, p=0.194); and the square transformation (r=-.037, p=0.668).

Slide 29

Assumption of homogeneity of variance - question

Sex is the only dichotomous independent variable in the analysis. We will test it for homogeneity of variance using the logarithmic transformation of the dependent variable, which we have already decided to use.

Slide 30

Run the script to test homogeneity of variance

When the homogeneity of variance option is selected, a default set of transformations to test is marked.

First, click on the Homogeneity of variance option button to request that SPSS produce the output needed to evaluate the assumption of homogeneity of variance.

Second, since we have decided to use the log transformation of the dependent variable, we mark the check box for the Logarithmic transformation and clear the check box for the Untransformed version of the dependent variable.

Third, click on the OK button to produce the output.

Slide 31

Assumption of homogeneity of variance evidence and answer

Based on the Levene Test, the variance in "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" was homogeneous for the categories of "sex" [sex]. The probability associated with the Levene statistic (0.687) was p=0.409, greater than the level of significance for testing assumptions (0.01). The null hypothesis that the group variances were equal was not rejected.

The homogeneity of variance assumption was satisfied. The answer to the question is true.
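
To make the Levene statistic less of a black box, the sketch below computes it from raw group values, assuming the common form based on absolute deviations from each group mean (an assumption of this example, as are the function names and the toy groups). The p-value requires the F distribution and is not computed here; only the decision rule from the slide is shown.

```python
def levene_statistic(groups):
    """Levene's W from absolute deviations around each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    z = []
    for g in groups:
        mean_g = sum(g) / len(g)
        z.append([abs(v - mean_g) for v in g])
    z_means = [sum(zg) / len(zg) for zg in z]
    z_grand = sum(sum(zg) for zg in z) / n_total
    between = sum(len(zg) * (zm - z_grand) ** 2 for zg, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zg, zm in zip(z, z_means) for v in zg)
    return (n_total - k) / (k - 1) * between / within


def assumption_satisfied(p_value, alpha=0.01):
    """Equal variances are not rejected when p exceeds the level of significance."""
    return p_value > alpha


# Hypothetical toy groups with identical spread, so W comes out as zero.
group_a = [1, 2, 3]
group_b = [11, 12, 13]
print(levene_statistic([group_a, group_b]))
print(assumption_satisfied(0.409))   # p-value reported on the slide; prints: True
```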

Slide 32

Including the transformed variable in the data set - 1

In the evaluation for normality, we resolved a problem with normality for spouse's highest academic degree with a logarithmic transformation. We need to add this transformed variable to the data set, so that we can incorporate it in our detection of outliers.

We can use the script to compute transformed variables and add them to the data set.

We select an assumption to test (Normality is the easiest), mark the check box for the transformation we want to retain, and clear the check box "Delete variables created in this analysis."

NOTE: this will leave the transformed variable in the data set. To remove it, you can delete the column or close the data set without saving.

Slide 33

Including the transformed variable in the data set - 2

First, move the variable SPDEG to the list box for the dependent variable.

Second, click on the Normality option button to request that SPSS do the test for normality, including the transformation we will mark.

Third, mark the transformation we want to retain (Logarithmic) and clear the checkboxes for the other transformations.

Fourth, clear the check box for the option "Delete variables created in this analysis".

Fifth, click on the OK button.

Slide 34

Including the transformed variable in the data set - 3

If we scroll to the rightmost column in the data editor, we see that the log of SPDEG is included in the data set.

Slide 35

Including the transformed variable in the list of variables in the script - 1

If we scroll to the bottom of the list of variables, we see that the log of SPDEG is not included in the list of available variables.

To tell the script to add the log of SPDEG to the list of variables in the script, click on the Reset button. This will start the script over again, with a new list of variables from the data set.

Slide 36

Including the transformed variable in the list of variables in the script - 2

If we scroll to the bottom of the list of variables now, we see that the log of SPDEG is included in the list of available variables.

Slide 37

Detection of outliers - question

In multiple regression, an outlier in the solution can be defined as a case that has a large residual because the equation did a poor job of predicting its value.

We will run the regression again incorporating any transformations we have decided to test, and have SPSS compute the standardized residual for each case. Cases with a standardized residual beyond +/- 3.0 (that is, larger than 3.0 in absolute value) will be treated as outliers.
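
The outlier rule above can be sketched as follows. The sketch assumes that a standardized residual is the raw residual divided by the standard error of the estimate, which is how this example reads the SPSS output; the function names and the residual values are invented for illustration.

```python
import math


def standardized_residuals(residuals, n_predictors):
    """Divide each residual by the standard error of the estimate (assumed definition)."""
    n = len(residuals)
    mse = sum(e * e for e in residuals) / (n - n_predictors - 1)
    s = math.sqrt(mse)
    return [e / s for e in residuals]


def flag_outliers(residuals, n_predictors, cutoff=3.0):
    """Return the positions of cases whose standardized residual exceeds the cutoff."""
    z = standardized_residuals(residuals, n_predictors)
    return [i for i, value in enumerate(z) if abs(value) > cutoff]


# Hypothetical residuals: thirty small ones and a single large one at position 30.
residuals = [0.1, -0.1] * 15 + [2.0]
print(flag_outliers(residuals, n_predictors=3))   # prints: [30]
```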

Slide 38

The revised regression using transformations

To run the regression to detect outliers, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.

Slide 39

The revised regression: substituting transformed variables

Remove the variable SPDEG from the Dependent text box and replace it with the log of the variable, LGSPDEG.

Click on the Statistics button to select statistics we will need for the analysis.

Slide 40

The revised regression: selecting statistics

First, mark the checkbox for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit, Descriptives, and R squared change. The R squared change statistic will tell us whether or not the variables added after the controls have a relationship to the dependent variable.

Third, mark the Durbin-Watson statistic on the Residuals panel.

Fourth, mark the checkbox for the Casewise diagnostics, which will be used to identify outliers.

Fifth, mark the Collinearity diagnostics to get tolerance values for testing multicollinearity.

Sixth, click on the Continue button to close the dialog box.

Slide 41

The revised regression: saving standardized residuals

Mark the checkbox for Standardized Residuals so that SPSS saves a new variable in the data editor. We will use this variable to omit outliers in the revised regression model.

Click on the Continue button to close the dialog box.

Slide 42

The revised regression: obtaining output

Click on the OK button to obtain the output for the revised model.

Slide 43

Outliers in the analysis

If cases have a standardized residual beyond +/- 3.0, SPSS creates a table titled Casewise Diagnostics, in which it lists the cases and the values that result in their being outliers. If there are no outliers, SPSS does not print the Casewise Diagnostics table. There was no table for this problem. The answer to the question is true.

We can verify that all standardized residuals fell within +/- 3.0 by looking at the minimum and maximum standardized residuals in the table of Residual Statistics. Both the minimum and maximum fell in the acceptable range.

Since there were no outliers, we can use the regression just completed to make our decision about which model to interpret.

Slide 44

Selecting the model to interpret - question

Since there were no outliers, we can use the regression just completed to make our decision about which model to interpret.

If the R² for the revised model is higher by 2% or more, we will base our interpretation on the revised model; otherwise, we will interpret the baseline model.

Slide 45

Selecting the model to interpret evidence and answer

Prior to any transformations of variables to satisfy the assumptions of multiple regression and the removal of outliers, the proportion of variance in the dependent variable explained by the independent variables (R²) was 28.1%.

After substituting transformed variables, the proportion of variance in the dependent variable explained by the independent variables (R²) was 27.1%.

Since the revised regression model did not explain at least two percent more variance than the baseline regression analysis, the baseline regression model with all cases and the original form of all variables should be used for the interpretation.

The transformations used to satisfy the assumptions will not be used, so cautions should be added for the assumptions violated.

False is the correct answer to the question.
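
The selection rule applied above fits in a few lines. This is only a sketch of the course's rule of thumb; the function name `model_to_interpret` and the parameter `min_gain` are made up for the example.

```python
def model_to_interpret(r2_baseline, r2_revised, min_gain=0.02):
    """Pick the revised model only if it explains at least min_gain more variance."""
    return "revised" if r2_revised - r2_baseline >= min_gain else "baseline"


# R-squared values reported above: 28.1% for the baseline, 27.1% for the revised model.
print(model_to_interpret(0.281, 0.271))   # prints: baseline
```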

Slide 46

Re-running the baseline regression - 1

Having decided to use the baseline model for the interpretation of this analysis, the SPSS regression output was re-created.

To run the baseline regression again, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.

Slide 47

Re-running the baseline regression - 2

Remove the transformed variable lgspdeg from the dependent variable textbox and add the variable spdeg.

Click on the Save button to remove the request to save standardized residuals to the data editor.

Slide 48

Re-running the baseline regression - 3

Clear the checkbox for Standardized Residuals so that SPSS does not save a new set of them in the data editor when it runs the new regression.

Click on the Continue button to close the dialog box.

Slide 49

Re-running the baseline regression - 4

Click on the OK button to request the regression output.

Slide 50

Assumption of independence of errors - question

We can now check the assumption of independence of errors for the analysis we will interpret.

Slide 51

Assumption of independence of errors: evidence and answer

Model Summary (c)

Model   R          R Square   Adjusted R Square   Std. Error of the Estimate
1       .014 (a)   .000       -.015               1.290
2       .531 (b)   .281       .265                1.098

Change Statistics
Model   R Square Change   F Change   df1   df2   Sig. F Change   Durbin-Watson
1       .000              .013       2     133   .987
2       .281              51.670     1     132   .000            1.754

a. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT
b. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT, RS HIGHEST DEGREE
c. Dependent Variable: SPOUSES HIGHEST DEGREE

Having selected a regression model for interpretation, we can now examine the final assumption of independence of errors.

The Durbin-Watson statistic is used to test for the presence of serial correlation among the residuals, i.e., the assumption of independence of errors, which requires that the residuals or errors in prediction do not follow a pattern from case to case.

The value of the Durbin-Watson statistic ranges from 0 to 4. As a general rule of thumb, the residuals are not correlated if the Durbin-Watson statistic is approximately 2, and an acceptable range is 1.50 - 2.50.

The Durbin-Watson statistic for this problem is 1.754, which falls within the acceptable range. If the Durbin-Watson statistic was not in the acceptable range, we would add a caution to the findings for a violation of regression assumptions. The answer to the question is true.
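
The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A short sketch, with function names and toy residuals invented for this illustration:

```python
def durbin_watson(residuals):
    """Sum of squared successive differences over the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


def in_acceptable_range(dw, low=1.50, high=2.50):
    """Rule of thumb from the slide: values between 1.50 and 2.50 are acceptable."""
    return low <= dw <= high


# Hypothetical residuals that flip sign on every case (strong negative serial correlation).
print(durbin_watson([1, -1, 1, -1]))    # prints: 3.0
print(in_acceptable_range(1.754))       # value reported by SPSS above; prints: True
```

Residuals that repeat from case to case push the statistic toward 0, and residuals that alternate in sign push it toward 4, which is why values near 2 are read as no serial correlation.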

Slide 52

Multicollinearity - question

The final condition that can have an impact on our interpretation is multicollinearity.

Slide 53

Multicollinearity evidence and answer

The tolerance values for all of the independent variables are larger than 0.10: "highest academic degree" [degree] (.990), "age" [age] (.954) and "sex" [sex] (.947).

Multicollinearity is not a problem in this regression analysis.

True is the correct answer to the question.
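
Tolerance and the variance inflation factor (VIF) that SPSS prints beside it are reciprocals of each other, so the check above can be sketched as follows (the dictionary and function names are assumptions of this example):

```python
# Tolerance values reported on the slide for the model with all three variables.
tolerances = {"degree": 0.990, "age": 0.954, "sex": 0.947}


def vif(tolerance):
    """Variance inflation factor, the reciprocal of tolerance."""
    return 1.0 / tolerance


problem_variables = [name for name, tol in tolerances.items() if tol <= 0.10]
print(problem_variables)   # prints: []
for name, tol in tolerances.items():
    print(name, round(vif(tol), 3))
```

The VIF values computed this way can differ from the SPSS table in the third decimal place because the tolerances shown on the slide are already rounded.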

Slide 54

Overall relationship between dependent variable and independent variables - question

The first finding we want to confirm concerns the relationship between the dependent variable and the set of predictors after including the control variables in the analysis.

Slide 55

Overall relationship between dependent variable and independent variables evidence and answer

Hierarchical multiple regression was performed to test the hypothesis that there was a relationship between the dependent variable "spouse's highest academic degree" [spdeg] and the predictor independent variable "highest academic degree" [degree] after controlling for the effect of the control independent variables "age" [age] and "sex" [sex]. In hierarchical regression, the interpretation for overall relationship focuses on the change in R². If change in R² is statistically significant, the overall relationship for all independent variables will be significant as well.

Slide 56

Overall relationship between dependent variable and independent variables evidence and answer

Based on model 2 in the Model Summary table, where the predictors were added (F(1, 132) = 51.670, p<0.001), the predictor variable, highest academic degree, did contribute to the overall relationship with the dependent variable, spouse's highest academic degree. Since the probability of the F statistic (p<0.001) was less than or equal to the level of significance (0.05), the null hypothesis that change in R² was equal to 0 was rejected. The research hypothesis that highest academic degree reduced the error in predicting spouse's highest academic degree was supported.
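
The F statistic for a change in R² follows from the two R² values, the number of cases, and the number of variables. The sketch below uses the standard formula for comparing nested regression models; the function name `f_change` is invented, and because the inputs are the rounded R² values from the Model Summary, the result only approximates the 51.670 that SPSS reports.

```python
def f_change(r2_full, r2_reduced, n_cases, k_full, k_added):
    """F test for the R-squared gained by adding k_added variables to a nested model."""
    df1 = k_added
    df2 = n_cases - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f, df1, df2


# 136 cases, 3 independent variables in model 2, 1 of them added in block 2.
f, df1, df2 = f_change(r2_full=0.281, r2_reduced=0.000, n_cases=136, k_full=3, k_added=1)
print(f"F({df1}, {df2}) = {f:.2f}")
```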

Slide 57

Overall relationship between dependent variable and independent variables evidence and answer

The increase in R² by including the predictor variable ("highest academic degree") in the analysis was 0.281, not 0.241.

Using a proportional reduction in error interpretation for R², information provided by the predictor variable reduced our error in predicting "spouse's highest academic degree" [spdeg] by 28.1%, not 24.1%.

The answer to the question is false because the problem stated an incorrect statistical value.

Slide 58

Relationship of the predictor variable and the dependent variable - question

In these hierarchical regression problems, we will focus the interpretation of individual relationships on the predictor variables and ignore the contribution of the control variables.

Slide 59

Relationship of the predictor variable and the dependent variable evidence and answer

Coefficients (a)

Model                     B       Std. Error   Beta    t       Sig.   Tolerance   VIF
1   (Constant)            1.781   .577                 3.085   .002
    AGE OF RESPONDENT     .001    .008         .009    .100    .920   .956        1.046
    RESPONDENTS SEX       -.023   .231         -.009   -.100   .920   .956        1.046
2   (Constant)            .525    .521                 1.007   .316
    AGE OF RESPONDENT     .003    .007         .037    .495    .622   .954        1.049
    RESPONDENTS SEX       .114    .198         .044    .575    .566   .947        1.056
    RS HIGHEST DEGREE     .559    .078         .533    7.188   .000   .990        1.010

B and Std. Error are the unstandardized coefficients, Beta is the standardized coefficient, and Tolerance and VIF are the collinearity statistics.
a. Dependent Variable: SPOUSES HIGHEST DEGREE

Based on the statistical test of the b coefficient (t = 7.188, p<0.001) for the independent variable "highest academic degree" [degree], the null hypothesis that the slope or b coefficient was equal to 0 (zero) was rejected. The research hypothesis that there was a relationship between "highest academic degree" and "spouse's highest academic degree" was supported.
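
Two entries in the degree row of the Coefficients table can be approximated by hand: the t statistic is the b coefficient divided by its standard error, and the standardized Beta rescales b by the ratio of the two standard deviations from the Descriptive Statistics table. The sketch below is illustrative only (the function names are invented), and the results differ slightly from the table because the inputs are rounded.

```python
def t_statistic(b, std_error):
    """t test of the null hypothesis that the slope is zero."""
    return b / std_error


def standardized_beta(b, sd_x, sd_y):
    """Rescale an unstandardized slope to standard deviation units."""
    return b * sd_x / sd_y


t = t_statistic(0.559, 0.078)                 # table reports t = 7.188
beta = standardized_beta(0.559, 1.220, 1.281) # table reports Beta = .533
print(f"t = {t:.3f}, Beta = {beta:.3f}")
```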

Slide 60

Relationship of the predictor variable and the dependent variable evidence and answer

Coefficients (the same table as on the previous slide; Dependent Variable: SPOUSES HIGHEST DEGREE)

The b coefficient for the relationship between the dependent variable "spouse's highest academic degree" [spdeg] and the independent variable "highest academic degree" [degree] was .559, which implies a direct relationship because the sign of the coefficient is positive. Higher numeric values for the independent variable "highest academic degree" [degree] are associated with higher numeric values for the dependent variable "spouse's highest academic degree" [spdeg].

The statement in the problem that "survey respondents who had higher academic degrees had spouses with higher academic degrees" is correct. The answer to the question is true with caution. Caution in interpreting the relationship should be exercised because of an ordinal variable treated as metric, and because of the violation of the assumption of normality.

Slide 61

Validation analysis - question

The problem states the random number seed to use in the validation analysis.

Slide 62

Validation analysis: set the random number seed

Validate the results of your regression analysis by conducting a 75/25% cross-validation, using 998794 as the random number seed.

To set the random number seed, select the Random Number Seed command from the Transform menu.

Slide 63

Set the random number seed

First, click on the Set seed to option button to activate the text box.

Second, type in the random seed stated in the problem.

Third, click on the OK button to complete the dialog box.

Note that SPSS does not provide you with any feedback about the change.

Slide 64

Validation analysis:
compute the split variable

To enter the formula for the variable that will split the
sample in two parts, click on the Compute command.

Slide 65

The formula for the split variable

First, type the name for the new variable, split, into the
Target Variable text box.

Second, the formula for the value of split is shown in the
text box. The uniform(1) function generates a random decimal
number between 0 and 1. The random number is compared to the
value 0.75.

If the random number is less than or equal to 0.75, the value
of the formula will be 1, the SPSS numeric equivalent to true.
If the random number is larger than 0.75, the formula will
return a 0, the SPSS numeric equivalent to false.

Third, click on the OK button to complete the dialog box.
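For readers working outside SPSS, the split computation can be sketched in Python. This is an illustration under assumptions: the SPSS expression is taken to be a comparison of uniform(1) with 0.75 as described on this slide, make_split is a hypothetical helper name, and 200 cases is an arbitrary sample size. Python's random number generator is not the SPSS generator, so the seed 998794 will not reproduce the zeros and ones that SPSS produces; it only makes the Python split repeatable.

```python
import random


def make_split(n_cases, seed, training_share=0.75):
    """Return a list of 0/1 flags: 1 = training case, 0 = validation case."""
    rng = random.Random(seed)  # seeded generator, so the split is repeatable
    # rng.random() plays the role of uniform(1): a random decimal in [0, 1).
    return [1 if rng.random() <= training_share else 0 for _ in range(n_cases)]


# 200 cases is a hypothetical sample size used only for illustration.
split = make_split(n_cases=200, seed=998794)
print(sum(split), "training cases out of", len(split))
```

Roughly three quarters of the flags should be 1, but the exact count varies with the seed, just as the training sample in SPSS is only approximately 75% of the cases.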

Slide 66

The split variable in the data editor

In the data editor, the split variable shows a random pattern
of zeros and ones.

To select the cases for the training sample, we select
the cases where split = 1.

Slide 67

Repeat the regression for the validation

To run the regression for the training sample of the
validation analysis, select the Linear Regression command
from the menu that drops down when you click on the
Dialog Recall button.

Slide 68

Using "split" as the selection variable

First, scroll down the list of variables and highlight the
variable split.

Second, click on the right arrow button to move the split
variable to the Selection Variable text box.

Slide 69

Setting the value of split to select cases

When the variable named split is moved to the Selection
Variable text box, SPSS adds "=?" after the name to prompt
us to enter a specific value for split.

Click on the Rule button to enter a value for split.

Slide 70

Completing the value selection

First, type the value for the training sample, 1, into the
Value text box.

Second, click on the Continue button to complete the value entry.

Slide 71

Requesting output for the validation analysis

Click on the OK button to request the output.

When the value entry dialog box is closed, SPSS adds the value
we entered after the equal sign. This specification now tells
SPSS to include in the analysis only those cases that have a
value of 1 for the split variable.

Slide 72

Validation analysis - 1

The validation analysis requires that the regression model for
the 75% training sample replicate the pattern of statistical
significance found for the full data set.

In the analysis of the 75% training sample, the relationship
between the set of independent variables and the dependent
variable was statistically significant, F(3, 103) = 11.569,
p<0.001, as was the overall relationship in the analysis of the
full data set, F(3, 132) = 17.235, p<0.001.

Slide 73

Validation analysis - 2

The validation of a hierarchical regression model also requires
that the change in R² demonstrate statistical significance in
the analysis of the 75% training sample.

The R² change of 0.249 satisfied this requirement
(F change(1, 103) = 34.319, p<0.001).
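The F statistic for an R² change can be recomputed by hand from the values SPSS reports. The sketch below uses the standard formula for the F test of an increment in R²; the function and argument names are illustrative. The total R² of .252 for the training sample is the 25.2% reported on the slide "Validation analysis - 4". Since all inputs are rounded to three decimals, the result is close to, but not exactly, the 34.319 printed by SPSS.

```python
def f_change(r2_change, r2_full, n_added, df_residual):
    """F statistic for the R-squared change when a block of variables is added.

    r2_change   -- increase in R-squared from adding the block
    r2_full     -- R-squared of the model with all blocks entered
    n_added     -- number of variables in the added block
    df_residual -- residual degrees of freedom of the full model
    """
    return (r2_change / n_added) / ((1.0 - r2_full) / df_residual)


# Training sample: R-squared change .249, total R-squared .252,
# 1 predictor added in the second block, 103 residual degrees of freedom.
f = f_change(r2_change=0.249, r2_full=0.252, n_added=1, df_residual=103)
print(round(f, 2))
```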

Slide 74

Validation analysis - 3

The pattern of significance for the individual relationships
between the dependent variable and the predictor variable was
the same for the analysis using the full data set and the 75%
training sample.

The relationship between highest academic degree and spouse's
highest academic degree was statistically significant in both
the analysis using the full data set (t=7.188, p<0.001) and the
analysis using the 75% training sample (t=5.484, p<0.001). The
pattern of statistical significance of the independent variables
for the analysis using the 75% training sample matched the
pattern identified in the analysis of the full data set.

Slide 75

Validation analysis - 4

The total proportion of variance explained in the model using
the training sample was 25.2% (R = .502), compared to 40.6%
(R = .637) for the validation sample. The value of R² for the
validation sample was actually larger than the value of R² for
the training sample, implying a better fit than obtained for the
training sample. This supports a conclusion that the regression
model would be effective in predicting scores for cases other
than those included in the sample.

The validation analysis supported the generalizability of the
findings of the analysis to the population represented by the
sample in the data set.

The answer to the question is true.
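The percentages on this slide are the squares of the two multiple correlations, and the comparison between them is a simple subtraction. A minimal Python sketch of that arithmetic follows; the function name is illustrative. A negative result means the validation sample fit better than the training sample.

```python
def shrinkage(r_training, r_validation):
    """R-squared for the training sample minus R-squared for the validation sample."""
    return r_training ** 2 - r_validation ** 2


r2_training = 0.502 ** 2    # about .252, i.e. 25.2% of variance explained
r2_validation = 0.637 ** 2  # about .406, i.e. 40.6% of variance explained
s = shrinkage(0.502, 0.637)
print(f"{r2_training:.3f} {r2_validation:.3f} {s:.3f}")
```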

Slide 76

Steps in complete hierarchical regression analysis

The following flow charts depict the process for solving the complete
regression problem and determining the answer to each of the
questions encountered in the complete analysis.

Text in italics (e.g. True, False, True with caution, Incorrect
application of a statistic) represents the answers to each specific
question.

Many of the steps in hierarchical regression analysis are identical to
the steps in standard regression analysis. Steps that are different are
identified with a magenta background, with the specifics of the
difference underlined.

Slide 77

Complete Hierarchical multiple regression analysis:
level of measurement

Question: do variables included in the analysis satisfy the level
of measurement requirements?

Is the dependent variable metric and the independent variables
metric or dichotomous? (Examine all independent variables, controls
as well as predictors.)
    No: Incorrect application of a statistic
    Yes: Ordinal variables included in the relationship?
        No: True
        Yes: True with caution

Slide 78

Complete Hierarchical multiple regression analysis:
sample size

Question: Number of variables and cases satisfy sample size
requirements?

Compute the baseline regression in SPSS.

Ratio of cases to independent variables at least 5 to 1?
(Include both controls and predictors in the count of
independent variables.)
    No: Inappropriate application of a statistic
    Yes: Ratio of cases to independent variables at preferred
         sample size of at least 15 to 1?
        Yes: True
        No: True with caution
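The sample size rule in this flow chart can be sketched as a small Python function. The function name is hypothetical. The example uses 136 cases, a figure inferred from the full-data-set test F(3, 132) reported earlier (3 + 132 + 1), and 3 independent variables (two controls and one predictor).

```python
def sample_size_answer(n_cases, n_independent_variables):
    """Apply the 5-to-1 minimum and 15-to-1 preferred ratio of cases to variables."""
    ratio = n_cases / n_independent_variables
    if ratio < 5:
        return "Inappropriate application of a statistic"
    if ratio < 15:
        return "True with caution"
    return "True"


# 136 cases is inferred from F(3, 132); controls and predictors are both counted.
print(sample_size_answer(n_cases=136, n_independent_variables=3))
```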

Slide 79

Complete Hierarchical multiple regression analysis:
assumption of normality

Question: each metric variable satisfies the assumption of
normality?

Test the dependent variable and both controls and predictor
independent variables.

The variable satisfies criteria for a normal distribution?
    Yes: True
    No: False
        Log, square root, or inverse transformation satisfies
        normality?
            Yes: Use transformation in revised model, no caution
                 needed. If more than one transformation satisfies
                 normality, use one with smallest skew.
            No: Use untransformed variable in analysis, add caution
                to interpretation for violation of normality.

Slide 80

Complete Hierarchical multiple regression analysis:
assumption of linearity

Question: relationship between dependent variable and metric
independent variable satisfies assumption of linearity?

Test both control and predictor independent variables.

If dependent variable was transformed for normality, use
transformed dependent variable in the test for linearity.

If independent variable was transformed to satisfy normality,
skip check for linearity.

Probability of Pearson correlation (r) <= level of significance?
    Yes: True
    No: Probability of correlation (r) for relationship with any
        transformation of IV <= level of significance?
        Yes: Use transformation in revised model. If more than one
             transformation satisfies linearity, use one with
             largest r.
        No: Weak relationship. No caution needed.

Slide 81

Complete Hierarchical multiple regression analysis:
assumption of homogeneity of variance

Question: variance in dependent variable is uniform across the
categories of a dichotomous independent variable?

Test both control and predictor independent variables.

If dependent variable was transformed for normality, substitute
transformed dependent variable in the test for the assumption of
homogeneity of variance.

Probability of Levene statistic <= level of significance?
    No: True
    Yes: False
        Do not test transformations of dependent variable, add
        caution to interpretation for violation of
        homoscedasticity.

Slide 82

Complete Hierarchical multiple regression
analysis: detecting outliers

Question: After incorporating any transformations, no outliers
were detected in the regression analysis.

If any variables were transformed for normality or linearity,
substitute transformed variables in the regression for the
detection of outliers.

Is the standardized residual for any case greater than +/-3.00?
    No: True
    Yes: False
        Remove outliers and run revised regression again.
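The outlier rule can be sketched in Python as a filter on the standardized residuals. The function name and the residual values are hypothetical; in practice the standardized residuals would be saved from the regression output.

```python
def find_outliers(standardized_residuals, cutoff=3.0):
    """Return the positions of cases whose standardized residual exceeds +/- cutoff."""
    return [i for i, z in enumerate(standardized_residuals) if abs(z) > cutoff]


# Hypothetical standardized residuals for five cases.
residuals = [0.4, -1.2, 3.4, -3.1, 2.9]
outliers = find_outliers(residuals)
print(outliers)  # positions of the cases to remove before the revised regression
```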

Slide 83

Complete Hierarchical multiple regression analysis:
picking regression model for interpretation

Question: interpretation based on model that includes
transformation of variables and removes outliers?

R² for revised regression greater than R² for baseline
regression by 2% or more?
    Yes: Pick revised regression with transformations and omitting
         outliers for interpretation. True
    No: Pick baseline regression with untransformed variables and
        all cases for interpretation. False
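The model selection rule can be sketched in Python. This sketch assumes that "2% or more" means an absolute gain of 0.02 in the proportion of variance explained; the function name and the example values are hypothetical. The gain is rounded to three decimals, the precision SPSS prints, so that a gain of exactly 0.02 is not lost to floating point error.

```python
def pick_model(r2_baseline, r2_revised, min_gain=0.02):
    """Choose the revised model only if it explains at least min_gain more variance."""
    gain = round(r2_revised - r2_baseline, 3)
    return "revised" if gain >= min_gain else "baseline"


# Hypothetical R-squared values for a baseline and a revised regression.
print(pick_model(r2_baseline=0.281, r2_revised=0.305))
print(pick_model(r2_baseline=0.281, r2_revised=0.290))
```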

Slide 84

Complete Hierarchical multiple regression analysis:
assumption of independence of errors

Question: serial correlation of errors is not a problem in this
regression analysis?

Residuals are independent, Durbin-Watson between 1.5 and 2.5?
    Yes: True
    No: False
        NOTE: caution for violation of assumption of independence
        of errors
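SPSS reports the Durbin-Watson statistic directly, but its definition is short enough to sketch in Python: the sum of squared differences between successive residuals divided by the sum of squared residuals. The function names and the residual values below are hypothetical and serve only to illustrate the 1.5 to 2.5 rule.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in case order."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


def errors_independent(dw, low=1.5, high=2.5):
    """Apply the rule from the flow chart: independent if low <= DW <= high."""
    return low <= dw <= high


# Hypothetical residuals for eight cases.
residuals = [0.5, -0.3, 0.2, 0.4, -0.1, -0.3, 0.2, -0.2]
dw = durbin_watson(residuals)
print(round(dw, 3), errors_independent(dw))
```

Values near 2 indicate no serial correlation; values toward 0 indicate positive serial correlation and values toward 4 indicate negative serial correlation.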

Slide 85

Complete Hierarchical multiple regression analysis:
multicollinearity

Question: Multicollinearity is not a problem in this regression
analysis?

Tolerance for all IVs greater than 0.10, indicating no
multicollinearity?
    Yes: True
    No: False
        NOTE: halt the analysis until problem is diagnosed
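The tolerance rule can be sketched in Python using the tolerance values reported for the second model in the sample problem (.954, .947, and .990). The function name and the short variable labels are illustrative.

```python
def no_multicollinearity(tolerances, cutoff=0.10):
    """True when every independent variable has tolerance greater than the cutoff."""
    return all(value > cutoff for value in tolerances.values())


# Tolerance values from the coefficients output of the sample problem.
tolerances = {"age": 0.954, "sex": 0.947, "degree": 0.990}
print(no_multicollinearity(tolerances))
```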

Slide 86

Complete Hierarchical multiple regression analysis:
overall relationship

Question: Finding about overall relationship between
dependent variable and independent variables.

Probability of F test of R² change less than/equal to level of
significance?
    No: False
    Yes: Strength of R² change for predictor variables
         interpreted correctly?
        No: False
        Yes: Small sample, ordinal variables, or violation of
             assumption in the relationship?
            No: True
            Yes: True with caution

Slide 87

Complete Hierarchical multiple regression analysis:
individual relationships

Question: Finding about individual relationship between
independent variable and dependent variable.

Probability of t test between predictors and DV <= level of
significance?
    No: False
    Yes: Direction of relationship between predictors and DV
         interpreted correctly?
        No: False
        Yes: Small sample, ordinal variables, or violation of
             assumption in the relationship?
            No: True
            Yes: True with caution

Slide 88

Complete Hierarchical multiple regression analysis:
individual relationships

Question: Finding about independent variable with largest
impact on dependent variable.

Does the stated variable have the largest beta coefficient
(ignoring sign) among predictors?
    No: False
    Yes: Small sample, ordinal variables, or violation of
         assumption in the relationship?
        No: True
        Yes: True with caution
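Comparing beta coefficients "ignoring sign" means comparing absolute values. A minimal Python sketch follows; the function name, the predictor names, and the beta values are hypothetical.

```python
def largest_impact(betas):
    """Return the predictor whose standardized coefficient is largest in absolute value."""
    return max(betas, key=lambda name: abs(betas[name]))


# Hypothetical standardized coefficients for two predictors.
betas = {"x1": -0.41, "x2": 0.28}
print(largest_impact(betas))
```

Here x1 is selected even though its coefficient is negative, because its absolute value is larger than that of x2.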

Slide 89

Complete Hierarchical multiple regression analysis:
validation analysis - 1

Question: The validation analysis supports the generalizability
of the findings?

Set the random seed and randomly split the sample into 75%
training sample and 25% validation sample.

Probability of ANOVA test for training sample <= level of
significance?
    No: False
    Yes: Probability of F for R² change for training sample
         <= level of significance?
        No: False
        Yes: continue with validation analysis - 2

Slide 90

Complete Hierarchical multiple regression analysis:
validation analysis - 2

Pattern of significance for predictor variables in training
sample matches pattern for full data set?
    No: False
    Yes: Shrinkage in R² (R² for training sample - R² for
         validation sample) < 2%?
        Yes: True
        No: False
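The four checks in the two validation flow charts can be chained in a short Python sketch. The function and argument names are hypothetical, and the default level of significance of 0.05 is an assumption, since each problem states its own level. The example plugs in the sample problem's results: both probabilities were reported as p<0.001 (entered here as the upper bound 0.001), the pattern of significance matched, and R² was .252 for the training sample and .406 for the validation sample.

```python
def validation_supported(p_anova, p_r2_change, pattern_matches,
                         r2_training, r2_validation,
                         alpha=0.05, max_shrinkage=0.02):
    """Return True when the training sample passes all four validation checks."""
    if p_anova > alpha:          # overall relationship not significant
        return False
    if p_r2_change > alpha:      # R-squared change not significant
        return False
    if not pattern_matches:      # predictors differ from the full data set
        return False
    return (r2_training - r2_validation) < max_shrinkage


# Sample problem values; alpha=0.05 is an assumed level of significance.
print(validation_supported(0.001, 0.001, True, 0.252, 0.406))
```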