
# SW388R7 Computers II

Slide 1

## Differences between hierarchical and standard multiple regression

Sample problem
Steps in hierarchical multiple regression

Slide 2

## Differences between standard and hierarchical multiple regression

Standard multiple regression is used to evaluate the relationship between a set of independent variables and a dependent variable.

Hierarchical regression is used to evaluate the relationship between a set of independent variables and the dependent variable, controlling for, or taking into account, the impact of a different set of independent variables on the dependent variable.

For example, a research hypothesis might state that there are differences between the average salaries of male and female employees, even after we take into account differences in education level and prior work experience.

In hierarchical regression, the independent variables are entered into the analysis in a sequence of blocks, or groups, each of which may contain one or more variables. In the example above, education and work experience would be entered in the first block and sex would be entered in the second block.
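Outside SPSS, entering blocks in sequence amounts to fitting two nested OLS models and comparing their R² values. A minimal sketch in Python with simulated data standing in for the salary example (the variable names and data are hypothetical, not from this problem):

```python
import numpy as np

def r_squared(X, y):
    """Proportion of variance in y explained by an OLS fit on X
    (an intercept column is added automatically)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

# Simulated stand-ins for the salary example.
rng = np.random.default_rng(0)
n = 200
educ = rng.normal(size=n)            # control: education level
exper = rng.normal(size=n)           # control: prior work experience
sex = rng.integers(0, 2, size=n)     # predictor: sex (dichotomous)
salary = educ + 0.5 * exper + 0.8 * sex + rng.normal(size=n)

# Block 1: controls only.  Block 2: controls plus the predictor.
r2_block1 = r_squared(np.column_stack([educ, exper]), salary)
r2_block2 = r_squared(np.column_stack([educ, exper, sex]), salary)
r2_change = r2_block2 - r2_block1    # the statistic hierarchical regression tests
```

Because the models are nested, R² for block 2 can never be lower than for block 1; the question is whether the increase is large enough to be statistically significant.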

Slide 3

## Differences in statistical results

SPSS shows the statistical results (Model Summary, ANOVA, Coefficients, etc.) as each block of variables is entered into the analysis.

In addition (if requested), SPSS prints and tests the key statistic used in evaluating the hierarchical hypothesis: the change in R² for each additional block of variables.

The null hypothesis for the addition of each block of variables to the analysis is that the change in R² (the block's contribution to the explanation of variance in the dependent variable) is zero.
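That null hypothesis is tested with an F statistic on the R² change. A sketch of the computation in plain Python (the numeric inputs are taken from the Model Summary reported later in this problem):

```python
def f_change(r2_full, r2_reduced, n, k, m):
    """F statistic for the change in R-squared when m variables are
    added, with n cases and k independent variables in the full model.
    Under H0 it follows an F distribution with (m, n - k - 1) df."""
    return ((r2_full - r2_reduced) / m) / ((1.0 - r2_full) / (n - k - 1))

# One predictor added after two controls, n = 136 cases,
# R-squared rising from 0.000 to 0.281 (values from this problem):
F = f_change(0.281, 0.000, n=136, k=3, m=1)   # roughly 51.6, df = (1, 132)
```

The result agrees (up to rounding of the reported R² values) with the F change of 51.670 that SPSS prints for model 2.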

If the null hypothesis is rejected, our interpretation is that the variables in block 2 had a relationship to the dependent variable after controlling for the relationship of the block 1 variables to the dependent variable.

Slide 4

## Variations in hierarchical regression - 1

A hierarchical regression can have as many blocks as there are independent variables; i.e., the analyst can specify a hypothesis that requires an exact order of entry for the variables.

A more common hierarchical regression specifies two blocks of variables: a set of control variables entered in the first block and a set of predictor variables entered in the second block.

Control variables are often demographics thought to make a difference in scores on the dependent variable. Predictors are the variables in whose effect the research question is really interested, but whose effect we want to separate from that of the control variables.

Slide 5

## Variations in hierarchical regression - 2

Support for a hierarchical hypothesis would ordinarily require statistical significance for the addition of each block of variables.

However, many times we want to exclude the effect of blocks of variables previously entered into the analysis, whether or not a previous block was statistically significant. The analysis is interested in obtaining the best indicator of the effect of the predictor variables, so the statistical significance of previously entered variables is not interpreted.

Slide 6

In these problems, the R² change, i.e. the increase in R² when the predictor variables are added to the analysis, is interpreted rather than the overall R² for the model with all variables entered.

In the interpretation of individual relationships, only the relationship between the predictors and the dependent variable is presented.

Similarly, in the validation analysis, we are only concerned with verifying the significance of the predictor variables. Differences in the control variables are ignored.

Slide 7

The problem asks us to examine the feasibility of doing multiple regression to evaluate the relationships among these variables. The inclusion of the "controlling for" phrase indicates that this is a hierarchical multiple regression problem.

Multiple regression is feasible if the dependent variable is metric and the independent variables (both predictors and controls) are metric or dichotomous, and the available data are sufficient to satisfy the sample size requirements.

Slide 8

## Level of measurement - answer

Hierarchical multiple regression requires that the dependent variable be metric and the independent variables be metric or dichotomous.

"Spouse's highest academic degree" [spdeg] is ordinal, satisfying the metric level of measurement requirement for the dependent variable if we follow the convention of treating ordinal variables as metric. Since some data analysts do not agree with this convention, a note of caution should be included in our interpretation.

"Age" [age] is interval, satisfying the metric or dichotomous level of measurement requirement for independent variables.

"Highest academic degree" [degree] is ordinal, satisfying the metric or dichotomous level of measurement requirement for independent variables, again only under the convention of treating ordinal variables as metric, and again warranting a note of caution.

"Sex" [sex] is dichotomous, satisfying the metric or dichotomous level of measurement requirement for independent variables.

True with caution is the correct answer.

Slide 9

The second question asks about the sample size requirements for multiple regression. To answer this question, we will run the initial or baseline multiple regression to obtain some basic data about the problem and solution.

Slide 10

## The baseline regression - 1

After we check for violations of assumptions and outliers, we will decide whether to interpret the model that includes the transformed variables and omits the outliers (the revised model), or the model that uses the untransformed variables and includes all cases, outliers included (the baseline model).

To make this decision, we run the baseline regression before we examine assumptions and outliers, and record the R² for the baseline model. If using transformations and omitting outliers substantially improves the analysis (a 2% or greater increase in R²), we interpret the revised model. If the increase is smaller, we interpret the baseline model.

To run the baseline model, select Regression | Linear from the Analyze menu.

Slide 11

## The baseline regression - 2

First, move the dependent variable spdeg to the Dependent text box.

Second, move the independent variables to control for, age and sex, to the Independent(s) list box.

Third, select the method for entering the variables into the analysis from the drop-down Method menu. In this example, we accept the default of Enter for direct entry of all variables in the first block, which will force the controls into the regression.

Fourth, click on the Next button to tell SPSS to add another block of variables to the regression analysis.

Slide 12

## The baseline regression - 3

SPSS indicates that we will now be adding variables to a second block.

First, move the predictor independent variable degree to the Independent(s) list box for block 2.

Second, click on the Statistics button to specify the statistics options that we want.

Slide 13

## The baseline regression - 4

First, mark the checkbox for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit, Descriptives, and R squared change. The R squared change statistic will tell us whether or not the variables added after the controls have a relationship to the dependent variable.

Third, mark the checkbox for the Durbin-Watson statistic on the Residuals panel.

Fourth, mark the checkbox for Collinearity diagnostics to get tolerance values for testing multicollinearity.

Fifth, click on the Continue button to close the dialog box.

Slide 14

Click on the OK button to request the regression output.

Slide 15

## R² for the baseline model

The R² of 0.281 is the benchmark that we will use to evaluate the utility of transformations and the elimination of outliers.

Prior to any transformations of variables to satisfy the assumptions of multiple regression or the removal of outliers, the proportion of variance in the dependent variable explained by the independent variables (R²) was 28.1%. The relationship is statistically significant, though we would not stop if it were not significant, because the lack of significance may be a consequence of violations of assumptions or the inclusion of outliers.

Slide 16

## Sample size evidence and answer

Descriptive Statistics

| Variable | Mean | Std. Deviation | N |
|---|---|---|---|
| SPOUSES HIGHEST DEGREE | 1.78 | 1.281 | 136 |
| AGE OF RESPONDENT | 45.80 | 14.534 | 136 |
| RESPONDENTS SEX | 1.60 | .491 | 136 |
| RS HIGHEST DEGREE | 1.65 | 1.220 | 136 |

Hierarchical multiple regression requires that the ratio of valid cases to independent variables be at least 5 to 1. The ratio of valid cases (136) to independent variables (3) was 45.3 to 1, which satisfied the minimum requirement. In addition, the ratio of 45.3 to 1 satisfied the preferred ratio of 15 cases per independent variable.

The answer to the question is true.

Slide 17

## Assumption of normality for the dependent variable - question

Having satisfied the level of measurement and sample size requirements, we turn our attention to conformity with three of the assumptions of multiple regression: normality, linearity, and homoscedasticity. First, we will evaluate the assumption of normality for the dependent variable.

Slide 18

## Run the script to test normality

First, move the variables to the list boxes based on the role that each variable plays in the analysis and its level of measurement.

Second, click on the Normality option button to request that SPSS produce the output needed to evaluate the assumption of normality.

Third, mark the checkboxes for the transformations that we want to test in evaluating the assumption.

Fourth, click on the OK button to produce the output.

Slide 19

## Normality of the dependent variable: spouse's highest degree

Descriptives: SPOUSES HIGHEST DEGREE

| Statistic | Value | Std. Error |
|---|---|---|
| Mean | 1.78 | .110 |
| 95% CI for Mean, Lower Bound | 1.56 | |
| 95% CI for Mean, Upper Bound | 2.00 | |
| 5% Trimmed Mean | 1.75 | |
| Median | 1.00 | |
| Variance | 1.640 | |
| Std. Deviation | 1.281 | |
| Minimum | 0 | |
| Maximum | 4 | |
| Range | 4 | |
| Interquartile Range | 2.00 | |
| Skewness | .573 | .208 |
| Kurtosis | -1.051 | .413 |

The dependent variable "spouse's highest academic degree" [spdeg] did not satisfy the criteria for a normal distribution. The skewness of the distribution (0.573) was between -1.0 and +1.0, but the kurtosis of the distribution (-1.051) fell outside the range from -1.0 to +1.0.

The answer to the question is false.
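The screening rule used in these slides (skewness and kurtosis both within ±1.0) is easy to reproduce outside SPSS. A sketch using simple moment-based estimates (SPSS's bias-corrected formulas differ slightly, so values will not match its output exactly):

```python
import numpy as np

def normality_screen(x, limit=1.0):
    """Return (skewness, excess kurtosis, passes) using the slides'
    rule of thumb: both statistics within -limit..+limit.
    Moment-based estimates, not SPSS's bias-corrected versions."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    skew = (z ** 3).mean()
    kurt = (z ** 4).mean() - 3.0      # excess kurtosis: 0 for a normal
    return skew, kurt, (abs(skew) <= limit and abs(kurt) <= limit)

# A large normal sample should pass the screen.
rng = np.random.default_rng(1)
skew, kurt, passes = normality_screen(rng.normal(size=5000))
```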

Slide 20

## Normality of the transformed dependent variable: spouse's highest degree

The "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" satisfied the criteria for a normal distribution. The skewness of the distribution (-0.091) was between -1.0 and +1.0, and the kurtosis of the distribution (-0.678) was between -1.0 and +1.0.

The "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" was substituted for "spouse's highest academic degree" [spdeg] in the analysis.
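The transformation itself is a one-liner; adding 1 before taking the base-10 log keeps the argument positive for cases whose degree code is 0 (the degree codes below are hypothetical illustrations):

```python
import numpy as np

spdeg = np.array([0, 1, 2, 3, 4], dtype=float)   # hypothetical degree codes
lgspdeg = np.log10(1.0 + spdeg)                  # LGSPDEG = LG10(1 + SPDEG)
```

The transform is strictly increasing, so it preserves the ordering of the original codes while compressing the upper end of the scale.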

Slide 21

Next, we will evaluate the assumption of normality for the control variable, age.

Slide 22

## Normality of the control variable: age

Descriptives: AGE OF RESPONDENT

| Statistic | Value | Std. Error |
|---|---|---|
| Mean | 45.99 | 1.023 |
| 95% CI for Mean, Lower Bound | 43.98 | |
| 95% CI for Mean, Upper Bound | 48.00 | |
| 5% Trimmed Mean | 45.31 | |
| Median | 43.50 | |
| Variance | 282.465 | |
| Std. Deviation | 16.807 | |
| Minimum | 19 | |
| Maximum | 89 | |
| Range | 70 | |
| Interquartile Range | 24.00 | |
| Skewness | .595 | .148 |
| Kurtosis | -.351 | .295 |

The independent variable "age" [age] satisfied the criteria for a normal distribution. The skewness of the distribution (0.595) was between -1.0 and +1.0, and the kurtosis of the distribution (-0.351) was between -1.0 and +1.0.

Slide 23

Next, we will evaluate the assumption of normality for the predictor variable, highest academic degree.

Slide 24

## Normality of the predictor variable: respondent's highest academic degree

Descriptives: RS HIGHEST DEGREE

| Statistic | Value | Std. Error |
|---|---|---|
| Mean | 1.41 | .071 |
| 95% CI for Mean, Lower Bound | 1.27 | |
| 95% CI for Mean, Upper Bound | 1.55 | |
| 5% Trimmed Mean | 1.35 | |
| Median | 1.00 | |
| Variance | 1.341 | |
| Std. Deviation | 1.158 | |
| Minimum | 0 | |
| Maximum | 4 | |
| Range | 4 | |
| Interquartile Range | 1.00 | |
| Skewness | .948 | .149 |
| Kurtosis | -.051 | .297 |

The independent variable "highest academic degree" [degree] satisfied the criteria for a normal distribution. The skewness of the distribution (0.948) was between -1.0 and +1.0, and the kurtosis of the distribution (-0.051) was between -1.0 and +1.0.

Slide 25

## Assumption of linearity for spouse's degree and respondent's degree - question

The metric independent variables satisfied the criteria for normality, but the dependent variable did not. However, the logarithmic transformation of "spouse's highest academic degree" produced a variable that was normally distributed and will be tested as a substitute in the analysis. The script for linearity will support our using the transformed dependent variable without having to add it to the data set.

Slide 26

When the linearity option is selected, a default set of transformations to test is marked.

First, click on the Linearity option button to request that SPSS produce the output needed to evaluate the assumption of linearity.

Second, since we have decided to use the log transformation of the dependent variable, we mark the check box for the Logarithmic transformation and clear the check box for the untransformed version of the dependent variable.

Third, click on the OK button to produce the output.

Slide 27

## Linearity test: spouse's highest degree and respondent's highest academic degree

The correlation between "highest academic degree" and the logarithmic transformation of "spouse's highest academic degree" was statistically significant (r=.519, p<0.001). A linear relationship exists between these variables.

Slide 28

## Linearity test: spouse's highest degree and respondent's age

The assessment of the linear relationship between the logarithmic transformation of "spouse's highest academic degree" [LGSPDEG=LG10(1+SPDEG)] and "age" [age] indicated that the relationship was weak, rather than nonlinear. Neither the correlation between the logarithmic transformation of "spouse's highest academic degree" and "age" nor the correlations with the transformations of age were statistically significant.

The correlation between "age" and the logarithmic transformation of "spouse's highest academic degree" was not statistically significant (r=.009, p=0.921). The correlations for the transformations of age were: the logarithmic transformation (r=.061, p=0.482); the square root transformation (r=.034, p=0.692); the inverse transformation (r=.112, p=0.194); and the square transformation (r=-.037, p=0.668).

Slide 29

Sex is the only dichotomous independent variable in the analysis. We will test it for homogeneity of variance using the logarithmic transformation of the dependent variable, which we have already decided to use.

Slide 30

## Run the script to test homogeneity of variance

When the homogeneity of variance option is selected, a default set of transformations to test is marked.

First, click on the Homogeneity of variance option button to request that SPSS produce the output needed to evaluate the assumption of homogeneity of variance.

Second, since we have decided to use the log transformation of the dependent variable, we mark the check box for the Logarithmic transformation and clear the check box for the untransformed version of the dependent variable.

Third, click on the OK button to produce the output.

Slide 31

Based on the Levene test, the variance in "log of spouse's highest academic degree [LGSPDEG=LG10(1+SPDEG)]" was homogeneous for the categories of "sex" [sex]. The probability associated with the Levene statistic (0.687) was p=0.409, greater than the level of significance for testing assumptions (0.01). The null hypothesis that the group variances were equal was not rejected.

The homogeneity of variance assumption was satisfied. The answer to the question is true.
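Levene's test compares absolute deviations from each group's mean with a one-way ANOVA. A minimal two-group version in plain NumPy, using the mean-centered form (a sketch on simulated data, not this problem's data):

```python
import numpy as np

def levene_two_groups(a, b):
    """Levene's W for two groups (mean-centered form).  Under H0
    (equal variances) W follows F with (1, n_a + n_b - 2) df."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    za, zb = np.abs(a - a.mean()), np.abs(b - b.mean())
    z = np.concatenate([za, zb])
    between = len(za) * (za.mean() - z.mean()) ** 2 + \
              len(zb) * (zb.mean() - z.mean()) ** 2
    within = ((za - za.mean()) ** 2).sum() + ((zb - zb.mean()) ** 2).sum()
    return (len(z) - 2) * between / within

rng = np.random.default_rng(2)
# Equal variances (different means do not matter for the test)...
w_same = levene_two_groups(rng.normal(0, 1, 300), rng.normal(1, 1, 300))
# ...versus clearly unequal variances.
w_diff = levene_two_groups(rng.normal(0, 1, 300), rng.normal(0, 3, 300))
```

The W statistic is small when spreads match and large when they do not; `scipy.stats.levene` provides the same test with a p-value.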

Slide 32

## Including the transformed variable in the data set - 1

In the evaluation for normality, we resolved a problem with normality for spouse's highest academic degree with a logarithmic transformation. We need to add this transformed variable to the data set so that we can incorporate it in our detection of outliers.

We can use the script to compute transformed variables and add them to the data set. We select an assumption to test (Normality is the easiest), mark the check box for the transformation we want to retain, and clear the check box "Delete variables created in this analysis."

NOTE: this will leave the transformed variable in the data set. To remove it, you can delete the column or close the data set without saving.

Slide 33

## Including the transformed variable in the data set - 2

First, move the variable SPDEG to the list box for the dependent variable.

Second, click on the Normality option button to request that SPSS do the test for normality, including the transformation we will mark.

Third, mark the transformation we want to retain (Logarithmic) and clear the checkboxes for the other transformations.

Fourth, clear the check box for the option "Delete variables created in this analysis".

Fifth, click on the OK button.

Slide 34

If we scroll to the rightmost column in the data editor, we see that the log of SPDEG is included in the data set.

Slide 35

## Including the transformed variable in the list of variables in the script - 1

If we scroll to the bottom of the list of variables, we see that the log of SPDEG is not included in the list of available variables.

To tell the script to add the log of SPDEG to the list of variables in the script, click on the Reset button. This will start the script over again, with a new list of variables from the data set.

Slide 36

## Including the transformed variable in the list of variables in the script - 2

If we scroll to the bottom of the list of variables now, we see that the log of SPDEG is included in the list of available variables.

Slide 37

In multiple regression, an outlier in the solution can be defined as a case that has a large residual because the equation did a poor job of predicting its value.

We will run the regression again, incorporating any transformations we have decided to test, and have SPSS compute the standardized residual for each case. Cases with a standardized residual larger than +/- 3.0 will be treated as outliers.
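A standardized residual is, in its simple form, the OLS residual divided by the root mean squared error. A sketch of the ±3.0 screen on simulated data with one planted outlier (illustrative only, not SPSS's exact computation):

```python
import numpy as np

def standardized_residuals(X, y):
    """OLS residuals divided by the root mean squared error -- the
    simple form of a standardized residual."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    s = np.sqrt(resid @ resid / (len(y) - X1.shape[1]))
    return resid / s

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
y = X @ np.array([1.0, -0.5]) + rng.normal(size=100)
y[0] += 10.0                               # plant one gross outlier in case 0
z = standardized_residuals(X, y)
outliers = np.where(np.abs(z) > 3.0)[0]    # the slides' +/- 3.0 cutoff
```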

Slide 38

To run the regression to detect outliers, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.

Slide 39

## The revised regression: substituting transformed variables

Remove the variable SPDEG from the Dependent text box and substitute the log of the variable, LGSPDEG.

Click on the Statistics button to select the statistics we will need for the analysis.

Slide 40

## The revised regression: selecting statistics

First, mark the checkbox for Estimates on the Regression Coefficients panel.

Second, mark the checkboxes for Model Fit, Descriptives, and R squared change. The R squared change statistic will tell us whether or not the variables added after the controls have a relationship to the dependent variable.

Third, mark the checkbox for the Durbin-Watson statistic on the Residuals panel.

Fourth, mark the checkbox for Casewise diagnostics, which will be used to identify outliers.

Fifth, mark the checkbox for Collinearity diagnostics to get tolerance values for testing multicollinearity.

Sixth, click on the Continue button to close the dialog box.

Slide 41

Mark the checkbox for Standardized Residuals so that SPSS saves a new variable in the data editor. We will use this variable to omit outliers in the revised regression model.

Click on the Continue button to close the dialog box.

Slide 42

## The revised regression: obtaining output

Click on the OK button to obtain the output for the revised model.

Slide 43

## Outliers in the analysis

If any cases have a standardized residual larger than +/- 3.0, SPSS creates a table titled Casewise Diagnostics, in which it lists the cases and the values that result in their being outliers. If there are no outliers, SPSS does not print the Casewise Diagnostics table. There was no table for this problem. The answer to the question is true.

We can verify that all standardized residuals were less than +/- 3.0 by looking at the minimum and maximum standardized residuals in the table of Residual Statistics. Both the minimum and the maximum fell in the acceptable range.

Since there were no outliers, we can use the regression just completed to make our decision about which model to interpret.

Slide 44

Since there were no outliers, we can use the regression just completed to make our decision about which model to interpret. If the R² for the revised model is higher by 2% or more, we will base our interpretation on the revised model; otherwise, we will interpret the baseline model.

Slide 45

Prior to any transformations of variables to satisfy the assumptions of multiple regression and the removal of outliers, the proportion of variance in the dependent variable explained by the independent variables (R²) was 28.1%. After substituting transformed variables, the proportion of variance explained (R²) was 27.1%.

Since the revised regression model did not explain at least two percent more variance than the baseline regression analysis, the baseline regression model, with all cases and the original form of all variables, should be used for the interpretation.

The transformations used to satisfy the assumptions will not be used, so cautions should be added for the assumptions that were violated. False is the correct answer to the question.

Slide 46

Having decided to use the baseline model for the interpretation of this analysis, the SPSS regression output was re-created.

To run the baseline regression again, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.

Slide 47

Remove the transformed variable lgspdeg from the dependent variable text box and add the variable spdeg.

Click on the Save button to remove the request to save standardized residuals to the data editor.

Slide 48

## Revised regression using transformations and omitting outliers - 3

Clear the checkbox for Standardized Residuals so that SPSS does not save a new set of them in the data editor when it runs the new regression.

Click on the Continue button to close the dialog box.

Slide 49

Click on the OK button to request the regression output.

Slide 50

We can now check the assumption of independence of errors for the analysis we will interpret.

Slide 51

Having selected a regression model for interpretation, we can now examine the final assumption: independence of errors.

Model Summary

| Model | R | R Square | Adjusted R Square | Std. Error of the Estimate | R Square Change | F Change | df1 | df2 | Sig. F Change | Durbin-Watson |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | .014 | .000 | -.015 | 1.290 | .000 | .013 | 2 | 133 | .987 | |
| 2 | .531 | .281 | .265 | 1.098 | .281 | 51.670 | 1 | 132 | .000 | 1.754 |

a. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT
b. Predictors: (Constant), RESPONDENTS SEX, AGE OF RESPONDENT, RS HIGHEST DEGREE

The Durbin-Watson statistic is used to test for the presence of serial correlation among the residuals, i.e., the assumption of independence of errors, which requires that the residuals or errors in prediction do not follow a pattern from case to case.

The value of the Durbin-Watson statistic ranges from 0 to 4. As a general rule of thumb, the residuals are not correlated if the Durbin-Watson statistic is approximately 2; an acceptable range is 1.50 - 2.50.

The Durbin-Watson statistic for this problem is 1.754, which falls within the acceptable range. If the Durbin-Watson statistic were not in the acceptable range, we would add a caution to the findings for a violation of regression assumptions. The answer to the question is true.
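The statistic itself is just the ratio of squared successive residual differences to the residual sum of squares. A sketch (illustrative, with simulated independent errors):

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 means uncorrelated errors;
    values toward 0 suggest positive serial correlation, toward 4
    negative serial correlation."""
    e = np.asarray(residuals, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

rng = np.random.default_rng(4)
dw = durbin_watson(rng.normal(size=2000))   # independent errors -> near 2
```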

Slide 52

## Multicollinearity - question

The final condition that can have an impact on our interpretation is multicollinearity.

Slide 53

The tolerance values for all of the independent variables are larger than 0.10: "highest academic degree" [degree] (.990), "age" [age] (.954), and "sex" [sex] (.947). Multicollinearity is not a problem in this regression analysis. True is the correct answer to the question.

Slide 54

## Overall relationship between dependent variable and independent variables - question

The first finding we want to confirm concerns the relationship between the dependent variable and the set of predictors after including the control variables in the analysis.

Slide 55

## Overall relationship between dependent variable and independent variables - evidence and answer

Hierarchical multiple regression was performed to test the hypothesis that there was a relationship between the dependent variable "spouse's highest academic degree" [spdeg] and the predictor independent variable "highest academic degree" [degree], after controlling for the effect of the control independent variables "age" [age] and "sex" [sex]. In hierarchical regression, the interpretation of the overall relationship focuses on the change in R². If the change in R² is statistically significant, the overall relationship for all independent variables will be significant as well.

Slide 56

## Overall relationship between dependent variable and independent variables - evidence and answer

Based on model 2 in the Model Summary table, where the predictors were added (F(1, 132) = 51.670, p<0.001), the predictor variable, highest academic degree, did contribute to the overall relationship with the dependent variable, spouse's highest academic degree. Since the probability of the F statistic (p<0.001) was less than or equal to the level of significance (0.05), the null hypothesis that the change in R² was equal to 0 was rejected. The research hypothesis that highest academic degree reduced the error in predicting spouse's highest academic degree was supported.

Slide 57

## Overall relationship between dependent variable and independent variables - evidence and answer

The increase in R² from including the predictor variable ("highest academic degree") in the analysis was 0.281, not 0.241. Using a proportional reduction in error interpretation for R², the information provided by the predictor variable reduced our error in predicting "spouse's highest academic degree" [spdeg] by 28.1%, not 24.1%.

The answer to the question is false because the problem stated an incorrect statistical value.

Slide 58

## Relationship of the predictor variable and the dependent variable - question

In these hierarchical regression problems, we will focus the interpretation of individual relationships on the predictor variables and ignore the contribution of the control variables.

Slide 59

## Relationship of the predictor variable and the dependent variable - evidence and answer

Coefficients (Dependent Variable: SPOUSES HIGHEST DEGREE)

| Model | Variable | B | Std. Error | Beta | t | Sig. | Tolerance | VIF |
|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 1.781 | .577 | | 3.085 | .002 | | |
| 1 | AGE OF RESPONDENT | .001 | .008 | .009 | .100 | .920 | .956 | 1.046 |
| 1 | RESPONDENTS SEX | -.023 | .231 | -.009 | -.100 | .920 | .956 | 1.046 |
| 2 | (Constant) | .525 | .521 | | 1.007 | .316 | | |
| 2 | AGE OF RESPONDENT | .003 | .007 | .037 | .495 | .622 | .954 | 1.049 |
| 2 | RESPONDENTS SEX | .114 | .198 | .044 | .575 | .566 | .947 | 1.056 |
| 2 | RS HIGHEST DEGREE | .559 | .078 | .533 | 7.188 | .000 | .990 | 1.010 |

Based on the statistical test of the b coefficient (t = 7.188, p<0.001) for the independent variable "highest academic degree" [degree], the null hypothesis that the slope or b coefficient was equal to 0 (zero) was rejected. The research hypothesis that there was a relationship between "highest academic degree" and "spouse's highest academic degree" was supported.

Slide 60

## Relationship of the predictor variable and the dependent variable - evidence and answer

The b coefficient for the relationship between the dependent variable "spouse's highest degree" [spdeg] and the independent variable "highest degree" [degree] was .559, which implies a direct relationship because the sign of the coefficient is positive. Higher numeric values for the independent variable "highest academic degree" [degree] are associated with higher numeric values for the dependent variable "spouse's highest degree".

The statement in the problem is correct. The answer to the question is true with caution. Caution in interpreting the relationship should be exercised because an ordinal variable was treated as metric, and because of the violation of the assumption of normality.

Slide 61

The problem states the random number seed to use in the validation analysis.

Slide 62

## Validation analysis: set the random number seed

Validate the results of the regression analysis by conducting a 75/25% cross-validation, using 998794 as the random number seed.

To set the random number seed, select the Random Number Seed command from the Transform menu.

Slide 63

First, click on the Set seed to option button to activate the text box.

Second, type in the random seed stated in the problem.

Third, click on the OK button to complete the dialog box. Note that SPSS does not provide you with any feedback about the change.

Slide 64

## Validation analysis: compute the split variable

To enter the formula for the variable that will split the sample in two parts, click on the Compute command.

Slide 65

## The formula for the split variable

First, type the name for the new variable, split, into the Target Variable text box.

Second, the formula for the value of split is shown in the text box. The uniform(1) function generates a random decimal number between 0 and 1, which is compared to the value 0.75. If the random number is less than or equal to 0.75, the value of the formula will be 1, the SPSS numeric equivalent of true. If the random number is larger than 0.75, the formula will return a 0, the SPSS numeric equivalent of false.

Third, click on the OK button to complete the dialog box.
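The same split logic can be sketched in Python (the seed below is arbitrary for illustration; 998794 is the seed the problem specifies for SPSS, whose random number generator differs):

```python
import numpy as np

rng = np.random.default_rng(5)           # illustrative seed, not SPSS's
u = rng.uniform(0.0, 1.0, size=1000)     # one uniform(1) draw per case
split = (u <= 0.75).astype(int)          # 1 = training case, 0 = holdout
train_fraction = split.mean()            # close to 0.75 in expectation
```

Each case lands in the training sample independently with probability 0.75, so the realized split is only approximately 75/25.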

Slide 66

In the data editor, the split variable shows a random pattern of zeros and ones. To select the cases for the training sample, we select the cases where split = 1.

Slide 67

To run the regression for the validation training sample, select the Linear Regression command from the menu that drops down when you click on the Dialog Recall button.

Slide 68

First, scroll down the list of variables and highlight the variable split.

Second, click on the right arrow button to move the split variable to the Selection Variable text box.

Slide 69

When the variable named split is moved to the Selection Variable text box, SPSS adds "=?" after the name to prompt us to enter a specific value for split.

Click on the Rule button to enter a value for split.

Slide 70

First, type the value for the training sample, 1, into the Value text box.

Second, click on the Continue button to complete the value entry.

Slide 71

Click on the OK button to request the output.

When the value entry dialog box is closed, SPSS adds the value we entered after the equal sign. This specification now tells SPSS to include in the analysis only those cases that have a value of 1 for the split variable.

Slide 72

Validation analysis - 1

The validation analysis requires that the regression model for the 75% training sample replicate the pattern of statistical significance found for the full data set.

In the analysis of the 75% training sample, the relationship between the set of independent variables and the dependent variable was statistically significant, F(3, 103) = 11.569, p < 0.001, as was the overall relationship in the analysis of the full data set, F(3, 132) = 17.235, p < 0.001.
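The overall F statistic reported by SPSS is tied to R² by the standard formula F = (R²/df_reg) / ((1 − R²)/df_res). As a quick arithmetic check (not SPSS output), using the training-sample R² of about .252 reported on a later slide reproduces the F(3, 103) value up to rounding:

```python
def overall_f(r_squared, df_reg, df_res):
    """F test of the overall regression: F = (R^2/df_reg) / ((1-R^2)/df_res)."""
    return (r_squared / df_reg) / ((1.0 - r_squared) / df_res)

# Training sample: R^2 ~= .252 with 3 independent variables and 103 residual df
print(round(overall_f(0.252, 3, 103), 2))  # ~11.57, matching the reported F(3, 103) = 11.569
```

The small discrepancy in the second decimal place comes from rounding R² to three digits.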

Slide 73

Validation analysis - 2

The validation of a hierarchical regression model also requires that the change in R² demonstrate statistical significance in the analysis of the 75% training sample.

The R² change of 0.249 satisfied this requirement (F change(1, 103) = 34.319, p < 0.001).
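The F for the R² change follows the standard increment formula F = (ΔR²/df1) / ((1 − R²_full)/df2). A quick arithmetic check (not SPSS output), using the R² change of .249 above and the full-model R² of about .252 reported on a later slide, reproduces the reported value up to rounding:

```python
def f_change(r2_change, r2_full, df1, df2):
    """F test for the increment in R^2 from adding a block of predictors."""
    return (r2_change / df1) / ((1.0 - r2_full) / df2)

# Training sample: R^2 change = .249, full-model R^2 ~= .252, df = (1, 103)
print(round(f_change(0.249, 0.252, 1, 103), 1))  # ~34.3, matching the reported 34.319
```

This is the key statistic for the hierarchical hypothesis: it tests the block-2 predictor after the block-1 controls are already in the model.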

Slide 74

Validation analysis - 3

The pattern of significance for the individual relationships between the dependent variable and the predictor variable was the same for the analysis using the full data set and the 75% training sample.

The relationship between highest academic degree and spouse's highest academic degree was statistically significant in both the analysis using the full data set (t = 7.188, p < 0.001) and the analysis using the 75% training sample (t = 5.484, p < 0.001). The pattern of statistical significance of the independent variables for the analysis using the 75% training sample matched the pattern identified in the analysis of the full data set.

Slide 75

Validation analysis - 4

The total proportion of variance explained in the model using the training sample was 25.2% (R = .502), compared to 40.6% (R = .637) for the validation sample. The value of R² for the validation sample was actually larger than the value of R² for the training sample, implying a better fit than obtained for the training sample. This supports a conclusion that the regression model would be effective in predicting scores for cases other than those included in the sample.

The validation analysis supported the generalizability of the findings of the analysis to the population represented by the sample in the data set. The answer to the question is true.
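The percentages quoted are simply the squares of the multiple R values, and the shrinkage comparison used in the final flow chart is plain arithmetic. A small Python check (values taken from the text above):

```python
# Multiple R values reported for the training and validation samples
r_train, r_valid = 0.502, 0.637

r2_train = round(r_train ** 2, 3)   # 0.252 -> 25.2% of variance explained
r2_valid = round(r_valid ** 2, 3)   # 0.406 -> 40.6%

# Shrinkage = R^2(training) - R^2(validation); a value under 2% (or, as
# here, a negative value) supports generalizability of the model.
shrinkage = r2_train - r2_valid
print(r2_train, r2_valid, shrinkage)
```

Negative shrinkage, as in this problem, means the model actually fit the held-out cases better than the cases it was estimated on.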

Slide 76

Steps in complete hierarchical regression analysis

The following flow charts depict the process for solving the complete regression problem and determining the answer to each of the questions encountered in the complete analysis.

Text in italics (e.g. True, False, True with caution, Incorrect application of a statistic) represents the answers to each specific question.

Many of the steps in hierarchical regression analysis are identical to the steps in standard regression analysis. Steps that are different are identified with a magenta background, with the specifics of the difference underlined.

Slide 77

## Complete Hierarchical multiple regression analysis: level of measurement

Question: do variables included in the analysis satisfy the level of measurement requirements?

Is the dependent variable metric and the independent variables metric or dichotomous? (Examine all independent variables, controls as well as predictors.)

- No: Incorrect application of a statistic
- Yes: Ordinal variables included in the relationship?
  - No: True
  - Yes: True with caution

Slide 78

## Complete Hierarchical multiple regression analysis: sample size

Question: Number of variables and cases satisfy sample size requirements?

Compute the baseline regression in SPSS.

Ratio of cases to independent variables at least 5 to 1? (Include both controls and predictors in the count of independent variables.)

- No: Inappropriate application of a statistic
- Yes: Ratio of cases to independent variables at preferred sample size of at least 15 to 1?
  - Yes: True
  - No: True with caution
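This decision step can be sketched as a small function (a hypothetical helper, assuming the below-preferred branch answers "true with caution", with the 5:1 minimum and 15:1 preferred ratios from the chart):

```python
def sample_size_check(n_cases, n_ivs):
    """Classify the cases-to-IV ratio; controls and predictors count together."""
    ratio = n_cases / n_ivs
    if ratio < 5:
        return "inappropriate application of a statistic"
    if ratio < 15:
        return "true with caution (below preferred 15:1)"
    return "true"

# The validation example had 107 training cases and 3 independent variables
print(sample_size_check(107, 3))  # 107/3 ~= 35.7 to 1 -> "true"
```

Note that both blocks of a hierarchical analysis count toward the number of independent variables, even though the controls are not the focus of the hypothesis.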

Slide 79

## Complete Hierarchical multiple regression analysis: assumption of normality

Question: each metric variable satisfies the assumption of normality?

Test the dependent variable and both controls and predictor independent variables.

The variable satisfies criteria for a normal distribution?

- Yes: True
- No: False. Log, square root, or inverse transformation satisfies normality?
  - Yes: Use transformation in revised model, no caution needed. (If more than one transformation satisfies normality, use the one with the smallest skew.)
  - No: Use untransformed variable in analysis; add caution to interpretation for violation of normality.

Slide 80

## Complete Hierarchical multiple regression analysis: assumption of linearity

Question: relationship between dependent variable and metric independent variable satisfies assumption of linearity?

Test both control and predictor independent variables. If the dependent variable was transformed for normality, use the transformed dependent variable in the test for linearity. If an independent variable was transformed to satisfy normality, skip the check for linearity.

Probability of Pearson correlation (r) <= level of significance?

- Yes: True
- No: Probability of correlation (r) for relationship with any transformation of the IV <= level of significance? (If more than one transformation satisfies linearity, use the one with the largest r.)
  - Yes: Use transformation in revised model
  - No: Weak relationship. No caution needed
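The linearity screen is a significance test on Pearson's r. As a minimal sketch of the coefficient itself (covariance over the product of standard deviations):

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance of x and y over the product of their SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# A perfectly linear relationship gives r = 1
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))  # 1.0
```

SPSS then converts r to a t statistic to obtain the probability compared against the level of significance.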

Slide 81

## Complete Hierarchical multiple regression analysis: assumption of homogeneity of variance

Question: variance in dependent variable is uniform across the categories of a dichotomous independent variable?

Test both control and predictor independent variables. If the dependent variable was transformed for normality, substitute the transformed dependent variable in the test for the assumption of homogeneity of variance.

Probability of Levene statistic <= level of significance?

- No: True
- Yes: False. Do not test transformations of the dependent variable; add caution to interpretation for violation of homoscedasticity.
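Levene's test works by comparing the groups' absolute deviations from a group center. A bare-bones sketch of the statistic (using the median as center, the Brown-Forsythe variant; this is an illustration, not the SPSS implementation):

```python
def levene_w(*groups):
    """Levene/Brown-Forsythe W from absolute deviations about the group median."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    def median(vals):
        s = sorted(vals)
        m = len(s) // 2
        return s[m] if len(s) % 2 else (s[m - 1] + s[m]) / 2

    # Transform each observation to its absolute deviation from the group median
    z = [[abs(x - median(g)) for x in g] for g in groups]
    z_bars = [sum(zi) / len(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total

    between = sum(len(zi) * (zb - z_grand) ** 2 for zi, zb in zip(z, z_bars))
    within = sum(sum((x - zb) ** 2 for x in zi) for zi, zb in zip(z, z_bars))
    return ((n_total - k) / (k - 1)) * between / within

# Two groups with identical spread give W = 0: no evidence against homogeneity
print(levene_w([1, 2, 3, 4], [11, 12, 13, 14]))  # 0.0
```

Large W (small probability) means the spreads differ across categories, which is the "Yes" branch above.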

Slide 82

## Complete Hierarchical multiple regression analysis: detecting outliers

Question: After incorporating any transformations, no outliers were detected in the regression analysis.

If any variables were transformed for normality or linearity, substitute the transformed variables in the regression for the detection of outliers.

Is the standardized residual for any case greater than +/-3.00?

- Yes: False. Remove outliers and run revised regression again.
- No: True
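The screening rule is easy to sketch for the one-predictor case: fit the regression, divide each residual by the residual standard error, and flag anything beyond ±3.00 (a simplified illustration; SPSS computes these from the full multiple regression):

```python
import math

def standardized_residuals(x, y):
    """OLS with one predictor, then each residual divided by the residual SD."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    resid = [b - (intercept + slope * a) for a, b in zip(x, y)]
    sd = math.sqrt(sum(r ** 2 for r in resid) / (n - 2))  # residual standard error
    return [r / sd for r in resid]

# A perfectly linear series with one planted spike: only the spiked case
# exceeds the +/-3.00 screening threshold.
x = list(range(20))
y = [2 * i + 1 for i in x]
y[10] += 50
flagged = [i for i, z in enumerate(standardized_residuals(x, y)) if abs(z) > 3.0]
print(flagged)  # [10]
```

Cases that are flagged are removed and the regression is run again, as the chart indicates.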

Slide 83

## Complete Hierarchical multiple regression analysis: picking regression model for interpretation

Question: interpretation based on model that includes transformation of variables and removes outliers?

R² for revised regression greater than R² for baseline regression by 2% or more?

- Yes: True. Pick revised regression with transformations and omitting outliers for interpretation.
- No: False. Pick baseline regression with untransformed variables and all cases for interpretation.

Slide 84

## Complete Hierarchical multiple regression analysis: assumption of independence of errors

Question: serial correlation of errors is not a problem in this regression analysis?

Residuals are independent, Durbin-Watson between 1.5 and 2.5?

- Yes: True
- No: False. NOTE: caution for violation of assumption of independence of errors.
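The Durbin-Watson statistic is computable directly from the residual series. A minimal sketch:

```python
def durbin_watson(residuals):
    """d = sum of squared successive differences over sum of squared residuals.

    Values near 2 indicate independent errors; the screen above accepts
    anything between 1.5 and 2.5.
    """
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# A strictly alternating residual series is negatively autocorrelated,
# pushing d well above 2
print(durbin_watson([1, -1, 1, -1]))  # 3.0
```

Values well below 2 signal positive serial correlation; values well above 2 signal negative serial correlation.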

Slide 85

## Complete Hierarchical multiple regression analysis: multicollinearity

Question: Multicollinearity is not a problem in this regression analysis?

Tolerance for all IVs greater than 0.10, indicating no multicollinearity?

- Yes: True
- No: False. Halt the analysis until the problem is diagnosed.

Slide 86

## Complete Hierarchical multiple regression analysis: overall relationship

Question: Finding about overall relationship between dependent variable and independent variables.

Probability of F test of R² change less than/equal to level of significance?

- No: False
- Yes: Strength of R² change for predictor variables interpreted correctly?
  - No: False
  - Yes: Small sample, ordinal variables, or violation of assumption in the relationship?
    - No: True
    - Yes: True with caution

Slide 87

## Complete Hierarchical multiple regression analysis: individual relationships

Question: Finding about individual relationship between independent variable and dependent variable.

Probability of t test between predictors and DV <= level of significance?

- No: False
- Yes: Direction of relationship between predictors and DV interpreted correctly?
  - No: False
  - Yes: Small sample, ordinal variables, or violation of assumption in the relationship?
    - No: True
    - Yes: True with caution

Slide 88

## Complete Hierarchical multiple regression analysis: individual relationships

Question: Finding about independent variable with largest impact on dependent variable.

Does the stated variable have the largest beta coefficient (ignoring sign) among predictors?

- No: False
- Yes: Small sample, ordinal variables, or violation of assumption in the relationship?
  - No: True
  - Yes: True with caution

Slide 89

## Complete Hierarchical multiple regression analysis: validation analysis - 1

Question: The validation analysis supports the generalizability of the findings?

Set the random seed and randomly split the sample into a 75% training sample and a 25% validation sample.

Probability of ANOVA test for training sample <= level of significance?

- No: False
- Yes: Probability of F for R² change for training sample <= level of significance?
  - No: False
  - Yes: continue with the checks on the next slide

Slide 90

## Complete Hierarchical multiple regression analysis: validation analysis - 2

Pattern of significance for predictor variables in training sample matches pattern for full data set?

- No: False
- Yes: Shrinkage in R² (R² for training sample − R² for validation sample) < 2%?
  - Yes: True
  - No: False