
By Hui Bian

Office For Faculty Excellence


Spring 2012
1
What is structural equation modeling (SEM)?
SEM is used to test hypotheses about potential interrelationships among constructs, as well as their relationships to the indicators or measures assessing them.



2
[Example diagram: theory of planned behavior (TPB)]

Goals of SEM
To determine whether the theoretical model is supported by the sample data, that is, whether the model fits the data well.
To help us understand the complex relationships among constructs.
3
[Diagram: two latent factors (Factor1 and Factor2), each measured by three indicators (Indica1–Indica6), with a measurement error term (error1–error6) attached to each indicator]
4
Example of SEM
[Diagram: two measurement models (latent variables with their indicators and error terms) linked by a structural model (paths between the latent variables)]
5
Basic components of SEM
Latent variables (constructs/factors)
Are the hypothetical constructs of interest in a study, such as self-control, self-efficacy, intention, etc.
They cannot be measured directly.
Observed variables (indicators)
Are the variables that are actually measured during data collection by the researchers using a developed instrument or test.
They are used to define or infer the latent variable or construct.
Each observed variable represents one definition of the latent variable.



6
Basic components of SEM
Endogenous variables (dependent variables): variables that have at least one arrow leading into them from another variable.
Exogenous variables (independent variables): any variable that does not have an arrow leading into it.

7
Basic components of SEM
Measurement error terms
Represent the amount of variation in an indicator that is due to measurement error.
Structural error terms (disturbance terms)
Represent the unexplained variance in the latent endogenous variables due to all unmeasured causes.


8
Basic components of SEM
Covariance: a measure of how much two variables change together.
A two-way arrow is used to show covariance.

9
Graphs in AMOS
A rectangle represents an observed variable.
A circle or ellipse represents an unobserved (latent) variable.
Two-way arrow: covariance or correlation.
One-way arrow: unidirectional relationship.


10
11
[Diagram: annotated path diagram labeling the latent variables, observed variables, measurement error terms, the covariance (two-way arrow), paths, and the structural error term]
Model parameters
Are the characteristics of the model that are unknown to the researchers.
They have to be estimated from the sample covariance or correlation matrix.
12
Model parameters
Regression weights/factor loadings
Structural coefficients
Variances
Covariances
Each potential parameter in a model must be specified as a fixed, free, or constrained parameter.




13
Model parameters
Free parameters: unknown and need to be estimated.
Fixed parameters: not free; they are fixed to a specified value, typically 0 or 1.
Constrained parameters: unknown, but constrained to equal one or more other parameters.

14
15
[Diagram: example model with fixed and free parameters labeled; if opp_v1 = opp_v2, they are constrained parameters]
Build SEM models
Model specification: the exercise of formally stating a model. Prior to data collection, develop a theoretical model based on theory, empirical studies, etc.
Which variables are included in the model.
How these variables are related.
Misspecified model: results from errors of omission and/or inclusion of any variable or parameter.




16
Model identification: whether the model can, in theory and in practice, be estimated with observed data.
Under-identified model: one or more parameters cannot be uniquely determined from the observed data. A model for which it is not possible to estimate all of the model's parameters.


17
Model identification
Just-identified model (saturated model): all of the parameters are uniquely determined. For each free parameter, a value can be obtained through only one manipulation of the observed data.
The degrees of freedom equal zero (the number of free parameters exactly equals the number of known values).
The model fits the data perfectly.
Over-identified model: a model for which all the parameters are identified and for which there are more knowns than free parameters.
18
A just-identified or over-identified model is an identified model.
If a model is under-identified, additional constraints may make the model identified.
The number of free parameters to be estimated must be less than or equal to the number of distinct values in the matrix S.
The number of distinct values in matrix S is equal to p(p + 1)/2, where p is the number of observed variables (see the counting sketch below).
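Below is a minimal counting sketch (in Python, with hypothetical numbers) of this identification check; the function name and the example counts are illustrative, not from the slides.

```python
def identification_status(p, free_params):
    """Compare the number of free parameters with the number of distinct
    values in the sample covariance matrix S, which is p(p + 1)/2."""
    known = p * (p + 1) // 2
    df = known - free_params
    if df < 0:
        return known, df, "under-identified"
    if df == 0:
        return known, df, "just-identified (saturated)"
    return known, df, "over-identified"

# e.g., 6 observed variables -> 21 distinct variances/covariances in S
print(identification_status(p=6, free_params=13))  # (21, 8, 'over-identified')
```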

19
How to avoid identification problems
To achieve identification, one of the factor loadings for each latent variable must be fixed to one. The variable with a fixed loading of one is called a marker variable or reference item.
This method solves the scale indeterminacy problem.
Have enough indicators of each latent variable. A simple rule that works most of the time is that there should be at least two indicators per latent variable, and those indicators' errors should be uncorrelated.
Use a recursive model.
Design a parsimonious model.

20
Rules for building an SEM model
All variances of independent variables are model parameters.
All covariances between independent variables are model parameters.
All factor loadings connecting the latent variables and their indicators are model parameters.
All regression weights between observed or latent variables are model parameters.




21
Rules for building an SEM model
The variances of dependent variables, the covariances between dependent variables, and the covariances between dependent and independent variables are NOT model parameters.
*For each latent variable included in the model, the metric of its latent scale needs to be set.
For any independent latent variable: a path leaving the latent variable is set to 1.
*Paths leading from the error terms to their corresponding observed variables are assumed to be equal to 1.
22
23

Build SEM models: model estimation
How do SEM programs estimate the parameters?
The proposed model makes certain assumptions about the relationships between the variables in the model.
The proposed model has specific implications for the variances and covariances of the observed variables.
24
How do SEM programs estimate the parameters?
We want to estimate the parameters specified in the model that produce the implied covariance matrix Σ.
We want the matrix Σ to be as close as possible to the matrix S, the sample covariance matrix of the observed variables.
If the elements in the matrix S minus the elements in the matrix Σ equal zero, then chi-square equals zero and we have a perfect fit.

25
How do SEM programs estimate the parameters?
In SEM, the parameters of a proposed model are estimated by minimizing the discrepancy between the empirical covariance matrix, S, and the covariance matrix implied by the model, Σ. How should this discrepancy be measured? This is the role of the discrepancy function.
S is the sample covariance matrix calculated from the observed data.
Σ is the covariance matrix implied by the proposed model; this reproduced (or model-implied) covariance matrix is determined by the proposed model.

26
How do SEM programs estimate the parameters?
In SEM, if the difference between S and Σ (the distance between the matrices) is small, then one can conclude that the proposed model is consistent with the observed data.
If the difference between S and Σ is large, one can conclude that the proposed model doesn't fit the data.
The proposed model is deficient.
The data are not good.

27
Build SEM models
Model estimation
Estimation of parameters.
The estimation process uses a particular fit function to minimize the difference between S and Σ (a sketch of one such function follows).
If the difference = 0, one has a perfect model fit to the data.
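As an illustration (not stated on the slides), one commonly used discrepancy function is the maximum-likelihood fit function sketched below; the covariance matrices are hypothetical. At the minimum, the chi-square statistic is approximately (N − 1) times this fit function value.

```python
import numpy as np

def f_ml(S, Sigma):
    """Maximum-likelihood discrepancy between the sample covariance matrix S
    and the model-implied covariance matrix Sigma:
    F_ML = ln|Sigma| - ln|S| + tr(S Sigma^-1) - p."""
    p = S.shape[0]
    return (np.linalg.slogdet(Sigma)[1] - np.linalg.slogdet(S)[1]
            + np.trace(S @ np.linalg.inv(Sigma)) - p)

S = np.array([[2.0, 0.8], [0.8, 1.5]])      # hypothetical sample covariances
Sigma = np.array([[2.0, 0.6], [0.6, 1.5]])  # hypothetical implied covariances
print(f_ml(S, Sigma))  # > 0: imperfect fit
print(f_ml(S, S))      # = 0: perfect fit (chi-square would also be 0)
```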

28
Model estimation methods
The two most commonly used estimation techniques
are Maximum likelihood (ML) and normal theory
generalized least square (GLS).
ML and GLS: large sample size, continuous data, and
assumption of multivariate normality
Unweighted least squares (ULS): scale dependent.
Asymptotically distribution free (ADF) (Weighted least
squares, WLS): serious departure from normality.


29

30
[Table: estimation methods grouped by whether multivariate normality is assumed or no normality is assumed]
Model testing
We want to know how well the model fits the data.
If S and Σ are similar, we may say the proposed model fits the data.
Model fit indices.
For an individual parameter, we want to know whether a free parameter is significantly different from zero.
Whether the estimate of a free parameter makes sense.

31
Chi-square test
The value ranges from zero for a saturated model with all paths included to a maximum for the independence model (the null model, or model with no parameters estimated).

32
Build SEM models
Model modification
If the model doesn't fit the data, then we need to modify the model.
Perform a specification search: change the original model in the search for a better-fitting model.

33
Goodness-of-fit tests based on predicted vs. observed covariances (absolute fit indexes)
Chi-square (CMIN): a non-significant χ² value indicates that S and Σ are similar. χ² should NOT be significant if there is a good model fit.
Goodness-of-fit index (GFI) and adjusted goodness-of-fit index (AGFI). GFI measures the amount of variance and covariance in S that is predicted by Σ. AGFI adjusts for the degrees of freedom of a model relative to the number of variables.


34
Goodness-of-fit tests based on predicted vs. observed covariances (absolute fit indexes)
Root-mean-square residual index (RMR). The closer RMR is to 0, the better the model fit.
Hoelter's critical N, also called the Hoelter index, is used to judge whether the sample size is adequate. By convention, the sample size is adequate if Hoelter's N > 200. A Hoelter's N under 75 is considered unacceptably low to accept a model by chi-square. Two N's are output, one at the .05 and one at the .01 level of significance.


35
Information theory goodness of fit: absolute fit indexes
Measures in this set are appropriate when comparing models estimated using maximum likelihood.
AIC, BIC, CAIC, and BCC (generic definitions are sketched below).
For model comparison, the lower AIC reflects the better-fitting model. AIC also penalizes for lack of parsimony.
BIC: the Bayesian information criterion. It penalizes for sample size as well as model complexity. It is recommended when the sample size is large or the number of parameters in the model is small.
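A minimal sketch of the generic AIC and BIC definitions (the log-likelihoods and sample size below are hypothetical; the values AMOS reports are computed from the fit function and are closely related variants of these):

```python
from math import log

def aic(log_likelihood, k):
    """Generic AIC: -2 lnL + 2k, where k is the number of free parameters."""
    return -2 * log_likelihood + 2 * k

def bic(log_likelihood, k, n):
    """Generic BIC: -2 lnL + k ln(n); the penalty grows with sample size n."""
    return -2 * log_likelihood + k * log(n)

# Hypothetical comparison of two models fit to the same data (n = 200):
print(aic(-512.3, k=10), bic(-512.3, k=10, n=200))  # model A
print(aic(-510.9, k=14), bic(-510.9, k=14, n=200))  # model B, more parameters
# For each criterion, the model with the lower value is preferred.
```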
36
Information theory goodness of fit: absolute fit indexes
CAIC: an alternative to AIC that also penalizes for sample size as well as model complexity (lack of parsimony). The penalty is greater than for AIC or BCC but less than for BIC. The lower the CAIC measure, the better the fit.
BCC: the Browne-Cudeck criterion. As with AIC, the lower the BCC, the better the fit. BCC penalizes for model complexity (lack of parsimony) more than AIC.
37
Goodness-of-fit tests comparing the given model with a null or an alternative model.
CFI, NFI, etc.
Goodness-of-fit tests penalizing for lack of parsimony.
Parsimony ratio (PRATIO), PNFI, PCFI
38
Scaling and normality assumption
Maximum likelihood and normal theory generalized least squares assume that the measured variables are continuous and have a multivariate normal distribution.
In the social sciences, we use many variables that are dichotomous or ordered-categorical rather than truly continuous.
In the social sciences, it is common for the distribution of observed variables to depart substantially from multivariate normality.


39
Scaling and normality assumption
Nominal or ordinal variables should have at least five categories and not be strongly skewed or kurtotic.
Values of skewness and kurtosis should be within −1 and +1.

40
Problems of non-normality (practical implications)
Inflated χ² goodness-of-fit statistics.
Inappropriate modifications made to theoretically adequate models.
Findings can be expected to fail to replicate, contributing to confusion in research areas.

41
How to detect normality of observed data?
Screen the data before the data analysis to check the distributions.
Skewness and kurtosis: univariate normality (see the sketch below).
AMOS provides normality results.
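A minimal screening sketch (Python with scipy, which is an assumption; AMOS reports its own normality assessment). It flags variables whose skewness or excess kurtosis falls outside the ±1 rule of thumb mentioned earlier:

```python
import numpy as np
from scipy.stats import kurtosis, skew

# Hypothetical data matrix: rows = subjects, columns = observed variables
rng = np.random.default_rng(0)
data = rng.normal(size=(300, 4))

for j in range(data.shape[1]):
    s = skew(data[:, j])
    k = kurtosis(data[:, j])  # excess kurtosis (0 for a normal distribution)
    flag = "" if abs(s) <= 1 and abs(k) <= 1 else "  <- check this variable"
    print(f"variable {j + 1}: skewness = {s:.2f}, kurtosis = {k:.2f}{flag}")
```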

42
Solutions to non-normality
Asymptotically distribution free (ADF) estimation: ADF produces asymptotically unbiased estimates of the χ² goodness-of-fit test, parameter estimates, and standard errors.
Limitation: requires a large sample size.
43
Solutions to non-normality
Unweighted least squares (ULS): no assumption of normality and no significance tests available. Scale dependent.
Bootstrapping: does not rely on a normal distribution.
Bayesian estimation: if ordered-categorical data are modeled.



44
Sample size (rules of thumb)
10 subjects per variable, or 20 subjects per variable.
250-500 subjects (Schumacker & Lomax, 2004).




45
Computer programs for SEM
AMOS
EQS
LISREL
MPLUS
SAS

46
AMOS is short for Analysis of MOment Structures.
It is software used for the data analysis technique known as structural equation modeling (SEM).
It is a program for visual SEM.

47
Path diagrams
They are the way to communicate an SEM model.
They are drawings that show the relationships among latent/observed variables.
In AMOS: rectangles represent observed variables and ellipses represent latent variables.


48
Examples of using the AMOS tool bar to draw a diagram
Example
Two latent variables: intention and self-efficacy
Four observed variables: intention01, intention02, self_efficacy01, and self_efficacy02
Five error terms




49
The model should look like this:
50
From the Start menu, go to All Programs > IBM SPSS Statistics > IBM SPSS AMOS 19 > AMOS Graphics.

51
[Diagram: the drawn model, with the latent variables (ellipses) and observed variables (rectangles) labeled]
52
Tool bar
Draw observed variables using the Rectangle tool.
Draw latent variables using the Ellipse tool.
Draw error terms using the error term tool.
53
Use Duplicate objects to get the other part of the model, then use Reflect.
54
55
Open data: File > Data Files
56
[Screenshot: Data Files dialog; click to select your data file]
Put the observed variable names into the diagram:
Go to View > Variables in Dataset.
Then drag each variable onto its rectangle.
57
Put the latent variable names in the diagram:
Put the mouse over a latent variable and right-click to get the menu.
Click Object Properties.
Type the variable name (e.g., Self-efficacy) in the Object Properties window.
58
For error terms, double-click the ellipse to get the Object Properties window.
Constrain parameters: double-click a path from Self-efficacy to self_efficacy01, type 1 for the regression weight, then click Close.




59
The data are from the AMOS examples (IBM SPSS).
Attig repeated the study with the same 40 subjects after a training exercise intended to improve memory performance. There were thus three performance measures before training and three performance measures after training.
60
Draw diagram
61
Conduct analysis: Analyze > Calculate Estimates
Text output
62
1. Number of distinct sample moments: sample means, variances, and covariances (AMOS ignores means). We also use p(p + 1)/2 = 4(4 + 1)/2 = 10.
2. Number of distinct parameters to be estimated: 4 variances and 6 covariances.
3. Degrees of freedom: number of distinct sample moments minus number of distinct parameters (see the sketch below).
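Applying the earlier p(p + 1)/2 counting to this example (a sketch; the four observed variables and ten estimated parameters are as described above):

```python
p = 4                         # observed variables in this example
moments = p * (p + 1) // 2    # 4(4 + 1)/2 = 10 distinct variances/covariances
free_params = 4 + 6           # 4 variances + 6 covariances to be estimated
print(moments - free_params)  # 0 degrees of freedom: a saturated model
```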
Text output
63
There is no null hypothesis being tested for this example, so the chi-square result is not very interesting.
For a hypothesis test, the chi-square value is a measure of the extent to which the data are incompatible with the hypothesis.
For a hypothesis test, the result will have positive degrees of freedom.
A chi-square value of 0 indicates no departure from the null hypothesis.
64
Text output
65
Minimum was achieved: this line indicates that AMOS successfully estimated the variances and covariances.
When AMOS fails, it is because you have posed a problem that has no solution, or no unique solution (a model identification problem).
Text output
66
1. Estimate means the covariance estimate: for example, the covariance between recall1 and recall2 is 2.556.
2. S.E. is an estimate of the standard error of the covariance, 1.16.
3. C.R. is the critical ratio obtained by dividing the covariance estimate by its standard error.
4. For a significance level of 0.05, a critical ratio that exceeds 1.96 would be called significant. This ratio is relevant to the null hypothesis that the covariance between recall1 and recall2 is 0.
Text output
67
5. In this example, 2.203 is greater than 1.96, so the covariance between recall1 and recall2 is significantly different from 0 at the 0.05 level.
6. The p value of 0.028 (two-tailed) is for testing the null hypothesis that the parameter value is 0 in the population (see the sketch below).
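A minimal sketch reproducing this calculation (scipy is used here, as an assumption, for the standard normal tail probability):

```python
from scipy.stats import norm

estimate = 2.556  # covariance between recall1 and recall2
se = 1.16         # standard error of the estimate
cr = estimate / se                # critical ratio, about 2.203
p_value = 2 * (1 - norm.cdf(cr))  # two-tailed p, about 0.028
print(round(cr, 3), round(p_value, 3))
# |C.R.| > 1.96 -> significantly different from 0 at the 0.05 level
```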

68
