
# Saeed Pahlevan Sharif 1/09/2013


## STRUCTURAL EQUATION MODELING (SEM) & AMOS WORKSHOP

1st & 8th September 2013

## SAEED PAHLEVAN SHARIF

WWW.SAEEDSHARIF.COM

## Data Screening

 Data analysis
  Summarization
  Model fitting
  Testing hypotheses

 Data screening
  Exposure
  Preparation for modeling
  Checking the adequacy of assumptions

 Your data should be “clean”
  Reliable and valid

SEM & AMOS Workshop 1


 Handle missing data
 Address outliers and influential cases
 Meet multivariate statistical assumptions for alternative tests

## Problems Resulting from Missing Data


 Loss of Information
 Bias
 Power Loss



 Missing much of your data
  Can’t calculate the estimated model
 EFA, CFA, and path models require a certain minimum amount of data
  Greater model complexity and improved power require larger samples


 Systematic bias due to a common cause (poor formulation, sensitivity, etc.)
  Gender moderator
  Salary
  Etc.


## Detecting Missing Values


## Hair et al.’s (2009) Rules of Thumb:

 Missing data under 10% for an individual case or
observation can generally be ignored, except when the
missing data occurs in a specific nonrandom fashion.
 The number of cases with no missing data must be
sufficient for the selected analysis technique if
replacement values will not be substituted (imputed)
for the missing data.

• DV is missing
• Impute and run models with and without missing data
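The 10% rule above can be checked directly. A minimal sketch, with a hypothetical survey matrix and NumPy standing in for SPSS:

```python
import numpy as np

# Hypothetical survey data: rows are cases, columns are items; NaN marks a missing answer
data = np.array([
    [4.0, 3.0, np.nan, 5.0, 2.0],
    [3.0, np.nan, np.nan, np.nan, 1.0],
    [5.0, 4.0, 4.0, 3.0, 5.0],
])

# Percentage of missing values per case (Hair et al.'s 10% rule of thumb)
pct_missing = np.isnan(data).mean(axis=1) * 100
flagged = np.where(pct_missing > 10)[0]  # cases needing closer inspection
print(pct_missing, flagged)
```

Flagged cases should then be inspected for a nonrandom missing-data pattern before deciding on deletion or imputation.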


## Imputation Methods (Hair et al., 2009, Table 2-2)

 Use only valid data
  No imputation; just use valid cases or variables
  In SPSS: Exclude Pairwise (variable) or Listwise (case)

 Use known replacement values
  Match the missing value with a similar case’s value

 Use calculated replacement values
  Use the variable mean, median, or mode
  Regression based on known relationships

 Model-based methods
  Iterative two-step estimation of values and descriptives to find the most appropriate replacement value
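The mean-substitution and regression-based options above can be sketched in a few lines. The data and the 20 missing values below are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)  # y depends strongly on x
y[:20] = np.nan                              # knock out 20 values of y

# Mean substitution: every missing y gets the mean of the observed y's
y_mean = np.where(np.isnan(y), np.nanmean(y), y)

# Regression imputation: predict missing y from x using the complete cases
valid = ~np.isnan(y)
slope, intercept = np.polyfit(x[valid], y[valid], 1)
y_reg = np.where(np.isnan(y), slope * x + intercept, y)

# Mean substitution shrinks the variance more than regression imputation does
print(np.nanvar(y), y_mean.var(), y_reg.var())
```

This illustrates the trade-off in the table that follows: mean substitution depresses variance, while regression imputation preserves more of it at the cost of reinforcing existing relationships.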

## Imputation in SPSS

2. Include each variable that has values that need imputing

3. For each variable, you can choose the new name (for the imputed column) and the type of imputation


| Method | Advantages | Disadvantages | Best used when |
|---|---|---|---|
| Mean substitution | Easily implemented. Provides all cases with complete information. | Reduces variance of the distribution. Distorts the distribution of the data. Depresses observed correlations. | Relatively low levels of missing data. Relatively strong relationships among variables. |
| Regression imputation | Employs actual relationships among the variables. Replacement values calculated based on an observation’s own values on other variables. A unique set of predictors can be used for each variable with missing data. | Reinforces existing relationships and reduces generalizability. Must have sufficient relationships among variables to generate valid predicted values. Understates variance unless an error term is added to the replacement value. Replacement values may be “out of range”. | Moderate to high levels of missing data. Relationships sufficiently established so as to not impact generalizability. |
| Model-based methods | Accommodates both nonrandom and random missing-data processes. Best representation of the original distribution of values with least bias. | Complex model specification by the researcher. Requires specialized software. Typically not available in software programs (except the EM method in SPSS). | Only method that can accommodate a nonrandom missing-data process. High levels of missing data require the least biased method to ensure generalizability. |


 Short surveys (pre-testing is critical!)
 Easy-to-understand and easy-to-answer survey items
 Force completion (incentives, technology)
 Digital surveys (rather than paper)
 Put dependent variables at the beginning of the survey!



 Outliers can influence your results, pulling the mean away from the median
 Outliers also affect distributional assumptions and often reflect false or mistaken responses
 Two types of outliers:
  Outliers for individual variables (univariate)
   Extreme values for a single variable
  Outliers for the model (multivariate)
   Extreme (uncommon) values for a correlation

## Detecting Univariate Outliers



[Boxplot: the mean is marked inside the box; roughly 50% of values should fall within the box and about 99% within the whisker range. Points beyond that range are outliers.]
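The boxplot logic above can be reproduced numerically with Tukey's 1.5 × IQR fences (a common boxplot convention); the scores below are made up:

```python
import numpy as np

# Hypothetical scores with one obvious univariate outlier
scores = np.array([4, 5, 3, 4, 5, 4, 3, 5, 4, 99], dtype=float)

# Tukey boxplot rule: values beyond 1.5 * IQR from the quartiles are outliers
q1, q3 = np.percentile(scores, [25, 75])
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = np.where((scores < low) | (scores > high))[0]
print(outliers)
```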


 Outliers should be examined on a case-by-case basis
 If the outlier is truly abnormal, and not representative of your population, then it is okay to remove it. But this requires careful examination of the data points
  e.g., you are studying dogs, but somehow a cat got into your sample
  e.g., someone answered “1” for all 75 questions on the survey



 Multivariate outliers are data points that do not fit the standard pattern of correlations exhibited by the other data points in the dataset
  e.g., exercise and weight loss
 Detected with the Mahalanobis d-squared statistic

[AMOS output: the observation numbers listed are the row numbers from SPSS. Anything less than .05 in the p1 column is abnormal and is a candidate for inspection.]
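Outside AMOS, the Mahalanobis d-squared and an upper-tail p-value (analogous to AMOS's p1 column) can be computed by hand. The dataset and the planted outlier below are synthetic:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # 100 cases, 3 variables
X[0] = [6.0, -6.0, 6.0]         # plant one obvious multivariate outlier

# Mahalanobis d-squared for every case
mean = X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
diff = X - mean
d2 = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

# Under multivariate normality, d-squared is roughly chi-square distributed,
# so small upper-tail p-values (like the p1 column) flag abnormal cases
p1 = chi2.sf(d2, df=X.shape[1])
flagged = np.where(p1 < 0.05)[0]
print(flagged)
```

With 100 normal cases, a handful of non-outliers will also fall below .05 by chance, which is why the slides treat flagged cases as candidates for inspection rather than automatic deletions.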



 Create a new variable in SPSS called “Outlier”
  Code 0 for cases with a Mahalanobis p-value > .05 (non-outliers)
 In AMOS, use “Outlier” as a grouping variable
  This runs your model with only the non-outliers

## Before and after removing outliers

[Model results before (N = 340) and after (N = 295) removing outliers.]

Even after you remove outliers, Mahalanobis d-squared will flag a whole new set of outliers, so these should be checked on a case-by-case basis, using the Mahalanobis statistic as a guide for inspection.



 It is a bad idea to remove outliers unless they are truly “abnormal” and do not represent accurate observations from the population
 Removing outliers is risky
  It threatens generalizability

## Normality

 PLS and binomial regression do not require normality assumptions
 t-tests and F-tests assume normal distributions
 Normality is assessed in many ways: shape, skewness, and kurtosis (flat/peaked)
 Normality issues affect small samples (<50) much more than large samples (>200)


[Figure: distribution shapes (bimodal, flat), skewness, and kurtosis.]


 Common transformations:
  Inverse: 1/X
  Squared: X*X
  Cubed: X*X*X

 Fix a positively skewed distribution with:
  Square root: SQRT(X)
  Logarithm: LG10(X)
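A quick check that these transformations actually reduce positive skew, using simulated lognormal data (skewed to the right by construction):

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=5000)  # positively skewed sample

s_raw = skew(x)
s_sqrt = skew(np.sqrt(x))   # SQRT(X) in SPSS
s_log = skew(np.log10(x))   # LG10(X); the log of lognormal data is normal

print(round(s_raw, 2), round(s_sqrt, 2), round(s_log, 2))
```

The log transform pulls the skewness close to zero here, and the square root reduces it; which one is appropriate depends on how severe the skew is.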


## Normality in AMOS

 Refer to the “Assessment of normality” table in the Text View output
 Data are considered normal if:
  Skewness is between -3 and +3
  Kurtosis is between -7 and +7
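The same skewness and kurtosis check can be run outside AMOS. Note that scipy reports excess kurtosis (0 for a normal distribution), which appears to match the convention behind the ±7 rule; the item scores are simulated:

```python
import numpy as np
from scipy.stats import kurtosis, skew

rng = np.random.default_rng(42)
item = rng.normal(loc=3.0, scale=1.0, size=500)  # simulated, roughly normal survey item

s = skew(item)
k = kurtosis(item)  # excess kurtosis: 0 for a perfectly normal distribution

# Thresholds from the slide: -3 < skewness < 3 and -7 < kurtosis < 7
normal_enough = (-3 < s < 3) and (-7 < k < 7)
print(round(s, 2), round(k, 2), normal_enough)
```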

## What is Structural Equation Modeling (SEM)?

 Two components:
 Measurement model (CFA) = A visual representation that specifies
the model’s constructs, indicator variables, and interrelationships.
CFA provides quantitative measures of the reliability and validity of
the constructs.
 Structural model (SEM) = A set of dependence relationships linking
the hypothesized model’s constructs. SEM determines whether
relationships exist between the constructs – and along with CFA
enables you to accept or reject your theory.

 Developing CFA and SEM models and their hypotheses draws on:
  Theory
  Prior experience



 EFA (Exploratory Factor Analysis):
  Use the data to determine the underlying structure
 CFA (Confirmatory Factor Analysis):
  1) Specify the factor structure on the basis of a ‘good’ theory
  2) Use CFA to determine whether there is empirical support for the proposed theoretical factor structure

## CFA

 The major objective in CFA is determining whether the relationships between the variables in the hypothesized model resemble the relationships between the variables in the observed data set
 More formally: the analysis determines the extent to which the proposed covariance matrix matches the observed covariance matrix
 CFA assesses how well the predicted interrelationships between the variables match the actual or observed interrelationships. If the two matrices (the proposed and the actual) are consistent with one another, then the model can be considered a credible explanation for the hypothesized relationships
 CFA provides quantitative measures that assess the validity and reliability of the theoretical model


## Practice

## Recommended Criteria for Fit Indices




 Jaccard and Wan (1996) give one often-cited recommendation: report at least three fit tests (one absolute, one relative, and one parsimonious) to reflect diverse criteria
 More recently, Kline (2005) and Thompson (2004) recommend fit measures without reference to their classification
 Meyers et al. recommend reporting chi-square, NFI, CFI, and RMSEA. Although chi-square is less informative as an assessment of a single model, it is useful in comparing nested models; the model with the lower chi-square value is considered preferable

## Model Fit

 Some researchers believe that factor loadings must be above 0.7; otherwise, the items should be excluded from the model and reported as poor indicators of their construct

 How many indicators per factor?
  2 is the minimum
  3 is safer, especially if factor correlations are weak
  4 provides safety
  5 or more is more than enough (if there are too many indicators, combine them into sets)

 Normality test:
  Based on Barbara Byrne’s book, -3 < skewness < 3 and -7 < kurtosis < 7 are acceptable and considered normal. Items that cannot meet these conditions are removed from the model.


## Model Fit

 Model fit:
  According to Robert Ho’s book, at least three indices need to be satisfied to claim that the model fits.
  GFI, CFI, etc. > 0.9 are OK (near 0.9 is acceptable as well).
  A p-value > 0.05 in the CMIN (chi-square) table is OK, because here we do not want to reject the null hypothesis.
  Robert Ho, page 285: RMSEA < 0.05 is excellent; 0.05 to 0.08 is good; 0.08 to 0.10 is moderate; RMSEA > 0.10 is weak.
  We should report three satisfied indices, and also RMSEA and chi-square (CMIN) even if these two are not satisfied.
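RMSEA can be recomputed by hand from the reported chi-square using Steiger's formula, RMSEA = sqrt(max((chi-square - df) / (df * (N - 1)), 0)). The chi-square, df, and N below are hypothetical:

```python
import math

def rmsea(chi_square, df, n):
    """Steiger's RMSEA from the model chi-square, degrees of freedom, and sample size."""
    return math.sqrt(max((chi_square - df) / (df * (n - 1)), 0.0))

# Hypothetical model: chi-square = 85.4 on 48 degrees of freedom, N = 300
value = rmsea(85.4, 48, 300)
print(round(value, 3))  # falls in the 0.05-0.08 "good" band above
```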

 The correlation between latent variables must be less than 0.9; otherwise, we combine the two highly correlated latent variables, because they are actually measuring the same thing!
  Based on Barbara Byrne, we model them as a second-order factor.
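Screening a factor correlation matrix for pairs above 0.9 is mechanical; the construct names and correlations below are invented for illustration:

```python
import numpy as np

# Hypothetical correlation matrix among four latent variables
latents = ["SAT", "LOY", "TRU", "COM"]
corr = np.array([
    [1.00, 0.62, 0.55, 0.48],
    [0.62, 1.00, 0.93, 0.51],  # LOY and TRU correlate above 0.9
    [0.55, 0.93, 1.00, 0.44],
    [0.48, 0.51, 0.44, 1.00],
])

# Flag pairs of latent variables whose correlation exceeds 0.9
i, j = np.triu_indices_from(corr, k=1)
too_high = [(latents[a], latents[b]) for a, b in zip(i, j) if corr[a, b] > 0.9]
print(too_high)
```

Any pair flagged here would be a candidate for merging or for a second-order factor, per the slide above.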

## Modification Indices


## Residuals

 A significant standardized residual is one with an absolute value greater than 4.0. Significant residuals significantly decrease your model fit. Fixing model fit per the residuals matrix is similar to fixing model fit per the modification indices; the same rules apply.

## Construct Validity

 If you have convergent validity issues, then your variables do not correlate well with each other within their parent factor; i.e., the latent factor is not well explained by its observed variables.

 If you have discriminant validity issues, then your variables correlate more highly with variables outside their parent factor than with variables within their parent factor; i.e., the latent factor is better explained by some other variables (from a different factor) than by its own observed variables.


## Validity and Reliability

 It is absolutely necessary to establish convergent and discriminant validity, as well as reliability, when doing a CFA. If your factors do not demonstrate adequate validity and reliability, moving on to test a causal model will be useless - garbage in, garbage out!
 There are a few measures that are useful for establishing validity and reliability:

Reliability
 CR > 0.7

Convergent validity
 AVE > 0.5
 CR > AVE

Discriminant validity
 MSV < AVE
 ASV < AVE

CR = Composite Reliability; AVE = Average Variance Extracted; MSV = Maximum Shared Squared Variance; ASV = Average Shared Squared Variance
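Given standardized factor loadings, CR and AVE have simple closed forms (CR here follows Fornell and Larcker's composite reliability formula); the loadings below are hypothetical:

```python
import numpy as np

# Hypothetical standardized loadings of four indicators on one latent construct
loadings = np.array([0.72, 0.81, 0.65, 0.78])
errors = 1.0 - loadings**2  # indicator error variances (standardized solution)

# Composite Reliability (CR) and Average Variance Extracted (AVE)
cr = loadings.sum() ** 2 / (loadings.sum() ** 2 + errors.sum())
ave = (loadings**2).mean()

print(round(cr, 3), round(ave, 3))
```

Here CR clears the 0.7 benchmark, AVE clears 0.5, and CR > AVE, so this made-up construct would pass the reliability and convergent-validity checks above.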


 Andrew Hayes
 Andy Field
 Bahaman Abu Samah