
Saeed Pahlevan Sharif 1/09/2013

Data Screening and CFA



STRUCTURAL EQUATION MODELING (SEM) & AMOS WORKSHOP
1ST & 8TH SEPTEMBER 2013

SAEED PAHLEVAN SHARIF


WWW.SAEEDSHARIF.COM

www.saeedsharif.com Taylor’s Graduate School

Data Screening

 Data analysis
 Summarization
 Model fitting
 Testing hypotheses

 Data screening
 Exposure
 Preparation for modeling
 Checking the adequacy of assumptions.

 Your data should be “clean”


 Reliable and valid


Necessary Data Screening To Do:



 Handle Missing Data


 Address outliers and influential cases.
 Meet multivariate statistical assumptions for
alternative tests


Problems Resulting from Missing Data



 Loss of Information
 Bias
 Power Loss


Statistical Problems with Missing Data



 Missing much of your data


 Can’t calculate the estimated model.

 EFA, CFA, and path models require a certain minimum number of data points
 Greater model complexity and improved power require larger
samples.


Logical Problem with Missing Data


 Systematic bias due to a common cause (poor formulation, sensitivity, etc.).
 Gender Moderator
 Salary
 Etc.


Detecting Missing Values

Handling Missing Data



Hair et al.’s (2009) Rules of Thumb:


 Missing data under 10% for an individual case or
observation can generally be ignored, except when the
missing data occurs in a specific nonrandom fashion.
 The number of cases with no missing data must be
sufficient for the selected analysis technique if
replacement values will not be substituted (imputed)
for the missing data.

• DV is missing
• Impute and run models with and without missing data
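The 10% rule of thumb above can be checked per case with a few lines of code. The following is a minimal sketch, not part of the workshop material: the data, the `missing_share` helper, and the use of `None` as the missing-value marker are all illustrative assumptions.

```python
from typing import Optional, Sequence


def missing_share(case: Sequence[Optional[float]]) -> float:
    """Return the fraction of a case's values that are missing (None)."""
    return sum(value is None for value in case) / len(case)


# Hypothetical survey data: one row per respondent, None marks a missing answer.
cases = [
    [4, 5, 3, 4, 2, 5, 4, 3, 4, 5, 3, 4],
    [4, None, 3, 4, 2, 5, 4, 3, 4, 5, 3, 4],
    [None, None, 3, 4, None, 5, 4, 3, 4, 5, 3, 4],
]

THRESHOLD = 0.10  # the 10% rule of thumb quoted above

for row_number, case in enumerate(cases, start=1):
    share = missing_share(case)
    status = "inspect" if share >= THRESHOLD else "can generally be ignored"
    print(f"case {row_number}: {share:.0%} missing -> {status}")
```

The check only counts how much is missing. Whether the missingness follows a nonrandom pattern still has to be judged separately, as the rule itself warns.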


Imputation Methods
(Hair et al. (2009), table 2-2)

 Use only valid data


 No imputation, just use valid cases or variables
 In SPSS: Exclude Pairwise (variable), Listwise (case)

 Use known replacement values


 Match missing value with similar case’s value

 Use calculated replacement values


 Use variable mean, median, or mode
 Regression based on known relationships

 Model based methods


 Iterative two step estimation of value and descriptives to find
most appropriate replacement value


Imputation in SPSS

2. Include each variable that has values that need imputing

3. For each variable you can choose the new name (for the imputed column) and the type of imputation

Imputation Method: Mean Substitution
Advantages:
• Easily implemented
• Provides all cases with complete information
Disadvantages:
• Reduces variance of the distribution
• Distorts distribution of the data
• Depresses observed correlations
Best Used When:
• Relatively low levels of missing data
• Relatively strong relationships among variables

Imputation Method: Regression Imputation
Advantages:
• Employs actual relationships among the variables
• Replacement values calculated based on an observation’s own values on other variables.
• Unique set of predictors can be used for each variable with missing data.
Disadvantages:
• Reinforces existing relationships and reduces generalizability
• Must have sufficient relationships among variables to generate valid predicted values.
• Understates variance unless error term added to replacement value.
• Replacement values may be “out of range”
Best Used When:
• Moderate to high levels of missing data
• Relationships sufficiently established so as to not impact generalizability

Imputation Method: Model-Based Methods
Advantages:
• Accommodates both nonrandom and random missing data processes
• Best representation of original distribution of values with least bias.
Disadvantages:
• Complex model specification by researcher
• Requires specialized software
• Typically not available in software programs (except EM method in SPSS)
Best Used When:
• Only method that can accommodate nonrandom missing data process
• High levels of missing data require least biased method to ensure generalizability
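To see why mean substitution "reduces variance of the distribution", here is a small sketch with made-up scores. The `mean_substitute` helper and the data are hypothetical and only illustrate the effect; they do not reproduce what SPSS does internally.

```python
from statistics import mean, variance


def mean_substitute(values):
    """Replace each None with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical item scores; None marks a missing answer.
scores = [2, 4, None, 6, 8, None]

observed = [v for v in scores if v is not None]
imputed = mean_substitute(scores)

print("sample variance, observed cases only:", variance(observed))
print("sample variance, after mean substitution:", variance(imputed))
```

Every imputed value sits exactly at the mean, so it adds nothing to the sum of squared deviations while increasing the number of cases. The variance can therefore only go down.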

Best Method – Prevention!



 Short surveys (pre testing critical!)


 Easy to understand and answer survey items
 Force completion (incentives, technology)
 Bribe/motivate (iPad drawing)
 Digital surveys (rather than paper)
 Put dependent variables at the beginning of
the survey!


Outliers and Influentials


 Outliers can influence your results, pulling the mean away from the median.
 Outliers also affect distributional assumptions and
often reflect false or mistaken responses
 Two types of outliers:
 outliers for individual variables (univariate)
 Extreme values for a single variable
 outliers for the model (multivariate)
 Extreme (uncommon) values for a correlation


Detecting Univariate Outliers

[Figure: box plot annotated “Mean”, “50% should fall within the box”, “99% should fall within this range”, and “Outliers!” for points beyond that range]

Handling Univariate Outliers



 Should be examined on a case by case basis.


 If the outlier is truly abnormal, and not
representative of your population, then it is okay to
remove. But this requires careful examination of
the data points
 e.g., you are studying dogs, but somehow a cat got ahold of
your survey
 e.g., someone answered “1” for all 75 questions on the survey
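The second example (the same answer to every question) is easy to screen for automatically. A minimal sketch, assuming each response is a list of item answers keyed by a hypothetical respondent ID:

```python
def is_straight_lined(answers):
    """Return True when every answer in a response is identical."""
    return len(set(answers)) == 1


# Hypothetical responses to a 75-item survey.
responses = {
    "r001": [1] * 75,               # answered "1" for all 75 questions
    "r002": [3, 4, 2, 5, 1] * 15,   # varied answers
}

flagged = [rid for rid, answers in responses.items() if is_straight_lined(answers)]
print(flagged)
```

A flag is only a prompt for the case-by-case examination described above, not an automatic reason to delete the case.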


Detecting Multivariate Outliers


 Multivariate outliers are sets of data points that do not fit the standard sets of correlations exhibited by the other data points in the dataset with regard to your causal model.
 Exercise and Weight loss
 Mahalanobis d-squared.
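As an illustration of Mahalanobis d-squared, the sketch below computes it for two variables, using the exercise and weight loss example with invented numbers. It assumes the sample covariance matrix and a chi-square upper-tail probability with 2 degrees of freedom as the p-value. The estimator and p-values reported by AMOS may differ in detail, so treat this as a sketch of the idea only.

```python
import math
from statistics import mean


def mahalanobis_d2(xs, ys):
    """Squared Mahalanobis distance of each (x, y) pair from the centroid."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    # Sample covariance matrix S = [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in zip(xs, ys):
        dx, dy = x - mx, y - my
        # (dx, dy) * inverse(S) * (dx, dy)^T, written out for the 2x2 case.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


def upper_tail_p(d2):
    """P(chi-square with 2 df > d2); exp(-d2 / 2) is exact for 2 df."""
    return math.exp(-d2 / 2)


# Hypothetical data: weekly exercise hours and weight loss in kg.
exercise = [1, 2, 3, 4, 5, 6, 7, 8, 9, 9]
weight_loss = [0.5, 1.1, 1.4, 2.1, 2.4, 3.0, 3.6, 3.9, 4.6, 0.2]

d2 = mahalanobis_d2(exercise, weight_loss)
flagged = [row for row, value in enumerate(d2, start=1) if upper_tail_p(value) < 0.05]
print("rows to inspect:", flagged)
```

The last case is not extreme on either variable alone. What makes it unusual is the combination of heavy exercise with almost no weight loss, which breaks the correlation shown by the other cases.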


[Figure: Mahalanobis d-squared output. The observation numbers are row numbers from SPSS. Anything less than .05 in the p1 column is abnormal and is a candidate for inspection.]


Handling Multivariate Outliers


 Create a new variable in SPSS called “Outlier”
 Code 0 where the Mahalanobis p1 value > .05
 Code 1 where the Mahalanobis p1 value < .05
 AMOS: use “Outlier” as a grouping variable
 Selecting the group coded 0 then runs your model with only non-outliers


Before and after removing outliers


BEFORE: N=340    AFTER: N=295

Even after you remove outliers, the Mahalanobis will come up with a whole new set of outliers, so these should be checked on a case by case basis, using the Mahalanobis as a guide for inspection.


“Best Practice” for outliers


 It is a bad idea to remove outliers, unless they are truly “abnormal” and do not represent accurate observations from the population.
 Removing outliers is risky
 Generalizability


Normality
 PLS or binomial regressions do not require such assumptions
 t tests and F tests assume normal distributions
 Normality is assessed in many ways: shape,
skewness, and kurtosis (flat/peaked).
 Normality issues affect small sample sizes (<50)
much more than large sample sizes (>200)


[Figure: examples of normality problems by shape (bimodal, flat), skewness, and kurtosis]

Fixing Normality Issues


 Fix flat distribution with:
 Inverse: 1/X
 Fix negatively skewed distribution with:
 Squared: X*X
 Cubed: X*X*X
 Fix positively skewed distribution with:
 Square root: SQRT(X)
 Logarithm: LG10(X)
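The transformations listed above can be sketched as a small lookup table. This is an illustrative sketch: the `TRANSFORMS` table and `transform` helper are hypothetical names, and the dictionary keys only echo the SPSS-style expressions on the slide.

```python
import math

# Each entry mirrors one transformation from the list above.
TRANSFORMS = {
    "inverse": lambda x: 1 / x,      # 1/X, for flat distributions
    "squared": lambda x: x * x,      # X*X, for negative skew
    "cubed": lambda x: x * x * x,    # X*X*X, for negative skew
    "sqrt": math.sqrt,               # SQRT(X), for positive skew
    "lg10": math.log10,              # LG10(X), for positive skew
}


def transform(values, name):
    """Apply the named transformation to every value."""
    return [TRANSFORMS[name](v) for v in values]


# Hypothetical positively skewed values with a long right tail.
raw = [1, 10, 100, 1000]
print(transform(raw, "lg10"))
print(transform(raw, "sqrt"))
```

Note the domain limits: the inverse needs non-zero values, the square root needs non-negative values, and the logarithm needs strictly positive values, so a variable may have to be shifted before transforming.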


Normality in AMOS
– Refer to the “Assessment of normality” table in the Text View output
– Data is considered to be normal if:
:: Skewness is between -3 and +3
:: Kurtosis is between -7 and +7
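For readers who want to check these cut-offs outside AMOS, here is a minimal sketch. It assumes population (divide by N) moment formulas and kurtosis expressed as excess kurtosis (0 for a normal distribution). The values AMOS prints may be computed slightly differently, and the function names are illustrative.

```python
def skewness(values):
    """Third standardized moment, using divide-by-N moments."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def excess_kurtosis(values):
    """Fourth standardized moment minus 3, so a normal distribution gives 0."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m4 = sum((v - m) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3


def within_limits(values, skew_limit=3.0, kurtosis_limit=7.0):
    """Apply the -3..+3 skewness and -7..+7 kurtosis rule of thumb."""
    return (abs(skewness(values)) < skew_limit
            and abs(excess_kurtosis(values)) < kurtosis_limit)


# Hypothetical item scores.
symmetric = [1, 2, 3, 4, 5]
spiked = [1] * 30 + [100]   # one extreme value

print(within_limits(symmetric))
print(within_limits(spiked))
```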


What is Structural Equations Modeling (SEM)?



 Two components:
 Measurement model (CFA) = A visual representation that specifies
the model’s constructs, indicator variables, and interrelationships.
CFA provides quantitative measures of the reliability and validity of
the constructs.
 Structural model (SEM) = A set of dependence relationships linking
the hypothesized model’s constructs. SEM determines whether
relationships exist between the constructs – and along with CFA
enables you to accept or reject your theory.

 Developing CFA and SEM models and developing hypotheses:
 Theory
 Prior experience


What is the Difference between EFA and CFA?



 EFA (Exploratory Factor Analysis):


 Use the data to determine the underlying structure.

 CFA (Confirmatory Factor Analysis):


1) Specify the factor structure on the basis of a ‘good’ theory
2) Use CFA to determine whether there is empirical support for
the proposed theoretical factor structure.


CFA
 The major objective in CFA is determining if the relationships between the variables in the hypothesized model resemble the relationships between the variables in the observed data set.
 More formally: the analysis determines the extent to which the proposed covariance matches the observed covariance.
 CFA assesses how well the predicted interrelationships between the variables match the actual or observed interrelationships. If the two matrices (the proposed and the actual) are consistent with one another, then the model can be considered a credible explanation for the hypothesized relationships.
 CFA provides quantitative measures that assess the validity and reliability of the theoretical model


Practice

Recommended Criteria for Fit Indices


Which Fit Measures to Report?


 An often-cited recommendation is that of Jaccard and Wan (1996): report at least three fit tests (one absolute, one relative, and one parsimonious) to reflect diverse criteria.
 More recently, Kline (2005) and Thompson (2004) recommend fit measures without reference to their classification.
 Meyers et al. recommend reporting chi-square, NFI, CFI, and RMSEA. Although chi-square is less informative as an assessment of a single model, it is useful in comparing nested models, where the model with the lower chi-square value is considered preferable.


Model Fit
 Factor Loading:
Some researchers believe that factor loadings must be greater than 0.7; otherwise the items must be excluded from the model and reported as poor indicators of the factor.
Following Garson, we accept factor loadings greater than 0.5.

• How many indicators per factor?


2 is the minimum
3 is safer, especially if factor correlations are weak
4 provides safety
5 or more is more than enough (If too many indicators then combine
indicators into sets)

 Normality Test:
Based on Barbara’s book, -3 < Skewness < 3 and -7 < Kurtosis < 7 are acceptable, and we treat such items as normal. An item that cannot meet these conditions is removed from the model.


Model Fit

 Model Fit:
According to Robert Ho’s book, at least three indices must be met to claim that the model fits.
GFI, CFI … > 0.9 are OK (near 0.9 is acceptable as well).
A p-value > 0.05 in the CMIN (chi-square) table is OK, because here we want to fail to reject the null hypothesis.
Robert Ho, page 285: RMSEA < 0.05 is excellent; 0.05 < RMSEA < 0.08 is good; 0.08 < RMSEA < 0.10 is moderate; and RMSEA > 0.10 is weak.
We should report three satisfied indices and also RMSEA and chi-square (CMIN), even if these two are not satisfied.

 The correlation between latent variables must be less than 0.9; otherwise we combine the two highly correlated latent variables, because they are actually measuring the same thing!
So, following Barbara, we model them under a second-order factor.


Modification Indices


Residuals
 A significant standardized residual is one with an absolute value greater than 4.0. Significant residuals significantly decrease your model fit. Fixing model fit per the residuals matrix is similar to fixing model fit per the modification indices. The same rules apply.


Construct Validity
 If you have convergent validity issues, then your variables do not correlate well with each other within their parent factor; i.e., the latent factor is not well explained by its observed variables.

 If you have discriminant validity issues, then your variables correlate more highly with variables outside their parent factor than with the variables within their parent factor; i.e., the latent factor is better explained by some other variables (from a different factor) than by its own observed variables.


Validity and Reliability


 It is absolutely necessary to establish convergent and discriminant validity, as
well as reliability, when doing a CFA. If your factors do not demonstrate
adequate validity and reliability, moving on to test a causal model will be
useless - garbage in, garbage out!
 There are a few measures that are useful for establishing validity and reliability:
Reliability
 CR > 0.7
Convergent Validity
 CR > AVE
 AVE > 0.5
Discriminant Validity
 MSV < AVE
 ASV < AVE

CR : Composite Reliability
AVE : Average Variance Extracted
MSV : Maximum Shared Squared Variance
ASV : Average Shared Squared Variance
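The slides give thresholds but not formulas. As a hedged sketch, the code below assumes the commonly used definitions based on standardized loadings: AVE as the mean squared loading, CR as the squared sum of loadings over itself plus the summed error variances, and MSV/ASV as the maximum and average squared correlation with the other factors. The loadings, correlations, and function names are invented for illustration.

```python
def ave(loadings):
    """Average variance extracted: mean of the squared standardized loadings."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """(sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings) ** 2
    error = sum(1 - l * l for l in loadings)
    return total / (total + error)


def validity_checks(loadings, correlations):
    """Evaluate the five thresholds listed above for one factor."""
    cr = composite_reliability(loadings)
    extracted = ave(loadings)
    shared = [r * r for r in correlations]   # squared inter-factor correlations
    msv = max(shared)
    asv = sum(shared) / len(shared)
    return {
        "CR > 0.7": cr > 0.7,
        "CR > AVE": cr > extracted,
        "AVE > 0.5": extracted > 0.5,
        "MSV < AVE": msv < extracted,
        "ASV < AVE": asv < extracted,
    }


# Hypothetical factor: three standardized loadings and its correlations
# with two other factors in the model.
loadings = [0.8, 0.7, 0.9]
correlations = [0.5, 0.3]

for rule, passed in validity_checks(loadings, correlations).items():
    print(rule, "->", "met" if passed else "not met")
```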

For more information visit www.SaeedSharif.com


 Andrew Hayes
 Andy Field
 Bahaman Abu Samah
 James Gaskin
 Joseph Hair et al.
 Lawrence S. Meyers et al
 Robert Ho
 Saeed Pahlevan Sharif
