What is it? A general method for modeling $\Sigma = \Sigma(\theta)$, i.e., for modeling the covariance structure of the observed variables. Intuitively, it can be thought of as the combination of confirmatory factor analysis (CFA) with path analysis. CFA and path analysis are each special cases of structural equation modeling.
What is it good for? To account for measurement error when modeling relationships between variables that are measured with error (i.e., latent variables). Because it uses the graphical notation of path analysis, it provides a way of describing the assumed causal relationships between observed variables, between observed and latent variables, and between latent variables. It can also take advantage of multicollinearity in a set of predictors rather than treating it as a hindrance.
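To make $\Sigma = \Sigma(\theta)$ concrete, here is a minimal sketch assuming a CFA with loading matrix Λ, factor covariance Φ and diagonal error covariance Θ, so that Σ(θ) = ΛΦΛ' + Θ. The helper names implied_cov and f_ml and all numbers are hypothetical. The sketch builds the model-implied covariance and evaluates the usual normal-theory maximum-likelihood discrepancy between a sample covariance S and Σ(θ); fitting the model amounts to choosing θ to minimize this discrepancy.

```python
import numpy as np

def implied_cov(loadings, factor_cov, error_var):
    """Model-implied covariance Sigma(theta) = Lambda Phi Lambda' + Theta for a CFA."""
    lam = np.asarray(loadings, dtype=float)               # p x k loading matrix (Lambda)
    phi = np.asarray(factor_cov, dtype=float)             # k x k factor covariance (Phi)
    theta = np.diag(np.asarray(error_var, dtype=float))   # diagonal error covariance (Theta)
    return lam @ phi @ lam.T + theta

def f_ml(S, Sigma):
    """ML discrepancy log|Sigma| + tr(Sigma^-1 S) - log|S| - p; zero exactly when Sigma equals S."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    return logdet_sigma + np.trace(np.linalg.solve(Sigma, S)) - logdet_s - p

# Hypothetical two-factor CFA: x1-x3 measure f1, x4-x6 measure f2.
Lam = [[1.0, 0.0], [0.8, 0.0], [0.7, 0.0],
       [0.0, 1.0], [0.0, 0.9], [0.0, 0.6]]
Phi = [[1.0, 0.5], [0.5, 1.0]]
Theta = [0.5, 0.6, 0.7, 0.4, 0.5, 0.8]

Sigma = implied_cov(Lam, Phi, Theta)
S = Sigma + 0.1 * np.eye(6)      # a "sample" covariance the model cannot reproduce exactly
print(f_ml(Sigma, Sigma))        # essentially zero: perfect fit
print(f_ml(S, Sigma))            # positive: misfit
```

In practice θ would be estimated by minimizing f_ml numerically; the sketch only shows the two ingredients.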
This study examines the growth and development of structural equation modeling (SEM) from the years 1994 to 2001. The synchronous development and growth of the Structural Equation Modeling journal was also examined. Abstracts located on PsycINFO were used as the primary source of data. The major results of this investigation were clear: (a) The number of journal articles concerned with SEM increased; (b) the number of journals publishing these articles increased; (c) SEM acquired hegemony among multivariate techniques; and (d) Structural Equation Modeling became the primary source of publication for technical developments in SEM.
FIGURE 1
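Suppose $x_1 = f_1 + e_1$ and $x_2 = f_2 + e_2$, where the errors are uncorrelated with each other and with the factors. Then

$$\mathrm{Corr}(x_1, x_2) = \frac{\mathrm{Cov}(f_1, f_2)}{\sqrt{(\sigma^2_{f_1} + \sigma^2_{e_1})(\sigma^2_{f_2} + \sigma^2_{e_2})}} = \mathrm{Corr}(f_1, f_2)\,\frac{\sigma_{f_1}\,\sigma_{f_2}}{\sqrt{(\sigma^2_{f_1} + \sigma^2_{e_1})(\sigma^2_{f_2} + \sigma^2_{e_2})}} = \mathrm{Corr}(f_1, f_2)\,\sqrt{\mathrm{rel}(x_1)}\,\sqrt{\mathrm{rel}(x_2)},$$

where $\mathrm{rel}(x_1) = \sigma^2_{f_1} / (\sigma^2_{f_1} + \sigma^2_{e_1})$ is the reliability of $x_1$ and $\mathrm{rel}(x_2) = \sigma^2_{f_2} / (\sigma^2_{f_2} + \sigma^2_{e_2})$ is the reliability of $x_2$.
Since reliabilities are at most 1, the correlation between x1 and x2 will be smaller in magnitude (whenever either reliability is below 1) than the true correlation between the variables we are actually interested in, f1 and f2.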
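A quick simulation can confirm the attenuation formula. This sketch assumes unit-variance true scores with correlation .5 and hypothetical error variances chosen to give reliabilities of .8 and .5, so the observed correlation should land near .5 × √(.8 × .5) ≈ .32.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
true_r = 0.5

# True scores f1, f2 with unit variances and correlation true_r.
f = rng.multivariate_normal([0.0, 0.0], [[1.0, true_r], [true_r, 1.0]], size=n)

# Hypothetical error variances; with unit true-score variance, reliability = 1 / (1 + error variance).
err_var1, err_var2 = 0.25, 1.0           # reliabilities 0.8 and 0.5
x1 = f[:, 0] + rng.normal(0.0, np.sqrt(err_var1), n)
x2 = f[:, 1] + rng.normal(0.0, np.sqrt(err_var2), n)

rel1, rel2 = 1 / (1 + err_var1), 1 / (1 + err_var2)
observed = np.corrcoef(x1, x2)[0, 1]
predicted = true_r * np.sqrt(rel1 * rel2)   # Corr(f1, f2) * sqrt(rel(x1)) * sqrt(rel(x2))
print(f"observed {observed:.3f}, predicted {predicted:.3f}, true {true_r}")
```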
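Implied true correlation for an observed correlation of .5, by reliability of x1 (rows) and reliability of x2 (columns); "-" marks combinations where the corrected value would exceed 1:

reliability    .2    .4    .6    .8    1.0
   .2          -     -     -     -     -
   .4          -     -     -    .88   .79
   .6          -     -    .83   .72   .65
   .8          -    .88   .72   .63   .56
   1.0         -    .79   .65   .56   .50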
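The table can be regenerated from the correction for attenuation, r_obs / √(rel(x1) · rel(x2)). A minimal sketch; the function name disattenuate is our own.

```python
import math

def disattenuate(r_obs, rel_x1, rel_x2):
    """Correct an observed correlation for unreliability: r_obs / sqrt(rel_x1 * rel_x2)."""
    return r_obs / math.sqrt(rel_x1 * rel_x2)

rels = [0.2, 0.4, 0.6, 0.8, 1.0]
print("reliability " + " ".join(f"{r:>5}" for r in rels))
for r1 in rels:
    cells = []
    for r2 in rels:
        est = disattenuate(0.5, r1, r2)
        # A corrected value above 1 is not a valid correlation, so show "-" instead.
        cells.append(f"{est:5.2f}" if est <= 1 else "    -")
    print(f"{r1:>11} " + " ".join(cells))
```

A printed cell may differ from the table in the last digit because of rounding.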
[Figure: CFA path diagram in which seven items (x1, x3, x5, x7, x8, x9, x10), with error terms e1 to e7, load on two factors f1 and f2; the correlation between f1 and f2 is .54.]
Notice that the correction over-adjusted: this estimate is actually larger than the true correlation of .54. This is to be expected, because Cronbach's alpha underestimates reliability when the factor loadings are not equal.
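To see why alpha can understate reliability, compare it with the reliability of the sum score under a one-factor model (often called omega). A sketch with hypothetical loadings and error variances, computed from the population covariance so that no sampling error is involved; the helper names are our own.

```python
import numpy as np

def cronbach_alpha(cov):
    """Cronbach's alpha from an item covariance matrix."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    return k / (k - 1) * (1 - np.trace(cov) / cov.sum())

def omega(loadings, error_var):
    """Reliability of the sum score under a one-factor model with factor variance 1."""
    true_var = np.sum(loadings) ** 2
    return true_var / (true_var + np.sum(error_var))

# Hypothetical congeneric items: unequal loadings.
lam = np.array([0.9, 0.7, 0.4, 0.3])
theta = np.array([0.3, 0.5, 0.8, 0.9])
cov = np.outer(lam, lam) + np.diag(theta)   # population item covariance

print(f"alpha = {cronbach_alpha(cov):.3f}, omega = {omega(lam, theta):.3f}")
```

With equal loadings the two coincide; with the unequal loadings above alpha comes out smaller (about .64 versus .68). A too-small reliability in the denominator of the correction inflates the corrected correlation, which is consistent with the over-adjustment noted above.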
The variance of scale1 is 4.89 and the variance of scale2 is 13.84. Notice that the simple correlation between the scales is .45, which is smaller than the true correlation of .54.
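[Figure: single-indicator models for the two scales, unstandardized and standardized. scale1 loads on f1 and scale2 loads on f2, each with its loading fixed at 1 and error terms e1 and e2; an error variance is fixed at 3.44616. In the standardized solution the correlation between f1 and f2 is .57.]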
Notice that the correlation between f1 and f2 has now been adjusted for the unreliability of the scales.
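The single-indicator approach can be sketched as follows, assuming each scale is the only indicator of its factor, with its loading fixed at 1 and its error variance fixed at (1 - reliability) × observed variance, a common convention. The variances 4.89 and 13.84 and the correlation .45 come from the text; the reliabilities .80 and .75 are hypothetical, so the result need not match the estimate shown in the diagram. The point is that the implied factor correlation equals the classical correction for attenuation.

```python
import math

def single_indicator_corr(var1, var2, cov12, rel1, rel2):
    """Factor correlation when each scale is the sole indicator of its factor
    (loading fixed at 1, error variance fixed at (1 - reliability) * variance)."""
    err1 = (1 - rel1) * var1        # fixed error variance for scale1
    err2 = (1 - rel2) * var2        # fixed error variance for scale2
    factor_var1 = var1 - err1       # implied variance of f1
    factor_var2 = var2 - err2       # implied variance of f2
    return cov12 / math.sqrt(factor_var1 * factor_var2)

var1, var2 = 4.89, 13.84            # observed variances (from the text)
r12 = 0.45                          # observed correlation (from the text)
cov12 = r12 * math.sqrt(var1 * var2)
rel1, rel2 = 0.80, 0.75             # hypothetical reliabilities

corr_f = single_indicator_corr(var1, var2, cov12, rel1, rel2)
print(f"corrected correlation = {corr_f:.3f}")
```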
Instead, use full SEM: incorporate the CFA into the path analysis, thereby accounting for measurement error.
Figure 4. Final model testing among adolescent girls: Correlates of unhealthy weight-control behaviors. BMI = body mass index. *p < .01.
Figure 5. Final model testing among adolescent boys: Correlates of unhealthy weight-control behaviors. BMI = body mass index. *p < .01.
TUTORIAL IN BIOSTATISTICS Comparison of multiple regression to two latent variable techniques for estimation and prediction
Melanie M. Wall (1) and Ruifeng Li (2)
(1) Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, U.S.A.
(2) Department of Epidemiology, School of Public Health, Harvard University, U.S.A.
SUMMARY In the areas of epidemiology, psychology, sociology, and other social and behavioural sciences, researchers often encounter situations where there are not only many variables contributing to a particular phenomenon, but there are also strong relationships among many of the predictor variables of interest. By using the traditional multiple regression on all the predictor variables, it is possible to have problems with interpretation and multicollinearity. As an alternative to multiple regression, we explore the use of a latent variable model that can address the relationship among the predictor variables. We consider two different methods for estimation and prediction for this model: one that uses multiple regression on factor score estimates and the other that uses structural equation modelling. The first method uses multiple regression but on a set of predicted underlying factors (i.e. factor scores), and the second method is a full-information maximum-likelihood technique that incorporates the complete covariance structure of the data. In this tutorial, we will explain the model and each estimation method, including how to carry out prediction. A data example will be used for demonstration, where respiratory disease death rates by county in Minnesota are predicted by five county-level census variables. A simulation study is performed to evaluate the efficiency of prediction using the two latent variable modelling techniques compared to multiple regression. Copyright © 2003 John Wiley & Sons, Ltd.
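The first method, multiple regression on factor score estimates, can be sketched as below. This is a hedged illustration, not the authors' code: it assumes the measurement model (loadings, factor variance, error variances) is already known rather than estimated, uses the regression (Thomson) factor score predictor ΦΛ'Σ⁻¹(x - x̄), which may differ from the predictor used in the tutorial, and simulates hypothetical data with one factor and three indicators. The helper name regression_factor_scores is our own.

```python
import numpy as np

def regression_factor_scores(X, loadings, factor_cov, error_var):
    """Regression-method factor scores Phi Lambda' Sigma^{-1} (x - mean), one row per unit."""
    lam = np.asarray(loadings, dtype=float)
    phi = np.asarray(factor_cov, dtype=float)
    sigma = lam @ phi @ lam.T + np.diag(error_var)   # model-implied covariance
    weights = np.linalg.solve(sigma, lam @ phi)      # Sigma^{-1} Lambda Phi, shape p x k
    return (X - X.mean(axis=0)) @ weights            # n x k factor score estimates

rng = np.random.default_rng(2)
n = 500
lam = np.array([[1.0], [0.8], [0.9]])                # hypothetical loadings, one factor
phi = np.array([[1.0]])                              # factor variance
theta = np.array([0.3, 0.4, 0.2])                    # hypothetical error variances

f = rng.normal(size=(n, 1))                          # latent factor
X = f @ lam.T + rng.normal(size=(n, 3)) * np.sqrt(theta)
y = 2.0 + 0.5 * f[:, 0] + rng.normal(scale=0.5, size=n)

# Step 1: predict the factor; step 2: ordinary least squares of y on the predicted factor.
scores = regression_factor_scores(X, lam, phi, theta)
design = np.column_stack([np.ones(n), scores])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print(beta)   # intercept and slope on the factor score
```

With this particular score predictor the slope is a consistent estimate of the structural slope (0.5 here); the second method in the abstract instead fits the full covariance structure by maximum likelihood.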
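[Figure: multiple regression path diagram (unstandardized estimates). The five county-level predictors educhs, medhhinc, percapin, pubwater and wood each have a direct path to respmort, which has error term e1; variances and covariances among the predictors are also shown.]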
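$$\mathrm{resp} = \beta_0 + \beta_1\,\mathrm{eduhs} + \beta_2\,\mathrm{medhhin} + \beta_3\,\mathrm{percapit} + \beta_4\,\mathrm{pubwater} + \beta_5\,\mathrm{wood} + \varepsilon$$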
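The standard-error inflation caused by correlated predictors is easy to reproduce. A minimal simulation sketch with hypothetical data (not the county data): two predictors that share a common component are fitted by OLS and compared with two uncorrelated predictors. The helper ols is our own.

```python
import numpy as np

def ols(X, y):
    """OLS estimates and standard errors, with an intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)                         # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))   # classical OLS standard errors
    return beta, se

rng = np.random.default_rng(1)
n = 100                                   # hypothetical sample size
z = rng.normal(size=n)                    # shared component that makes the predictors collinear
X_coll = np.column_stack([z + 0.3 * rng.normal(size=n), z + 0.3 * rng.normal(size=n)])
X_ind = rng.normal(size=(n, 2))           # uncorrelated predictors for comparison
noise = rng.normal(size=n)
y_coll = X_coll @ np.array([1.0, 1.0]) + noise
y_ind = X_ind @ np.array([1.0, 1.0]) + noise

_, se_coll = ols(X_coll, y_coll)
_, se_ind = ols(X_ind, y_ind)
print("S.E. of first slope, collinear:  ", se_coll[1])
print("S.E. of first slope, independent:", se_ind[1])
```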
Regression Weights
Path                    Estimate      S.E.        C.R.        P
respmort <-- wood       1.294362   0.452701    2.859198   0.004247
respmort <-- pubwater   0.108254   0.228837    0.473061   0.636170
respmort <-- percapin  -0.017170   0.321008   -0.053488   0.957343
respmort <-- medhhinc   0.072829   0.098331    0.740651   0.458905
respmort <-- educhs     0.528606   0.718153    0.736064   0.461692
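The C.R. column is the estimate divided by its standard error, and the P column matches a two-sided p-value from the standard normal distribution. A small sketch that recomputes both from the Estimate and S.E. values in the regression weights table; the helper name critical_ratio is our own.

```python
import math

def critical_ratio(estimate, se):
    """C.R. = estimate / S.E., with a two-sided p-value from the standard normal."""
    cr = estimate / se
    p = math.erfc(abs(cr) / math.sqrt(2))   # equals 2 * (1 - Phi(|cr|))
    return cr, p

# (Estimate, S.E.) pairs copied from the regression weights table.
weights = {
    "wood": (1.294362, 0.452701),
    "pubwater": (0.108254, 0.228837),
    "percapin": (-0.017170, 0.321008),
    "medhhinc": (0.072829, 0.098331),
    "educhs": (0.528606, 0.718153),
}
for name, (est, se) in weights.items():
    cr, p = critical_ratio(est, se)
    print(f"{name:9s} C.R. = {cr:9.6f}  P = {p:.6f}")
```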
[Figure: path diagram with standardized estimates for the multiple regression of respmort on educhs, medhhinc, percapin, pubwater and wood.]
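Interpretation problem: the predictors are highly correlated with one another (multicollinearity), which inflates the standard errors and makes the individual coefficients hard to interpret.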
Sample Correlations (Estimates)
            educhs      medhhinc
educhs      1.000000    0.841414
medhhinc    0.841414    1.000000
percapin    0.859896    0.925435
pubwater    0.355381    0.433244
wood       -0.304913   -0.425844
respmort    0.157416    0.102968
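Variance inflation factors (VIFs) make the multicollinearity concrete. This sketch uses only the three SES-type predictors whose pairwise correlations appear in the sample correlation table (educhs, medhhinc, percapin); the VIF of each is the corresponding diagonal element of the inverse correlation matrix. Adding pubwater and wood could only raise these values, so they are lower bounds for the full five-predictor model.

```python
import numpy as np

names = ["educhs", "medhhinc", "percapin"]
# Pairwise correlations copied from the sample correlation table.
R = np.array([
    [1.000000, 0.841414, 0.859896],
    [0.841414, 1.000000, 0.925435],
    [0.859896, 0.925435, 1.000000],
])
vif = np.diag(np.linalg.inv(R))   # VIF_j = 1 / (1 - R_j^2) = j-th diagonal element of R^{-1}
for name, v in zip(names, vif):
    print(f"{name:9s} VIF = {v:.2f}")
```

These come out at roughly 4.1, 7.4 and 8.3.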
[Figure: path diagram with unstandardized estimates for the latent variable model of respmort: latent factor ses measured by educhs, medhhinc and percapin, latent factor accessto_util measured by pubwater and wood, and error terms e1 to e6.]
Regression Weights
Path                          Estimate      S.E.        C.R.        P
percapin <-- ses              1.000000    (fixed)
medhhinc <-- ses              2.764592   0.133679   20.680802   0.000000
wood <-- accessto_util       -0.489196   0.043761  -11.178832   0.000000
pubwater <-- accessto_util    1.000000    (fixed)
educhs <-- ses                0.256296   0.016835   15.224053   0.000000
respmort <-- accessto_util   -0.628848   0.139034   -4.522980   0.000006
respmort <-- ses              0.398241   0.124171    3.207206   0.001340
[Figure: path diagram with standardized estimates for the latent variable model of respmort, with latent factor ses and indicators educhs, medhhinc, percapin, pubwater and wood.]
SEM-FIML
Estimate (s.e.):  7.85 (0.16)   0.00004 (0.00001)   0.013 (0.003)   0.21
P-value:          0.0001        0.0010              0.0001
[Figure: path diagram with standardized estimates for a further latent variable model of respmort involving the factor ses, the indicators educhs, medhhinc, percapin, pubwater and wood, and error terms e1 to e6.]
[Table: parameter estimates, standard errors and estimate/S.E. ratios.]
analysis: type = random; algorithm = integration;   ! XWITH requires these settings
model:
  ses by eduhs medhhin percapit;
  ruralness by pubwater wood1;
  intsesru | ses xwith ruralness;   ! intsesru is a newly defined variable: xwith creates the cross-product of latent ses and latent ruralness
  ! then simply include intsesru as another variable in the structural model
[Table: parameter estimates, standard errors and estimate/S.E. ratios.]
Wall, M.M. and Amemiya, Y. (2001). Generalized appended product indicator procedure for nonlinear structural equation analysis. Journal of Educational and Behavioral Statistics, 26, 1-29.
Wall, M.M. and Amemiya, Y. (2003). A method of moments technique for fitting interaction effects in structural equation models. British Journal of Mathematical and Statistical Psychology, 56, 47-64.
Wall, M.M. and Amemiya, Y. (2007). A review of nonlinear factor analysis and nonlinear structural equation modeling. In Factor Analysis at 100: Historical Developments and Future Directions, eds. Robert Cudeck and Robert C. MacCallum, Chapter 16, pp. 337-362. Lawrence Erlbaum Associates.
Wall, M.M. and Amemiya, Y. (2007). Nonlinear structural equation modeling as a statistical method. In Handbook of Latent Variable and Related Models, ed. Sik-Yum Lee, Chapter 15, pp. 321-344. Elsevier, The Netherlands.
Multilevel modeling
Examples of clustered data collection: patients within clinics, students within classrooms, employees within units, and repeated measures within patients (i.e., longitudinal data). In each case the grouping or clustering variable is, respectively, the clinic, the classroom, the unit, and the patient.
From Heck (2001), Multilevel Modeling in SEM, Chapter 4 in New Developments and Techniques in Structural Equation Modeling, eds. Marcoulides and Schumacker, pp. 89-127:
Ignoring the presence of substantial similarities among individuals within groups can result in substantially biased estimates of the model's parameters, standard errors, and fit indexes.
The intraclass correlation $\rho = \sigma^2_B / (\sigma^2_B + \sigma^2_W)$, where $\sigma^2_B$ and $\sigma^2_W$ are the between-group and within-group variances, indicates the proportion of the total variability that can be attributed to variability between the groups. $\rho$ should be zero when the data are independent; its magnitude depends on characteristics of the variable measured and on the attributes of the groups. The larger the intraclass correlation, the larger the distortion in parameter estimation that results from ignoring this similarity. Note that different groups are typically assumed to be independent of one another.
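The intraclass correlation can be estimated from a one-way ANOVA. A minimal sketch, assuming balanced groups and simulated (hypothetical) data with between-group variance 1 and within-group variance 3, so that the true value is 1 / (1 + 3) = .25; the helper name icc_oneway is our own.

```python
import numpy as np

def icc_oneway(y):
    """ANOVA estimator of the intraclass correlation for balanced groups.
    y has shape (number_of_groups, members_per_group)."""
    g, m = y.shape
    group_means = y.mean(axis=1)
    msb = m * np.sum((group_means - y.mean()) ** 2) / (g - 1)        # between-group mean square
    msw = np.sum((y - group_means[:, None]) ** 2) / (g * (m - 1))   # within-group mean square
    return (msb - msw) / (msb + (m - 1) * msw)

rng = np.random.default_rng(3)
groups, size = 2000, 10
between_var, within_var = 1.0, 3.0                               # hypothetical variance components
u = rng.normal(0.0, np.sqrt(between_var), size=(groups, 1))      # effect shared within each group
y = u + rng.normal(0.0, np.sqrt(within_var), size=(groups, size))

rho_hat = icc_oneway(y)
print(f"estimated ICC = {rho_hat:.3f} (true 0.25)")
```

The estimator works because the expected between-group mean square is $\sigma^2_W + m\sigma^2_B$ while the within-group mean square estimates $\sigma^2_W$.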