
Structural Equation Modeling (Hybrid Models)

What is it?
- A general method for modeling $\Sigma = \Sigma(\theta)$, i.e. for modeling covariance structure.
- Intuitively, it can be thought of as the combination of confirmatory factor analysis (CFA) with path analysis. CFA and path analysis are each special cases of structural equation modeling.

What is it good for?
- To account for measurement error when modeling relationships between variables measured with error (i.e. latent variables).
- As it uses the graphical notation of path analysis, it provides a method for describing the assumed causal relationships between observed variables, between observed and latent variables, and between latent variables.
- To take advantage of multicollinearity in a set of predictors rather than seeing it as a hindrance.

SEM - becoming ubiquitous


STRUCTURAL EQUATION MODELING, 10(1), 35-46. Copyright 2003, Lawrence Erlbaum Associates, Inc.

The Growth of Structural Equation Modeling: 1994-2001


Scott L. Hershberger
Department of Psychology California State University, Long Beach

This study examines the growth and development of structural equation modeling (SEM) from the years 1994 to 2001. The synchronous development and growth of the Structural Equation Modeling journal was also examined. Abstracts located on PsycINFO were used as the primary source of data. The major results of this investigation were clear: (a) The number of journal articles concerned with SEM increased; (b) the number of journals publishing these articles increased; (c) SEM acquired hegemony among multivariate techniques; and (d) Structural Equation Modeling became the primary source of publication for technical developments in SEM.

SEM - becoming ubiquitous

FIGURE 1

Distribution of number of articles and journals by year.

SEM - becoming ubiquitous


Over 1100 Selected Publications that Cite Amos for Structural Equation Modeling March, 2004, http://www.amosdevelopment.com/

Effect of ignoring measurement error


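Let $x_1 = f_1 + \epsilon_1$ and $x_2 = f_2 + \epsilon_2$, where $\mathrm{Var}(\epsilon_1) = \theta_1$, $\mathrm{Var}(\epsilon_2) = \theta_2$, $\mathrm{Var}(f_1) = \phi_1$, $\mathrm{Var}(f_2) = \phi_2$, and $\mathrm{Corr}(f_1, f_2) = \rho$. That is, the true correlation between the variables of interest $f_1$ and $f_2$ is $\rho$. The errors are assumed to be uncorrelated with each other and with $f_1$ and $f_2$.

As we did not observe $f_1$ and $f_2$ directly, we instead have to deal with the observed $x_1$ and $x_2$. What is $\mathrm{Corr}(x_1, x_2)$? Is it close to $\mathrm{Corr}(f_1, f_2) = \rho$?

$$
\begin{aligned}
\mathrm{Corr}(x_1, x_2)
&= \frac{\mathrm{Cov}(x_1, x_2)}{\sqrt{\mathrm{Var}(x_1)\,\mathrm{Var}(x_2)}} \\
&= \frac{\mathrm{Cov}(f_1, f_2)}{\sqrt{\phi_1 + \theta_1}\,\sqrt{\phi_2 + \theta_2}} \\
&= \mathrm{Corr}(f_1, f_2)\,\frac{\sqrt{\phi_1}\,\sqrt{\phi_2}}{\sqrt{\phi_1 + \theta_1}\,\sqrt{\phi_2 + \theta_2}} \\
&= \rho\,\sqrt{\frac{\phi_1}{\phi_1 + \theta_1}}\,\sqrt{\frac{\phi_2}{\phi_2 + \theta_2}} \\
&= \rho\,\sqrt{\text{reliability of } x_1}\,\sqrt{\text{reliability of } x_2}
\end{aligned}
$$

The correlation between $x_1$ and $x_2$ will be smaller than the true correlation between the variables we are interested in, $f_1$ and $f_2$.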
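The attenuation result can be checked numerically. The sketch below is only an illustration: the function name `attenuated_corr`, its argument names (`phi` for true-score variances, `theta` for error variances) and the input values are assumptions chosen for the example, not part of the original slides.

```python
import math


def attenuated_corr(rho, phi1, theta1, phi2, theta2):
    """Observed correlation implied by a true correlation rho.

    phi1, phi2     : variances of the latent variables f1, f2
    theta1, theta2 : variances of the measurement errors
    """
    reliability1 = phi1 / (phi1 + theta1)
    reliability2 = phi2 / (phi2 + theta2)
    return rho * math.sqrt(reliability1) * math.sqrt(reliability2)


# Hypothetical values: true correlation .6, both measures have reliability .8
observed = attenuated_corr(rho=0.6, phi1=1.0, theta1=0.25, phi2=1.0, theta2=0.25)
print(round(observed, 3))
```

With both reliabilities equal to .8, the observed correlation is the true correlation multiplied by .8, so measurement error alone shrinks .6 to .48.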

SEM takes the measurement error into account


Rather than taking scales with less than perfect reliability and using them as if they were perfect measurements of the latent variable, SEM models incorporate the measurement error and thus adjust the correlations and path coefficients appropriately. This assumes the model specification is correct (as usual).

Two nice papers discussing this:
- Charles EP (2005) The Correction for Attenuation Due to Measurement Error: Clarifying Concepts and Creating Confidence Sets, Psychological Methods, 10(2), 206-226.
- DeShon, R. P. (1998). A cautionary note on measurement error corrections in structural equation models. Psychological Methods, 3(4), 412-423.

Examples of correction for attenuation


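Each entry is the corrected correlation, i.e. the observed correlation divided by sqrt(reliability of x1 * reliability of x2). Rows and columns give the two reliabilities; "-" marks combinations where the corrected value would exceed 1.

Observed correlation of .3

          .2    .4    .6    .8   1.0
   .2      -     -   .87   .75   .67
   .4      -   .75   .61   .53   .47
   .6    .87   .61   .50   .43   .39
   .8    .75   .53   .43   .38   .33
  1.0    .67   .47   .39   .33   .30

Observed correlation of .5

          .2    .4    .6    .8   1.0
   .2      -     -     -     -     -
   .4      -     -     -   .88   .79
   .6      -     -   .83   .72   .65
   .8      -   .88   .72   .63   .56
  1.0      -   .79   .65   .56   .50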
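Tables of this kind can be regenerated from the correction formula. The following is a minimal sketch; the function names and the choice to return `None` for values above 1 are assumptions made for illustration.

```python
import math

RELIABILITIES = (0.2, 0.4, 0.6, 0.8, 1.0)


def corrected_corr(observed, rel1, rel2):
    """Observed correlation divided by sqrt(rel1 * rel2); None if above 1."""
    value = observed / math.sqrt(rel1 * rel2)
    return value if value <= 1.0 else None


def attenuation_table(observed, reliabilities=RELIABILITIES):
    """Rows index the reliability of x1, columns the reliability of x2."""
    return [
        [corrected_corr(observed, r1, r2) for r2 in reliabilities]
        for r1 in reliabilities
    ]


table_05 = attenuation_table(0.5)
for rel, row in zip(RELIABILITIES, table_05):
    cells = ["   -" if v is None else f"{v:4.2f}" for v in row]
    print(f"{rel:3.1f}  " + "  ".join(cells))
```

Cells come out as `None` (printed as "-") whenever the two reliabilities are too low to be compatible with the observed correlation.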

A useful example from CFA


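[Figure: path diagram of a two-factor CFA. Factor f1 has indicators x3, x5, and x10 (errors e1-e3); factor f2 has indicators x1, x7, x8, and x9 (errors e4-e7). Standardized estimates shown in the diagram: .93, .62, .85, .92, .58, .58, .57, and a correlation of .54 between f1 and f2.]

Chi-square = 9.8, d.f. = 13, p-value = .704, Corr(f1, f2) = .54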

A useful example from CFA


A natural/practical thing to do with these 7 variables is to create two scales: one from X3, X5, and X10, that is Scale1 = X3 + X5 + X10, and one from X1, X7, X8, and X9, that is Scale2 = X1 + X7 + X8 + X9.

The observed correlation between Scale1 and Scale2 is .45. This is smaller than the correlation found between the factors using CFA (i.e. SEM).

Note that Cronbach's alpha is 0.827 for Scale 1 and 0.751 for Scale 2. We might consider fixing up the correlation between the scales using their estimated reliabilities. That is, rearranging the derivation from the earlier slide on the effect of ignoring measurement error (see also page 197 of Kline), we have

$$
\mathrm{Corr}(f_1, f_2) = \frac{\mathrm{Corr}(x_1, x_2)}{\sqrt{\text{reliability of } x_1}\,\sqrt{\text{reliability of } x_2}}
$$

so we can calculate

$$
\frac{.45}{\sqrt{.827}\,\sqrt{.751}} = .571
$$

Notice that this overadjusts: the estimate is larger than the correlation of .54 found using CFA. This may be expected, since Cronbach's alpha underestimates reliability when factor loadings are not equal.

Using a single indicator of a latent factor to adjust for unreliability


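[Figure: two path diagrams of scale1 and scale2 joined by a double-headed arrow. Unstandardized estimates: variance of scale1 = 4.89, variance of scale2 = 13.84, covariance = 3.68. Standardized estimate: correlation = .45.]

The variance of scale1 is 4.89 and the variance of scale2 is 13.84. Notice that the simple correlation between the scales is .45, which is smaller than .54.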

Using a single indicator of a latent factor to adjust for unreliability


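To adjust for the unreliability, fix the variance of each error term to be equal to the variance of the scale times (1 - reliability). Here Cronbach's alpha is .827 for scale 1 and .751 for scale 2, so the fixed error variances are:

Var(e1) = 4.89*(1-.827) = .84597
Var(e2) = 13.84*(1-.751) = 3.44616

[Figure: path diagrams in which scale1 is the single indicator of latent f1 and scale2 is the single indicator of latent f2. The loadings and error paths are fixed at 1 and the error variances of e1 and e2 are fixed at the values above. In the standardized solution, .42 and .50 appear at e1 and e2, and the correlation between f1 and f2 is .57.]

Notice that the correlation has now been adjusted for the unreliability.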
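The single-indicator adjustment can be reproduced by hand from the numbers on these slides. The sketch below assumes, as in the diagram, that both loadings are fixed at 1, so that Var(scale) = Var(f) + Var(e) and Cov(scale1, scale2) = Cov(f1, f2). The variable and function names are illustrative only.

```python
import math


def fixed_error_variance(scale_variance, reliability):
    """Error variance to fix in the model: variance * (1 - reliability)."""
    return scale_variance * (1.0 - reliability)


# Values reported on the slides
var_scale1, var_scale2 = 4.89, 13.84
cov_scales = 3.68
alpha1, alpha2 = 0.827, 0.751

err1 = fixed_error_variance(var_scale1, alpha1)   # about .84597
err2 = fixed_error_variance(var_scale2, alpha2)   # about 3.44616

# With loadings fixed at 1, the latent variance is what remains
latent_var1 = var_scale1 - err1
latent_var2 = var_scale2 - err2

latent_corr = cov_scales / math.sqrt(latent_var1 * latent_var2)
print(round(err1, 5), round(err2, 5), round(latent_corr, 2))
```

The implied latent correlation agrees with the .57 shown in the standardized diagram, which is the same correction as dividing the observed correlation by the square roots of the two reliabilities.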

Example of Structural equation modeling


From Neumark-Sztainer D, Wall MM, Story M, Perry C (2003) Correlates of unhealthy weight-control behaviors among adolescents: Implications for prevention programs, Health Psychology, 22(1), 88-98.

Figure 1. Proposed model: Correlates of unhealthy weight-control behaviors among adolescents.


Example of SEM - Measuring the latent variables


Table 2. Results From Confirmatory Factor Analysis Including Standardized Factor Loadings and Correlation Between the Factors

Model and factors

Personal measurement model
  1. Weight-body concerns: Weight concerns; Weight importance; Body dissatisfaction
  2. Psychological well-being: Self-esteem; Depressive mood
  3. Health-nutrition attitudes: Concern about health; Perceived benefits of healthy eating

Socioenvironmental measurement model
  4. Family-peer weight norms: Parental concerns-behaviors; Peer dieting
  5. Weight teasing: Frequency of teasing; Source of teasing
  6. Family connectedness: Family communication; Atmosphere at family meals

Factor loading and correlation between factors (values in extraction order; column alignment lost): 1 .68 .54 .75 .73 .91 .56 .05 .77 .34 4 .71 .39 .26 .83 .72 .06 .85 .55 .21 .21 5 .26 6 .06 .22 .22 2 .73 3 .05

Could create scales and do Path Analysis - Ignoring measurement error


Instead, use full SEM - Incorporate CFA into the Path Analysis - thus accounting for measurement error


Final results of SEM

Figure 4. Final model testing among adolescent girls: Correlates of unhealthy weight-control behaviors. BMI = body mass index. * p < .01.

Figure 5. Final model testing among adolescent boys: Correlates of unhealthy weight-control behaviors. BMI = body mass index. * p < .01.

Common to use 2-step approach to SEM


1. Develop the measurement model (CFA) relating observed variables to latent variables. Examine the goodness of fit of this model on its own. Examine correlations between all variables of interest (usually latent variables) by looking at the correlations between factors from the CFA.
2. Develop the full structural equation model. That is, change the spuriously correlated relationships in the CFA to impose theoretical causal direct effects between variables, and drop relationships not assumed by theory. Examine the goodness of fit of this model as a whole.

A common reference advocating this approach is Anderson, J.C. and Gerbing, D.W. (1988), Psychological Bulletin.

Comparing Multiple Regression and SEM


STATISTICS IN MEDICINE, Statist. Med. 2003; 22:3671-3685 (DOI: 10.1002/sim.1588)

TUTORIAL IN BIOSTATISTICS
Comparison of multiple regression to two latent variable techniques for estimation and prediction

Melanie M. Wall (1) and Ruifeng Li (2)
(1) Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN 55455, U.S.A.
(2) Department of Epidemiology, School of Public Health, Harvard University, U.S.A.

SUMMARY
In the areas of epidemiology, psychology, sociology, and other social and behavioural sciences, researchers often encounter situations where there are not only many variables contributing to a particular phenomenon, but there are also strong relationships among many of the predictor variables of interest. By using the traditional multiple regression on all the predictor variables, it is possible to have problems with interpretation and multicollinearity. As an alternative to multiple regression, we explore the use of a latent variable model that can address the relationship among the predictor variables. We consider two different methods for estimation and prediction for this model: one that uses multiple regression on factor score estimates and the other that uses structural equation modelling. The first method uses multiple regression but on a set of predicted underlying factors (i.e. factor scores), and the second method is a full-information maximum-likelihood technique that incorporates the complete covariance structure of the data. In this tutorial, we will explain the model and each estimation method, including how to carry out prediction. A data example will be used for demonstration, where respiratory disease death rates by county in Minnesota are predicted by five county-level census variables. A simulation study is performed to evaluate the efficiency of prediction using the two latent variable modelling techniques compared to multiple regression. Copyright © 2003 John Wiley & Sons, Ltd.

KEY WORDS: multiple regression; factor analysis; structural equation modelling; respiratory disease

Data Source - MN county example


- Minnesota county-level census death record data from 1990 to 1998
- Outcome: log of age-adjusted respiratory disease death rate
- Observed predictors: five census variables on the county level
- Goal: establish the relation of predictors with outcome for interpretation and prediction

FIVE PREDICTORS (all on the county-level)- MN county example


- eduhs: percent with high school education
- medhhin: median household income (in dollars)
- percapit: per capita income (in dollars)
- pubwater: percent of households with access to public water
- wood: percent of households using wood to heat the home

Multiple Regression -MN county example


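[Figure: AMOS path diagram of the multiple regression with unstandardized estimates. The five predictors (educhs, medhhinc, percapin, pubwater, wood) are joined to one another by covariance arrows, and each points to respmort, which has error term e1.]

$$
\mathrm{resp} = \beta_0 + \beta_1\,\mathrm{eduhs} + \beta_2\,\mathrm{medhhin} + \beta_3\,\mathrm{percapit} + \beta_4\,\mathrm{pubwater} + \beta_5\,\mathrm{wood} + \epsilon
$$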

Tool in AMOS to draw many covariance arrows


1. Click on each variable in the set of variables that will be correlated with each other.
2. Click on Tools, then Macros, then Draw Covariance.
3. This will draw all the desired double-headed arrows.

Multiple Regression-MN county example


Examining unstandardized estimates. Compared to the numbers in the original paper, the coefficients are scaled up by 10 (for the percents) and by 1000 (for the dollar amounts), because the units of the raw data are scaled down.

Regression Weights

                          Estimate     S.E.        C.R.        P
respmort <-- wood         1.294362     0.452701    2.859198    0.004247
respmort <-- pubwater     0.108254     0.228837    0.473061    0.636170
respmort <-- percapin    -0.017170     0.321008   -0.053488    0.957343
respmort <-- medhhinc     0.072829     0.098331    0.740651    0.458905
respmort <-- educhs       0.528606     0.718153    0.736064    0.461692

Only WOOD is significant.
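The C.R. (critical ratio) and P columns can be recomputed from the estimates and standard errors. The sketch below assumes that C.R. is the estimate divided by its standard error and that P is the two-sided p-value from a standard normal reference distribution; the numbers are those in the regression weights table, while the function names are illustrative.

```python
import math


def critical_ratio(estimate, std_error):
    """Estimate divided by its standard error."""
    return estimate / std_error


def two_sided_p(z):
    """Two-sided p-value under a standard normal reference distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# (estimate, standard error) for each predictor of respmort
weights = {
    "wood": (1.294362, 0.452701),
    "pubwater": (0.108254, 0.228837),
    "percapin": (-0.017170, 0.321008),
    "medhhinc": (0.072829, 0.098331),
    "educhs": (0.528606, 0.718153),
}

results = {}
for name, (estimate, std_error) in weights.items():
    cr = critical_ratio(estimate, std_error)
    results[name] = (cr, two_sided_p(cr))

significant = [name for name, (_, p) in results.items() if p < 0.05]
print(significant)
```

Under these assumptions only wood clears the .05 threshold, in line with the conclusion drawn from the table.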

Multiple Regression-MN county example


[Figure: AMOS path diagram of the multiple regression with standardized estimates. The double-headed arrows among educhs, medhhinc, percapin, pubwater, and wood show the correlations among the predictors.]

Interpretation problem: the predictors are highly correlated (multicollinearity).

Multiple Regression-MN county example

Sample Correlations - Estimates

            educhs      medhhinc    percapin    pubwater    wood        respmort
educhs      1.000000    0.841414    0.859896    0.355381   -0.304913    0.157416
medhhinc    0.841414    1.000000    0.925435    0.433244   -0.425844    0.102968
percapin    0.859896    0.925435    1.000000    0.534107   -0.469779    0.074351
pubwater    0.355381    0.433244    0.534107    1.000000   -0.874654   -0.276009
wood       -0.304913   -0.425844   -0.469779   -0.874654    1.000000    0.369008
respmort    0.157416    0.102968    0.074351   -0.276009    0.369008    1.000000
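One common way to quantify the multicollinearity visible in these correlations is the variance inflation factor (VIF), which is not used in the slides themselves and is introduced here only as an illustration. The sketch assumes the standard result that the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictor correlation matrix; the small Gauss-Jordan routine is a plain-Python helper written for this example.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


names = ["educhs", "medhhinc", "percapin", "pubwater", "wood"]
# Correlations among the five predictors, from the sample correlation table
R = [
    [1.000000, 0.841414, 0.859896, 0.355381, -0.304913],
    [0.841414, 1.000000, 0.925435, 0.433244, -0.425844],
    [0.859896, 0.925435, 1.000000, 0.534107, -0.469779],
    [0.355381, 0.433244, 0.534107, 1.000000, -0.874654],
    [-0.304913, -0.425844, -0.469779, -0.874654, 1.000000],
]

inverse = invert(R)
vif = {name: inverse[i][i] for i, name in enumerate(names)}
for name in names:
    print(f"{name:10s} {vif[name]:6.2f}")
```

A VIF of 1 means a predictor is uncorrelated with the others; the income variables and the two utility variables are well above that, which is the interpretation problem the slide points to.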

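Consider latent variables as explanation for correlation - MN county example


Hence we consider the following latent variable model that takes into account the existence of the two latent factors ruralness and SES:

$$
\begin{aligned}
\mathrm{eduhs} &= \lambda_{10} + \lambda_{11}\,\mathrm{SES} + u_1 \\
\mathrm{medhhin} &= \lambda_{20} + \lambda_{21}\,\mathrm{SES} + u_2 \\
\mathrm{percapit} &= \mathrm{SES} + u_3 \\
\mathrm{pubwater} &= \lambda_{30} + \lambda_{32}\,\mathrm{ruralness} + u_4 \\
\mathrm{wood} &= \mathrm{ruralness} + u_5 \\
\mathrm{RESP} &= \beta_0 + \beta_1\,\mathrm{SES} + \beta_2\,\mathrm{ruralness} + \epsilon
\end{aligned}
$$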

Structural equation model-MN county example

[Figure: AMOS path diagram with unstandardized estimates. Latent ses is measured by educhs, medhhinc, and percapin; latent "access to utilities" is measured by pubwater and wood; both latent variables predict respmort. Error terms e1-e6 are attached to the observed variables.]

Chi-square = 21.1, d.f. = 7, ratio = 3.0

Structural equation model-MN county example

Regression Weights

                                 Estimate     S.E.        C.R.         P
percapin <-- ses                 1.000000
medhhinc <-- ses                 2.764592     0.133679    20.680802    0.000000
wood     <-- accessto_util      -0.489196     0.043761   -11.178832    0.000000
pubwater <-- accessto_util       1.000000
educhs   <-- ses                 0.256296     0.016835    15.224053    0.000000
respmort <-- accessto_util      -0.628848     0.139034    -4.522980    0.000006
respmort <-- ses                 0.398241     0.124171     3.207206    0.001340

Structural equation model-MN county example


[Figure: the same AMOS path diagram with standardized estimates. Standardized paths to respmort are .37 from ses and -.54 from "access to utilities"; the squared multiple correlation for respmort is .23.]

Results from paper where 2nd latent variable coded differently


Table IV. Estimated coefficients incorporating the latent variable model.

                     Regression on factor scores        SEM-FIML
Parameter            Estimate (s.e.)       P-value      Estimate (s.e.)       P-value
β0: intercept        7.68 (0.14)           0.0001       7.85 (0.16)           0.0001
β1: SES              0.00003 (0.00001)     0.0161       0.00004 (0.00001)     0.0010
β2: ruralness        0.010 (0.003)         0.0006       0.013 (0.003)         0.0001
R2                   0.14                               0.21

Explanation of how SEM might predict better than Multiple regression


How is it possible for the SEM-FIML technique to beat ordinary least squares in terms of prediction? It is well known that $E(Y|X)$ is the best mean square predictor of $Y$. Although in general the form of $E(Y|X)$ is unknown, when $(Y, X)$ is jointly normal, with $E(Y, X) = (\mu_Y, \mu_X)$, $\mathrm{Var}(X) = \Sigma_{XX}$ and $\mathrm{Cov}(Y, X) = \Sigma_{YX}$, then

$$
E(Y|X) = \mu_Y + \Sigma_{YX}\,\Sigma_{XX}^{-1}\,(X - \mu_X).
$$

The best predictor given a particular data set is then equal to $E(Y|X)$, with the maximum-likelihood estimates plugged in for $\mu_Y$, $\mu_X$, $\Sigma_{YX}$ and $\Sigma_{XX}$. When nothing is assumed about the $p(p+1)/2$ unique elements of the symmetric matrix $\Sigma_{XX}$, the maximum-likelihood estimator for $\Sigma_{XX}$ is simply the sample covariance matrix of $X$, i.e. every element is estimated independently, and $E(Y|X)$ with maximum-likelihood estimators plugged in yields the OLS predictor.

However, if we have some model for the elements of $\Sigma_{XX}$, as is the case in the latent variable model where $\Sigma_{XX}$ is a function of fewer parameters than $p(p+1)/2$, and these parameters also appear in $\Sigma_{YX}$, then maximum-likelihood estimators based on the modelled $\Sigma_{XX}$ and $\Sigma_{YX}$ plugged into $E(Y|X)$ such as (11) should be best with respect to mean squared prediction error. Simply put, if the SEM model $\Sigma(\theta)$ is a good model for $\Sigma$, then $\Sigma_{YX}(\hat\theta)\,\Sigma_{XX}^{-1}(\hat\theta)$ is more efficient than the OLS estimator $S_{xy} S_{xx}^{-1}$ for estimating $\Sigma_{YX}\,\Sigma_{XX}^{-1}$.

Furthermore, we point out that like the ordinary least-squares regression predictor, the predictor using factor score estimates (9) is also a linear predictor (i.e. a linear function of $Y$). On the other hand, the SEM-FIML predictor (11) is not linear, since the parameter estimators are non-linear functions of both the $Y$ and $X$ variables. This may help to further explain how it performs more efficiently than the other methods.
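The statement that plugging unrestricted sample moments into E(Y|X) gives the OLS predictor can be illustrated with a small sketch. It covers only the unrestricted case with two predictors, not the SEM-FIML estimator; the data are made up so that y = 1 + 2*x1 - x2 holds exactly, which means the OLS prediction at any point is known in advance. All names are illustrative.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def conditional_mean_predictor(x1, x2, y):
    """Build mu_Y + Sigma_YX Sigma_XX^{-1} (X - mu_X) from sample moments."""
    s11, s12, s22 = cov(x1, x1), cov(x1, x2), cov(x2, x2)
    sy1, sy2 = cov(y, x1), cov(y, x2)
    det = s11 * s22 - s12 * s12
    # Sigma_YX times the inverse of the 2 x 2 matrix Sigma_XX
    b1 = (sy1 * s22 - sy2 * s12) / det
    b2 = (sy2 * s11 - sy1 * s12) / det
    m1, m2, my = mean(x1), mean(x2), mean(y)

    def predict(a, b):
        return my + b1 * (a - m1) + b2 * (b - m2)

    return predict


# Hypothetical data with an exact linear relation y = 1 + 2*x1 - x2
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [1.0 + 2.0 * a - b for a, b in zip(x1, x2)]

predict = conditional_mean_predictor(x1, x2, y)
print(round(predict(3.0, 4.0), 6))
```

Because the relation is exact, least squares recovers it exactly, and the plug-in conditional mean returns the same value, 1 + 2*3 - 4 = 3, at the new point. A model-based estimate would replace the sample covariances with the covariances implied by the fitted latent variable model.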

Structural equation model - misspecified measurement model


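[Figure: AMOS path diagram with standardized estimates for the structural equation model with a misspecified measurement model, involving the latent variables ses and "access to utilities", the observed variables educhs, medhhinc, percapin, pubwater, wood, and respmort, and error terms e1-e6.]

Chi-square = 125.2, d.f. = 7, ratio = 17.9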

Code for SEM in Mplus


Here is code for fitting the MN county data SEM in Mplus:

    data:     file is mncountycensus.txt;
    variable: names are eduhs medhhin percapit pubwater wood resp;
              ! the newly defined variable wood1 must come last
              usevariables are eduhs medhhin percapit pubwater resp wood1;
    define:   wood1 = 1-wood;
    analysis: type = general;
    model:    ! measurement part: indicators of the latent variables
              ses by eduhs medhhin percapit;
              ruralness by pubwater wood1;
              ! structural part
              resp on ses ruralness;
    output:   standardized sampstat;

As before, the by command is used to describe the indicators of new latent variables (in this case ses and ruralness). The on command is used to create the path analysis (structural) part of the model, in this case resp on ses ruralness. Note that either observed or latent variables can be included in an on command.

Note that in this code the variable wood has been recoded (and renamed wood1) so that it represents the percent of households that do NOT use wood to heat their home. There were optimization problems in Mplus when this variable was coded in the other direction. This results in a change in sign for the loading of wood compared to the previous results.

The Define: command is used in Mplus to create new variables. It is necessary to put the new variable name on the usevariables are command, and it is necessary that this new variable name comes at the end of the list.

Results for SEM in Mplus


THE MODEL ESTIMATION TERMINATED NORMALLY

MODEL RESULTS

                     Estimates     S.E.    Est./S.E.      Std     StdYX
SES BY
  EDUHS                  1.000    0.000       0.000     0.050     0.879
  MEDHHIN               10.787    0.759      14.217     0.541     0.945
  PERCAPIT               3.902    0.255      15.312     0.196     0.980
RURALNESS BY
  PUBWATER               1.000    0.000       0.000     0.181     0.923
  WOOD1                  0.489    0.044      11.244     0.089     0.947
RESP ON
  SES                    1.554    0.488       3.182     0.078     0.373
  UTILITY               -0.629    0.138      -4.549    -0.114    -0.545
RURALNESS WITH
  SES                    0.005    0.001       3.905     0.513     0.513
Variances
  SES                    0.003    0.000       5.192     1.000     1.000
  UTILITY                0.033    0.006       5.288     1.000     1.000
Residual Variances
  EDUHS                  0.001    0.000       5.822     0.001     0.228
  MEDHHIN                0.035    0.009       4.083     0.035     0.107
  PERCAPIT               0.002    0.001       1.728     0.002     0.039
  PUBWATER               0.006    0.002       2.324     0.006     0.147
  RESP                   0.034    0.005       6.388     0.034     0.773
  WOOD1                  0.001    0.001       1.600     0.001     0.103

R-SQUARE

Observed Variable    R-square
  EDUHS                  .772
  MEDHHIN                .893
  PERCAPIT               .961
  PUBWATER               .853
  RESP                   .227
  WOOD1                  .897
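One way to read the StdYX loadings is through the correlations they imply among the indicators: for two indicators of the same factor, the model-implied correlation is the product of their standardized loadings. The sketch below compares these with the sample correlations from the earlier AMOS table, assuming that eduhs, medhhin, and percapit in the Mplus run are the same variables as educhs, medhhinc, and percapin there. The function name is illustrative.

```python
def implied_corr(loading_a, loading_b, factor_corr=1.0):
    """Model-implied correlation between two indicators.

    factor_corr is 1.0 when both indicators load on the same factor.
    """
    return loading_a * loading_b * factor_corr


# StdYX loadings on SES from the Mplus output
stdyx = {"eduhs": 0.879, "medhhin": 0.945, "percapit": 0.980}

# Sample correlations among the same three variables
sample = {
    ("eduhs", "medhhin"): 0.841414,
    ("eduhs", "percapit"): 0.859896,
    ("medhhin", "percapit"): 0.925435,
}

gaps = {}
for (a, b), r in sample.items():
    gaps[(a, b)] = implied_corr(stdyx[a], stdyx[b]) - r
    print(f"{a:8s} {b:8s} implied - sample = {gaps[(a, b)]:+.3f}")
```

The implied and sample correlations among the SES indicators differ only in the second decimal place, which is what a well-fitting measurement model for SES would look like.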

Examining Moderator (Interaction) effects


There are basically two general methods for examining moderator effects:

1. Stratify the data into different levels of the moderator and then examine the relationship between the predictor and the outcome in each of the strata. If the relationship between the predictor and outcome differs across the strata, then there is a moderator effect; if the relationships are not significantly different, then there is no moderator effect.
2. Create a new variable which is the cross-product between the predictor and the moderator. Include this interaction term directly in the path model.

If the moderator and predictor variable are both observed, then method 1 or 2 is straightforward to implement. For Method 1, if the moderator is continuous, some decision is necessary about how to stratify the moderator (maybe split at the median, or else create several equally spaced cut-offs). For Method 2, a cross-product is formed and included (note: if one of the variables is categorical, then separate cross-products with dummy variables representing the different groups are necessary). The Define: command can be used to create new cross-product variables.

If the moderator is observed and the predictor is latent, method 1 can be implemented in AMOS and other basic SEM software (LISREL, Proc CALIS). Method 2 can be implemented in Mplus 4 and beyond using the special xwith command.

If the moderator is latent, or both the moderator and predictor are latent, then method 1 cannot actually be done, since it is not possible to stratify the data on a latent variable. Method 2 can be implemented in Mplus 4 and beyond using the special xwith command.
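For the case where both predictor and moderator are observed, the two methods can be sketched as follows. The data are made up so that the outcome is 1 + predictor * moderator, and the helper names (`ols_slope`, `slopes_by_median_split`, `cross_product`) are assumptions for this illustration rather than functions from any SEM package.

```python
from statistics import median


def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slopes_by_median_split(predictor, moderator, outcome):
    """Method 1: stratify at the median of the moderator, fit each stratum."""
    cut = median(moderator)
    rows = list(zip(predictor, moderator, outcome))
    low = [(p, o) for p, m, o in rows if m <= cut]
    high = [(p, o) for p, m, o in rows if m > cut]
    return ols_slope(*zip(*low)), ols_slope(*zip(*high))


def cross_product(predictor, moderator):
    """Method 2: interaction term to include alongside both main effects."""
    return [p * m for p, m in zip(predictor, moderator)]


# Hypothetical balanced data: every predictor value at every moderator value
moderator = [m for m in range(4) for _ in range(3)]
predictor = [0, 1, 2] * 4
outcome = [1 + p * m for p, m in zip(predictor, moderator)]

low_slope, high_slope = slopes_by_median_split(predictor, moderator, outcome)
interaction = cross_product(predictor, moderator)
print(low_slope, high_slope)
```

The slope of the outcome on the predictor is 0.5 in the low-moderator stratum and 2.5 in the high-moderator stratum, so the relationship differs across strata, which is the signature of a moderator effect. In practice the difference would also need a significance test.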

Examining Moderator (Interaction) effects


NOTE: When considering a moderator of the relationship between a predictor and an outcome, it is the case that the predictor also moderates the relationship between the moderator and the outcome. The two variables moderate each other's relationships with the outcome.

For more on conceptualizing moderators (and mediators), see e.g. Petrosino (2000) Mediators and moderators in the evaluation of programs for children: Current Practice and Agenda for Improvement. Evaluation Review, 24(1), 47-72.

Interaction between SES and ruralness?


    data:     file is mncountycensus.txt;
    variable: names are eduhs medhhin percapit pubwater wood resp;
              usevariables are eduhs medhhin percapit pubwater resp wood1;
    define:   wood1 = 1-wood;
    analysis: type = random;
              algorithm = integration;
              ! Notice the type is "random" and we have to specify that
              ! the algorithm is integration. Mplus is doing full maximum
              ! likelihood using the EM algorithm; it is NOT analyzing
              ! the sample covariance matrix as in linear SEM.
    model:    ses by eduhs medhhin percapit;
              ruralness by pubwater wood1;
              ! intsesru is a newly defined variable, created with the
              ! xwith command as the cross-product of latent ses and
              ! latent ruralness
              intsesru | ses xwith ruralness;
              ! simply include intsesru as another variable in the
              ! structural model
              resp on ses ruralness intsesru;
    output:   standardized;

Interaction between SES and ruralness?


*** WARNING in Output command
  STANDARDIZED option is not available for analysis with TYPE = RANDOM.
  Request for STANDARDIZED is ignored.
2 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

Note that Mplus will not produce standardized results for this analysis.

THE MODEL ESTIMATION TERMINATED NORMALLY

MODEL RESULTS

                     Estimates     S.E.    Est./S.E.
SES BY
  EDUHS                  1.000    0.000       0.000
  MEDHHIN               10.780    0.720      14.975
  PERCAPIT               3.908    0.237      16.516
RURALNES BY
  PUBWATER               1.000    0.000       0.000
  WOOD1                  0.477    0.062       7.709
RESP ON
  SES                    1.155    0.794       1.455
  RURALNESS             -0.535    0.212      -2.521
  INTSESRU               2.783    2.769       1.005    <-- interaction not significant
RURALNES WITH
  SES                    0.005    0.001       3.550
Intercepts
  EDUHS                  0.752    0.006     122.853
  MEDHHIN                2.505    0.061      40.823
  PERCAPIT               1.123    0.021      52.452
  PUBWATER               0.564    0.021      26.824
  RESP                  -7.290    0.027    -273.185
  WOOD1                  0.900    0.010      89.704
Variances
  SES                     .003     .001       4.766
  RURALNESS               .034     .007       4.843

Latent interaction models - nonlinear latent variables


Once the model includes a nonlinear function of a latent variable, the traditional methods for estimating SEM (which are based on modeling the observed covariance matrix S) are not useful. The traditional methods only apply to linear structural models. Note that the well-known term LISREL stands for "linear structural relations".

During the past decade, much work has been done to develop methods for estimating nonlinear structural relations. Mplus implements one such method, which allows the direct fitting of products of latent variables in the structural part of the model. Note that quadratic terms can also be created by applying xwith to a latent variable and itself, e.g. sesquad | ses xwith ses; would create a latent quadratic ses term.

In Mplus, the estimation method directly fits the latent interaction using full maximum likelihood (via the EM algorithm), with the nonlinear structural model directly included. Full maximum likelihood can also now be done using SAS PROC NLMIXED and can similarly be implemented in Winbugs (within a Bayesian framework). See Wall M.M., Maximum likelihood and Bayesian estimation for nonlinear structural equation models using SAS, Mplus, and Winbugs, Research Report 2007-021, Division of Biostatistics, University of Minnesota, 2007.

Latent interaction models - nonlinear latent variables


The one drawback of the full maximum likelihood or fully Bayesian method is that it makes distributional assumptions about the latent variables. Other methods have been developed (although they are not easily implemented in existing software) that do not require strong distributional assumptions on the latent variables; see

- Wall, M.M. and Amemiya, Y. (2000) Estimation for polynomial structural equation models. JASA, 95, 929-940.
- Wall, M.M. and Amemiya, Y. (2001) Generalized appended product indicator procedure for nonlinear structural equation analysis. Journal of Educational and Behavioral Statistics, 26, 1-29.
- Wall, M.M. and Amemiya, Y. (2003) A method of moments technique for fitting interaction effects in structural equation models. British Journal of Mathematical and Statistical Psychology, 56, 47-64.
- Wall, M.M. and Amemiya, Y. (2007) A review of nonlinear factor analysis and nonlinear structural equation modeling. In Factor Analysis at 100: Historical Developments and Future Directions, eds. Robert Cudeck and Robert C. MacCallum, Chapter 16, pp 337-362, Lawrence Erlbaum Associates.
- Wall, M.M. and Amemiya, Y. (2007) Nonlinear structural equation modeling as a statistical method. In Handbook of Latent Variable and Related Models, ed. Sik-Yum Lee, Chapter 15, 321-344, Elsevier, The Netherlands.

and references therein.

Multilevel modeling
Data collection often involves nested structures: patients within clinics, students within classrooms, employees within units, repeated measures within patient (i.e. longitudinal data). In each case the grouping or clustering variable is, respectively, the clinic, classroom, unit, or patient.

From Heck (2001) Multilevel Modeling in SEM, Chapter 4, New Developments and Techniques in SEM, eds Marcoulides and Schumacker, 89-127:

Ignoring the presence of substantial similarities among individuals within groups can result in substantially biased estimates of the model's parameters, standard errors, and fit indexes.

Multilevel modeling - Intraclass correlation


The intraclass correlation describes the degree of correspondence within clusters or groups and can be expressed as:

$$
\rho = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_w^2}
$$

where $\sigma_b^2$ is the variability between groups and $\sigma_w^2$ is the within-group variability.

Thus $\rho$ indicates the proportion of the total variability that can be attributed to variability between the groups. $\rho$ should be zero when the data are independent; thus, its magnitude depends on characteristics of the variable measured and the attributes of the groups. The larger the intraclass correlation, the larger the distortion in parameter estimation that results from ignoring this similarity. Note that it is typically assumed that different groups are independent of one another.
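The intraclass correlation can be computed directly from the two variance components. The sketch below also shows one way to estimate those components from balanced grouped data, using the one-way ANOVA estimator (between-group variance = (MSB - MSW) / n, within-group variance = MSW). That estimator is an assumption added here for illustration; the slides only give the definition. The data are made up.

```python
def icc_from_components(var_between, var_within):
    """Intraclass correlation: between-group share of total variability."""
    return var_between / (var_between + var_within)


def icc_anova(groups):
    """ICC for balanced groups via the one-way ANOVA variance components."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    grand_mean = sum(sum(g) for g in groups) / (k * n)
    group_means = [sum(g) / n for g in groups]
    # Mean squares between and within groups
    msb = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    msw = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    ) / (k * (n - 1))
    var_between = max((msb - msw) / n, 0.0)  # truncate negative estimates
    return icc_from_components(var_between, msw)


# Hypothetical data: three clinics with three patients each
clinics = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
icc = icc_anova(clinics)
print(round(icc, 3))
```

Here most of the variability lies between clinics, so the intraclass correlation is close to 1. With independent observations the between-group component, and hence the intraclass correlation, would be near zero.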
