Bayesian Structural Equation Modelling Using Mplus
Overview: Major Steps in the Bayesian Approach to Data Analysis
Research Question
- Estimation
- Model fit
- Hypothesis evaluation and model selection
The Data to be Collected
- Variables and sample size
- How to enter the data into Mplus
- Missing data
The Statistical Model
- How to specify a statistical model in Mplus
- How to specify an imputation model in Mplus
The Prior Distribution
- Default
- Uninformative, and how to specify it in Mplus
- Informative, and how to specify it in Mplus
The Posterior Distribution
- Estimates and credible intervals
- How to check convergence
- Model fit
- Hypothesis evaluation and model selection
- How to interpret Mplus output
Lecture 1: Bayesian Estimation
Data, Research Question, and Statistical Model
Research Question?
The Data (N = 65): a data matrix with columns ID, Stork, Urban, and Babies, e.g. case 20: 3, 7, 13; case 21: 1, 4, 11; ...
title: Mediation Model for the Stork Data;
data: file = stork.txt;
variable: names = ID stork urban babies; usev = stork urban babies;
The Statistical Model - 1
model: urban on stork (a); babies on urban stork (b c); [urban] (d); [babies] (e); urban (f); babies (g);
[Path diagram: Stork -> Urban (a), Urban -> Babies (b), Stork -> Babies (c), with residual variances f and g.]
The Statistical Model - 2
Urban = d + a Stork + error, with error ~ N(0, f)
The Statistical Model - 3
Babies = e + c Stork + b Urban + error, with error ~ N(0, g)
Lecture 1: Bayesian Estimation
Introducing Prior, Posterior, and Sampling Based Estimation Using One Variable
The Prior Distribution - 1 - Introduction - Non-Informative Prior Distribution
A simple example based on expert elicitation: how many babies are born per 1,000 inhabitants per year in the Netherlands?
[Data: a matrix with columns ID and Babies.]
The mean is 9 and the standard error of the mean is .5; this means that the data tell us that between 8 and 10 babies are born.
Note that I computed a confidence interval for the mean using 9 +/- 2 x .5; the value 2 is a rounding of 1.96, the more precise value for the computation of a 95% confidence interval.
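The 9 +/- 2 x .5 computation generalizes to any sample. A minimal sketch, using a small hypothetical sample of births per 1,000 inhabitants (illustrative values only; the actual course data have N = 65, mean 9, and standard error .5):

```python
import math

# Hypothetical sample of births per 1,000 inhabitants per year
# (illustrative only; the course data have N = 65, mean 9, SE .5).
babies = [13, 11, 9, 11, 9, 16, 16, 7, 8, 14, 10, 11, 10, 8, 1, 8]

n = len(babies)
mean = sum(babies) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in babies) / (n - 1))
se = sd / math.sqrt(n)

# 95% confidence interval for the mean; 1.96 is the precise normal
# quantile that the slide rounds to 2.
lower, upper = mean - 1.96 * se, mean + 1.96 * se
```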
No prior information was used, that is, an uninformative prior distribution was used. model: [babies] (a); babies (b); The Prior Distribution - 2 - Introduction - Informative Prior Distribution Expert Elicitation:
I assume that in each region containing 1,000 persons, the age distribution is uniform between 0-100 years of age.
This means that each year 200 persons are between 20 and 40 years of age (the fertile years), which renders 80 couples and 40 bachelors.
On average I expect each couple to have 2 children, that is, 160 children over the course of 20 years. This means 8 children per year per region containing 1,000 persons.
In my line of argument I'm most uncertain about the uniform age distribution. I know the number of elderly is increasing, so maybe there are only 160 persons between 20 and 40 years of age: 64 couples, 128 children, about 6 children per year. On the other hand, there may still be fewer elderly than young, so maybe 240 persons: 96 couples, 192 children, about 10 per year.
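The arithmetic of this elicitation can be written out explicitly. A sketch; the function name is hypothetical, and the 80% coupling fraction encodes the "80 couples and 40 bachelors out of 200 persons" assumption above:

```python
def expected_births(persons_20_to_40, children_per_couple=2, fertile_years=20):
    """Hypothetical helper formalizing the elicitation argument."""
    # 80% of the 20-40 year olds live in couples (80 couples and
    # 40 bachelors out of 200 persons), and each couple has
    # children_per_couple children spread over fertile_years years.
    couples = persons_20_to_40 * 0.8 / 2
    return couples * children_per_couple / fertile_years

center = expected_births(200)  # the elicited expectation: 8 per year
low = expected_births(160)     # more elderly: about 6 per year
high = expected_births(240)    # fewer elderly: about 10 per year
```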
In summary, I expect 8, but my credible interval is between 6 and 10, which means my personal standard error is 1 (8 +/- 2 x 1 gives my credible interval).
The Prior Distribution - 3 - Introduction - The Normal Prior Distribution
Used for means and regression coefficients.
[Plots of three normal prior distributions for a, with increasing variance:]
MODEL PRIORS: a ~ N(8,1);
MODEL PRIORS: a ~ N(8,9);
MODEL PRIORS: a ~ N(8,100000);
The Prior Distribution - 4 - Introduction - The Inverse Gamma Prior Distribution
Used for variances.
MODEL PRIORS: b ~ IG(.001,.001);   (uninformative, proper)
MODEL PRIORS: b ~ IG(-1,0);   (the default in Mplus: uninformative, improper)
The Posterior Distribution - 1 - Introduction - Combining Data Knowledge and Prior Knowledge
[Plot of the prior (centered at 8), the data (centered at 9), and the posterior for a, the mean number of babies.]
The Posterior Distribution - 2 - Introduction
The posterior distribution combines the information with respect to the mean number of babies in the data with the information in the prior distribution. This combination is executed by Mplus.
Using sampling, the information in the posterior distribution with respect to the mean number of babies is made accessible: 9.1, 7.9, 8.3, 9.9, 7.1, ...
For a, the MCMC sample renders:
- Estimate: mean or median
- SD
- Credible Interval: central or highest posterior density
analysis: estimator = bayes; process = 2; fbiter = 100000; point = median;
output: tech1 tech8 standardized(stdyx) cinterval(hpd);
plot: type = plot1 plot2 plot3;
The Posterior Distribution - 3 - Introduction
model: [babies] (a); babies (b);
MODEL PRIORS: a ~ N(8,1); b ~ IG(.001,.001);
model: [babies] (a); babies (b);
MODEL PRIORS: a ~ N(0,100000); b ~ IG(.001,.001);

A Non-Informative Prior Distribution for the Mean Number of Babies:
              Estimate   S.D.   Lower 2.5%   Upper 2.5%
Means BABIES    9.078   0.443     8.203        9.945

An Informative Prior Distribution for the Mean Number of Babies:
              Estimate   S.D.   Lower 2.5%   Upper 2.5%
Means BABIES    8.904   0.405     8.098        9.688

The Posterior Distribution - 4 - Introduction
Lecture 1: Bayesian Estimation
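Mplus obtains the posterior by MCMC, but for a normal mean the combination of prior and data can also be sketched in closed form. A sketch assuming the prior N(8,1) and the data summary (mean 9, standard error .5) from the slides; precisions (inverse variances) add:

```python
# Conjugate normal-normal update for a mean (sketch; Mplus itself
# samples from the posterior rather than using this closed form).
prior_mean, prior_var = 8.0, 1.0   # MODEL PRIORS: a ~ N(8,1);
data_mean, data_se = 9.0, 0.5      # summary of the babies data

prior_prec = 1.0 / prior_var
data_prec = 1.0 / data_se ** 2

post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
post_sd = post_prec ** -0.5

# The posterior mean (8.8) is pulled from the data mean 9 toward the
# prior mean 8, and the posterior SD is smaller than the data SE,
# roughly mirroring the 8.904 (0.405) reported by Mplus.
```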
Introducing Prior, Posterior, and Sampling Based Estimation Using the Stork Data (three variables) and Uninformative Priors
The Prior Distribution - 5 - Uninformative Prior Distributions for the Stork Data
User Specified:
MODEL PRIORS: a ~ N(0,100000); b ~ N(0,100000); c ~ N(0,100000); d ~ N(0,100000); e ~ N(0,100000); f ~ IG(.001,.001); g ~ IG(.001,.001);
Mplus Default:
MODEL PRIORS: a ~ N(0,Infinity); b ~ N(0,Infinity); c ~ N(0,Infinity); d ~ N(0,Infinity); e ~ N(0,Infinity); f ~ IG(-1,0); g ~ IG(-1,0);
model: urban on stork (a); babies on urban stork (b c); [urban] (d); [babies] (e); urban (f); babies (g);
The Posterior Distribution - 5 - Bayesian Estimation Using Markov Chain Monte Carlo Methods
model constraint: new(indirect); indirect = a*b;
[Table: fbiter rows of sampled values for a, b, c, d, e, f, g, and indirect, starting from initial values, e.g.
 .35  1.14  -.11  2.89  4.00  3.46  7.15  .42
 .29  1.69  -.32  1.75  5.10  3.01  7.30  .49]
analysis: estimator = bayes; process = 2; fbiter = 100000; point = median;
output: tech1 tech8 standardized(stdyx) cinterval(hpd);
plot: type = plot1 plot2 plot3;
The Posterior Distribution - 6 - Output Computed Using the MCMC Sample
[Histogram of the posterior of Babies on Stork.]
The Posterior Distribution - 7 - Histograms, Estimates and Credible Intervals
[Histogram of the posterior of the Indirect effect.] Note that the credible interval is not symmetric!
The Posterior Distribution - 8 - Histograms, Estimates and Credible Intervals
MODEL RESULTS            Posterior  One-Tailed       95% C.I.
              Estimate     S.D.      P-Value   Lower 2.5%  Upper 2.5%
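The asymmetry of the indirect effect's credible interval is easy to reproduce: the interval is read off the sampled products a*b, and a product of symmetric draws need not be symmetric. A sketch with simulated draws; the means .37 and 1.14 loosely mimic the slides' estimates and are not the actual Mplus output:

```python
import random

random.seed(1)
M = 100_000

# Simulated posterior draws for a and b (illustrative values only).
a_draws = [random.gauss(0.37, 0.07) for _ in range(M)]
b_draws = [random.gauss(1.14, 0.19) for _ in range(M)]

# One indirect-effect value per MCMC iteration.
indirect = sorted(a * b for a, b in zip(a_draws, b_draws))

median = indirect[M // 2]
lower = indirect[int(0.025 * M)]   # central 95% credible interval
upper = indirect[int(0.975 * M)]

# (upper - median) generally differs from (median - lower):
# the interval is not forced to be symmetric around the estimate.
```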
Residual Variances
 URBAN    0.696  0.085  0.000  0.536  0.866
 BABIES   0.553  0.091  0.000  0.381  0.735
The Posterior Distribution - 12 - Standardized Parameter Estimates and Credible Intervals
Lecture 1: Bayesian Estimation
INTERMEZZO: P-values
- If the 90% CI touches 0, the one-tailed p-value is .05.
- If the 95% CI touches 0, the one-tailed p-value is .025.
- For approximately normal posterior distributions, multiplication by 2 renders a two-tailed p-value.
[Illustration using the posterior of Urban On Stork (a).]
The Posterior Distribution - 10 - The One-Tailed P-Value
Problems with p-values (for example, .05): "Surely, God loves the .06 nearly as much as the .05"; publication bias; multiple hypothesis testing and capitalization on chance.
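The relation between credible intervals and the one-tailed p-value can be checked directly from posterior draws: the one-tailed p-value is the posterior mass on the other side of zero. A sketch with simulated draws whose mean and SD loosely mimic the Babies on Stork coefficient (-0.111, SD 0.124) shown later in this lecture; these are not the actual Mplus chain:

```python
import random

random.seed(2)

# Simulated posterior draws for a negative regression coefficient
# (illustrative values; not the actual Mplus output).
draws = [random.gauss(-0.111, 0.124) for _ in range(50_000)]

# One-tailed p-value: posterior mass on the other side of zero.
p_one_tailed = sum(d > 0 for d in draws) / len(draws)

# For an approximately normal posterior, doubling gives a
# two-tailed p-value, as stated on the slide.
p_two_tailed = 2 * min(p_one_tailed, 1 - p_one_tailed)
```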
Credible Intervals and Confidence Intervals
- What is the value of the parameter of interest?
- Is the parameter positive, negative, or is zero also in the ballpark?
- With multiple parameters there is still capitalization on chance.
Model Selection
- Compare a few carefully chosen models.
- Very powerful in combination with credible intervals and standardized estimates.
The Posterior Distribution - 11 - p-values, credible intervals, and model selection Lecture 1: Bayesian Estimation
Introducing Prior, Posterior, and Sampling Based Estimation Using the Stork Data (three variables) and Informative Priors
The Prior Distribution - 6 - Informative, Based on Historical Data
[The current data: the Stork data matrix with columns ID, Stork, Urban, Babies. The historical data: a matrix of 80 persons with columns ID, Stork, Urban.]
Analysis of the historical data with model: urban on stork (a); [urban]; urban; renders MODEL RESULTS: URBAN ON STORK, Estimate 0.400, S.D. 0.050, that is, a ~ N(.400,.0025).
How relevant are the historical data for the current data, which are from the Netherlands in 2010?
a ~ N(.400,.0025) is the information rendered by 80 persons; a ~ N(.400,.025), with the variance multiplied by 10, is the information rendered by 8 persons (the variance of the estimate is inversely proportional to the sample size).
What if the historical data are from Morocco in 1920? Relevance = 10%, that is, 10% of the information in the historical data can be added to the current data; 10% of 80 is 8 persons. What if the historical data are from the Netherlands in 1920? Relevance = 25%.
What if the historical data are from Germany in 2008? Relevance = 60%. The construction of informative prior distributions is NOT an exact science! It is SUBJECTIVE and should be motivated by INTER-PEER-AGREEMENT about the choices made. Note that also in META-ANALYSIS studies have to be weighted with respect to their RELEVANCE. This approach poses many DIFFICULTIES but also many OPPORTUNITIES. It is not an ESTABLISHED and WELL-RESEARCHED approach. It is a BABY that is about to be delivered by a STORK.
The Prior Distribution - 7 - Informative, Based on Historical Data
User Specified:
MODEL PRIORS: a ~ N(.400,.0025); b ~ N(0,100000); c ~ N(0,100000); d ~ N(0,100000); e ~ N(0,100000); f ~ IG(.001,.001); g ~ IG(.001,.001);
model: urban on stork (a); babies on urban stork (b c); [urban] (d); [babies] (e); urban (f); babies (g);
Suppose the data are collected by another research group in the Netherlands in 2010.
The Prior Distribution - 8 - Informative, Based on Historical Data
MODEL RESULTS with MODEL PRIORS: a ~ N(0,100000);
                  Estimate   S.D.   Lower 2.5%   Upper 2.5%
URBAN ON STORK      0.375    0.072     0.236        0.517
INDIRECT            0.422    0.108     0.225        0.644
MODEL RESULTS with MODEL PRIORS: a ~ N(.400,.0025);
                  Estimate   S.D.   Lower 2.5%   Upper 2.5%
URBAN ON STORK      0.391    0.041     0.314        0.473
INDIRECT            0.444    0.086     0.283        0.621
The result of using subjective priors is a gain in information (note the smaller posterior S.D.s). But do you trust this? Would you be willing to use and defend this approach?
The Posterior Distribution - 12 - Comparing Results from Uninformative and Informative Priors
The Prior Distribution - 9 - Extra Tools for the Specification of Informative Priors
MODEL PRIORS: b ~ N(0,1); c ~ N(0,1); COVARIANCE(b,c) = 0.5;
output: tech1 tech3 tech8 standardized(stdyx) cinterval(hpd);
Summary
Research Question
Statistical Model
Prior Distribution - Informative Prior Distributions
Posterior Distribution - Asymmetric Credible Intervals - Small Sample Inferences, no Asymptotic Approximations - No Heywood Cases, Like, for Example, Negative Variances - Sampling will often Work where Maximum Likelihood Fails
References Bayesian Structural Equation Modelling A relatively accessible introduction to Bayesian structural equation modeling can be found in:
Kaplan, D. and Depaoli, S. (2012). Bayesian Structural Equation Modeling. In R.H. Hoyle (Ed.), Handbook of Structural Equation Modeling, pp. 650-673. New York: The Guilford Press.
A classic about the elicitation of prior knowledge is:
O'Hagan, A., Buck, C.E., Daneshkhah, A., Eiser, J.R., Garthwaite, P.H., Jenkinson, D.J., Oakley, J.E., and Rakow, T. (2006). Uncertain Judgements: Eliciting Experts' Probabilities. Chichester: Wiley.
A classic introduction to Bayesian data analysis is:
Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (2004). Bayesian Data Analysis. Boca Raton, FL: Chapman & Hall/CRC.
The documentation provided by Mplus is:
Muthén, B. (2010). Bayesian analysis in Mplus: A brief introduction.
Asparouhov, T. and Muthén, B. (2010). Bayesian analysis in Mplus: Technical implementation.
Lecture 2: Bayesian Estimation in the Presence of Missing Data
Introduction
[The Stork data matrix with missing values coded 999, e.g. case 22 missing Stork; case 26 missing Urban and Babies; case 29 missing Stork; case 31 missing Babies; case 33 missing Stork and Urban.]
variable: names = ID stork urban babies; usev = stork urban babies; missing = all (999);
Missing Data - 1 - Introduction
Missing Data - 2 - Introduction
By default Mplus with analysis: estimator = bayes; will use the statistical model that is specified to impute the missing data.
First I will explain what is meant by imputation of the missing data.
Secondly, I will explain why it is usually NOT a good idea to use the statistical model that is specified to impute the missing data.
One exception occurs if the amount of missing values is very small. A good question is what is a small amount of missing values?
Another exception occurs if missings occur in variables that are ONLY a dependent variable and if the missingness is MAR given the predictors of the dependent variable.
Third, I will introduce multiple imputation:
- using a general imputation model;
- analysis of each imputed data set using a statistical model that is consistent with the imputation model;
- summarizing the results obtained from the analysis of each imputed data set.
Lecture 2: Bayesian Estimation in the Presence of Missing Data
Multiple Imputation
Multiple Imputation Using the Statistical Model - 1
Multiple Imputation Using the Statistical Model - 2
[Table: fbiter rows of sampled values for the parameters a, b, c, d, e, f, g and for the missing entries (22-Stork, 26-Urban, 26-Babies, 29-Urban, 31-Babies, 33-Stork, 33-Urban), starting from initial values, e.g.
 .35  1.14  -.11  2.89  4.00  3.46  7.15  |  5  5  12  7   9  2  3
 .29  1.69  -.32  1.75  5.10  3.01  7.30  |  7  3  11  5  10  3  4]
MODEL RESULTS            Posterior  One-Tailed       95% C.I.
              Estimate     S.D.      P-Value   Lower 2.5%  Upper 2.5%
BABIES ON
 URBAN          1.143      0.185      0.000      0.781       1.509
 STORK         -0.111      0.124      0.181     -0.356       0.131
New/Additional Parameters INDIRECT 0.422 0.108 0.000 0.225 0.644 Lecture 2: Bayesian Estimation in the Presence of Missing Data
Data that are not Missing at Random
[The data matrix again, with missing values coded 999.]
Multiple Imputation Using the Statistical Model - 3 - Data that are NOT Missing at Random
Multiple Imputation Using the Statistical Model - 4 - Data that are NOT Missing at Random
Multiple Imputation Using a General Imputation Model - 1 - Data that are Missing at Random
[The data matrix with missing values coded 999.]
model: stork with urban; stork with babies; urban with babies; [stork]; [urban]; [babies];
Lecture 2: Bayesian Estimation in the Presence of Missing Data
How to do it in Mplus Multiple Imputation Using a General Imputation Model - 2 - How to do it in Mplus title: this is an example of multiple imputation for a set of variables with missing values using a general statistical model;
data: FILE = storkMI.txt;
variable: names = ID stork urban babies; auxiliary = ID; usevariables = stork urban babies; missing = all (999);
analysis: estimator = bayes; fbiter = 10000; process = 2; data imputation: impute = stork urban babies; ndatasets = 10; thin = 1000; save = storkimp*.dat;
model: stork with urban babies; urban with babies; [stork]; [urban]; [babies];
Use enough variables in the imputation model to feel confident that MAR is a reasonable assumption. There may be variables in the imputation model that do not appear in the statistical model.
Can we in our example think of variables that could be very good predictors of missing data and that are not part of the statistical model?
Never use too many variables in the imputation model. A rule of thumb is 1 variable for every 20 cases in the data file. But this is only a rule of thumb!
Creating a good imputation model is partly ART, partly SKILL, and rather BAYESIAN because it requires careful prior thinking, that is, thinking without using empirical data.
title: Mediation Model for the Stork Data;
data: file = storkimplist.dat; type = imputation;
variable: names = stork urban babies ID; usev = stork urban babies; missing = all (999); Multiple Imputation Using a General Imputation Model - 5 - How to do it in Mplus model: urban on stork (a); babies on urban stork (b c); [urban] (d); [babies] (e); urban (f); babies (g);
model constraint: new(indirect); indirect = a*b;
analysis: estimator = ml;
output: standardized(stdyx); Note the difference between the imputation model and the statistical model!!
It is also quite common that the statistical model contains only a subset of the variables used in the imputation model.
Multiple Imputation Using a General Imputation Model - 6 - Analyse Each Imputed Data Set
[The data matrix with missing values, and m = 1, ..., M imputed data matrices in which each 999 is replaced by an imputed value.]
[Per imputed data set, estimates and SDs for the intercept of BABIES, e.g. 10.109 (1.303), 9.843 (1.221), 10.567 (1.432), 9.992 (1.271), ...; pooled: Estimate 10.002, SD 1.672, Rate of Missing Information .22.]
Multiple Imputation Using a General Imputation Model - 7 - Relative Efficiency
Relative efficiency = 1 / (1 + rate/M)
For the example on the previous transparency: Relative efficiency = 1 / (1 + .22/10) = .98
Multiple Imputation Using a General Imputation Model - 8 - Summarize the Multiple Analyses
INDIRECT 0.395 0.114 3.462 0.001 0.184
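The pooling across the M analyses follows Rubin's rules. A sketch with hypothetical per-imputation estimates and standard errors (not the actual course output); the last line reproduces the relative-efficiency computation from the slide:

```python
import math

# Hypothetical estimates and SEs for one parameter from M = 10
# imputed data sets (illustrative values only).
estimates = [10.1, 9.8, 10.6, 10.0, 9.9, 10.3, 9.7, 10.2, 10.1, 9.5]
ses = [1.30, 1.22, 1.43, 1.27, 1.25, 1.31, 1.28, 1.33, 1.29, 1.26]

M = len(estimates)
qbar = sum(estimates) / M                       # pooled estimate
ubar = sum(s ** 2 for s in ses) / M             # within-imputation variance
b = sum((q - qbar) ** 2 for q in estimates) / (M - 1)  # between-imputation variance
t = ubar + (1 + 1 / M) * b                      # total variance
pooled_se = math.sqrt(t)
rate = (1 + 1 / M) * b / t                      # approximate rate of missing information

# The slide's example: rate .22 with M = 10 imputations.
efficiency = 1 / (1 + 0.22 / 10)                # about .98
```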
STDYX Standardization Two-Tailed Rate of Estimate S.E. Est./S.E. P-Value Missing
URBAN ON
 STORK      0.536  0.095   5.633  0.000  0.123
BABIES ON
 URBAN      0.693  0.110   6.307  0.000  0.234
 STORK     -0.123  0.124  -0.986  0.324  0.152
Intercepts
 URBAN      1.335  0.299   4.463  0.000  0.059
 BABIES     1.286  0.343   3.755  0.000  0.109
Residual Variances
 URBAN      0.712  0.101   7.026  0.000  0.120
 BABIES     0.593  0.105   5.626  0.000  0.183
R-SQUARE
 URBAN      0.288  0.101   2.842  0.004  0.120
 BABIES     0.407  0.105   3.867  0.000  0.183
Multiple Imputation Using a General Imputation Model - 9 - Estimator is Bayes
Currently, Mplus does not allow the combination of multiple imputation and estimator = bayes.
Using the R package MplusAutomation it is rather easy to run Mplus 10 times to analyse 10 imputed data sets, and to combine the 10 analyses into one overall result.
However, this is beyond the scope of the current course.
Lecture 2: Bayesian Estimation in the Presence of Missing Data
A Closer Look at the Imputation Model
Multiple Imputation Using a General Imputation Model - 10 - Consistency
Multiple Imputation Using a General Imputation Model - 11 - Consistency
Multiple Imputation Using a General Imputation Model - 12 - Non-Consistency
[Path diagrams comparing the statistical model (Stork -> Urban -> Babies) with imputation models that do or do not contain the same relations, for example an imputation model omitting Urban, or statistical models adding Stork*Stork or Stork*Urban terms that are absent from the imputation model.]
Multiple Imputation Using a General Imputation Model - 13 - Non-Consistency
Summary: Imputation model and statistical model
Does the imputation model render data that are missing at random?
Are the imputation model and the statistical model congenial?
The combination of multiple imputation with estimator = ML is possible in Mplus. The combination with estimator = Bayes is not possible.
References Missing Data A non-technical introduction to missing data analysis and multiple imputation can be found in:
Schafer, J.L. and Graham, J.W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.
Classic books about missing data analysis and multiple imputation are
Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton, FL: Chapman & Hall/CRC.
An important paper with respect to consistency is:
Meng, X.-L. (1994). Multiple-imputation inferences with uncongenial sources of input. Statistical Science, 9, 538-573.
The documentation provided by Mplus is:
Asparouhov, T. and Muthén, B. (2010). Multiple imputation with Mplus.
MplusAutomation was developed by Michael Hallquist. It can be found at www.statmodel.com: under the tab How-To, choose Using Mplus via R.
Lecture 3: Model Fit
Model Fit 1 - The Covariance Matrix
[The Stork data matrix with columns ID, Stork, Urban, Babies.]
        S      U      B
S     10.7
U      4.0    4.8
B      3.4    5.1   12.2
The observed covariance matrix displays the relation between each pair of variables in the data matrix. The model implied covariance matrix is a reconstruction of the observed covariance matrix using the statistical model at hand.
Model Fit 2 - What is model fit? Why is it important?
Model Implied Covariance Matrices:
Observed = Model Implied (9 model parameters):
        S      U      B
S     10.7
U      4.0    4.8
B      3.4    5.1   12.2
Model Implied (7 model parameters):
        S      U      B
S     10.7
U      4.0    4.8
B      3.4    5.1   12.2
Model Implied (6 model parameters):
        S      U      B
S     10.7
U      0      4.8
B      0      0      2.2
Model Fit 3
The chi square test is computed for each statistical model. It is a function of:
- the observed covariance matrix;
- the model implied covariance matrix;
- the difference between the number of parameters of the current and the saturated statistical model.
It is a measure of the size of the difference between the observed and implied covariance matrices.
The larger the size of the difference, that is, the larger the chi square value, the less a statistical model is able to reconstruct the observed covariance matrix.
The hypothesis that is tested using the chi square test states that the observed covariance matrix can adequately be reconstructed by the current statistical model.
Model Fit 4
Using the observed data and the statistical model at hand:
- in each MCMC iteration, parameters are sampled;
- these are used to replicate data and to impute the observed missings, rendering pairs Xobs, Xrep;
- the chi-square test is computed using the parameters and the observed-imputed and replicated data, rendering pairs CHIobs, CHIrep;
- the proportion of pairs in which CHIrep is larger than CHIobs is the posterior predictive p-value.
Model Fit 5
Model Fit 6
MODEL FIT INFORMATION
Number of Free Parameters 6
Bayesian Posterior Predictive Checking using Chi-Square
95% Confidence Interval for the Difference Between the Observed and the Replicated Chi-Square Values
48.046 71.430
Posterior Predictive P-Value 0.000 Posterior predictive p-values around .50 indicate a model that for all practical purposes is well fitting. Note that this approach provides a rough model check and not a classical evaluation of an hypothesis using a p-value. References Model Fit This model fit test was proposed by:
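The posterior predictive p-value itself is just a proportion over the MCMC iterations. A sketch in which the pairs of observed-data and replicated-data chi-square values are simulated (in Mplus they come from the actual posterior predictive check); the first scenario mimics the badly fitting model above, the second a well-fitting one:

```python
import random

random.seed(3)
n_iter = 20_000

# Badly fitting model: the observed discrepancy is systematically
# larger than the replicated one (simulated, illustrative values).
chi_obs = [random.gauss(60.0, 5.0) for _ in range(n_iter)]
chi_rep = [random.gauss(3.0, 2.0) for _ in range(n_iter)]
ppp = sum(r > o for o, r in zip(chi_obs, chi_rep)) / n_iter  # near 0

# Well-fitting model: observed and replicated discrepancies come
# from the same distribution, so the proportion is near .50.
chi_obs_ok = [random.gauss(5.0, 3.0) for _ in range(n_iter)]
chi_rep_ok = [random.gauss(5.0, 3.0) for _ in range(n_iter)]
ppp_ok = sum(r > o for o, r in zip(chi_obs_ok, chi_rep_ok)) / n_iter
```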
Scheines, R., Hoijtink, H., and Boomsma, A. (1999). Bayesian Estimation and Testing of Structural Equation Models. Psychometrika, 64, 37-52.
Who based it on the work by:
Gelman, A., Meng, X-L, and Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6, 733-807.
The documentation provided by Mplus is:
Asparouhov, T. and Muthén, B. (2010). Bayesian analysis in Mplus: Technical implementation.
Lecture 4: Model Selection Using the Bayes Factor, BIC and DIC
What is a Model?
Model Selection 1 - Introduction - What is a model?
[Three path diagrams relating Stork, Urban, and Babies.]
Model Selection 2 - Introduction - What is a model?
[A factor model with factors IQ, AA, and LA and their indicators.]
Model Selection 3 - Introduction - What is a model?
Babies = a + b Stork + error
MODEL PRIORS: a ~ N(4,1); b ~ N(1,1)
MODEL PRIORS: a ~ N(4,1); b ~ N(4,1)
Lecture 4: Model Selection Using the Bayes Factor, BIC and DIC
What is the Goal of Model Selection? Model Selection 4 Introduction What is the goal of model selection? To select the best model from the models that are under consideration.
What is the best model? There are multiple answers to this question. Later in this lecture we will introduce two options: The model that has the smallest distance to the true model (DIC) The model that maximizes the probability of the data (Bayes factor and BIC) But all answers involve an evaluation of the misfit and complexity of each model.
What if the models are all wrong?
What if the true model is not in the set of models under consideration?
All models are wrong but some are useful
Should the null-hypothesis be among the models under consideration?
Should the alternative hypothesis be among the models under consideration? It can serve as a fail-safe for the models under consideration. A model with restrictions is only a good model if it is better than the corresponding model without restrictions. Model Selection 5 Introduction = Model Selection 6 Introduction Why is model selection consistent with the empirical cycle? Observation (exploratory research!!) Induction: from observations to a theory Deduction: deriving testable consequences from the theory, that is, models or hypotheses Testing: confrontation of models or hypotheses with empirical data Model Selection 7 Introduction Why is Bayesian inference consistent with the empirical cycle? Observation (exploratory research!!) Induction: from observations to a theory Deduction: deriving testable consequences from the theory, that is, models or hypotheses Testing: confrontation of models or hypotheses with empirical data Prior knowledge and prior thinking Plausible models, probably not the true model Select the best model = the current state of knowledge Remember the earth is flat, the earth is round, and the earth is shaped somewhat like an American football. This too is sequential theory updating using new data as they become available. Model Selection 8 Introduction Model selection and sequential updating of scientific theories avoids Publication bias Multiple hypotheses testing and capitalization on chance
In model selection all models (also the null-model) are on equal footing. This implies that model selection can be used to quantify support for the null-hypothesis.
Model selection renders a very powerful approach (in contrast to exploratory research, many possible models are a priori excluded), especially in combination with credible intervals and standardized estimates. This will be highlighted in the example that will be presented in the next lecture. Lecture 4: Model Selection Using the Bayes Factor, BIC and DIC
Information Criteria Model Selection 1 Information Criteria IC = misfit + complexity The smaller the value of IC the better the model at hand. Because: We like well-fitting models We like parsimonious, that is specific, not-complex models because we can derive good predictions from them misfit is determined by the posterior distribution of the model parameters
complexity is a function of the number of parameters in model and the amount of information in the prior distribution
to illustrate the main features, a number of examples will be given
Model Selection 2-4 - Information Criteria
[Scatter plots of x and y with fitted lines and curves of increasing complexity. For each: What is the predicted y-value? What is the fit of this model? What is the complexity of this model?]
Model Selection 5 - Information Criteria - Stork cannot Predict Babies
Population correlation = 0, N = 100. Competing models: Stork -> Babies versus Stork and Babies unrelated.
DIC = 274.67  misfit = 268.45  par = 3.11
BIC = 282.30 misfit = 268.38 par = 3.00 DIC = 272.23 misfit = 268.65 par = 1.89
BIC = 277.61 misfit = 268.39 par = 2.00 Model Selection 6 Information Criteria Stork can Predict Babies Stork Babies population correlation = .6, N=100 Stork Babies Stork Babies competing models DIC = 229.54 misfit = 223.32 par = 3.11
BIC = 237.07 misfit = 223.25 par = 3.00 DIC = 273.48 misfit = 269.70 par = 1.89
BIC = 278.86 misfit = 269.65 par = 2.00 TITLE: Illustrate misfit and complexity;
MONTECARLO: NAMES ARE y x; NOBSERVATIONS = 10000; NREPS = 1; SEED = 123;
MODEL POPULATION: y ON x * .6; [y * 0]; y * .64; [x * 0]; x * 1; analysis: estimator = bayes;
MODEL PRIORS: a ~ N(.6,.01);
MODEL: y ON x (a);
OUTPUT: TECH9; Model Selection 7 Information Criteria DIC and BIC can not Evaluate Models that Differ in the Prior Simulate a data matrix Analyse the simulated data matrix Specification of the simulation model Specification of the simulation study y = a + b x + error and error ~ N(0,s2)
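The same kind of comparison can be mimicked outside Mplus. A sketch that simulates data as in the Monte Carlo setup above (slope .6, residual variance .64) and compares BIC = -2 log-likelihood + (number of parameters) x log(N) for a regression model with and without the predictor, using ordinary least squares; this illustrates the BIC logic, not the Mplus computation:

```python
import math
import random

random.seed(4)
n = 100

# Simulate y = .6 x + e with Var(e) = .64, so cor(x, y) = .6.
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.6 * xi + random.gauss(0, 0.8) for xi in x]

def gauss_loglik(resid, sigma2):
    # Gaussian log-likelihood of the residuals.
    return sum(-0.5 * (math.log(2 * math.pi * sigma2) + r * r / sigma2)
               for r in resid)

def bic(loglik, n_par, n_obs):
    return -2 * loglik + n_par * math.log(n_obs)

# M1: intercept only (2 parameters: mean and residual variance).
mean_y = sum(y) / n
res1 = [yi - mean_y for yi in y]
bic1 = bic(gauss_loglik(res1, sum(r * r for r in res1) / n), 2, n)

# M2: regression on x (3 parameters: intercept, slope, variance).
mean_x = sum(x) / n
beta = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        / sum((xi - mean_x) ** 2 for xi in x))
alpha = mean_y - beta * mean_x
res2 = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
bic2 = bic(gauss_loglik(res2, sum(r * r for r in res2) / n), 3, n)

# With a true effect of .6 and N = 100, M2 has the smaller BIC,
# mirroring the slides' preference for the regression model.
```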
Why is b in this setup the correlation? Because var x = 1 and var y = b² var x + s² = .6² + .64 = 1.0, so cor(x, y) = b √(var x / var y) = b = .6.
Model Selection 8 - Information Criteria - DIC and BIC cannot Evaluate Models that Differ in the Prior
Stork -> Babies, population correlation = .6. Three priors are compared:
MODEL PRIORS: b ~ N(.6,.01);  MODEL PRIORS: b ~ N(0,1000000);  MODEL PRIORS: b ~ N(0,.01)
N = 10000: DIC = 24060.54 par = 2.98
BIC = 24082.21 par = 3.00 DIC = 24060.33 par = 2.99
BIC = 24081.98 par = 3.00 DIC = 24060.35 par = 3.00
BIC = 24081.98 par = 3.00
N = 500 DIC = 1198.10 par = 2.88
BIC = 1210.95 par = 3.00 DIC = 1194.66 par = 2.91
BIC = 1207.48 par = 3.00 DIC = 1194.90 par = 3.03
BIC = 1207.47 par = 3.00 Model Selection 9 Information Criteria Summary:
- Complexity and (mis)fit.
- Complexity is not adequate for models that differ in the prior, but the Bayes factor can deal with this situation; one example will be given during the last day of this course.
- DIC or BIC? This depends on whether missing values are present or not, and on the error rates obtained using DIC and BIC.
Lecture 4: Model Selection Using the Bayes Factor, BIC and DIC
Error Rates
Model Selection 1 - Error Rates
[Stork -> Babies with coefficient b.] M1: b = 0, DIC = 273; M2: b ≠ 0, DIC = 229. The conclusion is that M2 is a better model than M1.
But how certain are we about this?
What are the probabilities of making an incorrect decision? M1: b = 0, BIC = 278; M2: b ≠ 0, BIC = 237; deltaDIC = 44, deltaBIC = 41.
Model Selection 2 - Error Rates - Frequency Evaluations
[From each population (M1: b = 0 and M2: b ≠ 0) data matrices are sampled, and deltaDIC or deltaBIC is computed for each.]
Model Selection 3 - Error Rates - Frequency Evaluations
DIC, 1000 replications: with correlation = 0, N = 100, 18% of the deltaDIC values > 0; with correlation = .3, N = 100, 5% < 0.
Model Selection 4 - Error Rates - Frequency Evaluations
BIC, 1000 replications: with correlation = 0, N = 100, 3% > 0; with correlation = .3, N = 100, 19% < 0.
Model Selection 5 - Error Rates - A Simple Alternative for Frequency Evaluations
TITLE: Error Rates;
MONTECARLO: NAMES ARE y x; NOBSERVATIONS = 100; NREPS = 1000; SEED = 123; RESULTS = PopH0AnH1.txt;
MODEL POPULATION: y ON x * .3; !! y ON x * 0; [y * 0]; y * .91; !! y * 1; [x * 0]; x * 1;
How to determine the populations from which to simulate data? Keep power analysis in the back of your mind; it is closely related. Mplus does not give the error rates, but in combination with SPSS error rates can be computed. In Exercise 7 from the lab meeting you have the opportunity to compute error rates in the context of multiple regression. Mplus gives a very rough alternative for error rates. The error rates discussed here are unconditional: what is the probability of erroneous decisions if data matrices come from M1 or M2? Very interesting and very Bayesian are conditional error rates: what is the probability that M1 and M2 are true if deltaBIC is equal to 2.45 for the observed data? However, these probabilities are beyond the scope of this workshop.
References Model Selection
An introduction to model selection can be found in
Burnham, K.P. and Anderson, D.R. (2002). Model Selection and Multi-Model Inference. New York: Springer.
The DIC was introduced by
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., and van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B, 64, 583-639.
The BIC is elaborated on in
Kass, R.E. and Raftery, A.E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773-795.
A comparison and overview can be found in
Hamaker, E.L., van Hattum, P., Kuiper, R.M., and Hoijtink, H. (2010). Model selection based on information criteria in multilevel modelling. In J. Hox and K. Roberts (Eds.), Handbook of Advanced Multilevel Modelling. London: Taylor and Francis.

Lecture 5: An Application of Model Selection

An Application of Model Selection 1

Introduction of the Twin data and analysis of the first model.

An Application of Model Selection 2

title: The Twin Data File;
model: fac by eng1 eng2 math1 math2 socsci1 socsci2
              natsci1 natsci2 vocab1 vocab2;
       fac on mothed fathed;

analysis: estimator = bayes; processors = 2;
          fbiter = 10000; point = median;

output: standardized(stdyx) tech1 tech3 tech8 cinterval(hpd);

plot: type = plot1 plot2 plot3;

An Application of Model Selection 3

[Path diagram: Model: 1 Factor and Education. One factor F measured by M1 E1 S1 N1 V1 M2 E2 S2 N2 V2 and regressed on M-ED and F-ED]

An Application of Model Selection 4

*** WARNING
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 26
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

For model comparison all analyses must be based on the same number of persons. Therefore you have to deal with the missing data if Mplus excludes persons from the analysis, as it does in this example.
If there are relatively few missing values, as here, a quick solution is to do a single imputation using a sensible imputation model.

If there are many missing values you have to resort to multiple imputation and DIC4. However, that is beyond the scope of this course, and also within statistical science it is an area that is still under development.

An Application of Model Selection 5

title: Single Imputation of the Twin Data File;
data: file = twins.txt;

variable: names = ID sex zygosity mothed fathed income
                  eng1 eng2 math1 math2 socsci1 socsci2
                  natsci1 natsci2 vocab1 vocab2;
          usev = mothed fathed income eng1 eng2 math1 math2
                 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
          auxiliary = ID sex zygosity;
          missing = all(999);

data imputation: impute = mothed fathed income eng1 eng2 math1 math2
                          socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
                 ndatasets = 1;
                 thin = 1000;
                 save = twinimp*.dat;

analysis: estimator = bayes; fbiter = 10000; processors = 2;
An Application of Model Selection 6 model: mothed with fathed income eng1 eng2 math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2; fathed with income eng1 eng2 math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2; income with eng1 eng2 math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2; eng1 with eng2 math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2; eng2 with math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2; math1 with math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2; math2 with socsci1 socsci2 natsci1 natsci2 vocab1 vocab2; socsci1 with socsci2 natsci1 natsci2 vocab1 vocab2; socsci2 with natsci1 natsci2 vocab1 vocab2; natsci1 with natsci2 vocab1 vocab2; natsci2 with vocab1 vocab2; vocab1 with vocab2;
output: tech8;

An Application of Model Selection 7

Analyse the first model using the single imputed data set.

An Application of Model Selection 8

title: The Twin Data File;
model: fac by eng1 eng2 math1 math2 socsci1 socsci2
              natsci1 natsci2 vocab1 vocab2;
       fac on mothed fathed;

analysis: estimator = bayes; processors = 2;
          fbiter = 10000; point = median;

output: standardized(stdyx) tech1 tech3 tech8 cinterval(hpd);

plot: type = plot1 plot2 plot3;

An Application of Model Selection 9

In themselves these numbers have no meaning. They can only be compared to the same numbers computed for one or more competing models.

Model: 1 Factor and Education
Information Criterion
  Deviance (DIC)                        46237.298
  Estimated Number of Parameters (pD)      31.861
  Bayesian (BIC)                        46388.873

An Application of Model Selection 10

[Path diagram: Model: 2 Factor and Education. Two factors F1 and F2 with indicators M1 E1 S1 N1 V1 and M2 E2 S2 N2 V2, regressed on M-ED and F-ED]

An Application of Model Selection 11

[Path diagram: Model: 1 Factor and Income. One factor F with indicators M1 E1 S1 N1 V1 M2 E2 S2 N2 V2, regressed on Income]

An Application of Model Selection 12

[Path diagram: Model: 2 Factor and Income. Two factors F1 and F2 with the same ten indicators, regressed on Income]

Model: 1 Factor and Education
Information Criterion
  Deviance (DIC)                        46237.298
  Estimated Number of Parameters (pD)      31.861
  Bayesian (BIC)                        46388.873

An Application of Model Selection 13

Model: 2 Factor and Education
Information Criterion
  Deviance (DIC)                        46008.581
  Estimated Number of Parameters (pD)      34.841
  Bayesian (BIC)                        46174.343

Model: 2 Factor and Income
Information Criterion
  Deviance (DIC)                        46031.495
  Estimated Number of Parameters (pD)      32.818
  Bayesian (BIC)                        46187.846

Model: 1 Factor and Income
Information Criterion
  Deviance (DIC)                        46263.315
  Estimated Number of Parameters (pD)      30.940
  Bayesian (BIC)                        46410.004

An Application of Model Selection 14

Are the differences in BIC and DIC convincing?
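One way to approach this question is to tabulate, for each criterion, the differences with the best of the four models. The sketch below uses the DIC and BIC values reported above; the pairing of the two income models with their criteria follows the slide layout:

```python
# DIC and BIC values copied from the four Mplus runs above
results = {
    "1 Factor and Education": {"DIC": 46237.298, "BIC": 46388.873},
    "2 Factor and Education": {"DIC": 46008.581, "BIC": 46174.343},
    "2 Factor and Income":    {"DIC": 46031.495, "BIC": 46187.846},
    "1 Factor and Income":    {"DIC": 46263.315, "BIC": 46410.004},
}

def rank(results, criterion):
    """Order the models from smallest (best) to largest DIC or BIC."""
    return sorted(results, key=lambda m: results[m][criterion])

for criterion in ("DIC", "BIC"):
    ordering = rank(results, criterion)
    best = ordering[0]
    print(f"{criterion}: best model is {best}")
    for model in ordering:
        delta = results[model][criterion] - results[best][criterion]
        print(f"  {model:24s} delta = {delta:8.3f}")
```

Both deltaDIC and deltaBIC single out the 2 Factor and Education model as the best of the four.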
(Posterior summaries from the Mplus output; columns as in standard Mplus Bayesian output: estimate, posterior standard deviation, one-tailed p-value, and the 95% credible interval bounds.)

FAC1    0.093    0.020    0.000    0.056    0.132
FAC2    0.100    0.020    0.000    0.063    0.142

An Application of Model Selection 18

And now the empirical cycle has to be restarted!

References An Application of Model Selection

Loehlin, J.C. and Nichols, R.C. (1976). Genes, Environment and Personality. Austin, TX: University of Texas Press.

Lecture 6: Model Selection in the Presence of Missing Data

Model Selection and Missing Data 1

[Data matrix for the stork example with missing values coded 999:]

ID   Stork  Urban  Babies
...   ...    ...    ...
20     3      7     13
21     1      4     11
22   999      4      9
23     3      6     11
24     4      6      9
25     8      7     16
26    11    999    999
27     5      3      7
28     5      5      8
29   999      6     14
30     6      6     10
31     7      5    999
32     8      8     10
33   999    999      8
34     2      2      1
35     4      4      8
...   ...    ...    ...

Model Selection and Missing Data 2

Situation 1: The data are MAR when the statistical model is equal to the imputation model.

In Mplus, both the misfit and the complexity of the DIC are computed using only the observed data and parameter values that are sampled and estimated while using the statistical model to impute the missing values.
This is a valid procedure that can be used without hesitation.

DIC = misfit + complexity = misfit + estimated number of parameters

Model Selection and Missing Data 3

BIC = misfit + complexity = misfit + log(N) x P

In Mplus the misfit of the BIC is computed using only the observed data and parameter values that are sampled and estimated while using the statistical model to impute the missing values.
The complexity is estimated as the log of the number of persons (N) multiplied by the number of parameters (P) in the statistical model. As yet it is unknown how N should be determined in the presence of missing data. Mplus uses the total sample size, but this is an ad hoc and unmotivated choice.
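To see how much the ad hoc choice of N matters, the BIC decomposition from the slide can be evaluated for different candidate sample sizes. The misfit value, parameter count, and sample sizes below are hypothetical, chosen only for illustration:

```python
from math import log

def bic(misfit, n, p):
    """BIC = misfit + log(N) * P, the decomposition used in the slides."""
    return misfit + log(n) * p

# hypothetical numbers: misfit 46000, P = 33 parameters,
# 850 persons of whom 26 have missing values
misfit, p = 46000.0, 33
print(bic(misfit, 850, p))        # N = total sample size (the Mplus choice)
print(bic(misfit, 850 - 26, p))   # N = complete cases only
```

With this little missingness the two choices of N shift the BIC only slightly, but the ambiguity remains, which motivates the advice below.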
Currently it is not advisable to use the BIC in the presence of missing data.

Model Selection and Missing Data 4

Situation 2: The statistical model is consistent with the imputation model, and, given the imputation model, the missing values are MAR.

Using a three-step procedure, Mplus can be used to compute the DIC while accounting for the fact that some of the data are missing:

1. Multiply impute the data using the imputation model.
2. For each imputed data matrix compute the DIC using Mplus.
3. Average the DICs obtained for the M imputed data matrices.

The result is DIC4 as discussed by Celeux et al. (2006). This is not the definitive answer to the computation of the DIC in the presence of missing data, but at least there is some support for this approach in the scientific literature. One is well advised to use the Monte Carlo approach from Mplus to evaluate in each new situation how well DIC4 performs. It is beyond the scope of this course to show how this can be done. Note that using MplusAutomation this can be implemented relatively easily (as opposed to doing it manually); however, this too is beyond the scope of this course.

References Model Selection and Missing Data

A paper about the computation of the DIC in the presence of missing data:
Celeux, G., Forbes, F., Robert, C.P., and Titterington, D.M. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1, 651-674.
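The three-step procedure described above (impute, compute a DIC per imputed data matrix in Mplus, average) reduces to a one-line average in step 3. A minimal Python sketch with fabricated DIC values (the numbers below are hypothetical, not Mplus output):

```python
def dic4(dics):
    """Step 3: average the DICs obtained for the M imputed data matrices.

    This yields DIC4 in the sense of Celeux et al. (2006); steps 1 and 2
    (multiple imputation and the per-data-set DICs) are done in Mplus.
    """
    return sum(dics) / len(dics)

# hypothetical DICs from M = 5 imputed data matrices, for two models
dics_m1 = [46237.3, 46241.0, 46235.8, 46239.4, 46238.1]
dics_m2 = [46008.6, 46012.2, 46007.1, 46010.9, 46009.5]
# a positive difference DIC4(M1) - DIC4(M2) favours M2
print(dic4(dics_m1) - dic4(dics_m2))
```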
A paper about the difference between the imputation and analysis model in the context of missing data
Kuiper, R.M. and Hoijtink, H. (2011). How to handle missing data for predictor selection in regression models using the AIC. Statistica Neerlandica, 65, 489-506.
MplusAutomation is developed by Michael Hallquist. If you google "CRAN MplusAutomation" you will find the website from which the R package and its documentation can be downloaded.