Bayesian Structural Equation Modelling Using Mplus
Overview: Major Steps in the Bayesian Approach to Data Analysis
Research Question
- Estimation
- Model fit
- Hypothesis evaluation and model selection
The Data to be Collected
- Variables and sample size
- How to enter the data into Mplus
- Missing data
The Statistical Model
- How to specify a statistical model in Mplus
- How to specify an imputation model in Mplus
The Prior Distribution
- Default
- Uninformative, and how to specify it in Mplus
- Informative, and how to specify it in Mplus
The Posterior Distribution
- Estimates and credible intervals
- How to check convergence
- Model fit
- Hypothesis evaluation and model selection
- How to interpret Mplus output
Lecture 1: Bayesian Estimation
Data, Research Question, and Statistical Model
Research Question?
The Data (N = 65): a data matrix with columns ID, Stork, Urban, and Babies, e.g. case 20: 3, 7, 13; case 21: 1, 4, 11; ...
title: Mediation Model for the Stork Data;
data: file = stork.txt;
variable: names = ID stork urban babies; usev = stork urban babies;
The Statistical Model - 1
model: urban on stork (a); babies on urban stork (b c); [urban] (d); [babies] (e); urban (f); babies (g);
[Path diagram: Stork -> Urban (a), Urban -> Babies (b), Stork -> Babies (c), with residual variances f and g.]
The Statistical Model - 2
Urban = d + a Stork + error, with error ~ N(0, f)
The Statistical Model - 3
Babies = e + c Stork + b Urban + error, with error ~ N(0, g)
Lecture 1: Bayesian Estimation
Introducing Prior, Posterior, and Sampling Based Estimation Using One Variable
The Prior Distribution - 1 - Introduction - Non-Informative Prior Distribution
A simple example based on expert elicitation: how many babies are born per 1,000 inhabitants per year in the Netherlands?
[Data: a matrix with columns ID and Babies.]
The mean is 9 and the standard error of the mean is .5; this means that the data tell us that between 8 and 10 babies are born.
Note that I computed a confidence interval for the mean using 9 +/- 2 x .5; the value 2 is a rounding of 1.96, the more precise value for the computation of a 95% confidence interval.
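The 9 +/- 2 x .5 computation generalizes to any sample. A minimal sketch, using a small hypothetical sample of births per 1,000 inhabitants (illustrative values only; the actual course data have N = 65, mean 9, and standard error .5):

```python
import math

# Hypothetical sample of births per 1,000 inhabitants per year
# (illustrative only; the course data have N = 65, mean 9, SE .5).
babies = [13, 11, 9, 11, 9, 16, 16, 7, 8, 14, 10, 11, 10, 8, 1, 8]

n = len(babies)
mean = sum(babies) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in babies) / (n - 1))
se = sd / math.sqrt(n)

# 95% confidence interval for the mean; 1.96 is the precise normal
# quantile that the slide rounds to 2.
lower, upper = mean - 1.96 * se, mean + 1.96 * se
```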
No prior information was used, that is, an uninformative prior distribution was used. model: [babies] (a); babies (b); The Prior Distribution - 2 - Introduction - Informative Prior Distribution Expert Elicitation:
I assume that in each region containing 1,000 persons, the age distribution is uniform between 0-100 years of age.
This means that each year 200 persons are between 20 and 40 years of age (the fertile years), which renders 80 couples and 40 bachelors.
On average I expect each couple to have 2 children, that is, 160 children over the course of 20 years. This means 8 children per year per region containing 1,000 persons.
In my line of argument I'm most uncertain about the uniform age distribution. I know the number of elderly is increasing, so maybe there are only 160 persons between 20 and 40 years of age: 64 couples, 128 children, about 6 children per year. On the other hand, there may still be fewer elderly than young, so maybe 240 persons: 96 couples, 192 children, about 10 per year.
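The arithmetic of this elicitation can be written out explicitly. A sketch; the function name is hypothetical, and the 80% coupling fraction encodes the "80 couples and 40 bachelors out of 200 persons" assumption above:

```python
def expected_births(persons_20_to_40, children_per_couple=2, fertile_years=20):
    """Hypothetical helper formalizing the elicitation argument."""
    # 80% of the 20-40 year olds live in couples (80 couples and
    # 40 bachelors out of 200 persons), and each couple has
    # children_per_couple children spread over fertile_years years.
    couples = persons_20_to_40 * 0.8 / 2
    return couples * children_per_couple / fertile_years

center = expected_births(200)  # the elicited expectation: 8 per year
low = expected_births(160)     # more elderly: about 6 per year
high = expected_births(240)    # fewer elderly: about 10 per year
```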
In summary, I expect 8, but my credible interval is between 6 and 10, which means my personal standard error is 1 (8 +/- 2 x 1 gives my credible interval).
The Prior Distribution - 3 - Introduction - The Normal Prior Distribution
Used for means and regression coefficients.
[Plots of three normal prior distributions for a, with increasing variance:]
MODEL PRIORS: a ~ N(8,1);
MODEL PRIORS: a ~ N(8,9);
MODEL PRIORS: a ~ N(8,100000);
The Prior Distribution - 4 - Introduction - The Inverse Gamma Prior Distribution
Used for variances.
MODEL PRIORS: b ~ IG(.001,.001);   (uninformative, proper)
MODEL PRIORS: b ~ IG(-1,0);   (the default in Mplus: uninformative, improper)
The Posterior Distribution - 1 - Introduction - Combining Data Knowledge and Prior Knowledge
[Plot of the prior (centered at 8), the data (centered at 9), and the posterior for a, the mean number of babies.]
The Posterior Distribution - 2 - Introduction
The posterior distribution combines the information with respect to the mean number of babies in the data with the information in the prior distribution. This combination is executed by Mplus.
Using sampling, the information in the posterior distribution with respect to the mean number of babies is made accessible: 9.1, 7.9, 8.3, 9.9, 7.1, ...
For a, the MCMC sample renders:
- Estimate: mean or median
- SD
- Credible Interval: central or highest posterior density
analysis: estimator = bayes; process = 2; fbiter = 100000; point = median;
output: tech1 tech8 standardized(stdyx) cinterval(hpd);
plot: type = plot1 plot2 plot3;
The Posterior Distribution - 3 - Introduction
model: [babies] (a); babies (b);
MODEL PRIORS: a ~ N(8,1); b ~ IG(.001,.001);
model: [babies] (a); babies (b);
MODEL PRIORS: a ~ N(0,100000); b ~ IG(.001,.001);

A Non-Informative Prior Distribution for the Mean Number of Babies:
              Estimate   S.D.   Lower 2.5%   Upper 2.5%
Means BABIES    9.078   0.443     8.203        9.945

An Informative Prior Distribution for the Mean Number of Babies:
              Estimate   S.D.   Lower 2.5%   Upper 2.5%
Means BABIES    8.904   0.405     8.098        9.688

The Posterior Distribution - 4 - Introduction
Lecture 1: Bayesian Estimation
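Mplus obtains the posterior by MCMC, but for a normal mean the combination of prior and data can also be sketched in closed form. A sketch assuming the prior N(8,1) and the data summary (mean 9, standard error .5) from the slides; precisions (inverse variances) add:

```python
# Conjugate normal-normal update for a mean (sketch; Mplus itself
# samples from the posterior rather than using this closed form).
prior_mean, prior_var = 8.0, 1.0   # MODEL PRIORS: a ~ N(8,1);
data_mean, data_se = 9.0, 0.5      # summary of the babies data

prior_prec = 1.0 / prior_var
data_prec = 1.0 / data_se ** 2

post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
post_sd = post_prec ** -0.5

# The posterior mean (8.8) is pulled from the data mean 9 toward the
# prior mean 8, and the posterior SD is smaller than the data SE,
# roughly mirroring the 8.904 (0.405) reported by Mplus.
```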
Introducing Prior, Posterior, and Sampling Based Estimation Using the Stork Data (three variables) and Uninformative Priors
The Prior Distribution - 5 - Uninformative Prior Distributions for the Stork Data
User Specified:
MODEL PRIORS: a ~ N(0,100000); b ~ N(0,100000); c ~ N(0,100000); d ~ N(0,100000); e ~ N(0,100000); f ~ IG(.001,.001); g ~ IG(.001,.001);
Mplus Default:
MODEL PRIORS: a ~ N(0,Infinity); b ~ N(0,Infinity); c ~ N(0,Infinity); d ~ N(0,Infinity); e ~ N(0,Infinity); f ~ IG(-1,0); g ~ IG(-1,0);
model: urban on stork (a); babies on urban stork (b c); [urban] (d); [babies] (e); urban (f); babies (g);
The Posterior Distribution - 5 - Bayesian Estimation Using Markov Chain Monte Carlo Methods
model constraint: new(indirect); indirect = a*b;
[Table: fbiter rows of sampled values for a, b, c, d, e, f, g, and indirect, starting from initial values, e.g.
 .35  1.14  -.11  2.89  4.00  3.46  7.15  .42
 .29  1.69  -.32  1.75  5.10  3.01  7.30  .49]
analysis: estimator = bayes; process = 2; fbiter = 100000; point = median;
output: tech1 tech8 standardized(stdyx) cinterval(hpd);
plot: type = plot1 plot2 plot3;
The Posterior Distribution - 6 - Output Computed Using the MCMC Sample
[Histogram of the posterior of Babies on Stork.]
The Posterior Distribution - 7 - Histograms, Estimates and Credible Intervals
[Histogram of the posterior of the Indirect effect.] Note that the credible interval is not symmetric!
The Posterior Distribution - 8 - Histograms, Estimates and Credible Intervals
MODEL RESULTS            Posterior  One-Tailed       95% C.I.
              Estimate     S.D.      P-Value   Lower 2.5%  Upper 2.5%
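The asymmetry of the indirect effect's credible interval is easy to reproduce: the interval is read off the sampled products a*b, and a product of symmetric draws need not be symmetric. A sketch with simulated draws; the means .37 and 1.14 loosely mimic the slides' estimates and are not the actual Mplus output:

```python
import random

random.seed(1)
M = 100_000

# Simulated posterior draws for a and b (illustrative values only).
a_draws = [random.gauss(0.37, 0.07) for _ in range(M)]
b_draws = [random.gauss(1.14, 0.19) for _ in range(M)]

# One indirect-effect value per MCMC iteration.
indirect = sorted(a * b for a, b in zip(a_draws, b_draws))

median = indirect[M // 2]
lower = indirect[int(0.025 * M)]   # central 95% credible interval
upper = indirect[int(0.975 * M)]

# (upper - median) generally differs from (median - lower):
# the interval is not forced to be symmetric around the estimate.
```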
Residual Variances
 URBAN    0.696  0.085  0.000  0.536  0.866
 BABIES   0.553  0.091  0.000  0.381  0.735
The Posterior Distribution - 12 - Standardized Parameter Estimates and Credible Intervals
Lecture 1: Bayesian Estimation
INTERMEZZO: P-values
- If the 90% CI touches 0, the one-tailed p-value is .05.
- If the 95% CI touches 0, the one-tailed p-value is .025.
- For approximately normal posterior distributions, multiplication by 2 renders a two-tailed p-value.
[Illustration using the posterior of Urban On Stork (a).]
The Posterior Distribution - 10 - The One-Tailed P-Value
Problems with p-values (for example, .05): "Surely, God loves the .06 nearly as much as the .05"; publication bias; multiple hypothesis testing and capitalization on chance.
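The relation between credible intervals and the one-tailed p-value can be checked directly from posterior draws: the one-tailed p-value is the posterior mass on the other side of zero. A sketch with simulated draws whose mean and SD loosely mimic the Babies on Stork coefficient (-0.111, SD 0.124) shown later in this lecture; these are not the actual Mplus chain:

```python
import random

random.seed(2)

# Simulated posterior draws for a negative regression coefficient
# (illustrative values; not the actual Mplus output).
draws = [random.gauss(-0.111, 0.124) for _ in range(50_000)]

# One-tailed p-value: posterior mass on the other side of zero.
p_one_tailed = sum(d > 0 for d in draws) / len(draws)

# For an approximately normal posterior, doubling gives a
# two-tailed p-value, as stated on the slide.
p_two_tailed = 2 * min(p_one_tailed, 1 - p_one_tailed)
```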
Credible Intervals and Confidence Intervals
- What is the value of the parameter of interest?
- Is the parameter positive, negative, or is zero also in the ballpark?
- With multiple parameters there is still capitalization on chance.
Model Selection
- Compare a few carefully chosen models.
- Very powerful in combination with credible intervals and standardized estimates.
The Posterior Distribution - 11 - p-values, credible intervals, and model selection Lecture 1: Bayesian Estimation
Introducing Prior, Posterior, and Sampling Based Estimation Using the Stork Data (three variables) and Informative Priors
The Prior Distribution - 6 - Informative, Based on Historical Data
[The current data: the Stork data matrix with columns ID, Stork, Urban, Babies. The historical data: a matrix of 80 persons with columns ID, Stork, Urban.]
Analysis of the historical data with model: urban on stork (a); [urban]; urban; renders MODEL RESULTS: URBAN ON STORK, Estimate 0.400, S.D. 0.050, that is, a ~ N(.400,.0025).
How relevant are the historical data for the current data, which are from the Netherlands in 2010?
a ~ N(.400,.0025) is the information rendered by 80 persons; a ~ N(.400,.025), with the variance multiplied by 10, is the information rendered by 8 persons (the variance of the estimate is inversely proportional to the sample size).
What if the historical data are from Morocco in 1920? Relevance = 10%, that is, 10% of the information in the historical data can be added to the current data; 10% of 80 is 8 persons. What if the historical data are from the Netherlands in 1920? Relevance = 25%.
What if the historical data are from Germany in 2008? Relevance = 60%. The construction of informative prior distributions is NOT an exact science! It is SUBJECTIVE and should be motivated by INTER-PEER-AGREEMENT about the choices made. Note that also in META-ANALYSIS studies have to be weighted with respect to their RELEVANCE. This approach poses many DIFFICULTIES but also many OPPORTUNITIES. It is not an ESTABLISHED and WELL-RESEARCHED approach. It is a BABY that is about to be delivered by a STORK.
The Prior Distribution - 7 - Informative, Based on Historical Data
User Specified:
MODEL PRIORS: a ~ N(.400,.0025); b ~ N(0,100000); c ~ N(0,100000); d ~ N(0,100000); e ~ N(0,100000); f ~ IG(.001,.001); g ~ IG(.001,.001);
model: urban on stork (a); babies on urban stork (b c); [urban] (d); [babies] (e); urban (f); babies (g);
Suppose the data are collected by another research group in the Netherlands in 2010.
The Prior Distribution - 8 - Informative, Based on Historical Data
MODEL RESULTS with MODEL PRIORS: a ~ N(0,100000);
                  Estimate   S.D.   Lower 2.5%   Upper 2.5%
URBAN ON STORK      0.375    0.072     0.236        0.517
INDIRECT            0.422    0.108     0.225        0.644
MODEL RESULTS with MODEL PRIORS: a ~ N(.400,.0025);
                  Estimate   S.D.   Lower 2.5%   Upper 2.5%
URBAN ON STORK      0.391    0.041     0.314        0.473
INDIRECT            0.444    0.086     0.283        0.621
The result of using subjective priors is a gain in information (note the smaller posterior S.D.s). But do you trust this? Would you be willing to use and defend this approach?
The Posterior Distribution - 12 - Comparing Results from Uninformative and Informative Priors
The Prior Distribution - 9 - Extra Tools for the Specification of Informative Priors
MODEL PRIORS: b ~ N(0,1); c ~ N(0,1); COVARIANCE(b,c) = 0.5;
output: tech1 tech3 tech8 standardized(stdyx) cinterval(hpd);
Summary
Research Question
Statistical Model
Prior Distribution - Informative Prior Distributions
Posterior Distribution - Asymmetric Credible Intervals - Small Sample Inferences, no Asymptotic Approximations - No Heywood Cases, Like, for Example, Negative Variances - Sampling will often Work where Maximum Likelihood Fails
References Bayesian Structural Equation Modelling A relatively accessible introduction to Bayesian structural equation modeling can be found in:
Kaplan, D. and Depaoli, S. (2012). Bayesian Structural Equation Modeling. In R.H. Hoyle (Ed.), Handbook of Structural Equation Modeling, pp. 650-673. New York: The Guilford Press.
A classic about the elicitation of prior knowledge is:
O'Hagan, A., Buck, C.E., Daneshkhah, A., Eiser, J.R., Garthwaite, P.H., Jenkinson, D.J., Oakley, J.E., and Rakow, T. (2006). Uncertain Judgements: Eliciting Experts' Probabilities. Chichester: Wiley.
A classic introduction to Bayesian data analysis is:
Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (2004). Bayesian Data Analysis. Boca Raton, FL: Chapman & Hall/CRC.
The documentation provided by Mplus is:
Muthén, B. (2010). Bayesian analysis in Mplus: A brief introduction.
Asparouhov, T. and Muthén, B. (2010). Bayesian analysis in Mplus: Technical implementation.
Lecture 2: Bayesian Estimation in the Presence of Missing Data
Introduction
[The Stork data matrix with missing values coded 999, e.g. case 22 missing Stork; case 26 missing Urban and Babies; case 29 missing Stork; case 31 missing Babies; case 33 missing Stork and Urban.]
variable: names = ID stork urban babies; usev = stork urban babies; missing = all (999);
Missing Data - 1 - Introduction
Missing Data - 2 - Introduction
By default Mplus with analysis: estimator = bayes; will use the statistical model that is specified to impute the missing data.
First I will explain what is meant by imputation of the missing data.
Secondly, I will explain why it is usually NOT a good idea to use the statistical model that is specified to impute the missing data.
One exception occurs if the amount of missing values is very small. A good question is what is a small amount of missing values?
Another exception occurs if missings occur in variables that are ONLY a dependent variable and if the missingness is MAR given the predictors of the dependent variable.
Third, I will introduce multiple imputation:
- using a general imputation model;
- analysis of each imputed data set using a statistical model that is consistent with the imputation model;
- summarizing the results obtained from the analysis of each imputed data set.
Lecture 2: Bayesian Estimation in the Presence of Missing Data
Multiple Imputation
Multiple Imputation Using the Statistical Model - 1
Multiple Imputation Using the Statistical Model - 2
[Table: fbiter rows of sampled values for the parameters a, b, c, d, e, f, g and for the missing entries (22-Stork, 26-Urban, 26-Babies, 29-Urban, 31-Babies, 33-Stork, 33-Urban), starting from initial values, e.g.
 .35  1.14  -.11  2.89  4.00  3.46  7.15  |  5  5  12  7   9  2  3
 .29  1.69  -.32  1.75  5.10  3.01  7.30  |  7  3  11  5  10  3  4]
MODEL RESULTS            Posterior  One-Tailed       95% C.I.
              Estimate     S.D.      P-Value   Lower 2.5%  Upper 2.5%
BABIES ON
 URBAN          1.143      0.185      0.000      0.781       1.509
 STORK         -0.111      0.124      0.181     -0.356       0.131
New/Additional Parameters INDIRECT 0.422 0.108 0.000 0.225 0.644 Lecture 2: Bayesian Estimation in the Presence of Missing Data
Data that are not Missing at Random
[The data matrix again, with missing values coded 999.]
Multiple Imputation Using the Statistical Model - 3 - Data that are NOT Missing at Random
Multiple Imputation Using the Statistical Model - 4 - Data that are NOT Missing at Random
Multiple Imputation Using a General Imputation Model - 1 - Data that are Missing at Random
[The data matrix with missing values coded 999.]
model: stork with urban; stork with babies; urban with babies; [stork]; [urban]; [babies];
Lecture 2: Bayesian Estimation in the Presence of Missing Data
How to do it in Mplus Multiple Imputation Using a General Imputation Model - 2 - How to do it in Mplus title: this is an example of multiple imputation for a set of variables with missing values using a general statistical model;
data: FILE = storkMI.txt;
variable: names = ID stork urban babies; auxiliary = ID; usevariables = stork urban babies; missing = all (999);
analysis: estimator = bayes; fbiter = 10000; process = 2; data imputation: impute = stork urban babies; ndatasets = 10; thin = 1000; save = storkimp*.dat;
model: stork with urban babies; urban with babies; [stork]; [urban]; [babies];
Use enough variables in the imputation model to feel confident that MAR is a reasonable assumption. There may be variables in the imputation model that do not appear in the statistical model.
Can we in our example think of variables that could be very good predictors of missing data and that are not part of the statistical model?
Never use too many variables in the imputation model. A rule of thumb is 1 variable for every 20 cases in the data file. But this is only a rule of thumb!
Creating a good imputation model is partly ART, partly SKILL, and rather BAYESIAN because it requires careful prior thinking, that is, thinking without using empirical data.
title: Mediation Model for the Stork Data;
data: file = storkimplist.dat; type = imputation;
variable: names = stork urban babies ID; usev = stork urban babies; missing = all (999); Multiple Imputation Using a General Imputation Model - 5 - How to do it in Mplus model: urban on stork (a); babies on urban stork (b c); [urban] (d); [babies] (e); urban (f); babies (g);
model constraint: new(indirect); indirect = a*b;
analysis: estimator = ml;
output: standardized(stdyx); Note the difference between the imputation model and the statistical model!!
It is also quite common that the statistical model contains only a subset of the variables used in the imputation model.
Multiple Imputation Using a General Imputation Model - 6 - Analyse Each Imputed Data Set
[The data matrix with missing values, and m = 1, ..., M imputed data matrices in which each 999 is replaced by an imputed value.]
[Per imputed data set, estimates and SDs for the intercept of BABIES, e.g. 10.109 (1.303), 9.843 (1.221), 10.567 (1.432), 9.992 (1.271), ...; pooled: Estimate 10.002, SD 1.672, Rate of Missing Information .22.]
Multiple Imputation Using a General Imputation Model - 7 - Relative Efficiency
Relative efficiency = 1 / (1 + rate/M)
For the example on the previous transparency: Relative efficiency = 1 / (1 + .22/10) = .98
Multiple Imputation Using a General Imputation Model - 8 - Summarize the Multiple Analyses
INDIRECT 0.395 0.114 3.462 0.001 0.184
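The pooling across the M analyses follows Rubin's rules. A sketch with hypothetical per-imputation estimates and standard errors (not the actual course output); the last line reproduces the relative-efficiency computation from the slide:

```python
import math

# Hypothetical estimates and SEs for one parameter from M = 10
# imputed data sets (illustrative values only).
estimates = [10.1, 9.8, 10.6, 10.0, 9.9, 10.3, 9.7, 10.2, 10.1, 9.5]
ses = [1.30, 1.22, 1.43, 1.27, 1.25, 1.31, 1.28, 1.33, 1.29, 1.26]

M = len(estimates)
qbar = sum(estimates) / M                       # pooled estimate
ubar = sum(s ** 2 for s in ses) / M             # within-imputation variance
b = sum((q - qbar) ** 2 for q in estimates) / (M - 1)  # between-imputation variance
t = ubar + (1 + 1 / M) * b                      # total variance
pooled_se = math.sqrt(t)
rate = (1 + 1 / M) * b / t                      # approximate rate of missing information

# The slide's example: rate .22 with M = 10 imputations.
efficiency = 1 / (1 + 0.22 / 10)                # about .98
```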
STDYX Standardization Two-Tailed Rate of Estimate S.E. Est./S.E. P-Value Missing
URBAN ON
 STORK      0.536  0.095   5.633  0.000  0.123
BABIES ON
 URBAN      0.693  0.110   6.307  0.000  0.234
 STORK     -0.123  0.124  -0.986  0.324  0.152
Intercepts
 URBAN      1.335  0.299   4.463  0.000  0.059
 BABIES     1.286  0.343   3.755  0.000  0.109
Residual Variances
 URBAN      0.712  0.101   7.026  0.000  0.120
 BABIES     0.593  0.105   5.626  0.000  0.183
R-SQUARE
 URBAN      0.288  0.101   2.842  0.004  0.120
 BABIES     0.407  0.105   3.867  0.000  0.183
Multiple Imputation Using a General Imputation Model - 9 - Estimator is Bayes
Currently, Mplus does not allow the combination of multiple imputation and estimator = bayes.
Using the R package MplusAutomation it is rather easy to run Mplus 10 times to analyse 10 imputed data sets, and to combine the 10 analyses into one overall result.
However, this is beyond the scope of the current course.
Lecture 2: Bayesian Estimation in the Presence of Missing Data
A Closer Look at the Imputation Model
Multiple Imputation Using a General Imputation Model - 10 - Consistency
Multiple Imputation Using a General Imputation Model - 11 - Consistency
Multiple Imputation Using a General Imputation Model - 12 - Non-Consistency
[Path diagrams comparing the statistical model (Stork -> Urban -> Babies) with imputation models that do or do not contain the same relations, for example an imputation model omitting Urban, or statistical models adding Stork*Stork or Stork*Urban terms that are absent from the imputation model.]
Multiple Imputation Using a General Imputation Model - 13 - Non-Consistency
Summary: Imputation model and statistical model
Does the imputation model render data that are missing at random?
Are the imputation model and the statistical model congenial?
The combination of multiple imputation with estimator = ML is possible in Mplus. The combination with estimator = Bayes is not possible.
References Missing Data A non-technical introduction to missing data analysis and multiple imputation can be found in:
Schafer, J.L. and Graham, J.W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.
Classic books about missing data analysis and multiple imputation are
Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.
Van Buuren, S. (2012). Flexible imputation of missing data. Boca Raton, FL: Chapman & Hall/CRC.
An important paper with respect to consistency is:
Meng, X.-L. (1994). Multiple-imputation inferences with uncongenial sources of input. Statistical Science, 9, 538-573.
The documentation provided by Mplus is:
Asparouhov, T. and Muthén, B. (2010). Multiple imputation with Mplus.
MplusAutomation was developed by Michael Hallquist. It can be found at www.statmodel.com: under the tab How-To, choose Using Mplus via R.
Lecture 3: Model Fit
Model Fit 1 - The Covariance Matrix
[The Stork data matrix with columns ID, Stork, Urban, Babies.]
        S      U      B
S     10.7
U      4.0    4.8
B      3.4    5.1   12.2
The observed covariance matrix displays the relation between each pair of variables in the data matrix. The model implied covariance matrix is a reconstruction of the observed covariance matrix using the statistical model at hand.
Model Fit 2 - What is model fit? Why is it important?
Model Implied Covariance Matrices:
Observed = Model Implied (9 model parameters):
        S      U      B
S     10.7
U      4.0    4.8
B      3.4    5.1   12.2
Model Implied (7 model parameters):
        S      U      B
S     10.7
U      4.0    4.8
B      3.4    5.1   12.2
Model Implied (6 model parameters):
        S      U      B
S     10.7
U      0      4.8
B      0      0      2.2
Model Fit 3
The chi square test is computed for each statistical model. It is a function of:
- the observed covariance matrix;
- the model implied covariance matrix;
- the difference between the number of parameters of the current and the saturated statistical model.
It is a measure of the size of the difference between the observed and implied covariance matrices.
The larger the size of the difference, that is, the larger the chi square value, the less a statistical model is able to reconstruct the observed covariance matrix.
The hypothesis that is tested using the chi square test states that the observed covariance matrix can adequately be reconstructed by the current statistical model.
Model Fit 4
Using the observed data and the statistical model at hand:
- in each MCMC iteration, parameters are sampled;
- these are used to replicate data and to impute the observed missings, rendering pairs Xobs, Xrep;
- the chi-square test is computed using the parameters and the observed-imputed and replicated data, rendering pairs CHIobs, CHIrep;
- the proportion of pairs in which CHIrep is larger than CHIobs is the posterior predictive p-value.
Model Fit 5
Model Fit 6
MODEL FIT INFORMATION
Number of Free Parameters 6
Bayesian Posterior Predictive Checking using Chi-Square
95% Confidence Interval for the Difference Between the Observed and the Replicated Chi-Square Values
48.046 71.430
Posterior Predictive P-Value 0.000 Posterior predictive p-values around .50 indicate a model that for all practical purposes is well fitting. Note that this approach provides a rough model check and not a classical evaluation of an hypothesis using a p-value. References Model Fit This model fit test was proposed by:
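The posterior predictive p-value itself is just a proportion over the MCMC iterations. A sketch in which the pairs of observed-data and replicated-data chi-square values are simulated (in Mplus they come from the actual posterior predictive check); the first scenario mimics the badly fitting model above, the second a well-fitting one:

```python
import random

random.seed(3)
n_iter = 20_000

# Badly fitting model: the observed discrepancy is systematically
# larger than the replicated one (simulated, illustrative values).
chi_obs = [random.gauss(60.0, 5.0) for _ in range(n_iter)]
chi_rep = [random.gauss(3.0, 2.0) for _ in range(n_iter)]
ppp = sum(r > o for o, r in zip(chi_obs, chi_rep)) / n_iter  # near 0

# Well-fitting model: observed and replicated discrepancies come
# from the same distribution, so the proportion is near .50.
chi_obs_ok = [random.gauss(5.0, 3.0) for _ in range(n_iter)]
chi_rep_ok = [random.gauss(5.0, 3.0) for _ in range(n_iter)]
ppp_ok = sum(r > o for o, r in zip(chi_obs_ok, chi_rep_ok)) / n_iter
```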
Scheines, R., Hoijtink, H., and Boomsma, A. (1999). Bayesian Estimation and Testing of Structural Equation Models. Psychometrika, 64, 37-52.
Who based it on the work by:
Gelman, A., Meng, X-L, and Stern, H. (1996). Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6, 733-807.
The documentation provided by Mplus is:
Asparouhov, T. and Muthén, B. (2010). Bayesian analysis in Mplus: Technical implementation.
Lecture 4: Model Selection Using the Bayes Factor, BIC and DIC
What is a Model?
Model Selection 1 - Introduction - What is a model?
[Three path diagrams relating Stork, Urban, and Babies.]
Model Selection 2 - Introduction - What is a model?
[A factor model with factors IQ, AA, and LA and their indicators.]
Model Selection 3 - Introduction - What is a model?
Babies = a + b Stork + error
MODEL PRIORS: a ~ N(4,1); b ~ N(1,1)
MODEL PRIORS: a ~ N(4,1); b ~ N(4,1)
Lecture 4: Model Selection Using the Bayes Factor, BIC and DIC
What is the Goal of Model Selection? Model Selection 4 Introduction What is the goal of model selection? To select the best model from the models that are under consideration.
What is the best model? There are multiple answers to this question. Later in this lecture we will introduce two options: The model that has the smallest distance to the true model (DIC) The model that maximizes the probability of the data (Bayes factor and BIC) But all answers involve an evaluation of the misfit and complexity of each model.
What if the models are all wrong?
What if the true model is not in the set of models under consideration?
All models are wrong but some are useful
Should the null-hypothesis be among the models under consideration?
Should the alternative hypothesis be among the models under consideration? It can serve as a fail-safe for the models under consideration. A model with restrictions is only a good model if it is better than the corresponding model without restrictions. Model Selection 5 Introduction = Model Selection 6 Introduction Why is model selection consistent with the empirical cycle? Observation (exploratory research!!) Induction: from observations to a theory Deduction: deriving testable consequences from the theory, that is, models or hypotheses Testing: confrontation of models or hypotheses with empirical data Model Selection 7 Introduction Why is Bayesian inference consistent with the empirical cycle? Observation (exploratory research!!) Induction: from observations to a theory Deduction: deriving testable consequences from the theory, that is, models or hypotheses Testing: confrontation of models or hypotheses with empirical data Prior knowledge and prior thinking Plausible models, probably not the true model Select the best model = the current state of knowledge Remember the earth is flat, the earth is round, and the earth is shaped somewhat like an American football. This too is sequential theory updating using new data as they become available. Model Selection 8 Introduction Model selection and sequential updating of scientific theories avoids Publication bias Multiple hypotheses testing and capitalization on chance
In model selection all models (also the null-model) are on equal footing. This implies that model selection can be used to quantify support for the null-hypothesis.
Model selection renders a very powerful approach (in contrast to exploratory research, many possible models are a priori excluded), especially in combination with credible intervals and standardized estimates. This will be highlighted in the example that will be presented in the next lecture. Lecture 4: Model Selection Using the Bayes Factor, BIC and DIC
Information Criteria Model Selection 1 Information Criteria IC = misfit + complexity The smaller the value of IC the better the model at hand. Because: We like well-fitting models We like parsimonious, that is specific, not-complex models because we can derive good predictions from them misfit is determined by the posterior distribution of the model parameters
complexity is a function of the number of parameters in model and the amount of information in the prior distribution
to illustrate the main features, a number of examples will be given
Model Selection 2-4 - Information Criteria
[Scatter plots of x and y with fitted lines and curves of increasing complexity. For each: What is the predicted y-value? What is the fit of this model? What is the complexity of this model?]
Model Selection 5 - Information Criteria - Stork cannot Predict Babies
Population correlation = 0, N = 100. Competing models: Stork -> Babies versus Stork and Babies unrelated.
DIC = 274.67  misfit = 268.45  par = 3.11
BIC = 282.30 misfit = 268.38 par = 3.00 DIC = 272.23 misfit = 268.65 par = 1.89
BIC = 277.61 misfit = 268.39 par = 2.00 Model Selection 6 Information Criteria Stork can Predict Babies Stork Babies population correlation = .6, N=100 Stork Babies Stork Babies competing models DIC = 229.54 misfit = 223.32 par = 3.11
BIC = 237.07 misfit = 223.25 par = 3.00 DIC = 273.48 misfit = 269.70 par = 1.89
BIC = 278.86 misfit = 269.65 par = 2.00 TITLE: Illustrate misfit and complexity;
MONTECARLO: NAMES ARE y x; NOBSERVATIONS = 10000; NREPS = 1; SEED = 123;
MODEL POPULATION: y ON x * .6; [y * 0]; y * .64; [x * 0]; x * 1; analysis: estimator = bayes;
MODEL PRIORS: a ~ N(.6,.01);
MODEL: y ON x (a);
OUTPUT: TECH9; Model Selection 7 Information Criteria DIC and BIC can not Evaluate Models that Differ in the Prior Simulate a data matrix Analyse the simulated data matrix Specification of the simulation model Specification of the simulation study y = a + b x + error and error ~ N(0,s2)
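The same kind of comparison can be mimicked outside Mplus. A sketch that simulates data as in the Monte Carlo setup above (slope .6, residual variance .64) and compares BIC = -2 log-likelihood + (number of parameters) x log(N) for a regression model with and without the predictor, using ordinary least squares; this illustrates the BIC logic, not the Mplus computation:

```python
import math
import random

random.seed(4)
n = 100

# Simulate y = .6 x + e with Var(e) = .64, so cor(x, y) = .6.
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.6 * xi + random.gauss(0, 0.8) for xi in x]

def gauss_loglik(resid, sigma2):
    # Gaussian log-likelihood of the residuals.
    return sum(-0.5 * (math.log(2 * math.pi * sigma2) + r * r / sigma2)
               for r in resid)

def bic(loglik, n_par, n_obs):
    return -2 * loglik + n_par * math.log(n_obs)

# M1: intercept only (2 parameters: mean and residual variance).
mean_y = sum(y) / n
res1 = [yi - mean_y for yi in y]
bic1 = bic(gauss_loglik(res1, sum(r * r for r in res1) / n), 2, n)

# M2: regression on x (3 parameters: intercept, slope, variance).
mean_x = sum(x) / n
beta = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        / sum((xi - mean_x) ** 2 for xi in x))
alpha = mean_y - beta * mean_x
res2 = [yi - (alpha + beta * xi) for xi, yi in zip(x, y)]
bic2 = bic(gauss_loglik(res2, sum(r * r for r in res2) / n), 3, n)

# With a true effect of .6 and N = 100, M2 has the smaller BIC,
# mirroring the slides' preference for the regression model.
```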
Why is b in this setup the correlation? Because var x = 1 and var y = b² var x + s² = .6² + .64 = 1.0, so cor(x, y) = b √(var x / var y) = b = .6.
Model Selection 8 - Information Criteria - DIC and BIC cannot Evaluate Models that Differ in the Prior
Stork -> Babies, population correlation = .6. Three priors are compared:
MODEL PRIORS: b ~ N(.6,.01);  MODEL PRIORS: b ~ N(0,1000000);  MODEL PRIORS: b ~ N(0,.01)
N = 10000: DIC = 24060.54 par = 2.98
BIC = 24082.21 par = 3.00 DIC = 24060.33 par = 2.99
BIC = 24081.98 par = 3.00 DIC = 24060.35 par = 3.00
BIC = 24081.98 par = 3.00
N = 500 DIC = 1198.10 par = 2.88
BIC = 1210.95 par = 3.00 DIC = 1194.66 par = 2.91
BIC = 1207.48 par = 3.00 DIC = 1194.90 par = 3.03
BIC = 1207.47 par = 3.00 Model Selection 9 Information Criteria Summary:
- Complexity and (mis)fit.
- Complexity is not adequate for models that differ in the prior, but the Bayes factor can deal with this situation; one example will be given during the last day of this course.
- DIC or BIC? This depends on whether missing values are present or not, and on the error rates obtained using DIC and BIC.
Lecture 4: Model Selection Using the Bayes Factor, BIC and DIC
Error Rates
Model Selection 1 - Error Rates
[Stork -> Babies with coefficient b.] M1: b = 0, DIC = 273; M2: b ≠ 0, DIC = 229. The conclusion is that M2 is a better model than M1.
But how certain are we about this?
What are the probabilities of making an incorrect decision? M1: b = 0, BIC = 278; M2: b ≠ 0, BIC = 237; deltaDIC = 44, deltaBIC = 41.
Model Selection 2 - Error Rates - Frequency Evaluations
[From each population (M1: b = 0 and M2: b ≠ 0) data matrices are sampled, and deltaDIC or deltaBIC is computed for each.]
Model Selection 3 - Error Rates - Frequency Evaluations
DIC, 1000 replications: with correlation = 0, N = 100, 18% of the deltaDIC values > 0; with correlation = .3, N = 100, 5% < 0.
Model Selection 4 - Error Rates - Frequency Evaluations
BIC, 1000 replications: with correlation = 0, N = 100, 3% > 0; with correlation = .3, N = 100, 19% < 0.
Model Selection 5 - Error Rates - A Simple Alternative for Frequency Evaluations
TITLE: Error Rates;
MONTECARLO: NAMES ARE y x; NOBSERVATIONS = 100; NREPS = 1000; SEED = 123; RESULTS = PopH0AnH1.txt;
MODEL POPULATION: y ON x * .3; !! y ON x * 0; [y * 0]; y * .91; !! y * 1; [x * 0]; x * 1;
How to determine the populations from which to simulate data? Keep power analysis in the back of your mind; it is closely related. Mplus does not give the error rates, but in combination with SPSS error rates can be computed. In Exercise 7 from the lab meeting you have the opportunity to compute error rates in the context of multiple regression. Mplus gives a very rough alternative for error rates. The error rates discussed here are unconditional: what is the probability of erroneous decisions if data matrices come from M1 or M2? Very interesting and very Bayesian are conditional error rates: what is the probability that M1 and M2 are true if deltaBIC is equal to 2.45 for the observed data? However, these probabilities are beyond the scope of this workshop.
References Model Selection
An introduction to model selection can be found in
Burnham, K.P. and Anderson, D.R. (2002). Model Selection and Multi-Model Inference. New York: Springer.
The DIC was introduced by
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., and van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B, 64, 583-639.
The BIC is elaborated on in
Kass, R.E. and Raftery, A.E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773-795.
A comparison and overview can be found in
Hamaker, E.L., van Hattum, P., Kuiper, R.M., and Hoijtink, H. (2010). Model selection based on information criteria in multilevel modelling. In J. Hox and K. Roberts (Eds.), Handbook of Advanced Multilevel Modelling. London: Taylor and Francis.

Lecture 5: An Application of Model Selection

An Application of Model Selection 1

Introduction of the Twin data and analysis of the first model.

An Application of Model Selection 2

title: The Twin Data File;
model: fac by eng1 eng2 math1 math2 socsci1 socsci2
              natsci1 natsci2 vocab1 vocab2;
       fac on mothed fathed;

analysis: estimator = bayes; processors = 2;
          fbiter = 10000; point = median;

output: standardized(stdyx) tech1 tech3 tech8 cinterval(hpd);

plot: type = plot1 plot2 plot3;

An Application of Model Selection 3

[Path diagram: Model: 1 Factor and Education. One factor F measured by M1 E1 S1 N1 V1 M2 E2 S2 N2 V2 and regressed on M-ED and F-ED]

An Application of Model Selection 4

*** WARNING
Data set contains cases with missing on x-variables.
These cases were not included in the analysis.
Number of cases with missing on x-variables: 26
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

For model comparison all analyses must be based on the same number of persons. Therefore you have to deal with the missing data if Mplus excludes persons from the analysis, as it does in this example.
If there are relatively few missing values, as here, a quick solution is to do a single imputation using a sensible imputation model.

If there are many missing values you have to resort to multiple imputation and DIC4. However, that is beyond the scope of this course, and also within statistical science it is an area that is still under development.

An Application of Model Selection 5

title: Single Imputation of the Twin Data File;
data: file = twins.txt;

variable: names = ID sex zygosity mothed fathed income
                  eng1 eng2 math1 math2 socsci1 socsci2
                  natsci1 natsci2 vocab1 vocab2;
          usev = mothed fathed income eng1 eng2 math1 math2
                 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
          auxiliary = ID sex zygosity;
          missing = all(999);

data imputation: impute = mothed fathed income eng1 eng2 math1 math2
                          socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;
                 ndatasets = 1;
                 thin = 1000;
                 save = twinimp*.dat;

analysis: estimator = bayes; fbiter = 10000; processors = 2;
An Application of Model Selection 6 model: mothed with fathed income eng1 eng2 math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2; fathed with income eng1 eng2 math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2; income with eng1 eng2 math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2; eng1 with eng2 math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2; eng2 with math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2; math1 with math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2; math2 with socsci1 socsci2 natsci1 natsci2 vocab1 vocab2; socsci1 with socsci2 natsci1 natsci2 vocab1 vocab2; socsci2 with natsci1 natsci2 vocab1 vocab2; natsci1 with natsci2 vocab1 vocab2; natsci2 with vocab1 vocab2; vocab1 with vocab2;
output: tech8;

An Application of Model Selection 7

Analyse the first model using the single imputed data set.

An Application of Model Selection 8

title: The Twin Data File;
model: fac by eng1 eng2 math1 math2 socsci1 socsci2
              natsci1 natsci2 vocab1 vocab2;
       fac on mothed fathed;

analysis: estimator = bayes; processors = 2;
          fbiter = 10000; point = median;

output: standardized(stdyx) tech1 tech3 tech8 cinterval(hpd);

plot: type = plot1 plot2 plot3;

An Application of Model Selection 9

In themselves these numbers have no meaning. They can only be compared to the same numbers computed for one or more competing models.

Model: 1 Factor and Education
Information Criterion
  Deviance (DIC)                        46237.298
  Estimated Number of Parameters (pD)      31.861
  Bayesian (BIC)                        46388.873

An Application of Model Selection 10

[Path diagram: Model: 2 Factor and Education. Two factors F1 and F2 with indicators M1 E1 S1 N1 V1 and M2 E2 S2 N2 V2, regressed on M-ED and F-ED]

An Application of Model Selection 11

[Path diagram: Model: 1 Factor and Income. One factor F with indicators M1 E1 S1 N1 V1 M2 E2 S2 N2 V2, regressed on Income]

An Application of Model Selection 12

[Path diagram: Model: 2 Factor and Income. Two factors F1 and F2 with the same ten indicators, regressed on Income]

Model: 1 Factor and Education
Information Criterion
  Deviance (DIC)                        46237.298
  Estimated Number of Parameters (pD)      31.861
  Bayesian (BIC)                        46388.873

An Application of Model Selection 13

Model: 2 Factor and Education
Information Criterion
  Deviance (DIC)                        46008.581
  Estimated Number of Parameters (pD)      34.841
  Bayesian (BIC)                        46174.343

Model: 2 Factor and Income
Information Criterion
  Deviance (DIC)                        46031.495
  Estimated Number of Parameters (pD)      32.818
  Bayesian (BIC)                        46187.846

Model: 1 Factor and Income
Information Criterion
  Deviance (DIC)                        46263.315
  Estimated Number of Parameters (pD)      30.940
  Bayesian (BIC)                        46410.004

An Application of Model Selection 14

Are the differences in BIC and DIC convincing?
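One way to approach this question is to tabulate, for each criterion, the differences with the best of the four models. The sketch below uses the DIC and BIC values reported above; the pairing of the two income models with their criteria follows the slide layout:

```python
# DIC and BIC values copied from the four Mplus runs above
results = {
    "1 Factor and Education": {"DIC": 46237.298, "BIC": 46388.873},
    "2 Factor and Education": {"DIC": 46008.581, "BIC": 46174.343},
    "2 Factor and Income":    {"DIC": 46031.495, "BIC": 46187.846},
    "1 Factor and Income":    {"DIC": 46263.315, "BIC": 46410.004},
}

def rank(results, criterion):
    """Order the models from smallest (best) to largest DIC or BIC."""
    return sorted(results, key=lambda m: results[m][criterion])

for criterion in ("DIC", "BIC"):
    ordering = rank(results, criterion)
    best = ordering[0]
    print(f"{criterion}: best model is {best}")
    for model in ordering:
        delta = results[model][criterion] - results[best][criterion]
        print(f"  {model:24s} delta = {delta:8.3f}")
```

Both deltaDIC and deltaBIC single out the 2 Factor and Education model as the best of the four.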
(Posterior summaries from the Mplus output; columns as in standard Mplus Bayesian output: estimate, posterior standard deviation, one-tailed p-value, and the 95% credible interval bounds.)

FAC1    0.093    0.020    0.000    0.056    0.132
FAC2    0.100    0.020    0.000    0.063    0.142

An Application of Model Selection 18

And now the empirical cycle has to be restarted!

References An Application of Model Selection

Loehlin, J.C. and Nichols, R.C. (1976). Genes, Environment and Personality. Austin, TX: University of Texas Press.

Lecture 6: Model Selection in the Presence of Missing Data

Model Selection and Missing Data 1

[Data matrix for the stork example with missing values coded 999:]

ID   Stork  Urban  Babies
...   ...    ...    ...
20     3      7     13
21     1      4     11
22   999      4      9
23     3      6     11
24     4      6      9
25     8      7     16
26    11    999    999
27     5      3      7
28     5      5      8
29   999      6     14
30     6      6     10
31     7      5    999
32     8      8     10
33   999    999      8
34     2      2      1
35     4      4      8
...   ...    ...    ...

Model Selection and Missing Data 2

Situation 1: The data are MAR when the statistical model is equal to the imputation model.

In Mplus, both the misfit and the complexity of the DIC are computed using only the observed data and parameter values that are sampled and estimated while using the statistical model to impute the missing values.
This is a valid procedure that can be used without hesitation.

DIC = misfit + complexity = misfit + estimated number of parameters

Model Selection and Missing Data 3

BIC = misfit + complexity = misfit + log(N) x P

In Mplus the misfit of the BIC is computed using only the observed data and parameter values that are sampled and estimated while using the statistical model to impute the missing values.
The complexity is estimated as the log of the number of persons (N) multiplied by the number of parameters (P) in the statistical model. As yet it is unknown how N should be determined in the presence of missing data. Mplus uses the total sample size, but this is an ad hoc and unmotivated choice.
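To see how much the ad hoc choice of N matters, the BIC decomposition from the slide can be evaluated for different candidate sample sizes. The misfit value, parameter count, and sample sizes below are hypothetical, chosen only for illustration:

```python
from math import log

def bic(misfit, n, p):
    """BIC = misfit + log(N) * P, the decomposition used in the slides."""
    return misfit + log(n) * p

# hypothetical numbers: misfit 46000, P = 33 parameters,
# 850 persons of whom 26 have missing values
misfit, p = 46000.0, 33
print(bic(misfit, 850, p))        # N = total sample size (the Mplus choice)
print(bic(misfit, 850 - 26, p))   # N = complete cases only
```

With this little missingness the two choices of N shift the BIC only slightly, but the ambiguity remains, which motivates the advice below.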
Currently it is not advisable to use the BIC in the presence of missing data.

Model Selection and Missing Data 4

Situation 2: The statistical model is consistent with the imputation model, and, given the imputation model, the missing values are MAR.

Using a three-step procedure, Mplus can be used to compute the DIC while accounting for the fact that some of the data are missing:

1. Multiply impute the data using the imputation model.
2. For each imputed data matrix compute the DIC using Mplus.
3. Average the DICs obtained for the M imputed data matrices.

The result is DIC4 as discussed by Celeux et al. (2006). This is not the definitive answer to the computation of the DIC in the presence of missing data, but at least there is some support for this approach in the scientific literature. One is well advised to use the Monte Carlo approach from Mplus to evaluate in each new situation how well DIC4 performs. It is beyond the scope of this course to show how this can be done. Note that using MplusAutomation this can be implemented relatively easily (as opposed to doing it manually); however, this too is beyond the scope of this course.

References Model Selection and Missing Data

A paper about the computation of the DIC in the presence of missing data:
Celeux, G., Forbes, F., Robert, C.P., and Titterington, D.M. (2006). Deviance information criteria for missing data models. Bayesian Analysis, 1, 651-674.
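The three-step procedure described above (impute, compute a DIC per imputed data matrix in Mplus, average) reduces to a one-line average in step 3. A minimal Python sketch with fabricated DIC values (the numbers below are hypothetical, not Mplus output):

```python
def dic4(dics):
    """Step 3: average the DICs obtained for the M imputed data matrices.

    This yields DIC4 in the sense of Celeux et al. (2006); steps 1 and 2
    (multiple imputation and the per-data-set DICs) are done in Mplus.
    """
    return sum(dics) / len(dics)

# hypothetical DICs from M = 5 imputed data matrices, for two models
dics_m1 = [46237.3, 46241.0, 46235.8, 46239.4, 46238.1]
dics_m2 = [46008.6, 46012.2, 46007.1, 46010.9, 46009.5]
# a positive difference DIC4(M1) - DIC4(M2) favours M2
print(dic4(dics_m1) - dic4(dics_m2))
```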
A paper about the difference between the imputation and analysis model in the context of missing data
Kuiper, R.M. and Hoijtink, H. (2011). How to handle missing data for predictor selection in regression models using the AIC. Statistica Neerlandica, 65, 489-506.
MplusAutomation is developed by Michael Hallquist. If you google "CRAN MplusAutomation" you will find the website from which the R package and its documentation can be downloaded.