
Mixture Modeling

Chongming Yang
Research Support Center
FHSS College

Mixture of Distributions


Classification Techniques
Latent Class Analysis (categorical indicators)
Latent Profile Analysis (continuous indicators)
Finite Mixture Modeling (multivariate normal variables)

Integrate Classification Models into Other Models

Mixture Factor Analysis
Mixture Regressions
Mixture Structural Equation Modeling
Growth Mixture Modeling
Multilevel Mixture Modeling

Disadvantages of Multi-step Practice

Multi-step practice
Run classification model
Save membership variable
Model membership variable and other variables

Disadvantages
Biases in parameter estimates
Biases in standard errors
Misleading significance tests and confidence intervals

Latent Class Analysis (LCA)

Setting
Latent trait assumed to be categorical
Trait measured with multiple categorical indicators
Examples: drug addiction, schizophrenia

Aim
Identify heterogeneous classes/groups
Estimate class probabilities
Identify good indicators of classes
Relate covariates to classes

Graphic LCA Model

Categorical indicators u: u1, u2, u3, ..., ur
Categorical latent variable C: C = 1, 2, ..., or K

Probabilistic Model
Assumption: the indicators u are conditionally independent given C, so that their interdependence is explained by C (as in a factor analysis model)

An item probability: P(u_j \mid c = k)

Joint probability of all indicators:

P(u_1, u_2, u_3, \ldots, u_r) = \sum_{k=1}^{K} P(c = k)\, P(u_1 \mid c = k)\, P(u_2 \mid c = k) \cdots P(u_r \mid c = k)
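As a hypothetical worked example (the numbers are invented for illustration, not from any dataset), suppose K = 2 classes with P(c = 1) = 0.6 and P(c = 2) = 0.4, and two binary indicators with P(u_1 = 1 \mid c = 1) = 0.8, P(u_2 = 1 \mid c = 1) = 0.7, P(u_1 = 1 \mid c = 2) = 0.2, and P(u_2 = 1 \mid c = 2) = 0.3. For a respondent endorsing both items:

P(u_1 = 1, u_2 = 1) = 0.6 \times 0.8 \times 0.7 + 0.4 \times 0.2 \times 0.3 = 0.336 + 0.024 = 0.360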

LCA Parameters
Class means (logits): number of classes − 1
Item probabilities (thresholds): number of item categories − 1 per item, within each class

Class Means (Logit)

Probability scale (logistic regression without any covariates x):

P(c = k) = \frac{e^{\alpha_k}}{\sum_{j=1}^{K} e^{\alpha_j}}

Logit scale: the class means are the \alpha_k
The mean of the last (highest-numbered) class is fixed: \alpha_K = 0
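As a hypothetical illustration of the conversion (values made up), with K = 2 and an estimated class mean \alpha_1 = 0.85 (\alpha_2 = 0 by the fixing above):

P(c = 1) = \frac{e^{0.85}}{e^{0.85} + e^{0}} \approx \frac{2.34}{3.34} \approx 0.70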

Latent Class Analysis with Covariates

Covariates x are related to class probability through multinomial logistic regression:

P(c_i = k \mid x_i) = \frac{e^{\alpha_{c_k} + \gamma_{c_k} x_i}}{\sum_{j=1}^{K} e^{\alpha_{c_j} + \gamma_{c_j} x_i}}
Posterior Probability
(membership/classification of cases)

P(c = k \mid u_1, u_2, \ldots, u_r) = \frac{P(c = k)\, P(u_1 \mid c = k)\, P(u_2 \mid c = k) \cdots P(u_r \mid c = k)}{P(u_1, u_2, \ldots, u_r)}
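Continuing the hypothetical numbers used earlier, the posterior probability that the respondent who endorsed both items belongs to class 1 is

P(c = 1 \mid u_1 = 1, u_2 = 1) = \frac{0.6 \times 0.8 \times 0.7}{0.360} = \frac{0.336}{0.360} \approx 0.93

so that case would be assigned to class 1.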

Estimation
Maximum likelihood estimation via the Expectation-Maximization (EM) algorithm
E (expectation) step: compute each case's posterior class probabilities, given the current class and item parameters
M (maximization) step: re-estimate the class and item parameters from those posterior probabilities
Iterate the E and M steps until the likelihood of the parameters is maximized
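For binary indicators, the standard EM updates take the following form (a sketch of the generic LCA recipe, not the specific implementation of any particular program), with \pi_k the class probabilities, \rho_{jk} the item probabilities, and p_{ik} the posterior probabilities:

E-step: p_{ik} = \frac{\hat\pi_k \prod_{j} \hat\rho_{jk}^{\,u_{ij}} (1 - \hat\rho_{jk})^{1 - u_{ij}}}{\sum_{m=1}^{K} \hat\pi_m \prod_{j} \hat\rho_{jm}^{\,u_{ij}} (1 - \hat\rho_{jm})^{1 - u_{ij}}}

M-step: \hat\pi_k = \frac{1}{N} \sum_{i} p_{ik}, \qquad \hat\rho_{jk} = \frac{\sum_{i} p_{ik}\, u_{ij}}{\sum_{i} p_{ik}}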

Test against Data

O = observed frequency of each response pattern
E = model-estimated (expected) frequency of each response pattern

Pearson chi-square:
\chi^2 = \sum \frac{(O - E)^2}{E}

Chi-square based on the likelihood ratio:
\chi^2_{LR} = 2 \sum O \log(O / E)

Determine Number of Classes
Substantive theory (parsimonious,
interpretable)
Predictive validity
Auxiliary variables / covariates
Statistical information and tests
Bayesian Information Criterion (BIC)
Entropy
Testing K against K-1 Classes
Vuong-Lo-Mendell-Rubin likelihood-ratio test
Bootstrapped likelihood ratio test

Bayesian Information Criterion (BIC)

BIC = -2\log(L) + h\ln(N)

L = likelihood
h = number of parameters
N = sample size
Choose the model with the smallest BIC
A BIC difference greater than 4 is considered appreciable
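As a hypothetical calculation (numbers invented for illustration): if \log(L) = -1200, h = 13, and N = 500, then

BIC = -2(-1200) + 13\ln(500) = 2400 + 13 \times 6.21 \approx 2480.8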

Quality of Classification
Entropy

Summarizes how sharply individuals are classified, based on their posterior class probabilities (roughly, how close each individual's highest class probability is to 1)
A value close to 1 indicates good classification
There is no clear cutoff for acceptance or rejection
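A commonly reported version (the relative entropy printed by Mplus) is computed from the posterior probabilities p_{ik} as

E_K = 1 - \frac{\sum_{i=1}^{N} \sum_{k=1}^{K} \left(-p_{ik} \ln p_{ik}\right)}{N \ln K}

which equals 1 when every case is classified with certainty.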

Testing K against K-1 Classes

Bootstrapped likelihood ratio test
LRT = 2[logL(model 1) − logL(model 2)], where model 2 (K-1 classes) is nested in model 1 (K classes)
Bootstrap steps:
1. Estimate the LRT comparing the two models
2. Use bootstrapped samples to obtain the distribution of the LRT
3. Compare the observed LRT with this distribution to get a p value

Testing K against K-1 Classes

Vuong-Lo-Mendell-Rubin likelihood-ratio test
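In Mplus, both tests can be requested in the OUTPUT command: TECH11 gives the Vuong-Lo-Mendell-Rubin likelihood-ratio test and TECH14 gives the bootstrapped likelihood ratio test, each comparing the K-class model with the K-1 class model:

OUTPUT:    TECH11 TECH14;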

Determine Quality of Indicators
Good indicators
Item response probability is close to 0 or 1 in each class

Bad indicators
Item response probability is high in more than one class, like a cross-loading item in factor analysis
Item response probability is low in all classes, like a low-loading item in factor analysis

LCA Examples
LCA
LCA with covariates
Class predicts a categorical outcome
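A minimal Mplus input for the basic LCA might look like the sketch below; the file name, variable names, and number of classes are hypothetical placeholders rather than the examples above. Class-enumeration output and saving class membership are shown on the following slides.

TITLE:     LCA with 3 classes (sketch; names are hypothetical)
DATA:      FILE = drugs.dat;
VARIABLE:  NAMES = id u1-u6 x;
           USEVARIABLES = u1-u6;
           CATEGORICAL = u1-u6;
           CLASSES = c (3);
           IDVARIABLE = id;
ANALYSIS:  TYPE = MIXTURE;
           STARTS = 100 20;    ! random starts to avoid local maxima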

Save Membership Variable

VARIABLE:  IDVARIABLE = id;
SAVEDATA:  FILE = cmmber.txt;
           SAVE = CPROBABILITIES;
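With SAVE = CPROBABILITIES, the saved file contains each case's ID, its posterior probability for every class, and its most likely class membership, which can then be merged with other data (subject to the multi-step caveats noted earlier).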

Latent Profile Analysis

Covariances of the continuous indicators are conditional on class K and fixed at zero (conditional independence)
Variances of the continuous indicators are constrained to be equal across classes and minimized within classes
Mean differences are maximized across classes

Finite Mixture Modeling
(multivariate normal variables)

Finite = finite number of subgroups/classes
Variables are normally distributed within each class
Means differ across classes
Variances are the same across classes
Covariances can differ across classes without restrictions, or be held equal with restrictions (see the sketch below)
Latent profile analysis can be seen as a special case with covariances fixed at zero
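A minimal sketch of how a class-specific covariance might be specified in Mplus (the file name, variable names, and number of classes are hypothetical):

DATA:      FILE = fmm.dat;
VARIABLE:  NAMES = y1-y4;
           CLASSES = c (2);
ANALYSIS:  TYPE = MIXTURE;
MODEL:     %OVERALL%
           y1 WITH y2;    ! covariance estimated in the overall model
           %c#1%
           y1 WITH y2;    ! mentioned again here so it can differ in class 1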

Mixture Factor Analysis

Allows one to examine the measurement properties of items in heterogeneous subgroups/classes
Measurement invariance is not required, given the assumed heterogeneity
Factor structure can change across classes
See Mplus outputs
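A minimal factor mixture sketch in Mplus (item names, class count, and the parameters freed across classes are hypothetical), allowing loadings and thresholds to differ between classes:

DATA:      FILE = fma.dat;
VARIABLE:  NAMES = u1-u7;
           CATEGORICAL = u1-u7;
           CLASSES = c (2);
ANALYSIS:  TYPE = MIXTURE;
MODEL:     %OVERALL%
           f BY u1-u7;
           %c#1%
           f BY u2-u7;       ! class-specific loadings (u1 anchors the metric)
           [u1$1-u7$1];      ! class-specific item thresholds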

Factor Mixture Analysis

Parental Control
Parents let you make your own decisions about the time you must be home on weekend nights
Parents let you make your own decisions about the people you hang around with
Parents let you make your own decisions about what you wear
Parents let you make your own decisions about which television programs you watch
Parents let you make your own decisions about what time you go to bed on week nights
Parents let you make your own decisions about what you eat
Parental Acceptance
Feel people in your family understand you
Feel you want to leave home

Feel you and your family have fun together


Feel that your family pays attention to you
Feel your parents care about you
Feel close to your mother
Feel close to your father

Two dimensions of Parenting

Mixture SEM
See mixture growth modeling

Mixture Modeling with Known Classes
Identify hidden classes within known
groups
Under nonrandomized experiments
Impose equality constraints on
covariates to identify similar classes
from known groups
Compare classes that differ in covariates
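One common way to set this up in Mplus is with the KNOWNCLASS option; the sketch below is hypothetical (grouping variable g, item names, file name, and class counts are placeholders):

DATA:      FILE = known.dat;
VARIABLE:  NAMES = g u1-u4;
           CATEGORICAL = u1-u4;
           CLASSES = cg (2) c (2);
           KNOWNCLASS = cg (g = 0  g = 1);
ANALYSIS:  TYPE = MIXTURE;
MODEL:     %OVERALL%
           c ON cg;    ! hidden class membership may differ across the known groups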
