Follow up to SON Brown Bag Presentation 3/20/13 (C Thompson) Missing Data part 1
From Baraldi/Enders 2009 reference pp7-9:
[…] relationship between the probability of missing data and self-esteem after controlling for
substance use). As a second example, suppose that a school district administers a math
aptitude exam, and students that score above a certain cut-off participate in an advanced
math course. The math course grades are MAR because missingness is completely
determined by scores on the aptitude test (e.g., students that score below the cut-off do not
have a grade for the advanced math course).
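The aptitude example can be illustrated with a small simulation. This is a sketch only, and every number in it is an assumption, not something from Baraldi and Enders: aptitude is standard normal, the course grade is 70 + 8 × aptitude plus normal noise (SD 5), and the cut-off is 0. The hypothetical helper simulate_students shows that whether a grade is missing is a deterministic function of an observed variable (aptitude). Even so, the grades we do see are not a random sample of all grades.

```python
import random
import statistics

# Hypothetical parameters for illustration only (not from the source).
N_STUDENTS = 20_000
CUTOFF = 0.0  # aptitude score needed to enter the advanced course


def simulate_students(n=N_STUDENTS, cutoff=CUTOFF, seed=1):
    """Return (aptitude, grade, grade_observed) tuples.

    grade_observed depends only on aptitude, so grades are MAR given aptitude.
    """
    rng = random.Random(seed)
    students = []
    for _ in range(n):
        aptitude = rng.gauss(0.0, 1.0)
        grade = 70.0 + 8.0 * aptitude + rng.gauss(0.0, 5.0)
        grade_observed = aptitude > cutoff  # below the cut-off: no grade
        students.append((aptitude, grade, grade_observed))
    return students


students = simulate_students()
mean_all = statistics.fmean(g for _, g, _ in students)
mean_observed = statistics.fmean(g for _, g, seen in students if seen)
print(f"mean grade if everyone took the course: {mean_all:.2f}")
print(f"mean grade among students with a grade: {mean_observed:.2f}")
```

Under these assumed parameters, the first mean should land near 70. The second should land near 70 + 8 × E[Z | Z > 0] ≈ 76.4. Simply analyzing the students who have grades therefore overstates the average, even though the mechanism is MAR.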
Finally, data are MNAR if the probability of missing data is systematically related to the
hypothetical values that are missing. In other words, the MNAR mechanism describes data […]
course grades). Although the magnitude of the bias depends on the correlation between the
omitted aptitude variable and the course grades (bias increases as the correlation increases),
the analysis is nevertheless consistent with an MNAR mechanism. Later in the manuscript,
we describe methods for incorporating so-called auxiliary variables that are related to
missingness into a statistical analysis. Doing so can mitigate bias (i.e., by making the MAR
mechanism more plausible) and can improve power (i.e., by recapturing some of the
missing information).
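The auxiliary-variable idea can be sketched with the same kind of made-up setup: aptitude ~ N(0, 1), grade = 70 + 8 × aptitude + noise (SD 5), and grades missing for everyone below a cut-off of 0. All of these values are assumptions. If aptitude is left out, the only available estimate is the complete-case mean of grades, which is biased. If aptitude is used, we can fit a regression of grade on aptitude to the students who have grades and predict grades for the rest. Here that recovers the overall mean, because selection depends only on aptitude and the assumed linear model is correct. The helper ols_fit is hypothetical. Deterministic regression filling is a simplification chosen for brevity, not a method taken from the source. It understates variability, and in practice maximum likelihood or multiple imputation would normally be preferred.

```python
import random
import statistics


def ols_fit(xs, ys):
    """Hypothetical helper: one-predictor least squares, returns (intercept, slope)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical data: grades are missing for every student below the cut-off.
rng = random.Random(2)
aptitude = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
grade = [70.0 + 8.0 * a + rng.gauss(0.0, 5.0) for a in aptitude]
has_grade = [a > 0.0 for a in aptitude]

true_mean = statistics.fmean(grade)  # knowable only in a simulation

# Analysis that omits aptitude: only the complete cases are usable.
complete_case_mean = statistics.fmean(g for g, h in zip(grade, has_grade) if h)

# Analysis that uses aptitude as an auxiliary variable.
intercept, slope = ols_fit(
    [a for a, h in zip(aptitude, has_grade) if h],
    [g for g, h in zip(grade, has_grade) if h],
)
filled = [g if h else intercept + slope * a
          for a, g, h in zip(aptitude, grade, has_grade)]
auxiliary_mean = statistics.fmean(filled)

print(f"true mean:          {true_mean:.2f}")
print(f"complete-case mean: {complete_case_mean:.2f}")
print(f"using aptitude:     {auxiliary_mean:.2f}")
```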
From Howell ref (Missing):
Missing at random
Often data are not missing completely at random, but they may be classifiable as missing at random (MAR).
(MAR is not really a good name for this condition because most people would take it to be synonymous with
C:\Documents and Settings\mdenny1\Local Settings\Temporary Internet
Files\Content.Outlook\KZU6WGSS\Examples_Missingness Mechanisms_20130325.doc3/25/2013 2:38 PM
MCAR, which it is not. However, the label has stuck.) Let's back up one step. For data to be missing completely
at random, the probability that X_i is missing is unrelated to the value of X_i or other variables in the analysis.
But the data can be considered as missing at random if the data meet the requirement that missingness does not
depend on the value of X_i after controlling for another variable. For example, people who are depressed might
be less inclined to report their income, and thus reported income will be related to depression. Depressed people
might also have a lower income in general, and thus when we have a high rate of missing data among depressed
individuals, the existing mean income might be higher than it would be without missing data. However, if,
within depressed patients, the probability of reporting income was unrelated to income level, then the data would
be considered MAR, though not MCAR. Put another way, the data are MAR to the extent that missingness is
accounted for by other variables that are included in the analysis.
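Howell's income example can be simulated as a sketch with invented numbers. In the simulation, 30% of people are depressed. Mean income is 40 for depressed people and 55 for others, in arbitrary units. Reporting rates are 50% for depressed people and 90% for others. Within each group, the decision to report is a coin flip that ignores income, so income is MAR given depression but not MCAR. The naive mean of reported incomes is distorted because the groups respond at different rates. In this particular setup it is pulled upward, since the lower-income depressed group is under-represented among reporters. The fix is to estimate each group's mean from its reporters and weight by group size, which is known for everyone. That is a simple stratification approach, chosen for illustration; the name reporter_mean is hypothetical.

```python
import random
import statistics

# Invented parameters for illustration; none come from Howell.
P_DEPRESSED = 0.3
MEAN_INCOME = {True: 40.0, False: 55.0}  # keyed by "is depressed"
P_REPORT = {True: 0.5, False: 0.9}       # reporting depends on depression only

rng = random.Random(3)
people = []
for _ in range(40_000):
    depressed = rng.random() < P_DEPRESSED
    income = rng.gauss(MEAN_INCOME[depressed], 10.0)
    reported = rng.random() < P_REPORT[depressed]  # independent of income itself
    people.append((depressed, income, reported))

true_mean = statistics.fmean(inc for _, inc, _ in people)
naive_mean = statistics.fmean(inc for _, inc, rep in people if rep)


def reporter_mean(group):
    """Hypothetical helper: mean income among reporters in one depression group."""
    return statistics.fmean(inc for d, inc, rep in people if d == group and rep)


# Stratify on the observed variable (depression), then reweight by group size.
share_depressed = sum(d for d, _, _ in people) / len(people)
stratified_mean = (share_depressed * reporter_mean(True)
                   + (1 - share_depressed) * reporter_mean(False))

print(f"true mean:       {true_mean:.2f}")
print(f"naive mean:      {naive_mean:.2f}")
print(f"stratified mean: {stratified_mean:.2f}")
```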
The phraseology is a bit awkward here because we tend to think of randomness as not producing bias, and thus
might well think that Missing at Random is not a problem. Unfortunately it is a problem, although in this case
we have ways of dealing with the issue so as to produce meaningful and relatively unbiased estimates. But just
because a variable is MAR does not mean that you can just forget about the problem. But nor does it mean that
you have to throw up your hands and declare that there is nothing to be done.
The situation in which the data are at least MAR is sometimes referred to as ignorable missingness. This name
comes about because for those data we can still produce unbiased parameter estimates without needing to
provide a model to explain missingness. Cases of MNAR, to be considered next, could be labeled cases of
nonignorable missingness.
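The ignorable/nonignorable distinction can be made concrete by applying one estimator under two invented mechanisms. The estimator fits a regression of y on a fully observed x to the complete cases and uses it to fill in missing y. When missingness depends only on x (MAR), this recovers the mean of y without any model of why values are missing. When missingness depends on y itself (MNAR), the same estimator stays biased, because selecting on y distorts the relationship between y and x among the cases that remain. This is a sketch under assumed values. The helpers ols_fit and regression_filled_mean are hypothetical, and regression filling stands in for the more principled methods (maximum likelihood, multiple imputation) normally used under MAR.

```python
import random
import statistics


def ols_fit(xs, ys):
    """Hypothetical helper: one-predictor least squares, returns (intercept, slope)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def regression_filled_mean(x, y, observed):
    """Fit y ~ x on complete cases, predict missing y, and average everything."""
    intercept, slope = ols_fit(
        [xi for xi, o in zip(x, observed) if o],
        [yi for yi, o in zip(y, observed) if o],
    )
    return statistics.fmean(yi if o else intercept + slope * xi
                            for xi, yi, o in zip(x, y, observed))


rng = random.Random(4)
x = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
y = [50.0 + 5.0 * xi + rng.gauss(0.0, 5.0) for xi in x]
true_mean = statistics.fmean(y)

observed_mar = [xi > -0.5 for xi in x]   # depends on the observed x only
observed_mnar = [yi > 50.0 for yi in y]  # depends on the (would-be) y itself

mar_estimate = regression_filled_mean(x, y, observed_mar)
mnar_estimate = regression_filled_mean(x, y, observed_mnar)
print(f"true mean:           {true_mean:.2f}")
print(f"MAR (ignorable):     {mar_estimate:.2f}")
print(f"MNAR (nonignorable): {mnar_estimate:.2f}")
```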