Date: Thu, 6 Jul 2006 07:55:06 -0400
Reply-To: Peter Flom <Flom@NDRI.ORG>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Peter Flom <Flom@NDRI.ORG>
Subject: Re: survey regression analysis
In-Reply-To: <BAY103-F31013E1D1681B076D7ACCAB0770@phx.gbl>
Content-Type: text/plain; charset=US-ASCII
I wrote:
In a lecture that he gave here at NDRI, and in other lectures I have
heard him give, Joe Schafer has indicated that some of his results
show that MI is a better technique than listwise deletion even when
the data are MNAR.
I haven't got any formal published cites for this, although there may
be some by now, but thought it apropos. He indicated that the
degree of bias introduced by MNAR would have to be quite extreme
for listwise deletion to be better than MI.
and David replied:
<<<
I agree with Joe. (Of course!) I prefer MI to listwise deletion in
survey sample analysis.
HOWEVER, the problem remains that treating MNAR (Missing Not At
Random) points as if they have the same distributional properties as
the sampled data can be fundamentally error-prone. I mean, so what if
MI does better than listwise deletion if both of them stink at filling
in the holes that are there because the missings are not random and NOT
from the same population?
When I teach survey sampling classes, I make a big stink about this,
because quite often there is a *reason* why the data are missing.
We sample 98 out of 100 lakes for mercury, but for the other 2 lakes
we cannot get permission. Do we assume those 2 lakes are just like
everything else we measured? (That's MAR.) Maybe. Maybe not.
Perhaps those 2 lakes are owned by curmudgeons who just don't want
a bunch of Feds on their land nosing around. Or perhaps those lakes
ought to be Superfund sites because of the dumping that has been
going on in them for decades. (Uh-oh. That's MNAR. Those lakes
are drastically different, and perhaps are not even part of the same
population, depending on our sample frame.)
So can MI or listwise deletion help us here? If the lakes are heavily
contaminated and the owners don't want us to find out, then I would
say no. The imputation or deletion assumes that we can fill in
'reasonable' values from the sampled observations, which is not the
case.
>>>
I am not surprised that you agree with Joe. Disagreeing with Joe about
missing data would be odd. And, of course, I agree with you. The case
you bring up with the lakes is analogous to one I presented to Joe about
our own data: in studying treatment plans for drug abuse, loss to
follow-up is often directly due to drug abuse. He agreed with me (and
you) that, in this case, there are no good methods.
His point was, I think, that in cases where the data do not strictly
meet the MAR assumption, MI may still produce useful results. Like many
assumptions, MAR can be grossly or mildly violated. However, since the
assumption is almost always impossible to test, the practical upshot is
that judgement is necessary. That's good. Keeps us employed.
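
To put rough numbers on the lake example quoted above, here is a minimal
Python sketch. Everything in it is an assumption made up for illustration:
the mercury readings are invented, the function name is hypothetical, and
the "imputation" is a crude fill-in with the observed mean, which is a
stand-in for any method that only borrows from the sampled lakes and is
not real MI.

```python
from statistics import mean


def estimates(values, missing):
    """Return (true mean, listwise-deletion mean, mean-imputed mean).

    `values` holds the readings for every lake, including the ones we
    never got to sample; `missing` is the set of unsampled positions.
    """
    observed = [v for i, v in enumerate(values) if i not in missing]
    listwise = mean(observed)
    # Crude single imputation: fill each hole with the observed mean.
    # Assumption for illustration only; this is NOT multiple imputation.
    filled = [listwise if i in missing else v for i, v in enumerate(values)]
    return mean(values), listwise, mean(filled)


# Invented mercury readings (arbitrary units) for the 98 sampled lakes.
typical = [1.0 + 0.01 * i for i in range(98)]
missing = {98, 99}  # the 2 lakes we could not get permission to sample

# Scenario 1: the owners are just curmudgeons; the lakes are ordinary.
ordinary = typical + [1.2, 1.6]
# Scenario 2: the lakes are heavily contaminated (the MNAR case).
contaminated = typical + [25.0, 40.0]

for label, lakes in (("ordinary", ordinary), ("contaminated", contaminated)):
    truth, listwise, imputed = estimates(lakes, missing)
    print(f"{label:>12}: true={truth:.3f} "
          f"listwise={listwise:.3f} imputed={imputed:.3f}")
```

In the first scenario both estimates land close to the true mean. In the
second they miss it by the same amount, because neither one has any
information about the lakes that were never sampled.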
Peter
Peter L. Flom, PhD
Assistant Director, Statistics and Data Analysis Core
Center for Drug Use and HIV Research
National Development and Research Institutes
71 W. 23rd St
New York, NY 10010
http://cduhr.ndri.org
www.peterflom.com
(212) 845-4485 (voice)
(917) 438-0894 (fax)