Date: Wed, 5 Jul 2006 17:20:35 -0700
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: survey regression analysis
Content-Type: text/plain; format=flowed
>mshall2@GMAIL.COM sagely replied:
> >I agree, when the data are MCAR, listwise deletion is fine (as is any
> >imputation technique), but listwise deletion is also, arguably, the
> >strategy when missing data are non-ignorable. Advanced techniques
> >(FIML, MI) are only suitable with MAR data.
>and David Cassell added
>The biggest problem I see is people treating MNAR (Missing NOT At
>Random) data as if the data are missing at random. "Oh no problem,
>I learned about hot-deck from a professor who last took classes on
>this in the 1960s..." :-(
>I find that the decisions about listwise deletion or not depend on
>the meta-data and the data sources. I tend to expect to see
>differences depending on whether the data come from, say, an
>experimental design vs. a sampling design.
>In a lecture that he gave here at NDRI, and in other lectures I have
>heard him give, Joe Schafer has indicated that some of his results show
>that MI is a better technique than listwise deletion even when the data
>I haven't got any formal published cites for this, although there may
>be some by now, but thought it apropos. He indicated that the degree of
>bias introduced by MNAR would have to be quite extreme for listwise to
>be better than MI.
I agree with Joe. (Of course!) I prefer MI to listwise deletion in survey
HOWEVER, the problem remains that treating MNAR (Missing Not At
Random) points as if they have the same distributional properties as the
sampled data can be fundamentally error-prone. I mean, so what if MI
does better than listwise deletion if both of them stink at filling in the
holes that are there because the missings are not random and NOT
from the same population?
When I teach survey sampling classes, I make a big stink about this,
because quite often there is a *reason* why the data are missing.
We sample 98 out of 100 lakes for mercury, but for the other 2 lakes
we cannot get permission. Do we assume those 2 lakes are just like
everything else we measured? (That's MAR.) Maybe. Maybe not.
Perhaps those 2 lakes are owned by curmudgeons who just don't want
a bunch of Feds on their land nosing around. Or perhaps those lakes
ought to be Superfund sites because of the dumping that has been
going on in them for decades. (Uh-oh. That's MNAR. Those lakes
are drastically different, and perhaps are not even part of the same
population, depending on our sample frame.)
So can MI or listwise deletion help us here? If the lakes are heavily
contaminated and the owners don't want us to find out, then I would
say no. The imputation or deletion assumes that we can fill in 'reasonable'
values from the sampled observations, which is not the case.
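
Here is a minimal simulation sketch of the lake example, in Python rather
than SAS so it stays self-contained. Everything in it is an assumption made
for illustration: the log-normal mercury levels, the factor of 50 for the
two contaminated lakes, and the function name lake_estimates. The
"imputation" is a crude repeated random hot-deck, not proper MI (it ignores
parameter uncertainty and Rubin's combining rules). It is only meant to show
that any fill-in drawn from the sampled lakes lands about where listwise
deletion does, and that both miss the true mean when the unsampled lakes
are different.

```python
import random
import statistics


def lake_estimates(seed=0, n_imputations=20):
    """Compare the true mean with listwise-deletion and hot-deck estimates.

    All numbers are made up for illustration; nothing here is real data.
    """
    rng = random.Random(seed)

    # 98 lakes we were allowed to sample (assumed log-normal mercury levels).
    sampled = [rng.lognormvariate(0.0, 0.5) for _ in range(98)]

    # 2 lakes we could not sample. Assumption: they are heavily contaminated,
    # about 50 times the typical level, so the missingness is MNAR.
    unsampled = [50.0 * rng.lognormvariate(0.0, 0.5) for _ in range(2)]

    true_mean = statistics.mean(sampled + unsampled)

    # Listwise deletion: drop the 2 lakes and average what is left.
    listwise_mean = statistics.mean(sampled)

    # Crude repeated hot-deck: fill each hole with a random sampled value,
    # recompute the mean, and average over the repetitions.
    imputed_means = []
    for _ in range(n_imputations):
        fills = [rng.choice(sampled) for _ in unsampled]
        imputed_means.append(statistics.mean(sampled + fills))
    hot_deck_mean = statistics.mean(imputed_means)

    return true_mean, listwise_mean, hot_deck_mean


if __name__ == "__main__":
    true_mean, listwise_mean, hot_deck_mean = lake_estimates()
    print(f"true mean of all 100 lakes: {true_mean:.2f}")
    print(f"listwise deletion estimate: {listwise_mean:.2f}")
    print(f"repeated hot-deck estimate: {hot_deck_mean:.2f}")
```

Under these assumptions the two estimates sit close to each other and both
fall well short of the true mean, because neither one ever sees a value like
the ones that are missing.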
It is more important to find out from the fieldwork whether the data
David L. Cassell
3115 NW Norwood Pl.
Corvallis OR 97330