Date: Thu, 1 Dec 2005 21:45:14 -0800
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: Multiple Imputation in a Narrow Range
Content-Type: text/plain; format=flowed
>There was a presentation at the most recent NYASUG meeting about robust
>regression, and the speaker was talking about dealing with various data
>issues. One issue in the example data (from a real experiment) was as
>follows. A measurement is done with a machine; readings are positive
>numbers, and the results above the machine calibration threshold are
>lognormally distributed. Readings below the machine calibration
>threshold are just reported as "< 0.4" or "< 0.6" (depending on how the
>machine was calibrated that particular day). The speaker had simply
>converted such values to 0.2 or 0.3, respectively, but invited suggestions
>from the crowd. One audience member (and SAS-L denizen) mentioned
>multiple imputation, and this idea led to a side discussion after the talk
>finished. These values are not true missing values; they are somewhere
>between 0 and the calibration threshold, but they just can't be measured
>with the same accuracy as readings above the calibration threshold, so
>there are no accurate values within that range. How would you handle such
>values? (Discarding is not an option.)
 Discarding is most definitely NOT an option.
 Typically, in chemometrics or biometrics or environmetrics settings, we
refer to these points as BDL (Below Detection Limit). The value was measured,
but it is so small that there is no reliable value. The smallest 'reliable'
measurement you can get off the instrument is the 'detection limit', which is
some small positive number. In a lot of settings, you can get a *lot* of BDL
values. In some Superfund sites (U.S. Environmental Protection Agency sites
marked as requiring thousands of lawyers :-)) you can have 90% or so of the
measurements on something like PCBs below detection limits, while the
cleanup has to be done on those areas. And then there's the problem of
storing the data when many people are keeping their numbers in a format or
database which does not permit you to store a 'special missing value' for
BDL cases.
 A *lot* of research, modeling, and journal articles have come out of this.
A lot of people recommend that prior to analysis, all BDL values be set at
BDL/2, i.e. half the detection limit. That is no doubt where the speaker's
numbers came from. This has fairly good statistical properties in a number
of situations, so it's a useful solution. It doesn't work in all cases, but
then, what does?
So I'm going to support the BDL/2 approach.
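 For anyone who wants to see the bookkeeping, here is a minimal sketch of
the BDL/2 substitution, written in Python rather than SAS purely for
illustration. The names Reading and parse_reading are hypothetical, and the
sketch assumes the raw readings arrive as text such as "1.7" or "< 0.4". It
keeps a BDL flag and the reported detection limit next to the substituted
value, so the censoring information is not lost when the data are stored.

```python
from typing import NamedTuple, Optional


class Reading(NamedTuple):
    """One instrument reading after BDL/2 substitution (hypothetical layout)."""
    value: float              # number to carry into the analysis
    bdl: bool                 # True if the raw reading was below detection limit
    limit: Optional[float]    # detection limit reported that day, if BDL


def parse_reading(raw: str) -> Reading:
    """Turn a raw reading such as '1.7' or '< 0.4' into a Reading.

    Assumes BDL readings are reported as '<' followed by the detection limit.
    """
    text = raw.strip()
    if text.startswith("<"):
        limit = float(text[1:].strip())
        # BDL/2 rule: replace the censored reading with half the detection limit
        return Reading(value=limit / 2.0, bdl=True, limit=limit)
    return Reading(value=float(text), bdl=False, limit=None)


raw_readings = ["1.7", "< 0.4", "< 0.6", "0.9"]
readings = [parse_reading(r) for r in raw_readings]
```

Keeping the flag and the limit alongside the value means a later analysis
can still tell which numbers were substituted, and with which detection
limit, if a different treatment of the BDL cases is wanted.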
David L. Cassell
3115 NW Norwood Pl.
Corvallis OR 97330