LISTSERV at the University of Georgia
Date:         Thu, 1 Dec 2005 21:45:14 -0800
Reply-To:     David L Cassell <davidlcassell@MSN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David L Cassell <davidlcassell@MSN.COM>
Subject:      Re: Multiple Imputation in a Narrow Range
In-Reply-To:  <200512012200.jB1LQMgN010036@mailgw.cc.uga.edu>
Content-Type: text/plain; format=flowed

topkatz@MSN.COM wrote:
>There was a presentation at the most recent NYASUG meeting about robust
>regression, and the speaker was talking about dealing with various data
>issues. One issue in the example data (from a real experiment) was as
>follows. A measurement is done with a machine; readings are positive
>numbers, and the results above the machine calibration threshold are
>lognormally distributed. Readings below the machine calibration
>threshold are just reported as "< 0.4" or "< 0.6" (depending on how the
>machine was calibrated that particular day). The speaker had simply
>converted such values to 0.2 or 0.3, respectively, but invited suggestions
>from the crowd. One audience member (and SAS-L denizen) mentioned
>multiple imputation, and this idea led to a side discussion after the talk
>finished. These values are not true missing values; they are somewhere
>between 0 and the calibration threshold, but they just can't be measured
>with the same accuracy as readings above the calibration threshold, so
>there are no accurate values within that range. How would you handle such
>values? (Discarding is not an option.)
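One way to make the multiple-imputation suggestion concrete is to treat a "< DL" reading as left-censored and draw plausible values from the lognormal restricted to the interval (0, DL). The sketch below is a minimal illustration under stated assumptions: the log-scale parameters `mu` and `sigma` are taken as already known, and the function name `impute_below_dl` is hypothetical. A full multiple-imputation procedure would also estimate those parameters from the censored data (drawing them from their posterior for each imputation) and then combine the analyses of the m completed datasets with Rubin's rules; neither step is shown here.

```python
import math
import random
from statistics import NormalDist

def impute_below_dl(detection_limit, mu, sigma, m=5, rng=None):
    """Draw m imputed values for one reading known only to lie in (0, detection_limit).

    ASSUMPTION: readings are lognormal with known log-scale mean `mu` and
    standard deviation `sigma` (hypothetical inputs; in practice they would
    be estimated from the data, accounting for the censoring).
    """
    rng = rng or random.Random()
    log_dist = NormalDist(mu, sigma)
    # Probability that a lognormal reading falls below the detection limit.
    p_below = log_dist.cdf(math.log(detection_limit))
    if not 0.0 < p_below < 1.0:
        raise ValueError("detection limit is too extreme for these parameters")
    draws = []
    for _ in range(m):
        # 1 - random() lies in (0, 1], so u lies in (0, p_below]; inverting the
        # normal CDF there gives a log-value at or below log(detection_limit),
        # i.e. a draw from the lognormal truncated to (0, detection_limit].
        u = p_below * (1.0 - rng.random())
        draws.append(math.exp(log_dist.inv_cdf(u)))
    return draws
```

For example, `impute_below_dl(0.4, mu=0.0, sigma=1.0, m=5, rng=random.Random(1))` returns five strictly positive values, each at most 0.4, and each would go into a separate completed copy of the data. Inverse-CDF sampling is used rather than rejection sampling because it wastes no draws even when almost all of the distribution lies above the limit.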

[1] Discarding is most definitely NOT an option.

[2] Typically, in chemometrics, biometrics, or environmetrics settings, we refer to these points as BDL (Below Detection Limit). The value was measured, but it is so small that there is no reliable value for it. The smallest 'reliable' measurement you can get off the instrument is the 'detection limit', which is some small positive number. In a lot of settings, you can get a *lot* of BDL values. In some Superfund situations (U.S. Environmental Protection Agency sites marked as requiring thousands of lawyers :-)) you can have 90% or so of the measurements on something like, say, PCBs below detection limits, while the cleanup has to be done in other areas. And then there's the problem of storing the data when many people keep their numbers in a format or database that does not let you define a 'special missing value' for BDL cases.
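When the storage format has no 'special missing value' to mark BDL cases, one workable layout is to keep a flag and the detection limit alongside the value, so that a BDL reading is never confused with a truly missing one. The sketch below assumes readings arrive as text such as "< 0.4" or "1.7"; the `Reading` record and `parse_reading` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Reading:
    """One instrument reading that keeps BDL cases distinct from true missings.

    value           -- the measured value, or None when no usable value exists
    detection_limit -- the stated limit for a BDL reading, otherwise None
    below_dl        -- True when the instrument reported only "< limit"
    """
    value: Optional[float]
    detection_limit: Optional[float]
    below_dl: bool

def parse_reading(text: str) -> Reading:
    s = text.strip()
    if not s:
        # Truly missing: nothing was recorded at all.
        return Reading(value=None, detection_limit=None, below_dl=False)
    if s.startswith("<"):
        # BDL: the value is unknown but lies between 0 and the stated limit,
        # so keep the limit (it varies by calibration) and set the flag.
        limit = float(s[1:].strip())
        return Reading(value=None, detection_limit=limit, below_dl=True)
    # An ordinary measured reading above the detection limit.
    return Reading(value=float(s), detection_limit=None, below_dl=False)
```

Keeping the limit per reading matters here because, as in the original question, the limit changes with the day's calibration (0.4 one day, 0.6 another).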

[3] A *lot* of research, modeling work, and journal articles have come out of this area. A lot of people recommend that, prior to analysis, all BDL values be set to half the detection limit (the BDL/2 rule). That is no doubt where the speaker's numbers came from: 0.4/2 = 0.2 and 0.6/2 = 0.3. This has fairly good statistical properties in a number of situations, so it's a useful solution. It doesn't work in all cases, but then, what does?
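The BDL/2 rule is simple enough to show in a few lines. This minimal sketch assumes the raw readings are strings in the "< 0.4" style from the original question; the function name `substitute_half_dl` is hypothetical. Each reading uses its own stated limit, since the limit depends on that day's calibration.

```python
from typing import List

def substitute_half_dl(raw: List[str]) -> List[float]:
    """Replace each "< DL" reading with DL/2; parse all other readings as numbers."""
    out = []
    for text in raw:
        s = text.strip()
        if s.startswith("<"):
            detection_limit = float(s[1:].strip())
            out.append(detection_limit / 2.0)  # e.g. "< 0.4" -> 0.2
        else:
            out.append(float(s))
    return out
```

For example, `substitute_half_dl(["< 0.4", "2.35", "< 0.6"])` returns `[0.2, 2.35, 0.3]`, which reproduces the speaker's 0.2 and 0.3.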

So I'm going to support the BDL/2 approach.

David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
