|Date: ||Tue, 20 Sep 2005 12:52:57 -0700|
|Reply-To: ||David L Cassell <davidlcassell@MSN.COM>|
|Sender: ||"SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>|
|From: ||David L Cassell <davidlcassell@MSN.COM>|
|Subject: ||Re: Imputation|
|Content-Type: ||text/plain; format=flowed|
>I have two data sets named ‘good’ and ‘bad’. I want to impute values for
>the data in the ‘bad’ data set based on the ‘good’ set. I have variables
>like model year, make and type which can be used to compare the datasets.
>Actually, I want to retrieve values from the ‘good’ data set based on model
>year, make and type and assign it to the ‘bad’ data set.
I'm going to get all curmudgeon-like (you know, just like always) and
disagree with everyone else.
If you are doing real imputation, and not just filling in holes in your data
with fixed exact values that are not random in any way, then please
do NOT use single imputation methods such as the SAS code offered
so far. Look into multiple imputation so that you can later (statistically)
assess the consequences of your actions!
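To make "assess the consequences" a bit more concrete: after multiple
imputation you run the same analysis on each of the m completed data sets
and then pool the results with Rubin's combining rules. Here is a minimal
sketch of that pooling step, written in Python only for illustration. It is
not SAS code and not what any particular procedure prints; the function name
and the numbers are made up.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data results with Rubin's combining rules.

    estimates: one point estimate per imputed data set
    variances: the squared standard error of each of those estimates
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least 2 imputations")
    q_bar = mean(estimates)        # pooled point estimate
    w = mean(variances)            # within-imputation variance
    b = variance(estimates)        # between-imputation variance (divisor m - 1)
    t = w + (1 + 1 / m) * b        # total variance of the pooled estimate
    return q_bar, w, b, t


# Hypothetical results from m = 5 imputed data sets (illustrative numbers only).
estimates = [10.0, 12.0, 11.0, 13.0, 9.0]
variances = [4.0, 4.0, 4.0, 4.0, 4.0]

q_bar, w, b, t = pool_rubin(estimates, variances)
print(f"pooled estimate={q_bar}, within={w}, between={b}, total={t:.2f}")
```

The between-imputation term is the part a single imputation cannot give you:
with only one filled-in data set it is not estimable, so the uncertainty due
to the missing values silently drops out of your standard errors.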
PROC MI does *not* provide a mechanism for inserting values from alternative
data files. I'm not sure that this is even a good idea for multiple imputation
unless you can do something to establish that the 'good' data come from the
EXACT same population as the 'bad' data. Not pretty much the same, with
just a few tweaks, but exactly the same target population. Otherwise, you
risk introducing a host of biases, because your 'good' data would
represent a different target population, one that may differ in some
important underlying characteristics.
If you really do want imputation in the statistical sense, perhaps you
could write back to SAS-L and explain your process more fully.
David L. Cassell
3115 NW Norwood Pl.
Corvallis OR 97330