Date: Tue, 22 Feb 2011 14:34:16 -0500
Reply-To: William Shakespeare <shakespeare_1040@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: William Shakespeare <shakespeare_1040@HOTMAIL.COM>
Subject: Re: Imputing categorical variables
I'm very well aware of the procedures for analyzing imputed data. Take
the following example:
id race cure
1 0 1
2 1 0
3 . 1
4 . 0
where race=0 is white and race=1 is Hispanic; cure=0 means the disease was
not cured and cure=1 means it was cured.
Assume I run proc mi to fill in the missing data (ignoring that in this
example the proportion missing is too high to be imputed) and it comes out
like so:
id race cure
1 0 1
2 1 0
3 .40 1
4 .80 0
The literature that I've read states that it's best not to round off the
imputed values. How should the values of .4 and .8 for race be treated in
the analysis and when running proc mianalyze? If I treat them as a 3rd
and 4th category, then what do they represent?
Now it seems to me that one way around this is to not round and instead
analyze a covariance matrix or a correlation/means matrix in the second and
third steps. Maybe I missed something, but when I looked through the proc glm
and proc logistic documentation I did not see any option for analyzing these
matrices. There's a lot there, though, so I may have overlooked it. It looks
like it's an option on proc mianalyze.
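For concreteness, here is a minimal sketch of what analyzing a means/covariance matrix would involve for the single imputed dataset above. It is written in Python purely for illustration, not SAS, and the helper names are made up. Race keeps its unrounded imputed values (.40 and .80) and is treated as continuous. The analysis then works from the mean vector and the sample covariance matrix rather than from the raw records. With real proc mi output, this summary would be computed for each imputed dataset before the results are pooled in the third step.

```python
# Toy imputed dataset from the example above: race keeps its fractional
# imputed values (.40 and .80) instead of being rounded to 0/1.
imputed = {
    "race": [0.0, 1.0, 0.40, 0.80],
    "cure": [1.0, 0.0, 1.0, 0.0],
}


def mean(xs):
    """Arithmetic mean of a list of numbers."""
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance using the n - 1 divisor."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def means_and_cov(data):
    """Return the mean vector and full covariance matrix as dicts."""
    names = list(data)
    means = {n: mean(data[n]) for n in names}
    cov = {(a, b): covariance(data[a], data[b]) for a in names for b in names}
    return means, cov


means, cov = means_and_cov(imputed)
for name, m in means.items():
    print(f"mean {name}: {m:.4f}")
for (a, b), c in cov.items():
    print(f"cov({a}, {b}): {c:.4f}")
```

With these four rows, the mean of race is .55, and race and cure have a sample covariance of about -.233. A downstream regression that accepts a covariance or correlation matrix as input would use only these summaries, so the fractional values never need to be interpreted as extra categories.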
Am I on the right track here or should I be doing something different?