LISTSERV at the University of Georgia
Date:         Wed, 23 Feb 2005 15:44:21 -0800
Reply-To:     cassell.david@EPAMAIL.EPA.GOV
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject:      Re: Multiple Imputation--Missing Data
In-Reply-To:  <20050223190442.C52E2101D0@ws1-3.us4.outblaze.com>
Content-type: text/plain; charset=US-ASCII

"Nick ." <ni14@mail.com> also wrote to me (instead of to SAS-L): > My data set has inputs each having about 10 to 30% missing values. > When I use PROC MI (I have SAS Version 8.2) with 5 inputs (these are > the inputs I wish to impute missing values of), I get > > WARNING: The initial covariance matrix for MCMC is singular. > You can use a PRIOR= option to stabilize the inference. > > I have no idea what that means or how to get this experimental (a > word for not reliable???) version of PROC MI to give me imputed > values. I also believe that the missing values are MAR (missing at > Random). These are fields we buy from a vendor--fields like MSA and > demographic data, etc. Examples are dollar amounts, salaries, home > market values, etc. It is not like your example below where I > wouldn't want to impute. (But then again, maybe, I still shouldn't > impute???) I cannot guarantee, however, that the data we buy follow > the normality assumptions. This is real life data, not ivory tower > data. So, I guess, my questions now are: > > how do I fix the warning message > > how do I get SAS to impute > > should I even use PROC MI (experimental, data probably not normal, > etc.) or should I use some other SAS procedure?

I see you have solved some of your problems in the other message you sent to me (even though you need to write to SAS-L and not to me personally). You are limited in what PROC MI can do in SAS 8.2, and you may have to do some of the multiple imputation by hand, or upgrade to SAS 9.1.3 to get more help.

And, since your data are purchased, the Missing At Random assumption is (most likely) unknowable. Ugh. Go ahead and assume it. Then caveat it mercilessly in your documentation to CYA. (That stands for Cover Your Assumptions. Or something close.) Explain why you HAD to assume it and why you CANNOT verify the assumption, and go forward. But document it.

> Finally, say I do get SAS to impute 5 times. So, if my data set has
> 3 obs, then I will get an imputed data set 3 x 5 = 15 obs with
> IMPUTED NUMBER = 1 (3 obs), IMPUTED NUMBER = 2 (3 obs), etc. What do
> I do with this data set? Do I build one model with IMPUTED NUMBER =
> 1, then another with IMPUTED NUMBER =2, etc. and select the best
> model out of the 5 imputations? (Best in my work means best lift. My
> line of work is banking/finance/campaigns ...)

First, the more missing values you have, the more you should consider upping that default '5 times'. I find that m=5 works fine for me when all my variables have less than 10% missings. You don't have that, so think about increasing m to something over 5. The more you increase m, the larger your output data set and the longer it takes to run your analyses. But you need m big enough for your estimates to be stable. You can check that by running this with m=5, m=10, ... until the estimates stabilize. You probably don't need to go higher than m=20 or 30.
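To make "until the estimates stabilize" a bit more concrete, here is a minimal sketch in Python (not SAS) of one way to check it. Everything in it is an assumption for illustration: the function name first_stable_m, the 1% relative tolerance, and the idea that you have already collected one pooled estimate per value of m from your own runs. Pick a tolerance that makes sense for your analysis.

```python
def first_stable_m(pooled_by_m, rel_tol=0.01):
    """Return the smaller m of the first consecutive pair of runs whose
    pooled estimates agree within rel_tol (relative to the earlier one).

    pooled_by_m: list of (m, pooled_estimate) pairs, sorted by m.
    Returns None if no consecutive pair agrees.
    rel_tol is a hypothetical threshold, not a standard value.
    """
    for (m_prev, est_prev), (_m_cur, est_cur) in zip(pooled_by_m, pooled_by_m[1:]):
        if abs(est_cur - est_prev) <= rel_tol * abs(est_prev):
            return m_prev
    return None


# Hypothetical pooled estimates from runs with m = 5, 10, 15, 20.
runs = [(5, 1.20), (10, 1.10), (15, 1.095), (20, 1.096)]
stable_m = first_stable_m(runs)
```

With these made-up numbers the jump from m=5 to m=10 is too large, while m=10 and m=15 agree, so the sketch reports m=10. In practice you would look at several estimates (and their standard errors), not just one.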

The way that PROC MI works is like a 'wrapper function'. You run PROC MI on your data. You get m replicates, all in a now-larger data set. Each replicate has a different value of the variable _IMPUTATION_ . Run your planned analysis using the BY statement, with this by-variable. Then feed the results into PROC MIANALYZE to see the impact of the imputation.
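For intuition about the combining step, here is a minimal sketch in Python (not SAS) of the standard combining rules for multiple imputation (Rubin's rules) applied to a single parameter. It is an illustration only, not PROC MIANALYZE itself: the function name pool_rubin and the input numbers are made up, and it assumes you already have one estimate and its squared standard error from each of the m by-group analyses.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m per-imputation results for one parameter.

    estimates: the m point estimates, one per imputed data set.
    variances: the m squared standard errors of those estimates.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists with at least two imputations")
    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total


# Hypothetical results from m = 3 imputations.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The point to notice is that nothing here picks a "best" imputation. All m results are averaged, and the spread between them is added to the variance, which is how the uncertainty due to the missing data shows up in the final standard error.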

And look to see if you can register at www.sierrainformation.com for the course "Treatment of Missing Data via Maximum Likelihood and Multiple Imputation" by Paul Allison, anytime soon. Or consider getting your boss to hire a statistical consultant to lead you through these dark and twisty passages, all alike. This is complex enough that you cannot really expect some yahoo in Oregon to be able to walk you through all your data problems when he can't sit down and work with your data.

HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

