```
Date: Tue, 8 Feb 2000 11:20:57 -0500
Reply-To: "J. Das"
Sender: "SAS(r) Discussion"
From: "J. Das"
Subject: Re: Missing Values; was PROC MEANS question
In-Reply-To: <4.2.2.20000208092323.00b03640@mail.binghamton.edu>
Content-Type: text/plain; charset="iso-8859-1"

>There are a variety of methods for missing value imputation.

Here are some of the methods for imputing missing data, as discussed in
standard textbooks:

1. The imputed value can be selected from the sample distribution. This is
   known as Hot Deck Imputation.
2. The missing values are replaced by a constant value, which can be obtained
   from appropriate external sources. This is called Cold Deck Imputation.
3. Missing values can be replaced by the mean calculated from the responding
   units in the sample. This is called Mean Imputation.
4. Missing values can be predicted from a regression of the missing item on
   items observed for the unit. This is called Regression Imputation.
5. Multiple Imputation Methods, as discussed in a recent paper by
   T.E. Raghunathan and G.D. Paulin in the American Statistical Association
   1998 Proceedings of the Business and Economic Statistics Section. The
   references in their paper give a list of textbooks and articles that
   might be helpful.

Jayanta
---------------------------------
Dr. Jayanta Das
Senior Econometrician
Integra Information, Inc.
Flanders, NJ 07828

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Lary Jones
Sent: Tuesday, February 08, 2000 9:57 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Missing Values; was PROC MEANS question

At 08:16 AM 2/8/2000 -0500, Miller, Scott wrote:
>One thing you can do that is fairly acceptable where there would be some
>missing responses to questions in a series that constitutes a 'scale score'
>is to substitute the missing response with the mean of the other responses,
>as long as the subject answers more than .5 of the series.
>If the subject failed to respond to at least .5 of the questions, the scale
>should be unscored (score=.)

This is an accepted approach for many (though I question 50%; I would set a
much lower proportion). Nevertheless, I am hesitant to recommend any kind of
missing value replacement without knowing the data.

First of all, it is often forgotten that deciding to replace missing values
depends on the assumption that the missing values occur randomly. I think
this rarely happens. It is easy to see that items which are of lower quality
(confusing, difficult to answer) and items which deal with acknowledgement of
undesirable qualities will have more missing answers.

One often focuses on the relation of an item to others in the "scale." It is
important to consider the number of missing values for each item, across
respondents, as well. I do not have an exclusion rule in hand, but I would be
very uncomfortable with any item which is missing for more than 10% of the
respondents. A noticeable collection of missing values for an item raises
questions of reliability, if not validity.

There are a variety of methods for missing value imputation. I think this is
a case where the techniques may be outstripping our general knowledge about
appropriateness. We can devise a number of techniques which preserve
properties of a distribution. The real question is: are we applying these
techniques without thinking about the meaning of the data? Is it better to
use the sum of items with means replacing the missing values, or to use the
mean ignoring missing values? How many items in a scale do we allow to be
missing? How many missing values for an item are still "ok"?

Having spent the last 25 years in the computing services game, I have
limited knowledge of the literature. I welcome the comments of others on
this issue.

-lary jones
_______________________________________________________
Lary Jones % Statistical Computing Analyst
Computing Services % ..........................
Binghamton University % LJones@Binghamton.EDU
Binghamton, NY 13902-6000 % (607) 777-2614
```
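The four single-imputation methods in Das's list (items 1 to 4) can be sketched in a few lines. This is a minimal illustration in Python rather than SAS, under stated assumptions: missing values are represented as `None`, all function names are hypothetical, the hot deck draws donors at random from the whole sample (practical hot-deck schemes usually pick donors within matching classes), and regression imputation uses a single observed covariate. Multiple imputation (item 5) is not sketched here.

```python
import random
from statistics import mean
from typing import List, Optional, Sequence

Values = Sequence[Optional[float]]  # None marks a missing response (assumed encoding)


def mean_impute(values: Values) -> List[float]:
    """Mean imputation: fill gaps with the mean of the responding units."""
    m = mean(v for v in values if v is not None)
    return [m if v is None else v for v in values]


def cold_deck_impute(values: Values, constant: float) -> List[float]:
    """Cold deck: fill gaps with a constant taken from an external source."""
    return [constant if v is None else v for v in values]


def hot_deck_impute(values: Values, rng: random.Random) -> List[float]:
    """Hot deck (simplest form): fill each gap with a value drawn at random
    from the observed responses in the same sample."""
    donors = [v for v in values if v is not None]
    return [rng.choice(donors) if v is None else v for v in values]


def regression_impute(y: Values, x: Sequence[float]) -> List[float]:
    """Regression imputation: fit y = a + b*x by least squares on the units
    where y is observed, then predict y where it is missing."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xbar = mean(xi for xi, _ in pairs)
    ybar = mean(yi for _, yi in pairs)
    sxx = sum((xi - xbar) ** 2 for xi, _ in pairs)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in pairs)
    b = sxy / sxx
    a = ybar - b * xbar
    return [a + b * xi if yi is None else yi for xi, yi in zip(x, y)]
```

Mean imputation places every filled value at the centre of the distribution, so it preserves the item mean while understating the item's variance. That is one concrete case of Jones's point that a technique can preserve some properties of a distribution while distorting others.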

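Miller's scale-score rule (replace each missing answer with the mean of the respondent's other answers, but only if more than half the items were answered; otherwise leave the scale unscored) can be sketched as follows. This is a hypothetical Python helper, not SAS code; `None` stands in for a missing answer and for SAS's `score=.`, and `min_fraction` is an illustrative parameter name for the .5 cutoff.

```python
from typing import Optional, Sequence


def scale_score(responses: Sequence[Optional[float]],
                min_fraction: float = 0.5) -> Optional[float]:
    """Sum score for one respondent, using person-mean substitution.

    If the respondent answered MORE than `min_fraction` of the k items, each
    missing answer (None) is replaced by the mean of that respondent's own
    answered items and all k items are summed. Otherwise the scale is left
    unscored (None, the analogue of SAS `score=.`).
    """
    k = len(responses)
    answered = [r for r in responses if r is not None]
    # Require strictly more than min_fraction * k answered items.
    if k == 0 or len(answered) <= min_fraction * k:
        return None
    person_mean = sum(answered) / len(answered)
    # Fill each gap with the respondent's own mean, then sum over all k items.
    return sum(person_mean if r is None else r for r in responses)


if __name__ == "__main__":
    print(scale_score([4, 5, None, 3]))     # 3 of 4 answered -> 16.0
    print(scale_score([4, None, None, 3]))  # only 2 of 4 answered -> None
```

Because every gap is filled with the respondent's own mean, the filled-in sum always equals k times the mean of the answered items. So when the replacement value is the person mean, as Miller describes, the two options Jones contrasts (the sum with means replacing missing values versus the mean ignoring missing values) differ only by the constant factor k and rank respondents identically. A stricter cutoff of the kind Jones prefers is just a larger `min_fraction`, for example 0.8 (an illustrative value).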
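Jones's item-level check (be wary of any item missing for more than 10% of respondents) amounts to computing a missing proportion per item. A minimal Python sketch, assuming responses are stored per item as lists with `None` for a missing answer; the function names are hypothetical, and the 10% default is Jones's informal figure, not an established rule.

```python
from typing import Dict, List, Optional, Sequence

Items = Dict[str, Sequence[Optional[float]]]  # item name -> answers across respondents


def missing_rates(items: Items) -> Dict[str, float]:
    """Proportion of respondents with a missing (None) answer, per item."""
    return {name: sum(v is None for v in values) / len(values)
            for name, values in items.items()}


def flag_items(items: Items, threshold: float = 0.10) -> List[str]:
    """Names of items missing for more than `threshold` of respondents."""
    rates = missing_rates(items)
    return sorted(name for name, rate in rates.items() if rate > threshold)


if __name__ == "__main__":
    survey = {
        "q1": [3, 4, 2, 5, 4, 3, 2, 4, 5, 3],
        "q2": [3, None, 2, 5, 4, 3, 2, 4, 5, 3],     # 1 of 10 missing
        "q3": [None, 4, None, 5, 4, 3, 2, 4, 5, 3],  # 2 of 10 missing
    }
    print(flag_items(survey))  # -> ['q3']
```

An item missing for exactly 10% of respondents is not flagged here, since the test is "more than" the threshold.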