LISTSERV at the University of Georgia
Date:         Sun, 6 Feb 2000 12:26:29 -0500
Reply-To:     james watts <watsdat@NETSIDE.COM>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         james watts <watsdat@NETSIDE.COM>
Subject:      Re: Missing Values; was PROC MEANS question
Comments: To: Lary Jones <ljones@BINGHAMTON.EDU>
Content-Type: text/plain; charset=us-ascii

Hi. Have you seen Little and Rubin's Statistical Analysis with Missing Data (J. Wiley, 1987)? I think the Systat 9 manuals explain the Little & Rubin tests for missing at random and missing completely at random well. I haven't seen the SPSS MVA, but these tests are generic, so I would guess they are there also. If you can get the Statistics II manual for Systat (pg. 9) and see the graphics (matrices formed from the whole data array) that the program produces, I think you would be better informed than if I tried to explain the chi-square test that Little & Rubin use to make their distinctions.

James Watts

Lary Jones wrote:

> What follows is a message which I just sent out to SAS-L. Since the issues
> I raise relate to approaches to data analysis, not the software, I thought
> I would post it here, as well.
>
> My background in social psychology and measurement has made me into a bit
> of a curmudgeon about replacement of missing values. I would be most
> interested to hear the opinions of others. In my woeful ignorance of
> recent literature, I wonder if anyone has developed a technique for
> assessing the randomness of missing values within the rectangular array of
> a list of variables over all the subjects. My opinion is that missing
> value replacement is only justified if we know that the pattern of missing
> values is haphazard (unrelated to the row or column of the data matrix).
>
> I will be researching this, but would appreciate any knowledge that others
> might have.
> ________________________________________________________
> Lary Jones Statistical Computing Analyst
> Binghamton University LJones@Binghamton.EDU
>
> I found no narrowness respecting sects and opinions, but
> believed that sincere, upright hearted people in every
> Society who truly loved God were accepted of him.
> ~John Woolman~
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> At 08:16 AM 2/8/2000 -0500, Miller, Scott wrote:
> >one thing you can do that is fairly acceptable where there would be some
> >missing responses to questions in a series that constitutes a 'scale score'
> >is to substitute the missing response with the mean of the other responses,
> >as long as the subject answers more than .5 of the series. if the subject
> >failed to respond to at least .5 of the questions, the scale should be
> >unscored (score=.)
>
> This is an accepted approach for many (though I question 50%; I would set a
> much lower proportion). Nevertheless, I am hesitant to recommend any kind
> of missing value replacement without knowing the data.
>
> First of all, it is often forgotten that deciding to replace missing values
> depends on the assumption that the missing values occur randomly. I think
> this rarely happens. It is easily understood that items which are of lower
> quality (confusing, difficult to answer) and which deal with
> acknowledgement of undesirable qualities will have more missing
> answers. One often focuses on the relation of an item to others in the
> "scale." It is important to consider the number of missing values, across
> respondents, as well. I do not have in hand an exclusion rule, but I would
> be very uncomfortable with any item which is missing for more than 10% of the
> respondents. A noticeable collection of missing values for an item raises
> questions of reliability, if not validity.
>
> There are a variety of methods for missing value imputation. I think this
> is a case where the techniques may be outstripping our general knowledge
> about appropriateness. We can devise a number of techniques which preserve
> properties of a distribution. The question is really, are we applying
> these techniques without thinking about the meaning of the data. Is it
> better to use the sum of items with means replacing the missing values, or
> to use the mean ignoring missing values? How many items in a scale do we
> allow to be missing? How many missing values for an item is still "ok"?
>
> Being in the computing services game for the last 25 years, my knowledge of
> the literature is limited. I welcome the comments of others on this issue.
>
> -lary jones
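
For readers who want to try the rules discussed in the quoted messages, here is a minimal Python sketch. It is not taken from SPSS, SAS, or Systat; the function names (scale_score, imputed_sum, item_missing_rates) and the use of None for a missing response are assumptions made for illustration. It reads the scale-score rule as "score only if strictly more than the given proportion of items were answered", and it reads "the mean" in the substitution rule as the respondent's own mean over the answered items.

```python
from typing import List, Optional, Sequence

Response = Optional[float]  # None marks a missing answer (an assumption of this sketch)


def scale_score(responses: Sequence[Response], min_answered: float = 0.5) -> Optional[float]:
    """Mean of the answered items, or None when too few items were answered."""
    answered = [r for r in responses if r is not None]
    if not responses or len(answered) / len(responses) <= min_answered:
        return None  # scale left unscored
    return sum(answered) / len(answered)


def imputed_sum(responses: Sequence[Response], min_answered: float = 0.5) -> Optional[float]:
    """Sum of items after replacing each missing item with the respondent's own mean."""
    mean = scale_score(responses, min_answered)
    if mean is None:
        return None
    return sum(mean if r is None else r for r in responses)


def item_missing_rates(data: Sequence[Sequence[Response]]) -> List[float]:
    """Proportion of respondents (rows) with a missing answer, per item (column)."""
    n_rows = len(data)
    n_items = len(data[0]) if n_rows else 0
    return [sum(row[j] is None for row in data) / n_rows for j in range(n_items)]


if __name__ == "__main__":
    print(scale_score([4, None, 5, 3]))      # three of four items answered
    print(scale_score([4, None, None, 2]))   # exactly half answered, so unscored
    print(imputed_sum([4, None, 5, 3]))
    rates = item_missing_rates([[1, None], [2, 3], [None, None], [4, 5]])
    print([j for j, rate in enumerate(rates) if rate > 0.10])  # items above a 10% cutoff
```

Under this reading, the sum with substituted means is just the number of items times the mean of the answered items, so the two scores compared in the quoted message differ only by a constant factor. That would not hold if item means across respondents were substituted instead. Neither function says anything about whether the missing values are random, which is the question the Little & Rubin tests address.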

