LISTSERV at the University of Georgia
Date:         Mon, 19 Dec 2011 14:14:33 +0000
Reply-To:     "Poes, Matthew Joseph" <mpoes@UILLINOIS.EDU>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         "Poes, Matthew Joseph" <mpoes@UILLINOIS.EDU>
Subject:      Re: Multiple imputation for different types of missing values
Comments: To: Kathryn Gardner <>
In-Reply-To:  <SNT104-W7967C8569968FC65D9740A6A70@phx.gbl>

My experience with the handling of missing data has led me to distrust, and largely avoid, the SPSS Multiple Imputation module. Because of the way it draws its values and develops its estimate of the distribution, my feeling is that it should only be used in situations that can truly be called MAR (missing at random).

Research into the least biased way of estimating missing values in survey research has actually shown that when the questions are based on finite scales, other methods may be more appropriate. Hot decking is one of the best approaches for this, and some versions can even take multiple time points into account. Hot decking is also the method used by many of the very large-scale survey groups, such as the Census. It's my understanding that hot decking works best when the sample size is fairly substantial. Note that hot decking is not supported natively in SPSS, but numerous macros exist for it, and they are fairly easy to use.
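For readers unfamiliar with the technique, here is a minimal sketch of one common variant, random hot decking within imputation classes, written in Python rather than as an SPSS macro. It is not the Census procedure or any particular published macro. The function name, the variable names (`sex`, `age_group`, `q1`) and the choice of classing variables are all assumptions made for illustration.

```python
import random


def hot_deck_impute(rows, target, class_vars, seed=0):
    """Fill missing `target` values (None) with the observed value of a
    randomly chosen donor from the same imputation class.

    A class is the set of rows sharing the same values on `class_vars`.
    Rows whose class contains no donor are left missing.
    """
    rng = random.Random(seed)

    # Pool the observed values of the target variable by class.
    donors = {}
    for row in rows:
        if row[target] is not None:
            key = tuple(row[v] for v in class_vars)
            donors.setdefault(key, []).append(row[target])

    completed = []
    for row in rows:
        new_row = dict(row)  # copy so the original data stay untouched
        if new_row[target] is None:
            pool = donors.get(tuple(row[v] for v in class_vars))
            if pool:
                new_row[target] = rng.choice(pool)
        completed.append(new_row)
    return completed


# Hypothetical survey data: q1 is a 1-5 item with some missing responses.
survey = [
    {"sex": "F", "age_group": "18-29", "q1": 4},
    {"sex": "F", "age_group": "18-29", "q1": None},
    {"sex": "M", "age_group": "30-44", "q1": 2},
    {"sex": "M", "age_group": "30-44", "q1": None},
    {"sex": "M", "age_group": "45+", "q1": None},  # no donor in this class
]
completed = hot_deck_impute(survey, "q1", ["sex", "age_group"])
```

Because every imputed value is copied from an observed response, the result always stays on the original response scale, which is the appeal for finite-scale items.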

Matthew J Poes Research Data Specialist Center for Prevention Research and Development University of Illinois

From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Kathryn Gardner
Sent: Monday, December 19, 2011 3:48 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: Multiple imputation for different types of missing values

Dear Art, Thank you for your useful comments. Before estimating missing data I usually remove anyone who has missed out entire questionnaires (other than for non-applicability reasons), and also check for any patterns in the missing data. I also check that the amount of missing data to be imputed is not a large amount. Up until now I have used person mean substitution at the item level, which has shown promising results, but MI seems to be the gold standard, and now that it is available in SPSS I assumed this was the way forward, even for scale items. It seems you are suggesting not, but I wondered if you knew of any references that discuss this issue?

I have noticed that there is an option to apply constraints when running MI, so that one can specify the range one would like the score to fall in e.g., 1-5. I was intending to do this for each item so that the imputation does not impute any implausible values e.g., -1. I couldn't find any discussion of this issue, but it seems like the only logical way to avoid implausible values.

I have dropped items from scales where internal consistency is lowered, but only if this is substantially so.


________________________________
Date: Fri, 16 Dec 2011 13:37:21 -0500
From:<>
To:<>
CC:<>
Subject: Re: [SPSSX-L] Multiple imputation for different types of missing values

It is particularly in RELIABILITY that I wonder about imputing values for items from outside the set of variables that are repeated measures of a construct.

Items in a summative scale often have a restricted range: 0 to 9, 1 to 5, -3 to 3, etc.

A goal in scale development is to have a measurement that has convergent validity with the set of items and divergent validity for constructs other than the one you are trying to measure. One wants to work with the common variance.

It is likely that variables intended to measure other constructs would relate to the unique variance within an item. Well-developed scales have question stems that are balanced for direction, i.e., some items have opposite signs on the factor they are assigned to. Would you impute items the way they were entered, or after they were reflected to be unidirectional, as you need to do before you run RELIABILITY?
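As a small illustration of the reflection step mentioned above, the usual arithmetic for reverse-keying an item is (lowest scale point + highest scale point) minus the raw score. The Python sketch below assumes that convention; the function names and item names (`q1` to `q3`) are hypothetical.

```python
def reflect(score, low, high):
    """Reverse-key one response on a scale running from `low` to `high`.

    Missing responses (None) stay missing.
    """
    if score is None:
        return None
    return low + high - score


def reflect_items(row, reversed_items, low, high):
    """Return a copy of `row` with the items named in `reversed_items` reflected."""
    return {
        item: reflect(value, low, high) if item in reversed_items else value
        for item, value in row.items()
    }


# On a 1-5 scale, a raw 4 on a reverse-keyed item becomes 2.
respondent = {"q1": 2, "q2": 4, "q3": None}
unidirectional = reflect_items(respondent, {"q2", "q3"}, low=1, high=5)
```

The same formula covers scales centred on zero: on a -3 to 3 scale, low + high is 0, so reflection simply flips the sign.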

Mean substitution is potentially problematic when a mean is across cases. I don't know about when the mean is within cases.

Of course a lot depends on where that piece of research lies in the stream of research in that area, the goals of the particular piece of research, and where you are in the use of the data.

With regard to the alcohol scale, when you think about an item, would zero be a reasonable value in the order of the responses to that item? It is hard to say much more without understanding the constructs you are trying to measure and their role in your theorizing.

Also, before doing any imputing, do you remove items from further consideration when many of the values are missing for reasons other than non-applicability? Do you drop cases that have a substantial amount of missing data, or that show pattern responding? By pattern responding I mean all true, all false, alternating true and false, 1 2 3 4 5, etc., which show that respondents were responding only to the request to give an answer and not to the semantic content of the question.
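The three patterns named above (constant answers, strict alternation between two answers, and a run such as 1 2 3 4 5) can be screened for mechanically. The Python sketch below is only a rough first pass under assumed rules: the function name and the minimum of four answered items are arbitrary illustrative choices, and flagged cases would still need to be inspected by eye.

```python
def is_pattern_response(answers, min_items=4):
    """Flag a response vector that looks like pattern responding.

    Checks three simple patterns on the non-missing answers:
    all identical, strict alternation between two values, and a
    steadily ascending or descending run in steps of one.
    Vectors with fewer than `min_items` answers are never flagged.
    """
    vals = [a for a in answers if a is not None]
    if len(vals) < min_items:
        return False

    distinct = set(vals)
    if len(distinct) == 1:  # e.g. 3 3 3 3 3, or all true
        return True

    # Alternating a b a b ...: two values, each equal to the one two back.
    if len(distinct) == 2 and all(
        vals[i] == vals[i - 2] for i in range(2, len(vals))
    ):
        return True

    # Ascending or descending run such as 1 2 3 4 5 or 5 4 3 2 1.
    steps = {b - a for a, b in zip(vals, vals[1:])}
    return steps == {1} or steps == {-1}


flags = [
    is_pattern_response([3, 3, 3, 3, 3]),  # constant
    is_pattern_response([1, 2, 1, 2, 1]),  # alternating
    is_pattern_response([1, 2, 3, 4, 5]),  # run
    is_pattern_response([2, 4, 3, 5, 1]),  # no obvious pattern
]
```

Note that a genuinely consistent respondent can also give identical answers, especially if the items have not been balanced for direction, so a flag is a prompt for review rather than grounds for automatic deletion.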

Do you drop items from scales when their inclusion lowers the internal consistency of the summative score (total or mean)?

Of course it should be a quick project to get means, SDs and correlations several ways once you have finished cleaning the data: with listwise deletion, with pairwise deletion, and with imputed values for missing data.

Do the values differ in meaningful ways? If you do factor analyses and plot the eigenvalues from each, and from parallel analyses of each, is there much difference?
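To make the listwise versus pairwise distinction concrete, here is a minimal Python sketch that computes one correlation both ways. Listwise deletion keeps only cases complete on every analysis variable, while pairwise deletion keeps every case that is complete on the two variables being correlated. The data and the variable names `x`, `y`, `z` are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def corr_listwise(rows, a, b, all_vars):
    """Correlate a and b using only cases complete on every variable in all_vars."""
    used = [r for r in rows if all(r[v] is not None for v in all_vars)]
    return pearson([r[a] for r in used], [r[b] for r in used]), len(used)


def corr_pairwise(rows, a, b):
    """Correlate a and b using every case that is complete on a and b."""
    used = [r for r in rows if r[a] is not None and r[b] is not None]
    return pearson([r[a] for r in used], [r[b] for r in used]), len(used)


# Invented data with one missing y and one missing z.
data = [
    {"x": 1, "y": 2, "z": 5},
    {"x": 2, "y": 4, "z": None},
    {"x": 3, "y": 5, "z": 1},
    {"x": 4, "y": 4, "z": 3},
    {"x": 5, "y": None, "z": 2},
]
r_list, n_list = corr_listwise(data, "x", "y", ["x", "y", "z"])
r_pair, n_pair = corr_pairwise(data, "x", "y")
print(f"listwise: r = {r_list:.3f} (n = {n_list})")
print(f"pairwise: r = {r_pair:.3f} (n = {n_pair})")
```

Running the same comparison on each imputed data set gives the third set of values to compare.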

In the long run how much of your data is imputed?


Art Kendall

Social Research Consultants

On 12/16/2011 4:35 AM, Kathryn Gardner wrote:

Dear Art, I wanted to impute the missing values at the item level as I thought this was more sensitive and I can then use all data in reliability analyses. Once imputed, I'll be summing the items to create scale scores to represent various constructs (e.g., alcohol use, personality, emotion regulation) that will be used in the main analyses of 2 papers I am publishing (SEM for one paper and latent profile analysis for another paper). I thought using the mean as the summative score to deal with missing data is just as bad as using mean substitution? Why is imputing summative scale items via MI often unnecessary? I couldn't find anything on the debate as to whether multiple imputation should be used for scale items vs. computed subscale scores etc.

Kathryn

________________________________
Date: Thu, 15 Dec 2011 07:11:01 -0500
From:<>
To:<>
CC: SPSSX-L@LISTSERV.UGA.EDU<mailto:SPSSX-L@LISTSERV.UGA.EDU>
Subject: Re: [SPSSX-L] Multiple imputation for different types of missing values

I would like to hear from other list members, but imputing summative scale items via MI is often unnecessary. You use the term "items", which often means the variables are meant to be used as part of a score, so I am responding in that context.

Are you planning to distribute a public use data set that includes items, or one that includes only scale scores, or are you only working on your own data set for your own use?

What is the goal of your project? Finding totals, means, percents for a population or for subpopulations? Or are you intending to compare and contrast groups? Or are you mainly interested in the relations of variables? Developing scales?

What is the response scale on the alcohol items? Are they intended to be repeated measures of a construct where the total or mean is used in analysis as the measure of a construct?

{I would like to hear from others on this.} If the score is to be used in analysis and the mean is the summative score, just use it.

If the score is to be used as a total, e.g., for comparison to published norms, then a) do what the original authors of the scale did, or b) compute adjscore = sum of valid items * (# of items in scale / # of items with valid values). {End of the part I would like to hear from other list members about.}
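Option b) above is a prorated total. As a worked example, a respondent who answered 4 of 5 items with a sum of 14 gets 14 * (5 / 4) = 17.5. The Python sketch below implements that formula directly; the function name is hypothetical, and returning a missing value when no items were answered is an added assumption.

```python
def prorated_total(items):
    """Adjusted scale total: sum of valid items * (items in scale / valid items).

    `items` holds one respondent's answers, with None for missing.
    Returns None when no item was answered (an assumption of this sketch).
    """
    valid = [v for v in items if v is not None]
    if not valid:
        return None
    return sum(valid) * len(items) / len(valid)


adjscore = prorated_total([3, 4, None, 5, 2])  # 14 * 5 / 4
print(adjscore)  # 17.5
```

This is arithmetically the same as multiplying the mean of the valid items by the number of items in the scale, so it carries the same assumptions as using the within-person mean.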

Art Kendall Social Research Consultants

On 12/15/2011 4:15 AM, Kathryn Gardner wrote:

Dear List, I am running a multiple imputation on lots of questionnaire items and I'm trying to figure out a way to run the analysis without imputing missing values for those participants who have missed out, say, all 5 items on an alcohol questionnaire because they were told to skip it if they do not drink alcohol. I don't want to exclude the alcohol measure entirely from the MI because there are also randomly missing values across these alcohol items that do need imputing.

At the moment all missing values are identified as system missing in the data file, and I thought there might be a way to get SPSS to only run the MI on certain types of missing values if I coded the ones I want to be ignored as user missing, but this doesn't seem possible. The only solution I could come up with was running the MI, then manually scanning thousands of rows of data and deleting the imputed values on the alcohol measure for the participants who skipped the entire questionnaire. As you can imagine, this is taking hours! There must be a simpler way. Any advice greatly appreciated.

Kind regards, Kathryn
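The manual clean-up described in the original question can be expressed as two mechanical steps: before imputing, flag every participant for whom all of the alcohol items are missing; after imputing, set those participants' alcohol items back to missing. The Python sketch below shows the logic only and is not SPSS syntax. The `id` key, the item names `alc1` to `alc5`, and the example values are all hypothetical.

```python
ALCOHOL_ITEMS = ["alc1", "alc2", "alc3", "alc4", "alc5"]  # hypothetical names


def flag_full_skips(rows, items, id_key="id"):
    """Return the ids of cases where every item in the block is missing.

    Run this on the data BEFORE imputation.
    """
    return {r[id_key] for r in rows if all(r[i] is None for i in items)}


def restore_skips(imputed_rows, skipped_ids, items, id_key="id"):
    """Return copies of the imputed rows with the block reset to missing
    for every case whose id is in `skipped_ids`."""
    restored = []
    for row in imputed_rows:
        new_row = dict(row)
        if new_row[id_key] in skipped_ids:
            for item in items:
                new_row[item] = None
        restored.append(new_row)
    return restored


# Original data: case 2 skipped the whole block, case 3 missed one item.
original = [
    {"id": 1, "alc1": 2, "alc2": 3, "alc3": 1, "alc4": 2, "alc5": 4},
    {"id": 2, "alc1": None, "alc2": None, "alc3": None, "alc4": None, "alc5": None},
    {"id": 3, "alc1": 1, "alc2": None, "alc3": 2, "alc4": 2, "alc5": 1},
]
# Invented stand-in for one imputed data set returned by any MI routine.
imputed = [
    {"id": 1, "alc1": 2, "alc2": 3, "alc3": 1, "alc4": 2, "alc5": 4},
    {"id": 2, "alc1": 3, "alc2": 2, "alc3": 2, "alc4": 1, "alc5": 3},
    {"id": 3, "alc1": 1, "alc2": 2, "alc3": 2, "alc4": 2, "alc5": 1},
]

skipped = flag_full_skips(original, ALCOHOL_ITEMS)
cleaned = restore_skips(imputed, skipped, ALCOHOL_ITEMS)
```

Matching on a participant id rather than on row position matters because imputation output often stacks several imputed copies of the data, so the same id appears once per imputation and all of its copies are reset together.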

