Date:         Thu, 20 Apr 2006 11:26:25 -1000
Subject:      Re: Combining categories
At 02:26 AM 4/20/2006, Peck, Jon wrote:
>To go back to the original question, I surmise that what is being asked
>for is a canned transformation that would recode together small category
>values into a new variable, optionally respecting an ordinality property,
>and combining the value labels of the merged categories. It would need to
>work off of an absolute or percentage threshold for the meaning of
>small. If ordinal, I suppose it would merge values into the next or
>previous category while if not ordinal, it might create one "other"
>category with all of these together.
>
>Have I got that right?
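
The transformation Jon describes can be sketched in plain Python. This is a hypothetical illustration, not Jon's method and not an SPSS or spssaux2 function: the name merge_small_categories, the rule of folding a small group into its smaller adjacent group, and the "+"-joined merged codes are all assumptions made for the example.

```python
from collections import Counter


def merge_small_categories(values, order=None, min_count=None, min_pct=None,
                           other_label="other"):
    """Map each category code to a merged code (hypothetical sketch).

    A category is "small" if its count is below min_count, or its share of
    all cases is below min_pct percent. With `order` (a list of every code,
    lowest to highest) the variable is treated as ordinal: the smallest small
    group is repeatedly folded into whichever adjacent group is smaller, and
    merged codes are joined with "+". Without `order`, every small category
    is lumped into a single `other_label` category.
    """
    counts = Counter(values)
    total = sum(counts.values())  # assumes `values` is non-empty

    def is_small(n):
        if min_count is not None and n < min_count:
            return True
        if min_pct is not None and 100.0 * n / total < min_pct:
            return True
        return False

    if order is None:
        # Nominal case: one catch-all category for everything small.
        return {c: (other_label if is_small(n) else c)
                for c, n in counts.items()}

    # Ordinal case: start with one group per code, in the given order.
    groups = [[c] for c in order]

    def size(group):
        return sum(counts.get(c, 0) for c in group)

    while len(groups) > 1:
        small = [i for i, g in enumerate(groups) if is_small(size(g))]
        if not small:
            break
        i = min(small, key=lambda k: size(groups[k]))
        # Candidate neighbours: the previous and next groups, where they exist.
        # On a tie, min() keeps the first candidate, i.e. the previous group.
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(groups)]
        j = min(neighbours, key=lambda k: size(groups[k]))
        lo, hi = min(i, j), max(i, j)
        groups[lo:hi + 1] = [groups[lo] + groups[hi]]

    return {c: "+".join(g) for g in groups for c in g}
```

With made-up counts in which '0', '3' and '8' are rare and a 5% threshold, the ordinal rule pools '0' with '1' and '8' with '7', but folds '3' into '2' rather than '4'. A purely size-based rule cannot know that '3' is an alternative branch, which is exactly the kind of judgment Bob describes below.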

Jon,

Thanks for the response. Unfortunately, you don't have it quite right. Here are the actual categories:

EDUC_LVL
  '0' 'No formal school'
  '1' 'Grades 1-8'
  '2' 'Grades 9-12 no diploma'
  '3' 'Spec Ed Cert/Diploma'
  '4' 'H.S. Grad or GED'
  '5' 'Post-secondary, no degree'
  '6' 'Assoc degree/VocTech Cert'
  '7' 'Bachelors Degree'
  '8' 'Masters or more'/

It is presently coded as a string variable and is mostly ordinal, except that category '3' is really an alternative branch. Some will argue that it is "the same as" graduating from HS or getting a GED; others will argue that it's not even on par with Grade 12 without a diploma. Also, if someone completes 3 years of college and then drops out before getting a degree, is that "less than" someone with a 2-year associate degree? So this variate is approximately ordinal, but not quite; and besides, I currently have it defined as a string variable.

My problem was that categories '0', '3', and '8' are relatively rare. '0' combines logically with '1' with the meaning "less than 8 full years of school," while '3' combines easily with '4' because all are certificates of completion at approximately the same level. '6' combines logically either with '5' or with '7' and '8'.
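
The pooling rules above amount to a fixed lookup from old string code to new string code, so no numeric ranges are needed. A minimal Python sketch follows; the function name pool_educ and the six_with switch are hypothetical, and folding '8' into '7' in both variants is an assumption, since the text says only that '8' is rare and that '6' may go with '7' and '8'.

```python
from collections import Counter

# Pooling rules from the text: '0' joins '1', '3' joins '4'.
# Folding '8' into '7' in both variants is an assumption (see lead-in).
BASE_POOLING = {"0": "1", "3": "4", "8": "7"}


def pool_educ(codes, six_with="5"):
    """Return a list of pooled EDUC_LVL string codes (hypothetical helper).

    six_with="5" puts '6' (Assoc degree/VocTech Cert) with '5';
    six_with="7" puts it with '7' and '8' instead.
    """
    if six_with not in ("5", "7"):
        raise ValueError("six_with must be '5' or '7'")
    pooling = dict(BASE_POOLING, **{"6": six_with})
    # Codes without a rule (e.g. '2') pass through unchanged.
    return [pooling.get(c, c) for c in codes]


sample = ["0", "1", "2", "3", "4", "5", "6", "7", "8"]
print(pool_educ(sample))                # '6' pooled with '5'
print(pool_educ(sample, six_with="7"))  # '6' pooled with '7' and '8'
print(Counter(pool_educ(sample)))       # cell sizes after pooling
```

Keeping both variants behind one switch makes it easy to rerun the same crosstab under each choice for '6' and see whether the conclusions change.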

Richard Ristow suggested that the pooling could be done relatively easily using recode as follows:

>To make it some easier, use TEMPORARY. If
>
>CROSSTABS ED_LEVL BY ...
>  /STATISTICS = CHISQ
>
>produces small cells, try something like this:
>If, say,
>+ high school=4, GED = 5;
>+ Masters=7, other advanced degrees have higher codes;
>then,
>
>TEMPORARY.
>RECODE ED_LEVL
>  (5 = 4)
>  (7 THRU HI = 7).
>CROSSTABS ED_LEVL BY ...
>  /STATISTICS = CHISQ

But doesn't TEMPORARY apply ONLY to the next procedure? Wouldn't that mean it applies only to the RECODE, and then forgets the recode when it runs CROSSTABS?

Also, this assumes that ED_LEVL is numeric, but I have it as a string variable. I can't use "THRU HI" with a string variable, can I?
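
One way around the string problem is to convert the codes to numbers first and only then apply range rules. The Python sketch below illustrates that order of operations; recode_ranges and its (low, high, new_value) rule format are hypothetical, with high=None standing in for HI. In SPSS itself the analogous first step would be a numeric copy of the string, for example COMPUTE with the NUMBER function, but this sketch does not claim to reproduce SPSS behaviour.

```python
def recode_ranges(codes, rules):
    """Convert string codes to integers, then apply range rules (sketch).

    `rules` is a list of (low, high, new_value) tuples, checked in order,
    and the first match wins. high=None plays the role of HI (no upper
    bound). Values that match no rule are kept as they are.
    """
    out = []
    for code in codes:
        value = int(code)  # '7' -> 7; raises ValueError on non-numeric text
        for low, high, new in rules:
            if value >= low and (high is None or value <= high):
                value = new
                break
        out.append(value)
    return out


# Bob's codes: 0-1 -> 1, 3-4 -> 4, and "7 thru HI" -> 7.
print(recode_ranges(["0", "3", "6", "8"], [(0, 1, 1), (3, 4, 4), (7, None, 7)]))
```

Converting first matters because string comparison is lexical: with single-digit codes '0' to '8' the order happens to match the numeric order, but a code such as '10' would sort before '9'.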

Thanks, Bob

>You could do this with a combination of AGGREGATE and other transformation
>logic, but it is a natural for (surprise) programmability. The spssaux2
>module on the SPSS Code Center ( has an
>example of a similar sort of complex calculation. The
>CreateBasisVariables function creates a set of dummy variables
>representing the distinct values of a variable suitable for use in
>Regression etc. I can cook up a similar method for merging small values.
>
>Regards,
>Jon Peck
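
For readers unfamiliar with the idea, dummy (basis) variables for a categorical variable can be sketched in a few lines of plain Python. This does not use or reproduce spssaux2's CreateBasisVariables; the name basis_variables and the omit_last option, which drops one level as a regression reference category, are assumptions for illustration.

```python
def basis_variables(values, omit_last=True):
    """Build one 0/1 indicator column per distinct value (sketch).

    Distinct values are sorted; with omit_last=True the last one is left
    out as the reference category, a common choice for regression.
    """
    levels = sorted(set(values))
    if omit_last:
        levels = levels[:-1]
    return {lev: [1 if v == lev else 0 for v in values] for lev in levels}


print(basis_variables(["1", "4", "4", "7"]))
print(basis_variables(["1", "4", "4", "7"], omit_last=False))
```

Applied after pooling, this keeps the number of indicator columns down, since each merged category contributes only one column.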

Robert M. Schacht, Ph.D.

