LISTSERV at the University of Georgia
Date:         Mon, 7 Oct 2002 09:37:47 -0700
Reply-To:     "Patrick E. Burns" <patrickburns@economicrt.org>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         "Patrick E. Burns" <patrickburns@economicrt.org>
Subject:      Consolidating String Variables after Data Restructuring...
Content-Type: text/plain; charset="us-ascii"; format=flowed

Dear SPSSers,

I have a data set that I converted from multiple records per person to one record per person (using CasestoVars). The data contain numeric, date, and string variables. Many of the original variables have been expanded into multiple iterations per person, so that I have 8 Date of Birth variables, 8 Sex variables, etc. I would like to consolidate these iterations back down into one Date of Birth per person, one Sex variable per person, etc.

For numeric and date variables, I can pretty easily consolidate the variables back into just one or two by taking the average or the min/max. But what about string variables like Sex, for which I have values of M and F now spread across 8 iterations of the original variable? Most of the time, the cells are blank, but occasionally they are not:

sex.1 sex.2 sex.3 sex.4 sex.5 sex.6 sex.7 sex.8 M F F M M F M F F F F F M M M F

Has anybody developed a good strategy for dealing with this sort of data restructuring problem? Am I better off recoding these string variables into numeric ones, and then consolidating them that way somehow? I have over a million rows of data, so there may be data entry errors where one individual has both an M and an F value, but there should be only a few of these. There are quite a few cases (10 - 20%) where the individual has more than one entry, such as the 4th and 7th persons above.

One strategy might be to recode all M values to 1 and all F values to 10, and then SUM (sex.1 sex.2 sex.3 sex.4 sex.5 sex.6 sex.7 sex.8). If the SUM is less than 9, then recode the consolidated Sex variable as M. If it is some multiple of 10, then recode it as F, and for the other combinations (11, 31, etc.), recode those by hand...
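
To make the recode-and-sum idea concrete, here is a rough sketch in Python rather than SPSS syntax. The function name, the sample rows, and the "CHECK" flag are made up for illustration. It also assumes at most 8 cells per person (so a sum of 1 to 8 can only come from M entries), and it treats an all-blank person separately, since a sum of 0 would otherwise pass both tests.

```python
# Hypothetical sketch of the recode-and-sum strategy: M -> 1, F -> 10.
CODES = {"M": 1, "F": 10}


def consolidate_sex(values):
    """Collapse one person's sex.1 ... sex.8 cells into a single value.

    Returns "M", "F", "" (every cell blank) or "CHECK" (both M and F present).
    Assumes at most 8 cells; anything other than M or F counts as blank.
    """
    total = sum(CODES.get(v.strip().upper(), 0) for v in values)
    if total == 0:
        return ""        # no M or F entries at all
    if total < 9:
        return "M"       # only M entries (1 to 8 of them)
    if total % 10 == 0:
        return "F"       # only F entries
    return "CHECK"       # mixed entries, e.g. 11 or 31: review by hand


# Made-up rows, one list of 8 cells per person.
rows = [
    ["M", "", "", "", "", "", "", ""],
    ["", "F", "", "F", "", "", "", ""],
    ["M", "", "F", "", "", "", "", ""],
    ["", "", "", "", "", "", "", ""],
]
for cells in rows:
    print(repr(consolidate_sex(cells)))
```

Only the "CHECK" rows would then need to be looked at by hand.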

Any thoughts?

PATRICK

Patrick Burns, Senior Researcher
Economic Roundtable
Los Angeles, California 90015
Email: patrickburns@economicrt.org

