Date: Mon, 7 Oct 2002 09:37:47 -0700
Reply-To: "Patrick E. Burns" <patrickburns@economicrt.org>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: "Patrick E. Burns" <patrickburns@economicrt.org>
Subject: Consolidating String Variables after Data Restructuring...
Content-Type: text/plain; charset="us-ascii"; format=flowed
Dear SPSSers,
I have a data set that I converted from multiple records per
person to one record per person (using CasestoVars). The data contains
numerical, date, and string variables. Many of the original variables have
been expanded into multiple iterations per person, so that I have 8 Date of
Birth variables, 8 Sex variables, etc. I would like to consolidate these
iterations of the variables back down into one Date of Birth per person,
one Sex variable per person, etc.
For numeric and date variables, I can pretty easily consolidate
the variables back into just one or two by getting the average, or
min/max. But what about a string variable like Sex, for which I have values
of M and F now spread across 8 iterations of the original variable? Most
of the time, the cells are blank, but occasionally they are not:
sex.1 sex.2 sex.3 sex.4 sex.5 sex.6 sex.7 sex.8
M
F
F
M M
F
M
F F
F
F
F
M
M
M
F
Has anybody developed a good strategy for dealing with this sort of data
restructuring problem? Am I better off recoding these string variables
into a numeric, and then consolidating them that way somehow? I have over
a million rows of data, so there may be a data entry error where one
individual may have both a M and F value, but for the most part, there
should be only a few of these. There are several cases (10 - 20%) where
the individual will have more than one entry, such as the 4th and 7th
persons above.
One strategy might be to recode all M values to 1, and all F values to 10,
and then SUM (sex.1 sex.2 sex.3 sex.4 sex.5 sex.6 sex.7 sex.8). If the SUM
is less than 9, then recode the consolidated Sex variable as M. If it is
some multiple of 10, then recode as F, and for those other combinations,
(11, 31, etc.) recode those by hand...
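
To make the idea concrete, here is a rough sketch of that logic in Python (not SPSS syntax, just the arithmetic). The names WEIGHTS and consolidate_sex are made up for illustration. Two things are assumptions that go beyond the rule described above: a person whose 8 cells are all blank stays blank rather than falling into the "less than 9" bucket, and any value other than M or F is counted as 0.

```python
# Hypothetical sketch of the weighted-sum idea: M counts as 1, F counts as 10.
WEIGHTS = {"M": 1, "F": 10}


def consolidate_sex(values):
    """Collapse one person's sex.1 ... sex.8 values into a single code.

    Returns 'M', 'F', '' (every cell blank), or 'CHECK' for mixed
    M/F combinations (sums such as 11 or 31) that need review by hand.
    """
    # Assumption: blanks and unexpected values contribute 0 to the sum.
    total = sum(WEIGHTS.get(v.strip().upper(), 0) for v in values)

    if total == 0:
        # Assumption: an all-blank person stays blank instead of becoming M.
        return ""
    if total < 9:
        # Only M values present (at most 8 of them).
        return "M"
    if total % 10 == 0:
        # Only F values present (10, 20, ..., 80).
        return "F"
    # Anything else mixes M and F.
    return "CHECK"


# Example: the 4th person above has two M entries, the 7th has two F entries.
fourth = consolidate_sex(["M", "", "", "M", "", "", "", ""])
seventh = consolidate_sex(["", "F", "", "", "", "F", "", ""])
```

Because there are only 8 iterations, the M total can never reach 10, so a sum below 9 cannot hide an F and a multiple of 10 cannot hide an M. With 10 or more iterations the weights would need to change.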
Any thoughts?
PATRICK
Patrick Burns, Senior Researcher
Economic Roundtable
Los Angeles, California 90015
Email: patrickburns@economicrt.org