Date: Fri, 25 Jan 2008 01:38:13 +0200
Reply-To: hillel vardi <hilel@BGU.AC.IL>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: hillel vardi <hilel@BGU.AC.IL>
Subject: Re: How to best restructure a datafile to include a multiline
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Here is syntax to restructure your data from long to wide.
If the 600-code list holds information about the codes, you should match
it to your data before the CASESTOVARS command.
The lack of a unique ID is no problem: the 600-code table should be unique on
the variable "code", and you should declare it as a table in the
MATCH FILES command (see the syntax).
You can use the AUTORECODE command to transform your codes to numeric codes.
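* Read the sample data: ID in columns 1-9, code in 11-25, VariableA in 27-30 .
data list fixed / aseID 1-9 (a) Code 11-25 (a) VariableA 27-30 (a) .
begin data
08/000125 1--***-***      a
08/000124 1-SF-           a
08/000117 1-SF-           a
08/000116 1-SF-411;       a
08/000115 1--***-***;     a
08/000114 1-SF-300;       a
08/000113 1-SF-325;       a
08/000112 1-SF-           a
end data .
* Give every distinct code string a numeric code .
autorecode variables=code / into icode / print .
*** if you need to match the 600 code do it here .
* sort cases by code .
* match files file=* / table='cod600.sav' / by code .
* Fill a blank ID on a continuation line with the last non-blank ID .
string tmpid (a9) .
leave tmpid .
if aseID ne ' ' tmpid=aseID .
if aseID eq ' ' aseID=tmpid .
SORT CASES BY aseID .
CASESTOVARS
 /ID = aseID
 /GROUPBY = VARIABLE .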
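
For the optional matching step that is commented out in the syntax above, here is a rough sketch of how it could look. It is only an illustration: the lookup variable "meaning", the two lookup rows, and the dataset names codes600 and longdata are made up, and I use a named dataset in place of cod600.sav so that the example runs on its own. In your job this step would go before the CASESTOVARS command.

```spss
* Sketch only: the variable meaning and both lookup rows are invented .
* Build a tiny stand-in for the 600-code table, unique on code .
data list list / code (a15) meaning (a20) .
begin data
"1-SF-300;" "placeholder A"
"1-SF-411;" "placeholder B"
end data .
sort cases by code .
dataset name codes600 .

* A few rows of the long file, one row per case and code .
data list list / aseID (a9) code (a15) VariableA (a4) .
begin data
"08/000116" "1-SF-411;" "a"
"08/000114" "1-SF-300;" "a"
"08/000112" "1-SF-" "a"
end data .
dataset name longdata .

* Both inputs must be sorted by the key variable before the match .
sort cases by code .
* TABLE makes codes600 a lookup: it adds variables but never adds cases .
match files file=* / table=codes600 / by code .
execute .
```

Codes with no row in the lookup table (here 1-SF-) keep their case and get a blank "meaning", because a TABLE file only contributes variables and never adds or drops cases.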
Eero Olli wrote:
> Hi to you all,
> I have been reading this group for a while, and have had great fun with
> it. Now I have met a problem that I do not see an elegant solution for
> within SPSS.
> I need to make a decision on how to restructure a datafile that contains
> thousands of cases and around 20 variables. The data comes from a
> database, and one of the fields, "Code", is a table mapped to the
> rest of the data. Thus this field can contain from 0 to 20 pieces of
> information, each one on a new line, in no particular order. There
> is a matching list (.xls) of close to 600 codes in total, which I would
> like to convert to variables and their values in my datafile.
> Unfortunately, I do not have access to the database, only excel files
> exported out of the database.
> OLDstructure (in excel)
> CaseID Code VariableA
> 08/000125 1--***-*** a
> 08/000124 1-SF- a
> 08/000117 1-SF- a
> 08/000116 1-SF-411; a
> 08/000115 1--***-***; a
> 08/000114 1-SF-300; a
> 08/000113 1-SF-325; a
> 08/000112 1-SF- a
> I want to keep all information. One way would be to restructure the code
> so that "code" is transformed from several lines to only one line.
> An example of this is below. Then I can use IF INDEX(NewCode)... to give
> values to variables.
> Newstructure (suggestion)
> CaseID NewCode
> 08/000125 1--***-***
> 08/000124 1-SF-
> 08/000117 1-SF-
> 08/000116 1-SF-411;2-SF-512.2;3-SF-512.3;4-SF-902;5-SF-711 a
> 08/000115 1--***-***; 2--***-***;3--***-***;4--***-*** a
> So my questions are
> 1) What is the best way to restructure the datafile?
> 2) Or perhaps there is an elegant way to work on files of irregular
> structure? I have seen datafiles where one case is always on several
> lines, but never on a random number of lines.
> 3) Are there other, more elegant ways to recode "newCode" to a set of
> variables? I am thinking about taking the list of 600 codes and turning
> it into a table with Code, Variable, Value. And then somehow matching
> this with the restructured datafile (but I suspect that a regular MATCH
> FILES would not succeed, because there is no unique ID).
> Eero Olli
> the Equality and Anti-discrimination Ombud
> +47 2405 5951
> POB 8048 Dep, N-0031 Oslo, Norway
> To manage your subscription to SPSSX-L, send a message to
> LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD