Date: Thu, 2 Aug 2007 15:01:24 -0400
Reply-To: Richard Ristow <wrristow@mindspring.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Richard Ristow <wrristow@mindspring.com>
Subject: Re: Restructuring data
In-Reply-To: <535C1C66ADE0F8469C080434A45B188909F7447A@TMAMS02.tmant.tex
med.org>
Content-Type: text/plain; charset="us-ascii"; format=flowed
At 01:22 PM 8/2/2007, Jan Veenstra wrote:
>I have a data file that contains information listed in multiple rows
>that I need to restructure so that the rows for each ID are combined
>into a single row. Please note that duplicates are also allowed in
>this file. I have included a sample below. [I]have not been able to
>get the CASETOVARS syntax to work. I also tried to aggregate the data
>and then use CASETOVARS but it only moved the row number around and
>not the data.
In the following, I haven't addressed the problem of duplicates. I'm
not sure what is 'duplicated' - just the HCPC_Code, the whole record,
or what? Anyway,
Input data (SPSS 15 draft output):
|-----------------------------|---------------------------|
|Output Created |02-AUG-2007 14:31:28 |
|-----------------------------|---------------------------|
Billing_ID Line_No HCPC_Code Modifier_1 Modifier_2
7707 1 9567 03 01
7710 1 9567 . .
7710 2 9658 . .
7710 3 9567 . .
Number of cases read: 4 Number of cases listed: 4
>What I need them to look like is as follows. It basically moves line
>numbers 2 and 3 to new variables. I have multiple line counts for
>each ID - some have 3 lines, some have more.
>
>Billing_ID Line_No HCPC_Code HCPC_2 HCPC_3 Modifier_1 Modifier_2
>7707 1 9567 03 01
>7710 1 9567 9658 9567
So far, except for the 'Modifier' variables, and the "Many other
variables" that I'm ignoring, this is straightforward CASESTOVARS that
could be clicked up from the menus. I clicked this one up, then added
the /RENAME and /SEPARATOR subcommands. It isn't *quite* the above
format, but it's very close. (You can use a /KEEP= list on ADD FILES to
reorder the variables, if you like.) Continuing SPSS 15 draft output
(WRR:not saved separately):
SORT CASES BY Billing_ID Line_No .
CASESTOVARS
/ID = Billing_ID
/INDEX = Line_No
/RENAME HCPC_Code=HCPC
/SEPARATOR = '_'
/GROUPBY = VARIABLE .
Cases to Variables
|----------------------------|---------------------------|
|Output Created |02-AUG-2007 14:55:02 |
|----------------------------|---------------------------|
Generated Variables
|---------|-------|------|
|Original |Line_No|Result|
|Variable | |------|
| | |Name |
|---------|-------|------|
|HCPC_Code|1 |HCPC_1|
| |2 |HCPC_2|
| |3 |HCPC_3|
|---------|-------|------|
Processing Statistics
|---------------|---|
|Cases In |4 |
|Cases Out |2 |
|---------------|---|
|Cases In/Cases |2.0|
|Out | |
|---------------|---|
|Variables In |5 |
|Variables Out |6 |
|---------------|---|
|Index Values |3 |
|---------------|---|
LIST.
List
|-----------------------------|---------------------------|
|Output Created |02-AUG-2007 14:55:02 |
|-----------------------------|---------------------------|
Billing_ID Modifier_1 Modifier_2 HCPC_1 HCPC_2 HCPC_3
7707 03 01 9567 . .
7710 . . 9567 9658 9567
Number of cases read: 2 Number of cases listed: 2
...............
Now, what of Modifier_1, Modifier_2, and the "many other variables"? If
they appear on every line for each ID, and have the same values on
every line, this code will work as is, though always putting the HCPC
codes at the end of the records. If, say, they only appear on Line 1,
you need something more, possibly using AGGREGATE but I can't say at
this point.
-Good luck,
Richard
===================
APPENDIX: Test data
===================
(This code works, but gives warning messages.)
DATA LIST LIST SKIP=1
/Billing_ID Line_No HCPC_Code Modifier_1 Modifier_2
(F4, F2, F4, N2, N2).
BEGIN DATA
Billing_ID Line No. HCPC Code Modifier_1 Modifier_2
7707 1 9567 03 01
7710 1 9567
7710 2 9658
7710 3 9567
END DATA.
FORMATS Modifier_1 Modifier_2 (N2).
LIST.