Date: Thu, 22 Jun 2006 09:52:19 -0400
Reply-To: "Kevin F. Spratt" <Kevin.F.Spratt@DARTMOUTH.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Kevin F. Spratt" <Kevin.F.Spratt@DARTMOUTH.EDU>
Subject: Data Restructuring Question (Revised).
Content-Type: text/plain; charset="us-ascii"; format=flowed
I want to thank the list for the useful comments.
I tried the suggestion from Mark Terjeson and got what I wanted, or I
thought so until I read the comments from Howard Schreier, Ian Whitlock and
others and realized I didn't provide an example that is sufficiently general
to the situations that I might observe. Thus, I have modified my example
and changed the structure to what I think would allow the structures to remain
consistent if one wished to move between them.
The question remains, however, how to move from my original structure 1 or
to my newly modified structure 3:
I have data structured like this:
Structure 1:
id dx
1 a
1 b
1 c
1 d
1 d * Note a duplicate dx within an id
2 d
2 e
3 g
4 h
4 i
4 b
4 i * Note a duplicate dx within an id (there could be more than one
duplicate)
.
.
.
With the possibility of repeated dx within an ID, structure 1 could be
modified to structure 1a as:
Structure 1a:
id dx count
1 a 1
1 b 1
1 c 1
1 d 2
2 d 1
2 e 1
3 g 1
4 h 1
4 i 2
4 b 1
.
.
.
In other words, multiple records for a patient with various diagnostic codes
With help from the list in the past, I restructured these data to have one
record per patient
as follows
Structure 2:
id dx1 dx2 dx3 dx4 dx5
1 a b c d d
2 d e
3 g
4 h i b i
From either structure 1, 1a, or 2 above, I would like to have the data
structures as follows:
id dx_a dx_b dx_c dc_d dx_e dx_g dx_h dx_i
1 1 1 1 2 0 0 0 0
2 0 0 0 1 1 0 0 0 note the 2 indicates 2
dx within a person and could be more.
3 0 0 0 0 0 1 0 0 than 2
4 0 1 0 0 0 0 1 2
In case it makes a difference, in the real data, the dx fields are
character but not all
the same length.
Coding efficiency is an issue as I have 1,000,000 records and there are
many more than 8 different
dx codes.
I am running SAS 9.1 on the XP platform.
As always, any suggestions welcome and I hope the above information is
sufficient.
PS
It turns out that in my original data there were no duplicate dx within id
and so
Mark's solution worked for me. However, I have other data sets where there
will be duplicates
within id and so the more generalized approach for restructuring is important.
______________________________________________________________________
Kevin F. Spratt, Ph.D.
Department of Orthopaedic Surgery
Dartmouth Medical School
One Medical Center Drive
DHMC
Lebanon, NH USA 03756
(603) 653-6012 (voice)
(603) 653-6013 (fax)
(603) 252-5922 (cell)
Kevin.F,Spratt@Dartmouth.Edu (e-mail)
_______________________________________________________________________