LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (June 2006, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 22 Jun 2006 09:52:19 -0400
Reply-To:     "Kevin F. Spratt" <Kevin.F.Spratt@DARTMOUTH.EDU>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Kevin F. Spratt" <Kevin.F.Spratt@DARTMOUTH.EDU>
Subject:      Data Restructuring Question (Revised).
Content-Type: text/plain; charset="us-ascii"; format=flowed

I want to thank the list for the useful comments.

I tried the suggestion from Mark Terjeson and got what I wanted, or I thought so until I read the comments from Howard Schreier, Ian Whitlock and others and realized I didn't provide an example that is sufficiently general to the situations that I might observe. Thus, I have modified my example and changed the structure to what I think would allow the structures to remain consistent if one wished to move between them.

The question remains, however, how to move from my original structure 1 or to my newly modified structure 3:

I have data structured like this:

Structure 1:

id dx 1 a 1 b 1 c 1 d 1 d * Note a duplicate dx within an id 2 d 2 e 3 g 4 h 4 i 4 b 4 i * Note a duplicate dx within an id (there could be more than one duplicate)

. . .

With the possibility of repeated dx within an ID, structure 1 could be modified to structure 1a as:

Structure 1a:

id dx count 1 a 1 1 b 1 1 c 1 1 d 2 2 d 1 2 e 1 3 g 1 4 h 1 4 i 2 4 b 1 . . .

In other words, multiple records for a patient with various diagnostic codes

With help from the list in the past, I restructured these data to have one record per patient as follows

Structure 2:

id dx1 dx2 dx3 dx4 dx5 1 a b c d d 2 d e 3 g 4 h i b i

From either structure 1, 1a, or 2 above, I would like to have the data structures as follows:

id dx_a dx_b dx_c dc_d dx_e dx_g dx_h dx_i 1 1 1 1 2 0 0 0 0 2 0 0 0 1 1 0 0 0 note the 2 indicates 2 dx within a person and could be more. 3 0 0 0 0 0 1 0 0 than 2 4 0 1 0 0 0 0 1 2

In case it makes a difference, in the real data, the dx fields are character but not all the same length.

Coding efficiency is an issue as I have 1,000,000 records and there are many more than 8 different dx codes.

I am running SAS 9.1 on the XP platform.

As always, any suggestions welcome and I hope the above information is sufficient.

PS

It turns out that in my original data there were no duplicate dx within id and so Mark's solution worked for me. However, I have other data sets where there will be duplicates within id and so the more generalized approach for restructuring is important.

______________________________________________________________________

Kevin F. Spratt, Ph.D. Department of Orthopaedic Surgery Dartmouth Medical School One Medical Center Drive DHMC Lebanon, NH USA 03756 (603) 653-6012 (voice) (603) 653-6013 (fax) (603) 252-5922 (cell) Kevin.F,Spratt@Dartmouth.Edu (e-mail) _______________________________________________________________________


Back to: Top of message | Previous page | Main SAS-L page