LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2009, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Tue, 15 Dec 2009 12:55:09 -0500
Reply-To:   Jonathan Goldberg <jgoldberg@BIOMEDSYS.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Jonathan Goldberg <jgoldberg@BIOMEDSYS.COM>
Subject:   Re: Creating Dummy Variables from Values of Variables (reformatted to be more readable, I HATE Outlook's formatting of my messages)
Comments:   To: "Kirby, Ted" <ted.kirby@LEWIN.COM>

This task can be done in one pass of the patient data by a data step using a hash table and an array. Also needed: a preprocess of the the TopX data set to add an index.

data TopX; set Topx(keep = code); index + 1; run;

proc sql noprint; select cats("dummy" code) into :codes separated by " " from topx; quit;

data want; array codes(*) &codes; array dxs(*) dx1 - dx9; /*you could use macro here to allow*/ /*for a varying number of dxs*/ if _n_ = 1 then do; if 0 then set TopX; dcl hash codetable(dataset: "TopX"); codetable.definekey("code"); codetable.definedata("index"); codetable.definedone(); end;

do i = 1 to dim(codes); /Initialize dummys to zero*/ code(i) = 0; end;

set have;

do i = 1 = dim(dxs); code = dxs(i); rc = codetable.find(); /*found a top whatever dx?*/ if rc = 0 then codes(index) = 1; /*set it to 1*/ end;

run;

The proc sql step is not needed for a single set of codes; you could just hand code them into the array. The codes just need to be in the same order they are in the data set TopX so the index variable points to the right array member.

Cheers,

Jonathan

On Mon, 14 Dec 2009 23:32:20 -0500, Kirby, Ted <ted.kirby@LEWIN.COM> wrote:

>We have a dataset that has up to nine ICD-9 diagnosis codes in each record. We want to identify the top 50 (or 100 or 250) ICD-9 and construct dummy variables for the existence of those codes. > >Thanks to Dan Nordlund's reply to Trish Bous (at http://groups.google.com/group/comp.soft- sys.sas/browse_thread/thread/c82586e7f1164e54/ec5fc63a3eb16d9b? lnk=gst&q=proc+freq+order) we have the first part done (the identification of the top x codes and have saved them in a dataset. The first four observations of this dataset can be created with: > >data TopX; >input code $5. count; >datalines; >5990 36163 >4019 34429 >2639 31573 >5188126200 >; >run; > >How do we create the appropriate dummy variables? We would like to add variables to the dataset named dummyxxxxx for each of the top diagnosis codes that would have a value of 1 if the appropriate ICD-9 Diagnosis Code appears in any of the diagnosis code fields. We would use the TopX dataset to construct the names of the dummy variables, but setting each of the dummy variables would have to be done in the main dataset. For example, in the following dataset: > >data have; >input patid (Dx1-Dx9) ($5); >datalines; > 1 51881 5990 8052 85201 V550 V5419 73300 57420 5932 > 2 49121 51881 486 V850 41401 4019 7837 V462 > 3 51884 33520 34400 5070 5601 5990 V4611 33522 51909 > 4 03811 2639 2793 73021 V090 99591 07070 2384 311 > 5 56961 486 1962 4019 2749 27801 3051 V1005 78606 > 6 99832 73026 V4586 04111 496 42731 27800 V1259 71590 >; >run; > >We would want to add the dummy variables with the following values: > >PatID Dummy5990 Dummy4019 Dummy2639 Dummy51881 > 1 1 0 0 1 > 2 1 0 0 1 > 3 1 0 0 0 > 4 0 0 1 0 > 5 0 1 0 0 > 6 0 0 0 0 > >Thanks for any assistance you can render. > >************* IMPORTANT - PLEASE READ ******************** > >This e-mail, including attachments, may include confidential and/or proprietary information, and may be used only by the person or entity to which it is addressed. If the reader of this e-mail is not the intended recipient or his or her authorized agent, the reader is hereby notified that any dissemination, distribution or copying of this e-mail is prohibited. If you have received this e-mail in error, please notify the sender by replying to this message and delete this e-mail immediately. >


Back to: Top of message | Previous page | Main SAS-L page