LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2008, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 10 Dec 2008 09:24:38 -0500
Reply-To:     "Howard Schreier <hs AT dc-sug DOT org>"
              <schreier.junk.mail@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Howard Schreier <hs AT dc-sug DOT org>"
              <schreier.junk.mail@GMAIL.COM>
Subject:      Re: CORRECTION: PROC SQL - select records according to FORMAT
              value rather than actual value

On Wed, 10 Dec 2008 13:02:37 +1300, Adrian Slack <Adrian.Slack@VUW.AC.NZ> wrote:

>Sorry, I pasted the example code over with errors. Corrected example >below. > >DATA temp00; > INPUT id diag01 $ ecode01 $ wies ; > CARDS ; > 1 C000 . 0.7129 > 2 C000 W1009 0.4105 > 3 C000 Y838 6.9079 > 4 S010 . 3.0588 > 5 S010 W2208 0.2986 > 6 S010 Y870 2.4198 >; > >DATA temp01; > INPUT ecode01 $; > CARDS ; > W1009 > W2208 >; > >DATA temp02 (RENAME= (ecode01=start)); > SET temp01; > fmtname='$injury'; > label = 1; >RUN; > >PROC FORMAT cntlin=temp02; >QUIT; > >DATA temp03; >SET temp01; >FORMAT ecode01 $injury.; >RUN; > >PROC SQL; >CREATE TABLE temp04 AS >SELECT * >FROM temp03 >WHERE ecode01 = '1'; >QUIT; > >PROC PRINT DATA= temp04; QUIT; > >> ______________________________________________ >> From: Adrian Slack >> Sent: Wednesday, 10 December 2008 12:53 p.m. >> To: 'SAS-L@LISTSERV.UGA.EDU' >> Subject: PROC SQL - select records according to FORMAT value >> rather than actual value >> >> Hi >> >> Question: >> How do I use a WHERE statement in PROC SQL to select a record >> according to a variable's formatted value rather than its actual >> value? >> >> E.g.: >> Suppose X1 = C000 and applying a (character) format makes X1 = '1'. >> Here, 1 is treated as a character rather than a number (reasoning >> below). A straight WHERE X = '1' statement in PROC SQL doesn't pick up >> this record, because the actual value is 'C000' not '1'. >> >> Background: >> I have a large patient discharge dataset. I want to select specific >> records if their primary diagnosis (DIAG01) or their external injury >> (ECODE01) match the list I'm interested in, so I can ultimately sum a >> cost variable (WIES). A record may have both a DIAG01 and ECODE01 >> that I am interested in, but I only want to count each record once. >> The ECODE01 takes priority. >> >> So, I want to split the dataset into two, and analyse the resulting >> sets separately. That is, I want a set with matched ECODE01s and >> residual set with matching DIAG01s (the remaining, unmatched records >> are irrelevant and I don't want to keep them in the analytical >> datasets). >> >> I've managed to do this by a torturous PROC SQL where I list all the >> ECODE01s I want to keep. But this is hard coded in with a statemetn >> WHERE ECODE01 = 'xxxx' OR ECODE01 = 'xxxx' OR ... >> But I want to be able to easily track what's on the list, and add or >> remove conditions by changing an input file rather than my code. >> >> Partial solution: >> I've just thought, instead, of using PROC FORMAT, as follows using a >> dataset (TEMP01), which is a list of all the ECODE01s I want to match, >> to create a format ($INJURY), applying this to my dataset (TEMP00), >> and then using PROC SQL to create two datasets with the appropriate >> records (matched ECODE01s and matched DIAG01s; I haven't done the >> latter step for matching DIAG01s yet). >> >> data temp00; >> input ID DIAG01 $ ECODE01 $ WIES ; >> cards ; >> 1 C000 . 0.7129 /* Keep because of >> relevant DIAG01 */ >> 2 C000 W1009 0.4105 /* Keep because of >> relevant ECODE01; overrides relevant DIAG01 */ >> 3 C000 Y838 6.9079 /* Keep because of >> relevant DIAG01 despite irrelevant ECODE01 */ >> 4 S010 . 3.0588 /* Discard, nothing >> relevant */ >> 5 S010 W2208 0.2986 /* Keep because of >> relevant ECODE01 */ >> 6 S010 Y870 2.4198 /* Discard, nothing >> relevant */ >> ; >> >> data temp01; >> input ECODE01 $; >> cards ; >> W1009 >> W2208 >> ; >> >> DATA temp02 (rename= (ecode01=start)); >> set temp01; >> fmtname='$injury'; /* Defines a character >> format, so I can apply it to an existing character varialbe >> label = 1; /* This comes out as a >> character variable; I don't want to use a letter. (Because my format >> is LENGTH 1, it only picks up the first element of the variable to >> which I apply the format (and which always starts with a letter.) */ >> RUN; >> >> PROC FORMAT cntlin=temp02; >> QUIT; >> >> DATA temp03; >> SET temp02; >> FROMAT ecode01 $injury.; >> RUN; >> >> PROC SQL; >> CRATE TABLE temp04 AS >> SELECT * >> FROM temp03 >> WHERE ecode01 = '1'; >> QUIT; >> >> PROC PRINT DATA= temp04; QUIT; >> >> NOTE: No observations in data set WORK.TEMP04. >> NOTE: The PROCEDURE PRINT used 0.0 seconds. >> >> Problem: >> There are no records where ecode01 = '1', only (some) records where >> the format of ecode01 = '1', i.e. records 2 and 5. Can I modify the >> WHERE statement so it targets the formatted value rather than the >> actual value? >> >> Thanks in advance. >> >> Adrian. >> >> ---------------------------------------------------------------------- >> ---------------------------------------- >> School of Economics and Finance >> Victoria University of Wellington >> >> P +64 4 463 5233 xtn 8571 >> E adrian.slack@vuw.ac.nz >> A Room 315, Rutherford House, PO Box 600, Wellington, NEW ZEALAND >> ---------------------------------------------------------------------- >> ---------------------------------------- >>

I think it is a very good idea to keep your lists of "interesting" codes in tables, rather than embedding them into code.

I would not create two separate output files. Rather, have one file with a categorical variable to distinguish the subsets.

SQL can probably make it all rather simple, eliminating need for formats altogether. Instead, use subqueries to do the table lookups. A CASE structure can impose the hierarchy and avoid double counting. The aggregation can even be done in the same statement. The code would be something like:

data have; input ID DIAG01 $ ECODE01 $ WIES ; cards ; 1 C000 . 0.7129 2 C000 W1009 0.4105 3 C000 Y838 6.9079 4 S010 . 3.0588 5 S010 W2208 0.2986 6 S010 Y870 2.4198 ;

data ecodes; input keeper $; cards ; W1009 W2208 ;

data diags; input keeper $; cards ; C000 ;

proc sql; create table need as select case when ecode01 in (select keeper from ecodes) then 'ecode' when diag01 in (select keeper from diags ) then 'diag' else 'no' end as Category , sum(wies) as Wies from have group by category; quit;

Result:

Category Wies

diag 7.6208 ecode 0.7091 no 5.4786


Back to: Top of message | Previous page | Main SAS-L page