Date: Fri, 8 Mar 1996 14:20:08 -0600
Reply-To: txplltw@UABCVSR.CVSR.UAB.EDU
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Todd Weiss <txplltw@UABCVSR.CVSR.UAB.EDU>
Subject: Re: Code Reduction
In-Reply-To: <9603081845.AA48793@UABCVSR.cvsr.uab.edu>
Hello to all,
I would like to add some things about the code that I
previously submitted to the list. I am not in the least
offended by the replies, but the repliers should know the following:
A) I apologize for supplying a proc print of the dataset
instead of the actual data, and for not giving a more
thorough explanation of the problem. I am attempting
to keep those individuals who have no infection and those
having a first fungal, viral, protozoal, or bacterial infection.
B) I was hoping to find a way to make one pass through the data.
C) The code is not my handiwork.
D) I am aware that the macro language can be used to make the code
more concise.
E) When I have some more time, I will try to send some data.
Thanks to all who have responded.
Todd
On Fri, 8 Mar 1996, Ian Whitlock wrote:
> Subject: Code Reduction
> Summary: SAS code and later macro suggestions are made based on the
> code given, rather than on an understanding of the underlying
> problem, which might point to better code than that offered.
> Respondent: Ian Whitlock <whitloi1@westat.com>
>
> Todd Weiss <txplltw@UABCVSR.CVSR.UAB.EDU> writes:
>
> >There may be giggling and laughing at the following, but
> >this problem is a little trickier than it looks. I have 4 data
> >sets (i.e., patfung, patprot, patvirus, patbact) containing the
> >observations produced by the code below. Would anyone be willing to share
> >a more parsimonious solution for obtaining the same observations
> >in these 4 data sets (kind and number) using either the data step or SQL?
>
> The original code is given without data at the bottom.
>
> There are several obvious ways to improve the code without trying to
> understand the data. The code divides into 4 blocks, each creating one
> of the 4 data sets mentioned.
>
> The same variables are on all these files. I strongly suspect that you
> only want certain relevant variables in each file. Hence there ought
> to be a KEEP= option limiting the variables.
>
> The next suggestion concerns efficiency. The second and third steps of
> each block can be combined. Together, the two steps select either the
> first or the second record from a PATNUM group. I will add a flag WANTED
> to indicate that a record is wanted. When a record is chosen, WANTED is
> set to 0 so that no more records from that PATNUM group will be taken.
> This eliminates an extra data pass in each block.
>
> I also changed the code to my own style in order to understand what is
> happening. Here is the code for the first block.
>
>
> DATA phtsf;
>    SET phts ( keep = institut patnum retx infdate infect
>                      int_fup fungus int_fung ) ;
>    by institut patnum retx infdate;
>    if infect = . then infect = 0;
>    if first.patnum then
>    do ;
>       if fungus = . then
>       do ;   /* update first record */
>          int_fung = int_fup;
>          fungus = 0;
>       end ;
>    end ;
>    else
>       if fungus = . then delete;
> run ;
>
> data patfung ( keep = institut patnum retx infdate infect
>                       fungus int_fung ) ;
>    retain wanted ;
>    set phtsf;
>    by institut patnum infdate;
>    if first.patnum = 1 then
>    do ;
>       wanted = 1 ;
>       if not last.patnum and fungus = 0 then delete;
>    end ;
>    if wanted ;
>    output ;
>    wanted = 0 ;
> run ;
>
> In terms of pure SAS any further improvement would have to come from a
> deeper understanding of the problem. After one has sufficient
> understanding of SAS code, it is time to start learning macro. This is
> a good place to begin because the code is repetitious. Essentially the
> same thing is done four times. Let's put it in a macro so that we have
> only one copy of code to be executed 4 times. This won't make it any
> more efficient, but it will highlight the structure of what is being
> done and minimize the amount of code. This type of macro code is very
> simple because it is almost all pure SAS code. Only a few macro
> variable references are required.
>
> %macro getdat ( out    = patfung ,   /* output data set */
>                 var    = fungus ,    /* test variable */
>                 assign = int_fung    /* assign variable */
>               ) ;
>    DATA temp ;
>       SET phts ( keep = institut patnum retx infdate infect
>                         int_fup &var &assign ) ;
>       by institut patnum retx infdate;
>       if infect = . then infect = 0;
>       if first.patnum then
>       do ;
>          if &var = . then
>          do ;
>             &assign = int_fup;
>             &var = 0;
>          end ;
>       end ;
>       else
>          if &var = . then delete;
>    run ;
>
>    data &out
>         ( keep = institut patnum retx infdate
>                  infect &var &assign )
>         ;
>       retain wanted ;
>       set temp ;
>       by institut patnum infdate;
>       if first.patnum = 1 then
>       do ;
>          wanted = 1 ;
>          if not last.patnum and &var = 0 then delete;
>       end ;
>       if wanted ;
>       output ;
>       wanted = 0 ;
>    run ;
> %mend getdat ;
>
> Now your code reduces to
>
> options pagesize=80 linesize=132 notes obs=100;
> libname buildinf '/mydir';
> filename maccode '......';
>
> %inc maccode ;
>
> proc sort data=buildinf.phts(keep=institut patnum retx infdate
> infect int_fung int_prot
> int_vir int_bact fungus protozoa
> bacteria virus int_fup )
> out=phts;
> by institut patnum retx infdate;
> run ;
>
> proc print data = phts n;
> var institut patnum retx infdate infect int_fup fungus int_fung
> protozoa int_prot virus int_vir bacteria int_bact;
> run;
>
> %getdat ( out = patfung , var = fungus , assign = int_fung )
> %getdat ( out = patprot , var = protozoa, assign = int_prot )
> %getdat ( out = patvirus , var = virus , assign = int_vir )
> %getdat ( out = patbact , var = bacteria , assign = int_bact )
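
One way to check what each call expands to is the MPRINT system option, which writes the SAS statements generated by a macro to the log. The lines below are only a minimal sketch. They assume that %getdat has already been compiled in the session and that the sorted PHTS data set exists. The choice of the protozoa call is arbitrary.

```sas
options mprint ;    /* echo macro-generated SAS statements to the log */

/* The log should now show the two DATA steps with &var and &assign resolved. */
%getdat ( out = patprot , var = protozoa , assign = int_prot )

options nomprint ;  /* switch the echo off again */
```

Reading the expanded statements in the log is a quick way to confirm that the right variable names were substituted before comparing the output data sets with those from the original code.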
>
> I used
>
> data phts ;
> input institut patnum retx infdate infect
> int_fup fungus int_fung ;
> cards ;
> 1 1 1 1 1 1 1 1
> 1 2 1 1 1 1 . .
> 1 3 1 1 1 1 . .
> 1 3 1 1 1 1 . .
> run ;
>
> %getdat ( out = patfung , var = fungus , assign = int_fung )
>
> to test the syntax of the macro. This is far from an exhaustive test.
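
A somewhat broader check might cover the situations described in the thread: a patient whose first record is a fungal infection, a patient with no infection at all, and patients whose fungal infection comes after an earlier record with FUNGUS missing. The sketch below is an illustration only. All data values are invented (they are not from the real PHTS file), and %getdat is assumed to be compiled already.

```sas
/* Hypothetical test data: every value here is made up for illustration. */
data phts ;
   input institut patnum retx infdate infect
         int_fup fungus int_fung ;
cards ;
1 1 1 10 1 50 1 10
1 2 1  . . 60 .  .
1 3 1  . . 70 .  .
1 3 1 20 1 70 1 20
1 4 1 15 1 80 .  .
1 4 1 30 1 80 1 30
;
run ;

/* patnum 1: single record, fungal infection                  */
/* patnum 2: single record, no infection (FUNGUS missing)     */
/* patnum 3: empty first record, then a fungal infection      */
/* patnum 4: non-fungal first record, then a fungal infection */
%getdat ( out = patfung , var = fungus , assign = int_fung )

proc print data = patfung ;
run ;
```

If the macro behaves as described above, PATFUNG should contain one row per patient: the fungal record for patients 1, 3 and 4, and for patient 2 the single record with FUNGUS set to 0 and INT_FUNG copied from INT_FUP. Running the original three-step code on the same input and comparing the two results would be a stronger test than this.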
>
> Ian Whitlock
> -------------------------------------------------------------------
> options pagesize=80 linesize=132 notes obs=100;
>
> libname buildinf '/mydir';
>
> proc sort data=buildinf.phts(keep=institut patnum retx infdate infect int_fung
> int_prot
> int_vir int_bact fungus protozoa bacteria
> virus int_fup ) out=phts;
> by institut patnum retx infdate;
> proc print n;
> var institut patnum retx infdate infect int_fup fungus int_fung
> protozoa int_prot virus int_vir bacteria int_bact;
>
> run;
>
> DATA phtsf; SET phts;
> by institut patnum retx infdate;
> if infect = . then infect = 0;
> if first.patnum ne 1 and fungus =. then delete;
> if first.patnum = 1 and fungus =. then int_fung = int_fup;
> if first.patnum = 1 and fungus =. then fungus = 0;
>
> data patient; set phtsf;
> by institut patnum infdate;
> if first.patnum = 1 and last.patnum=0 and fungus =0 then delete;
>
> data patfung; set patient;
> by institut patnum infdate;
> if first.patnum = 1;
>
>
> DATA phtsp; SET phts;
> by institut patnum retx infdate;
> if infect = . then infect = 0;
> if first.patnum ne 1 and protozoa=. then delete;
> if first.patnum = 1 and protozoa=. then int_prot = int_fup;
> if first.patnum = 1 and protozoa=. then protozoa = 0;
>
>
> data patient; set phtsp;
> by institut patnum infdate;
> if first.patnum = 1 and last.patnum=0 and protozoa=0 then delete;
>
> data patprot; set patient;
> by institut patnum infdate;
> if first.patnum = 1;
>
>
> DATA phtsv; SET phts;
> by institut patnum retx infdate;
> if infect = . then infect = 0;
> if first.patnum ne 1 and virus =. then delete;
> if first.patnum = 1 and virus =. then int_vir = int_fup;
> if first.patnum = 1 and virus =. then virus = 0;
>
>
> data patient; set phtsv;
> by institut patnum infdate;
> if first.patnum = 1 and last.patnum=0 and virus =0 then delete;
>
> data patvirus; set patient;
> by institut patnum infdate;
> if first.patnum = 1;
>
>
>
> DATA phtsb; SET phts;
> by institut patnum retx infdate;
> if infect = . then infect = 0;
> if first.patnum ne 1 and bacteria=. then delete;
> if first.patnum = 1 and bacteria=. then int_bact = int_fup;
> if first.patnum = 1 and bacteria=. then bacteria = 0;
>
>
> data patient; set phtsb;
> by institut patnum infdate;
> if first.patnum = 1 and last.patnum=0 and bacteria=0 then delete;
>
> data patbact; set patient;
> by institut patnum infdate;
> if first.patnum = 1;
> run;
>
|