Date: Fri, 26 Sep 2003 11:48:08 -0400
Reply-To: Ian Whitlock <WHITLOI1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject: Re: Data Step Question
Ben,
In answer to the general question:
>I'm not clear on what can be done in any single datastep and would
>appreciate any pointers.
A DATA step can be as complex as you want to make it. I suggest that in
most cases you should do everything you can to one data set in one step. In
the old days there was a 10,000 line limit on DATA steps, but now I think
the restriction is that compiled code must fit in memory. One can, of
course, argue that two simpler steps may be easier to read than one complex
one, or that too many simple steps make it hard to see the intent of the
code, while one complex step would convey the intent more clearly.
For your explicit question, I doubt if any good mature SAS programmer would
use more than one DATA step for your task. In general, I prefer dataset
options to statements for KEEP, DROP and RENAME because I think they are
clearer as to when the instruction is executed. So I would write,
   data a;
      set lib.a ( keep = var1 var3
                  rename = ( var1 = var2 )
                ) ;
      /* probably with more code */
   run;
unless, of course, the purpose was simply to print those variables. Then I
wouldn't use a DATA step at all. Just
   proc print data = lib.a ( rename = ( var1 = var2 ) ) ;
      var var2 var3 ;
   run ;
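To illustrate the earlier point about when KEEP and RENAME take effect, here is a minimal sketch. The dataset work.demo and the variables other and double are made up for the example and stand in for lib.a; only var1, var2 and var3 come from the question.

```sas
/* Hypothetical input standing in for lib.a */
data work.demo;
   var1 = 1; var3 = 3; other = 9;
run;

/* Options on the SET dataset act as the data is read:
   the step only ever sees var2 and var3. */
data a;
   set work.demo ( keep = var1 var3
                   rename = ( var1 = var2 ) );
   double = 2 * var2;   /* var1 does not exist inside this step */
run;

/* Options on the DATA statement act as the data is written:
   the step sees every variable under its original name. */
data b ( keep = var1 var3
         rename = ( var1 = var2 ) );
   set work.demo;
   double = 2 * var1;   /* computed here, but not kept on output */
run;
```

In both cases KEEP is applied before RENAME within the same option list, so KEEP names the variable by its old name, var1.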
Minimizing steps is probably the single most important step toward writing
efficient SAS code. However, it is good to be reasonable and allow PROCs to
do some of the basic work because one must balance computer efficiency with
code writing efficiency and maintenance efficiency. SAS programmers usually
tend to stress the latter at the expense of the former except in extreme
cases.
One could, in principle, read and write many unrelated datasets in one step
(by placing the SET and OUTPUT statements in explicit loops); however, it is
not a good idea because there is little advantage in doing so. On the other
hand, it is sometimes important to place the SET and/or OUTPUT statements in
explicit loops because it makes the code easier to write and more natural to
read. And that is a good idea.
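As a rough sketch of a SET statement inside an explicit loop, the step below totals an amount within each BY group. The dataset work.sales and the variables id, amount and total are hypothetical, chosen only for the illustration.

```sas
/* Hypothetical data: several rows per id, already sorted by id */
data work.sales;
   input id amount;
   datalines;
1 10
1 20
2 5
;
run;

data work.totals ( keep = id total );
   /* Explicit loop around SET: the loop runs once per row and
      ends on the last row of each BY group. */
   do until ( last.id );
      set work.sales;
      by id;
      total = sum( total, amount );
   end;
   /* total is not retained, so it starts as missing again at the
      top of the next iteration of the step, i.e. for the next id. */
   output;   /* one observation per id */
run;
```

Written this way, no RETAIN statement and no first.id reset logic are needed, because each iteration of the DATA step handles one whole group.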
IanWhitlock@westat.com
-----Original Message-----
From: Ben Powell [mailto:ben.powell@CLA.CO.UK]
Sent: Thursday, September 25, 2003 9:39 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Data Step Question
I'm not clear on what can be done in any single datastep and would
appreciate any pointers.
Say for example I want to copy a dataset to my work lib, rename a variable
and keep that and one other variable. Because in the past I have found that
sometimes a step won't action unless there is a run command after it I would
break this job into 3 separate data steps, which is quite repetitive. Is
there a rule of thumb for when a new datastep is needed and how many steps
can be included in a datastep?
e.g.
   data a;
      set lib.a;
   run;
   data a (rename=(var1=var2));
      set a;
   run;
   data a;
      set a;
      keep var2 var3;
   run;
This could be done with proc sql as
   proc sql;
      create table a as
      select var1 as var2, var3
      from lib.a;
   quit;
or without the lib:
   proc sql;
      create table temp as
      select var1 as var2, var3
      from a;
      create table a as
      select *
      from temp;
      drop table temp;
   quit;
Is there a tidier way to do this as a datastep?
Any help much appreciated.