LISTSERV at the University of Georgia
Date:         Fri, 26 Sep 2003 11:48:08 -0400
Reply-To:     Ian Whitlock <WHITLOI1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject:      Re: Data Step Question
Comments: To: Ben Powell <ben.powell@CLA.CO.UK>
Content-Type: text/plain


In answer to the general question:

>I'm not clear on what can be done in any single datastep and would
>appreciate any pointers.

A DATA step can be as complex as you want to make it. I suggest that in most cases you should do everything you can to one data set in one step. In the old days there was a 10,000 line limit on DATA steps, but now I think the restriction is that compiled code must fit in memory. One can, of course, argue that two simpler steps may be easier to read than one complex one, or that too many simple steps make it hard to see the intent of the code, while one complex step would convey the intent more clearly.

For your explicit question, I doubt if any good mature SAS programmer would use more than one DATA step for your task. In general, I prefer dataset options to statements for KEEP, DROP and RENAME because I think they are clearer as to when the instruction is executed. So I would write,

data a ;
   set lib.a ( keep = var1 var3 rename = ( var1 = var2 ) ) ;
   /* probably with more code */
run ;

unless, of course, the purpose was simply to print those variables. Then I wouldn't use a DATA step at all. Just

proc print data = lib.a ( rename = ( var1 = var2 ) ) ;
   var var2 var3 ;
run ;
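
To make the timing point about dataset options versus statements concrete, here is a minimal sketch using the same variable names. The output names b and c and the variable flag are hypothetical, and var1 is assumed to be numeric.

```sas
/* Dataset options on SET: KEEP= is applied first, then RENAME=, */
/* so KEEP= lists the old name and the step's code sees var2.    */
data b ;                      /* b is a hypothetical output name */
   set lib.a ( keep = var1 var3 rename = ( var1 = var2 ) ) ;
   flag = ( var2 > 0 ) ;      /* new name is already in effect   */
run ;

/* Statements: KEEP and RENAME act only when the output dataset  */
/* is written, so code in the step and the KEEP list use var1.   */
data c ;                      /* c is a hypothetical output name */
   set lib.a ;
   flag = ( var1 > 0 ) ;      /* old name is still in effect     */
   keep var1 var3 flag ;
   rename var1 = var2 ;
run ;
```

Both steps should end up with var2, var3 and flag in the output. With the options, the rename is visibly tied to the moment lib.a is read.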

Minimizing steps is probably the single most important step toward writing efficient SAS code. However, it is good to be reasonable and allow PROCs to do some of the basic work, because one must balance computer efficiency with code writing efficiency and maintenance efficiency. SAS programmers usually tend to stress the latter two at the expense of the first, except in extreme cases.
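
As one hedged illustration of letting a PROC do the basic work: suppose, hypothetically, the extract also had to be sorted by the renamed variable. The same dataset options can go on the PROC's input, so no separate DATA step is needed. The sort requirement itself is an assumption added for this example.

```sas
/* One step copies lib.a to WORK, keeps two variables,          */
/* renames var1 to var2, and sorts.                             */
/* RENAME= on the input takes effect before the PROC runs,      */
/* so the BY statement uses the new name var2.                  */
proc sort data = lib.a ( keep = var1 var3 rename = ( var1 = var2 ) )
          out  = a ;
   by var2 ;
run ;
```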

One could, in principle, read and write many unrelated datasets in one step (by placing the SET and OUTPUT statements in explicit loops); however it is not a good idea because there is little advantage in doing so. On the other hand, it is sometimes important to place the SET and/or OUTPUT statements in explicit loops because it makes the code easier to write and more natural to read. And that is a good idea.
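
A minimal sketch of a SET statement placed inside an explicit loop, assuming a hypothetical dataset detail that is sorted by id and has a numeric variable amount:

```sas
/* Each pass through the DATA step reads one whole BY group.    */
/* total is not retained, so it is reset to missing at the top  */
/* of every pass and needs no explicit initialization.          */
data totals ( keep = id total ) ;
   do until ( last.id ) ;
      set detail ;            /* detail is a hypothetical input */
      by id ;
      total = sum ( total , amount ) ;
   end ;
   /* the implicit OUTPUT here writes one row per id            */
run ;
```

The step stops when the SET statement tries to read past the end of detail, so no end-of-file test is needed.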

-----Original Message-----
From: Ben Powell [mailto:ben.powell@CLA.CO.UK]
Sent: Thursday, September 25, 2003 9:39 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Data Step Question

I'm not clear on what can be done in any single datastep and would appreciate any pointers.

Say, for example, I want to copy a dataset to my work lib, rename a variable and keep that and one other variable. Because in the past I have found that sometimes a step won't action unless there is a run command after it, I would break this job into 3 separate data steps, which is quite repetitive. Is there a rule of thumb for when a new datastep is needed and how many steps can be included in a datastep?


data a;
   set lib.a;
run;

data a (rename=(var1=var2));
   set a;
run;

data a;
   set a;
   keep var2 var3;
run;

This could be done with proc sql as

proc sql;
   create table a as
      select var1 as var2, var3
      from lib.a;
quit;

or without the lib:

proc sql;
   create table temp as
      select var1 as var2, var3
      from a;
   create table a as
      select * from temp;
   drop table temp;
quit;

Is there a tidier way to do this as a datastep?

Any help much appreciated.
