LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2003, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Thu, 11 Dec 2003 11:04:01 +0100
Reply-To:   Ace <b.rogers@VIRGIN.NET>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Ace <b.rogers@VIRGIN.NET>
Subject:   Re: Manipulating Data Sets in the SET statement
Content-Type:   text/plain; charset=us-ascii

On 10 Dec 2003 12:39:21 -0800, soberhi@yahoo.com (Sober Hi) wrote:

>I am new to SAS, but not programming. > >If I have the following: > >1 Data B; >2 Set A; >3 if col1='2003'; >4 col2 = UPCase(col2); >5 if col3=' ' then col3='NEW'; >6 rename col4=colFOUR; >7 Run; >8 >9 Data B(drop col1); >10 set B; >11 run; > >Is there someway/reason to combine lines 2-6 into ONE SINGLE SET >statement and statements 9-11 into statement 1? > >Like: >20 Data B(drop=col1); >21 Set A (rename=(col4=colFOUR) WHERE=(col1="2003")); >22 Run; > >I found only one or two articles on this and it seemed like FILTERING >in SET is much more efficient... is this true?

Subsetting, or filtering as you put it, can be done at several points within a data step, using a subsetting IF (as in your first example), a WHERE statement, or a where clause as a dataset option. It's the latter that you're using on the SET statement, but it can equally well be applied whenever a sas dataset is referenced (e.g. on a MERGE statement or DATA= within 99% of sas PROCs).

You're right, generally speaking, that using it in this way will be more efficient. In principle, in a data step with a single read a WHERE statement _should_ be applied in exactly the same way, but in practise doesn't always seem to be. But it's never going to be less efficient so it's probably not a bad idea to stick with this method where possible.

If you're reading multiple datasets (with SET, UPDATE or MERGE) it will always be more efficient to subset each input dataset separately with a WHERE= dataset option.

Of course, it may not always be possible to do so, for instance when conditions rely on variables from two or more input datasets. In these cases, you'll not be able to use a WHERE= dataset option, but neither will a WHERE statement work, as SAS will try to apply this in the same way, i.e. on all input datasets. So in this case you must go back to a subsetting IF statement. Or an IF with an explicit OUTPUT or DELETE statement, of course.

-- Ace in Basel - brucedotrogers a.t rochedotcom


Back to: Top of message | Previous page | Main SAS-L page