|Date: ||Thu, 11 Dec 2003 11:04:01 +0100|
|Reply-To: ||Ace <b.rogers@VIRGIN.NET>|
|Sender: ||"SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>|
|From: ||Ace <b.rogers@VIRGIN.NET>|
|Subject: ||Re: Manipulating Data Sets in the SET statement|
|Content-Type: ||text/plain; charset=us-ascii|
On 10 Dec 2003 12:39:21 -0800, email@example.com (Sober Hi) wrote:
>I am new to SAS, but not programming.
>If I have the following:
>1 Data B;
>2 Set A;
>3 if col1='2003';
>4 col2 = UPCase(col2);
>5 if col3=' ' then col3='NEW';
>6 rename col4=colFOUR;
>9 Data B(drop col1);
>10 set B;
>Is there someway/reason to combine lines 2-6 into ONE SINGLE SET
>statement and statements 9-11 into statement 1?
>20 Data B(drop=col1);
>21 Set A (rename=(col4=colFOUR) WHERE=(col1="2003"));
>I found only one or two articles on this and it seemed like FILTERING
>in SET is much more efficient... is this true?
Subsetting, or filtering as you put it, can be done at several points
within a data step, using a subsetting IF (as in your first example),
a WHERE statement, or a where clause as a dataset option. It's the
latter that you're using on the SET statement, but it can equally well
be applied whenever a sas dataset is referenced (e.g. on a MERGE
statement or DATA= within 99% of sas PROCs).
You're right, generally speaking, that using it in this way will be
more efficient. In principle, in a data step with a single read a
WHERE statement _should_ be applied in exactly the same way, but in
practise doesn't always seem to be. But it's never going to be less
efficient so it's probably not a bad idea to stick with this method
If you're reading multiple datasets (with SET, UPDATE or MERGE) it
will always be more efficient to subset each input dataset separately
with a WHERE= dataset option.
Of course, it may not always be possible to do so, for instance when
conditions rely on variables from two or more input datasets. In these
cases, you'll not be able to use a WHERE= dataset option, but neither
will a WHERE statement work, as SAS will try to apply this in the same
way, i.e. on all input datasets. So in this case you must go back to a
subsetting IF statement. Or an IF with an explicit OUTPUT or DELETE
statement, of course.
Ace in Basel - brucedotrogers a.t rochedotcom