Date: Wed, 25 Feb 2004 16:13:36 -0500
Reply-To: "Lustig, Roger" <roger.lustig@CITIGROUP.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Lustig, Roger" <roger.lustig@CITIGROUP.COM>
Subject: Re: Data Step vs. Proc sql
Content-Type: text/plain; charset="iso-8859-1"
How's that again? PROC SQL has to sort the data first, even if it's already
sorted. You don't include PROC SORT for either dataset, so one has to
assume they're already sorted by X. Let's see your logs from your
five-million-record test.
As for memory use, why would the data step take much memory at all? All it
needs is the program data vector and a few buffers. Sorting takes lots of
memory.
Moreover, your DATA step and SQL code don't do the same thing:
--the DATA step is doing the equivalent of a right join, more or less.
--assuming that you wanted to put a comma between "a as a" and "b as b",
you're doing an inner join in SQL.
--when a variable/column occurs in both input data sets/tables, the two
methods handle this differently.
--If both a and b have multiple occurrences of a particular value of X,
you'll get different output.
This last item is critical: any way of "chang[ing] data step code" would
have to know something about the data involved.
Finally, the DATA step can do a great many things that SQL can't, such as
first. and last. processing, creation of multiple output data sets, etc.
Roger
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of
rdporto1
Sent: Wednesday, February 25, 2004 3:33 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Data Step vs. Proc sql
Hi everybody.
There are 2 ways (at least) of match merging two datasets:
data c;
merge a b (in=b);
by x;
if b;
run;
proc sql;
create table c as
select * from a as a b as b
where a.x=b.x;
quit;
Proc sql spend much less time (and memory) than data step (try them with 5
milion observatiosn!).
I wonder if there's a way to change data step code in order to turn it so
efficient as proc sql