LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous (more recent) messageNext (less recent) messagePrevious (more recent) in topicNext (less recent) in topicPrevious (more recent) by same authorNext (less recent) by same authorPrevious page (February 2004, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 25 Feb 2004 16:13:36 -0500
Reply-To:     "Lustig, Roger" <roger.lustig@CITIGROUP.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Lustig, Roger" <roger.lustig@CITIGROUP.COM>
Subject:      Re: Data Step vs. Proc sql
Comments: To: rdporto1 <rdporto1@TERRA.COM.BR>
Content-Type: text/plain; charset="iso-8859-1"

How's that again? PROC SQL has to sort the data first, even if it's already sorted. You don't include PROC SORT for either dataset, so one has to assume they're already sorted by X. Let's see your logs from your five-million-record test.

As for memory use, why would the data step take much memory at all? All it needs is the program data vector and a few buffers. Sorting takes lots of memory.

Moreover, your DATA step and SQL code don't do the same thing:

--the DATA step is doing the equivalent of a right join, more or less. --assuming that you wanted to put a comma between "a as a" and "b as b", you're doing an inner join in SQL. --when a variable/column occurs in both input data sets/tables, the two methods handle this differently. --If both a and b have multiple occurrences of a particular value of X, you'll get different output.

This last item is critical: any way of "chang[ing] data step code" would have to know something about the data involved.

Finally, the DATA step can do a great many things that SQL can't, such as first. and last. processing, creation of multiple output data sets, etc.

Roger

-----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of rdporto1 Sent: Wednesday, February 25, 2004 3:33 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Data Step vs. Proc sql

Hi everybody.

There are 2 ways (at least) of match merging two datasets:

data c; merge a b (in=b); by x; if b; run;

proc sql; create table c as select * from a as a b as b where a.x=b.x; quit;

Proc sql spend much less time (and memory) than data step (try them with 5 milion observatiosn!). I wonder if there's a way to change data step code in order to turn it so efficient as proc sql


Back to: Top of message | Previous page | Main SAS-L page