LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2004, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 7 Dec 2004 17:12:24 -0500
Reply-To:     Frank Ivis <FIvis@CIHI.CA>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Frank Ivis <FIvis@CIHI.CA>
Subject:      Re: Simple data querying problem
Content-Type: text/plain; charset="iso-8859-1"

I seem to recall that the WHERE condition is faster than the equivalent INNER JOIN. I think there is something in the SAS doc on this.

Frank

-----Original Message----- From: Sigurd Hermansen [mailto:HERMANS1@WESTAT.COM] Sent: Tuesday, December 07, 2004 4:53 PM Subject: Re: Simple data querying problem

Toby: From the perspective of a statistician (not me), you have one estimate of a difference of 0.05 or 0.02 in observed times. I would not want to bet the farm that this estimate represents the true difference.

Now, in a more practical vein, it would not surprise me to learn that the BETWEEN operator takes longer to optimize and compile than arithmetic operators. In earlier versions of SAS, it really did take much longer to execute a BETWEEN condition. That difference has narrowed dramatically in recent versions.

Other conditions that I find would account for even larger variations in time than what you are observing: - programs running concurrently; - which query comes first in the program; - seemingly random variation. Sig

-----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Dunn, Toby Sent: Tuesday, December 07, 2004 4:41 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Simple data querying problem

Sig,

Your method :

proc sql; 23 create table d as 24 select a.time as time 25 from a as a inner join b as b 26 on a.time between (b.time - 1) and (b.time + 1); NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized. NOTE: Table WORK.D created, with 3 rows and 1 columns.

27 quit; NOTE: PROCEDURE SQL used (Total process time): real time 0.26 seconds user cpu time 0.00 seconds system cpu time 0.03 seconds Memory 297k

A modified version of mine:

20 proc sql; 21 create table d as 22 select a.time as time 23 from a as a , 24 b as b 25 where a.time between (b.time - 1) and (b.time + 1); NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized. NOTE: Table WORK.D created, with 3 rows and 1 columns.

26 quit; NOTE: PROCEDURE SQL used (Total process time): real time 0.21 seconds user cpu time 0.00 seconds system cpu time 0.01 seconds Memory 297k

Hmmmm I figured yours would be faster than mine, wonder why it isn't?

Toby

-----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Sigurd Hermansen Sent: Tuesday, December 07, 2004 3:33 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Re: Simple data querying problem

Zach: I could not resist poaching program segments from Ya and Toby, and rewriting the query in a somewhat more generic and streamlined form. If you need to span days, try a datetime key. This form of query also extends easily to constrain comparisons within keys representing persons or other entities:

DATA A; input time; cards; 9.00 13.00 14.00 21.00 ; run; DATA B; input time; cards; 9.25 13.50 23.00 ; run; proc sql; create table c as select a.time as time from a as a inner join b as b on a.time between (b.time - 1) and (b.time + 1); quit;

Building on what I wrote earlier today, the SQL query begins with the first row in A and compares the time to each row in table B. If the condition holds, it writes whatever specified in the select clause to the output dataset. The query then repeats the process for each subsequent row in A. Since neither order of rows or columns selected in the select clause matters, the brute force method does not have to change with the scale of the query. The SAS SQL query optimizer may find a more efficient way to implement the query, but it will produce the same result. Sig Sig

-----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Zach Peery Sent: Tuesday, December 07, 2004 1:44 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Simple data querying problem

Hi All,

Sorry about the last post - it seemed to want to send itself. in the meanwhile - I thought of a better way to explain the problem...

I am trying to identify observations in dataset A that occured within 1 hour of any observation in dataset B, and place these observations in a new dataset. Note that time is in hh.hh format such that 13.50 = 1:30 pm.

DATASET A time 9.00 13.00 14.00 21.00

DATASET B time 9.25 13.50 23.00

DESIRED DATASET time1 9.00 13.00 14.00

Thanks, I really appreciate any help that the list can provide. Zach


Back to: Top of message | Previous page | Main SAS-L page