Date: Tue, 7 Dec 2004 17:12:24 -0500
Reply-To: Frank Ivis <FIvis@CIHI.CA>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Frank Ivis <FIvis@CIHI.CA>
Subject: Re: Simple data querying problem
Content-Type: text/plain; charset="iso-8859-1"
I seem to recall that the WHERE condition is faster than the equivalent INNER JOIN. I think there is something in the SAS doc on this.
Frank
-----Original Message-----
From: Sigurd Hermansen [mailto:HERMANS1@WESTAT.COM]
Sent: Tuesday, December 07, 2004 4:53 PM
Subject: Re: Simple data querying problem
Toby:
From the perspective of a statistician (not me), you have one estimate of a
difference of 0.05 or 0.02 in observed times. I would not want to bet the
farm that this estimate represents the true difference.
Now, in a more practical vein, it would not surprise me to learn that the
BETWEEN operator takes longer to optimize and compile than arithmetic
operators. In earlier versions of SAS, it really did take much longer to
execute a BETWEEN condition. That difference has narrowed dramatically in
recent versions.
Other conditions that I find would account for even larger variations in
time than what you are observing:
- programs running concurrently;
- which query comes first in the program;
- seemingly random variation.
Sig
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Dunn,
Toby
Sent: Tuesday, December 07, 2004 4:41 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Simple data querying problem
Sig,
Your method :
proc sql;
23 create table d as
24 select a.time as time
25 from a as a inner join b as b
26 on a.time between (b.time - 1) and (b.time + 1);
NOTE: The execution of this query involves performing one or more Cartesian
product joins
that can not be optimized.
NOTE: Table WORK.D created, with 3 rows and 1 columns.
27 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.26 seconds
user cpu time 0.00 seconds
system cpu time 0.03 seconds
Memory 297k
A modified version of mine:
20 proc sql;
21 create table d as
22 select a.time as time
23 from a as a ,
24 b as b
25 where a.time between (b.time - 1) and (b.time + 1);
NOTE: The execution of this query involves performing one or more Cartesian
product joins
that can not be optimized.
NOTE: Table WORK.D created, with 3 rows and 1 columns.
26 quit;
NOTE: PROCEDURE SQL used (Total process time):
real time 0.21 seconds
user cpu time 0.00 seconds
system cpu time 0.01 seconds
Memory 297k
Hmmmm I figured yours would be faster than mine, wonder why it isn't?
Toby
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Sigurd
Hermansen
Sent: Tuesday, December 07, 2004 3:33 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Simple data querying problem
Zach:
I could not resist poaching program segments from Ya and Toby, and rewriting
the query in a somewhat more generic and streamlined form. If you need to
span days, try a datetime key. This form of query also extends easily to
constrain comparisons within keys representing persons or other entities:
DATA A;
input time;
cards;
9.00
13.00
14.00
21.00
;
run;
DATA B;
input time;
cards;
9.25
13.50
23.00
;
run;
proc sql;
create table c as
select a.time as time
from a as a inner join b as b
on a.time between (b.time - 1) and (b.time + 1);
quit;
Building on what I wrote earlier today, the SQL query begins with the first
row in A and compares the time to each row in table B. If the condition
holds, it writes whatever specified in the select clause to the output
dataset. The query then repeats the process for each subsequent row in A.
Since neither order of rows or columns selected in the select clause
matters, the brute force method does not have to change with the scale of
the query. The SAS SQL query optimizer may find a more efficient way to
implement the query, but it will produce the same result. Sig Sig
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Zach
Peery
Sent: Tuesday, December 07, 2004 1:44 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Simple data querying problem
Hi All,
Sorry about the last post - it seemed to want to send itself. in the
meanwhile - I thought of a better way to explain the problem...
I am trying to identify observations in dataset A that occured within 1 hour
of any observation in dataset B, and place these observations in a new
dataset. Note that time is in hh.hh format such that 13.50 = 1:30 pm.
DATASET A
time
9.00
13.00
14.00
21.00
DATASET B
time
9.25
13.50
23.00
DESIRED DATASET
time1
9.00
13.00
14.00
Thanks, I really appreciate any help that the list can provide. Zach