Date: Fri, 28 Mar 2003 14:43:09 -0600
Reply-To: pudding_man@lycos.com
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Puddin' Man <pudding_man@LYCOS.COM>
Organization: Lycos Mail (http://www.mail.lycos.com:80)
Subject: Re: Why is SQL taking so long to pull nobs?
Content-Type: text/plain; charset=us-ascii
On Fri, 28 Mar 2003 14:59:38
Ian Whitlock wrote:
>Puddin',
>
>Now try
>
> where &obs-1 < id <= &obs and ceil(id) = &obs
>
>Does it make any difference?
It certainly does.
1.) It reports that the index is used.
2.) It uses over twice the resources. :-)
To wit:
859 %let obs = 1e7;
860 options msglevel=i;
861
862 data aa(index = (id));
863 array x(4);
864 do id = 1 to &obs; output; end;
865 run;
NOTE: The data set WORK.AA has 10000000 observations and 5 variables.
NOTE: Simple index id has been defined.
NOTE: DATA statement used:
real time 1:34.09
cpu time 46.31 seconds
866
867 Proc SQL;
868 Select *
869 From aa
870 where id = &obs
871 ;
INFO: Index id selected for WHERE clause optimization.
871! quit;
NOTE: PROCEDURE SQL used:
real time 0.06 seconds
cpu time 0.01 seconds
872
873 Proc SQL;
874 Select *
875 From aa
876 where &obs-1 < id <= &obs and ceil(id) = &obs
877 ;
INFO: Index id selected for WHERE clause optimization.
877! quit;
NOTE: PROCEDURE SQL used:
real time 47.54 seconds
cpu time 31.90 seconds
The idea was to get it to use the index on the first
condition then apply the function to the result?
You tested with W9x/8.2? I hope it looked better than
this! <g>
Puddin'
***********************************************************
*** Puddin' Man *** Pudding_Man@lycos.com ********
***********************************************************;
>IanWhitlock@westat.com
>-----Original Message-----
>From: Puddin' Man [mailto:pudding_man@LYCOS.COM]
>Sent: Friday, March 28, 2003 2:35 PM
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Re: Why is SQL taking so long to pull nobs?
>
>
>On Fri, 28 Mar 2003 08:15:09
> Fehd, Ronald J. (PHPPO) wrote:
>>> From: a little Birdie
>>> If the predicates in the where clause are "simple"
>>> libname EQ "constant",
>>> memname EQ "constant",
>>> then the SQL optimiser
>>> realises that he can open only the tables that will qualify
>>> anyways... (this is specific optimization for
>>> LIBNAME/MEMNAME in the dictionary.tables that have them...)
>>>
>>> Obfuscate it with functions, || or other expressions, and the
>
>I assume that this reflects on the use of UPCASE in your
>WHERE statement ...
>
>>> optimiser doesn't see thru to the underlying truth (it could,
>
>I am inclined to agree that "it could".
>
>>> but its not yet that smart)... So we open all the tables,
>
>I am inclined to agree that "its not yet that smart". :-)
>
>>> and many get opened "in vain" because their data are thrown
>>> away when the predicate is finally processed in its completeness...
>>
>>end Birdie quote
>
>Ron,
>
>Is it possible that your Rhapsodic Boidie intended that
>we should generalize such behavior to any uses of
>indices via a WHERE statement in SAS processing?
>
>Consider:
>
>177 %let obs = 1e7;
>178 data aa(index = (id));
>179 array x(4);
>180 do id = 1 to &obs; output; end;
>181 run;
>
>NOTE: The data set WORK.AA has 10000000 observations and 5 variables.
>NOTE: Simple index id has been defined.
>NOTE: DATA statement used:
> real time 1:37.95
> cpu time 47.30 seconds
>182
>183 Proc SQL;
>184 Select *
>185 From aa
>186 where id = &obs
>187 ;
>INFO: Index id selected for WHERE clause optimization.
>187! quit;
>NOTE: PROCEDURE SQL used:
> real time 0.04 seconds
> cpu time 0.01 seconds
>188
>189 Proc SQL;
>190 Select *
>191 From aa
>192 where ceil(id) = &obs
>193 ;
>193! quit;
>NOTE: PROCEDURE SQL used:
> real time 18.61 seconds
> cpu time 13.32 seconds
>
>You'll notice that the index is explicitly used and noted
>when the CEIL function is -not- used, and the performance
>"stats" very much reflect such use.
>
>Could this last step not have used the index to find
>ID = 1E7 and -then- applied the CEIL function to see
>if it met the WHERE condition? Instead of grinding thru the
>entire file sequentially, applying CEIL to every ID value and shuffling tons
>on data thru the buffer?
>
>Lest anyone should consider this to be just an SQL phenomenon:
>
>194
>195 data bb;
>196 set aa(where = (id = &obs));
>INFO: Index id selected for WHERE clause optimization.
>197 run;
>
>NOTE: There were 1 observations read from the data set WORK.AA.
> WHERE id=10000000;
>NOTE: The data set WORK.BB has 1 observations and 5 variables.
>NOTE: DATA statement used:
> real time 0.04 seconds
> cpu time 0.02 seconds
>198
>199 data bb;
>200 set aa(where = (ceil(id) = &obs));
>201 run;
>
>NOTE: There were 1 observations read from the data set WORK.AA.
> WHERE CEIL(id)=10000000;
>NOTE: The data set WORK.BB has 1 observations and 5 variables.
>NOTE: DATA statement used:
> real time 18.14 seconds
> cpu time 13.36 seconds
>
>This was W2kP 8.1. Specifying IDXWHERE=YES (hopefully
>to avoid the sequential search) seems to have no
>effect on this code.
>
>It's great to be able to use most/all of the bells/whistles
>(functions, in/formats, etc) in a WHERE statement, but it
>appears that a serious cost may attach to such use if indices
>are present.
>
>Bears mentioning that the Onlinedoc has guidelines etc
>on index usage, sez that certain functions (TRIM and
>some forms of SUBSTR, etc) and numerous operators
>(i.e. CONTAINS, LIKE) will "optimize" for index
>usage. See Lang. Ref. Concepts: SAS DataFiles: Indices.
>But there's tons of stuff that evidently won't make
>use of an index.
>
>It's interesting to note that if I apply the CEIL
>to the constant (&obs) instead of the ID var in the
>above code, I get a very comparable result. :-(
>
>So our "optimizer" is patently sub-optimal? <g>
>
>A po' fella's gotta wonder *when* will the "WHERE statement
>optimizer" get "smarter"?
>
>-- snip ----
>
> Zalut,
> Puddin'
>
>***********************************************************
>*** Puddin' Man *** Pudding_Man@lycos.com ********
>***********************************************************;
>
>"Funniest thing I ever seen,
> polecat jumped on a sewing machine.
> Sewing machine, it run so fast,
> took 99 stitches in his ... yas, yas, yas!"
> from "The Dirty Dozens", Memphis Minnie, maybe 1939
>
_____________________________________________________________
Get 25MB, POP3, Spam Filtering with LYCOS MAIL PLUS for $19.95/year.
http://login.mail.lycos.com/brandPage.shtml?pageId=plus&ref=lmtplus
|