LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 2003, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 28 Mar 2003 14:43:09 -0600
Reply-To:     pudding_man@lycos.com
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Puddin' Man <pudding_man@LYCOS.COM>
Organization: Lycos Mail  (http://www.mail.lycos.com:80)
Subject:      Re: Why is SQL taking so long to pull nobs?
Comments: To: Ian Whitlock <WHITLOI1@WESTAT.com>
Content-Type: text/plain; charset=us-ascii

On Fri, 28 Mar 2003 14:59:38 Ian Whitlock wrote: >Puddin', > >Now try > > where &obs-1 < id <= &obs and ceil(id) = &obs > >Does it make any difference?

It certainly does.

1.) It reports that the index is used.

2.) It uses over twice the resources. :-)

To wit:

859 %let obs = 1e7; 860 options msglevel=i; 861 862 data aa(index = (id)); 863 array x(4); 864 do id = 1 to &obs; output; end; 865 run;

NOTE: The data set WORK.AA has 10000000 observations and 5 variables. NOTE: Simple index id has been defined. NOTE: DATA statement used: real time 1:34.09 cpu time 46.31 seconds 866 867 Proc SQL; 868 Select * 869 From aa 870 where id = &obs 871 ; INFO: Index id selected for WHERE clause optimization. 871! quit; NOTE: PROCEDURE SQL used: real time 0.06 seconds cpu time 0.01 seconds 872 873 Proc SQL; 874 Select * 875 From aa 876 where &obs-1 < id <= &obs and ceil(id) = &obs 877 ; INFO: Index id selected for WHERE clause optimization. 877! quit; NOTE: PROCEDURE SQL used: real time 47.54 seconds cpu time 31.90 seconds

The idea was to get it to use the index on the first condition then apply the function to the result?

You tested with W9x/8.2? I hope it looked better than this! <g>

Puddin'

*********************************************************** *** Puddin' Man *** Pudding_Man@lycos.com ******** ***********************************************************;

>IanWhitlock@westat.com >-----Original Message----- >From: Puddin' Man [mailto:pudding_man@LYCOS.COM] >Sent: Friday, March 28, 2003 2:35 PM >To: SAS-L@LISTSERV.UGA.EDU >Subject: Re: Why is SQL taking so long to pull nobs? > > >On Fri, 28 Mar 2003 08:15:09 > Fehd, Ronald J. (PHPPO) wrote: >>> From: a little Birdie >>> If the predicates in the where clause are "simple" >>> libname EQ "constant", >>> memname EQ "constant", >>> then the SQL optimiser >>> realises that he can open only the tables that will qualify >>> anyways... (this is specific optimization for >>> LIBNAME/MEMNAME in the dictionary.tables that have them...) >>> >>> Obfuscate it with functions, || or other expressions, and the > >I assume that this reflects on the use of UPCASE in your >WHERE statement ... > >>> optimiser doesn't see thru to the underlying truth (it could, > >I am inclined to agree that "it could". > >>> but its not yet that smart)... So we open all the tables, > >I am inclined to agree that "its not yet that smart". :-) > >>> and many get opened "in vain" because their data are thrown >>> away when the predicate is finally processed in its completeness... >> >>end Birdie quote > >Ron, > >Is it possible that your Rhapsodic Boidie intended that >we should generalize such behavior to any uses of >indices via a WHERE statement in SAS processing? > >Consider: > >177 %let obs = 1e7; >178 data aa(index = (id)); >179 array x(4); >180 do id = 1 to &obs; output; end; >181 run; > >NOTE: The data set WORK.AA has 10000000 observations and 5 variables. >NOTE: Simple index id has been defined. >NOTE: DATA statement used: > real time 1:37.95 > cpu time 47.30 seconds >182 >183 Proc SQL; >184 Select * >185 From aa >186 where id = &obs >187 ; >INFO: Index id selected for WHERE clause optimization. >187! quit; >NOTE: PROCEDURE SQL used: > real time 0.04 seconds > cpu time 0.01 seconds >188 >189 Proc SQL; >190 Select * >191 From aa >192 where ceil(id) = &obs >193 ; >193! quit; >NOTE: PROCEDURE SQL used: > real time 18.61 seconds > cpu time 13.32 seconds > >You'll notice that the index is explicitly used and noted >when the CEIL function is -not- used, and the performance >"stats" very much reflect such use. > >Could this last step not have used the index to find >ID = 1E7 and -then- applied the CEIL function to see >if it met the WHERE condition? Instead of grinding thru the >entire file sequentially, applying CEIL to every ID value and shuffling tons >on data thru the buffer? > >Lest anyone should consider this to be just an SQL phenomenon: > >194 >195 data bb; >196 set aa(where = (id = &obs)); >INFO: Index id selected for WHERE clause optimization. >197 run; > >NOTE: There were 1 observations read from the data set WORK.AA. > WHERE id=10000000; >NOTE: The data set WORK.BB has 1 observations and 5 variables. >NOTE: DATA statement used: > real time 0.04 seconds > cpu time 0.02 seconds >198 >199 data bb; >200 set aa(where = (ceil(id) = &obs)); >201 run; > >NOTE: There were 1 observations read from the data set WORK.AA. > WHERE CEIL(id)=10000000; >NOTE: The data set WORK.BB has 1 observations and 5 variables. >NOTE: DATA statement used: > real time 18.14 seconds > cpu time 13.36 seconds > >This was W2kP 8.1. Specifying IDXWHERE=YES (hopefully >to avoid the sequential search) seems to have no >effect on this code. > >It's great to be able to use most/all of the bells/whistles >(functions, in/formats, etc) in a WHERE statement, but it >appears that a serious cost may attach to such use if indices >are present. > >Bears mentioning that the Onlinedoc has guidelines etc >on index usage, sez that certain functions (TRIM and >some forms of SUBSTR, etc) and numerous operators >(i.e. CONTAINS, LIKE) will "optimize" for index >usage. See Lang. Ref. Concepts: SAS DataFiles: Indices. >But there's tons of stuff that evidently won't make >use of an index. > >It's interesting to note that if I apply the CEIL >to the constant (&obs) instead of the ID var in the >above code, I get a very comparable result. :-( > >So our "optimizer" is patently sub-optimal? <g> > >A po' fella's gotta wonder *when* will the "WHERE statement >optimizer" get "smarter"? > >-- snip ---- > > Zalut, > Puddin' > >*********************************************************** >*** Puddin' Man *** Pudding_Man@lycos.com ******** >***********************************************************; > >"Funniest thing I ever seen, > polecat jumped on a sewing machine. > Sewing machine, it run so fast, > took 99 stitches in his ... yas, yas, yas!" > from "The Dirty Dozens", Memphis Minnie, maybe 1939 >

_____________________________________________________________ Get 25MB, POP3, Spam Filtering with LYCOS MAIL PLUS for $19.95/year. http://login.mail.lycos.com/brandPage.shtml?pageId=plus&ref=lmtplus


Back to: Top of message | Previous page | Main SAS-L page