|
Dave,
SQL is optimized to evaluate the benefits of using the INDEX before deciding
whether to use it or not. See
http://support.sas.com/techsup/unotes/SN/001/001126.html
If you are keen to use the index in the data step to get the max value read
the last observation using the BY statement. You be the judge as to which is
more efficient in this regard - SQL or the data step:
data _null_ ;
call symput('nobs',put(nobs,best.-l)) ;
stop ;
set test nobs=nobs ;
run ;
data _null_ ;
set test(firstobs=&nobs.) ;
by key ;
call symput('max_key',key);
run ;
Kind Regards,
__________________________
Venky Chakravarthy
E-mail: swovcc@hotmail.com
-----Original Message-----
From: Devoll, Dave [mailto:dave.devoll@CITIGROUP.COM]
Sent: Friday, August 08, 2003 1:21 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Get MAX index value efficiently
I have a huge dataset that has a simple index (but is not sorted by the
index.) I want to efficiently get the maximum value for the simple index.
I can get the minimum value quickly, but I haven't found a way to use the
index to get the maximum value.
This generic code illustrates the scenario:
-------------------------------------------------------
option msglevel=i;
data test (index=(key));
/* Create 1 million row dataset, with MIN */
/* & MAX key values in the middle. */
do key=2 to 500000; output; end;
key=1; /* MIN */
output;
key=1000000; /* MAX */
output;
do key=500001 to 999999; output; end;
run;
data _null_;
/* Uses index, but only works for MIN_key */
set test (obs=1);
by key; /* No, DESCENDING does not work with index */
call symput('min_key',key);
run;
proc sql noprint;
/* Works for MAX_key, but does NOT use index */
select max(key) into: max_key from test;
quit;
%put _user_;
-------------------------------------------------------
Which results in:
-------------------------------------------------------
5210 option msglevel=i;
5211
5212 data test (index=(key));
5213 /* Create 1 million row dataset, with MIN */
5214 /* & MAX key values in the middle. */
5215
5216 do key=2 to 500000; output; end;
5217
5218 key=1; /* MIN */
5219 output;
5220 key=1000000; /* MAX */
5221 output;
5222
5223 do key=500001 to 999999; output; end;
5224 run;
NOTE: The data set WORK.TEST has 1000000 observations and 1 variables.
NOTE: Simple index key has been defined.
NOTE: Compressing data set WORK.TEST increased size by 121.66 percent.
Compressed is 2210 pages; un-compressed would require 997 pages.
NOTE: DATA statement used:
real time 16.68 seconds
cpu time 16.56 seconds
5225
5226 data _null_;
5227 /* Uses index, but only works for MIN_key */
5228 set test (obs=1);
5229 by key; /* No, DESCENDING does not work with index */
5230 call symput('min_key',key);
5231 run;
NOTE: Numeric values have been converted to character values at the places
given by: (Line):(Column).
5230:25
INFO: Index key selected for BY clause processing.
NOTE: There were 1 observations read from the data set WORK.TEST.
NOTE: DATA statement used:
real time 0.07 seconds
cpu time 0.02 seconds
5232
5233 proc sql noprint;
5234 /* Works for MAX_key, but does NOT use index */
5235 select max(key) into: max_key from test;
5236 quit;
NOTE: PROCEDURE SQL used:
real time 5.86 seconds
cpu time 5.83 seconds
5237
5238 %put _user_;
GLOBAL MAX_KEY 1000000
GLOBAL MIN_KEY 1
-------------------------------------------------------
If I could coax SQL to use the index, or if SAS could do DESCENDING
by-groups using indexes, I would be set.
Does anyone have a solution?
BTW, I'm running 8.1 on Solaris.
Thanks.
Dave
LEGAL NOTICE
Unless expressly stated otherwise, this message is confidential and may be privileged. It is intended for the addressee(s) only. Access to this E-mail by anyone else is unauthorized. If you are not an addressee, any disclosure or copying of the contents of this E-mail or any action taken (or not taken) in reliance on it is unauthorized and may be unlawful. If you are not an addressee, please inform the sender immediately.
|