LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (August 2003, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Fri, 8 Aug 2003 16:17:53 -0400
Reply-To:   "Chakravarthy, Venky" <Venky.Chakravarthy@PFIZER.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   "Chakravarthy, Venky" <Venky.Chakravarthy@PFIZER.COM>
Subject:   Re: Get MAX index value efficiently
Comments:   To: "Devoll, Dave" <dave.devoll@CITIGROUP.COM>
Content-Type:   text/plain; charset="iso-8859-1"

Dave,

SQL is optimized to evaluate the benefits of using the INDEX before deciding whether to use it or not. See http://support.sas.com/techsup/unotes/SN/001/001126.html

If you are keen to use the index in the data step to get the max value read the last observation using the BY statement. You be the judge as to which is more efficient in this regard - SQL or the data step:

data _null_ ; call symput('nobs',put(nobs,best.-l)) ; stop ; set test nobs=nobs ; run ;

data _null_ ; set test(firstobs=&nobs.) ; by key ; call symput('max_key',key); run ;

Kind Regards, __________________________ Venky Chakravarthy E-mail: swovcc@hotmail.com

-----Original Message----- From: Devoll, Dave [mailto:dave.devoll@CITIGROUP.COM] Sent: Friday, August 08, 2003 1:21 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Get MAX index value efficiently

I have a huge dataset that has a simple index (but is not sorted by the index.) I want to efficiently get the maximum value for the simple index.

I can get the minimum value quickly, but I haven't found a way to use the index to get the maximum value.

This generic code illustrates the scenario:

------------------------------------------------------- option msglevel=i;

data test (index=(key)); /* Create 1 million row dataset, with MIN */ /* & MAX key values in the middle. */

do key=2 to 500000; output; end;

key=1; /* MIN */ output; key=1000000; /* MAX */ output;

do key=500001 to 999999; output; end; run;

data _null_; /* Uses index, but only works for MIN_key */ set test (obs=1); by key; /* No, DESCENDING does not work with index */ call symput('min_key',key); run;

proc sql noprint; /* Works for MAX_key, but does NOT use index */ select max(key) into: max_key from test; quit;

%put _user_; -------------------------------------------------------

Which results in:

------------------------------------------------------- 5210 option msglevel=i; 5211 5212 data test (index=(key)); 5213 /* Create 1 million row dataset, with MIN */ 5214 /* & MAX key values in the middle. */ 5215 5216 do key=2 to 500000; output; end; 5217 5218 key=1; /* MIN */ 5219 output; 5220 key=1000000; /* MAX */ 5221 output; 5222 5223 do key=500001 to 999999; output; end; 5224 run;

NOTE: The data set WORK.TEST has 1000000 observations and 1 variables. NOTE: Simple index key has been defined. NOTE: Compressing data set WORK.TEST increased size by 121.66 percent. Compressed is 2210 pages; un-compressed would require 997 pages. NOTE: DATA statement used: real time 16.68 seconds cpu time 16.56 seconds

5225 5226 data _null_; 5227 /* Uses index, but only works for MIN_key */ 5228 set test (obs=1); 5229 by key; /* No, DESCENDING does not work with index */ 5230 call symput('min_key',key); 5231 run;

NOTE: Numeric values have been converted to character values at the places given by: (Line):(Column). 5230:25 INFO: Index key selected for BY clause processing. NOTE: There were 1 observations read from the data set WORK.TEST. NOTE: DATA statement used: real time 0.07 seconds cpu time 0.02 seconds

5232 5233 proc sql noprint; 5234 /* Works for MAX_key, but does NOT use index */ 5235 select max(key) into: max_key from test; 5236 quit; NOTE: PROCEDURE SQL used: real time 5.86 seconds cpu time 5.83 seconds

5237 5238 %put _user_; GLOBAL MAX_KEY 1000000 GLOBAL MIN_KEY 1 -------------------------------------------------------

If I could coax SQL to use the index, or if SAS could do DESCENDING by-groups using indexes, I would be set.

Does anyone have a solution?

BTW, I'm running 8.1 on Solaris.

Thanks. Dave

LEGAL NOTICE Unless expressly stated otherwise, this message is confidential and may be privileged. It is intended for the addressee(s) only. Access to this E-mail by anyone else is unauthorized. If you are not an addressee, any disclosure or copying of the contents of this E-mail or any action taken (or not taken) in reliance on it is unauthorized and may be unlawful. If you are not an addressee, please inform the sender immediately.


Back to: Top of message | Previous page | Main SAS-L page