| Date: | Fri, 28 Jul 2000 15:21:24 -0700 |
| Reply-To: | "Lund, Pete" <Peter.Lund@CFC.WA.GOV> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | "Lund, Pete" <Peter.Lund@CFC.WA.GOV> |
| Subject: | Re: index question in V8 |
| Content-Type: | text/plain; charset="iso-8859-1" |
|---|
Ian-
That can't be it. I looked and I have the INDEX_AEHJLQSW=NO option set!
----------------------------------------------------------------------
Pete Lund
WA State Caseload Forecast Council
515 15th Ave SE
Olympia, WA 98504-0962
(360) 902-0086 voice
(360) 902-0084 fax
(360) 971-0962 pager
peter.lund@cfc.wa.gov
----------------------------------------------------------------------
-----Original Message-----
From: Ian Whitlock [mailto:WHITLOI1@WESTAT.com]
Sent: Friday, July 28, 2000 12:50 PM
To: 'Lund, Pete'; SAS-L@LISTSERV.UGA.EDU
Subject: RE: index question in V8
Peter,
It looks like they are using the AEHJLOQSW-algorithm to make the decision.
Ian Whitlock <whitloi1@westat.com>
-----Original Message-----
From: Lund, Pete [mailto:Peter.Lund@CFC.WA.GOV]
Sent: Thursday, July 27, 2000 6:25 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: index question in V8
I've been creating some indexed datasets today and noticed that there didn't
seem to be any performance improvement in subsequent steps. After turning on
MSGLEVEL=I, I was able to see that there were a number of cases in which it
seemed logical that an index should be used and it wasn't. The following
two data steps returned within 2 of the same number of records, yet one used
the index and the other did not. (Note: this is V8 only. In 6.12 the index is
used both times.)
data test1;
  set auths200004;
  where service eq '4510'; *<---- index used - returned about 3,800 out of 280,000 records;
run;

data test2;
  set auths200004;
  where service eq '4501'; *<---- index not used - also returned about 3,800 out of 280,000 records;
run;
Mark Terjeson and I have been playing around with this and would welcome any
insight. The following little program creates a dataset with 300,000
observations and a simple index on one of the variables. There is then a
macro that will create a dataset subsetting on each value of the index
variable. A note is written to the log to show whether or not the index was
used. There is a pattern to when the index is used, but it seems to be
unrelated to the number of records in the index group.
IMPORTANT NOTE: in V6 the index is used every time.
Here's the code - if you have time, run it and see if you can shed any light
on what's happening:
* make sample data ;
data table1(index=(ltr));
  do i = 1 to 300000;
    x = uniform(0);
    m = mod(x*100,26);
    if m le 13 then m = int(m/2);
    else m = int(m*2);
    mm = mod(m,26);
    if 1 le mm le 26 then ltr = substr("ABCDEFGHIJKLMNOPQRSTUVWXYZ",mm,1);
    output;
  end;
run;

* look at ltr % distribution ;
proc freq data=table1;
  table ltr / list missing;
run;

options msglevel=i;

%macro testind;
  %let letters = ABCDEFGHIJKLMNOPQRSTUVWXYZ;
  %do i = 1 %to 26;
    options nonotes;
    %put;
    %put LETTER: %substr(&letters,&i,1);
    data table2;
      set table1;
      where ltr eq "%substr(&letters,&i,1)";
    run;
    options notes;
  %end;
%mend;

%testind;
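
A possible follow-up check, offered only as a sketch: the IDXWHERE= data set
option asks SAS either to use an index for the WHERE expression (YES) or to
read the data set sequentially (NO), instead of leaving the choice to SAS.
Running the same subset both ways and comparing the log messages and timings
may help show whether the decision not to use the index was a reasonable one.
This assumes IDXWHERE= is available in your release. The dataset names
table2_forced and table2_seq and the test letter "B" are made up for the
illustration and are not part of the program above.

```sas
* Hypothetical variation on the test above, using one arbitrary letter ;
options msglevel=i;

* request that an index be used for the WHERE expression ;
data table2_forced;
  set table1(idxwhere=yes);
  where ltr eq "B";
run;

* request a sequential read of the same subset for comparison ;
data table2_seq;
  set table1(idxwhere=no);
  where ltr eq "B";
run;
```

With MSGLEVEL=I still on, the log for each step should indicate whether an
index was selected, so the two runs can be compared directly.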