Date: Thu, 4 Sep 2003 14:28:25 -0400
Reply-To: john_m_wildenthal@BANKONE.COM
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "John M. Wildenthal" <john_m_wildenthal@BANKONE.COM>
Subject: Re: An efficiency question
Content-type: text/plain; charset="us-ascii"
Dennis Diskin has the best approach in his last post - direct matrix
lookups. I will assume, as he did, that your ids are all essentially
numbers. If not, you can convert them to numbers on the fly as an exercise
;)
The coding need not be that involved. This email will demonstrate a
straightforward solution to that problem. Code is UNTESTED.
First off, do you really want to maintain the list of 50+ categories, each
with 50+ subcategories, with each of those 2500 elements having multiple
triggering conditions, inside your code? If there are 3 conditions (on
average) for each different subcategory, that would be 7500 relationships.
I wouldn't want to be responsible for maintaining that list, particularly
within production source code. Let us assume you have access to an RDBMS
from SAS (at least ODBC, right?). You can place an appropriate table out
there, something like (monospace font highly recommended)
id cat subcat
3566 A 1
4863 ZZ 33
9327 YS 56
3566 U 23
And so on. Make the primary key all three fields.
Having the data in an RDBMS like this means you can have others maintain
all of the mappings via an ASP page or some other simple access mechanism.
You know they will eventually want to change them, so let's make the future
comfortable.
Now to the coding. We need to build a BIG matrix that says what is to be
set by what values. It will require (COUNT(DISTINCT cat)) x (MAX(id)
-MIN(id)+1) x 8 bytes of memory. For a range of 10,000 id values and 50
categories, that would be about 4MB, well within any reasonable limits.
Multiplying the range by 10 (100,000 difference between MIN and MAX) means
40MB, which any modern PC should easily handle. Make the difference
1,000,000 and you will need a nice PC (>512MB RAM) to prevent paging.
Let's get on with it. I will assume the above table is called
orcldata.cat_table:
PROC SQL STIMER;
SELECT COUNT(DISTINCT cat), MAX(id), MIN(id)
INTO :catcount, :maxarray, :minarray
FROM orcldata.cat_table; %* this is needed to build the big lookup
matrix;
SELECT cat, MAX(subcat)
INTO :catlist SEPARATED BY ' ', :arraysize SEPARATED BY ' '
FROM orcldata.cat_table
GROUP BY cat; %* this is needed to build an ARRAY for each category;
QUIT;
%MACRO arraylist(cats, sizes); %* create the ARRAY statements for each
category;
%LOCAL i token lval;
%LET i = 1;
%LET token = %SCAN(&cats,&i);
%DO %WHILE (%LENGTH(&token));
%LET lval = %SCAN(&sizes,&i);
ARRAY category_&token.[0:&lval] category_&token._0-category_&token._&lval
(%EVAL(&lval+1)*0);
%LET i = %EVAL(&i + 1);
%LET token = %SCAN(&cats,&i);
%END;
%MEND;
%MACRO indexcalc(cats); %* perform the mapping from cat to the column
number in the lookup matrix;
%LOCAL i token;
%LET i = 1;
%LET token = %SCAN(&cats,&i);
%DO %WHILE (%LENGTH(&token));
WHEN ("&token") ival = &i;
%LET i = %EVAL(&i + 1);
%LET token = %SCAN(&cats,&i);
%END;
%MEND;
%MACRO arrayassign(cats); %* do the actual lookup and assign the
appropriate subcategory value, with bounds checking;
%LOCAL i token;
%LET i = 1;
%LET token = %SCAN(&cats,&i);
%DO %WHILE (%LENGTH(&token));
IF &minarray LE id LE &maxarray THEN
category_&token.[lookups[&i,id]] = 1;
%LET i = %EVAL(&i + 1);
%LET token = %SCAN(&cats,&i);
%END;
%MEND;
%MACRO droplist(cats); %* drop unneeded variables ;
%LOCAL i token;
%LET i = 1;
%LET token = %SCAN(&cats,&i);
%DO %WHILE (%LENGTH(&token));
IF &minarray LE id LE &maxarray THEN
category_&token._0
%LET i = %EVAL(&i + 1);
%LET token = %SCAN(&cats,&i);
%END;
%MEND;
DATA old;
ARRAY lookups[&catcount,&minarray:&maxarray] _TEMPORARY_ (%EVAL
(&catcount*(&maxarray - &minarray + 1))*0); %* initialize all values to
zero ;
%arraylist(&catlist,&arraysize);
SET new;
eof = 0;
IF (_N_ EQ 1) THEN
DO WHILE (NOT eof);
SET orcldata.cat_table END=eof;
SELECT (cat);
%indexcalc(&catlist)
OTHERWISE
ABORT ABEND; %* table must have been edited between PROC SQL
execution and now;
END;
lookups(ival,id) = subcat;
END;
DO i = 1 TO 30;
%arrayassign(&catlist)
END;
DROP eof ival i %droplist(&catlist);
RUN;
This transmission may contain information that is privileged, confidential and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is STRICTLY PROHIBITED. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format. Thank you.