LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (September 2003, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 4 Sep 2003 14:28:25 -0400
Reply-To:     john_m_wildenthal@BANKONE.COM
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "John M. Wildenthal" <john_m_wildenthal@BANKONE.COM>
Subject:      Re: An efficiency question
Content-type: text/plain; charset="us-ascii"

Dennis Diskin has the best approach in his last post - direct matrix lookups. I will assume, as he did, that your ids are all essentially numbers. If not, you can convert them to numbers on the fly as an exercise ;)

The coding need not be that involved. This email will demonstrate a straightforward solution to that problem. Code is UNTESTED.

First off, do you really want to maintain the list of 50+ categories, each with 50+ subcategories, with each of those 2500 elements having multiple triggering conditions, inside your code? If there are 3 conditions (on average) for each different subcategory, that would be 7500 relationships. I wouldn't want to be responsible for maintaining that list, particularly within production source code. Let us assume you have access to an RDBMS from SAS (at least ODBC, right?). You can place an appropriate table out there, something like (monospace font highly recommended)

id cat subcat 3566 A 1 4863 ZZ 33 9327 YS 56 3566 U 23

And so on. Make the primary key all three fields.

Having the data in an RDBMS like this means you can have others maintain all of the mappings via an ASP page or some other simple access mechanism. You know they will eventually want to change them, so let's make the future comfortable.

Now to the coding. We need to build a BIG matrix that says what is to be set by what values. It will require (COUNT(DISTINCT cat)) x (MAX(id) -MIN(id)+1) x 8 bytes of memory. For a range of 10,000 id values and 50 categories, that would be about 4MB, well within any reasonable limits. Multiplying the range by 10 (100,000 difference between MIN and MAX) means 40MB, which any modern PC should easily handle. Make the difference 1,000,000 and you will need a nice PC (>512MB RAM) to prevent paging.

Let's get on with it. I will assume the above table is called orcldata.cat_table:

PROC SQL STIMER; SELECT COUNT(DISTINCT cat), MAX(id), MIN(id) INTO :catcount, :maxarray, :minarray FROM orcldata.cat_table; %* this is needed to build the big lookup matrix; SELECT cat, MAX(subcat) INTO :catlist SEPARATED BY ' ', :arraysize SEPARATED BY ' ' FROM orcldata.cat_table GROUP BY cat; %* this is needed to build an ARRAY for each category; QUIT;

%MACRO arraylist(cats, sizes); %* create the ARRAY statements for each category; %LOCAL i token lval; %LET i = 1; %LET token = %SCAN(&cats,&i); %DO %WHILE (%LENGTH(&token)); %LET lval = %SCAN(&sizes,&i); ARRAY category_&token.[0:&lval] category_&token._0-category_&token._&lval (%EVAL(&lval+1)*0); %LET i = %EVAL(&i + 1); %LET token = %SCAN(&cats,&i); %END; %MEND;

%MACRO indexcalc(cats); %* perform the mapping from cat to the column number in the lookup matrix; %LOCAL i token; %LET i = 1; %LET token = %SCAN(&cats,&i); %DO %WHILE (%LENGTH(&token)); WHEN ("&token") ival = &i; %LET i = %EVAL(&i + 1); %LET token = %SCAN(&cats,&i); %END; %MEND;

%MACRO arrayassign(cats); %* do the actual lookup and assign the appropriate subcategory value, with bounds checking; %LOCAL i token; %LET i = 1; %LET token = %SCAN(&cats,&i); %DO %WHILE (%LENGTH(&token)); IF &minarray LE id LE &maxarray THEN category_&token.[lookups[&i,id]] = 1; %LET i = %EVAL(&i + 1); %LET token = %SCAN(&cats,&i); %END; %MEND;

%MACRO droplist(cats); %* drop unneeded variables ; %LOCAL i token; %LET i = 1; %LET token = %SCAN(&cats,&i); %DO %WHILE (%LENGTH(&token)); IF &minarray LE id LE &maxarray THEN category_&token._0 %LET i = %EVAL(&i + 1); %LET token = %SCAN(&cats,&i); %END; %MEND;

DATA old; ARRAY lookups[&catcount,&minarray:&maxarray] _TEMPORARY_ (%EVAL (&catcount*(&maxarray - &minarray + 1))*0); %* initialize all values to zero ; %arraylist(&catlist,&arraysize); SET new; eof = 0; IF (_N_ EQ 1) THEN DO WHILE (NOT eof); SET orcldata.cat_table END=eof; SELECT (cat); %indexcalc(&catlist) OTHERWISE ABORT ABEND; %* table must have been edited between PROC SQL execution and now; END; lookups(ival,id) = subcat; END; DO i = 1 TO 30; %arrayassign(&catlist) END; DROP eof ival i %droplist(&catlist); RUN;

This transmission may contain information that is privileged, confidential and/or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of the information contained herein (including any reliance thereon) is STRICTLY PROHIBITED. If you received this transmission in error, please immediately contact the sender and destroy the material in its entirety, whether in electronic or hard copy format. Thank you.


Back to: Top of message | Previous page | Main SAS-L page