Date: Wed, 16 Dec 1998 21:39:06 +0000
Reply-To: Peter Crawford <Peter@CRAWFORDSOFTWARE.DEMON.CO.UK>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Peter Crawford <Peter@CRAWFORDSOFTWARE.DEMON.CO.UK>
Subject: Re: XMAS SASTip: Quick Table Lookup by Hashing
In article <firstname.lastname@example.org>, pdorfma@FL6612MAI
>It is about time to exchange SAS gifts. Here is mine to you.
>Consider the following (toooo very common) problem:
>One file, SMALL, contains a variable SKEY. Another file, LARGE, contains a
>variable LKEY and maybe other variables, for instance, SMTHELSE. Within the
>limits of SAS, what is the most efficient way to match SMALL and LARGE by
>SKEY and LKEY?
>For definiteness, assume that the number of records in LARGE is &N_LARGE =
>10,000,000 and that the number of records in SMALL, &N_SMALL, may vary from
>1,000 to 2,000,000. Let us also limit the case to KEYs being non-negative
>Conclusion: Fifty or so lines of DATA step code seem a pretty cheap
>price for being able to subset 10 million records by 2 million in about a
>minute, all told.
>Happy Holiday, everyone!
>Paul M. Dorfman
>Citibank Universal Card Services
>Decision Support Systems
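At heart, the quoted problem is a set-membership test on the keys. The Python sketch below is a rough illustration only, not Paul's DATA step code (which is not quoted here). It models the general array-based hashing idea: SMALL's keys go into a fixed-size table at slot key mod (a prime table size), with linear probing on collisions, and each LARGE record is kept if its key is found. The non-negative integer keys, the 50% load factor and linear probing are all assumptions. In a DATA step the table would typically be a _TEMPORARY_ array.

```python
# Generic sketch of array-based hashing for table lookup (assumptions:
# non-negative integer keys, division-remainder hash, linear probing).

EMPTY = -1  # marks an unused slot; safe because keys are non-negative


def is_prime(k):
    """Trial-division primality test (adequate for a sketch)."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True


def next_prime(n):
    """Smallest prime >= n."""
    while not is_prime(n):
        n += 1
    return n


def build_table(skeys, load=0.5):
    """Load SMALL's keys into an open-addressing table (load must be < 1)."""
    # Sizing the table at about len(skeys) / load slots keeps probes short
    # and guarantees an empty slot exists, so lookups always terminate.
    size = next_prime(int(len(skeys) / load) + 1)
    table = [EMPTY] * size
    for key in skeys:
        slot = key % size                 # division-remainder hash
        while table[slot] not in (EMPTY, key):
            slot = (slot + 1) % size      # linear probe, wrapping around
        table[slot] = key                 # a duplicate key reuses its slot
    return table


def lookup(table, key):
    """Return True if key was loaded into the table."""
    size = len(table)
    slot = key % size
    while table[slot] != EMPTY:
        if table[slot] == key:
            return True
        slot = (slot + 1) % size
    return False


def subset_large(large, skeys):
    """Yield the (lkey, smthelse) records of LARGE whose LKEY is in SMALL."""
    table = build_table(skeys)
    for lkey, smthelse in large:
        if lookup(table, lkey):
            yield lkey, smthelse
```

In everyday Python a built-in set would do this job. The hand-rolled table is there only to show the mechanism a DATA step has to implement for itself with arrays.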
Thanks Paul, that's a serious piece of work.
It is interesting to see your approach of using a random name generator
to avoid naming clashes. Others have suggested this may be a weakness.
In this area, would you consider imposing a standard approach that
tries to avoid the "Sod's law" risk?
    "If something can go wrong, it will,
     and at the worst possible time."
Alternative design for globally unique naming, to avoid the Sod's law risk:
Create a global macro variable to act as the pool counter,
providing the next free number in the global name space.
Rules for using the global name pool counter macro variable:
  1 If the counter variable does not exist already, create it as a
    global with value 2, and use 1 as your returned value.
  2 If it exists and its value is numeric, return that value and add 1
    to the pool counter.
  3 If it exists and its value is not numeric, switch to the next
    fall-back substitute name and apply rules 1 and 2 to that.
  4 To limit the number of fall-back substitutes needed ==> shoot the cause.
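The rules above can be modelled in a few lines of Python. This is a sketch under stated assumptions, not SAS macro code. A dict stands in for the global macro symbol table, with values kept as text as macro variables are. The counter name _NPOOL_ and its fall-back substitutes _NPOOL2_ and _NPOOL3_ are hypothetical, as is the _T prefix for generated names. Rule 4 is read as "give up loudly once the substitutes run out".

```python
import re

# Hypothetical counter name followed by its fall-back substitutes,
# in the order they are tried (rule 3).
POOL_NAMES = ["_NPOOL_", "_NPOOL2_", "_NPOOL3_"]


class NamePoolError(RuntimeError):
    """Raised when every candidate counter name is occupied (rule 4)."""


def next_free_number(symtab, pool_names=POOL_NAMES):
    """Return the next free number from the global name pool.

    symtab models the global macro symbol table: name -> text value.
    """
    for name in pool_names:
        if name not in symtab:
            # Rule 1: create the counter with value 2 and hand out 1.
            symtab[name] = "2"
            return 1
        value = symtab[name]
        if re.fullmatch(r"[0-9]+", value):
            # Rule 2: hand out the current value, then advance the counter.
            symtab[name] = str(int(value) + 1)
            return int(value)
        # Rule 3: something non-numeric owns this name; try the next one.
    # Rule 4: no substitutes left, so stop and go shoot the cause.
    raise NamePoolError(f"all pool counter names are in use: {pool_names}")


def unique_name(symtab, prefix="_T"):
    """Build a name such as _T1, _T2, ... from the pool counter."""
    return f"{prefix}{next_free_number(symtab)}"
```

Starting from an empty table, three calls to unique_name() give _T1, _T2 and _T3. If something then overwrites _NPOOL_ with text, the next call falls back to _NPOOL2_ and returns 1. Because a fall-back counter restarts at 1, the names it produces can collide with ones already handed out. That is one more reason to shoot the cause rather than rely on substitutes.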
How would you name the "global name pool counter macro variable"?