Date: Wed, 16 Dec 1998 21:39:06 +0000
Reply-To: Peter Crawford <Peter@CRAWFORDSOFTWARE.DEMON.CO.UK>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: Peter Crawford <Peter@CRAWFORDSOFTWARE.DEMON.CO.UK>
Subject: Re: XMAS SASTip: Quick Table Lookup by Hashing
In-Reply-To: <913776318.1123129.0@vm121.akh-wien.ac.at>
In article <913776318.1123129.0@vm121.akh-wien.ac.at>,
pdorfma@FL6612MAILEX4.UCS.ATT.COM writes
>Dear SAS-Lers,
>
>It is about time to exchange SAS gifts. Here is mine to you.
>
>Consider the following (toooo very common) problem:
>
>One file, SMALL, contains a variable SKEY. Another file, LARGE, contains a
>variable LKEY and maybe other variables, for instance, SMTHELSE. Within the
>limits of SAS, what is the most efficient way to match SMALL and LARGE by
>SKEY and LKEY?
>
>For certainty, assume that the number of records in LARGE is &N_LARGE =
>10,000,000 and that the number of records in SMALL, &N_SMALL, may vary from
>1,000 to 2,000,000. Let us also limit the case to KEYs being non-negative
(snip)
>
>Conclusion: Fifty or so lines of DATA step code seem like a pretty cheap
>price for being able to subset 10 million records by 2 million in about a
>minute, in all.
>
>
>Happy Holiday, everyone!
>
>Kind regards,
>Paul
>
>++++++++++++++++++++++++++++++++
>Paul M. Dorfman
>Citibank Universal Card Services
>Decision Support Systems
>Jacksonville, FL
>++++++++++++++++++++++++++++++++
Thanks Paul, that's a serious piece of work.
It is interesting to see your approach of avoiding naming clashes with a
random name generator. Others have suggested this may be a weakness.
In this area, would you consider imposing a standard approach that tries
to avoid the "Sod's law" risk:
    "if something can go wrong, it will,
     and at the worst possible time"?
An alternative design for globally unique naming, avoiding the "Sod's law"
risk: generate a global macro variable to act as a pool counter, providing
the next free number in the global name space.
Rules for using the global name pool counter macro variable:
  1  If the name does not exist already, create it as a global with
     value 2, and use 1 as your returned value.
  2  If the name exists and is numeric, return its value and add 1 to
     the pool counter.
  3  If the name exists and is not numeric, take the next fall-back
     substitute name and apply rules 1 and 2 to it.
  4  Limit the number of fall-back substitutes; once they run out
     ==> shoot the cause.
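The four rules can be sketched as a function-style macro. This is a minimal sketch under stated assumptions, not an agreed standard: it relies on %SYMEXIST (SAS 9 and later, so not available in 1998 releases) and the %DATATYP autocall macro, the macro name NEXTID and the candidate names _NPOOL_, _NPOOL2_ and _NPOOL3_ are hypothetical, and rule 4 is reduced to an ERROR message in the log.

```sas
/* Hypothetical sketch of the pool-counter rules (assumes SAS 9 or later). */
/* NEXTID and the candidate names below are illustrative, not a standard.  */
%macro nextid(candidates=_NPOOL_ _NPOOL2_ _NPOOL3_);
  %local _nx_i _nx_nm _nx_val;
  %let _nx_val = ;
  %let _nx_i   = 1;
  %let _nx_nm  = %scan(&candidates, 1, %str( ));
  %do %while (%length(&_nx_nm) > 0 and %length(&_nx_val) = 0);
    %if not %symexist(&_nx_nm) %then %do;
      %* Rule 1: create the counter as a global with value 2, return 1;
      %global &_nx_nm;
      %let &_nx_nm = 2;
      %let _nx_val = 1;
    %end;
    %else %if %datatyp(&&&_nx_nm) = NUMERIC %then %do;
      %* Rule 2: return the current value, then add 1 to the counter;
      %let _nx_val = &&&_nx_nm;
      %let &_nx_nm = %sysevalf(&&&_nx_nm + 1);
    %end;
    %* Rule 3: otherwise move on to the next fall-back name;
    %let _nx_i  = %eval(&_nx_i + 1);
    %let _nx_nm = %scan(&candidates, &_nx_i, %str( ));
  %end;
  %* Rule 4: fall-backs exhausted, so report the clash and return nothing;
  %if %length(&_nx_val) = 0 %then
    %put ERROR: NEXTID found no usable counter among: &candidates;
  &_nx_val
%mend nextid;
```

For example, %let tmp = work._tmp%nextid(); would give work._tmp1 on the first call in a session and work._tmp2 on the second. Note that %SYMEXIST also finds a local variable of the same name in a calling macro, so a stricter version might test the global table only.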
How would you name the "global name pool counter macro variable"?
Seasonal greetings,
--
Peter Crawford