Date: Thu, 17 Dec 1998 19:01:26 -0500
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
Subject: Re: XMAS SASTip: Quick Table Lookup by Hashing
Content-Type: text/plain; charset="iso-8859-1"
Peter Crawford, in part, wrote:
>Interesting to see your approach, to avoid naming clashes with a random
>name generator. Others have suggested this may be a weakness.
>In this area, would you consider imposing a standard approach which
>tries to avoid the "sod's law" risk......
> "if something can go wrong, it will,
> and at the worst possible time....."
>alternative design for global unique naming - to avoid "sod's law" risk
> Generate a global macro variable to act as the pool counter
> providing the next free number in global name space.
>rules for using the global name pool counter macro variable
>rules: 1 if the name doesn't exist already, create it as a global with
> value 2, and use 1 as your returned value
> 2 if the name exists and is numeric, return that value & add 1
> to the pool counter
> 3 if the name exists and is not numeric then use the next fall-
> back substitute and apply through rules 1 and 2
> 4 rules to limit fall-back substitutes ==> shoot the cause
>How would you name the "global name pool counter macro variable" ?
I find the method of "random macro aliases" less fuzzy. In the posting about
hashing, I resorted to the "lazy Susan" approach wishing not to digress from
the main subject. Normally, though, I "randomly macroize" names for DATA
step variables, temporary arrays and destination labels (all of which must
be "localized") used in a macro-based DATA step routine by making each of
the eight available characters random. This way, we have two radices: Radix
27 for the first character (underscore and all letters) and radix 37 for the
remaining seven, which yields a pool of 2563160682591 distinct names to
choose from. Hence, the probability that some name selected from the pool
will match an already existing 8-byte name purely by chance is about
3.9E-13; and for a name containing less than 8 characters, it is strictly 0.
Of course, this technique's implementation is a bit more involved than the
"lazy Susan". I have written a little macro, %RANVAR(), that accepts
blank-separated lists of names of different types to randomize:
%LET STRFIRST = _ABCDEFGHIJKLMOPQRSTUVWXYZ;
%LET STRALL = 0123456789&STRFIRST;
%LET PARMNUM = 1;
%IF &DROP %THEN DROP;
%LET VNAME = %SCAN(&PLIST,&PARMNUM);
%DO %WHILE(&VNAME NE);
%LET FLN =
%LET FL = %SUBSTR(&STRFIRST,&FLN,1);
%LET ALL = ;
%DO N=1 %TO 7;
%LET ALL&N =
%LET ALL = &ALL%SUBSTR(
%LET &VNAME = &FL&ALL;
%IF &DROP %THEN &FL&ALL;
%PUT &FL&ALL | &VNAME;
%LET PARMNUM = %EVAL(&PARMNUM+1);
%LET VNAME = %SCAN(&PLIST,&PARMNUM);
%IF &DROP %THEN %STR(;);
%IF &VARNAMES NE %THEN %DO;
%PUT ALIAS | VARIABLE;
%IF &ARRNAMES NE %THEN %DO;
%PUT ALIAS | ARRAY;
%IF &LBLNAMES NE %THEN %DO;
%PUT ALIAS | LABEL;
The macro is passed the names to "localize" and should be invoked before the
very first macro reference to any of the listed names.
After the DATA step where the macro routine (in this case, it was %HLOAD and
%HSEARCH) has finished, the %RANVAR produces a cross-reference in the log
listing the "normal" names and their random aliases. A more detailed
discussion of the method is published in the SESUG'98 Proceedings. If you
have no access to this publication but are still interested, please drop me
a note and I will send you an electronic copy of the paper.
Now, for long names in the V7, the %RANVAR() would produce the mind-boggling
1.11E50 combinations. However, these complications are not really
necessary, and the %substr(%sysfunc()...) method will more than suffice,
plus it takes less time to create names. With 31 bytes to spare, the "lazy
Susan" will result in the pool of 1E31. How big is the chance of hitting
something purely by chance in this pool, even if we already have, say, a
thousand names in the application system? If you reply "if something can go
wrong, it will, and at the worst possible time.....", I would ask, then,
would you expect, intuitively, for 10000 molecules of air chaotically moving
inside a 4-inch box to assemble in one half of the box spontaneously?
Remember, theoretically, it can "go wrong", too, but will you EVER observe
such a configuration?
Apparently, SAS programming poses very interesting questions quite beyond
its realm! One may not always realize that in data processing, we rely on
probability a lot without suffering consequences. For instance, quicksort's
worst case scenario is sorting in O(N**2) time, no matter what measures may
have been taken to prevent it. But the same measures guarantee that this
behavior will hardly ever occur, and the algorithm have been applied with
confidence in a great variety of applications without any noticeable harm.
Happy Holidays to you!
Paul M. Dorfman
Citibank UCS Decision Support Systems