LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 1998, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 17 Dec 1998 19:01:26 -0500
Reply-To:     pdorfma@FL6612MAILEX4.UCS.ATT.COM
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         pdorfma@FL6612MAILEX4.UCS.ATT.COM
Subject:      Re: XMAS SASTip: Quick Table Lookup by Hashing
Content-Type: text/plain; charset="iso-8859-1"

Peter Crawford, in part, wrote:

>Interesting to see your approach, to avoid naming clashes with a random >name generator. Others have suggested this may be a weakness. >In this area, would you consider imposing a standard approach which >tries to avoid the "sod's law" risk...... > "if something can go wrong, it will, > and at the worst possible time....." > >alternative design for global unique naming - to avoid "sod's law" risk > Generate a global macro variable to act as the pool counter > providing the next free number in global name space. > >rules for using the global name pool counter macro variable > >rules: 1 if the name doesn't exist already, create it as a global with > value 2, and use 1 as your returned value > 2 if the name exists and is numeric, return that value & add 1 > to the pool counter > 3 if the name exists and is not numeric then use the next fall- > back substitute and apply through rules 1 and 2 > 4 rules to limit fall-back substitutes ==> shoot the cause > >How would you name the "global name pool counter macro variable" ?


I find the method of "random macro aliases" less fuzzy. In the posting about hashing, I resorted to the "lazy Susan" approach wishing not to digress from the main subject. Normally, though, I "randomly macroize" names for DATA step variables, temporary arrays and destination labels (all of which must be "localized") used in a macro-based DATA step routine by making each of the eight available characters random. This way, we have two radices: Radix 27 for the first character (underscore and all letters) and radix 37 for the remaining seven, which yields a pool of 2563160682591 distinct names to choose from. Hence, the probability that some name selected from the pool will match an already existing 8-byte name purely by chance is about 3.9E-13; and for a name containing less than 8 characters, it is strictly 0. Of course, this technique's implementation is a bit more involved than the "lazy Susan". I have written a little macro, %RANVAR(), that accepts blank-separated lists of names of different types to randomize:


The macro is passed the names to "localize" and should be invoked before the very first macro reference to any of the listed names. After the DATA step where the macro routine (in this case, it was %HLOAD and %HSEARCH) has finished, the %RANVAR produces a cross-reference in the log listing the "normal" names and their random aliases. A more detailed discussion of the method is published in the SESUG'98 Proceedings. If you have no access to this publication but are still interested, please drop me a note and I will send you an electronic copy of the paper.

Now, for long names in the V7, the %RANVAR() would produce the mind-boggling 1.11E50 combinations. However, these complications are not really necessary, and the %substr(%sysfunc()...) method will more than suffice, plus it takes less time to create names. With 31 bytes to spare, the "lazy Susan" will result in the pool of 1E31. How big is the chance of hitting something purely by chance in this pool, even if we already have, say, a thousand names in the application system? If you reply "if something can go wrong, it will, and at the worst possible time.....", I would ask, then, would you expect, intuitively, for 10000 molecules of air chaotically moving inside a 4-inch box to assemble in one half of the box spontaneously? Remember, theoretically, it can "go wrong", too, but will you EVER observe such a configuration?

Apparently, SAS programming poses very interesting questions quite beyond its realm! One may not always realize that in data processing, we rely on probability a lot without suffering consequences. For instance, quicksort's worst case scenario is sorting in O(N**2) time, no matter what measures may have been taken to prevent it. But the same measures guarantee that this behavior will hardly ever occur, and the algorithm have been applied with confidence in a great variety of applications without any noticeable harm.

Happy Holidays to you!

Kind regards, Paul

++++++++++++++++++++++++++++++ Paul M. Dorfman Citibank UCS Decision Support Systems Jacksonville, FL ++++++++++++++++++++++++++++++

Back to: Top of message | Previous page | Main SAS-L page