LISTSERV at the University of Georgia
Date:         Wed, 31 Jul 2002 16:55:36 -0400
Reply-To:     "Dorfman, Paul" <Paul.Dorfman@BCBSFL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Dorfman, Paul" <Paul.Dorfman@BCBSFL.COM>
Subject:      Re: Consolidating categorical data
Comments: To: "sophe88@yahoo.com" <sophe88@yahoo.com>
Content-Type: text/plain; charset=iso-8859-1

> From: paula D [mailto:sophe@USA.NET] Quoth paula D [sophe88@yahoo.com]:

> 2. I tried Ed's recipe and Paul Dorfman's expanded version. Paul's
> version, as I observed during the past at this forum, is always more
> thoughtful and thorough. This case is no exception. Special salute to
> the chief.

Paula,

Within the scope of the problem as you presented it, Ed Heaton's version is not in the least less thoughtful or thorough than mine. In fact, it is more thoughtful, for it takes advantage of the known range of the data (integers falling within a limited range) to address the problem in the most computer-efficient manner. It would never have occurred to my friend Ed (a top-notch programmer in general, and SAS programmer at that, whose thoroughness in programming matters far exceeds mine) to choose the approach he offered if he had known in advance how many digits the tokens in question really contained.

As to my version, it is not an expanded version of Ed's code, but rather an approach taken from a diametrically opposite standpoint. His code is based on distribution, i.e. spreading values as far apart as possible, ideally finding a unique place for each possible value - which is only possible when a key falls within a limited range. I routinely take Ed's path, perhaps more often than any other SAS programmer, to which anyone aware of key-indexing, bitmapping, or hashing could testify. In fact, after I had read your problem description, I spent some time thinking about whether the number of possible token permutations within the limits you originally specified would permit the use of some sort of direct-addressed table to track "combinatorially duplicate" token combinations. After I saw Ed's solution, I decided that I had nothing to add in that direction. Trying to beat the elegance of Ed's code seemed only to make such attempts more maladroit.
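To illustrate what "distribution" means here, the following is a minimal Python sketch of key-indexing, not Ed's actual SAS code. It assumes the keys are integers known to lie between `lo` and `hi`; the function name and parameters are hypothetical. Each key is used directly as an address into a table, so no key is ever compared with another.

```python
def first_occurrences(keys, lo, hi):
    """Drop repeated keys using a direct-addressed (key-indexed) table.

    Assumes every key is an integer with lo <= key <= hi.
    """
    seen = bytearray(hi - lo + 1)  # one slot per possible key value
    unique = []
    for key in keys:
        slot = key - lo            # the key itself is the address
        if not seen[slot]:         # a single table lookup, no key-to-key comparison
            seen[slot] = 1
            unique.append(key)
    return unique


# Usage: keys known to fall in the range 1..9
print(first_occurrences([5, 3, 5, 9, 3], 1, 9))  # [5, 3, 9]
```

The table needs one slot for every possible key value, which is why the technique is practical only when the key range is limited.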

So, ironically, it was because Ed had posted his proposal that I decided to grab the problem from the other end of the algorithmic spectrum, i.e. to use key comparisons, which I usually try to avoid, since comparison-based schemes in sorting and searching (and your problem is kind of a mix of both) usually run much slower than ones based on distribution. Their advantage lies in greater generality. It just so happened - and I had not foreseen it - that with your monster keys, Ed's formula would produce values far beyond the largest integer a SAS number can hold exactly, thus effectively making a comparison-based approach the only practical way to go.
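Both points can be sketched in Python under stated assumptions; this is neither Ed's formula nor the code I posted. The first part assumes a platform where a SAS number is stored as an 8-byte IEEE 754 double, in which case integers are exact only up to 2**53 (a Python float has the same representation). The second part is a generic comparison-based scheme with a hypothetical function name: put each record's tokens in a canonical order, sort the records, and compare neighbours.

```python
# Part 1: beyond 2**53 a double can no longer tell adjacent integers apart.
LIMIT = 2 ** 53
print(float(LIMIT) == float(LIMIT + 1))  # True: two different keys collide


# Part 2: comparison-based detection of "combinatorially duplicate" records.
def dedup_by_comparison(records):
    """Return one copy of each token combination, ignoring token order."""
    # Canonical form: tokens sorted within each record, then records sorted.
    canonical = sorted(tuple(sorted(rec)) for rec in records)
    unique = []
    for rec in canonical:
        # After sorting, duplicates are adjacent, so one comparison suffices.
        if not unique or unique[-1] != rec:
            unique.append(rec)
    return unique


records = [(3, 1, 2), (2, 3, 1), (7, 7, 1), (1, 7, 7), (4, 5, 6)]
print(dedup_by_comparison(records))  # [(1, 2, 3), (1, 7, 7), (4, 5, 6)]
```

The comparison-based version never folds the tokens into a single number, so the size of the individual tokens does not matter; the price is the sorting.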

Kind regards,
==================
Paul M. Dorfman
Jacksonville, FL
==================


