Date: Wed, 31 Jul 2002 16:55:36 -0400
Reply-To: "Dorfman, Paul" <Paul.Dorfman@BCBSFL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Dorfman, Paul" <Paul.Dorfman@BCBSFL.COM>
Subject: Re: Consolidating categorical data
Content-Type: text/plain; charset=iso-8859-1
> From: paula D [mailto:sophe@USA.NET]
Quoth paula D [firstname.lastname@example.org]:
> 2. I tried Ed's recipe and Paul Dorfman's expanded version. Paul's
> version, as I have observed in the past on this forum, is always more
> thoughtful and thorough. This case is no exception. Special salute to
> the chief.
Within the scope of the problem as you presented it, Ed Heaton's version
is not in the least less thoughtful and/or thorough than mine. In fact, it
is more thoughtful, for it takes advantage of the known range of the data
(integers falling within a limited range) to address the problem in the most
computer-efficient manner. It would never have occurred to my friend Ed (a
top-notch programmer in general, and a SAS programmer at that, whose
thoroughness in programming matters far exceeds mine) to choose the approach
he offered if he had known in advance how many digits the tokens in question
contained in reality.
As to my version, it is not an expanded version of Ed's code, but rather an
approach taken from a diametrically opposite standpoint. His code is based
on distribution, i.e. spreading values as far apart as possible, ideally
finding a unique place for each possible value - which is only possible when
a key is in a limited range. I routinely take Ed's path, perhaps more
often than any other SAS programmer, to which anyone aware of key-indexing,
bitmapping, or hashing could testify. In fact, after I had read your problem
description, I spent some time thinking whether the number of possible token
permutations within the limits you originally specified would permit using
some sort of a direct-addressed table to track "combinatorially duplicate"
token combinations. After I saw Ed's solution, I decided that I had nothing
to add in that direction. Trying to beat the elegance of Ed's code seemed
only to make such attempts more maladroit.
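
To make the "distribution" idea concrete, here is a minimal sketch of
key-indexing. It is written in Python purely for illustration; it is not
Ed's code and not SAS. The function name and the sample keys are
hypothetical. The assumption is that every key is a non-negative integer
no larger than a known max_key, so each possible value can own one slot
in a direct-addressed table.

```python
def duplicates_by_key_index(keys, max_key):
    """Return keys seen more than once, using a direct-addressed table."""
    # One slot per possible key value: only feasible for a limited key range.
    seen = [False] * (max_key + 1)
    dups = []
    for k in keys:
        if seen[k]:
            dups.append(k)      # slot already occupied: duplicate key
        else:
            seen[k] = True      # first sighting: mark the slot
    return dups

# Hypothetical sample data; no key comparisons are performed anywhere.
result = duplicates_by_key_index([3, 7, 3, 0, 7, 9], max_key=9)
```

Each key is handled in constant time, but the table needs max_key + 1
slots, which is why the scheme only works when the key range is limited.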
So, ironically, it was due to Ed having posted his proposal that I decided
to grab it from the other end of the algorithmic spectrum, i.e. use key
comparisons, which I usually try to avoid, since comparison-based schemes in
sorting and searching (and your problem is a kind of mix of both) usually run
much slower than ones based on distribution. Their advantage lies in
greater generality. It just so happened - and I had not foreseen it - that
with your monster keys, Ed's formula would far exceed the maximum integer
precision of a SAS number, thus effectively making a comparison-based
approach the only practical way to go.
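
A rough sketch of both points, again in Python for illustration only
(not SAS, and not the code either of us posted). It assumes numbers are
stored as IEEE 754 doubles, which hold consecutive integers exactly only
up to 2**53, and it uses hypothetical 12-digit tokens and hypothetical
function names. The comparison-based part puts each record into a
canonical order and then compares neighbors after a sort.

```python
# Part 1: integers beyond 2**53 are not all exactly representable as doubles.
LIMIT = 2 ** 53
collides = float(LIMIT + 1) == float(LIMIT)   # two distinct integers, one double

# Part 2: comparison-based detection of "combinatorial duplicates".
def canonical(tokens):
    """Order-independent form of a record: its tokens in sorted order."""
    return tuple(sorted(tokens))

def combinatorial_duplicates(records):
    """Return records whose token set repeats an earlier record's."""
    ordered = sorted(records, key=canonical)   # comparison sort on canonical form
    dups = []
    for prev, cur in zip(ordered, ordered[1:]):
        if canonical(prev) == canonical(cur):  # equal neighbors are duplicates
            dups.append(cur)
    return dups

# Hypothetical long tokens, kept as strings so no precision is lost.
records = [
    ("123456789012", "987654321098"),
    ("555", "444"),
    ("987654321098", "123456789012"),
]
found = combinatorial_duplicates(records)
```

Since the tokens are compared as character strings, their length does not
matter, at the cost of a sort rather than a single pass over a table.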
Paul M. Dorfman