Date: Tue, 27 Oct 1998 08:32:39 -0500
Reply-To: "Fehd, Ronald J." <rjf2@CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: "Fehd, Ronald J." <rjf2@CDC.GOV>
Subject: Re: Sorting
/From: King, Raymond [SMTP:raymondk@MOCR.OAPI.COM]
/Does anyone know what algorithm is used as the default with PROC SORT.
No, but you can check out http://www.syncsort.com/ for another good
/The recent discussion of SORTing efficiency jump-started my curiosity.
/Dorfman's 'Insertion Sort' solution to a recent question was well received
/and proved to be "three times as fast" as other methods. Yet, Insertion
/Sort is on the Order of N**2 in complexity, and there are more efficient
Yeah, but, ... :-q
After having spent a week reading and rereading the internal sorting chapter
of Knuth's Vol 3,
there are some other good sorting algorithms and I got three running: two
counting algorithms, and the insertion sort. I may write a paper on this,
next year. ;-)
The obvious best parts of the insertion sort are:
A. it sorts in place, meaning you have little overhead except the temp
B. you can read and comprehend the algorithm ... maybe that ought to be the
first item! LOL
/It would be interesting to see how much better things can get using an
/algorithm that's on the Order of N complexity.
Don't hold your breath on that one unless N=2. :-\
Bit-flippers -- and I use the term reverently, because they write in
machine-language or assembler --
have been working on this problem since the 1930's. The best algorithms that
Knuth discusses are on the order of N log N. We're talking programming
complexity and overhead now, in order to cut down on the number of
comparisons and moves. And of course, our best algorithm will handle
character as well as numeric variables.
While I have the floor: I was thinking further of the <sorting 3000 vars,
many empty> problem
and was wondering if a better solution might not be:
A. different data structure:
A.1 separate variables for vendors
A.2 a linked list which would retain the original data structure,
but add overhead, and be simpler to update.
B. an insertion algorithm that runs with the update of each day's data.
Ron Fehd the macro maven CDC Atlanta GA