Date: Fri, 26 May 2000 09:59:14 -0700
Reply-To: Cassell.David@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <Cassell.David@EPAMAIL.EPA.GOV>
Subject: Re: problem sorting array
Content-type: text/plain; charset=us-ascii
Bolvan,
Richard Graham has already given you a nice explanation for why
the given algorithm takes 10,000 times longer on an array 100
times larger. This is what we in the biz call an O(N**2) algorithm.
It takes an amount of time proportional to the square of the array
size.
I'm sorry, but this is the *slowest* way to sort your array.
Your SAS guru may know SAS, but may be a little inexperienced
when it comes to comp-sci algorithms. The given code should even
do worse than the bubble-sort, which is notoriously slow on large
unsorted datasets. At least bubble-sort looks more like N**2/2
instead of N**2.
If you have no prior information about the data in your array,
then the best you can do is a sort which takes roughly N*logN
time. An implementation of heapsort or quicksort or mergesort
will do this for you. If we assume the computational features
implicit in Richard Graham's post, then for your 100,000 number
array you would need on the order of NlogN = 1.15E6 iterations,
which would run in about .82 seconds [if 1 million iterations
do indeed run in .71 seconds on your system]. That might be all
the time-savings you need.
If you have additional information on the data, such the internal
structure of the keys, then you can actually do better than NlogN.
The radix sorts do this.. but are not generally applicable to
sorting problems.
And there is one additional approach that you might consider.
You have the array in a data step. Keep the variables. Transpose
them. Use PROC SORT. Transpose back. Read them back into an
array. This ought to be slower than doing an optimal sort in
RAM, but it may still be a lot better than using the solution
you were trying.
Paul Dorfman will probably have an even faster approach in mind.
David
--
David Cassell, OAO cassell@mail.cor.epa.gov
Senior computing specialist
mathematical statistician