LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (May 2000, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 26 May 2000 09:59:14 -0700
Reply-To:     Cassell.David@EPAMAIL.EPA.GOV
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "David L. Cassell" <Cassell.David@EPAMAIL.EPA.GOV>
Subject:      Re: problem sorting array
Comments: To: bolvandubina@netscape.net
Content-type: text/plain; charset=us-ascii

Bolvan,

Richard Graham has already given you a nice explanation for why the given algorithm takes 10,000 times longer on an array 100 times larger. This is what we in the biz call an O(N**2) algorithm. It takes an amount of time proportional to the square of the array size.

I'm sorry, but this is the *slowest* way to sort your array. Your SAS guru may know SAS, but may be a little inexperienced when it comes to comp-sci algorithms. The given code should even do worse than the bubble-sort, which is notoriously slow on large unsorted datasets. At least bubble-sort looks more like N**2/2 instead of N**2.

If you have no prior information about the data in your array, then the best you can do is a sort which takes roughly N*logN time. An implementation of heapsort or quicksort or mergesort will do this for you. If we assume the computational features implicit in Richard Graham's post, then for your 100,000 number array you would need on the order of NlogN = 1.15E6 iterations, which would run in about .82 seconds [if 1 million iterations do indeed run in .71 seconds on your system]. That might be all the time-savings you need.

If you have additional information on the data, such the internal structure of the keys, then you can actually do better than NlogN. The radix sorts do this.. but are not generally applicable to sorting problems.

And there is one additional approach that you might consider. You have the array in a data step. Keep the variables. Transpose them. Use PROC SORT. Transpose back. Read them back into an array. This ought to be slower than doing an optimal sort in RAM, but it may still be a lot better than using the solution you were trying.

Paul Dorfman will probably have an even faster approach in mind.

David -- David Cassell, OAO cassell@mail.cor.epa.gov Senior computing specialist mathematical statistician


Back to: Top of message | Previous page | Main SAS-L page