LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (July 2010, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 9 Jul 2010 13:36:25 -0400
Reply-To:     Paul Dorfman <sashole@BELLSOUTH.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Paul Dorfman <sashole@BELLSOUTH.NET>
Subject:      Re: Where Clause on Sorted Dataset (was Re: The expensive INTNX
              function (was: Pass date values))
Comments: To: "Kevin A. Myers" <KevinMyers@AUSTIN.RR.COM>

On Thu, 8 Jul 2010 12:23:10 -0500, Kevin Myers <KevinMyers@AUSTIN.RR.COM> wrote:

>In the case of a sorted data set, I would *hope* that SAS would search for >the first matching record by using a *binary*, rather than sequential, >search. If that isn't the case, then someone needs to suggest using a >binary search to locate the first matching observation as a performance >enhancement.

Kevin,

One problem with this is that the only indication that a data set *might* be sorted is SORTEDBY=, but its being set in no way guarantees that the file is actually physically ordered by the key(s) listed:

data test (sortedby = key) ; do key = 9, 1, 3, 2, 8, 5 ; output ; end ; run ;

Look at the table TEST's properties, and the metadata plainly tells you the data set is sorted by KEY. (It reminds me of the old Bob Virgile's puzzle, where a file similar to the above was explicitly sorted and the ensuing step whose BY statement relied on the sort order would fail because the SORTEDBY= fooled SORT used without FORCE option into doing nothing:

8 proc sort data = test ; 9 by key ; 10 run ; NOTE: Input data set is already sorted, no sorting done.)

So if SAS were to infer from SORTEDBY= that the file is indeed sorted and therefore "binary search" it, the search algorithm will either (a) enter an infinite loop, in which case you have at least some indication of the trouble, albeit at an ugly expense, or (b) finish perfectly well - in which case it will produce an erroneous result giving no token whatsoever that something wrong has happened and which is much more dangerous.

Of course the underlying software can easily check beforehand if the file is in fact physically in order, but this will be tantamount to a whole pass through the file, thereby defeating the reason for using the binary search in the first place.

The only way out of the SORTEDBY= trap would be for SAS to create a system option banning SORTEDBY from being set by anything but a successfully executed SORT or ORDER BY. Then if SORTEDBY= were set the binary search against the keys listed under it could be used securely.

Kind regards ------------ Paul Dorfman Jax, FL ------------


Back to: Top of message | Previous page | Main SAS-L page