Date: Fri, 9 Jul 2010 13:36:25 -0400
Reply-To: Paul Dorfman <sashole@BELLSOUTH.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paul Dorfman <sashole@BELLSOUTH.NET>
Subject: Re: Where Clause on Sorted Dataset (was Re: The expensive INTNX
function (was: Pass date values))
On Thu, 8 Jul 2010 12:23:10 -0500, Kevin Myers <KevinMyers@AUSTIN.RR.COM>
wrote:
>In the case of a sorted data set, I would *hope* that SAS would search for
>the first matching record by using a *binary*, rather than sequential,
>search. If that isn't the case, then someone needs to suggest using a
>binary search to locate the first matching observation as a performance
>enhancement.
Kevin,
One problem with this is that the only indication that a data set *might* be
sorted is SORTEDBY=, but its being set in no way guarantees that the file is
actually physically ordered by the key(s) listed:
data test (sortedby = key) ;
do key = 9, 1, 3, 2, 8, 5 ;
output ;
end ;
run ;
Look at the table TEST's properties, and the metadata plainly tells you the
data set is sorted by KEY. (It reminds me of the old Bob Virgile's puzzle,
where a file similar to the above was explicitly sorted and the ensuing step
whose BY statement relied on the sort order would fail because the SORTEDBY=
fooled SORT used without FORCE option into doing nothing:
8 proc sort data = test ;
9 by key ;
10 run ;
NOTE: Input data set is already sorted, no sorting done.)
So if SAS were to infer from SORTEDBY= that the file is indeed sorted and
therefore "binary search" it, the search algorithm will either (a) enter an
infinite loop, in which case you have at least some indication of the
trouble, albeit at an ugly expense, or (b) finish perfectly well - in which
case it will produce an erroneous result giving no token whatsoever that
something wrong has happened and which is much more dangerous.
Of course the underlying software can easily check beforehand if the file is
in fact physically in order, but this will be tantamount to a whole pass
through the file, thereby defeating the reason for using the binary search
in the first place.
The only way out of the SORTEDBY= trap would be for SAS to create a system
option banning SORTEDBY from being set by anything but a successfully
executed SORT or ORDER BY. Then if SORTEDBY= were set the binary search
against the keys listed under it could be used securely.
Kind regards
------------
Paul Dorfman
Jax, FL
------------