Date: Thu, 5 Feb 2004 12:23:37 -0500
Reply-To: Talbot Michael Katz <topkatz@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Talbot Michael Katz <topkatz@MSN.COM>
Subject: Re: SQL speed
Hi.
My serious introduction to SAS programming came BACK IN 1990 when I was
chosen to provide tech support for PROC SYNCSORT in VM/CMS. It was kind of
the Cinderella step-sister to the MVS (now z/OS) version of PROC SYNCSORT,
and doesn't seem to exist anymore (of course, as you CMS users know, SAS is
phasing out of VM entirely after SAS 8.2). When I looked at the SyncSort
web site (which didn't exist back in 1991 ;-), I got the impression that
PROC SYNCSORT is now available only for z/OS. Back in the uncomplicated
days of SAS 5.18, PROC SYNCSORT was given away to SyncSort/CMS customers as
a freebie; SyncSort didn't start to charge for it until they had to
overhaul it thoroughly for SAS 6.
Just to recap a few things already mentioned...
On the SAS side, options SORTNAME, SORTPGM and SORTCUTP control which sort
utility (I keep wanting to say Sort Utility Vehicle ;-) SAS uses for PROC
SORT. SORTPGM has three generic values, BEST | HOST | SAS, or you can use
the utility name itself, if it is a valid operating environment name. If
SORTPGM is BEST or HOST, then SORTNAME tells which host utility to use; the
same rules apply to the SORTNAME value as the use of the utility name in
SORTPGM. IF SORTPGM is BEST, then SORTCUTP allows the user to specify a
size above which to always use the system sort. As Don Stanley explained,
under this scenario, SyncSort is like any other host sort program -- the
data passes from SAS to the sort through a "narrow channel." PROC SYNCSORT
is an entirely different beast -- it works directly with the SAS data set
and "kicks SAS out" until it is finished. As was also noted, there is a
way to invoke PROC SYNCSORT via PROC SORT, so that you don't have to recode
all your sort steps (although I don't remember how that is controlled).
From a computer science standpoint, sorting is an extremely well understood
phenomenon, and nobody -- not even SyncSort -- has a magic sorting
algorithm better than all the rest. SyncSort's advantage comes from a
nitty-gritty understanding of how to optimize the movement of data between
the processor and the various levels of memory / storage available. That
said, on data sets which are small enough that sorting time doesn't really
matter, SyncSort, and even PROC SYNCSORT, may not perform any better than
the SAS sort, or other host sorts, simply because much of the sorting time
is taken up by overhead. SyncSort has more overhead than most sort
utilities because of all the decisions it makes in order to optimize the
sort; by the time the size of the data set becomes an issue, that small
added overhead pays back huge dividends.
In my short time at SyncSort, the issue of sorting through PROC SQL never
arose, so I can't tell you whether the PROC SQL sorting uses the same
engine as PROC SORT, or if it allows for the use of host sort utilities.
From the experiment described by Puddin' (followed by the quote from brand
spankin' new Rock'n'Roll Hall of Famer, Jackson Browne!), it is reasonable
to speculate that PROC SQL does not use the host sort utility the same way
PROC SORT does.
I have the following comment regarding Don Stanley's intriguing suggestion
of letting PROC SYNCSORT sort SQL view output. System sort utilities, like
SyncSort, allow the use of "exit programs." These are programs which do
some processing of the raw data, either on the way into, or on the way out
of, the sort; you can even put different exit programs on both the input
and output side. Exit programs can provide flexibility, filtering, etc.,
but they narrow the channel between the data and the sort. If I recall
correctly, the use of a where clause in an input data set for a PROC (SYNC)
SORT is treated by SyncSort like an exit program; the data has to pass
record by record from SAS to SyncSort, as it would for any host sort. You
still have the normal advantage of using SyncSort as the host sort, but you
lose the special boost you get from PROC SYNCSORT on vanilla sorts. I'm
guessing that this same effect would occur when using SQL view output as
the input to PROC SYNCSORT. I would be curious to compare that method with
the performance obtained from creating a table with PROC SQL, and then
using a vanilla PROC SYNCSORT on that table (but I no longer work in a z/OS
environment, so I won't get to do that myself).
-- TMK --
|