LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (February 2004, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 5 Feb 2004 12:23:37 -0500
Reply-To:     Talbot Michael Katz <topkatz@MSN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Talbot Michael Katz <topkatz@MSN.COM>
Subject:      Re: SQL speed

Hi.

My serious introduction to SAS programming came BACK IN 1990 when I was chosen to provide tech support for PROC SYNCSORT in VM/CMS. It was kind of the Cinderella step-sister to the MVS (now z/OS) version of PROC SYNCSORT, and doesn't seem to exist anymore (of course, as you CMS users know, SAS is phasing out of VM entirely after SAS 8.2). When I looked at the SyncSort web site (which didn't exist back in 1991 ;-), I got the impression that PROC SYNCSORT is now available only for z/OS. Back in the uncomplicated days of SAS 5.18, PROC SYNCSORT was given away to SyncSort/CMS customers as a freebie; SyncSort didn't start to charge for it until they had to overhaul it thoroughly for SAS 6.

Just to recap a few things already mentioned... On the SAS side, options SORTNAME, SORTPGM and SORTCUTP control which sort utility (I keep wanting to say Sort Utility Vehicle ;-) SAS uses for PROC SORT. SORTPGM has three generic values, BEST | HOST | SAS, or you can use the utility name itself, if it is a valid operating environment name. If SORTPGM is BEST or HOST, then SORTNAME tells which host utility to use; the same rules apply to the SORTNAME value as the use of the utility name in SORTPGM. IF SORTPGM is BEST, then SORTCUTP allows the user to specify a size above which to always use the system sort. As Don Stanley explained, under this scenario, SyncSort is like any other host sort program -- the data passes from SAS to the sort through a "narrow channel." PROC SYNCSORT is an entirely different beast -- it works directly with the SAS data set and "kicks SAS out" until it is finished. As was also noted, there is a way to invoke PROC SYNCSORT via PROC SORT, so that you don't have to recode all your sort steps (although I don't remember how that is controlled).

From a computer science standpoint, sorting is an extremely well understood phenomenon, and nobody -- not even SyncSort -- has a magic sorting algorithm better than all the rest. SyncSort's advantage comes from a nitty-gritty understanding of how to optimize the movement of data between the processor and the various levels of memory / storage available. That said, on data sets which are small enough that sorting time doesn't really matter, SyncSort, and even PROC SYNCSORT, may not perform any better than the SAS sort, or other host sorts, simply because much of the sorting time is taken up by overhead. SyncSort has more overhead than most sort utilities because of all the decisions it makes in order to optimize the sort; by the time the size of the data set becomes an issue, that small added overhead pays back huge dividends.

In my short time at SyncSort, the issue of sorting through PROC SQL never arose, so I can't tell you whether the PROC SQL sorting uses the same engine as PROC SORT, or if it allows for the use of host sort utilities. From the experiment described by Puddin' (followed by the quote from brand spankin' new Rock'n'Roll Hall of Famer, Jackson Browne!), it is reasonable to speculate that PROC SQL does not use the host sort utility the same way PROC SORT does.

I have the following comment regarding Don Stanley's intriguing suggestion of letting PROC SYNCSORT sort SQL view output. System sort utilities, like SyncSort, allow the use of "exit programs." These are programs which do some processing of the raw data, either on the way into, or on the way out of, the sort; you can even put different exit programs on both the input and output side. Exit programs can provide flexibility, filtering, etc., but they narrow the channel between the data and the sort. If I recall correctly, the use of a where clause in an input data set for a PROC (SYNC) SORT is treated by SyncSort like an exit program; the data has to pass record by record from SAS to SyncSort, as it would for any host sort. You still have the normal advantage of using SyncSort as the host sort, but you lose the special boost you get from PROC SYNCSORT on vanilla sorts. I'm guessing that this same effect would occur when using SQL view output as the input to PROC SYNCSORT. I would be curious to compare that method with the performance obtained from creating a table with PROC SQL, and then using a vanilla PROC SYNCSORT on that table (but I no longer work in a z/OS environment, so I won't get to do that myself).

-- TMK --


Back to: Top of message | Previous page | Main SAS-L page