Date: Mon, 10 Apr 2006 13:49:01 -0400
Reply-To: "Dorfman, Paul" <paul.dorfman@FCSO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Dorfman, Paul" <paul.dorfman@FCSO.COM>
Subject: Re: Efficiency of Proc sort with out =
On Mon, 10 Apr 2006 10:21:38 -0700, Jack Hamilton <jfh@STANFORDALUMNI.ORG>
wrote (in part):
>That said, PROC SORT is one of the most heavily researched and
>optimized parts of the SAS system,
Jack,
Indeed. In fact, so much so that it appears to read from a file and sort
faster than the new CALL SORTN routine sorts an equivalent amount of data
in memory. Consider (9.1.3 under AIX):
25 data a ;
26 do key = 1e6 to 1 by -1 ;
27 output ;
28 end ;
29 run ;
30
31 proc sort data = a out = _null_ ;
32 by key ;
33 run ;
NOTE: There were 1000000 observations read from the data set USER.A.
NOTE: SAS threaded sort was used.
NOTE: PROCEDURE SORT used (Total process time):
real time 0.59 seconds
34
35 data _null_ ;
36 array key [1000000] ;
37 do _n_ = 1 to 1e5 ;
38 key [_n_] = 1e5 - _n_ + 1 ;
39 end ;
40 run ;
NOTE: DATA statement used (Total process time):
real time 8.72 seconds
41
42 data _null_ ;
43 array key [1000000] ;
44 do _n_ = 1 to 1e5 ;
45 key [_n_] = 1e5 - _n_ + 1 ;
46 end ;
47 call sortn (of key[*]) ;
48 run ;
NOTE: The SORTN function or routine is experimental in release 9.1.
NOTE: DATA statement used (Total process time):
real time 9.55 seconds
Subtracting the time taken by the first DATA _NULL_ step (8.72 seconds)
from that of the second (9.55 seconds) yields 0.83 seconds for the CALL
SORTN itself, which is definitely no faster than the 0.59 seconds taken
by the PROC SORT.
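A rough sketch of another way to isolate the cost of the sort call: time
it inside a single step instead of differencing two step timings. This
assumes DATETIME() returns fractional seconds on the platform. The names
t0 and elapsed are made up for illustration, and the loop fills all
dim(key) elements, which is my assumption about the intent rather than
what the log above actually ran.

```sas
data _null_ ;
   array key [1000000] ;
   /* fill the array in descending order (assumed intent) */
   do _n_ = 1 to dim (key) ;
      key [_n_] = dim (key) - _n_ + 1 ;
   end ;
   t0 = datetime () ;                 /* clock just before the sort */
   call sortn (of key[*]) ;           /* sort the array in place    */
   elapsed = datetime () - t0 ;       /* seconds spent in the call  */
   put 'CALL SORTN elapsed (seconds): ' elapsed 8.2 ;
run ;
```

This keeps the array allocation and fill time out of the measurement, so
the number written to the log should be comparable to the PROC SORT real
time without any subtraction.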
>and it's hard to predict what might trigger an optimization that results
>in a noticeable change in run time.
The only thing that can do that is PROC SYNCSORT, which runs circles
around PROC SORT (2 to 5 times faster, plus extra SUM functionality).
However, it is available on the mainframe only.
Kind regards
------------
Paul Dorfman
Jax, FL
------------