Date: Tue, 17 Jun 1997 15:21:05 +0100
Reply-To: John Whittington <johnw@MAG-NET.CO.UK>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: John Whittington <johnw@MAG-NET.CO.UK>
Subject: Re: SAS Trap: SORT NODUPKEY -- non-key variable value select
On Mon, 16 Jun 1997, "Karsten M. Self" <kmself@IX.NETCOM.COM> wrote:
>The reason I recommend SQL over a sort with a KEEP or DROP list (or a
>DATA step with BY and FIRST. processing) is the peculiar behavior of
>SORT with dropped variables, noted most particularly by John
>Whittington: the dropped variables seem not to be excluded until
>*after* they've been processed by SORT. Most unappealing. I've not
>tested whether this behavior has changed recently (e.g. in v6.12).
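No, I'm afraid this 'feature' is still alive and well in v6.12; I've just
re-run my SUGI 22 code on 6.12 (with the 500-variable x 10,000-observation
test dataset). In this run, KEEP on the input dataset was actually slightly
slower than KEEP on the output dataset (1 minute 3.16 seconds vs 1 minute
1.23 seconds), which I don't quite understand. Both, however, still take
about 140 times longer than sorting a dataset that already holds only the
one variable (0.44 seconds), and over four times longer than building a
single-variable dataset or data view and sorting that (about 13 seconds in
all):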
13 proc sort data = test out = done ;
14 by x1 ;
15 run ;
NOTE: The data set WORK.DONE has 10000 observations and 500 variables.
NOTE: The PROCEDURE SORT used 1 minute 22.67 seconds.
17 proc sort data = test out = done (keep = x1) ;
18 by x1 ;
19 run ;
NOTE: The data set WORK.DONE has 10000 observations and 1 variables.
NOTE: The PROCEDURE SORT used 1 minute 1.23 seconds.
21 proc sort data = test (keep = x1) out = done ;
22 by x1 ;
23 run ;
NOTE: The data set WORK.DONE has 10000 observations and 1 variables.
NOTE: The PROCEDURE SORT used 1 minute 3.16 seconds.
25 data narrow (keep = x1) ;
26 set test ;
27 run ;
NOTE: The data set WORK.NARROW has 10000 observations and 1 variables.
NOTE: The DATA statement used 12.96 seconds.
28 proc sort data = narrow out = done ;
29 by x1 ;
30 run ;
NOTE: The data set WORK.DONE has 10000 observations and 1 variables.
NOTE: The PROCEDURE SORT used 0.44 seconds.
32 data narrow2 (keep = x1) / view = narrow2 ;
33 set test ;
34 run ;
NOTE: DATA STEP view saved on file WORK.NARROW2.
NOTE: The DATA statement used 0.55 seconds.
35 proc sort data = narrow2 out = done ;
36 by x1 ;
37 run ;
NOTE: The view WORK.NARROW2.VIEW used 12.8 seconds.
NOTE: The data set WORK.DONE has 10000 observations and 1 variables.
NOTE: The PROCEDURE SORT used 13.24 seconds.
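Since the subject here is NODUPKEY, the same single-variable view can feed a NODUPKEY sort. A minimal sketch follows, again assuming a hypothetical stand-in for TEST; because the view carries only X1, there are no non-key variables whose values NODUPKEY could pick arbitrarily. This combination has not been timed.

```sas
/* Hypothetical stand-in for TEST (random values, not the originals). */
data test ;
  array x {500} x1-x500 ;
  do obs = 1 to 10000 ;
    do j = 1 to 500 ;
      x{j} = ceil(ranuni(12345) * 1000) ;
    end ;
    output ;
  end ;
  drop obs j ;
run ;

/* Same single-variable view as in the log above. */
data narrow2 (keep = x1) / view = narrow2 ;
  set test ;
run ;

/* SORT only ever sees X1; NODUPKEY keeps one observation per X1. */
proc sort data = narrow2 out = done nodupkey ;
  by x1 ;
run ;
```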
John
-----------------------------------------------------------
Dr John Whittington, Voice: +44 1296 730225
Mediscience Services Fax: +44 1296 738893
Twyford Manor, Twyford, E-mail: johnw@mag-net.co.uk
Buckingham MK18 4EL, UK CompuServe: 100517,3677
-----------------------------------------------------------