LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (April 1997, week 1)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 2 Apr 1997 10:21:11 EDT
Reply-To:     whitloi1@WESTATPO.WESTAT.COM
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         Ian Whitlock <whitloi1@WESTATPO.WESTAT.COM>
Subject:      Re[2]: FW: FW: why did this code take 22 hours to run?
Comments: To: RICH <RICH@BOALAN.ATTMAIL.COM>

Subject: Re: FW: FW: why did this code take 22 hours to run? Summary: SAS not humor. Ian Whitlock <whitloi1@westat.com>

The following is the twisted tale started by having Rich Livornese <RICH@BOALAN.ATTMAIL.COM> ask why a SET - BY on an indexed data set took 22 hours?

> 1726 DATA HHC.ERCPI; > 1727 SET HHC.ER; > 1728 BY CPI_KEY; > 1729 IF FIRST.CPI_KEY; > 1730 KEEP CPI_KEY; > 1731 RUN; > > NOTE: The data set HHC.ERCPI has 1345222 observations and 1 variables. > NOTE: The DATA statement used 22 hours 21 minutes 15.77 seconds.

Laurie Flemming suggested using a view <lfleming@actrix.gen.nz> and sorting the view. Then Clinton S Rickards <RickardsCS@AETNA.COM> responded changing Laurie's name and saying that a sort with the KEEP= option was good enough. I took the opportunity (Don Stanley mentioned Laurie in San Diego) to point out to Clinton various misunderstandings. In particular that the Laurie's view was a good idea because the KEEP= option on input to PROC SORT is actually processed on the sorted file.

This does not mean that one does not have to keep the sort variable. As shown by Rich's test

> 68 data aa; > 69 array a1 (10) x1-x10; > 70 do i=10 to 1 by -1; > 71 do j=1 to 10; > 72 a1(j)=i*10; > 73 end; > 74 output; > 75 end; > 76 > 77 > > NOTE: The data set WORK.AA has 10 observations and 12 variables. > NOTE: The DATA statement used 0.2 seconds. > > > 78 proc sort data=aa(keep=x1-x5) > 79 out=bb; > 80 by i; > ERROR: Variable I not found. > 81

To get back to the point, here is a log (SAS 6.11 under Windows 3.11) showing how a view can save time over using the KEEP= option on a big sort.

35 data w1 w2 ( keep = i ) ; 36 retain a1 - a20 'xxxxxxxxxxxxxxxxxxxxxxxxxxx' ; 37 do i = 20000 to 1 by - 1 ; output ; end ; 38 run ;

NOTE: The data set WORK.W1 has 20000 observations and 21 variables. NOTE: The data set WORK.W2 has 20000 observations and 1 variables. NOTE: The DATA statement used 11.52 seconds.

39 40 proc sort data = w1 ( keep = i ) out = t ; 41 by i ; 42 run ;

NOTE: The data set WORK.T has 20000 observations and 1 variables. NOTE: The PROCEDURE SORT used 42.52 seconds.

43 44 proc sort data = w2 out = t ; 45 by i ; 46 run ;

NOTE: The data set WORK.T has 20000 observations and 1 variables. NOTE: The PROCEDURE SORT used 1.69 seconds.

47 48 data w3 / view = w3 ; 49 set w1 ( keep = i ) ; 50 run ;

NOTE: DATA STEP view saved on file WORK.W3. NOTE: The original source statements cannot be retrieved from a stored DATA STEP view nor will a stored DATA STEP view run under a different release of the SAS system or under a different operating system. Please be sure to save the source statements for this DATA STEP view. NOTE: The DATA statement used 0.66 seconds.

51 52 proc sort data = w3 out = t ; 53 by i ; 54 run ;

NOTE: The view WORK.W3.VIEW used 7.84 seconds.

NOTE: The data set WORK.T has 20000 observations and 1 variables. NOTE: The PROCEDURE SORT used 8.4 seconds.

The KEEP= method took 42.52 seconds and the view took 0.66 + 8.4 seconds. That is a reduction in time by almost a factor of 5. Of course the sort of a narrow file wins with 1.69 seconds. The question is what does it cost to create the narrow file.

In a SUGI 22 Coder's Corner paper John Whittington points out that the overhead cost of using a view is sufficiently large that it is often just as fast to make a narrow data set before the sort as it is to use a view. In any case either is very much more efficient than using indexing or sorting with the KEEP= option.

Finally in connection with this question, Judy Loren mentioned a horror story with sorting a view under MVS. Was the OUT= parameter used? (Under Windows one simply gets a message that the data set can not be created since a view with that name already exists.) Has the problem been fixed in more recent releases? How about some testing from some MVS people?

Ian Whitlock


Back to: Top of message | Previous page | Main SAS-L page