LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2008, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 31 Dec 2008 16:46:14 -0500
Reply-To:     Arthur Tabachneck <art297@NETSCAPE.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Arthur Tabachneck <art297@NETSCAPE.NET>
Subject:      Re: How to each group last record ?
Comments: To: "./ ADD NAME=data _null_;" <iebupdte@GMAIL.COM>

Based on a test I just ran I'd have to say 'use' rather than 'consider', especially if one is on a multiple processor system running 9.1.3. I'd be interested if someone could run the same comparison on a single processor system, as well as on 9.2 (since I hear that hash includes greater optimization in 9.2 as compared to 9.1.3).

I ran two sets of tests, one with 19,000,000 records and the other with 190,000,000 million records. The test file was simply:

77 data class (drop=i); 78 set sashelp.class; 79 do i=1 to 1000000; 80 if i=1000000 then lastone=1; 81 else lastone=0; 82 output; 83 end; 84 run;

For Howard's proposed hash solution I ran:

85 data _null_; 86 set class end=lastobs; 87 if _n_ = 1 then do; 88 declare hash lasts(ordered:'ascending'); 89 result = lasts.defineKey('age'); 90 result = lasts.defineData( 'name' 91 , 'sex' 92 , 'age' 93 , 'height' 94 , 'weight' 95 , 'lastone'); 96 result = lasts.defineDone(); 97 end; 98 result = lasts.replace( key: age 99 , data:name 100 , data:sex 101 , data:age 102 , data:height 103 , data:weight 104 , data:lastone); 105 if lastobs then rc = lasts.output(dataset: 'lasts'); 106 run;

NOTE: The data set WORK.LASTS has 6 observations and 6 variables. NOTE: There were 19000000 observations read from the data set WORK.CLASS. NOTE: DATA statement used (Total process time): real time 32.21 seconds cpu time 32.15 seconds

For datanull's proc summary solution I ran:

107 proc summary nway missing data=class; 108 class age; 109 output out=work.lastObsInAges(drop=_:) 110 idgroup(last out(name sex height weight lastone)=) 111 ; 112 run;

NOTE: There were 19000000 observations read from the data set WORK.CLASS. NOTE: The data set WORK.LASTOBSINAGES has 6 observations and 6 variables. NOTE: PROCEDURE SUMMARY used (Total process time): real time 11.53 seconds cpu time 28.98 seconds

Then, running the same tests, but on a test file that outputed each sashelp.class record 10 million times:

Howard's hash solution: NOTE: The data set WORK.LASTS has 6 observations and 6 variables. NOTE: There were 190000000 observations read from the data set WORK.CLASS. NOTE: DATA statement used (Total process time): real time 5:27.82 cpu time 5:27.18

Datanull's proc summary solution: NOTE: There were 190000000 observations read from the data set WORK.CLASS. NOTE: The data set WORK.LASTOBSINAGES has 6 observations and 6 variables. NOTE: PROCEDURE SUMMARY used (Total process time): real time 1:54.59 cpu time 4:47.84

As you can see, the proc summary solution took less cpu time and less real time in both cases but, more importantly, the differences in real time (on my system) were quite dramatic.

Art --------- On Wed, 31 Dec 2008 14:31:12 -0600, ./ ADD NAME=Data _null_, <iebupdte@GMAIL.COM> wrote:

>On 12/31/08, Howard Schreier <hs AT dc-sug DOT org> ><schreier.junk.mail@gmail.com> wrote: >> Consider a hash solution: > >Consider a PROC SUMMARY solution. > >proc summary nway missing data=sashelp.class; > class age; > output out=work.lastObsInAges(drop=_type_ _freq_) > idgroup(last out(_all_)=) / autoname > ; > run;


Back to: Top of message | Previous page | Main SAS-L page