LISTSERV at the University of Georgia
Date:         Thu, 24 Jun 2004 15:47:41 -0500
Reply-To:     Jack Hamilton <JackHamilton@FIRSTHEALTH.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Jack Hamilton <JackHamilton@FIRSTHEALTH.COM>
Subject:      Re: DoW Loop "Duh" experience - data set and infile N+1loops
Comments: To: pchoate@DDS.CA.GOV
Content-Type: text/plain; charset=us-ascii

I find that behavior useful when reporting data step processing counts: I can put subsetting IFs in my code and know that the final PUT statement will always be executed (not tested):

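data _null_;
   /* Runs on the final (N+1)th pass, after the last record has been read */
   if end then put recsin= highvalue= purpleblahblah=;
   set somestuff end=end;
   recsin + 1;              /* records read */
   if blahblah > 10;        /* subsetting IF */
   highvalue + 1;           /* records with blahblah > 10 */
   if color = 'purple';     /* subsetting IF */
   purpleblahblah + 1;      /* purple records with blahblah > 10 */
run;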

-- JackHamilton@FirstHealth.com Manager, Technical Development Metrics Department, First Health West Sacramento, California USA

>>> "Choate, Paul@DDS" <pchoate@DDS.CA.GOV> 06/24/2004 1:23 PM >>>
Greetings SAS-Lers,

Apologies if this is old news, but while pondering Paul Dorfman & Ian's "Do Loop of Whitlock" I had an "a-hah" experience the other day (or maybe it was an "a-duh" experience) - When reading an N record data set, SAS loops through the data-step N+1 times.

The SET or INFILE END= flag is set when the last (Nth) observation is read, but the data step then starts a final (N+1)th pass, which runs until it arrives back at the SET or INPUT statement and stops there because there is nothing left to read. That final half-pass is usually "invisible" because the data step terminates midstream, before the default output. This allows final actions of the data step to be placed at the top of the step rather than at the bottom, where they run after the final implied output.

data one; do i=1 to 5; output; end;

data _null_; put _all_ ': '@; set one end=eof; put _all_ ; run;

eof=0 i=. _ERROR_=0 _N_=1 : eof=0 i=1 _ERROR_=0 _N_=1
eof=0 i=1 _ERROR_=0 _N_=2 : eof=0 i=2 _ERROR_=0 _N_=2
eof=0 i=2 _ERROR_=0 _N_=3 : eof=0 i=3 _ERROR_=0 _N_=3
eof=0 i=3 _ERROR_=0 _N_=4 : eof=0 i=4 _ERROR_=0 _N_=4
eof=0 i=4 _ERROR_=0 _N_=5 : eof=1 i=5 _ERROR_=0 _N_=5
eof=1 i=5 _ERROR_=0 _N_=6 :

This is the same with an input statement:

data _null_; do i=1 to 5; file 'one'; put i; end;

data _null_; put _all_ ': '@; infile 'one' end=eof; input I $ ; put _all_ ; run;

eof=0 I= _ERROR_=0 _N_=1 : eof=0 I=1 _ERROR_=0 _N_=1
eof=0 I= _ERROR_=0 _N_=2 : eof=0 I=2 _ERROR_=0 _N_=2
eof=0 I= _ERROR_=0 _N_=3 : eof=0 I=3 _ERROR_=0 _N_=3
eof=0 I= _ERROR_=0 _N_=4 : eof=0 I=4 _ERROR_=0 _N_=4
eof=0 I= _ERROR_=0 _N_=5 : eof=1 I=5 _ERROR_=0 _N_=5
eof=1 I= _ERROR_=0 _N_=6 :
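
To see why the placement matters once a subsetting IF is involved, here is a minimal sketch. The data set name, the variable names and the four sample records are made up for illustration; the only thing it relies on is the N+1 pass shown above. The same report is written once at the top of the step and once at the bottom:

```sas
/* Hypothetical sample data: four records, the last one fails the filter */
data somestuff;
  input blahblah color $;
  datalines;
5 red
20 purple
30 green
7 purple
;
run;

data _null_;
  /* Top of the step: reached on the final (N+1)th pass, before SET stops the step */
  if eof then put 'top: ' recsin= highvalue=;
  set somestuff end=eof;
  recsin + 1;                 /* records read */
  if blahblah > 10;           /* subsetting IF: failing records go back to the top */
  highvalue + 1;              /* records that passed the filter */
  /* Bottom of the step: reached only if the last record passes the filter */
  if eof then put 'bottom: ' recsin= highvalue=;
run;
```

With these four records the log should show only the "top:" line (recsin=4 highvalue=2), because the last record is rejected by the subsetting IF before the bottom PUT is reached.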

Looking back at the "Flow of Action in the DATA Step" flowcharts in the v5, v6, v8, and v9 manuals/docs, I see it's been documented at least since 1985. Thanks once more to Paul & Ian!

Paul Choate DDS Data Extraction (916) 654-2160

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Paul M. Dorfman
Sent: Thursday, June 24, 2004 11:51 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Help with By group processing, thanks

David,

Another Dorfmanism would be to convert this self-interleave into a double-DoW. I think I have already posted it in reply to Toby, but I would like to stress another time that although the double-DoW:

Data one ;
   do until ( last.inst ) ;
      set ttotal ;
      by inst ;
      totfund = sum (totfund, fund, 0) ;
   end ;
   do until ( last.inst ) ;
      set ttotal ;
      by inst ;
      output ;
   end ;
Run ;

looks less parsimonious than the self-interleave:

data one;
   set ttotal(in = summing)
       ttotal(in = merging);
   by inst;
   retain total_fund;   /* carry the running total across iterations */

   if summing then do;
      if first.inst then total_fund = 0;
      total_fund = sum(total_fund, fund, 0);
   end;

   if merging then output;
run;

the double-DoW is structurally superior, as it does not rely on conditional logic *inside* a loop to make a decision. Instead of piling up all the observations from two different streams into a single by-pile and relying on IN= to split them, the double-DoW simply goes through each by-group twice: first from one input stream, then from the other. And because the boundaries of the double-DoW coincide with those of the Data step itself, there is no need to initialize the cumulative variable explicitly using first.inst: totfund is not retained, so it is reset to missing at the top of each Data step iteration, which is also the start of each by-group.
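
For anyone who wants to watch the double-DoW at work, here is a small self-contained sketch. The five sample records are invented for illustration; only the variable names inst and fund come from the thread. The first loop accumulates the by-group total, the second re-reads the same by-group and writes each record with the total attached:

```sas
/* Hypothetical sample data, already sorted by inst; one fund value is missing */
data ttotal;
  input inst $ fund;
  datalines;
A 10
A .
A 5
B 7
B 3
;
run;

data one;
  /* First pass over the by-group: accumulate the total.
     totfund is missing at the top of each data step iteration. */
  do until (last.inst);
    set ttotal;
    by inst;
    totfund = sum(totfund, fund, 0);
  end;
  /* Second pass over the same by-group: output every record with the total */
  do until (last.inst);
    set ttotal;
    by inst;
    output;
  end;
run;

proc print data=one noobs;
run;
```

With this sample, every A record should carry totfund=15 and every B record totfund=10, including the record whose fund is missing.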

As an exercise for the curious, one may want to try foreseeing, without testing, what will happen to the output if the second BY statement in the double-DoW code is omitted or commented out, then run a test to see if the guess was right. Now here is a different variation on the same double-DoW theme:

data two ;
   do count = 1 by 1 until ( last.inst ) ;
      set ttotal ;
      by inst ;
      totfund = sum (totfund, fund, 0) ;
   end ;
   do _n_ = 1 to count ;
      set ttotal ;
      output ;
   end ;
run ;

This can be more efficient than the first variant if, in addition to the sum, count is also needed. Note that in this case, the second BY statement is omitted, yet the output is as expected. I will let curious SAS-Lers ponder how this works.
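
Those who prefer to check before pondering could use a harness along these lines. It is only a sketch: the sample records are invented, and PROC COMPARE is used here as one convenient way to confirm that both variants give the same totfund on every record (count exists only in the second data set, so it is not part of the comparison).

```sas
/* Hypothetical sample data, sorted by inst */
data ttotal;
  input inst $ fund;
  datalines;
A 10
A .
A 5
B 7
B 3
;
run;

/* Variant 1: both loops are bounded by last.inst */
data one;
  do until (last.inst);
    set ttotal;
    by inst;
    totfund = sum(totfund, fund, 0);
  end;
  do until (last.inst);
    set ttotal;
    by inst;
    output;
  end;
run;

/* Variant 2: the second loop is bounded by the record count of the by-group */
data two;
  do count = 1 by 1 until (last.inst);
    set ttotal;
    by inst;
    totfund = sum(totfund, fund, 0);
  end;
  do _n_ = 1 to count;
    set ttotal;
    output;
  end;
run;

/* Compare the variables the two data sets have in common */
proc compare base=one compare=two;
run;
```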

Kind regards, ---------------- Paul M. Dorfman Jacksonville, FL ----------------

> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On
> Behalf Of David L. Cassell
> Sent: Wednesday, June 23, 2004 8:52 PM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Re: Help with By group processing, thanks
>
> "Dunn, Toby" <tdunn@TEA.STATE.TX.US> sagely replied:
> > As a data step solution, consider this option:
> >
> > data one;
> >    set ttotal (in = a)
> >        ttotal (in = b);
> >    by inst;
> >
> >    if (a = 1) then do;
> >       if first.inst then total_fund = 0;
> >       if (fund ne .) then total_fund + fund;
> >    end;
> >
> >    if (b = 1) then do;
> >       output;
> >    end;
> > run;
>
> A good solution. I have privately referred to this technique
> as the "Schreier Self-interleave", because I learned it from
> reading some of Howard's posts some time ago. IIRC, Howard
> once said that he learned it from Ian. But we can't name
> *everything* after Ian. :-) I think it is time to take the
> approach of the mathematicians; they couldn't name everything
> after Gauss, so eventually they had to name things after the
> *next* person to work with them.
>
> There is one important point I would like to make with this example.
> In terms of documentation and maintenance (by others), I find
> it is really helpful to use better names with my IN= options.
> So I might relabel this data step like so:
>
> data one;
>    set ttotal(in = summing)
>        ttotal(in = merging);
>    by inst;
>    retain total_fund;   /* carry the running total across iterations */
>
>    if summing then do;
>       if first.inst then total_fund = 0;
>       total_fund = sum(total_fund, fund, 0);
>    end;
>
>    if merging then output;
>
> run;
>
>
> And, of course, one can always produce a Dorfmanism to turn
> the above do-group into a single computation without the need
> for grouping. But that kind of shoots down the whole 'make
> it more readable and maintainable' point. :-)
>
> Okay, okay, here's what I meant [but didn't bother to test]:
>
> data one;
>    set ttotal(in = summing)
>        ttotal(in = merging);
>    by inst;
>    retain total_fund;   /* carry the running total across iterations */
>
>    if summing then total_fund = sum(total_fund*(^first.inst), fund, 0);
>    if merging then output;
>
> run;
>
>
> HTH,
> David
> --
> David Cassell, CSC
> Cassell.David@epa.gov
> Senior computing specialist
> mathematical statistician

