LISTSERV at the University of Georgia
Date:         Thu, 13 May 2010 11:08:40 -0400
Reply-To:     Søren Lassen <s.lassen@POST.TELE.DK>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Søren Lassen <s.lassen@POST.TELE.DK>
Subject:      Re: Alternative for "goto" in do loop
Comments: To: Jim Anderson <James.Anderson@UCSF.EDU>
Content-Type: text/plain; charset=ISO-8859-1

Jim, I do not think I can show you an example where a goto is strictly necessary.

But there are times when a goto can improve efficiency - not always the most important item anymore, in these days of GHz multicore machines costing next to nothing. But once in a rare while, that does matter.

And there are times when a well-placed goto can make the program easier to read, debug and maintain, especially when dealing with nested loops.

Consider this:

do until(condition 1 or condition 4);
  initialize loop 1;
  if not condition 1 then do;
    do until(condition 2 or condition 4);
      initialize loop 2;
      if not condition 2 then do;
        do until(condition 3 or condition 4);
          initialize loop 3;
          if not (condition 3 or condition 4) then
            do the loop 3 stuff;
        end;
        if not condition 4 then
          do the after loop 3 stuff;
      end;
    end;
    if not condition 4 then
      do the after loop 2 stuff;
  end;
end;

Against this:

do until(0);
  initialize loop 1;
  if condition 1 then leave;
  do until(0);
    initialize loop 2;
    if condition 2 then leave;
    do until(0);
      initialize loop 3;
      if condition 3 then leave;
      if condition 4 then goto alldone;
      do the loop 3 stuff;
    end;
    do the loop 2 stuff;
  end;
  do the loop 1 stuff;
end;
alldone:

The latter version is slightly faster, because all the conditions are only checked once. But that is not the point. The point is that it is easier to maintain, because the conditions are not repeated. Personally, I also find such code easier to read and write, because there are not so many levels of DOs and ENDs to keep track of.

The second example shows both the use of LEAVE to simplify code, and the use of a GOTO where a LEAVE is not enough, as we need to get out of several loops in one jump.
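
To make the pattern concrete, here is a minimal runnable sketch of the same idea. The search task (finding the first i, j, k whose product is 45), the variable names and the label ALLDONE are made up for illustration and are not from the thread; only the LEAVE/GOTO structure mirrors the pseudocode above.

```sas
/* Illustrative sketch only: the task and all names are hypothetical. */
data _null_;
  found = 0;
  do i = 1 to 10;
    if mod(i, 2) = 0 then continue;  /* CONTINUE: skip to the next i      */
    do j = 1 to 10;
      if j > i then leave;           /* LEAVE: exits the j loop only      */
      do k = 1 to 10;
        if i * j * k = 45 then do;
          found = 1;
          goto alldone;              /* GOTO: exits all three loops       */
        end;
      end;
    end;
  end;
alldone:
  put found= i= j= k=;               /* report the state at exit in the log */
run;
```

Without the GOTO, each of the two outer loops would need its own "if found then leave;" after the inner loop ends, which is exactly the kind of repeated condition the second version avoids.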

GOTO is very rarely "really, really needed", but at times it can provide faster and better code.

Regards, Søren

On Wed, 12 May 2010 11:38:42 -0700, Anderson, James <James.Anderson@UCSF.EDU> wrote:

>Michael,
>
>Thanks for the wonderful historical perspective. My admiration for Knuth is even higher than my admiration for Dijkstra, but I think that this is an argument where history shows Dijkstra won. Please, can you give a SAS example where you "really, really needed" goto?
>
>Jim Anderson
>UCSF
>-----Original Message-----
>From: Michael Raithel [mailto:michaelraithel@WESTAT.COM]
>Sent: Wednesday, May 12, 2010 8:56 AM
>Subject: Re: Alternative for "goto" in do loop
>
>Dear SAS-L-ers,
>
>Søren posted the very well-written and intellectually pleasing post that you can find after the sig line.
>
>Søren, it is interesting to see the old GOTO discussion surface once again on the 'L! Past discussions have seen various personalities take extreme positions "never use GOTO's" or "GOTO's are okay". Lest we fall back into GOTO hell, I thought it would be interesting for some 'L-ers to learn something about the root of this "controversy".
>
>Back when I was a computer science major, I was neatly plopped in the No Man's Land between computer science giants Edsger Dijkstra (http://en.wikipedia.org/wiki/Edsger_Dijkstra ) who advocated that GOTO statements were harmful and Donald Knuth (http://en.wikipedia.org/wiki/Donald_Knuth ), who argued that they could be used in programs. There were "warring camps" among the computer science students in my department--shows we needed some hobbies, badly. During my formative years, structured programming was coming into vogue, and there was a push to either not use GOTO statements or keep them at a bare minimum. Consequently, I learned to program without them, except when I really, really needed them. So, which camp am I in? Well, I am in the camp of getting the programming task done, the data cleaned and analyzed, and the result sets out to the client with or without GOTO statements--whichever works best in the situation.
>
>For more information, GOTO: http://en.wikipedia.org/wiki/Goto .
>
>All, best of luck in all your SAS endeavors!
>
>I hope that this suggestion proves helpful now, and in the future!
>
>Of course, all of these opinions and insights are my own, and do not reflect those of my organization or my associates. All SAS code and/or methodologies specified in this posting are for illustrative purposes only and no warranty is stated or implied as to their accuracy or applicability. People deciding to use information in this posting do so at their own risk.
>
>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Michael A. Raithel
>"The man who wrote the book on performance"
>E-mail: MichaelRaithel@westat.com
>
>Author: Tuning SAS Applications in the MVS Environment
>
>Author: Tuning SAS Applications in the OS/390 and z/OS Environments, Second Edition
>http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=58172
>
>Author: The Complete Guide to SAS Indexes
>http://www.sas.com/apps/pubscat/bookdetails.jsp?catid=1&pc=60409
>
>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>...by whatever means necessary. - Malcolm X
>+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>> -----Original Message-----
>> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Søren Lassen
>> Sent: Wednesday, May 12, 2010 3:19 AM
>> To: SAS-L@LISTSERV.UGA.EDU
>> Subject: Re: Alternative for "goto" in do loop
>>
>> Sid,
>> Impressive, I didn't think anyone wrote programs like that anymore.
>> Anyway, there is nothing inherently "inefficient" about gotos.
>> They are among the most basic CPU instructions, and execute very fast.
>> When your datastep program has been compiled, the machine code generated will contain a lot of conditional jump statements, which are like ifs and gotos combined.
>>
>> The reason that gotos are discouraged is that programs written with a lot of gotos are hard to read and maintain.
>>
>> My personal experience is that you should do your best to use the basic loop structures to control your program flow, rather than rely on gotos. At times, however, programs do become more efficient using some goto-like constructs - especially when you need to escape from a loop or start over with the next iteration of a loop.
>>
>> SAS provides a couple of constructs for doing exactly that: LEAVE and CONTINUE. LEAVE escapes the current loop, CONTINUE jumps to the next iteration of the loop. Using these two statements, your second datastep can be written as:
>>
>> data range2;
>>   set range1;
>>   do y = start_y to end_y by 1;
>>     if y < 1980 then continue;
>>     if y > 2009 then leave;
>>     do m = 1 to 12 by 1;
>>       if (start_m > m and y = start_y) then continue;
>>       if (y = end_y and end_m < m) then leave;
>>       output;
>>     end;
>>   end;
>> run;
>>
>> I cheated a little bit there, when replacing the second "goto skip2" with a "leave" statement - it is not exactly equivalent, as the leave would only jump to skip3; but when analyzing the program it is easily seen that the condition for escaping is reached when the outer loop has run to its conclusion. In real life, you may get into situations where you have to use gotos for exactly this reason: you want to escape from more than one level of looping. This is the only time I would consider using gotos.
>>
>> As mentioned above, I much prefer to use the loop conditions when possible. Consider this version of your second datastep:
>>
>> data range2;
>>   set range1;
>>   do y = max(start_y,1980) to min(end_y,2009) by 1;
>>     do m = ifn(y=start_y,start_m,1) to ifn(y=end_y,end_m,12) by 1;
>>       output;
>>     end;
>>   end;
>> run;
>>
>> Not only is this easier to read, it also executes faster, as the poor machine is not burdened with checking two conditions in every iteration of each of the loops.
>>
>> When it comes to looping in SAS, it is important to remember that the datastep itself is a loop. There are some special constructs for controlling that:
>>   STOP - exits the datastep loop, does not output an observation
>>          corresponds to LEAVE
>>   DELETE - exits the current observation, does not output an observation
>>          corresponds to CONTINUE
>>
>> A variant of DELETE is the subsetting IF:
>>   IF <condition>;
>> means the same as
>>   IF NOT (<condition>) then delete;
>>
>> In a lot of datastep code, you will also see RETURN - a highly ambiguous statement:
>>   If a label has been called with a LINK statement, it jumps to the statement following the LINK statement. LINKs may also be nested.
>>   If the RETURN statement has not been reached by a LINK, the behavior of the program depends on whether there is an output statement present somewhere in the program: If there is, RETURN behaves exactly like DELETE, if there is not, it first outputs an observation and then continues to the next iteration of the datastep.
>>
>> Oops, I mentioned the LINK statement; may as well write a short treatise on that:
>>
>> LINK is another basic machine language instruction, usually called CALL when writing assembler (machine language where hexadecimal instructions are replaced with mnemonics like CALL and JUMP). LINK pushes a return address onto the stack and then jumps - in SAS, if there is no RETURN statement, SAS will place an implicit RETURN at the end; in assembler, using CALL without RETURN will lead to stack overflows and generally unpredictable behavior.
>>
>> LINK/CALL is used a lot by compilers - procedure and function calls are generally done by LINKing - normally, the parameters (or their addresses, when calling by reference) are pushed to the stack along with the return address.
>>
>> Linking in SAS can sometimes be necessary, but it can also create a mess of code that is even worse than what you can accomplish with just GOTOs. Consider this piece of pseudocode:
>>
>> data messy;
>>   set somedata;
>>   link label1;
>>   <code segment 1>
>>   link label2;
>>   <code segment 2>
>> label2:
>>   <code segment 3>
>> label1:
>>   <code segment 4>
>>   if <condition> then return;
>>   <code segment 5>
>> run;
>>
>> What does the RETURN do? Well that depends...
>> It can mean "jump back to code segment 1", or "jump back to code segment 2", or "output and read next observation", or "do not output but read next observation", depending on the context.
>>
>> I generally avoid the LINK statement in datasteps. If you want to avoid redundant code segments, use macros. There are times when LINKS are the best way of accomplishing something, however. For instance in conjunction with SET with KEY= :
>>
>> data a;
>>   set b;
>>   if x = lag(x) then do;
>>     c_key=sum(x,1); /* use any value different from the current X */
>>     link setkey;
>>   end;
>>   c_key=x;
>>   do until(0);
>>     link setkey;
>>     if _iorc_ then leave;
>>     output;
>>   end;
>>   _error_=0;
>>   delete;
>> setkey:
>>   set c key=c_key;
>>   return;
>> run;
>>
>> The basic idea here is to use keyed access to create the Cartesian product of the tables B and C, joined by X=C_KEY. If you get two x-values that are the same, you have to "reset" the "set c key=c_key" statement, so that you can read all the values once more. It is necessary to address the same physical statement twice, because it has its own internal state.
>>
>> LINK and RETURN are much more essential when programming in SAS/SCL, but that is another story altogether.
>>
>> Regards,
>> Søren
>>
>> On Mon, 10 May 2010 23:12:33 -0400, Sid N <nsid31@GMAIL.COM> wrote:
>>
>> >Hi,
>> >
>> >I am trying to understand the "goto" sequences in the second data step below:
>> >
>> >data range1;
>> >informat start end date9.;
>> >format start end date9.;
>> >input start end;
>> >start_m = month(start);
>> >end_m = month(end);
>> >start_y = year(start);
>> >end_y = year(end);
>> >cards;
>> >01JAN1975 01JAN1977
>> >01AUG1979 01JUL1980
>> >01MAR2008 01JAN2010
>> >01DEC2000 01SEP2002
>> >01FEB2010 01NOV2011
>> >;
>> >run;
>> >
>> >data range2;
>> >set range1;
>> >do y = start_y to end_y by 1;
>> >if y < 1980 then goto skip3;
>> >if y > 2009 then goto skip2;
>> >  do m = 1 to 12 by 1;
>> >  if (start_m > m and y = start_y) then goto skip1;
>> >  if (y = end_y and end_m < m) then goto skip2;
>> >  output;
>> >  skip1:;
>> >  end;
>> >skip3:;
>> >end;
>> >skip2:;
>> >run;
>> >
>> >
>> >Can someone please help me understand the "goto" sequences in the above "do loop"? I have heard that using "goto" sequences may not be an efficient way to code in SAS. Please suggest an alternative to output what the above code is accomplishing using different programming logic.
>> >
>> >Thank you for your time.
>> >
>> >Sid

