LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (March 2009, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Sat, 14 Mar 2009 17:01:17 -0400
Reply-To:     Paul Dorfman <sashole@BELLSOUTH.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Paul Dorfman <sashole@BELLSOUTH.NET>
Subject:      Re: SUM statement paradox (bug)?

Tim,

I do not see how the superfluity of explicitly retaining an auto-retained variable makes it excusable for the compiler to fail to initialize a SUM statement variable to 0 when the documentation clearly and unambiguously says that that is what it does without any ifs, ands, or buts.

Of course it is natural (and logical) that any variable identified by the compiler in a data set descriptor is initially set to missing, and if it is not subsequently populated at the execution time, they will remain null, which is easy to prove by, say,

data _null_ ; put _all_ ; stop ; set sashelp.class ; run ; ---------------------------------------------------- Name= Sex= Age=. Height=. Weight=. _ERROR_=0 _N_=1

However, the SUM statement is a compile-time instruction exactly equivalent to

retain VAR 0 ;

(plus a run-time instruction to increment VAR as ordered). Yet when the compiler sees this RETAIN, it does populate VAR with 0 - as it should according to the documentation. Why, then, I ask, this compile-time instruction is obeyed whilst the same exact compile-time instruction contained in the SUM statement is ignored - as it should not be, according to the documentation?

In other words, if you think it natural for the step (with HUH containing a numeric VAR):

data _null_ ; put var = @ ; var + 1 ; set huh ; run ;

to print var=. in _N_=1 iteration, then you should likewise think it natural for the step:

data _null_ ; put var = @ ; retain var 0 ; set huh ; run ;

to print var=. in _N_=1 iteration. Does it? Nope, it prints var=0 instead.

Note that it is the compile-time equivalence of the SUM statement and RETAIN that is emphasized by the only germane documentation piece I can locate, viz., "The variable's value is retained from one iteration to the next, as if it had appeared in a RETAIN statement."

The reason I deem such a behavior a bug is two-fold. First, it directly contradicts the documentation. Second, I posit that the expectation of a the SUM statement variable to be initialized to 0 at compile is so ingrained in the mind of SAS programmers that it is unreasonable for them to expect a different result. I am a fairly seasoned SUM statement (ab)user, and this is the first time in 20+ years I have encountered such a thing.

However, I may be wrong and the only one who expects the SUM statement variable to be initialized to 0 under any circumstances. Y'all who expect a different outcome, would you please raise your hands?

Why do I think the point is of import and hence dwell on it? Certainly not because I derive any pleasure from eureka-ing a bug and rubbing it in to SAS; I just want the software to behave as expected according to the documentation. Thus, I had fully expected that the variable in a SUM statement I had coded would produce a zero if run-time program control never hit it during the step's execution. My logic was akin to:

if 0 then set model ; ... if A then a_cnt + 1 ; else if B then b_cnt + 1 ; else VAR + 1 ;

MODEL contains VAR (and other variables), and its descriptor is read first thing in the step to impose proper sequence and attributes (mainly, formats) on the variables written to a delimited file subsequently submitted to the SQL loader. VAR is one of those variables counting the number of "exception" records not conforming to the conditions A and B. According to my understanding of the SUM statement, I had fully expected it to end up as 0 at the end of the step if no exceptions were found, and that zero - to be written to the delimited file. Yet it produced a missing value, which caused the ensuing production load to go belly up, since VAR is defined in the production DW table as not null.

I could not drop VAR from MODEL (hard-coding VAR's format - or anything else, for that matter - is contrary to the framework I had developed for that ETL), and, simultaneously, I needed VAR to end up 0 if VAR+1 were never executed. After I understood what happened and why (which took a while in the middle of the night), I added:

retain var 0 ; *else non-exec VAR+1 produces null! ;

to the step, which takes care of the problem, but smacks of a downright silly thing to do - talk about redundancy now! That seemed to be a sufficient cause to post a beware note to SAS-L (with the samaritanism of preventing someone from getting up at 3 am thrown in).

Finally, remember that when a SAS novice posts something like

retain var 0 ; ... if condition then var + smth ;

s/he is almost inevitably notified by someone on the list that the RETAIN above is superfluous. Now if we accepted the notion that under certain DATA step circumstances, a SUM statement variable can end up missing and that it is natural and expected, then from the viewpoint of defensive programming we should stipulate that the statement must *always* be paired with a separate explicit RETAIN initializing its variable to 0.

Should we?

Kind regards ------------ Paul Dorfman Jax, FL ------------

On Fri, 13 Mar 2009 16:36:09 -0500, Tim Kynerd <tkynerd@ECD.ORG> wrote:

>Hi Paul, > >To echo the other folks, here's the behavior I'm seeing in SAS 9.1.3: > >When I run your code, I get what you get: var=. var=7 . >When I comment out the SET statement in the second DATA step, I get: var=0 . As I would expect, the second DATA step only executes once because it has no input. > >I knew about the implied RETAIN with the Sum statement, but I didn't know this: > >---- >Tip: The variable is automatically set to 0 before SAS reads the first observation. The variable's value is retained from one iteration to the next, as if it had appeared in a RETAIN statement. >Tip: To initialize a sum variable to a value other than 0, include it in a RETAIN statement with an initial value. >---- > >As to why VAR is initially missing in the first step: > >---- >It is redundant to name any of these items in a RETAIN statement, because their values are automatically retained from one iteration of the DATA step to the next: > > * > > variables that are read with a SET, MERGE, MODIFY or UPDATE statement > * > > a variable whose value is assigned in a sum statement > * > > the automatic variables _N_, _ERROR_, _I_, _CMD_, and _MSG_ > * > > variables that are created by the END= or IN= option in the SET, MERGE, MODIFY, or UPDATE statement or by options that create variables in the FILE and INFILE statements > * > > data elements that are specified in a temporary array > * > > array elements that are initialized in the ARRAY statement > * > > elements of an array that have assigned initial values to any or all of the elements on the ARRAY statement. > >You can, however, use a RETAIN statement to assign an initial value to any of the previous items, with the exception of _N_ and _ERROR_. >---- > >The fact that SAS emphasizes the option of using a RETAIN statement to assign an initial value makes it seem natural to me, if not strictly *logical*, that variables in these cases are initialized to missing, as they apparently are. > >Best, >Tim > > >Tim Kynerd >Computer Programmer/Analyst >ECD/HOPE >4 Old River Place, Suite A >Jackson, MS 39202 >P: (601) 944-9308 >F: (601) 944-0808 >tkynerd@ecd.org > >-----Original Message----- >From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Paul Dorfman >Sent: Friday, March 13, 2009 1:16 PM >To: SAS-L@LISTSERV.UGA.EDU >Subject: SUM statement paradox (bug)? > >Y'all, > >For those inclined to "ponder over many a quaint and curious volume of >forgotten lore", consider: > >data huh ; > retain var 7 ; >run ; > >data _null_ ; > put var = @ ; > var + 1 ; > set huh ; >run ; > >Paradox #1. I expect the second step to print var=0 var=7. It prints var=. >var=7 instead, as if the SUM statement were not present. > >Paradox #2. If VAR is not in HUH, or is dropped from it, or SET is simply >commented out, the step happily prints the expected var=0 var=7. > >It appears that the mere act of the compiler finding VAR in the descriptor >of a data set named in a syntactically valid context changes the implicit >sum-statement compiler action into RETAIN VAR . instead of the expected >RETAIN VAR 0 (or, perhaps more likely, just abstains from setting VAR to >zero and leaves it missing). > >I discovered it a hard way - a production load failed having detected a >missing value for a NOT NULL column - and stared at the step in giddy >incredulity for quite a while before I found that the reason was the SUM >statement behaving not in the way it is documented. > >For nowhere in the SAS documentation this peculiar behavior is mentioned. I >only find the unequivocal assertions "the variable is automatically set to >0 before SAS reads the first observation" and "The sum statement is >equivalent to using the SUM function and the RETAIN statement, as shown >here: retain variable 0; variable=sum(variable,expression);", for some >reason listed under the "Tip" rubrics, as if the fundamental property of >the SUM statement seared in the mind of any semicolon-worthy SAS programmer >is but a useful side effect. > >Since both directly contradict the experimental results capable of biting >an unwary, I hereby declare it a bug. > >Kind regards >------------ >Paul Dorfman >Jax, FL >------------ > >This transmission is intended only for the use of the addressee and may contain information that is privileged, confidential, and exempt from disclosure under applicable law. If you are not the intended recipient, or the employee or agent responsible for delivering the message to the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this communication in error, please notify us immediately via e-mail at support@ecd.org.


Back to: Top of message | Previous page | Main SAS-L page