LISTSERV at the University of Georgia
Date:         Fri, 7 Nov 2003 16:48:31 -0500
Reply-To:     Ian Whitlock <WHITLOI1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject:      Re: Help with Retain..
Comments: To: Jack Hamilton <JackHamilton@FIRSTHEALTH.COM>
Comments: cc: Nagakumar Sridhar <nsridhar@ATHEROGENICS.COM>
Content-Type: text/plain

Subject: RE: Help with Retain..
Summary: Although the subject is relevant, my real interest is in the documentation problems for RETAIN.
Respondent: Ian Whitlock

Jack,

You moved me to respond when in part you wrote,

>According to the documentation, the RETAIN statement "Causes a variable
>that is created by an INPUT or assignment statement to retain its value
>from one iteration of the DATA step to the next".

There are several big problems here. The first, and perhaps the biggest, is the use of the English word "retain". In common English it means that the value doesn't change, but in SAS it means nothing of the sort. It means that the system will not reset the value to missing as part of the automatic activity done in implementing the implied iteration of the DATA step loop.

To compound the problem, the word "retain" in the explanation of the SAS RETAIN statement is also a bit misleading. Here one has to read very carefully to realize that the sentence doesn't say the value will be retained. Rather, it means that whatever value the variable had at the very bottom of one iteration of the step will be the value of that variable at the very top of the next iteration. (The exceptions are certain automatic variables such as _N_ and _ERROR_. The documentation claims that applying RETAIN to them is redundant because they are already retained, although in fact these variables are reassigned at the beginning of every iteration of the DATA step, so they are very far from being retained.)
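
A minimal sketch of the distinction, using a made-up data set name, variable names, and values: the variable named in RETAIN changes on every pass, yet at the top of each iteration it still holds whatever it held at the bottom of the previous one.

```sas
/* Hypothetical example: last_x is "SAS retained" but far from constant. */
data retained_but_changing;
   retain last_x;          /* do not reset last_x to missing between iterations */
   input x;
   put _n_= x= last_x=;    /* last_x is missing on the first pass, then 10, then 20 */
   last_x = x;             /* the retained variable is reassigned on every pass */
   datalines;
10
20
30
;
run;
```

The PUT line in the log shows the carried-over value, not a fixed one.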

Moreover, CALL routines can give values to variables, which might then need a RETAIN statement. But CALL routines are neither INPUT nor assignment statements.
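
One familiar case, sketched here with an arbitrary seed value and invented names, is a random number CALL routine. CALL RANUNI updates its seed argument and returns the draw in its second argument, so neither variable gets its value from an INPUT or assignment statement.

```sas
/* Hypothetical example: seed and u are given values by a CALL routine. */
data draws;
   retain seed 12345;        /* initial seed; without RETAIN, seed would be */
                             /* reset to missing at the top of each iteration */
   input id;
   call ranuni(seed, u);     /* updates seed and stores a uniform draw in u */
   put id= seed= u=;
   datalines;
1
2
3
;
run;
```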

I suggest that most of the beginner's problems with understanding the RETAIN statement begin with the above problems, i.e. SAS self-inflicted obscurity.

I suspect another problem is that the reader first reading documentation about RETAIN may be quite unaware of the implied loop of the DATA step and how it works. Hence the whole concept seems unnecessary and therefore mysterious.

Now to be fair, the details section does say:

"Default DATA Step Behavior"

"Without a RETAIN statement, SAS automatically sets variables that are assigned values by an INPUT or assignment statement to missing before each iteration of the DATA step."

However, there is no link to a section of the documentation that would explain "iteration of DATA step". (Linking to "Overview of DATA Step Processing" might have been a nice touch. In fact, correcting the mistakes on this page and linking to it from every relevant statement might be the way to say how important it is for the beginner to read and understand it.) However, I do not know of a single link in the Reference Dictionary linking to the essential information provided in the Reference Concepts.
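
To make the quoted default behavior concrete, here is a small sketch with hypothetical names and values. The variable first_x is assigned only on the first iteration and, with no RETAIN statement, is reset to missing at the top of every later one.

```sas
/* Hypothetical example: default behavior, no RETAIN statement. */
data default_reset;
   input x;
   if _n_ = 1 then first_x = x;   /* assigned only on the first iteration */
   put _n_= x= first_x=;          /* first_x is 10 on pass 1, missing on passes 2 and 3 */
   datalines;
10
20
30
;
run;
```

Adding "retain first_x;" to the step would keep the value 10 on all three passes.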

It is interesting that the documentation gives as a tip, "If you specify _ALL_, _CHAR_, or _NUMERIC_, only the variables that are defined before the RETAIN statement are affected." This gives the important piece of information that words like _ALL_ can mean different lists in different contexts without giving a hint to the importance of the idea.
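
A sketch of what that tip implies, assuming the documented behavior and using invented names: the same two assignments behave differently depending on which side of RETAIN _ALL_ they sit on.

```sas
/* Hypothetical example: _ALL_ covers only variables defined above the RETAIN. */
data all_depends_on_position;
   input x;
   if _n_ = 1 then before = x;   /* defined before RETAIN _ALL_: retained */
   retain _all_;
   if _n_ = 1 then after = x;    /* defined after RETAIN _ALL_: not retained */
   put _n_= before= after=;      /* before stays 10; after is missing on passes 2 and 3 */
   datalines;
10
20
30
;
run;
```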

It may be to the credit of the documentation that it does not mention that when no variables are given in the RETAIN statement, then **all** variables are implied. Thus

RETAIN ;

will cause all variables to be "SAS retained". It is perhaps one of the simplest ways to mess up anyone expecting a SAS DATA step to behave the way a SAS programmer has come to expect it to behave. I guess this is a case of documentation by omission in hopes that no one will ever take advantage of this behavior. On the other hand,

RETAIN _ALL_ ;

is documented, and it provides a very close second way to mislead most SAS programmers.
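
As a sketch of how this can mislead, assuming the behavior described above and using hypothetical names and values: a flag that is set conditionally would normally be missing again on the next record, but with a bare RETAIN it stays set once it has been set.

```sas
/* Hypothetical example: a bare RETAIN changes how an ordinary flag behaves. */
data retain_everything;
   retain;                      /* no variable list: every variable is retained */
   input x;
   if x > 15 then flag = 1;     /* set only when the condition holds */
   put _n_= x= flag=;           /* flag is missing, then 1, then still 1 for x=5 */
   datalines;
10
20
5
;
run;
```

Without the RETAIN statement, flag would be missing again on the third record.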

Finally, I find it sad that the documentation does not give any example of using RETAIN for the accumulation of a variable. Perhaps this is because the SUM statement is a better way to handle the problem, but I think it most likely that the beginner turns to the RETAIN statement precisely because he wants to solve this problem. I see this as another missing link, where both the example and the link it deserves are missing. (Come to think of it, maybe those links are set to missing at the beginning of every iteration of the documentation. :)
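
For what it is worth, the kind of example meant here might look like the following sketch (data set name, variable names, and values are invented). It accumulates the same running total twice, once with RETAIN plus an assignment and once with the SUM statement.

```sas
/* Hypothetical example: running total with RETAIN and with the SUM statement. */
data totals;
   input amount;
   retain total_r 0;               /* start at 0, do not reset between iterations */
   total_r = total_r + amount;     /* becomes missing if amount is ever missing */
   total_s + amount;               /* SUM statement: implied RETAIN, starts at 0, */
                                   /* and ignores missing values of amount */
   put _n_= amount= total_r= total_s=;
   datalines;
5
7
3
;
run;
```

Both totals run 5, 12, 15 here. They diverge only when amount is missing, which is one reason the SUM statement is usually the safer choice.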

Now why have I gone into such detail about RETAIN and the state of its documentation? I believe that the large number of questions on SAS-L about RETAIN, numeric-character conversion, and transposing data is an indication that there is something seriously wrong with the handling of these concepts in the documentation. Moreover, I would generalize that to any frequent question on SAS-L. So I wanted to take time to back up that belief with a look at the state of the 8.2 documentation on this subject. After writing that sentence, I did take a quick look at the version 9 documentation for RETAIN, but did not note an improvement in any of the problems mentioned above.

IanWhitlock@westat.com
