Date: Fri, 7 Nov 2003 16:48:31 -0500
Reply-To: Ian Whitlock <WHITLOI1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject: Re: Help with Retain..
Content-Type: text/plain
Subject: RE: Help with Retain..
Summary: Although the subject is relevant, my real interest
is in the documentation problems for RETAIN.
Respondent: Ian Whitlock
Jack,
You moved me to respond when in part you wrote,
>According to the documentation, the RETAIN statement "Causes a variable
>that is created by an INPUT or assignment statement to retain its value
>from one iteration of the DATA step to the next".
There are several big problems here. The first, and perhaps the biggest, is the
use of the English word "retain". In common English it means the value
doesn't change, but in SAS it means nothing of the sort. It means the
system will not change the value to missing as part of automatic activity
done in implementing the implied iteration of the DATA step loop.
To compound the problem, the word "retain" in the explanation of the SAS
RETAIN statement is also a bit misleading. Here one has to read very
carefully, to realize that the sentence doesn't say the value will be
retained, but rather it means whatever value the variable had at the very
bottom of one iteration of the step will be the value of that variable at
the very top of the next iteration of the step (with the exception of
certain variables such as _N_ and _ERROR_, for which the documentation claims the
application of RETAIN is redundant because they are already retained,
although in fact these variables are reassigned at the beginning of every
iteration of the DATA step so they are very far from being retained).
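A small sketch may make the bottom-to-top behavior concrete. The data set and
variable names (DEMO, KEPT, NOTKEPT) are made up for illustration. The PUT
statement writes the values as they stand just after INPUT, before either
variable has been assigned in the current iteration.

```sas
data demo;
  input x;
  retain kept;                 /* KEPT is not reset to missing at the top of each iteration */
  put _n_= x= kept= notkept=;  /* values carried in from the previous iteration */
  kept    = x;                 /* a retained variable can still change on every iteration */
  notkept = x;                 /* no RETAIN: reset to missing before the next iteration */
  datalines;
10
20
30
;
run;
```

On the second and third iterations the log should show KEPT holding the
previous value of X while NOTKEPT is missing, even though KEPT is changed on
every pass through the step.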
Moreover, CALL routines can assign values to variables that might need a
RETAIN statement, but CALL routines are neither INPUT nor assignment
statements.
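As a hedged illustration, here is a variable that gets its value only from a
CALL routine and still needs RETAIN. CALL CATX is a version 9 routine picked
purely as an example; the names NAMES, LIST, and NAME are hypothetical.

```sas
data names;
  length list $ 40;
  retain list;                 /* without this, LIST is reset to blank each iteration */
  input name $;
  call catx(',', list, name);  /* LIST is given its value by a CALL routine, not an assignment */
  put _n_= list=;
  datalines;
Ann
Bob
Cy
;
run;
```

With the RETAIN statement the list should grow by one name per observation.
Remove it and LIST should hold only the current name on each iteration.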
I suggest that most beginners' problems with understanding the RETAIN
statement begin with the above problems, i.e. SAS's self-inflicted obscurity.
I suspect another problem is that the reader first reading documentation
about RETAIN may be quite unaware of the implied loop of the DATA step and
how it works. Hence the whole concept seems unnecessary and therefore
mysterious.
Now to be fair, the details section does say:
"Default DATA Step Behavior"
"Without a RETAIN statement, SAS automatically sets variables that are
assigned values by an INPUT or assignment statement to missing before each
iteration of the DATA step."
However, there is no link to a section of the documentation that would
explain "iteration of the DATA step". (Linking to "Overview of DATA Step
Processing" might have been a nice touch. In fact, correcting the mistakes
on this page and linking to it from every relevant statement might be the
way to say how important it is for the beginner to read and understand it.)
However, I do not know of a single link in the Reference Dictionary linking
to the essential information provided in the Reference Concepts.
It is interesting that the documentation gives as a tip, "If you specify
_ALL_, _CHAR_, or _NUMERIC_, only the variables that are defined before the
RETAIN statement are affected." This gives the important piece of
information that words like _ALL_ can mean different lists in different
contexts without giving a hint to the importance of the idea.
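A sketch of what the tip implies, assuming the behavior is as the documentation
states it. EARLY and LATE are hypothetical variables that differ only in
whether they are defined before or after the RETAIN statement.

```sas
data demo;
  input x;
  if x > 15 then early = x;    /* EARLY is defined before RETAIN _ALL_ */
  retain _all_;                /* _ALL_ here means only the variables seen so far */
  if x > 15 then late = x;     /* LATE is defined after RETAIN _ALL_ */
  put _n_= x= early= late=;
  datalines;
10
20
5
;
run;
```

On the third observation EARLY should still be 20, carried over from the
second, while LATE should be back to missing. The same word _ALL_ placed as the
last statement of the step would cover both variables.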
It may be to the credit of the documentation that it does not mention that
when no variables are given in the RETAIN statement, then **all** variables
are implied. Thus
RETAIN ;
will cause all variables to be "SAS retained". It is perhaps one of the
simplest ways to mess up anyone expecting a SAS DATA step to behave the way
SAS programmers have come to expect it to behave. I guess this is a case
of documentation by omission in hopes that no one will ever take advantage
of this behavior. On the other hand,
RETAIN _ALL_ ;
is documented, and it provides a very close second way to mislead most SAS
programmers.
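A minimal sketch of the kind of surprise meant here, assuming a RETAIN
statement with no variable list behaves as described above. The flag BIG is a
hypothetical variable that the programmer expects to be missing unless it is
set on the current record.

```sas
data flags;
  retain;                      /* no variable list: everything is "SAS retained" */
  input x;
  if x > 15 then big = 1;      /* usual expectation: BIG is missing unless set on this record */
  put _n_= x= big=;
  datalines;
10
20
5
;
run;
```

Without the RETAIN statement BIG is missing on the third observation. With it,
BIG should still be 1 there, left over from the second observation.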
Finally, I find it sad that the documentation does not give any example of
using RETAIN for the accumulation of a variable. Perhaps this is because
the SUM statement is a better way to handle the problem, but I think it most
likely that the beginner turns to the RETAIN statement precisely because he
wants to solve this problem. I see this as another missing link, where both
the example and the link it deserves are missing. (Come to think of it,
maybe those links are set to missing at the beginning of every iteration of the
documentation. :)
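For what it is worth, the missing example might look something like the
following sketch, which puts the RETAIN approach next to the SUM statement.
The names TOTALS, AMOUNT, TOTAL, and RUNNING are made up for illustration.

```sas
data totals;
  input amount;
  retain total 0;              /* initial value 0, carried from iteration to iteration */
  total = total + amount;      /* becomes missing, and stays missing, once AMOUNT is missing */
  running + amount;            /* SUM statement: implied RETAIN, starts at 0, ignores missing */
  put _n_= amount= total= running=;
  datalines;
5
.
7
;
run;
```

The missing second value is there on purpose. TOTAL should go missing at that
point and never recover, while RUNNING should finish at 12, which is one reason
the SUM statement is usually the better tool for accumulation.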
Now why have I gone into such detail about RETAIN and the state of its
documentation? I believe that the large number of questions on SAS-L about
RETAIN, numeric-character conversion, and transposing data is an indication
that there is something seriously wrong with the handling of these concepts
in the documentation. Moreover, I would generalize that to any frequent
question on SAS-L. So I wanted to take time to back up that belief with a
look at the state of the 8.2 documentation on this subject. After writing
that sentence, I did take a quick look at the version 9 documentation for
RETAIN, but did not note a single improvement in the problems mentioned
above.
IanWhitlock@westat.com