Date: Thu, 4 Oct 2001 14:17:59 -0400
Reply-To: Ian Whitlock <WHITLOI1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ian Whitlock <WHITLOI1@WESTAT.COM>
Subject: Re: Order in the data step
Content-Type: text/plain; charset="iso-8859-1"
Jack,
I embedded some comments below on your comments.
IanWhitlock@westat.com
-----Original Message-----
From: Jack Hamilton [mailto:JackHamilton@FIRSTHEALTH.COM]
Sent: Thursday, October 04, 2001 1:32 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Order in the data step
"DeShon, Joe" <jdesho01@SPRINTSPECTRUM.COM> wrote:
>As I understand it, there are two basic types of statements that can go in
>the data step: executable statements and compiler directives.
Also macro statements and system options. They might or might not affect
the compilation and execution of the data step, but they can be placed
there.
IW: Macro statements may physically appear after a DATA statement and before
the RUN statement ending the step, but they are not really part of the
compilation or execution of the step. They are part of the code generation
for that step.
IW: If I had my way, system options and all global statements would only be
allowed between steps and not in them. I consider it highly misleading to
place statements in steps that affect the rest of the program and just
happen to look as if they were meant for one step.
>Executable statements (such as set, output, if, etc.) are those which are
>performed while the data step runs through its implied
>"do-until-end-of-file" routine.
If the data step happens to do that. Some data steps execute only once, by
design. There's no necessary correlation between the number of observations
read and the number of executions of the data step - you can read a billion
observations in one execution of the data step. So I don't think your
definition of executable statements is a good one.
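A minimal sketch of a step that reads every observation in a single
execution, assuming the SASHELP.CLASS sample data set is available (the
names eof and n are made up for illustration):

```sas
data _null_ ;
   /* read the whole input inside one execution of the step */
   do until ( eof ) ;
      set sashelp.class end = eof ;
      n + 1 ;              /* sum statement: counts observations read */
   end ;
   put _n_= n= ;           /* _N_ counts executions, n counts reads */
   stop ;                  /* end the step explicitly after this pass */
run ;
```

The PUT line should show _N_=1 alongside n equal to the number of
observations read, so the two counts are independent.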
IW: On the other hand, there are definitely statements which do not execute
during the execution of a DATA step. One can find them easily by using the
DEBUG option. The debugger skips over all truly non-executable statements
such as ATTRIB, RETAIN, ARRAY, etc. However, some statements fall into a
gray area where they have compile-time side effects (other than just
compiling for execution) while also serving an executable role. SET and
INFILE would be the most notorious. It is, perhaps, not coincidental that
these statements set up buffers for executable use.
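A small sketch of the compile-time side of SET, assuming the SASHELP.CLASS
sample data set is available. The SET statement below can never execute,
yet its variables are still defined when the step is compiled:

```sas
data _null_ ;
   if 0 then set sashelp.class ;   /* never executes, but is still compiled */
   put _all_ ;                     /* the SET variables are listed, all missing */
   stop ;                          /* no SET ever runs, so end the step by hand */
run ;
```

The STOP is there because nothing in the step ever reads an observation,
so nothing else would end it cleanly.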
>In that case, the order the statements
>appear in the data step is obviously significant.
[...]
>Can we start a discussion in SAS-L about the placement of compiler
>directives in a data step? At what times is the placement of such
>statements significant? For example, I know that variables can be reordered
>with an attrib statement, but only if the attrib statement is the very first
>statement in the data step.
That's not exactly true. Variables are, loosely speaking, defined in the
order in which they are referenced. ATTRIB provides a way to reference
variables without side-effects. It's not the ATTRIB statement itself which
sets the order, it's the fact that ATTRIB contains a variable reference.
But you can have statements that do not contain a variable reference, and
those statements can go at the top of the data step with no effect on the
variable order.
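A short sketch of that point, using made-up variables x and y: an ATTRIB
statement that names y ahead of the assignments should put y first in the
variable order, simply because that is where y is first referenced.

```sas
data _null_ ;
   attrib y length = $3 ;   /* first reference to y, so y is defined first */
   x = 1 ;
   y = "abc" ;
   put _all_ ;              /* should list y before x */
run ;
```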
IW: Well, the following might prove the comment about ATTRIB.
data _null_ ;
attrib ;
x = 1 ;
y = "abc" ;
put _all_ ;
run ;
IW: I would like to see you name a DATA step statement that could go at the
top without affecting something in the DATA step, which might cause one not
to want it at the top. Maybe you have in mind a comment statement? But that
hardly seems fair. OK, DROP, KEEP, and RENAME. Are there any others?
[...]
--
JackHamilton@FirstHealth.com
Development Manager, Technical Group
METRICS Department, First Health
West Sacramento, California USA