Hi All,
During the past week Ian Whitlock, Mike Rhoads, and SAS Tech Support have
been helping me get my head around a recent puzzle I stumbled across while
playing with placing the set statement in an explicit do-loop.
Imagine you code the following. How many records will be read from data set
A?
data a;
do i=1 to 10;
output;
end;
run;
data b;
do until (i = 5);
set a end=eof;
output;
end;
stop;
run;
According to the NOTE in the SAS log, 6 observations were read from A.
What's that? Yes, 6 observations were read, even though that set statement
only executed 5 times. Further examples showed this has nothing to do with
the explicit loop:
78 data b;
79 set a end=eof;
80 output;
81 if _n_ = 5 then stop;
82 run;
NOTE: There were 6 observations read from the data set WORK.A.
84 data _null_;
85 stop;
86 set a end=eof;
87 run;
NOTE: There were 1 observations read from the data set WORK.A.
89 data _null_;
90 if 0 then set a end = eof;
91 run;
NOTE: DATA STEP stopped due to looping.
NOTE: There were 1 observations read from the data set WORK.A.
So what *is* going on here? Apparently, when you use the end= option, each
execution of the set statement also does a "look-ahead": it reads the next
record to see whether one exists or whether it has hit the end-of-file
marker. Thus in my first example, on iteration #5, the set statement read
the fifth record, and the implied look-ahead read the sixth record to test
for EOF.
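One way to check the look-ahead explanation is to run the first example with
and without end= and compare the two NOTEs. This is only a sketch: the data
set names b_with_end and b_no_end are made up for illustration, and I am not
showing a log for it, since the count for the second step is exactly what the
comparison is meant to reveal.

```sas
/* Test data: 10 observations, i = 1 to 10 */
data a;
  do i = 1 to 10;
    output;
  end;
run;

/* Same loop as the first example, with END= */
data b_with_end;
  do until (i = 5);
    set a end=eof;
    output;
  end;
  stop;
run;

/* Identical step, but without END= (hypothetical comparison case) */
data b_no_end;
  do until (i = 5);
    set a;
    output;
  end;
  stop;
run;
```

If the extra read really is tied to end=, the NOTE for the second step should
report one fewer observation read from WORK.A than the NOTE for the first.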
But what is going on with the last examples, where the set statement never
executes? Apparently there is a "hidden read" which occurs just before
execution time (or perhaps as the first process in execution time, at the
very top of the data step) and which attempts to read the first record in
order to set the EOF flag. This explains why, in the two examples below, the
EOF flag is already set correctly at the top of the data step, before the
set statement has had any chance to execute.
153 data b;
154 put eof = ;
155 stop;
156 set a end=eof;
157 run;
eof=0
NOTE: There were 1 observations read from the data set WORK.A.
NOTE: The data set WORK.B has 0 observations and 1 variables.
159 data c;
160 put eof = ;
161 stop;
162 set b end=eof;
163 run;
eof=1
NOTE: There were 0 observations read from the data set WORK.B.
So here is the remaining question. Even though I now understand why SAS is
including an "extra" record in its count of the number of records read, is
this the information we want to be told in the NOTE? That is, do we want to
be told a literal accounting of the number of records that SAS looked at, or
do we instead want the NOTE to reflect a logical count of the number of
records the set statement read into the PDV for processing?
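In the meantime, anyone who wants the logical count can keep it themselves.
A minimal sketch, assuming the same data set A as above; the counter name
n_set is my own invention, not anything SAS provides.

```sas
/* Test data: 10 observations, i = 1 to 10 */
data a;
  do i = 1 to 10;
    output;
  end;
run;

data b (drop=n_set);
  do until (i = 5);
    set a end=eof;
    n_set + 1;   /* sum statement: retained, starts at 0, counts each SET execution */
    output;
  end;
  /* Write the logical count of records read into the PDV to the log */
  put 'SET statement executed ' n_set 'times';
  stop;
run;
```

The counter is incremented only when the set statement has actually executed,
so it tracks records brought into the PDV rather than whatever SAS looked at
behind the scenes. For this step it should come out as 5, whatever the NOTE
says.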
Thanks again to those who have already helped me in thinking about this,
and I'll leave it to Ian, Mike, and any *birdies* to correct any errors in
my presentation of their explanations.
Kind Regards,
--Quentin