Date: Wed, 7 Jun 2000 04:31:36 GMT
Reply-To: "Paul M. Dorfman" <sashole@MEDIAONE.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Paul M. Dorfman" <sashole@MEDIAONE.NET>
Organization: KInPh
Subject: Re: De-Duplication
Content-Type: text/plain; charset=us-ascii
Puddin' Man wrote:
> On Tue, 06 June 2000, Paul Dorfman wrote:
>
> >
> > >From: Puddin' Man <pudding_man@ALTAVISTA.COM>
> > >These are questions that I also wondered about.
> > >
> > >I have two more. Note the usage of _n_ in
> > >the assignment statement in the second data
> > >step. Can anyone imagine circumstances in
> > >which such usage might:
> > >
> > > 1.) Confuse a po' programmer ?
> >
> > Po' Puddin',
> >
> > Probably so, depending on the degree of po'ness. But then, did you not use
> > the word 'programmer'?
>
> Indeed I did. Didn't specify which, 'tho.
>
> D'ya think there might be more than one on the planet? :-)
>
> > > 2.) Confuse po' SAS ?
> >
> > Nope. Once _n_ is used explicitly, it is not different from any other
> > variable, with the exception of being automatically dropped which I find
> > quite handy. The same pertains to _iorc_. Feel free to use it if is idle.
> > For apparent reasons, I am more reserved with _error_. Once again,
> > everything comes to the degree of being po'.
>
> DROP doesn't execute. I suspect it just sets flags for
> DROPped variables at compile time. I'm not aware of
> any efficiency issues with DROP.
Puddin',
Right, there are none, but I have never claimed that DROP _executes_ slowly or
that I can reduce run time by omitting a DROP statement. The economy of
saving a DROP lies in not having to code it at all, which eliminates the
potentially pesky errors caused by _forgetting_ to drop a variable that is not
dropped automatically (see below).
> The following code uses _n_ in a DO loop as
> an arbitrary index variable. And it conditionally
> resets the value of the variable in an IF statement
> (something that many have been warned against doing).
> I've never seen anything like this in years of
> reading SAS literature.
I believe you, but does that make it wrong, even if many (I do not quite recall
anyone) have been warning against it? I know of nobody who has had a problem
manipulating _ERROR_ and resetting its value conditionally in places like
   set driver;
   set indexed key=key;
   if _iorc_ = 0 then output;
   else _error_ = 0;
so why are you so concerned about po' _N_ ?
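For anyone who wants to run that fragment, here is a minimal self-contained sketch of the keyed-lookup idiom. The data values, the variable VAL, and the output data set name MATCHED are made up for illustration; only DRIVER, INDEXED, and KEY come from the fragment above.

```sas
/* Hypothetical lookup table, indexed on KEY so that SET ... KEY= can use it */
data indexed (index=(key));
   input key val;
   datalines;
1 10
2 20
3 30
;
run;

/* Hypothetical driver file; key 4 has no match in INDEXED */
data driver;
   input key;
   datalines;
1
4
3
;
run;

data matched;
   set driver;
   set indexed key=key;        /* direct read through the index             */
   if _iorc_ = 0 then output;  /* zero return code: the key was found       */
   else _error_ = 0;           /* not found: reset _ERROR_ so the log stays */
                               /* free of a variable dump for this record   */
run;
```

With these made-up values, MATCHED should end up with the rows for keys 1 and 3 only. If the driver could contain the same key on consecutive records, the KEY= option would probably want the /UNIQUE modifier as well.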
> Paul, you must be a
> "revolutionary", eh?
What is the difference? Whatever seems 'revolutionary' is, for the most part,
a well-forgotten 'evolutionary'...
> >data nodup;
> > array r (0:&hs) _temporary_;
> > set rnd;
> > do _n_=mod(rnd,&hs) by -1 until (r(_n_) = .);
> > if _n_ < 0 then _n_ = &hs;
> > if r(_n_) = rnd then delete;
> > end;
> > r(_n_) = rnd;
>
> You are free to claim that such programming is "Good Usage".
I do. There is no danger in using _N_ any way I want; such usage cannot corrupt
or overwrite anything. The reason lies on the surface: contrary to what many
think, _N_ is _not_ incremented every time the observation loop iterates.
Instead, some other internal variable (one that we are not even aware of and
have no control over) is incremented every time control is passed to the top of
the step, and its value is _assigned_ to _N_ first thing at the top. Here is a
proof:
data a; do v=1 to 4; output; end; run;

data _null_;
   put 'top: ' _n_=;
   set a;
   do _n_=_n_ by -1 until(_n_ < -5); end;
   if v = 3 then delete;
   put 'bottom :' _n_=;
run;

top: _N_=1
bottom :_N_=-6
top: _N_=2
bottom :_N_=-6
top: _N_=3
top: _N_=4
bottom :_N_=-6
top: _N_=5
Therefore, _N_ plays no role other than that of a value we can interrogate at
the top of the step to get the number of times control has been passed up
north, while the real counter is hidden. Since _N_ controls nothing, I can use
it to my liking: as an index variable, manipulated any way I desire, because it
is absolutely safe. I cannot forget to drop it, because it is dropped
automatically. Almost any SAS programmer can recall a story of a darn program
in which a DATA step using a loop index I did something funny with no apparent
errors, just because the preceding step had also used the index I but
mistakenly kept it. If you use _N_ as an index, that is impossible. Hence,
using _N_ the way you think is 'revolutionary' has only advantages and no
disadvantages. For me, that is a reason compelling enough to make good use of
it.
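To see the automatic drop in action, here is a small sketch that runs the same loop twice, once with an ordinary index I and once with _N_. The data set names WITH_I and WITH_N and the variable X are hypothetical, chosen only for this illustration.

```sas
/* An ordinary index variable is written to the output data set */
/* unless it is explicitly dropped.                             */
data with_i;
   do i = 1 to 3;
      x = i * 10;
      output;
   end;
run;

/* The same loop driven by _N_: being an automatic variable, */
/* _N_ is never written to the output data set.              */
data with_n;
   do _n_ = 1 to 3;
      x = _n_ * 10;
      output;
   end;
run;

/* Compare the variable lists of the two data sets */
proc contents data=with_i; run;
proc contents data=with_n; run;
```

PROC CONTENTS should list both I and X for WITH_I, but only X for WITH_N. Any later step that reads WITH_I inherits the stray I, which is exactly the kind of leftover that makes a subsequent loop on I misbehave.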
> Some will not see it in that light.
Absolutely! SAS programming is a form of art, and different folks hold different
views. This is absolutely normal, and the way it should be.
Kind regards,
=====================
Paul M. Dorfman
Jacksonville, Fl
=====================
> Puddin'
>
> *****************************************************
> *** Puddin' Man *** pudding_man@altavista.com ***
> *****************************************************;