Date: Mon, 25 Nov 2002 17:13:59 -0500
Reply-To: Paul McDonald <pdm@SPIKEWARE.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paul McDonald <pdm@SPIKEWARE.COM>
Subject: Re: Set statement bug?
>As I said before, one issue with the request for a specific number of
>different examples turns on the interpretation of different. I would
>define different to include the same code when used with the intent of
>solving different problems, or different code when used with the
>intent of solving the same problem. If one disagrees then one might
>collapse some of these examples. In all of the examples below there
>is an unmentioned main SAS data set to be processed with a main SET
>statement. I have arranged the examples so the conditional SET
>statements precede those in loops on the basis that condition is more
>primitive than looping. No priority should be inferred from the
>order and many have been previously mentioned.
>1) A conditional SET statement to a file other than the main dataset
> to blank all variables.
> if _iorc_ > 0 then
> do ;
> pt = 1 ;
> set blank point = pt ;
> end ;
I think what you are doing here is that, on the condition _IORC_ > 0, the
first record of data set WORK.BLANK is read in. Further, I "think" that you
intend WORK.BLANK to have the same attributes as the input data set, but
with all values missing. This would then reset every variable to missing.
Cool idea, I'll use it tomorrow!
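Here is a minimal sketch of how I read the technique. The data set names
MAIN and LOOKUP, the key variable ID, and the index on LOOKUP are my own
assumptions for illustration; only WORK.BLANK, PT, and the _IORC_ test come
from the quoted code.

```sas
/* Assumption: LOOKUP is indexed on ID; MAIN supplies ID.            */
/* Build one all-missing observation shaped like LOOKUP's non-key    */
/* variables.                                                        */
data blank ;
   if 0 then set lookup (drop = id) ;  /* never executes; copies attributes only */
   output ;                            /* variables are still missing here */
   stop ;
run ;

data result ;
   set main ;
   set lookup key = id / unique ;      /* keyed read into the PDV */
   if _iorc_ > 0 then
   do ;                                /* no match for this ID */
      _error_ = 0 ;                    /* keep the log free of the error dump */
      pt = 1 ;
      set blank point = pt ;           /* overwrite values left from the last match */
   end ;
run ;
```

The point of the second SET is that a failed keyed read leaves the values
from the previous successful read in the PDV, so they have to be cleared
somehow.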
>2) A conditional SET statement to stabilize the PDV so that the "--"
> notation can be reliably used when reading a SAS dataset where the
> order of variables can differ from one execution to the next.
> if 0 then set standard ;
I've never seen a practical application where depending upon the order of
the variables in the data set is a good idea. It has always been standard
where I go (probably because I write the standard and HEY that's what I
like) to never depend on the order.
Can you provide an example where depending on the order of the variables in
the data set is helpful?
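For reference, my understanding of the usage being described is something
like the sketch below. INCOMING, Q1, Q20, and TOTAL are hypothetical names,
and I am assuming every variable between Q1 and Q20 in STANDARD is numeric.

```sas
data want ;
   if 0 then set standard ;      /* never executes; fixes PDV order at compile time */
   set incoming ;                /* variable order here may vary from run to run */
   array scores {*} q1 -- q20 ;  /* positional range follows STANDARD's order */
   total = sum(of scores{*}) ;
run ;
```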
>3) A conditional SET statement to predetermine variable attributes.
> if 0 then set standard ;
> If you wish to claim this is exactly like the preceding example, I
> agree that it is debatable, however my purpose for using this code
> and the conditions under which I would apply it are quite different
> from that given above, so I see them as different.
Yeah, I liked this one too... but I probably won't use it much. Still,
practical and a good idea.
>4) A conditional SET statement to determine NOBS. I know that this is
> repetition of your idea, but that does not make any less valid.
Again, you could do it, but the macro is faster.
>5) A conditional SET statement to read one record of totals in
> calculating percentages. This was given as exception by Paul
> McDonald [pdm@SPIKEWARE.COM] when not allowing multiple SET
> statements, but it certainly is a condition that deserves mention
> in any more or less complete list of examples.
Yes it does! (see, I got one right!)
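A minimal sketch of that case, with hypothetical names (SALES, AMOUNT,
TOTALS, TOTAL) that are not from the original thread:

```sas
/* One-observation data set holding the grand total */
proc means data = sales noprint ;
   var amount ;
   output out = totals (keep = total) sum = total ;
run ;

data pct ;
   if _n_ = 1 then set totals ;   /* read once; TOTAL is retained afterwards */
   set sales ;                    /* the main SET statement */
   percent = 100 * amount / total ;
run ;
```

Because variables read by SET are retained, TOTAL stays available on every
iteration without being read again.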
>6) SET in an initial loop to fill a temporary array before main
> processing. I agree that you gave such an example before.
I just spent a month rewriting code that did that, and I knocked 60% off
the run time of a 33-hour job. Please provide a good example of this
one, because I've been looking at a bad one for quite some time!
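For anyone following along, the pattern itself looks roughly like this.
It is only a sketch: RATES, MAIN, CODE, VALUE, AMOUNT, and the 1-to-100
range of integer codes are all assumptions of mine, not the code I rewrote.

```sas
data want (drop = r_code r_value) ;
   array rate {100} _temporary_ ;      /* assumption: codes are integers 1 to 100 */
   if _n_ = 1 then
   do until (done) ;                   /* initial loop: load the whole table once */
      set rates (rename = (code = r_code value = r_value)) end = done ;
      if 1 <= r_code <= 100 then rate{r_code} = r_value ;
   end ;
   set main ;                          /* main processing starts here */
   if 1 <= code <= 100 then adjusted = amount * rate{code} ;
run ;
```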
>7) SET with the POINT or KEY option to look up data in a secondary
> data source. One might argue that this serves the same purpose as
> the preceding suggestion, but I see it as a different technique
> with a different set of advantages and disadvantages. (Does the use
> of KEY constitute a different example?)
Hey, I'll call it a different example! Can you just use UPDATE instead?
>8) SET in a loop with the POINT option to create an efficient
> Cartesian product. I agree that SQL is most appropriate for many
> Cartesian products, but would disagree with replacing "most" with
> "all". See an early Observations article for reasons. In the
> subsequent year I wrote a Coder's Corner paper for either SESUG or
> NESUG on improvements to the code suggested by the authors of the
> Observations article.
Got a URL for the paper? I would like to read it!
>9) SET in multiple loops to revisit data needed at different times.
> For example, consider generating a program with PUT statements. You
> might want the variable once for a rename clause and then once
> again for an assignment statement. I agree that depending on the
> precise conditions, it may be better to store this information in
> an array, but that does not invalidate the technique. I also agree
> that the introduction of the ATTRIBUTE statement, macro, and SQL
> has lessened the need for this type of example. But again I do not
> see that as invalidating the example.
I'll give it to you, but I probably wouldn't do it.
>10) SET in a loop to determine some property of the BY-group before
> processing the BY-group in another loop. There have been a number
> of SAS-L postings using this technique in recent years with
> Quentin McMullen [Quentin_McMullen@BROWN.EDU] giving the most
> recent one.
Here's an example of one of those "confusing" applications.
>11) SET with POINT or KEY option to perform an unsorted merge. Now
> Paul McDonald used that as a reason to avoid the technique, but I
> would not, and suggest that the request for this ability is
> reasonably common on SAS-L.
>At one point in this thread an issue was raised when Paul McDonald
>wrote in response to John Whittington [John.W@MEDISCIENCE.CO.UK]:
>> I believe there is another reason not to
>> get too complex with SAS code - and that
>> is if the person writing the code isn't
>> confident enough that the end user or
>> supporter of the production code will
>> understand the SAS code.
>> In summary, I don't write code based on
>> whether or not I will understand it, but
>> whether or not someone else will
>> understand it.
>An interesting question to raise is when and for whom is this a good
>strategy. I can understand where the manager is coming from and have
>sympathy for the manager, who asks that some code be documented or
>rewritten so that others within his/her group can maintain the code
>without effort or learning. But as someone interested in education
>and raising the common level of programming I would probably disagree
>at times. However, for the individual programmer, I think this
>strategy is a strategy to failure. I think anyone who truly believes
>that he/she should write nothing any better than the worst of those
>who might read his/her code will soon find that everyone has surpassed
>him/her in the ability to program. I also think it wrong for larger
>groups and in general I suspect that any organization that takes such
>a strategy too seriously will also soon find itself replaced.
And I would follow up by saying that the programmer works for the group and
the company, and should therefore be writing code for the company and not
for himself or herself. This does NOT mean keeping all programming at an
elementary level, but instead bringing the programmers around the team and
company up to a level where they can support the code at a later time.
I've seen thousands of lines of rancid code that people have been running
for years and no one knows what it does, except it creates the "PTS Report"
that "Marketing needs" (even though marketing only looks at the first page
and throws the rest away). Yet no one can change anything because they
don't understand it. So, you wanna change the program? Gotta bring in a
Big Gun SAS Programmer...
>I do not even think Paul McDonald fully believes what he says.
Oh, I fully believe it... I just reserve the right to change my mind.
>As evidence consider Paul Dorfman's first example asking the compiler to
>obtain the information that he wanted.
> data _null_ ;
> max = na1 max na2 max na3 ;
> put max = ;
> stop ;
> set a nobs = na1 ;
> set b nobs = na2 ;
> set c nobs = na3 ;
> run ;
>Paul McDonald responded with some fine code.
> %macro obscnt (data) / des='returns number obs from a SAS dataset' ;
>    %local data data_id rc ;
>    %let data_id = %sysfunc(open(&data)) ;
>    %if &data_id %then %do ;
>       %sysfunc(attrn(&data_id, nobs))
>    %end ;
>    %else %do ;
>       %put WARNING: Open for dataset &data failed ;
>       %put WARNING: Macro OBSCNT will return the number of observations as missing. ;
>       %put %sysfunc(sysmsg()) ;
>       .
>    %end ;
>    %let rc = %sysfunc(close(&data_id)) ;
> %mend obscnt ;
>
> data _null_ ;
>    max = max(%obscnt(a), %obscnt(b), %obscnt(c)) ;
>    put max = ;
> run ;
>Can anyone believe that this code is simpler, easier to maintain, and
>will be understood by more programmers? Please note, I do not object
>to the code. It is well written, and it serves a more general purpose
>than Paul Dorfman's. However, where I would want any good beginning
>DATA step programmer to understand DATA steps well enough to
>understand Dorfman's example, I would not expect the same for
>McDonald's code. In fact, I would not want such a beginning
>programmer working with macro code at all, since I believe that one
>should have a good understanding of SAS code before writing any macro.
I find the %obscnt () macro function much simpler, much faster, and much
more handy. FYI, you can put it in a TITLE/FOOTNOTE statement too, if you
Y'see, you put the macro in your macro library, you make some documentation,
share it with people at your site, and then everyone has a new tool.
>I would include the obligatory Jack Hamilton warning that the datasets
>involved must be native SAS, nontape datasets without any removed
>observations or failure is to be expected. To the extent that
>McDonald's code is more general, it is more obligatory to include such
>restrictions with the macro.
Yep, %obscnt will not work with a tape data set. But I don't think the
other method would work, either...
>I have to ask, why is it more acceptable to open three SAS datasets in
>a DATA step than to read their dictionaries with multiple SET
>statements? Is it that one can permit things in macro that should not
>be done with code? Or is it that using more primitive tools, such as
>OPEN, makes it more permissible? Or is opening a SAS dataset just
>seen as so different from reading it that one is permissible while the
>other is not?
1--You can use a utility macro that you have fully tested and shared with
everyone, complete with documentation.
2--The name "%obscnt" sounds like "Observation Count" and should imply what
it does, giving at least an intuitive hint, if not a full answer, just by
looking at it.
3--The macro is more versatile and can run in open code, in a PROC, or in a
DATA step, whereas the multiple SET statements can only run in a DATA step.
4--It's faster and uses fewer resources.
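To illustrate point 3, here is a short sketch. It assumes %obscnt is
defined as quoted above and that a data set WORK.A exists; the macro
variable N_A and the PROC PRINT step are just for illustration.

```sas
/* Open code: no DATA step or PROC required */
%let n_a = %obscnt(a) ;
%put NOTE: WORK.A has &n_a observations. ;

/* Inside a PROC step: the macro call resolves when the TITLE is submitted */
proc print data = a (obs = 10) ;
   title "First 10 of %obscnt(a) observations in WORK.A" ;
run ;
```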
>In reconsidering the whole problem, I suggest the following
>possibility. The original question by David Wright
>[David_wright@SPRA.COM] gave a very simple DATA step:
>> data all;
>> set junk3;
>> set junk;
>Both SET statements are in the *open* DATA step in the sense that they
>are not part of a conditional statement or an explicit loop. Perhaps
>Paul McDonald meant his statements to be restricted in this manner and
>felt no need to say so. If so, I would agree that one is unlikely to
>find practical problems with such a restricted form of multiple SET
>statements, however, such steps still might be important in developing
>the programmer's intuition as to how the SAS DATA step works.
THANK YOU! seems we actually are agreeing (you with more detail, me with