Date: Mon, 25 Nov 2002 22:11:29 +0000
Reply-To: alejandro.jaramillo@ATT.NET
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: alejandro.jaramillo@ATT.NET
Subject: Re: Set statement bug?
Ladies and Gentlemen,
I find the information exchange about the set statement very interesting.
Please don't get personal, keep a good spirit and let's move on.
This is my one cent.
Alejandro
> On Fri, 22 Nov 2002 13:10:49 -0500, Ian Whitlock <WHITLOI1@WESTAT.COM>
> wrote:
>
> >Paul,
> >
> >You have a style of making pronouncements as if they were true. I have
> >examined some of them below from my point of view. I ask each reader to
> >draw his/her own conclusions about both points of view and then to
> >continually reconfirm them as new programming experiences dictate. All of
> >my paragraphs begin with ***.
> >
> >IanWhitlock@westat.com
> >
>
>
> Perhaps a better way to phrase that is "I have a style of sticking my foot
> in my mouth" (which anyone who knows me would shrug and say, "yeah...")
>
> When I started posting on this "multiple SET statement" stuff, I really did
> not realize that most of the people on this board actually know how SAS
> works. I unfortunately spend most of my day showing people how to use PROC
> CONTENTS and PROC SORT (with the occasional PROC FREQ). My first reaction
> is to keep anyone from doing anything that they can't handle, and to play
> defense on the use of SAS software.
>
> >
> >"Unpredictable" certainly was the incorrect choice for a word, because SAS
> >is always predictable if you know everything. But then if you knew
> >everything, you wouldn't need to write a SAS program to analyze the data!
> >
> >HOWEVER, I would still steer away from using multiple SET statements in the
> >same data step for the following reasons:
> >
> >1) Observations from the end of one or more input data sets will be
> >deleted from the output data set unless all input data sets have the same
> >number of observations.
> >
> >*** False as shown by
> >
> > data all ; /* eofa and eofb are not on a or b */
> > do until ( eofa ) ; set a end = eofa ; output ; end ;
> > do until ( eofb ) ; set b end = eofb ; output ; end ;
> > run ;
>
>
> Yeah, if you put other conditions in the data step. I posted the log from
> my test job for this earlier, and the two methods of coding produced
> different results.
>
> Here's a question: why would you want to do that when you can have just
> one SET statement with two data sets and avoid the DO UNTIL loops?
> Practical example please.
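For readers following point 1), here is a minimal SAS sketch of the two behaviors being contrasted, assuming small hypothetical data sets A (3 observations) and B (2 observations). With one SET per data set and no loops, every iteration reads one observation from each input, and the step ends as soon as either SET runs out; Ian's DO UNTIL loops instead read both inputs to the end.

```sas
/* Hypothetical inputs: A has 3 observations, B has 2 */
data a ;
  input x ;
  datalines ;
1
2
3
;
run ;

data b ;
  input y ;
  datalines ;
10
20
;
run ;

/* Each iteration executes both SETs once. On the third iteration   */
/* SET A reads X=3, then SET B reads past its end and the step stops */
/* before the implicit OUTPUT, so ONE_EACH has only 2 observations.  */
data one_each ;
  set a ;
  set b ;
run ;

/* Ian's DO UNTIL version: all 5 observations are output */
data all ;
  do until ( eofa ) ; set a end = eofa ; output ; end ;
  do until ( eofb ) ; set b end = eofb ; output ; end ;
run ;
```

In ALL, X keeps its last value (3) on the two rows read from B, because variables read by a SET statement are retained; a single SET A B statement would leave X missing on those rows.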
>
>
> >2) Combining several data sets into one data set with multiple SET
> >statements mimics a merge, but each input data set may not be in sorted
> >order and is not required to be in sorted order.
> >
> >*** This is a rather limited point of view. How would you perform the DATA
> >step of the previous paragraph with a MERGE? If your answer is that you
> >wouldn't, then that is a contradiction to the claim that the code mimics a
> >MERGE.
>
> I would not, and my answer is that, in the example above, I would do this:
>
> /* example 1 */
> data all ;
> set a b ;
> run ;
>
> /*or this, example 2 */
> proc append base=all data=a ;
> run ;
>
> proc append base=all data=b ;
> run ;
>
> And that could be made even faster if I could skip the first PROC APPEND,
> since all it is doing is copying WORK.A to WORK.ALL (but the specs might
> not allow it).
>
> And then I would say: okay, you're right. In the example you chose,
> the multiple SET statements do not mimic a merge. However, in the
> original example that started this thread, the attempt does mimic a
> merge. If you want to discuss it further, great; we can take another
> example and go into more detail.
>
>
> >*** The choice of your words suggests that there is something wrong with
> >mimicking a MERGE. However, your "but" clause suggests that there are
> >times when mimicking a MERGE is most appropriate. What is being claimed here?
>
> Change the "but" to an "and" if you like.
>
>
> >3) Combining several data sets into one data set with multiple SET
> >statements mimics a merge, but each input data set may not have matching
> >keys (even if it is in "sorted" order).
> >
> >*** The choice of your words suggests that there is something wrong with
> >mimicking a MERGE. However, your "but" clause suggests that there are
> >times when mimicking a MERGE is most appropriate. What is being claimed here?
>
> Please see above.
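A minimal sketch of the difference behind points 2) and 3), assuming hypothetical data sets LEFT and RIGHT that are both sorted by ID but share only IDs 1 and 4: two SET statements pair observations by position, while a MERGE with a BY statement pairs them by key.

```sas
/* Hypothetical inputs, both sorted by ID; only IDs 1 and 4 match */
data left ;
  input id x ;
  datalines ;
1 10
2 20
4 40
;
run ;

data right ;
  input id y ;
  datalines ;
1 100
3 300
4 400
;
run ;

/* Two SETs pair row k of LEFT with row k of RIGHT and ignore keys. */
/* RIGHT's ID overwrites LEFT's, so row 2 comes out as ID=3, X=20,  */
/* a combination that exists in neither input.                      */
data by_position ;
  set left ;
  set right ;
run ;

/* A BY-group MERGE matches on the key instead: IDs 1 and 4 only */
data by_key ;
  merge left (in = in_left) right (in = in_right) ;
  by id ;
  if in_left and in_right ;
run ;
```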
>
> >4) Combining several data sets through one data step and out to multiple
> >data sets runs the risk of multiplying the issues above.
> >
> >*** In view of my thoughts on 1)-3) I am confused about what is being
> >multiplied.
>
>
> What I have found to happen is that users who make errors in coding earlier
> in the program can compound those errors later in the program, and these
> errors in the final output data get multiplied as the program introduces
> more data. The more times data is manipulated, the more chances there are
> for errors.
>
>
>
> >5) From my experience, the intent that most users have in mind when using
> >multiple SET statements in the same data step is better and more
> >efficiently resolved using MERGE statements, PROC SQL, and/or other data
> >manipulation tools.
> >
> >*** Please note this is a statement about your experience. Although I
> >cannot question it, it indicates that you may have met rather limited
> >users.
>
> CLIENT: Paul, How do I print a SAS data set?
> PAUL: PROC PRINT.
> CLIENT: Thanks!
> PAUL: That'll be $10,000 please.
>
> I've "MET" many powerful, wonderful, and highly intelligent SAS users.
> I've even worked on projects where we used cutting edge SAS products,
> including the first US site for Risk Dimensions (I was the first SAS
> Quality Partner in North America with RD experience, and the second NA RD
> license). Unfortunately most of my work is indeed very simple SAS code,
> and I'm usually working with people who have an entire 3 days of SAS
> experience.
>
> >To the extent that it is a claim about SAS, it appears to be a repetition of
> >2) or 3), now allowing mimics of a MERGE with SQL.
>
> Probably, now that I look at it.
>
>
> >6) Multiple SET statements in one data step could lead to the overwriting
> >of variables with the same name, rather than appending new records
> >(or "creating" new records) as when several data sets are used in one SET
> >statement. No warning will occur in the log if this happens.
> >
> >*** True. However, one assignment statement can overwrite another without a
> >warning in the log. Does this mean one should abandon assignment
> >statements? I think a better point of view is that programs are dangerous
> >when you don't know what they are doing. Consequently you should find out,
> >rather than abandon what might be an important programming technique.
>
> Of course not, it just means that programmers should be aware of it.
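A minimal sketch of the overwriting described in point 6), assuming hypothetical data sets FIRST and SECOND that both contain a variable X. Renaming on input is one common way to keep both values.

```sas
/* Hypothetical inputs that share the variable X */
data first ;
  x = 1 ; output ;
  x = 2 ; output ;
run ;

data second ;
  x = 100 ; output ;
  x = 200 ; output ;
run ;

/* SET SECOND executes after SET FIRST in the same iteration, so its */
/* X replaces FIRST's X in the PDV: OVERWRITTEN holds X=100 and 200. */
data overwritten ;
  set first ;
  set second ;
run ;

/* Renaming on input keeps both values: X=1,2 and X2=100,200 */
data kept ;
  set first ;
  set second (rename = (x = x2)) ;
run ;
```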
>
> >
> >7) If any of the SET statements are inside a condition, then the value of
> >the last record read will be retained for the remaining records until the
> >first time one of the input data sets encounters the end-of-data marker.
> >
> >*** Is this a claim? If so, then is it for or against?
>
> ????
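A minimal sketch of what point 7) appears to describe, assuming hypothetical data sets MASTER and LOOKUP: variables read by a conditional SET keep their last values on iterations where that SET does not execute, and the step stops the first time any SET reads past its end.

```sas
/* Hypothetical inputs: MASTER has IDs 1-5, LOOKUP has 2 codes */
data master ;
  do id = 1 to 5 ; output ; end ;
run ;

data lookup ;
  code = 'A' ; output ;
  code = 'B' ; output ;
run ;

data cond ;
  set master ;
  if mod(id, 2) = 1 then set lookup ;  /* read LOOKUP on odd IDs only */
run ;

/* CODE comes from a SET, so it is retained rather than reset:        */
/*   ID=1 A, ID=2 A, ID=3 B, ID=4 B                                    */
/* On ID=5 the conditional SET reads past the end of LOOKUP and the   */
/* step stops before output, so COND has 4 observations.             */
```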
>
> >
> >8) If one of the SET statements declares a data set with zero records,
> >then the resulting data set output will provide the n-1 records from the
> >iteration in which the "zero-record" data was called (it could be
> >conditional).
> >
> >*** I don't understand what is being said. If it means that there is
> >something wrong with conditional SET statements to empty data sets, then I
> >think that is wrong because I sometimes find such statements very useful in
> >determining the logical PDV and the attributes of input variables.
>
> Cool Idea! Thanks!
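A minimal sketch of both ideas in point 8) and Ian's reply, assuming a hypothetical zero-observation data set SHELL and a hypothetical input MASTER: executing a SET against SHELL ends the step, while a SET that is compiled but never executed still defines variables and their attributes.

```sas
/* Hypothetical zero-observation data set that still carries attributes */
data shell ;
  length name $ 20 amount 8 ;
  format amount dollar12.2 ;
  stop ;
run ;

/* Hypothetical input with IDs 1-5 */
data master ;
  do id = 1 to 5 ; output ; end ;
run ;

/* Point 8): the conditional SET first executes when ID=3, finds no */
/* observation, and ends the step, so only IDs 1 and 2 are output.  */
data partial ;
  set master ;
  if id = 3 then set shell ;
run ;

/* Ian's use: a SET that never executes still defines NAME and AMOUNT, */
/* with their lengths and formats, in the PDV at compile time.         */
data filled ;
  if 0 then set shell ;
  name = 'Widget' ;
  amount = 1234.5 ;
  output ;
  stop ;   /* no SET ever executes, so end the step explicitly */
run ;
```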
>
> >
> >9) The SET statement wasn't designed to occur more than once in a data
> >step. Just because it CAN be done does not mean that it SHOULD be done.
> >
> >*** The first statement is a claim about history that I do not have access
> >to, but it is contradictory to some of the SAS Institute published material
> >and consequently requires some form of evidence before one can accept it.
>
> Such as? I'd love to read it! (and I'm not doubting you)
>
> >
> >*** I agree with the second statement, and suggest that the "it" can be
> >replaced by anything that can have a CAN and SHOULD context, since the
> >statement is really about the relationship between CAN and SHOULD. However,
> >the statement does not say anything about what SAS code restrictions one
> >should follow, other than to possibly mean that not all valid SAS code
> >should be written.
> >
> >10) Multiple SET statements in one data step are confusing, outside
> >standards, and overly challenging to support in production code.
> >
> >*** There are three unsupported claims here. I find all of them suspect and
> >dependent on who is confused, who makes the standards, and who supports the
> >production code.
> >
> >Of course, the aforementioned "IF _N_=1 THEN SET" routine is an exception
> >that is well-documented and supported by SAS, and I would exclude it from
> >these 10 points.
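A minimal sketch of the IF _N_=1 THEN SET idiom mentioned above, assuming a hypothetical one-row data set TOTALS and a data set DETAIL: the single row is read once, and because variables read by SET are retained, GRAND is available on every detail row.

```sas
/* Hypothetical inputs: one-row TOTALS and a three-row DETAIL */
data totals ;
  grand = 10 ;
run ;

data detail ;
  do amount = 2, 3, 5 ; output ; end ;
run ;

data pct ;
  if _n_ = 1 then set totals ;  /* read the single row once only      */
  set detail ;                  /* the step ends when DETAIL runs out */
  pct = amount / grand ;        /* GRAND is retained on every row     */
run ;
```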
> >
> >And, I suppose, someone could come up with a practical use for these issues
> >and call it a "feature" ...
> >
> >However, I recommend avoiding it.
> >
> >*** Recommending avoidance of a class of SAS programs restricts what kinds
> >of programs can be written and consequently the programming ability of
> >anyone following that recommendation, hence I consider it very important to
> >present evidence when making such a claim. I also see it as an important
> >obligation to point out when I disagree with such claims.
> >
>
> The more I read through your note, the more it seemed to me that you felt I
> was either making a personal jab at you or that you wanted to destroy
> anything that I said, as though it were some kind of personal vendetta. I
> didn't mean to start a fight, just to prevent beginners from running before
> they could walk.
>
> I've found some neat ideas here, and I'd forgotten about the _IORC_ feature
> with two SET statements. However, for most of the remaining examples that
> I've seen, I cannot think of a practical application where I would use
> them, and I eagerly await other such input.
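The _IORC_ technique mentioned above usually refers to a keyed lookup: a second SET with KEY= reads the matching observation through an index, and _IORC_ reports whether it was found. A minimal sketch, assuming hypothetical data sets TRANS and LOOKUP with a simple index on ID; %SYSRC(_SOK) is the autocall macro for the success return code.

```sas
/* Hypothetical lookup table with a simple index on ID */
data lookup (index = (id)) ;
  length desc $ 10 ;
  id = 1 ; desc = 'one'   ; output ;
  id = 3 ; desc = 'three' ; output ;
run ;

/* Hypothetical transactions with IDs 1-3 */
data trans ;
  do id = 1 to 3 ; output ; end ;
run ;

data matched unmatched ;
  set trans ;
  set lookup key = id ;        /* indexed read for the current ID */
  if _iorc_ = %sysrc(_sok) then output matched ;
  else do ;
    _error_ = 0 ;              /* a missing key is expected here    */
    call missing(desc) ;       /* clear DESC left from a prior read */
    output unmatched ;
  end ;
run ;
```

With these inputs, IDs 1 and 3 go to MATCHED and ID 2 goes to UNMATCHED with DESC missing.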
>
> Cordially,
>
> Paul McDonald
> SPIKEware, Inc.
> ------------------------------
> Free SAS Tutorials and Newsletter
> http://www.spikeware.com/