Date: Mon, 21 Oct 2002 16:46:58 -0400
Reply-To: Howard_Schreier@ITA.DOC.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Howard_Schreier@ITA.DOC.GOV
Subject: Re: "safe" merging?
I think the distinction Brad makes is reasonable, but that the observed
behavior may be a matter of software engineering.
Here's a minimal illustration. Start with two datasets:
data a; retain key 1; run;
data b; retain key 1; run;
Now join them:
proc sql;
select * from a, b where a.key=b.key;
quit;
There is no warning message. The SELECT is perfectly happy to present two
columns with the same name.
Now create an output dataset instead of a listing:
proc sql;
create table c as
select * from a, b where a.key=b.key;
quit;
The log includes the following
WARNING: Variable key already exists on file WORK.C.
So it's the CREATE TABLE which objects to the name collision.
Brad suggests that it should be smart enough to see that the two columns
named KEY were invoked in an equality condition in an inner join, and thus
must have identical values, so that losing one is not a cause for concern.
My theory is that the code which does the CREATE TABLE business cannot see
that deeply into the details of the SELECT.
On Mon, 21 Oct 2002 14:13:12 -0400, Goldman, Brad (AT-Atlanta)
<Brad.Goldman@AUTOTRADER.COM> wrote:
>> From: Ian Whitlock [mailto:WHITLOI1@WESTAT.com]
>> Brad,
>> I suspect that the real problem is the weak way SAS handles
>> error messaging.
>> If you ask for
>>
>> select a.* , b.*
>>
>> This means you want all of the variables from A and B.
>>
>> Thus it seems reasonable to inform you that your request will not be
>> fulfilled. On the other hand, WHEN YOU KNOW WHAT YOU ARE
>> DOING, it would be
>> nice to have a method of saying to SAS, "I am in control and
>> everything is
>> fine, don't report your little problem", or more generally "I
>> am in control
>> and I think the following action should be taken". In this
>> case you happen
>> to know information that SAS cannot presently know. Hence it
>> is impossible
>> to be right by insisting on the message or by insisting it
>> not be there.
>
>Ian, in your example, I think it is appropriate to have the warning.
>However, there is a difference in our examples that is detectable by SAS
>without resorting to telepathy. In my example, the variable being warned
>about is in the where statement, "where a.key=b.key". I imagine someone
>will dream up some counterexamples, but to me it seems that if the variable
>is being used on both sides of an equals sign in the where clause, then it
>is OK and expected and fine that it exists on both datasets, and I can't
see
>how it could do any harm. If that condition is not true, then the warning
>is appropriate.
>
>Brad