Date: Thu, 18 Aug 2011 13:44:18 -0400
Reply-To: "Bian, Haikuo" <HBian@FLQIO.SDPS.ORG>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Bian, Haikuo" <HBian@FLQIO.SDPS.ORG>
Subject: Re: outer join mishap
In-Reply-To: <201108181640.p7IESlTA014312@waikiki.cc.uga.edu>
Content-Type: text/plain; charset="us-ascii"
Robert,
It happened once to me when your condition variables contain lots of missing values: in your case, sai and eff_dt. Please see below:
data a;
infile cards missover;
input a b ;
cards;
1 2
2
3
;
data b;
infile cards missover;
input a b ;
cards;
1 2
2
3
;
proc sql;
select * from b right join a
on a.b=b.b
;quit;
SAS does not know which one to match, as they are all "equal" missing values, so SAS ends up matching them all.
Regards,
Haikuo
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Robert Feyerharm
Sent: Thursday, August 18, 2011 12:40 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: outer join mishap
I'm having some problems performing a simple outer join in proc sql.
I'm trying to link two insurance claims datasets by policy number and
policy date. I'm running the following code:
proc sql;
create table merged_sai_effdt as
select *, (a.sai=b.sai and a.eff_dt=b.eff_dt) as ind,
(a.sai=b.sai) as sai_ind,
(a.eff_dt=b.eff_dt) as effdt_ind
from all_markets_valid_nomiss as a
right join natl_accts_valid_nomiss as b
on a.sai=b.sai and a.eff_dt=b.eff_dt;
quit;
The all_markets_valid_nomiss dataset contains 112,575 records, and the
natl_accts_valid_nomiss dataste contains 112,683 records. Yet when I
perform the right join, I end up with a merged dataset containing
20,751,797 records, when the end result should be a right joined dataset
containing 112,683 records. There are no missing values for the linking
variables in either dataset, and the linking variables share the same type
in both datasets.
Any ideas what's going wrong here?
Thanks for your help!
Robert
-----------------------------------------
Email messages cannot be guaranteed to be secure or error-free as
transmitted information can be intercepted, corrupted, lost,
destroyed, arrive late or incomplete, or contain viruses. The
Centers for Medicare & Medicaid Services therefore does not accept
liability for any error or omissions in the contents of this
message, which arise as a result of email transmission.
CONFIDENTIALITY NOTICE: This communication, including any
attachments, may contain confidential information and is intended
only for the individual or entity to which it is addressed. Any
review, dissemination, or copying of this communication by anyone
other than the intended recipient is strictly prohibited. If you
are not the intended recipient, please contact the sender by reply
email and delete and destroy all copies of the original message.
|