Date: Mon, 17 Aug 2009 20:49:34 -0400
Reply-To: Lou <lpogoda@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Lou <lpogoda@HOTMAIL.COM>
Organization: A noiseless patient Spider
Subject: Re: Merge with duplicates
"Ed A" <behrens900@gmail.com> wrote in message
news:f3f0d1f9-35e6-44c8-a4b7-bac4dd20ae75@j9g2000vbp.googlegroups.com...
> Hello,
>
> I am trying to do the following.
> -Compare two data sets and flag every entry that occurs in BOTH
> data sets. However, I only want to output the rows in the first data
> set -- so if something is in the second data set and not in the first,
> I don't want it.
>
>
> data set1;
> input name $ 1-16 num 17-20;
> datalines;
> Mark Williamson 111
> Joe Sanders 333
> John Doe 454
>
> ;
> data set2;
> input name $ 1-16 num 17-20;
> datalines;
> Mark Williamson 787
> Mark Williamson 333
> John Doe 454
> Mike Lee 444
> Andrew Johnson 999
> ;
> proc sort data = set1;
> by name;
> run;
> proc sort data = set2;
> by name;
> run;
proc sort data = set1;
by name num;
proc sort data = set2;
by name num;
data set1only set2only both;
merge set1 (in = set1)
set2 (in = set2);
by name num;
if set1 and set2 then output both;
if set1 and not set2 then output set1only;
if set2 and not set2 then output set2only;
run;
Print whichever output dataset you need.
> Data Matches;
> merge set1(in=in_set1) set2(in=in_set2);
> by name;
> if in_set2;
> if in_set1 = in_set2 then name_flag = 1;
> else name_flag = 0;
> if name = "" then name_flag = 0;
> run;
> proc sort data = set1;
> by num;
> run;
> proc sort data = Matches;
> by num;
> run;
> Data Matches2;
> merge set1(in=in_set1) matches(in=in_matches);
> by num;
> if in_matches;
> if in_set1 = in_matches then num_flag = 1;
> else num_flag = 0;
> if num=. then num_flag = 0;
> total_flags = num_flag + name_flag;
> run;
> proc print data = matches2;
> run;
>
>
>
> Ideal Output would be:
>
>
>
>
> Obs name num name_flag
> num_flag total_ flags
>
> 1 Mark Williamson 111
> 1 0 1
> 2 Joe Sanders 333
> 0 1 1
> 3 John Doe 454
> 1 1 2
>
>
> Actual Output is this:
> -I don't want any output from data set 2 that is not in data set 1.
>
>
> Obs name num name_flag
> num_flag total_ flags
>
> 1 Mark Williamson 333
> 1 1 2
> 2 Mike Lee 444
> 0 0 0
> 3 John Doe 454
> 1 1 2
> 4 Mark Williamson 787
> 1 0 1
> 5 Andrew Johnson 999
> 0 0 0
>
>
> I tried nodups and nodupkeys but those did not work.