LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (August 2009, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 20 Aug 2009 14:25:12 -0400
Reply-To:     Ian Whitlock <iw1sas@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Ian Whitlock <iw1sas@GMAIL.COM>
Subject:      Re: Merge with duplicates
Comments: cc: Ed A <behrens900@gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Summary: analysis of a matching problem with code to illustrate #iw-value=1

Ed,

Your question looks suspicious to me and I am not sure that you got the help you need.

Given the two data sets

data set1; input name $ 1-16 num 17-20; datalines; Mark Williamson 111 Joe Sanders 333 John Doe 454 ; data set2; input name $ 1-16 num 17-20; datalines; Mark Williamson 787 Mark Williamson 333 John Doe 454 Mike Lee 444 Andrew Johnson 999 ;

And the request to compare SET1 with SET2. My first thought would be when is there a match 1) by NAME NUM, 2) by NAME alone, 3) by NUM alone, and 4) by NAME and NUM independently (i.e. not together).

Your NAME_FLAG and NUM_FLAG when correctly coded, capture the information in two of the four matching possibilities I would want. Your TOTAL_FLAGS adds no new information not already contained in your previous flags.

Here is code to show the problem and a simple direct solution.

data set1; input name $ 1-16 num ; datalines; Mark Williamson 111 Joe Sanders 333 John Doe 454 Ed A 777 Ed B 666 ;

data set2; input name $ 1-16 num ; datalines; Mark Williamson 787 Mark Williamson 333 John Doe 454 Mike Lee 444 Andrew Johnson 999 Ed B 777 Ed A 666 ;

proc sort data = set2 ; by name num ; run ;

proc sql ; create table temp as select name , num , name in (select name from set2) as name_flag , num in (select num from set2) as num_flag from set1 order by name, num ; quit ;

data wanted ; merge temp (in = temp) set2 (in = both) ; by name num ; if temp ; match = both ; run ;

title "WANTED" ; proc print data = wanted ; run ;

Ian Whitlock ================

Date: Mon, 17 Aug 2009 11:44:12 -0700 From: Ed A <behrens900@GMAIL.COM> Subject: Merge with duplicates

Hello,

I am trying to do the following. -Compare two data sets and flag every entry that occurs in BOTH data sets. However, I only want to output the rows in the first data set -- so if something is in the second data set and not in the first, I don't want it.

data set1; input name $ 1-16 num 17-20; datalines; Mark Williamson 111 Joe Sanders 333 John Doe 454

; data set2; input name $ 1-16 num 17-20; datalines; Mark Williamson 787 Mark Williamson 333 John Doe 454 Mike Lee 444 Andrew Johnson 999 ; proc sort data = set1; by name; run; proc sort data = set2; by name; run; Data Matches; merge set1(in=in_set1) set2(in=in_set2); by name; if in_set2; if in_set1 = in_set2 then name_flag = 1; else name_flag = 0; if name = "" then name_flag = 0; run; proc sort data = set1; by num; run; proc sort data = Matches; by num; run; Data Matches2; merge set1(in=in_set1) matches(in=in_matches); by num; if in_matches; if in_set1 = in_matches then num_flag = 1; else num_flag = 0; if num=. then num_flag = 0; total_flags = num_flag + name_flag; run; proc print data = matches2; run;

Ideal Output would be:

Obs name num name_flag num_flag total_ flags

1 Mark Williamson 111 1 0 1 2 Joe Sanders 333 0 1 1 3 John Doe 454 1 1 2

Actual Output is this: -I don't want any output from data set 2 that is not in data set 1.

Obs name num name_flag num_flag total_ flags

1 Mark Williamson 333 1 1 2 2 Mike Lee 444 0 0 0 3 John Doe 454 1 1 2 4 Mark Williamson 787 1 0 1 5 Andrew Johnson 999 0 0 0

I tried nodups and nodupkeys but those did not work.


Back to: Top of message | Previous page | Main SAS-L page