Date: Mon, 19 Jun 2006 12:34:02 -0400
Reply-To: "Rickards, Clinton (GE Consumer Finance)" <clinton.rickards@GE.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Rickards, Clinton (GE Consumer Finance)" <clinton.rickards@GE.COM>
Subject: Re: de-duping without a unique identifier
In-Reply-To: <D0EC0BFE19A0BF4D8BED3A902D5A6B510BC020@bdtex.bdtrust.local>
Jonathan,
Taking your request literally, I think something like the following will do the trick:
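/* Sort the master file by the match keys. */
proc sort data=master;
   by last first;
run;

/* Keep one record per (last, first) pair from the monthly file. */
proc sort data=monthly (keep=last first)
          out=monthly_nodups
          nodupkey;
   by last first;
run;

data new_master;
   merge master         (in=a)
         monthly_nodups (in=b);
   by last first;
   /* Choose ONE of these subsetting IFs; leave the other commented out.  */
   if a and not b;      /* keep master names that are NOT in monthly       */
   *if not (a and b);   /* keep names found in only one file: drops matches */
                        /* and appends new monthly names (LAST/FIRST only)  */
run;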
But I suspect you really want to be more selective and also handle misspellings, addresses, etc. If so, the questions become: how close is close enough to say that two records are identical? And what is your tolerance for error (false matches and false mismatches)?
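One way to make "close enough" concrete is to score candidate pairs with an edit distance and pick a cutoff. The sketch below is only an illustration under stated assumptions: it assumes a SAS 9 release with the COMPGED (generalized edit distance) and STRIP functions, requires last names to match exactly after ignoring case and surrounding blanks, and treats first names as the same when their COMPGED cost is at most a hypothetical threshold MAXCOST of 100, which under COMPGED's default costs roughly allows one inserted, deleted, or replaced character. The small MASTER and MONTHLY data sets are made-up sample data; drop those two steps to run it against real files.

```sas
/* Hypothetical sample data standing in for the real MASTER file. */
data master;
   length last first $ 20;
   input last $ first $;
   datalines;
Smith Jon
Jones Mary
Brown Alice
;
run;

/* Hypothetical sample data standing in for the real MONTHLY file. */
data monthly;
   length last first $ 20;
   input last $ first $;
   datalines;
SMITH John
Green Alice
;
run;

/* Hypothetical tolerance: max COMPGED cost for first names to match. */
%let maxcost = 100;

/* Keep master rows with no "close enough" counterpart in monthly:   */
/* last names equal ignoring case/blanks, and first names within     */
/* &maxcost generalized edit distance after upcasing.                */
proc sql;
   create table new_master as
   select m.*
   from master as m
   where not exists
      (select 1
       from monthly as u
       where upcase(strip(u.last)) = upcase(strip(m.last))
         and compged(upcase(strip(u.first)),
                     upcase(strip(m.first))) <= &maxcost);
quit;
```

Raising MAXCOST catches more misspellings but also removes more genuinely different people (false matches); lowering it does the reverse, which is the tolerance trade-off described above. Adding conditions on middle initial, address, or ZIP to the subquery's WHERE clause would be one way to tighten it.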
HTH,
Clint
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Jonathan Woodring
Sent: Monday, June 19, 2006 11:48 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: de-duping without a unique identifier
Hi SAS-L,
I want to remove records (names) in a master file if they are contained
in a monthly update file. Here's the rub: we do not have a unique
identifier to easily do this in both files. Instead, we have first name,
middle initial, last name, address1, address2, city, state, zip, zip4. I
want to 'de-dupe' the master list of names if the first name and last
name are exact matches. Any words of wisdom from the experts before
this amateur starts playing around? Thanks!
Jonathan