Date: Mon, 19 Jun 2006 12:34:02 -0400
Reply-To: "Rickards, Clinton (GE Consumer Finance)"
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Rickards, Clinton (GE Consumer Finance)"
Subject: Re: de-duping without a unique identifier
Taking you literally, I think something like the following will do the trick:
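proc sort data=master;
  by last first;
run;
/* out= leaves the full monthly file intact; monthly_names is an arbitrary name */
proc sort data=monthly (keep=last first) out=monthly_names;
  by last first;
run;
data master_deduped; /* output name is arbitrary */
  merge master (in=a) monthly_names (in=b);
  by last first; /* uncomment one of these if conditions: */
  **if not (a and b); /* drops matches; keeps unmatched master records plus new monthly names */
  **if a and not b; /* keeps only master records with no match in monthly */
run;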
but I suspect you really want to be more selective and also handle misspellings, addresses, etc. If so, the questions become: how close is close enough to say that two records are identical? And what is your tolerance for error (false matches and false mismatches)?
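To make the "how close is close enough" question concrete, here is a rough, untested sketch of one way to flag near matches. It is only an illustration: it assumes both files carry variables named first, last and zip (zip is my guess at the variable name), it only compares records that share a zip code, and the cutoff of 100 on the COMPGED edit-distance score is an arbitrary starting point, roughly one inserted, deleted or replaced character under the default costs.

```sas
/* Sketch only: variable name zip, the cutoff of 100, and the    */
/* output name candidate_matches are assumptions, not tested code */
proc sql;
  create table candidate_matches as
  select m.first as master_first,
         m.last  as master_last,
         u.first as monthly_first,
         u.last  as monthly_last,
         /* generalized edit distance on normalized names */
         compged(upcase(strip(m.last)),  upcase(strip(u.last)))  as last_dist,
         compged(upcase(strip(m.first)), upcase(strip(u.first))) as first_dist
  from master as m, monthly as u
  where m.zip = u.zip  /* only compare records in the same zip code */
    and calculated last_dist + calculated first_dist <= 100;
quit;
```

The rows in candidate_matches are pairs to review, not confirmed duplicates. Lowering the cutoff reduces false matches at the cost of more false mismatches, and raising it does the reverse, so the right value depends on your tolerance for each kind of error.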
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU]On Behalf Of
Sent: Monday, June 19, 2006 11:48 AM
Subject: de-duping without a unique identifier
I want to remove records (names) in a master file if they are contained
in a monthly update file. Here's the rub: we do not have a unique
identifier to easily do this in both files. Instead, we have first name,
middle initial, last name, address1, address2, city, state, zip, zip4. I
want to 'de-dupe' the master list of names, if the first name and last
name are direct matches. Any words of wisdom from the experts before
this amateur starts playing around? Thanks!