LISTSERV at the University of Georgia
=========================================================================
Date:         Mon, 17 Jul 2006 14:28:03 -0400
Reply-To:     Richard Ristow <wrristow@mindspring.com>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Richard Ristow <wrristow@mindspring.com>
Subject:      Re: FW: Identifying cases that almost match
Comments: To: "Snider-Lotz, Tom" <tsnider-lotz@previsor.com>
In-Reply-To:  <AEA6D84B49CB764DA4CE5DC54FD7E07F01340B77@qwizmail.previsor.com>
Content-Type: text/plain; charset=us-ascii; format=flowed;
              x-avg-checked=avg-ok-7C921353

At 06:23 PM 7/16/2006, Snider-Lotz, Tom wrote:

>I'm trying to identify cases that may belong to the same individuals,
>even though their name might be entered slightly differently in the
>different records (e.g., Ben Jones and Benjamin Jones). It just
>occurred to me that I can easily solve my problem by using the
>Duplicate Cases utility to find duplicates for the variable
>ShortWholeName that I've created via the syntax.
>
>String ShortWholeName (a30).
>Compute ShortWholeName = Concat (RTRIM(Lname), ", ",
>SUBSTR(Fname,1,3)).

That's more or less how you do it: create a key that's broader - more permissive about matching - than is the one you're having trouble with.
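
For readers who want to see the idea outside SPSS, here is a minimal sketch in Python of the same broader key, followed by a grouping step that stands in for the Duplicate Cases utility. The function names, the "id" field, and the sample records are made up for illustration; only Lname, Fname and the key definition come from the syntax quoted above.

```python
from collections import defaultdict


def short_whole_name(lname: str, fname: str) -> str:
    """Mirror the quoted key: right-trimmed last name, ", ", first 3 letters of first name."""
    return lname.rstrip() + ", " + fname[:3]


def candidate_groups(records):
    """Group records by the permissive key; keep only keys shared by 2+ records."""
    groups = defaultdict(list)
    for rec in records:
        groups[short_whole_name(rec["Lname"], rec["Fname"])].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


# Invented sample data, for illustration only.
records = [
    {"id": 1, "Lname": "Jones", "Fname": "Ben"},
    {"id": 2, "Lname": "Jones", "Fname": "Benjamin"},
    {"id": 3, "Lname": "Smith", "Fname": "Robert"},
    {"id": 4, "Lname": "Smith", "Fname": "Robin"},
    {"id": 5, "Lname": "Ristow", "Fname": "Richard"},
]

dups = candidate_groups(records)
for key, recs in sorted(dups.items()):
    print(key, [r["id"] for r in recs])
```

With this data the two Jones records group together as intended, and so do Robert and Robin Smith, which is the kind of false match a permissive key lets through. Each group is a candidate for review, not a confirmed duplicate.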

There's no magic. You risk false matches, though your key is pretty strict and won't produce many. "Robert" will match "Robin", and "Samuel" will match "Samantha". But requiring an exact match on the last name will eliminate most of those. (Worst likely case is siblings in families that like to use similar names for

You also risk false negatives, continuing to miss true matches. In your case, I'd worry more about that: "William" won't match "Bill", "Elizabeth" won't match "Betty", and any variation in spelling of the last name will spoil the match. (You may also find ambiguity about what name is the first. I'm "Walter Richard Ristow." You know me as "Richard Ristow", but occasional lists have me as "Walter.")
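
One possible way to recover some of those misses, sketched in Python, is to map known nicknames to a canonical first name before building the key. This is an assumption of mine, not something from the original syntax: the NICKNAMES table here is a tiny hypothetical stand-in, and a real one would have to be built or obtained and checked against your data. It does nothing for misspelled last names or for records that use a different given name altogether.

```python
# Hypothetical, deliberately tiny nickname table; a real one would be much larger.
NICKNAMES = {
    "bill": "william",
    "betty": "elizabeth",
    "ben": "benjamin",
}


def canonical_first(fname: str) -> str:
    """Lower-case the first name and replace a known nickname with its full form."""
    name = fname.strip().lower()
    return NICKNAMES.get(name, name)


def nickname_key(lname: str, fname: str) -> str:
    """Same shape as the 3-letter key, but built from the canonical first name."""
    return lname.strip().lower() + ", " + canonical_first(fname)[:3]


# Plain 3-letter prefixes differ ("Wil" vs "Bil"), so the simple key misses this pair.
print("William"[:3] == "Bill"[:3])
# After nickname mapping both records get the same key.
print(nickname_key("Smith", "William") == nickname_key("Smith", "Bill"))
```

Widening the key this way also widens the room for false matches, so the usual trade-off applies.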

Strategy depends on how big your file is, how much work it's worth investing, and how many keys you have; for example, you can look for people who match on address but not on name, if you have address.
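
As a rough illustration of the address idea, the Python sketch below lists pairs of records that share a normalized address but carry different names. The Address field, the "id" field, the normalization rules, and the sample records are all assumptions for the example; real address cleaning is considerably messier.

```python
from collections import defaultdict
from itertools import combinations


def normalize_address(addr: str) -> str:
    """Crude normalization: lower-case, drop periods, treat commas as spaces, squeeze blanks."""
    return " ".join(addr.lower().replace(".", "").replace(",", " ").split())


def same_address_different_name(records):
    """Return (id, id) pairs that agree on normalized address but not on name."""
    by_addr = defaultdict(list)
    for rec in records:
        by_addr[normalize_address(rec["Address"])].append(rec)
    pairs = []
    for recs in by_addr.values():
        for a, b in combinations(recs, 2):
            if (a["Lname"], a["Fname"]) != (b["Lname"], b["Fname"]):
                pairs.append((a["id"], b["id"]))
    return pairs


# Invented sample data, for illustration only.
records = [
    {"id": 1, "Lname": "Jones", "Fname": "William", "Address": "12 Elm St., Athens"},
    {"id": 2, "Lname": "Jones", "Fname": "Bill", "Address": "12 elm st Athens"},
    {"id": 3, "Lname": "Smith", "Fname": "Ann", "Address": "4 Oak Ave, Athens"},
]

print(same_address_different_name(records))
```

Pairs found this way are only candidates: two different people at one address (family members, roommates) will show up alongside true duplicates.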

That can be a long story, though, since you then need criteria for evaluating the quality - likelihood of being correct - of matches that meet various combinations of criteria. I did one of these, in SAS, some

