LISTSERV at the University of Georgia
Date:         Wed, 8 Sep 2010 13:28:12 -0700
Reply-To:     "Raffe, Sydelle, SSA" <DRaffe@acgov.org>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         "Raffe, Sydelle, SSA" <DRaffe@acgov.org>
Subject:      Re: Name Normalization
Comments: To: "Barnett, Adrian (DECS)" <Adrian.Barnett2@SA.GOV.AU>
In-Reply-To:  <6080BB245C48A04BA756C983ED335EC175D91B0224@EMSCM012.sagemsmrd01.sa.gov.au>
Content-Type: multipart/alternative;

How about address normalization software?
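At its simplest, address normalization rewrites each address into one canonical spelling before records are compared. The following is a minimal illustration only, assuming Python. The suffix table (`SUFFIXES`) and the helper (`normalize_address`) are hypothetical, and this is not how any particular product works. Real standardizers use much larger tables, such as the street-suffix list in USPS Publication 28.

```python
import re

# Hypothetical, deliberately tiny abbreviation table (illustration only).
SUFFIXES = {
    "ST": "STREET",
    "AVE": "AVENUE",
    "AV": "AVENUE",
    "RD": "ROAD",
    "BLVD": "BOULEVARD",
    "DR": "DRIVE",
}

def normalize_address(address: str) -> str:
    """Uppercase, replace punctuation with spaces, collapse whitespace,
    and expand any token found in the suffix table."""
    cleaned = re.sub(r"[^\w\s]", " ", address.upper())
    tokens = cleaned.split()
    return " ".join(SUFFIXES.get(tok, tok) for tok in tokens)

print(normalize_address("123 Main St."))      # 123 MAIN STREET
print(normalize_address("123  main street"))  # 123 MAIN STREET
```

Both spellings end up identical, so an exact comparison on the normalized strings treats them as the same address.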

________________________________
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Barnett, Adrian (DECS)
Sent: Sunday, September 05, 2010 9:43 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: Name Normalization

Hi Kevan,

If your file has fewer than 10,000 records, you can use the free version of LinkageWiz to do the de-duplication for you. It has a built-in table of all the variations on each name (it refers to these as "nicknames"), which it uses as part of its fuzzy matching routine. It also uses a NYSIIS or Soundex match, and can optionally use a string similarity measure as well. Exact matches, nickname matches and phonetic matches each get different weights when the match score is computed.
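A rough sketch of the kind of scoring described above, assuming Python: a nickname lookup, then a phonetic comparison, with a different weight for each kind of agreement. Everything here is hypothetical and is not LinkageWiz's actual algorithm. The `NICKNAMES` table, the `WEIGHTS` values and the helpers `soundex`, `match_type` and `match_score` are illustrative names. The sketch uses American Soundex rather than NYSIIS only because Soundex is shorter to write out.

```python
# Hypothetical nickname table: variant -> canonical name.
NICKNAMES = {
    "bill": "william", "billy": "william", "willy": "william",
    "bob": "robert", "bobby": "robert", "rob": "robert",
    "robby": "robert", "robbie": "robert",
}

# Hypothetical weights; real tools estimate or tune these.
WEIGHTS = {"exact": 1.0, "nickname": 0.8, "phonetic": 0.5, "none": 0.0}

# American Soundex digit groups; vowels, Y, H and W have no code.
_CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        _CODES[ch] = digit

def soundex(name: str) -> str:
    """American Soundex: first letter plus three digits, zero-padded."""
    name = "".join(ch for ch in name.lower() if ch.isalpha())
    if not name:
        return ""
    result = name[0].upper()
    prev = _CODES.get(name[0], "")
    for ch in name[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "hw":  # H and W do not separate equal codes; vowels do
            prev = code
    return (result + "000")[:4]

def canonical(name: str) -> str:
    """Map a nickname to its canonical form; unknown names pass through."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)

def match_type(a: str, b: str) -> str:
    """Classify agreement as exact, nickname, phonetic or none."""
    a_key, b_key = a.strip().lower(), b.strip().lower()
    if a_key == b_key:
        return "exact"
    if canonical(a_key) == canonical(b_key):
        return "nickname"
    if soundex(a_key) == soundex(b_key):
        return "phonetic"
    return "none"

def match_score(a: str, b: str) -> float:
    return WEIGHTS[match_type(a, b)]

print(match_type("Bob", "Robert"))     # nickname
print(match_type("Robert", "Rupert"))  # phonetic (both R163)
print(match_type("Bill", "Robert"))    # none
```

A real linkage tool would typically combine scores like this across several fields (surname, date of birth, address) and tune the weights rather than hard-coding them.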

You can find out more at http://www.linkagewiz.com/

If you have more than 10,000 cases, I'd recommend looking at FEBRL, which is free. You can find out more about FEBRL here: http://datamining.anu.edu.au/software/febrl/febrldoc/

What you are trying to do can in principle be done in SPSS (sort of), but it would be very hard to do it well, and would probably take more time than you have.

Adrian Barnett

Project Officer

Educational Measurement and Analysis

Data and Educational Measurement

DECS

ph 82261080

________________________________
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Edwards, Kevan (MDH)
Sent: Friday, 3 September 2010 4:40 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Name Normalization

Hello all...

Is anyone aware of a process, or a data file, that can be used to normalize first names?

My goal is to de-duplicate a data file that was assembled from several data sources by converting all instances of Bill, Billy, Willy and William to William, and all instances of Rob, Bob, Bobby, Robby, Robbie and Robert to Robert.

I envision using IF/THEN syntax that refers to a data file with two variables: the first holding each specific variant of a first name, and the second holding the normalized (standardized) form of that name.

However, I need a data file that lists common variations of first names together with their normalized versions, and I haven't been able to find one to help automate this process.
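The lookup approach described above can be sketched as follows, assuming Python rather than SPSS syntax. The two-variable lookup file is stood in for by an inline CSV with hypothetical contents. The match key (normalized first name, last name, date of birth) is an assumption about which fields the combined file contains, and `load_lookup`, `normalize_first_name` and `deduplicate` are illustrative names. In SPSS itself, a table lookup (MATCH FILES with a /TABLE subcommand, both files sorted by the name variable) is usually less work than a long chain of IF statements.

```python
import csv
import io

# Hypothetical stand-in for the two-variable lookup file (variant, normalized).
LOOKUP_CSV = """variant,normalized
bill,William
billy,William
willy,William
william,William
rob,Robert
bob,Robert
bobby,Robert
robby,Robert
robbie,Robert
robert,Robert
"""

def load_lookup(text: str) -> dict[str, str]:
    """Read the lookup table into a dict keyed by lowercase variant."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["variant"].strip().lower(): row["normalized"].strip()
            for row in reader}

def normalize_first_name(name: str, lookup: dict[str, str]) -> str:
    # Names missing from the table are kept, just trimmed and title-cased.
    return lookup.get(name.strip().lower(), name.strip().title())

def deduplicate(records, lookup):
    """Keep the first record for each (normalized first name, last name, DOB) key."""
    seen = set()
    unique = []
    for first, last, dob in records:
        key = (normalize_first_name(first, lookup), last.strip().lower(), dob)
        if key not in seen:
            seen.add(key)
            unique.append((first, last, dob))
    return unique

lookup = load_lookup(LOOKUP_CSV)
records = [  # hypothetical example rows
    ("Bill", "Smith", "1970-01-02"),
    ("William", "Smith", "1970-01-02"),
    ("Bobby", "Jones", "1985-06-30"),
    ("Robert", "Jones", "1985-06-30"),
    ("Robert", "Jones", "1990-03-15"),
]
print(deduplicate(records, lookup))
```

One design caution: a nickname can map to more than one formal name (for example, "Rob" may be short for Robin as well as Robert), so a one-to-one lookup will occasionally merge records that should stay separate.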

Thanks.

Kevan

----------------------------
Kevan Edwards Ph.D.
Research Scientist III
Health Economics Program, DHP/MDH
651-201-3551



