=========================================================================
Date:         Wed, 19 Jul 2006 10:36:25 +0930
Reply-To:     "Barnett, Adrian (HEALTH)" <adrian.barnett@HEALTH.SA.GOV.AU>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         "Barnett, Adrian (HEALTH)" <adrian.barnett@HEALTH.SA.GOV.AU>
Subject:      Re: FW: Identifying cases that almost match
Comments: To: Ian Maddrell <Ian.Maddrell@pcwb.com>
Content-Type: text/plain; charset="us-ascii"

Hi Ian

-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Ian Maddrell
Sent: Tuesday, 18 July 2006 10:16
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: FW: Identifying cases that almost match

>What we would like to achieve is automated checking of names and addresses
>from the Syntax windows or some other automation method.
>The problem with the suggested solutions seem to be mistypes,
>if someone for example called Ben Jones was accidently inputted
>as Ben Jines (common input error due to neighbouring keys on the
>keyboard) this would not be flagged as a potential problem.
>This seems to rely on some form of algorithm to notice potential
>mistypes and then flag them.

What you could try is converting the names to codes with the SOUNDEX algorithm, then flagging any pairs of names whose SOUNDEX codes match but whose original strings are not identical. It is not foolproof, but it may get you close enough. There is an SPSS implementation of the SOUNDEX algorithm on Raynauld's site, www.spsstools.net.
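To make the idea concrete, here is a minimal sketch in Python rather than SPSS syntax. It is not the spsstools.net implementation; it assumes the common American SOUNDEX rules (first letter kept, consonants mapped to digits 1-6, adjacent equal codes collapsed, H and W ignored, result padded to four characters). The function names soundex and near_matches are just illustrative.

```python
# Letter-to-digit table for the common American SOUNDEX variant (assumed here).
_CODES = {}
for digit, letters in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for ch in letters:
        _CODES[ch] = str(digit)


def soundex(name: str) -> str:
    """Return the 4-character SOUNDEX code of name ('' if it has no letters)."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    out = [letters[0]]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return ("".join(out) + "000")[:4]


def near_matches(names):
    """Group distinct names that share a SOUNDEX code (hypothetical helper)."""
    by_code = {}
    for n in names:
        by_code.setdefault(soundex(n), set()).add(n)
    return {code: sorted(group) for code, group in by_code.items() if len(group) > 1}


if __name__ == "__main__":
    print(soundex("Jones"), soundex("Jines"))
    print(near_matches(["Jones", "Jines", "Smith", "Maddrell"]))
```

With this sketch, Jones and Jines get the same code, so the pair would be flagged. A typo in the first letter, or one that swaps a consonant for a vowel, would change the code and be missed, which is part of why the approach is not foolproof.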

There is a handy web-based implementation of SOUNDEX here: http://www.dropby.com/

(Look under Phonetic Encoders)

You could try it out on some sample mis-spellings to see if it does what you want.

The same site also has a NYSIIS calculator, which apparently has certain advantages over SOUNDEX, but you would need to write your own implementation.

These are just two of a wide variety of string-comparison methods. There is a list of other methods here: http://www.dcs.shef.ac.uk/~sam/stringmetrics.html

There are plenty of other sources that might have something useful if you Google "string similarity".
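As one example of what a string-similarity measure looks like, here is a rough Python sketch of Levenshtein (edit) distance, which counts the single-character insertions, deletions and substitutions needed to turn one string into another. Picking this particular measure is my own choice for illustration, and the function name is arbitrary; a small distance between two non-identical names suggests a possible mistype.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed one row at a time."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from '' to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ''
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or keep
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("Ben Jones", "Ben Jines"))
```

Unlike SOUNDEX, this also picks up a typo in the first letter, at the cost of comparing every pair of names.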

>I read the below site about the FEBRL project with great interest,
>does anyone have a link to the manual for the application
>as I cannot seem to locate the document on their site.

The documentation is with the software itself on the Sourceforge site: http://sourceforge.net/project/showfiles.php?group_id=62161

(it's down low on the page, below the link to the software)

Hope there is something useful for you amongst this lot

Regards

Adrian

--
Adrian Barnett
Research & Information Officer          Ph: +61 8 82266615
Research, Analysis and Evaluation       Fax: +61 8 82267088
Strategic Planning and Research Branch
Strategic Planning and Population Health Division
SA Department of Health
