=========================================================================
Date: Wed, 19 Jul 2006 10:36:25 +0930
Reply-To: "Barnett, Adrian (HEALTH)" <adrian.barnett@HEALTH.SA.GOV.AU>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: "Barnett, Adrian (HEALTH)" <adrian.barnett@HEALTH.SA.GOV.AU>
Subject: Re: FW: Identifying cases that almost match
Content-Type: text/plain; charset="us-ascii"
Hi Ian
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Ian Maddrell
Sent: Tuesday, 18 July 2006 10:16
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: FW: Identifying cases that almost match
>What we would like to achieve is automated checking of names and
>addresses from the Syntax windows or some other automation method.
>The problem with the suggested solutions seems to be mistypes:
>if someone called, for example, Ben Jones was accidentally entered
>as Ben Jines (a common input error due to neighbouring keys on the
>keyboard), this would not be flagged as a potential problem.
>This seems to rely on some form of algorithm to notice potential
>mistypes and then flag them.
What you could try is converting the names via the SOUNDEX algorithm,
and checking for any pairs that match on their SOUNDEX codes but do not
match as exact strings. Not foolproof, but it may get you close
enough.
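
To make the idea concrete, here is a rough sketch in Python (not SPSS
syntax, and not the implementation on Raynauld's site). It assumes the
standard American SOUNDEX rules; the function names soundex and
soundex_near_matches are just illustrative names I made up.

```python
from itertools import combinations

# Standard American Soundex digit groups (assumed variant).
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the 4-character Soundex code, or "" if there are no letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels and Y get no code
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]  # pad with zeros, truncate to 4 characters


def soundex_near_matches(names):
    """Pairs that share a Soundex code but are not identical strings."""
    pairs = []
    for a, b in combinations(names, 2):
        if a.lower() != b.lower() and soundex(a) == soundex(b):
            pairs.append((a, b))
    return pairs


print(soundex_near_matches(["Jones", "Jines", "Smith"]))
```

With these rules, Jones and Jines both code to J520, so the pair is
flagged, while Smith (S530) is not. Comparing every pair is fine for a
small sample; for a large file you would group records by code instead.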
There is an implementation of the SOUNDEX algorithm in SPSS on
Raynauld's site www.spsstools.net.
There is a handy web-based implementation of SOUNDEX here:
http://www.dropby.com/
(Look under Phonetic Encoders)
You could try it out on some sample mis-spellings to see if it does what
you want.
The same site also has a NYSIIS calculator, which apparently has certain
advantages over SOUNDEX, but you would need to write your own
implementation.
These are just two of a big variety of string-comparison methods. There
is a list of other methods available here:
http://www.dcs.shef.ac.uk/~sam/stringmetrics.html
There are lots of other sources which might have something useful if you
Google "string similarity".
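
As one example of such a method, edit (Levenshtein) distance counts the
single-character insertions, deletions and substitutions needed to turn
one string into another, so a one-key mistype scores 1. A minimal sketch
in Python follows; the name edit_distance and any cut-off you choose for
"almost matching" are my assumptions, not something from the sites above.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


print(edit_distance("Jones", "Jines"))
```

Unlike SOUNDEX, this also catches a mistyped first letter (Jones vs
Hones), at the cost of comparing full strings rather than short codes.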
>I read the below site about the FEBRL project with great interest,
>does anyone have a link to the manual for the application
>as I cannot seem to locate the document on their site.
The documentation is with the software itself on the Sourceforge site:
http://sourceforge.net/project/showfiles.php?group_id=62161
(it's down low on the page, below the link to the software)
Hope there is something useful for you amongst this lot
Regards
Adrian
--
Adrian Barnett
Research & Information Officer       Ph:  +61 8 82266615
Research, Analysis and Evaluation    Fax: +61 8 82267088
Strategic Planning and Research Branch
Strategic Planning and Population Health Division
SA Department of Health