Date: Fri, 7 Aug 1998 14:09:08 -0700
Reply-To: "Self, Karsten" <kself@VISA.COM>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: "Self, Karsten" <kself@VISA.COM>
Subject: Re: FUZZY Matching
I don't believe numeric-to-alpha coding followed by blocking via Soundex
would be effective. Soundex classifies English-language words by their
phonetic components. It doesn't translate well to non-English words, let
alone to random or arbitrary alpha patterns.
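As a rough illustration, here is a minimal Python sketch of standard American Soundex applied to the digit-to-consonant idea in the quoted note. The digit-to-letter table (DIGIT_TO_CONSONANT) is a hypothetical choice. The same limits apply to any consonant mapping: Soundex has only six consonant classes, so ten digits must share codes. It also collapses adjacent letters with the same code and keeps only the first letter plus three codes.

```python
# Standard American Soundex consonant classes.
SOUNDEX_CODES = {}
for letters, code in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                      ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        SOUNDEX_CODES[ch] = code

# Hypothetical digit-to-consonant table, chosen only for illustration.
DIGIT_TO_CONSONANT = dict(zip("0123456789", "BCDFGKLMNR"))


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code for word."""
    word = "".join(ch for ch in word.upper() if ch.isalpha())
    if not word:
        return ""
    first = word[0]
    result = [first]                      # first letter is kept as a letter
    prev = SOUNDEX_CODES.get(first, "")   # its code still suppresses repeats
    for ch in word[1:]:
        if len(result) == 4:              # first letter + three codes
            break
        if ch in "HW":
            continue                      # H/W are ignored and do not separate repeats
        code = SOUNDEX_CODES.get(ch, "")  # vowels (and Y) get no code
        if code and code != prev:
            result.append(code)
        prev = code                       # a vowel resets prev, so repeats after it count
    return "".join(result).ljust(4, "0")


def digits_to_soundex(digits: str) -> str:
    """Map each digit to a consonant, then take the Soundex of the result."""
    return soundex("".join(DIGIT_TO_CONSONANT[d] for d in digits))


if __name__ == "__main__":
    for ssn in ("123456789", "123499999", "100000000", "140000000", "150000000"):
        print(ssn, digits_to_soundex(ssn))
```

With this mapping, 123456789 and 123499999 both come out as C312. The numbers 100000000, 140000000 and 150000000 all come out as C100. The resulting key therefore effectively blocks on a few leading digits rather than on the whole number.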
Summing digits blocks fairly effectively (values from 0 to 81 with 9
digits), and I've been playing with this. Once I've blocked values, I
can run a SPEDIS comparison on the original numeric strings -- this
works because SPEDIS is a transform-cost function, and isn't specific to
alphabetic codings, phonetics, linguistic frequency or probability. The
usual problem of interpreting the SPEDIS score and deciding on a
specific cutoff applies.
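The block-then-compare approach can be sketched in Python along these lines. SPEDIS is a SAS function, so this sketch substitutes plain Levenshtein edit distance, with a cost of 1 per insertion, deletion or substitution. SPEDIS uses its own cost table, so its scores and any sensible cutoff will differ. The function names and the max_distance cutoff of 2 are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def digit_sum(ssn: str) -> int:
    """Blocking key: sum of the digits (0..81 for a 9-digit string)."""
    return sum(int(d) for d in ssn)


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (a stand-in for SPEDIS, not equivalent to it)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def candidate_pairs(ssns, max_distance=2):
    """Group strings by digit sum, then compare only within each block."""
    blocks = defaultdict(list)
    for s in ssns:
        blocks[digit_sum(s)].append(s)
    pairs = []
    for members in blocks.values():
        for a, b in combinations(members, 2):
            d = levenshtein(a, b)
            if d <= max_distance:
                pairs.append((a, b, d))
    return pairs


if __name__ == "__main__":
    data = ["123456789", "123456798", "987654321", "111111111"]
    print(candidate_pairs(data))
```

A single-digit substitution always changes the digit sum. This key therefore keeps transpositions such as 123456789 / 123456798 in the same block but sends a one-digit typo such as 123456780 to a different block. Also comparing against neighboring sums is one hypothetical way to widen the net.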
For those who've suggested going with a commercial solution, the current
problem is that my client *has* chosen a commercial solution (one of the
best), and is trying to evaluate its effectiveness. Hence the
Karsten M. Self (firstname.lastname@example.org)
What part of "Gestalt" don't you understand?
> From: email@example.com[SMTP:firstname.lastname@example.org]
> Sent: Friday, August 07, 1998 6:37 AM
> To: Self, Karsten
> Subject: Re: FUZZY Matching
> Hi. I haven't done blocking with a SPEDIS-transformed variable. I
> your note on SSN matching. It's just a random thought, but I wonder if
> you just assigned a consonant to each digit and generated a SOUNDEX, how
> it would work out?