Date: Fri, 7 Aug 1998 14:09:08 -0700
Reply-To: "Self, Karsten" <kself@VISA.COM>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: "Self, Karsten" <kself@VISA.COM>
Subject: Re: FUZZY Matching
Content-Type: text/plain
I don't believe numeric to alpha coding then blocking via soundex would
be effective. Soundex classifies English-language words by phonetic
components. It doesn't translate well to non-English words, let alone
random or arbitrary alpha patterns.
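One way to see the concern, as a rough Python sketch rather than anything run against the real data: standard Soundex has only six consonant classes and keeps just the first letter plus three codes. The digit-to-consonant table below (DIGIT_TO_CONSONANT) and the helper soundex_of_digits are hypothetical, chosen only for illustration. Any ten-consonant assignment has to share classes.

```python
# Standard American Soundex consonant classes (six in total).
SOUNDEX_CODES = {
    **dict.fromkeys("BFPV", "1"),
    **dict.fromkeys("CGJKQSXZ", "2"),
    **dict.fromkeys("DT", "3"),
    "L": "4",
    **dict.fromkeys("MN", "5"),
    "R": "6",
}

# Hypothetical mapping: digit d becomes DIGIT_TO_CONSONANT[d].
DIGIT_TO_CONSONANT = "BCDFGKLMNR"


def soundex(word: str) -> str:
    """First letter plus up to three class codes, zero-padded to four."""
    word = word.upper()
    if not word:
        return ""
    result = word[0]
    prev = SOUNDEX_CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "HW":
            continue  # H and W do not separate repeated codes
        code = SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeat after a vowel counts
    return (result + "000")[:4]


def soundex_of_digits(number: str) -> str:
    """Recode each digit as a consonant, then take the Soundex code."""
    letters = "".join(DIGIT_TO_CONSONANT[int(d)] for d in number)
    return soundex(letters)


print(soundex_of_digits("123456789"))  # C312
print(soundex_of_digits("123499999"))  # C312
```

Under this mapping the two numbers share a code even though their last five digits differ, because everything past the first four distinct classes is discarded.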
Summing the digits blocks fairly effectively (sums range from 0 to 81 for
a 9-digit value), and I've been playing with this. Once I've blocked values, I
can run a SPEDIS comparison on the original numeric strings -- this
works because SPEDIS is a transform-cost function, and isn't specific to
alphabetic codings, phonetics, linguistic frequency or probability. The
usual problem of interpreting the SPEDIS score and deciding on a
specific cutoff applies.
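A minimal sketch of the block-then-compare idea, in Python rather than SAS. Plain Levenshtein distance stands in for SPEDIS here; it is a different score, not a reimplementation, and the function names and the max_distance cutoff are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def digit_sum(value: str) -> int:
    """Blocking key: sum of the decimal digits (0..81 for nine digits)."""
    return sum(int(ch) for ch in value if ch.isdigit())


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance, used as a stand-in for SPEDIS."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1]


def candidate_pairs(values, max_distance=2):
    """Compare only values that share a digit-sum block; keep close pairs."""
    blocks = defaultdict(list)
    for v in values:
        blocks[digit_sum(v)].append(v)
    pairs = []
    for members in blocks.values():
        for a, b in combinations(members, 2):
            d = edit_distance(a, b)
            if d <= max_distance:
                pairs.append((a, b, d))
    return pairs


values = ["123456789", "123465789", "987654321", "123456780"]
print(candidate_pairs(values))  # [('123456789', '123465789', 2)]
```

The transposed pair shares digit sum 45 and scores 2, while the reversed string lands in the same block but scores far above the cutoff. The last value differs from the first by one digit yet sums to 36, so it is never compared: a digit sum is unchanged by transpositions but not by substitutions.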
For those who've suggested going with a commercial solution, the current
problem is that my client *has* chosen a commercial solution (one of the
best), and is trying to evaluate its effectiveness. Hence the
independent analysis.
--
Karsten M. Self (kself@visa.com)
Trilogy Consulting
What part of "Gestalt" don't you understand?
> ----------
> From: msz03@health.state.ny.us[SMTP:msz03@health.state.ny.us]
> Sent: Friday, August 07, 1998 6:37 AM
> To: Self, Karsten
> Subject: Re: FUZZY Matching
>
> Hi. I haven't done blocking with a SPEDIS-transformed variable. I read
> your note on SSN matching. It's just a random thought, but I wonder if you
> just assigned a consonant to each digit and generated a SOUNDEX how it
> would work out?
>
>