LISTSERV at the University of Georgia
Date:         Fri, 7 Aug 1998 14:09:08 -0700
Reply-To:     "Self, Karsten" <kself@VISA.COM>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         "Self, Karsten" <kself@VISA.COM>
Subject:      Re: FUZZY Matching
Comments: To: "msz03@health.state.ny.us" <msz03@health.state.ny.us>
Content-Type: text/plain

I don't believe numeric-to-alpha coding followed by blocking via Soundex would be effective. Soundex classifies English-language words by phonetic components. It doesn't translate well to non-English words, let alone random or arbitrary alpha patterns.

Summing digits blocks fairly effectively (values from 0 to 81 with 9 digits), and I've been playing with this. Once I've blocked values, I can run a SPEDIS comparison on the original numeric strings -- this works because SPEDIS is a transform-cost function, and isn't specific to alphabetic codings, phonetics, linguistic frequency or probability. The usual problem of interpreting the SPEDIS score and deciding on a specific cutoff applies.
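
A minimal sketch of the block-then-compare idea above, in Python rather than SAS. Plain Levenshtein edit distance stands in for SPEDIS here; it is not the SPEDIS cost model, only another transform-cost measure. The names digit_sum, edit_distance and candidate_pairs, and the max_distance cutoff, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def digit_sum(value):
    """Blocking key: sum of the digits (0 to 81 for a 9-digit string)."""
    return sum(int(ch) for ch in value if ch.isdigit())


def edit_distance(a, b):
    """Levenshtein distance, used here as a stand-in for SPEDIS."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def candidate_pairs(values, max_distance=2):
    """Compare only values that share a digit-sum block."""
    blocks = defaultdict(list)
    for v in values:
        blocks[digit_sum(v)].append(v)
    pairs = []
    for members in blocks.values():
        for a, b in combinations(members, 2):
            d = edit_distance(a, b)
            if d <= max_distance:  # assumed cutoff; choosing it is the hard part
                pairs.append((a, b, d))
    return pairs


if __name__ == "__main__":
    ids = ["123456789", "123465789", "987654321", "111111111"]
    for pair in candidate_pairs(ids):
        print(pair)
```

Note that a digit-sum key keeps transposed digits in the same block, but a single substituted digit changes the sum and so lands the value in a different block.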

For those who've suggested going with a commercial solution, the current problem is that my client *has* chosen a commercial solution (one of the best), and is trying to evaluate its effectiveness. Hence the independent analysis.

-- Karsten M. Self (kself@visa.com) Trilogy Consulting

What part of "Gestalt" don't you understand?

> ----------
> From: msz03@health.state.ny.us[SMTP:msz03@health.state.ny.us]
> Sent: Friday, August 07, 1998 6:37 AM
> To: Self, Karsten
> Subject: Re: FUZZY Matching
>
> Hi. I haven't done blocking with a SPEDIS-transformed variable. I read
> your note on SSN matching. It's just a random thought, but I wonder if you
> just assigned a consonant to each digit and generated a SOUNDEX how it
> would work out?

