| Date: | Wed, 18 Jun 2003 21:49:09 GMT |
| Reply-To: | "Timothy W. Victor" <tvictor@DOLPHIN.UPENN.EDU> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | "Timothy W. Victor" <tvictor@DOLPHIN.UPENN.EDU> |
| Organization: | University of Pennsylvania |
| Subject: | Re: Actively seeking algorithm to compare the "likeness" of two character |
| Content-Type: | text/plain; charset=us-ascii; format=flowed |
Susie,
Computational geneticists use edit distances to do what you're
interested in: an edit distance counts the insertions, deletions, and
substitutions needed to turn one string into another. Bill Winkler at
the Census Bureau has written about their statistical properties and
has incorporated them into his work on record linkage.
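
As a rough sketch of the idea, here it is in Python rather than SAS. It
assumes a plain Levenshtein distance and a tiny, made-up abbreviation
table (ABBREVIATIONS, normalize and likeness are hypothetical names, not
anything from Winkler's work), so treat it as an illustration and not a
finished address matcher:

```python
import re

# Hypothetical abbreviation table; a real one would be much longer, and
# "st" can also mean "saint", so this mapping is only an assumption.
ABBREVIATIONS = {
    "st": "street", "ave": "avenue",
    "n": "north", "s": "south", "e": "east", "w": "west",
}

def normalize(address):
    """Lowercase, drop punctuation, and expand a few common abbreviations."""
    tokens = re.sub(r"[^a-z0-9\s]", "", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)

def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep) ca
            ))
        previous = current
    return previous[-1]

def likeness(a, b):
    """Similarity in [0, 1]: 1 minus the distance between the normalized
    strings, scaled by the length of the longer one."""
    a, b = normalize(a), normalize(b)
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
```

With this, likeness("300 N Shore St", "300 N. Shore Street") comes out
as 1.0 because both normalize to the same string, while two genuinely
different addresses score lower; where to put the cutoff is a judgment
call for your data. In SAS itself, I believe the built-in SPEDIS
function, and COMPLEV and COMPGED in SAS 9, compute similar distances
directly.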
Best,
Tim
Susie Li wrote:
> I do a fair amount of string comparisons, mainly between pairs of
> addresses.
>
> I'd like to know if there is efficient SAS code out there that would
> help me identify whether there is a distinct difference between the
> two addresses (rather than some cosmetic difference like: 300 N Shore
> St vs 300 N. Shore Street). I'm tired of exact matches.
>
> I think it may involve categorical clustering. But I'm not sure.
>
> Susie Li
> Sanofi-Synthelabo, Inc.
> 90 Park Ave
> New York, NY 10016
> (212)551-4385
> susie.li@us.sanofi.com