| Date: | Tue, 17 Jun 2003 09:58:24 -0400 |
| Reply-To: | Ed Heaton <EdHeaton@WESTAT.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Ed Heaton <EdHeaton@WESTAT.COM> |
| Subject: | Re: Actively seeking algorithm to compare the "likeness" of two character strings |
|
| Content-Type: | text/plain |
|---|
Susie,
If you are looking for something pre-written, look at the SPEDIS (SPElling
DIStance) function. It returns values from 0 (an exact match) to 200, with
200 being the poorest match. It is basically a measure of the work needed to
transform one string into the other. It is not commutative: SPEDIS(x,y) need
not equal SPEDIS(y,x). E.g.:
Data _null_ ;
   Length x y $20 ;
   /* f = forward distance SPEDIS(x,y); b = backward distance SPEDIS(y,x) */
   x="Susie Li"; y="SUSIE LI";  f=speDis(x,y); b=speDis(y,x); Put (_all_)(=) ;
   x="Susie Li"; y="Sue Li";    f=speDis(x,y); b=speDis(y,x); Put (_all_)(=) ;
   x="Susie Li"; y="Li, Susie"; f=speDis(x,y); b=speDis(y,x); Put (_all_)(=) ;
   x="Susie Li"; y="Ed Heaton"; f=speDis(x,y); b=speDis(y,x); Put (_all_)(=) ;
Run ;
writes the following to the log:
x=Susie Li y=SUSIE LI f=62 b=62
x=Susie Li y=Sue Li f=25 b=16
x=Susie Li y=Li, Susie f=87 b=81
x=Susie Li y=Ed Heaton f=106 b=92
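Because the differences Susie describes are cosmetic (case, punctuation, abbreviations), one option is to normalize both addresses before calling SPEDIS. The following is a minimal sketch under stated assumptions, not a production routine: the data set name addrCompare, the variables norm1, norm2, rawDist and normDist, and the three-entry abbreviation table are all hypothetical. Since SPEDIS is not commutative, the sketch also keeps the smaller of the two directions as a single symmetric score; that choice is an assumption, and averaging the two directions would be an equally reasonable alternative.

```sas
/* Sketch only: the abbreviation table below is a hypothetical example, */
/* not a complete address standard.                                      */
Data addrCompare ;
   Length addr1 addr2 norm1 norm2 $80 ;
   Infile datalines dlm='|' truncover ;
   Input addr1 addr2 ;
   Array raw{2} addr1 addr2 ;
   Array nrm{2} norm1 norm2 ;
   Do i = 1 to 2 ;
      /* Upper-case, and turn periods and commas into blanks */
      nrm{i} = translate(upcase(raw{i}), '  ', '.,') ;
      /* Collapse runs of blanks, then pad both ends so whole-word */
      /* replacements also work on the first and last words        */
      nrm{i} = ' ' || strip(compbl(nrm{i})) || ' ' ;
      /* Hypothetical abbreviation table: extend to suit your data */
      nrm{i} = tranwrd(nrm{i}, ' STREET ', ' ST ') ;
      nrm{i} = tranwrd(nrm{i}, ' AVENUE ', ' AVE ') ;
      nrm{i} = tranwrd(nrm{i}, ' NORTH ', ' N ') ;
      nrm{i} = strip(nrm{i}) ;
   End ;
   /* SPEDIS is not commutative: score both directions, keep the smaller */
   rawDist  = min(spedis(addr1, addr2), spedis(addr2, addr1)) ;
   normDist = min(spedis(norm1, norm2), spedis(norm2, norm1)) ;
   Drop i ;
datalines;
300 N Shore St|300 N. Shore Street
300 N Shore St|310 N Shore St
;
Run ;
```

After normalization the first pair compares as identical (normDist=0), while the house-number change in the second pair still registers as a nonzero distance.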
Ed
Edward Heaton, Senior Systems Analyst,
Westat (An Employee-Owned Research Corporation),
1600 Research Boulevard, Room RW-3541, Rockville, MD 20850-3195
Voice: (301) 610-4818 Fax: (301) 610-5128
mailto:EdHeaton@westat.com http://www.westat.com
-----Original Message-----
From: Susie Li [mailto:Susie.Li@US.SANOFI.COM]
Sent: Monday, June 16, 2003 10:15 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Actively seeking algorithm to compare the "likeness" of two character strings
I do a fair amount of string comparisons, mainly between pairs of addresses.
I'd like to know if there is efficient SAS code out there that would help me
identify whether there is a distinct difference between two addresses (rather
than a cosmetic difference like 300 N Shore St vs. 300 N. Shore Street).
I'm tired of exact matches.
I think it may involve categorical clustering. But I'm not sure.
Susie Li
Sanofi-Synthelabo, Inc.
90 Park Ave
New York, NY 10016
(212)551-4385
susie.li@us.sanofi.com
|