LISTSERV at the University of Georgia
Date:   Tue, 17 Jun 2003 09:58:24 -0400
Reply-To:   Ed Heaton <EdHeaton@WESTAT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Ed Heaton <EdHeaton@WESTAT.COM>
Subject:   Re: Actively seeking algorithm to compare the "likeness" of two character strings
Comments:   To: "Susie.Li@US.SANOFI.COM" <Susie.Li@US.SANOFI.COM>
Content-Type:   text/plain

Susie,

If you are looking for something pre-written, look at the SPEDIS function (SPElling DIStance). It returns values from 0 to 200, with 0 being an exact match and 200 the poorest match. It is basically a measure of the work needed to transform one string into the other. It is not commutative: SPEDIS(x,y) and SPEDIS(y,x) can return different values. E.g.:

Data _null_ ;
   Length x y $20 ;
   x="Susie Li"; y="SUSIE LI";  f=speDis(x,y); b=speDis(y,x); Put (_all_)(=) ;
   x="Susie Li"; y="Sue Li";    f=speDis(x,y); b=speDis(y,x); Put (_all_)(=) ;
   x="Susie Li"; y="Li, Susie"; f=speDis(x,y); b=speDis(y,x); Put (_all_)(=) ;
   x="Susie Li"; y="Ed Heaton"; f=speDis(x,y); b=speDis(y,x); Put (_all_)(=) ;
Run ;

returns

x=Susie Li y=SUSIE LI f=62 b=62
x=Susie Li y=Sue Li f=25 b=16
x=Susie Li y=Li, Susie f=87 b=81
x=Susie Li y=Ed Heaton f=106 b=92
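Because SPEDIS is not commutative, one option when neither string is the "correct" one is to score the pair in both directions and keep the smaller value (or the average). The sketch below does that. The data set and variable names are illustrative, and the cutoff of 30 used to flag a likely match is a hypothetical value that would need tuning against real data.

```sas
/* Sketch: an order-independent score built from SPEDIS run in both */
/* directions. The cutoff of 30 is a hypothetical, untuned value.   */
data pair_scores ;
  infile datalines dlm='|' truncover ;
  length x y $40 ;
  input x y ;
  /* TRIM removes the padding blanks before each comparison */
  fwd  = spedis( trim(x), trim(y) ) ;
  bwd  = spedis( trim(y), trim(x) ) ;
  best = min( fwd, bwd ) ;          /* mean(fwd, bwd) is another choice */
  likely_match = ( best <= 30 ) ;   /* 1 = probably the same string */
datalines ;
Susie Li|Sue Li
Susie Li|Ed Heaton
;
run ;

proc print data=pair_scores ;
run ;
```

Taking the minimum is the more forgiving choice; the average penalizes pairs that look close in only one direction.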

Ed

Edward Heaton, Senior Systems Analyst
Westat (An Employee-Owned Research Corporation)
1600 Research Boulevard, Room RW-3541, Rockville, MD 20850-3195
Voice: (301) 610-4818  Fax: (301) 610-5128
mailto:EdHeaton@westat.com  http://www.westat.com

-----Original Message-----
From: Susie Li [mailto:Susie.Li@US.SANOFI.COM]
Sent: Monday, June 16, 2003 10:15 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Actively seeking algorithm to compare the "likeness" of two character strings

I do a fair amount of string comparisons, mainly between pairs of addresses.

I'd like to know if there is efficient SAS code out there that would help me identify whether there is a distinct difference between two addresses (rather than a cosmetic difference like 300 N Shore St vs. 300 N. Shore Street). I'm tired of exact matches.

I think it may involve categorical clustering, but I'm not sure.
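For cosmetic differences like the one above, a common approach (a sketch, not a method proposed in this thread) is to standardize both addresses before comparing them: upper-case the text, turn punctuation into blanks, collapse repeated blanks, and map common words to a single abbreviation. The sketch below does this with UPCASE, TRANSLATE, COMPBL and TRANWRD, then scores the cleaned strings with SPEDIS. The abbreviation list is a small hypothetical sample; real address data would need a much fuller table.

```sas
/* Sketch: standardize two addresses, then compare them with SPEDIS. */
/* The TRANWRD word list is a small, hypothetical sample only.       */
data addr_compare ;
  infile datalines dlm='|' truncover ;
  length addr1 addr2 $60 clean1 clean2 $64 ;
  input addr1 addr2 ;
  array raw   {2} addr1  addr2 ;
  array clean {2} clean1 clean2 ;
  do i = 1 to 2 ;
    /* Periods and commas become blanks, text is upper-cased, runs of */
    /* blanks collapse to one; the leading blank marks a word start.  */
    clean{i} = ' ' || compbl( upcase( translate( raw{i}, '  ', '.,' ) ) ) ;
    /* Blanks around each target restrict replacement to whole words */
    clean{i} = tranwrd( clean{i}, ' STREET ', ' ST ' ) ;
    clean{i} = tranwrd( clean{i}, ' AVENUE ', ' AVE ' ) ;
    clean{i} = tranwrd( clean{i}, ' NORTH ',  ' N ' ) ;
    clean{i} = left( clean{i} ) ;
  end ;
  dist = spedis( trim(clean1), trim(clean2) ) ;   /* 0 = same after cleanup */
  drop i ;
datalines ;
300 N Shore St|300 N. Shore Street
300 N Shore St|310 N Shore St
;
run ;
```

In the first row both addresses standardize to the same string, so SPEDIS returns 0; the second row still differs after cleanup, so it scores above 0 and stands out as a real difference.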

Susie Li
Sanofi-Synthelabo, Inc.
90 Park Ave, New York, NY 10016
(212)551-4385  susie.li@us.sanofi.com

Important: The Information in this e-mail belongs to Sanofi-Synthelabo Inc., is intended for the use of the individual or entity to which it is addressed, and may contain information that is privileged, confidential, or exempt from disclosure under applicable law. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution, or use of, or reliance on, the contents of this e-mail is prohibited. If you have received this e-mail in error, please notify us immediately by replying back to the sending e-mail address, and delete this e-mail message from your computer.

