LISTSERV at the University of Georgia
Date:   Sun, 9 Sep 2007 19:57:07 -0400
Reply-To:   Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:   Re: fuzzy match of two name variables
Comments:   To: lisiqi77@yahoo.com
In-Reply-To:   <1189217681.729132.263770@19g2000hsx.googlegroups.com>
Content-Type:   text/plain; charset="us-ascii"

Tracy: SOUNDEX, SPEDIS, and COMPGED work fairly well for fuzzy matching problems on the order of 20,000 sets of identifiers. The SAS-L Archives contain many descriptions of how each of these functions works. Please write back if you have questions. S
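For readers who have not used edit-distance functions, here is a rough sketch of the idea behind SPEDIS and COMPGED. It is written in Python rather than SAS, and it uses plain Levenshtein distance (every insertion, deletion, or substitution costs 1), not the operation-specific cost tables the SAS functions apply. The names `levenshtein`, `is_probable_match`, and the `max_ratio` cutoff of 0.2 are assumptions made for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain edit distance: each insert, delete, or substitute costs 1."""
    prev = list(range(len(b) + 1))          # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def is_probable_match(var1: str, var2: str, max_ratio: float = 0.2) -> bool:
    """Flag a pair when the distance is small relative to the longer name.

    max_ratio is a hypothetical cutoff; it would need tuning on real data.
    """
    longest = max(len(var1), len(var2))
    if longest == 0:
        return True
    return levenshtein(var1, var2) / longest <= max_ratio


if __name__ == "__main__":
    for v1, v2 in [("FIDELITY", "FIEDELITY"),
                   ("FIDELITY MANAGEMENT", "FIDELITY MGMT")]:
        print(v1, "|", v2, "|", levenshtein(v1, v2), is_probable_match(v1, v2))
```

With this cutoff the misspelling pair is flagged but the abbreviation pair is not, which suggests why abbreviations such as MGMT usually need either an expansion step or a manual review in addition to a distance score.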

-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of lisiqi77@yahoo.com
Sent: Friday, September 07, 2007 10:15 PM
To: sas-l@uga.edu
Subject: fuzzy match of two name variables

Hi, All,

I have a dataset (around 20,000 observations) which contains two variables, Var1 and Var2. Both of them hold either persons' names or entities' names. What I want to do is to find the cases where Var1=Var2.

The problems are:

1) names are the only identifier I have;
2) both variables could contain spelling errors (e.g., Fidelity vs. Fiedelity) or variations of one name (e.g., Fidelity management vs. Fidelity MGMT Inc.).

I've standardized both variables by converting them to upper case, deleting special characters, removing suffixes (such as INC), and collapsing multiple blanks, etc.
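The standardization steps above might look roughly like the following sketch. It is in Python rather than SAS, and several details are assumptions, not part of the original message: the function name `standardize`, the `SUFFIXES` list (only INC is named in the message), and the choice to replace special characters with a blank instead of deleting them outright, so that hyphen-joined words stay separate.

```python
import re

# Hypothetical suffix list; the message itself only gives INC as an example.
SUFFIXES = {"INC", "LTD", "CORP"}


def standardize(name: str) -> str:
    """Upper-case, strip special characters, drop trailing suffixes,
    and collapse runs of blanks."""
    s = name.upper()
    # Assumption: special characters become a blank rather than vanishing,
    # so "FUND-A" turns into "FUND A" and not "FUNDA".
    s = re.sub(r"[^A-Z0-9 ]", " ", s)
    tokens = s.split()                      # split() also collapses blanks
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()                        # remove trailing suffix tokens
    return " ".join(tokens)


if __name__ == "__main__":
    for raw in ["Fidelity MGMT Inc.",
                "ACORN FUND-A SERIES OF THE ACORN INVESTM",
                "ABELE; JOHN E."]:
        print(repr(raw), "->", repr(standardize(raw)))
```

After this step the two ACORN FUND variants from the sample list become identical, so exact comparison catches them and only the remaining pairs need a fuzzy comparison.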

I am wondering if functions such as SOUNDEX, SPEDIS, or COMPGED will help here. Or something else in the fuzzy match category? (I understand that no matter which method I use, I will probably still have to manually check the matched results.)

Examples of Var1 look like the following:

A. Alfred Taubman
A.I.M. Overseas Ltd
ABBOTT LABS STOCK RETIREMENT TRUST
ABDULLAH TAHA BAKHSH
ABELE; JOHN E.
ABRAMSON; LEONARD
ACKERMAN; JOEL
ACKERMANS & VAN HAAREN GROUP
ACORN FUND A SERIES OF THE ACORN INVESTM
ACORN FUND-A SERIES OF THE ACORN INVESTM
ACTINIUM HOLDING CORP
ADAMS; MARY C.
...

Thanks very much for your comments!

Tracy

