LISTSERV at the University of Georgia
Date:   Tue, 28 Feb 2006 14:18:45 -0500
Reply-To:   Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:   Re: Distinguishing business names from personal names
Content-Type:   text/plain; charset="us-ascii"

von Hippel: For a list of local scope, telephone directories tend to list businesses separately from residences. A hash-indexed dataset built from the business listings would support a very quick look-up to check whether a name appears in a business directory.
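
A minimal sketch of that look-up, assuming SAS 9.1 or later (for the DATA step hash object). The datasets WORK.BUSINESS_DIR and WORK.NAMES and the variable NAME are hypothetical stand-ins, and NAME is assumed to have the same type and length in both and to have been normalized the same way (case, spacing, punctuation) beforehand.

```sas
/* Hypothetical inputs:                                          */
/*   WORK.BUSINESS_DIR - one row per business listing, var NAME  */
/*   WORK.NAMES        - the names to classify, var NAME         */
data classified;
   if _n_ = 1 then do;
      /* Load the business directory into memory once */
      declare hash biz(dataset: 'work.business_dir');
      biz.defineKey('name');
      biz.defineDone();
   end;
   set work.names;
   /* CHECK() returns 0 when the current value of NAME is a key in the hash */
   in_directory = (biz.check() = 0);
run;
```

IN_DIRECTORY is 1 for names found in the directory and 0 otherwise. An exact match is all this tests, so it is only as good as the normalization done before it.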

The INDEX() and INDEXW() functions in SAS will also search name strings for substrings that distinguish many business names from personal names: 'Inc.', 'Ltd.', 'Co.', 'Company', 'Store', 'Shop', 'Cafe', 'Restaurant', 'Pharmacy', 'Clinic', and 'Warehouse' come to mind immediately. A lexicon with frequencies of a sample list of business names and another of personal names would help identify differences and limit the number of look-ups required.
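
A sketch of the keyword search, using INDEXW() so that only whole words match (INDEX() would also find 'CO' inside 'Jacob Cohen'). The input dataset WORK.NAMES, the variable NAME, and the output flag BIZ_KEYWORD are hypothetical; the keyword list is just the one above, with periods dropped because the code turns periods and commas into blanks before searching.

```sas
data flagged;
   set work.names;                     /* hypothetical input, variable NAME */
   length clean $200;
   array kw{11} $12 _temporary_
      ('INC' 'LTD' 'CO' 'COMPANY' 'STORE' 'SHOP' 'CAFE'
       'RESTAURANT' 'PHARMACY' 'CLINIC' 'WAREHOUSE');

   /* Upper-case the name and replace periods and commas with blanks, */
   /* so that 'Inc.' and 'Inc.,' both reduce to the word INC.         */
   clean = upcase(translate(name, '  ', '.,'));

   /* Stop at the first keyword that appears as a whole word */
   biz_keyword = 0;
   do i = 1 to dim(kw) while (biz_keyword = 0);
      if indexw(clean, trim(kw{i})) > 0 then biz_keyword = 1;
   end;
   drop i clean;
run;
```

A keyword hit is evidence, not proof: a surname such as 'Shop' or 'Store' would be flagged too, which is one reason to keep a review step.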

At some stage the program will likely have to set aside some names for review. A fuzzy measure of membership in the set of business names would help you decide whether to review a name or not. Sig
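
One very rough way to sketch such a measure: combine the available evidence into a score between 0 and 1 and send the middle band to review. Everything here is an assumption for illustration: the input dataset WORK.EVIDENCE, its 0/1 flags IN_DIRECTORY and BIZ_KEYWORD, the weights, and the cut-offs. In practice the weights would more sensibly come from the frequencies in the sample lexicons.

```sas
data routed;
   set work.evidence;   /* hypothetical: 0/1 flags IN_DIRECTORY and BIZ_KEYWORD */
   length decision $8;

   /* Illustrative weights only, not estimated from any data */
   score = min(1, 0.7*in_directory + 0.5*biz_keyword);

   /* Illustrative cut-offs: confident at either end, review in between */
   if score >= 0.7 then decision = 'BUSINESS';
   else if score <= 0.2 then decision = 'PERSONAL';
   else decision = 'REVIEW';
run;
```

With these numbers a directory match alone is accepted as a business, a keyword hit alone goes to review, and a name with neither is treated as personal.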

-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of von-hippel.1@osu.edu
Sent: Monday, February 27, 2006 2:50 PM
To: sas-l@uga.edu
Subject: Distinguishing business names from personal names

Has anyone developed code for distinguishing business names from personal names?

