Date: Tue, 28 Feb 2006 14:18:45 -0500
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: Distinguishing business names from personal names
Content-Type: text/plain; charset="us-ascii"
For a list of local scope, telephone directories tend to separate
business from residential directories. A nicely hash-indexed dataset
would support a very quick look-up to check for presence in a business
directory.
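
As a rough illustration of the hashed look-up idea, here is a minimal sketch in Python rather than SAS (in SAS, an indexed dataset or a DATA step hash object would play the same role). The directory entries and the names `BUSINESS_DIRECTORY`, `normalize`, and `in_business_directory` are hypothetical, chosen only for the example.

```python
# Hypothetical sample of business-directory entries; a real list would be
# loaded from a directory file.
BUSINESS_DIRECTORY = {
    "ACME HARDWARE",
    "MAIN STREET PHARMACY",
    "RIVERSIDE CAFE",
}


def normalize(name: str) -> str:
    """Upper-case and collapse whitespace so trivial variants match."""
    return " ".join(name.upper().split())


def in_business_directory(name: str) -> bool:
    """Membership test against the hashed set (constant time on average)."""
    return normalize(name) in BUSINESS_DIRECTORY
```

A call such as `in_business_directory("  acme   hardware ")` would match the stored entry after normalization, while a name absent from the set would not.
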
The INDEX() and INDEXW() functions in SAS will also search name strings
for substrings that would distinguish many business names from personal
names: 'Inc.', 'Ltd.', 'Co.', 'Company', 'Store', 'Shop', 'Cafe',
'Restaurant', 'Pharmacy', 'Clinic', and 'Warehouse' come to mind
immediately. A lexicon with frequencies of a sample list of business
names and another of personal names would help identify differences and
limit the number of look-ups required.
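
The keyword search can be sketched as follows, again in Python as a stand-in for the SAS functions. The whole-word matching here is meant to resemble what INDEXW() offers over INDEX(): a plain substring search would find 'Co' inside 'Cole', while a word-level search would not. The keyword set is just the list above with punctuation stripped, and the function names are hypothetical.

```python
import re

# Keywords from the list above, upper-cased and with trailing periods removed.
BUSINESS_KEYWORDS = {
    "INC", "LTD", "CO", "COMPANY", "STORE", "SHOP", "CAFE",
    "RESTAURANT", "PHARMACY", "CLINIC", "WAREHOUSE",
}


def tokens(name: str) -> list[str]:
    """Split a name into upper-case alphanumeric words, dropping punctuation."""
    return re.findall(r"[A-Z0-9]+", name.upper())


def business_keywords_found(name: str) -> list[str]:
    """Return the business keywords that occur as whole words in the name."""
    return [t for t in tokens(name) if t in BUSINESS_KEYWORDS]
```

Under these assumptions, "Smith & Sons Co." yields one keyword hit, whereas "Cole Shopman" yields none, because neither "CO" nor "SHOP" appears as a word of its own.
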
At some stage the program will likely have to throw out some names for
review. A fuzzy measure of membership in the set of business names would
help you make a decision on whether to review or not.
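
One possible form of such a fuzzy measure, sketched in Python under stated assumptions: each word in a name gets a smoothed share of its occurrences in a business lexicon versus a personal lexicon, and the name's score is the mean of those shares. The sample lists, the add-one smoothing, and the 0.4 / 0.6 cut-offs are all illustrative choices, not values taken from any real data.

```python
import re
from collections import Counter


def tokens(name: str) -> list[str]:
    """Split a name into upper-case alphanumeric words."""
    return re.findall(r"[A-Z0-9]+", name.upper())


def build_lexicon(names: list[str]) -> Counter:
    """Count how often each word appears across a sample list of names."""
    counts = Counter()
    for name in names:
        counts.update(tokens(name))
    return counts


def business_membership(name: str, biz: Counter, per: Counter) -> float:
    """Mean smoothed share of business occurrences over the name's words."""
    words = tokens(name)
    if not words:
        return 0.5  # no evidence either way
    # Add-one smoothing keeps unseen words at a neutral 0.5.
    shares = [(biz[w] + 1) / (biz[w] + per[w] + 2) for w in words]
    return sum(shares) / len(shares)


def decide(score: float, low: float = 0.4, high: float = 0.6) -> str:
    """Classify clear cases and route the ambiguous middle band to review."""
    if score >= high:
        return "business"
    if score <= low:
        return "personal"
    return "review"


# Hypothetical sample lists, for illustration only.
BIZ = build_lexicon(["Acme Pharmacy", "Riverside Cafe", "Smith Pharmacy"])
PER = build_lexicon(["John Smith", "Mary Jones", "Ann Smith"])
```

With these toy lexicons, a name such as "Jones Pharmacy" mixes a word seen only in personal names with one seen only in business names, so its score falls in the middle band and it would be sent for review.
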
From: email@example.com [mailto:firstname.lastname@example.org]
On Behalf Of email@example.com
Sent: Monday, February 27, 2006 2:50 PM
Subject: Distinguishing business names from personal names
Has anyone developed code for distinguishing business names from