| Date: | Tue, 28 Feb 2006 14:18:45 -0500 |
| Reply-To: | Sigurd Hermansen <HERMANS1@WESTAT.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Sigurd Hermansen <HERMANS1@WESTAT.COM> |
| Subject: | Re: Distinguishing business names from personal names |
| Content-Type: | text/plain; charset="us-ascii" |
von Hippel:
For a list of local scope, telephone directories tend to separate
business listings from residential listings. A hash-indexed dataset
built from the business listings would support a very quick look-up to
check whether a name appears in the business directory.
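A minimal sketch of that look-up follows. It is written in Python, where a built-in set is a hash table; in SAS, a DATA step hash object would play the same role. The names `normalize`, `build_business_index`, and `in_business_directory`, and the sample listings, are hypothetical. The normalization rule (case-fold, drop punctuation, collapse blanks) is one assumed choice, made so that minor formatting differences do not cause misses.

```python
import re

def normalize(name: str) -> str:
    """Case-fold, replace punctuation with blanks, and collapse whitespace,
    so 'ACME  Hardware, Inc.' and 'Acme Hardware Inc' compare equal."""
    name = re.sub(r"[^\w\s]", " ", name.casefold())
    return " ".join(name.split())

def build_business_index(directory_names):
    """Build a hash-based set of normalized business directory listings."""
    return {normalize(n) for n in directory_names}

def in_business_directory(name, index) -> bool:
    """Average O(1) membership test against the hashed directory."""
    return normalize(name) in index

# Hypothetical listings standing in for a real business directory.
business_listings = ["Acme Hardware, Inc.", "Main Street Cafe", "Lee Dental Clinic"]
index = build_business_index(business_listings)

print(in_business_directory("ACME hardware inc", index))  # True
print(in_business_directory("John A. Smith", index))      # False
```

Exact matching after normalization catches only names listed verbatim. That is why the marker-word search in the next paragraph is still useful.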
The INDEX() and INDEXW() functions in SAS will also search name strings
for substrings that distinguish many business names from personal
names: 'Inc.', 'Ltd.', 'Co.', 'Company', 'Store', 'Shop', 'Cafe',
'Restaurant', 'Pharmacy', 'Clinic', and 'Warehouse' come to mind
immediately. Word-frequency lexicons built from a sample of business
names and from a sample of personal names would help identify the
words that separate the two and limit the number of look-ups required.
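The two ideas in the paragraph above are sketched below in Python: a whole-word marker search and a frequency comparison of two sample lexicons. The sketch rests on a few assumptions:

* Case-folded whole-word matching is a stand-in for INDEXW(). The SAS functions themselves are case-sensitive.
* The helper names and sample lists are hypothetical.
* The `min_count` cut-off is arbitrary.

The whole-word test matters because a plain substring search, as INDEX() does, finds 'Co' inside a surname such as 'Cole'.

```python
import re
from collections import Counter

# Marker words from the list above, case-folded, trailing periods dropped.
BUSINESS_MARKERS = {"inc", "ltd", "co", "company", "store", "shop", "cafe",
                    "restaurant", "pharmacy", "clinic", "warehouse"}

def words(name):
    """Split a name into case-folded tokens (word characters and apostrophes)."""
    return re.findall(r"[\w']+", name.casefold())

def has_business_marker(name):
    """Whole-word test in the spirit of INDEXW(): matches the word 'Co.'
    but not the 'Co' inside 'Cole', which a substring search would hit."""
    return any(w in BUSINESS_MARKERS for w in words(name))

def distinguishing_words(business_sample, personal_sample, min_count=2):
    """Return words seen at least min_count times in the business sample
    and never in the personal sample (a crude lexicon comparison)."""
    biz = Counter(w for n in business_sample for w in words(n))
    per = Counter(w for n in personal_sample for w in words(n))
    return sorted(w for w, c in biz.items() if c >= min_count and per[w] == 0)

# Hypothetical samples; real lexicons would come from much larger lists.
business_sample = ["Smith Hardware Inc.", "Jones Hardware Co.",
                   "Main Street Cafe", "Oak Street Bakery"]
personal_sample = ["John Smith", "Mary Cole", "Robert Jones"]

print(has_business_marker("Jones Hardware Co."))                 # True
print(has_business_marker("Mary Cole"))                          # False
print(distinguishing_words(business_sample, personal_sample))    # ['hardware', 'street']
```

Words the comparison surfaces, such as 'Hardware' here, are candidates to add to the marker list. Words shared by both samples ('Smith', 'Jones') are dropped, which keeps the list of look-ups short.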
At some stage the program will likely have to set some names aside for
manual review. A fuzzy measure of membership in the set of business
names would help you decide whether a given name needs review.
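One way to build such a fuzzy measure is sketched below in Python, with hypothetical scoring throughout:

* A hit in the business directory gives full membership (1.0).
* Otherwise each marker word contributes a weight. The weights are combined with the probabilistic OR, mu = 1 - prod(1 - w_i). This algebraic sum is a standard fuzzy-set union operator, and each additional marker raises the degree toward 1.

The weights in `MARKER_WEIGHTS` and the `low`/`high` review thresholds are assumed values for illustration only. Weights derived from the lexicon frequencies would be a better source.

```python
import math
import re

# Hypothetical weights: how strongly each word suggests a business name.
# Legal-form suffixes are near-conclusive; generic words are weaker evidence.
MARKER_WEIGHTS = {
    "inc": 0.95, "ltd": 0.95, "company": 0.9, "co": 0.7,
    "restaurant": 0.8, "pharmacy": 0.8, "clinic": 0.7, "warehouse": 0.7,
    "cafe": 0.7, "store": 0.6, "shop": 0.6,
}

def words(name):
    """Case-folded word tokens (word characters and apostrophes)."""
    return re.findall(r"[\w']+", name.casefold())

def business_membership(name, directory=frozenset()):
    """Degree in [0, 1] to which `name` belongs to the fuzzy set of business
    names. `directory` holds entries normalized as ' '.join(words(entry))."""
    tokens = words(name)
    if " ".join(tokens) in directory:
        return 1.0
    # Probabilistic OR of the marker weights; unknown words contribute 0.
    return 1.0 - math.prod(1.0 - MARKER_WEIGHTS.get(t, 0.0) for t in tokens)

def classify(name, directory=frozenset(), low=0.2, high=0.8):
    """Route a name to 'business', 'personal', or 'review' by its degree."""
    mu = business_membership(name, directory)
    if mu >= high:
        return "business"
    if mu <= low:
        return "personal"
    return "review"

directory = {" ".join(words("Lee Dental Group"))}
print(classify("Acme Hardware, Inc."))                    # business
print(classify("Rose Cafe"))                              # review
print(classify("John Smith"))                             # personal
print(business_membership("Lee Dental Group", directory)) # 1.0
```

Only names that fall between the two thresholds go to the review pile. Tightening or widening that band trades manual effort against misclassification.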
Sig
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of von-hippel.1@osu.edu
Sent: Monday, February 27, 2006 2:50 PM
To: sas-l@uga.edu
Subject: Distinguishing business names from personal names
Has anyone developed code for distinguishing business names from
personal names?