Date: Mon, 27 Feb 2006 13:03:34 -0800
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: Distinguishing business names from personal names
Content-Type: text/plain; format=flowed
pchoate@DDS.CA.GOV sagely replied:
>I've done this by sampling the data and flagging the two types of names.
>You can do this by searching for terms such as Mr, Mrs, Inc, Corporation
>etc. and also by hand. Use the scan function to separate words from the
>name fields into two datasets and then subset the terms unique to each
>set. Bootstrap this back to your data with a Cartesian join to look for
>additional common terms unique to one or the other group. When you have
>two working word lists then create weights based on word frequency from
>Finally Cartesian join the data back to the two lists and create weights
>on each record indicating likelihood of group membership based on number
>of matches. Develop empirical criteria based on inspecting your
>There may be other things in your data that help identify business vs.
>personal - tax id structure, other linked files, etc.
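For anyone who wants to see the word-list scoring idea in miniature, here is
a rough sketch. It is written in Python rather than SAS, and nearly everything
in it is an assumption made for illustration: the function names, the tiny
hand-flagged sample, the add-one smoothing, the log-ratio weight, and the
cutoff at zero. A dictionary lookup stands in for the Cartesian join, and the
regular expression only approximates what the scan function would return.

```python
import math
import re
from collections import Counter

def tokenize(name):
    """Split a name into lowercase words (a rough stand-in for SAS scan())."""
    return re.findall(r"[a-z0-9']+", name.lower())

def build_weights(business_names, personal_names):
    """Weight each word by the log ratio of its smoothed relative frequency
    in the business sample versus the personal sample.
    Positive weights lean business, negative weights lean personal."""
    biz = Counter(w for n in business_names for w in tokenize(n))
    per = Counter(w for n in personal_names for w in tokenize(n))
    vocab = set(biz) | set(per)
    # Add-one smoothing (an assumption) so unseen words do not give log(0).
    biz_total = sum(biz.values()) + len(vocab)
    per_total = sum(per.values()) + len(vocab)
    return {
        w: math.log((biz[w] + 1) / biz_total) - math.log((per[w] + 1) / per_total)
        for w in vocab
    }

def score(name, weights):
    """Sum the weights of the words in a name; unknown words contribute 0."""
    return sum(weights.get(w, 0.0) for w in tokenize(name))

# Hypothetical hand-flagged sample; real lists would come from your own data.
business_sample = ["Acme Widgets Inc", "Smith Plumbing Corporation",
                   "Corvallis Auto Repair Inc"]
personal_sample = ["Mr John Smith", "Mrs Mary Jones", "Mr Robert Brown"]

weights = build_weights(business_sample, personal_sample)
for name in ["Jones Hardware Inc", "Mr David Jones"]:
    s = score(name, weights)
    # The cutoff of 0 is arbitrary; an empirical cutoff should replace it.
    print(name, round(s, 2), "business" if s > 0 else "personal")
```

With a sample this small the weights mean very little. The point is only the
shape of the computation: two word lists, a weight per word, and a per-record
score that you compare against a cutoff chosen by inspecting real records.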
Let me just add that this approach, while sound, is fraught with peril. As
already knows! He just didn't want to scare you off. :-)
I have found that plenty of businesses have names which sound (to a
Doctor Lock (a lock and key specialist)
Mister Carpet (carpet cleaners)
Sara Lee (nobody doesn't like her!)
Mrs. Butterworth's (okay, I *stuck* that one in for fun :-)
And, unfortunately, the converse is true:
Christine Chapel (person, church, service org, or Star Trek character?)
And good luck with individuals who named their business after themselves,
or who have incorporated themselves, or ...
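To make the peril concrete, here is a small sketch of a naive keyword flag
run against the names above. It is in Python rather than SAS, and the marker
lists and the naive_flag function are hypothetical, made up for this
illustration only.

```python
import re

# Hypothetical marker lists; a real project would build these from the data.
PERSONAL_MARKERS = {"mr", "mrs", "ms", "mister", "doctor", "dr"}
BUSINESS_MARKERS = {"inc", "corporation", "corp", "llc", "ltd"}

def naive_flag(name):
    """Flag a name using nothing but the presence of marker words."""
    words = set(re.findall(r"[a-z]+", name.lower()))
    if words & BUSINESS_MARKERS:
        return "business"
    if words & PERSONAL_MARKERS:
        return "personal"
    return "unknown"

examples = ["Doctor Lock", "Mister Carpet", "Sara Lee",
            "Mrs. Butterworth's", "Christine Chapel"]
for name in examples:
    print(f"{name!r}: {naive_flag(name)}")
```

Under these assumed lists, the three names carrying a title come back as
"personal" and the other two come back as "unknown", so none of the
businesses is caught. That is why a hand review of a sample stays part of
the job.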
David L. Cassell
3115 NW Norwood Pl.
Corvallis OR 97330