LISTSERV at the University of Georgia
Date:         Mon, 27 Feb 2006 13:03:34 -0800
Reply-To:     David L Cassell <davidlcassell@MSN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David L Cassell <davidlcassell@MSN.COM>
Subject:      Re: Distinguishing business names from personal names
In-Reply-To:  <200602272025.k1RK2txh004689@mailgw.cc.uga.edu>
Content-Type: text/plain; format=flowed

pchoate@DDS.CA.GOV sagely replied:
>I've done this by sampling the data and flagging the two types of names.
>You can do this by searching for terms such as Mr, Mrs, Inc, Corporation
>etc. and also by hand. Use the scan function to separate words from the
>name fields into two datasets and then subset the terms unique to each
>set. Bootstrap this back to your data with a Cartesian join to look for
>additional common terms unique to one or the other group. When you have
>two working word lists then create weights based on word frequency from
>the data.
>
>Finally Cartesian join the data back to the two lists and create weights
>on each record indicating likelihood of group membership based on number
>of matches. Develop an empirical criterion based on inspecting your
>results.
>
>There may be other things in your data that help identify business vs.
>personal - tax id structure, other linked files, etc.
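
For readers who want to see the shape of the word-list approach quoted above, here is a rough sketch. It is written in Python rather than SAS purely for illustration, and everything in it is an assumption: the sample names, the function names (`tokenize`, `build_weights`, `score`, `classify`), the weighting formula (share of hand-flagged names in a group that contain the word), and the zero cutoff. It is not Paul's actual code, and a real run would need a far larger flagged sample.

```python
from collections import Counter

def tokenize(name):
    # Rough stand-in for scanning a name field word by word.
    words = (w.strip(".,").lower() for w in name.split())
    return [w for w in words if w]

def build_weights(labeled):
    """labeled: list of (name, label) pairs, label is 'business' or 'personal'.

    Keeps only the words unique to one group and weights each by the
    fraction of that group's names in which it appears.
    """
    counts = {"business": Counter(), "personal": Counter()}
    totals = Counter(label for _, label in labeled)
    for name, label in labeled:
        counts[label].update(set(tokenize(name)))
    weights = {}
    for label, other in (("business", "personal"), ("personal", "business")):
        weights[label] = {
            word: n / totals[label]
            for word, n in counts[label].items()
            if word not in counts[other]  # drop words seen in both groups
        }
    return weights

def score(name, weights):
    # Positive leans business, negative leans personal.
    words = set(tokenize(name))
    business = sum(weights["business"].get(w, 0.0) for w in words)
    personal = sum(weights["personal"].get(w, 0.0) for w in words)
    return business - personal

def classify(name, weights, cutoff=0.0):
    # The cutoff is the "empirical criterion": tune it by inspecting results.
    s = score(name, weights)
    if s > cutoff:
        return "business"
    if s < -cutoff:
        return "personal"
    return "unknown"

# Hypothetical hand-flagged sample, invented for this sketch.
sample = [
    ("Acme Widgets Inc", "business"),
    ("Corvallis Plumbing Inc", "business"),
    ("Mr John Smith", "personal"),
    ("Mrs Jane Smith", "personal"),
]
weights = build_weights(sample)
for candidate in ["Smith Widgets Inc", "Mr Bob Jones", "Zed Quux"]:
    print(candidate, "->", classify(candidate, weights))
```

Names that match neither list come back as "unknown", which is where the hand inspection comes in.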

Let me just add that this approach, while sound, is fraught with peril. As Paul already knows! He just didn't want to scare you off. :-)

I have found that plenty of businesses have names which sound (to a computer) like people:

Doctor Lock (a lock and key specialist)
Mister Carpet (carpet cleaners)
Sara Lee (nobody doesn't like her!)
Mrs. Butterworth's (okay, I *stuck* that one in for fun :-)
. . .

And, unfortunately, the converse is true:

Christine Chapel (person, church, service org, or Star Trek character?)

And good luck with individuals who named their business after themselves, or who have incorporated themselves, or ...
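
To make the peril concrete, here is a small sketch of a keyword rule tripping over the names mentioned in this message. It is in Python for illustration only. The term lists and the `naive_guess` function are assumptions of mine: "Mr", "Mrs", "Inc" and "Corporation" come from the quoted suggestion, while "Mister", "Doctor", "Dr", "Corp" and "LLC" are additions a person might plausibly make.

```python
# Assumed term lists; only some of these appear in the quoted suggestion.
PERSONAL_TERMS = {"mr", "mrs", "mister", "doctor", "dr"}
BUSINESS_TERMS = {"inc", "corporation", "corp", "llc"}

def naive_guess(name):
    # Flag a name by the first keyword list it hits, business terms first.
    words = {w.strip(".,").lower() for w in name.split()}
    if words & BUSINESS_TERMS:
        return "business"
    if words & PERSONAL_TERMS:
        return "personal"
    return "unknown"

names = [
    "Doctor Lock",
    "Mister Carpet",
    "Mrs. Butterworth's",
    "Sara Lee",
    "Christine Chapel",
]
for name in names:
    print(name, "->", naive_guess(name))
```

Under these assumed lists the first three businesses get flagged as people, and the last two names hit neither list, so the rule says nothing useful about them at all.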

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
