LISTSERV at the University of Georgia
Date:   Tue, 27 Aug 2002 15:27:20 -1000
Reply-To:   Joanne Mor <jmor@HAWAII.EDU>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Joanne Mor <jmor@HAWAII.EDU>
Subject:   Re: Finding typos in names
Comments:   To: John Whittington <John.W@mediscience.co.uk>
In-Reply-To:   <5.1.0.14.2.20020828020607.03a90670@pop3.powernet.co.uk>
Content-type:   text/plain; charset=us-ascii

Sorry, I should have been clearer. I'm just looking for the records where there's a slight discrepancy between the names. For some I can tell right away that it's a typo; for others I will need to go back to the source of the info for the corrections. For example: mother's name = ARAKAWA, father's name = JOHNSON, infant's name = JOHNSON-ARAKWA. I'm looking for a method to find the ones that are close but not perfect, so that I don't need to look at each individual record.

Joanne
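The kind of check described above (flag an infant's last name that is close to, but not identical with, a parent's last name) could be sketched roughly as follows. This is only an illustration in Python, not SAS and not anything proposed in the thread: the function names, the 0.8 threshold, and the idea of splitting hyphenated names are all assumptions, and the similarity score comes from the standard library's difflib rather than from SOUNDEX.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio between two strings, from 0.0 to 1.0 (difflib)."""
    return SequenceMatcher(None, a, b).ratio()


def near_misses(mother, father, infant, threshold=0.8):
    """Return (infant_part, parent_name) pairs that are close but not equal.

    The infant's last name is split on hyphens and spaces, so each part of
    JOHNSON-ARAKWA is compared separately. The threshold is an arbitrary
    assumption and would need tuning against real data.
    """
    parents = sorted({mother.strip().upper(), father.strip().upper()})
    parts = infant.strip().upper().replace("-", " ").split()
    flagged = []
    for part in parts:
        if part in parents:
            continue  # exact match with a parent's name: nothing to review
        for parent in parents:
            if similarity(part, parent) >= threshold:
                flagged.append((part, parent))
    return flagged


# The example from the message: only the misspelled part is flagged.
print(near_misses("ARAKAWA", "JOHNSON", "JOHNSON-ARAKWA"))
# prints [('ARAKWA', 'ARAKAWA')]
```

Records for which the list comes back empty either match exactly or are too different to look like a typo, so only the flagged records would need to be reviewed by hand.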

-----Original Message-----
From: John Whittington [mailto:John.W@mediscience.co.uk]
Sent: Tuesday, August 27, 2002 3:12 PM
To: Joanne Mor; SAS-L@LISTSERV.UGA.EDU
Subject: Re: Finding typos in names

At 14:53 27/08/02 -1000, Joanne Mor wrote:

>I will be getting a data set with about 14K records, each with mother's,
>father's and infant's names. Since I will be using this data set to link
>to other files, I want to find the records with typos in the last name
>fields. What's the best way to do this? I don't want to use SOUNDEX
>because I'm working with a lot of Asian and Pacific Island names (i.e.
>lots of vowels). I'm using version 8.2.

Joanne, I may be missing something, but if the dataset is your only source of information, I don't really see how you could hope to determine which 'last names' were typos - since names come in 'all shapes and sizes', and virtually anything is 'possible'. If you had a large list of possible last names for the population/racial groups in question, then you could cross-check to see which of the names in your records corresponded to ones in your list, but even that would probably be of limited value (even if you could find such a reference list) - and could even let some typos through (as well as leaving a good number as being of 'undetermined' accuracy).
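The cross-check against a reference list mentioned above might look something like the sketch below. It is a Python illustration only, under stated assumptions: the reference list here is a made-up three-name stand-in for a real list of surnames, the function name and the 0.8 cutoff are hypothetical, and the three outcome labels are just one way of expressing the "known / possible typo / undetermined" distinction.

```python
from difflib import get_close_matches


def check_against_reference(name, reference_names, cutoff=0.8):
    """Classify a last name against a reference list of known surnames.

    Returns a (status, candidates) tuple where status is one of
    'known', 'possible typo', or 'undetermined'.
    """
    name = name.strip().upper()
    if name in reference_names:
        return ("known", [])
    # get_close_matches returns the best matches, most similar first.
    candidates = get_close_matches(name, reference_names, n=3, cutoff=cutoff)
    if candidates:
        return ("possible typo", candidates)
    return ("undetermined", [])


# Hypothetical reference list; a real one would be far larger.
reference = ["ARAKAWA", "JOHNSON", "KAHALE"]

for last_name in ["JOHNSON", "ARAKWA", "NGUYEN"]:
    print(last_name, check_against_reference(last_name, reference))
```

As noted above, a check like this has limits: a typo that happens to spell another valid name would come back as 'known', and a correct name missing from the list would come back as 'undetermined'.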

Am I missing something?

Kind Regards,

John

----------------------------------------------------------------
Dr John Whittington,       Voice:    +44 (0) 1296 730225
Mediscience Services       Fax:      +44 (0) 1296 738893
Twyford Manor, Twyford,    E-mail:   John.W@mediscience.co.uk
Buckingham MK18 4EL, UK              mediscience@compuserve.com
----------------------------------------------------------------

