| Date: | Tue, 27 Aug 2002 15:27:20 -1000 |
| Reply-To: | Joanne Mor <jmor@HAWAII.EDU> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Joanne Mor <jmor@HAWAII.EDU> |
| Subject: | Re: Finding typos in names |
| In-Reply-To: | <5.1.0.14.2.20020828020607.03a90670@pop3.powernet.co.uk> |
| Content-type: | text/plain; charset=us-ascii |
Sorry, I should have been clearer. I'm just looking for the records
where there's a slight discrepancy between the names. For some I can tell
right away that it's a typo; for others I will need to go back to the source
of the info for the corrections. For example, mother's name = ARAKAWA,
father's name = JOHNSON, infant's name = JOHNSON-ARAKWA. I'm looking for
a method to find the ones that are close but not perfect, so that I
don't need to look at each individual record.
Joanne
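One way to picture the "close but not perfect" check described above is an edit-distance comparison between each part of the infant's last name and the parents' last names. The following is only a rough sketch, not a method from this thread: it is written in Python rather than SAS, the function names `edit_distance` and `near_misses` are made up for illustration, and the threshold of 2 edits is an arbitrary assumption that would need tuning (short names in particular may be flagged too readily). SAS also has a SPEDIS spelling-distance function that might play a similar role, if it is available in your release.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts, deletes or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute, or match at no cost
        prev = curr
    return prev[-1]


def near_misses(mother: str, father: str, infant: str, max_dist: int = 2):
    """Return (infant_part, parent_name, distance) for parts that are close but not identical.

    max_dist is an assumed threshold, not a recommended value.
    """
    parents = [mother.strip().upper(), father.strip().upper()]
    flagged = []
    # Treat a hyphenated infant name as separate parts (assumes single-word parent names).
    for part in infant.upper().replace("-", " ").split():
        if part in parents:
            continue  # exact match, nothing to review
        for parent in parents:
            d = edit_distance(part, parent)
            if 0 < d <= max_dist:
                flagged.append((part, parent, d))
    return flagged


# The example from the message above: only the ARAKWA part is flagged.
print(near_misses("ARAKAWA", "JOHNSON", "JOHNSON-ARAKWA"))
```

Records for which `near_misses` returns an empty list either match exactly or differ too much to be an obvious typo; only the flagged ones would need a manual look.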
-----Original Message-----
From: John Whittington [mailto:John.W@mediscience.co.uk]
Sent: Tuesday, August 27, 2002 3:12 PM
To: Joanne Mor; SAS-L@LISTSERV.UGA.EDU
Subject: Re: Finding typos in names
At 14:53 27/08/02 -1000, Joanne Mor wrote:
>I will be getting a data set with about 14K records, each with mother's,
>father's and infant's names. Since I will be using this data set to link
>to other files, I want to find the records with typos in the last name
>fields. What's the best way to do this? I don't want to use SOUNDEX
>because I'm working with a lot of Asian and Pacific Island names (i.e.
>lots of vowels). I'm using version 8.2.
Joanne, I may be missing something, but if the dataset is your only source
of information, I don't really see how you could hope to determine which
'last names' were typos - since names come in 'all shapes and sizes', and
virtually anything is 'possible'. If you had a large list of possible last
names for the population/racial groups in question, then you could
cross-check to see which of the names in your records corresponded to ones
in your list, but even that would probably be of limited value (even if you
could find such a reference list) - and could even let some typos through
(as well as leaving a good number as being of 'undetermined' accuracy).
Am I missing something?
Kind Regards,
John
----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: John.W@mediscience.co.uk
Buckingham MK18 4EL, UK mediscience@compuserve.com
----------------------------------------------------------------
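The reference-list cross-check that John describes could look roughly like the sketch below. It is an illustration only, in Python rather than SAS: `REFERENCE_NAMES` is a tiny hypothetical list standing in for a real surname list, `classify` is a made-up helper, and the similarity cutoff of 0.8 is an assumption. It also shows the limitations he raises: a typo that happens to equal another listed name passes as "known", and anything not near a listed name stays "undetermined".

```python
from difflib import get_close_matches

# Hypothetical reference list; a real one would hold many thousands of surnames.
REFERENCE_NAMES = ["ARAKAWA", "JOHNSON", "KEALOHA"]


def classify(name, reference=REFERENCE_NAMES, cutoff=0.8):
    """Label a last name as known, a possible typo of a listed name, or undetermined."""
    name = name.strip().upper()
    if name in reference:
        return ("known", name)
    # Best single candidate whose similarity ratio is at least `cutoff` (assumed value).
    close = get_close_matches(name, reference, n=1, cutoff=cutoff)
    if close:
        return ("possible typo", close[0])
    return ("undetermined", None)


for last_name in ["JOHNSON", "ARAKWA", "NAKAMURA"]:
    print(last_name, classify(last_name))
```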