| Date: | Mon, 14 Mar 2011 14:09:56 -0700 |
| Reply-To: | Wei Wang <weiwangum@YAHOO.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Wei Wang <weiwangum@YAHOO.COM> |
| Subject: | Re: Find Different Names |
| In-Reply-To: | <AANLkTi==kF=MJRYm9cwM4-Tohu--pCa1ShUamvjytdQA@mail.gmail.com> |
| Content-Type: | text/plain; charset=iso-8859-1 |
Thanks Joe. In this case, two names are treated as different if they differ by even one letter. For instance, Danny and Dan are two different names.
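
Under that exact-match rule, the flag can be computed with a single pass over the name variables. The following is only a minimal sketch, not a definitive solution: the output data set name want and the helper variable first_seen are made up for illustration, and the sample rows are read with list input (the colon modifier) on the assumption that the names are blank-delimited and missing values only occur at the end of a row.

```sas
/* Sample data from the original question, read with list input.   */
/* Assumption: values are blank-delimited and trailing values may  */
/* be absent, which MISSOVER turns into missing.                   */
data have;
  infile datalines missover;
  input id name1 :$8. name2 :$8. name3 :$8.;
  datalines;
1 mike mike mike
2 joe joe joey
3 andy andy
4
5 danny dan
6 sean shawn
7 aaron
;
run;

/* flag = 1 when at least two non-missing names differ, else 0.    */
/* first_seen is a hypothetical helper holding the first           */
/* non-missing name found in the row.                              */
data want;
  set have;
  array nm{*} name1-name3;
  length first_seen $8;
  first_seen = ' ';
  flag = 0;
  do i = 1 to dim(nm);
    if not missing(nm{i}) then do;
      if missing(first_seen) then first_seen = nm{i};
      else if nm{i} ne first_seen then flag = 1;
    end;
  end;
  drop i first_seen;
run;

proc print data=want noobs;
run;
```

The comparison is case-sensitive as written, so "Dan" and "dan" would be flagged as different; wrapping both sides in LOWCASE() would be one way to relax that if it is not wanted.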
Wei
--- On Mon, 3/14/11, Joe Matise <snoopy369@GMAIL.COM> wrote:
From: Joe Matise <snoopy369@GMAIL.COM>
Subject: Re: Find Different Names
To: SAS-L@LISTSERV.UGA.EDU
Date: Monday, March 14, 2011, 3:53 PM
Wei,
This is a fairly trivial task if you're just comparing identical names (Joe, Joe),
and even if it's just shortenings (Joe, Joey; Dan, Danny). However,
Sean-Shawn begins to add a significant layer of complexity to it - how would
you tell a computer algorithmically those are identical? Same goes for
"Marge/Margaret", "John/Jack", "Chris/Krystal", etc., particularly when you
have some that are sometimes valid and sometimes invalid (Chris/Krystal for
example). Some of the luminaries on SAS-L have addressed this sort of issue
in the past, and hopefully they can assist you (or you can google the
listserv's history for 'fuzzy matching'). In general, I would say that the
easiest way to approach this is to use one of the SOUNDEX-type functions, but any
of those will have some risk of false positives and false negatives - the
only true way to do this is to make the list by hand.
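
To make the SOUNDEX suggestion concrete, here is a rough sketch that encodes both names of a pair with the SAS SOUNDEX function and compares the codes. The data set names pairs and compared, the variables a, b, code_a, code_b, and the indicator same_sound are illustrative assumptions, and the pairs are simply the examples mentioned above. It is meant for checking where a phonetic code agrees or disagrees with a hand-made list, not as a reliable matcher.

```sas
/* Example pairs taken from the discussion above. */
data pairs;
  infile datalines missover;
  input a :$10. b :$10.;
  datalines;
joe joey
dan danny
sean shawn
marge margaret
john jack
chris krystal
;
run;

/* Encode each name and compare the phonetic codes.               */
/* same_sound = 1 when both names receive the same SOUNDEX code.  */
data compared;
  set pairs;
  length code_a code_b $20;
  code_a = soundex(a);
  code_b = soundex(b);
  same_sound = (code_a = code_b);
run;

proc print data=compared noobs;
run;
```

Printing the codes next to the indicator makes it easy to spot both kinds of error: pairs that should match but get different codes, and pairs that get the same code even though they should be kept apart.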
-Joe
On Mon, Mar 14, 2011 at 3:39 PM, Wei Wang <weiwangum@yahoo.com> wrote:
> Hi guys,
>
> data have;
> infile datalines missover;
> input id name1 :$8. name2 :$8. name3 :$8.;
> datalines;
> 1 mike mike mike
> 2 joe joe joey
> 3 andy andy
> 4
> 5 danny dan
> 6 sean shawn
> 7 aaron
> ;
> run;
>
> I want to create a flag variable indicating whether the non-missing
> names differ. Here is the data I need.
>
> id  name1  name2  name3  flag
> 1   mike   mike   mike   0
> 2   joe    joe    joey   1
> 3   andy   andy          0
> 4                        0
> 5   danny  dan           1
> 6   sean   shawn         1
> 7   aaron                0
>
> Thanks,
> Wei
>
>
>
>