Date: Fri, 16 Mar 2007 10:39:53 -0400
Reply-To: Dominc Mitchell <mitchell.d@VIDEOTRON.CA>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dominc Mitchell <mitchell.d@VIDEOTRON.CA>
Subject: Re: Subsetting data based on similar sounding names
In-Reply-To: <9b5abea60703160709v5267d9f7m6bcefa7065c8538@mail.gmail.com>
Content-Type: text/plain; charset="us-ascii"
Hi,
Here is a bit of code that does just that. Even though you do not need to
clean the name variable, it is probably easier to work with only the first
and last names and leave out the middle part. At least, that is what I
think, based on the example you showed us.
Regards,
Dominic.
data x;
length name $100;
name="Anthony Tamar" ;output;
name="Anthony V Tamar" ;output;
name="paul V king" ;output;
name="paul king" ;output;
name="moon park";output;
name="thomas li";output;
run;
/* Reduce each name to its first and last words, dropping anything in between */
data test;
set x;
name1=prxchange('s/^([a-z]+).*\s([a-z]+)/$1 $2/i',-1,name);
last=trim(left(scan(name1,2)));
first=trim(left(scan(name1,1)));
run;
proc sort data=test;
by last first;
run;
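/* Number the rows so that a row is never compared with itself */
data test1;
set test;
obs=_n_;
run;
/* Keep every name whose first and last names both sound like those of another row */
proc sql;
create table duplicates as
select a.name
from test1 as a, test1 as b
where soundex(a.last)=soundex(b.last) and soundex(a.first)=soundex(b.first)
and a.obs ne b.obs;
quit;
proc print data=duplicates;
run;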
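As mentioned earlier in the thread, typos in the names would need something more elaborate than SOUNDEX. The following is only a rough sketch of one possibility, not a tested solution: it reuses the test1 data set built above and swaps the SOUNDEX comparison for the COMPGED function (generalized edit distance). The table name fuzzy and the cutoff of 100 are assumptions made for illustration; the cutoff would have to be tuned on real data.

```sas
/* Sketch only: edit-distance matching instead of SOUNDEX.           */
/* Assumes test1 (with first, last and obs) exists as created above. */
/* The table name "fuzzy" and the cutoff of 100 are illustrative.    */
proc sql;
create table fuzzy as
select distinct a.name
from test1 as a, test1 as b
where a.obs ne b.obs
and compged(upcase(strip(a.last)),  upcase(strip(b.last)))  <= 100
and compged(upcase(strip(a.first)), upcase(strip(b.first))) <= 100;
quit;
proc print data=fuzzy;
run;
```

The UPCASE and STRIP calls are there so that case and padding blanks do not count as differences. A lower cutoff makes the match stricter, a higher one more tolerant.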
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of souga soga
Sent: Friday, March 16, 2007 10:09
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Subsetting data based on similar sounding names
I apologize for not explaining the task clearly. I need the output to look
like this:
Anthony Tamar
Anthony V Tamar
paul V king
paul king
Essentially the task is to output all the names that sound similar to any
other name in the dataset.
I am hoping that someone could help me with this.
Thanks again.
Sa Polo
On 3/16/07, Gerhard Hellriegel <gerhard.hellriegel@t-online.de> wrote:
>
> Sorry, but could you explain what in "paul" sounds like "Anthony"? OK, my
> English is bad, but I don't see that!
> Gerhard
>
>
> On Thu, 15 Mar 2007 16:51:27 -0400, souga soga <souga1234@GMAIL.COM> wrote:
>
> >Thanks, but I need only the first 4 observations in the output set, as they
> >are similar, and they do not have to be cleaned.
> >
> >On 3/15/07, Dominc Mitchell <mitchell.d@videotron.ca> wrote:
> >>
> >>
> >>
> >> Hi,
> >>
> >> That would work with your example. It only uses the first and last name.
> >> But if your data set needs more complex comparisons (e.g. typos in names),
> >> then you would need something more elaborate.
> >>
> >> Dominic.
> >>
> >> data x;
> >> length name $100;
> >> name="Anthony Tamar" ;output;
> >> name="Anthony V Tamar" ;output;
> >> name="paul V king" ;output;
> >> name="paul king" ;output;
> >> name="moon park";output;
> >> name="thomas li";output;
> >> run;
> >>
> >>
> >> data test;
> >> set x;
> >> name1=prxchange('s/^([a-z]+).*\s([a-z]+)/$1 $2/i',-1,name);
> >> proc print;
> >> run;
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of souga soga
> >> Sent: Thursday, March 15, 2007 16:04
> >> To: SAS-L@LISTSERV.UGA.EDU
> >> Subject: Subsetting data based on similar sounding names
> >>
> >> I have a dataset which has similar names
> >>
> >> data x;
> >> name="Anthony Tamar" ;output;
> >> name="Anthony V Tamar" ;output;
> >> name="paul V king" ;output;
> >> name="paul king" ;output;
> >> name="moon park";output;
> >> name="thomas li";output;
> >> run;
> >>
> >> I would like to spit out all names that appear to be the same, i.e.
> >> rows 1 through 4.
> >>
> >> Thanks as always,
> >> Sa
> >>
> >>
>