LISTSERV at the University of Georgia
Date:         Fri, 16 Mar 2007 10:39:53 -0400
Reply-To:     Dominc Mitchell <mitchell.d@VIDEOTRON.CA>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Dominc Mitchell <mitchell.d@VIDEOTRON.CA>
Subject:      Re: Subsetting data based on similar sounding names
Comments: To: souga soga <souga1234@GMAIL.COM>
In-Reply-To:  <9b5abea60703160709v5267d9f7m6bcefa7065c8538@mail.gmail.com>
Content-Type: text/plain; charset="us-ascii"

Hi,

Well, here is a bit of code that does just that. Even though you do not need to clean the name variable, it is probably easier to work with only the first and last names and leave out the middle part. At least, that is what I think given the example you showed us.

Regards,

Dominic.

data x;
  length name $100;
  name="Anthony Tamar"; output;
  name="Anthony V Tamar"; output;
  name="paul V king"; output;
  name="paul king"; output;
  name="moon park"; output;
  name="thomas li"; output;
run;

data test;
  set x;
  /* keep only the first and last words of the name */
  name1=prxchange('s/^([a-z]+).*\s([a-z]+)/$1 $2/i',-1,name);
  last=trim(left(scan(name1,2)));
  first=trim(left(scan(name1,1)));
run;

proc sort data=test;
  by last first;
run;

data test1; set test; obs=_n_; run;

proc sql;
  create table duplicates as
  select a.name
  from test1 as a, test1 as b
  where soundex(a.last)=soundex(b.last)
    and soundex(a.first)=soundex(b.first)
    and a.obs ne b.obs;
quit;

proc print;
run;
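
To see why particular rows pair up in the join, it can help to print the SOUNDEX keys next to each name. This is only a minimal sketch, assuming the test1 data set built above; the names keys, first_key and last_key are made up for illustration.

```sas
/* Sketch: show the SOUNDEX keys that the join compares.      */
/* Assumes test1 (with name, first, last, obs) already exists. */
data keys;
  set test1;
  length first_key last_key $20;
  first_key = soundex(first);  /* phonetic key of the first name */
  last_key  = soundex(last);   /* phonetic key of the last name  */
run;

proc print data=keys;
  var obs name first_key last_key;
run;
```

Rows that share both keys are the ones the PROC SQL step reports. If three or more names share the same keys, each of them would be listed more than once, so SELECT DISTINCT may be worth considering in that case.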

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of souga soga
Sent: Friday, March 16, 2007 10:09
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Subsetting data based on similar sounding names

I apologize for not explaining the task clearly. I need the output to look like this:

Anthony Tamar
Anthony V Tamar
paul V king
paul king

Essentially the task is to output all the names that sound similar to any other name in the dataset.

I am hoping that someone could help me with this.

Thanks again.
Sa Polo

On 3/16/07, Gerhard Hellriegel <gerhard.hellriegel@t-online.de> wrote:
>
> Sorry, but could you explain what in "paul" sounds like "Anthony"?? Ok, my
> English is bad, but that I don't see!
> Gerhard
>
>
> On Thu, 15 Mar 2007 16:51:27 -0400, souga soga <souga1234@GMAIL.COM> wrote:
>
> >Thanks, but I need only the first 4 observations as they are similar in the
> >output set and they do not have to be cleaned.
> >
> >On 3/15/07, Dominc Mitchell <mitchell.d@videotron.ca> wrote:
> >>
> >> Hi,
> >>
> >> That would work with your example. It only uses the first and last name.
> >> But if your data set has more complex comparison (e.g. typos in names) then
> >> you would need something more elaborate.
> >>
> >> Dominic.
> >>
> >> data x;
> >> length name $100;
> >> name="Anthony Tamar" ;output;
> >> name="Anthony V Tamar" ;output;
> >> name="paul V king" ;output;
> >> name ="paul king"; ;output;
> >> name="moon park";output;
> >> name="thomas li";output;
> >> run;
> >>
> >>
> >> data test;
> >> set x;
> >> name1=prxchange('s/^([a-z]+).*\s([a-z]+)/$1 $2/i',-1,name);
> >> proc print;
> >> run;
> >>
> >>
> >>
> >> -----Original Message-----
> >> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of souga soga
> >> Sent: Thursday, March 15, 2007 16:04
> >> To: SAS-L@LISTSERV.UGA.EDU
> >> Subject: Subsetting data based on similar sounding names
> >>
> >> I have a dataset which has similar names
> >>
> >> data x;
> >> name="Anthony Tamar" ;output;
> >> name="Anthony V Tamar" ;output;
> >> name="paul V king" ;output;
> >> name ="paul king"; ;output;
> >> name="moon park";output;
> >> name="thomas li";output;
> >> run;
> >>
> >> I would like to spit out all names that appear to be the same, i.e.
> >> rows 1 through 4.
> >>
> >> Thanks as always,
> >> Sa
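
The earlier reply quoted above mentions that typos in names would need something more elaborate than taking the first and last words. One possible direction, offered only as a sketch and not as the method used in this thread, is an edit-distance comparison with the COMPLEV function. It assumes the data set x from the thread; the table name near_matches, the column names, and the threshold of 3 are arbitrary choices for illustration.

```sas
/* Sketch: pair up names whose Levenshtein distance is small. */
/* Assumes data set x with a character variable name.         */
proc sql;
  create table near_matches as
  select a.name as name_a,
         b.name as name_b,
         complev(lowcase(strip(a.name)), lowcase(strip(b.name))) as distance
  from x as a, x as b
  where a.name < b.name              /* skip self-pairs and mirrored pairs */
    and calculated distance <= 3;    /* arbitrary cutoff, tune for your data */
quit;

proc print data=near_matches;
run;
```

The a.name < b.name condition also drops exact duplicates, so rows with identical names would have to be handled separately. The cutoff trades missed matches against false ones and would need checking on real data.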

