LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (August 2008, week 5)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Sat, 30 Aug 2008 15:34:49 -0400
Reply-To:     msz03@albany.edu
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Mike Zdeb <msz03@ALBANY.EDU>
Subject:      Re: Sounds Like...
Content-Type: text/plain;charset=iso-8859-1

hi ... in case anyone would like to try a SOUNDEX alternative, NYSIIS coding is another phonetic based coding scheme

there is no NYSIIS function in SAS, but a NYSIIS macro is available

http://www.albany.edu/~msz03/basug/nysiis.zip

I originally found the macro on the SAMHSA web site

http://csat.samhsa.gov/

it seems to have disappeared from that site, but it's referenced in both of the following

Linking Client Records from Substance Abuse, Mental Health and Medicaid State Agencies http://csat.samhsa.gov/IDBSE/idb/modules/linking/material/linking.pdf

Record Linking 102 http://csat.samhsa.gov/IDBSE/idb/modules/linking/presentations/RecordLinking102_alt.ppt

here's a SOUNDEX/NYSIIS comparison (remember the original posting compared GREG and GREGORY ... they are still different with the NYSIIS CODE)

data names; input name : $15. @@; soundex=soundex(name); %nysiis(name,nysiis_code); datalines; SMITH SMYTHE O'HENRY OHENRY ZDEB ZDYB SOO SU CLINTON CLINTEN WASHINGTON WOSHINGTEN TERRI TERRY JACKSON JAKSON SCHMIDT SCHMIT KALE CALE GREG GREGORY ; run;

nysiis_ name soundex code SMITH S53 SNATH SMYTHE S53 SNAT O'HENRY O56 O'ANR OHENRY O56 OANR ZDEB Z31 ZDAB ZDYB Z31 ZDAB SOO S S SU S S CLINTON C4535 CLANTAN CLINTEN C4535 CLANTAN WASHINGTON W25235 WASANGTAN WOSHINGTEN W25235 WASANGTAN TERRI T6 TAR TERRY T6 TAR JACKSON J25 JACSAN JAKSON J25 JACSAN SCHMIDT S53 SNAD SCHMIT S53 SNAT KALE K4 CAL CALE C4 CAL GREG G62 GRAG GREGORY G626 GRAGAR

if you use either SOUNDEX or NYSIIS coding, it's nice to know the coding rules

http://en.wikipedia.org/wiki/Soundex http://en.wikipedia.org/wiki/NYSIIS

(Wikipedia does actually have them correct)

ps re Paul Choate's posting ... great to have both "DDS" and "Extraction" in the same signature, "DDS Data Extraction" ... so, is there also a "DDS Data Root Canal"

-- Mike Zdeb U@Albany School of Public Health One University Place Rensselaer, New York 12144-3456 P/518-402-6479 F/630-604-1475

> My guess would be on the second the SQL processor computes the soundex > scores and then matches, hence a optimized join, while the first matches > and then computes the soundex scores, hence a Cartesian join. > > As expected the second runs much more quickly. Interesting point Mike. > > Paul Choate > DDS Data Extraction > (916) 654-2160 > > -----Original Message----- > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of > Mike Zdeb > Sent: Friday, August 29, 2008 11:33 AM > To: SAS-L@LISTSERV.UGA.EDU > Subject: Re: Sounds Like... > > hi ... so if the SOUNDS LIKE operator compares the SOUNDEX of variables, > I don't understand why > the first use of PROC SQL below produces the NOTE > > NOTE: The execution of this query involves performing one or more > Cartesian product joins that can > not be optimized. > > while the second on does not > > can any of you SQL folk explain? thanks > > > data a; > input name1 : $10. @@; > datalines; > SMITH SMYTHE O'HENRY OHENRY ZDEB ZDYB SOO SU CLINTON CLINTEN > ; > run; > > data b; > set a (rename=(name1=name2)); > run; > > proc sql; > create table matches as > select * > from a,b > where name1 =* name2; > quit; > > proc sql; > create table matches as > select * > from a,b > where soundex(name1) = soundex(name2); > quit; > > > -- > Mike Zdeb > U@Albany School of Public Health > One University Place > Rensselaer, New York 12144-3456 > P/518-402-6479 F/630-604-1475 > >


Back to: Top of message | Previous page | Main SAS-L page