LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (November 2002, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 21 Nov 2002 14:25:10 -0500
Reply-To:     Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:      Re: Using contains with SQL
Comments: To: "Michael D. Boldin" <mboldin@MINDSPRING.COM>
Content-Type: text/plain; charset="iso-8859-1"

A simple illustration of a fairly general method appears on http://www.sconsig.com/

Search the SAS Tips and Techniques for 'Hermansen'. If you have further questions, I may be able to answer them.

I've looked at a fuzzy indexing function that is being developed, as I understand it, for one of the SAS V9.x's. Alternatively, the perl-style regular expression processor might work well if you have more specific patterns to match.

Sig

-----Original Message----- From: Michael D. Boldin [mailto:mboldin@MINDSPRING.COM] Sent: Wednesday, November 20, 2002 7:57 PM To: SAS-L@LISTSERV.UGA.EDU Subject: Using contains with SQL

I have a dataset with about 100 company names and I want find close matches to these names in another set that has 20,000 company names and an associated ID number. I know that there are numerous data step ways to do this for a single company ie where cname contains "ABC Corp"; and I could use these ways in a macro loop but I thought there would be more elegant and flexible SQL way to handle all names in a single pass. Unfortunately, all of may efforts to use 'contains' 'like' or 'index()' in the same way as on a.cname = b.name gives an error or this note

66 proc sql; 67 create table temp2 68 as select * 69 from temp1 as b 70 left join cnames as a 71 on a.cname contains b.cname1; NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized. NOTE: Table WORK.TEMP2 created, with 5 rows and 15 columns.

and no joins are made (all comuns in temp2 are filled with missing values except cname1

any ideas or is this hopeless?


Back to: Top of message | Previous page | Main SAS-L page