Date: Thu, 2 Dec 2010 06:23:05 -0800
Reply-To: art297 <atabachneck@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: art297 <atabachneck@GMAIL.COM>
Subject: Re: A Match question. Thanks.
CC,
Depending upon the number of acceptable irregularities you have, you
might be able to get away with something like:
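proc sql noprint;
  create table want as
    /* keep the B keyword next to each matching A name */
    select b.company as b_company, a.company
      from B as b
      /* inner join: B keywords with no match in A produce no row */
      inner join A as a
        /* keyword followed by a blank or hyphen, so ABB does not hit ABBOTT */
        on index(a.company,trim(b.company)||' ') gt 0
        or index(a.company,trim(b.company)||'-') gt 0
  ;
quit;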
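To see why a blank or hyphen is appended to the keyword before calling index, here is a minimal sketch. The data set name check and the variables keyword, naive, and delimited are made up for illustration and are not part of your data:

```sas
/* Compare a bare index() search with the delimited search for one keyword */
data check;
  length company $50 keyword $20;
  keyword = 'ABB';
  do company = 'ABB INC.', 'ABBOTT LABS.', 'AAF-MCQUAY INC';
    /* bare keyword: also hits names that merely start with ABB */
    naive = index(company, trim(keyword)) gt 0;
    /* keyword followed by a blank or a hyphen */
    delimited = index(company, trim(keyword)||' ') gt 0
             or index(company, trim(keyword)||'-') gt 0;
    output;
  end;
run;

proc print data=check;
run;
```

The bare search should flag both ABB INC. and ABBOTT LABS., while the delimited one should flag only ABB INC. Note that index looks anywhere in the name, not just at the start, and that a name ending in the keyword matches only because SAS pads character values with trailing blanks.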
Art
-----------
On Dec 2, 2:27 am, CC <chchanghe...@gmail.com> wrote:
> Hello,
>
> I have two data sets. Here is A data:
>
> 1ST QUALITY CYLINDERS INC.
> 21ST CENTURY CO INC
> 3M
> 3M CO
> 3M CO INC
> 4FRONT ENGINEERED SOLUTIONS
> 50% DAIRY FARMERS OF AMERICA 50% PRAIRIE FARM
> 5667503
> 606072130
> 63515605
> 999 INC.
> A & L LABS. INC.
> A & W READY MIX CONCRETE
> A MATRIX METALS CO LLC
> A WHITMAN CO.
> A&A GROUP HOLDINGS INC
> A&A GROUP HOLDINGS INC.
> A-1 PRODUCTION INC.
> A. B. CARTER INC.
> A. B. DICK CO.
> A. G. SIMPSON AUTOMOTIVE INC.
> A. H. HARRIS & SONS
> A. J. DAW PRINTING INK CO.
> A. J. MURPHY CO. INC.
> A. M. TODD CO.
> A. O. SMITH CORP
> A. O. SMITH CORP.
> A. O. SMITH ELECTRICAL PRODUCTS
> A. P. GREEN INDUSTRIES INC.
> A. ROTONDO & SONS INC.
> A. SCHONBEK & CO. LTD.
> A. SCHULMAN INC
> A. SCHULMAN INC.
> A. T. MASSEY COAL CO. INC.
> A. TENENBAUM CO INC.
> A.M. CASTLE
> A.O. SMITH CORP
> A.P. GREEN REFRACTORIES
> A.P.I.
> A.R.E. ACCESSORIES LLC.
> A.S. AMERICA INC.
> A.SCHULMAN INC.
> A/P IND.
> AAF - MCQUAY INC.
> AAF- MCQUAY
> AAF-MCQUAY INC
> AAF-MCQUAY INC.
> AAII (FORMERLY PEMCO)
> AALBERTS INDUSTRIES
> AAON INC
> AAR CORP
> AAR CORP.
> AAR MANUFACTURING INC.
> AARHUSKARLSHAMN USA INC
> AARQUE STEEL CORP.
> AAVID THERMAL TECHNOLOGIES INC.
> AB MAURI FOOD INC.
> AB VOLVO
> ABB HOLDINGS INC
> ABB HOLDINGS INC.
> ABB INC
> ABB INC.
> ABB LTD
> ABB LTD.
> ABB POWER T&D CO INC
> ABB POWER T&D CO. INC.
> ABBOTT LABORATORIES
> ABBOTT LABORATORIES INC.
> ABBOTT LABS.
> ABC COMPOUNDING INC.
>
> I want to keep an observation in the A data whenever part of its name
> appears in the B data. Here is the B data:
>
> ABBOTT
> AAF
> AALBERTS
> ABB
> AAII
> A.P.I.
> A. TENENBAUM
>
> The resulting data I want would look like this:
>
> ABBOTT ABBOTT LABORATORIES
> ABBOTT ABBOTT LABORATORIES INC.
> ABBOTT ABBOTT LABS.
> AAF AAF - MCQUAY INC.
> AAF AAF- MCQUAY
> AAF AAF-MCQUAY INC
> AAF AAF-MCQUAY INC.
> AALBERTS AALBERTS INDUSTRIES
> ABB ABB HOLDINGS INC
> ABB ABB HOLDINGS INC.
> ABB ABB INC
> ABB ABB INC.
> ABB ABB LTD
> ABB ABB LTD.
> ABB ABB POWER T&D CO INC
> ABB ABB POWER T&D CO. INC.
> AAII AAII (FORMERLY PEMCO)
> A.P.I. A.P.I.
> A. TENENBAUM A. TENENBAUM CO INC.
>
> *******************************************************
>
> Could anyone give me a hand with this issue? I tried the spedis
> function, but it does not work well: some pairs score below 30 and
> are still not a correct match.
>
> For instance, if I check
> min(spedis('ABB','ABBOTT LABORATORIES INC.'),spedis('ABBOTT LABORATORIES INC.','ABB'))
>
> and
>
> min(spedis('ABB','ABB POWER T&D CO. INC.'),spedis('ABB POWER T&D CO. INC.','ABB'))
>
> both give me a score of 30. So I feel it is safer to use
> contains or index in this case.
>
> Could anyone give me some help on it? Thanks in advance!