LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2009, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Wed, 16 Dec 2009 16:56:01 -0500
Reply-To:   Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:   Re: matching names + cleaning data
Comments:   To: ChrisG <chris.godlewski@GMAIL.COM>
In-Reply-To:   <bf1ff274-5b4c-4b04-bac2-e4c0dcf33970@p19g2000vbq.googlegroups.com>
Content-Type:   text/plain; charset="us-ascii"

CG: You really need a mnemonic Bank ID that can be mapped to a standard bank name. We have to do this type of standardization often. It never works perfectly in new situations.

I'd experiment first with different length constraints on the SOUNDEX() function. Once you have a relation of soundex values and standard bank names, then you may be able to use one of the similarity functions (edit distance or rearrangement cost [e.g., SPEDIS()]) to check to see if the relation contains a close enough match to a standard name. If not, it would have to be reconciled off-line.

Here's a rough cut of how you might create a relation of standardized bank names and soundex values: data banks; infile cards pad truncover; input @1 loanid $char8. @10 class $4. @16 Bank_Name $char39. @55 Type $char20. ; datalines; BF486261 Corp Credit Lyonnais Mandated Arranger BF486265 Corp JP Morgan Bookrunner BF486265 Corp JP Morgan Mandated Arranger BF486265 Corp KFW Kreditanstalt Fur Wiederaufbau Mandated Arranger BF486269 Corp Societe Generale Agent(s) BF486269 Corp KFW Kreditanstalt Fur Wiederaufbau Arranger(s) BF486269 Corp Societe Generale Bookrunner BF486269 Corp Societe Generale Mandated Arranger BF486273 Corp Bank of Tokyo-Mitsubishi UFJ Ltd Agent(s) BF486273 Corp Landesbank Baden-Wuerttemberg Arranger(s) BF486273 Corp Bank of Tokyo-Mitsubishi UFJ Ltd Bookrunner BF486273 Corp Bank of Tokyo-Mitsubishi UFJ Ltd Lender(s) BF486273 Corp Landesbank Baden-Wuerttemberg Lender(s) BF486273 Corp Bank of Tokyo-Mitsubishi UFJ Ltd Mandated Arranger BF497955 Corp Societe Generale Bookrunner BF497955 Corp Societe Generale Mandated Arranger BF500064 Corp Royal Bank of Scotland Agent(s) BF500064 Corp Royal Bank of Scotland Bookrunner BF500064 Corp BNP Paribas Group Lender(s) BF500064 Corp HSBC Financial Products (France) SNC Mandated Arranger BF500064 Corp Royal Bank of Scotland Mandated Arranger BF500797 Corp ING Bank NV Bookrunner BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF501049 Corp Credit Lyonnais Agent(s) BF501049 Corp CIT Group Lender(s) BF501049 Corp Credit Lyonnais Lender(s) BF501049 Corp ING Groep NV Lender(s) BF501049 Corp CIT Group Mandated Arranger BF501049 Corp Credit Lyonnais Mandated Arranger BF501049 Corp ING Groep NV Mandated Arranger LN059143 Corp Banca Commerciale Italiana Lender(s) LN059143 Corp Banca di Roma (France) Lender(s) LN059143 Corp Banco Bilbao Vizcaya Argentaria Lender(s) LN059143 Corp Banco Espanol de Credito Lender(s) LN059143 Corp Bank One NA Lender(s) LN059143 Corp Barclays Bank PLC (US) Lender(s) LN059143 Corp Deutsche Bank AG Lender(s) LN059143 Corp Dexia Banque SA Lender(s) LN059143 Corp ING Bank NV/United States Lender(s) LN059143 Corp Rabobank International Lender(s) LN059143 Corp Raiffeisen Zentralbank Oesterreich AG Lender(s) LN059143 Corp Sanpaolo IMI SpA Lender(s) LN059143 Corp Unicredito Italiano Lender(s) LN059143 Corp Deutsche Bank AG Mandated Arranger LN059143 Corp IntesaBci SpA Mandated Arranger ; run; proc sql; create table BankStan as select distinct Bank_Name,substr(soundex(Bank_Name),1,8) as stanBankName from banks group by stanBankName having Bank_Name=min(Bank_Name) ; quit;

The subsequent comparisons of observed bank names to standardized bank names using SPEDIS() or other similarity functions would come next, and would return the standardized name. I am assuming here that the bank name doesn't change with Type. S

-----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of ChrisG Sent: Wednesday, December 16, 2009 9:40 AM To: SAS-L@LISTSERV.UGA.EDU Subject: matching names + cleaning data

Hi guys

Thank you in advance for any clue, advice and so on ...

Here is my big issue

I have data on the composition of bank loan syndicates (i.e. several banks make a loan together to a borrower) I have a variable LOANID to identify the loan, a variable BANK_NAME for the name of the bank, and TYPE to classify the "rank" of the bank in the syndicate I have copy paste a portion of the data below :

loanid Bank_Name Type BF486261 Corp Credit Lyonnais Mandated Arranger BF486265 Corp JP Morgan Bookrunner BF486265 Corp JP Morgan Mandated Arranger BF486265 Corp KFW Kreditanstalt Fur Wiederaufbau Mandated Arranger BF486269 Corp Societe Generale Agent(s) BF486269 Corp KFW Kreditanstalt Fur Wiederaufbau Arranger(s) BF486269 Corp Societe Generale Bookrunner BF486269 Corp Societe Generale Mandated Arranger BF486273 Corp Bank of Tokyo-Mitsubishi UFJ Ltd Agent(s) BF486273 Corp Landesbank Baden-Wuerttemberg Arranger(s) BF486273 Corp Bank of Tokyo-Mitsubishi UFJ Ltd Bookrunner BF486273 Corp Bank of Tokyo-Mitsubishi UFJ Ltd Lender(s) BF486273 Corp Landesbank Baden-Wuerttemberg Lender(s) BF486273 Corp Bank of Tokyo-Mitsubishi UFJ Ltd Mandated Arranger BF497955 Corp Societe Generale Bookrunner BF497955 Corp Societe Generale Mandated Arranger BF500064 Corp Royal Bank of Scotland Agent(s) BF500064 Corp Royal Bank of Scotland Bookrunner BF500064 Corp BNP Paribas Group Lender(s) BF500064 Corp HSBC Financial Products (France) SNC Mandated Arranger BF500064 Corp Royal Bank of Scotland Mandated Arranger BF500797 Corp ING Bank NV Bookrunner BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF500797 Corp ING Bank NV Mandated Arranger BF501049 Corp Credit Lyonnais Agent(s) BF501049 Corp CIT Group Lender(s) BF501049 Corp Credit Lyonnais Lender(s) BF501049 Corp ING Groep NV Lender(s) BF501049 Corp CIT Group Mandated Arranger BF501049 Corp Credit Lyonnais Mandated Arranger BF501049 Corp ING Groep NV Mandated Arranger LN059143 Corp Banca Commerciale Italiana Lender(s) LN059143 Corp Banca di Roma (France) Lender(s) LN059143 Corp Banco Bilbao Vizcaya Argentaria Lender(s) LN059143 Corp Banco Espanol de Credito Lender(s) LN059143 Corp Bank One NA Lender(s) LN059143 Corp Barclays Bank PLC (US) Lender(s) LN059143 Corp Deutsche Bank AG Lender(s) LN059143 Corp Dexia Banque SA Lender(s) LN059143 Corp ING Bank NV/United States Lender(s) LN059143 Corp Rabobank International Lender(s) LN059143 Corp Raiffeisen Zentralbank Oesterreich AG Lender(s) LN059143 Corp Sanpaolo IMI SpA Lender(s) LN059143 Corp Unicredito Italiano Lender(s) LN059143 Corp Deutsche Bank AG Mandated Arranger LN059143 Corp IntesaBci SpA Mandated Arranger

I need to do 2 things (not sure in what order to make things less complicated / more efficient)

1. i need to harmonize the names of the banks as you can see in the data above, for instance i have sometimes ING Bank NV and sometimes ING Bank NV/United States I want to have always let say ING Bank NV or ING or whatever but one name for any possible "variation" of a company which belongs to the ING Group I tried some proc transpose then some compged and so on but honestly i have the impression im going nowhere...

let say for instance that at the end of that step i would like to get something like that (with unique and simple names of banks):

loanid Bank_Name Type BF486261 Corp Credit Lyonnais Mandated Arranger BF486265 Corp JP Morgan Bookrunner BF486265 Corp JP Morgan Mandated Arranger BF486265 Corp KFW Kreditanstalt Mandated Arranger BF486269 Corp Societe Generale Agent(s) BF486269 Corp KFW Kreditanstalt Arranger(s) BF486269 Corp Societe Generale Bookrunner BF486269 Corp Societe Generale Mandated Arranger BF486273 Corp Bank of Tokyo-Mitsubishi Agent(s) BF486273 Corp Landesbank Baden-Wuerttemberg Arranger(s) BF486273 Corp Bank of Tokyo-Mitsubishi Bookrunner BF486273 Corp Bank of Tokyo-Mitsubishi Lender(s) BF486273 Corp Landesbank Baden-Wuerttemberg Lender(s) BF486273 Corp Bank of Tokyo-Mitsubishi Mandated Arranger BF497955 Corp Societe Generale Bookrunner BF497955 Corp Societe Generale Mandated Arranger BF500064 Corp Royal Bank of Scotland Agent(s) BF500064 Corp Royal Bank of Scotland Bookrunner BF500064 Corp BNP Paribas Lender(s) BF500064 Corp HSBC Mandated Arranger BF500064 Corp Royal Bank of Scotland Mandated Arranger BF500797 Corp ING Bank Bookrunner BF500797 Corp ING Bank Mandated Arranger BF500797 Corp ING Bank Mandated Arranger BF500797 Corp ING Bank Mandated Arranger BF500797 Corp ING Bank Mandated Arranger BF500797 Corp ING Bank Mandated Arranger BF500797 Corp ING Bank Mandated Arranger BF500797 Corp ING Bank Mandated Arranger BF500797 Corp ING Bank Mandated Arranger BF500797 Corp ING Bank Mandated Arranger BF500797 Corp ING Bank Mandated Arranger BF500797 Corp ING Bank Mandated Arranger BF500797 Corp ING Bank Mandated Arranger BF501049 Corp Credit Lyonnais Agent(s) BF501049 Corp CIT Group Lender(s) BF501049 Corp Credit Lyonnais Lender(s) BF501049 Corp ING Bank Lender(s) BF501049 Corp CIT Group Mandated Arranger BF501049 Corp Credit Lyonnais Mandated Arranger BF501049 Corp ING Bank Mandated Arranger LN059143 Corp Banca Commerciale Italiana Lender(s) LN059143 Corp Banca di Roma Lender(s) LN059143 Corp Banco Bilbao Vizcaya Argentaria Lender(s) LN059143 Corp Banco Espanol de Credito Lender(s) LN059143 Corp Bank One Lender(s) LN059143 Corp Barclays Bank Lender(s) LN059143 Corp Deutsche Bank Lender(s) LN059143 Corp Dexia Banque Lender(s) LN059143 Corp ING Bank Lender(s) LN059143 Corp Rabobank International Lender(s) LN059143 Corp Raiffeisen Zentralbank Lender(s) LN059143 Corp Sanpaolo IMI Lender(s) LN059143 Corp Unicredito Italiano Lender(s) LN059143 Corp Deutsche Bank Mandated Arranger LN059143 Corp IntesaBci SpA Mandated Arranger

so if anyone have a clue, thanks a lot in advance !

2. i need to clean the data in the following way as you can see, for each loan, the name of a bank can appear several times let me pick that one : BF486273 Corp Bank of Tokyo-Mitsubishi UFJ Ltd Agent(s) BF486273 Corp Landesbank Baden-Wuerttemberg Arranger(s) BF486273 Corp Bank of Tokyo-Mitsubishi UFJ Ltd Bookrunner BF486273 Corp Bank of Tokyo-Mitsubishi UFJ Ltd Lender(s) BF486273 Corp Landesbank Baden-Wuerttemberg Lender(s) BF486273 Corp Bank of Tokyo-Mitsubishi UFJ Ltd Mandated Arranger

here Bank of Tokyo-Mitsubishi UFJ Ltd appears 4 times because it acts in different roles in that particular bank syndicate (as Agent, Bookrunner, Lender and Mandated Arranger) I need to keep only one occurrence of this bank for that loan as the hierarchy is Agent > Mandated Arranger > Lender, if this bank is agent, i don't care that it is arranger or lender i want to keep it just once as agent same applies to the second bank of that loan Landesbank Baden-Wuerttemberg appears 2 times, once as arranger and once as lender again as arranger > lender, i just want to keep it as arranger to get at the end something like that

BF486273 Corp Bank of Tokyo-Mitsubishi UFJ Ltd Agent(s) BF486273 Corp Landesbank Baden-Wuerttemberg Arranger(s)

again thanks in advance for any advice / clue

Best regards CG


Back to: Top of message | Previous page | Main SAS-L page