LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2009, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Thu, 22 Jan 2009 10:22:02 -0500
Reply-To:   Don Henderson <donaldjhenderson@HOTMAIL.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
Comments:   RFC822 error: <W> MESSAGE-ID field duplicated. Last occurrence was retained.
From:   Don Henderson <donaldjhenderson@HOTMAIL.COM>
Subject:   Regex - Seven of Nine Matching
Content-Type:   text/plain; charset="us-ascii"

And no, this is not a Sci-Fi question ;-).

Am asking on behalf of a colleague who is doing a project involving matching based on Social Security numbers. The client wants to match two files using the Social Security number and wants to detect as possible matches cases where position by position 7 of the nine digits match. I am presuming that is because they want to account for the possibility of transposed digits (but don't know that for sure).

I've come up with what I call a brute force solution that uses the sum function in the where clause to count position by position how many characters match. Based on the sample below, it seems to work.

But I am wondering if any of the RegEx gurus on the list can comment on whether this can be done with RegEx.

TIA. My sample data and quick and dirty brute force code follows.

data one; input ssn $9.; datalines; 123456789 987654321 121212121 343434343 ; data two; input ssn $9.; datalines; 123456789 987645321 121212121 232323232 ; proc sql; create table matches as select a.ssn as ssn_one, b.ssn as ssn_two, case when (a.ssn ne b.ssn) then 'Not Exact' else 'Exact' end as MatchResults from one a, two b where sum(substr(a.ssn,1,1)=substr(b.ssn,1,1), substr(a.ssn,2,1)=substr(b.ssn,2,1), substr(a.ssn,3,1)=substr(b.ssn,3,1), substr(a.ssn,4,1)=substr(b.ssn,4,1), substr(a.ssn,5,1)=substr(b.ssn,5,1), substr(a.ssn,6,1)=substr(b.ssn,6,1), substr(a.ssn,7,1)=substr(b.ssn,7,1), substr(a.ssn,8,1)=substr(b.ssn,8,1), substr(a.ssn,9,1)=substr(b.ssn,9,1) ) ge 7; quit;

Don Henderson (301) 570-5530 (office) (301) 980-9027 (cell)

Back to: Top of message | Previous page | Main SAS-L page