Date: Thu, 22 Jan 2009 10:22:02 -0500 Don Henderson "SAS(r) Discussion" RFC822 error: MESSAGE-ID field duplicated. Last occurrence was retained. Don Henderson Regex - Seven of Nine Matching text/plain; charset="us-ascii"

And no, this is not a Sci-Fi question ;-).

Am asking on behalf of a colleague who is doing a project involving matching based on Social Security numbers. The client wants to match two files using the Social Security number and wants to detect as possible matches cases where position by position 7 of the nine digits match. I am presuming that is because they want to account for the possibility of transposed digits (but don't know that for sure).

I've come up with what I call a brute force solution that uses the sum function in the where clause to count position by position how many characters match. Based on the sample below, it seems to work.

But I am wondering if any of the RegEx gurus on the list can comment on whether this can be done with RegEx.

TIA. My sample data and quick and dirty brute force code follows.

data one; input ssn \$9.; datalines; 123456789 987654321 121212121 343434343 ; data two; input ssn \$9.; datalines; 123456789 987645321 121212121 232323232 ; proc sql; create table matches as select a.ssn as ssn_one, b.ssn as ssn_two, case when (a.ssn ne b.ssn) then 'Not Exact' else 'Exact' end as MatchResults from one a, two b where sum(substr(a.ssn,1,1)=substr(b.ssn,1,1), substr(a.ssn,2,1)=substr(b.ssn,2,1), substr(a.ssn,3,1)=substr(b.ssn,3,1), substr(a.ssn,4,1)=substr(b.ssn,4,1), substr(a.ssn,5,1)=substr(b.ssn,5,1), substr(a.ssn,6,1)=substr(b.ssn,6,1), substr(a.ssn,7,1)=substr(b.ssn,7,1), substr(a.ssn,8,1)=substr(b.ssn,8,1), substr(a.ssn,9,1)=substr(b.ssn,9,1) ) ge 7; quit;

Don Henderson Don.Henderson@hcsbi.com http://www.hcsbi.com (301) 570-5530 (office) (301) 980-9027 (cell)

Back to: Top of message | Previous page | Main SAS-L page