Date: Thu, 6 Aug 1998 11:39:16 -0400
Reply-To: RHOADSM1 <RHOADSM1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: RHOADSM1 <RHOADSM1@WESTAT.COM>
Subject: Re: Data categorization for fuzzy match?
Would an off-the-shelf matching package be an option? This is a
common enough problem that canned packages do exist, and I expect that
they address your issues. I have heard good things about AutoMatch --
more info is available from www.matchware.com.
I am comparing methods of matching membership data based on several key
and demographic fields in very large datasets (100M+ records).
I need to find ways of restricting the number of potential matches. I
am looking for ideas or references to:
- Hash or key numeric fields such that transposes and near-misses are
keyed with identical or similar values. Should be suitable for SSN.
- Hash or key text fields so that they may be searched readily for
similar words and/or text elements. Should be suitable for name and
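
To make the two kinds of keys above concrete, here is a minimal Python sketch of one possible approach. It is not taken from AutoMatch or any other package. The function names (deletion_keys, soundex, candidate_pairs) are hypothetical, and the sketch assumes SSNs are strings containing nine digits and that names are plain ASCII. For numeric fields it keys each value by every string obtained by deleting one digit, so two values that differ by one substituted digit or one adjacent transposition share at least one key. For text fields it uses the classic American Soundex code as a stand-in for "similar words". Records are then compared only when they share a key.

```python
from collections import defaultdict
from itertools import combinations


def deletion_keys(number):
    """Return the set of strings formed by deleting one digit at a time.

    Two equal-length numbers that differ by a single substituted digit,
    or by one swap of adjacent digits, share at least one of these keys.
    """
    digits = "".join(c for c in number if c.isdigit())
    return {digits[:i] + digits[i + 1:] for i in range(len(digits))}


# Letter-to-digit table for American Soundex.
_SOUNDEX_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _SOUNDEX_CODES[_ch] = _digit


def soundex(name):
    """Return the 4-character American Soundex code of a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0]]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = _SOUNDEX_CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        prev = code  # a vowel resets prev, so a repeated code is kept
    return ("".join(out) + "000")[:4]


def candidate_pairs(records, key_fn):
    """Group (id, value) records by key and pair ids within each group.

    key_fn must return an iterable of keys for a value. Only records that
    share a key become candidates for a detailed comparison.
    """
    blocks = defaultdict(list)
    for rec_id, value in records:
        for key in set(key_fn(value)):
            blocks[key].append(rec_id)
    pairs = set()
    for ids in blocks.values():
        for a, b in combinations(ids, 2):
            pairs.add((min(a, b), max(a, b)))
    return pairs


if __name__ == "__main__":
    ssns = [(1, "123-45-6789"), (2, "123-54-6789"),
            (3, "123-45-6780"), (4, "987-65-4321")]
    print(sorted(candidate_pairs(ssns, deletion_keys)))

    names = [(1, "Robert"), (2, "Rupert"), (3, "Smith"), (4, "Smyth")]
    print(sorted(candidate_pairs(names, lambda n: [soundex(n)])))
```

The trade-off in this sketch is index size: a nine-digit value produces up to nine deletion keys, so the key table is several times larger than the data, in exchange for never comparing records that share no key. Soundex is coarse and English-oriented, so it is likely to serve only as a first-pass filter ahead of a finer comparison.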
Karsten M. Self (firstname.lastname@example.org)
What part of "Gestalt" don't you understand?