Date: Wed, 16 Jan 2008 13:06:14 -0500
Reply-To: Gene Maguin <emaguin@buffalo.edu>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Gene Maguin <emaguin@buffalo.edu>
Subject: Re: Restructuring a fuzzy matched data set
In-Reply-To: <200801152037.m0FH772p032099@mailgw.cc.uga.edu>
Anton,
>>I am trying to unduplicate a file using a fuzzy match approach via the CDC
software LinkPlus. The issue I'm having is trying to restructure the data
set so all IDs that match appear on one row. For example, LinkPlus
spits out a data set (actually a report) that has all pairs of matches:
Match#  ID  Name
1       27  Carl
1       42  Carl
2       27  Carl
2       53  Carl
3       42  Carl
3       53  Carl
4       18  Sue
4       99  Sue
I'd love to have a data set that looks like this:
Match1  Match2  Match3
27      42      53
18      99
I think I'd work the problem this way.
Sort cases by name id.
Compute dups=0.
If (id eq lag(id)) dups=lag(dups)+1.
Select if (dups eq 0).
* Now you have a set of unique ids within a name value.
* Use Casestovars to build a single record.
Casestovars /id=name.
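I almost never use casestovars, so you may need to fiddle around with it a
bit, but I think you will get one record per name consisting of
Name Match1 Match2 ... MatchN ID1 ID2 ... IDN
(By default casestovars puts a period between the root name and the index,
so the names may come out as ID.1, ID.2, and so on.)
You will be interested in the variables ID1 thru IDN.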
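
For anyone who wants to try the steps above end to end, here is a minimal
sketch using the sample data from the question. It is untested and rests on a
few assumptions: the variable name matchno stands in for the report's Match#
column, the DROP subcommand is added so that only the ids get spread across
columns, and SEPARATOR="" is added so the new variables are named id1, id2,
and so on rather than id.1, id.2. Like the outline above, it assumes that
matched records carry exactly the same name value.

```spss
* Sample data from the question; matchno is an assumed name for Match#.
DATA LIST LIST /matchno (F4.0) id (F4.0) name (A10).
BEGIN DATA
1 27 Carl
1 42 Carl
2 27 Carl
2 53 Carl
3 42 Carl
3 53 Carl
4 18 Sue
4 99 Sue
END DATA.

* Put repeated ids next to each other within each name.
SORT CASES BY name id.

* Flag a case whose name and id both equal the previous case.
COMPUTE dups = 0.
IF (name EQ LAG(name) AND id EQ LAG(id)) dups = 1.

* Keep only the first occurrence of each id.
SELECT IF (dups EQ 0).

* One record per name; drop the variables that should not be spread.
CASESTOVARS /ID=name /DROP=matchno dups /SEPARATOR="".
LIST.
```

If this behaves as intended, the listing should show one row per name with
the ids in id1 thru id3 (27, 42, 53 for Carl; 18, 99 and a missing id3 for
Sue).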
Gene Maguin