Date: Tue, 18 Nov 2008 09:10:08 -0600
Reply-To: "Peck, Jon" <peck@spss.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: "Peck, Jon" <peck@spss.com>
Subject: Re: Probability matching of two files
In-Reply-To: A<9399C524728DDD45B1FBC5FE8CF4153F1F25AD@exchange-be3.centre.ad.gla.ac.uk>
Content-Type: text/plain; charset="us-ascii"
There is no built-in way to do probability matching, but there is an extension command (usable with version 16 or 17) that will do case-control exact matching. You can specify a set of variables that must match exactly, and it will sample one or more cases at random from those that match exactly on the specified variables. The command is CASECTRL, and it can be downloaded from SPSS Developer Central. It requires the Python programmability plug-in, but no knowledge of Python is needed to use it.
If an exact match can't be found, the matching case will be, natch, missing. Collapsing fine-grained variables into slightly broader categories is sometimes enough to produce a match.
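To make the idea concrete, here is a minimal Python sketch of exact matching with random selection among the candidates. It is not the CASECTRL implementation; the function name `exact_match`, the record layout (a list of dicts), and the choice to use each matching case at most once are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def exact_match(cases, controls, keys, seed=0):
    """For each case, pick one unused control at random that agrees on every key.

    Returns a list of control indexes, with None where no exact match exists.
    (Hypothetical helper for illustration; not part of CASECTRL.)
    """
    rng = random.Random(seed)
    # Group control indexes by their tuple of key values.
    pool = defaultdict(list)
    for i, ctrl in enumerate(controls):
        pool[tuple(ctrl[k] for k in keys)].append(i)

    matches = []
    for case in cases:
        candidates = pool.get(tuple(case[k] for k in keys), [])
        if candidates:
            # Remove the chosen control so it cannot be matched twice.
            matches.append(candidates.pop(rng.randrange(len(candidates))))
        else:
            matches.append(None)  # no exact match: the result is missing
    return matches
```

Collapsing a variable before matching (for example, keeping only the first part of a postcode) makes the key tuples coarser, so fewer cases end up with None.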
HTH,
Jon Peck
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Muir Houston
Sent: Tuesday, November 18, 2008 3:19 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: [SPSSX-L] Probability matching of two files
Hi all,
I have two datasets - one a baseline of school pupils containing the
usual suspects (dob, gender, post code (zip in US), school name) plus a
motivational inventory and a number of items which ask about career
influence and future plans.
The second dataset contains dob, gender, postcode and school and was
collected at various events or activities related to a career in the
health sector from pupils drawn from the first sample.
What I would like to do is match respondents from the second dataset
to the first on the basis of probability matching - I think I need to
create a vector of log odds relating to the probability of each
component of a record (my variables noted above - gender, dob, postcode
and school name) being a match. So, birth date may match in a comparison
of records from each dataset, and this would provide one score or weight in
the vector - the other variables (gender, postcode and school name)
would also be scored as a probability of match or non-match - so a
vector of all four variables would be formed.
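A minimal Python sketch of the weight vector described above, using one common formulation of probabilistic record linkage (usually attributed to Fellegi and Sunter): a field that agrees contributes log2(m/u) and a field that disagrees contributes log2((1-m)/(1-u)), where m is the probability of agreement among true matches and u the probability of agreement among non-matches. The m and u values below are made-up placeholders, not estimates from these data, and the function names are hypothetical.

```python
import math

# Hypothetical (m, u) probabilities per field; real values would have to be
# estimated from the data or from a clerically reviewed sample.
M_U = {
    "dob": (0.95, 0.003),
    "gender": (0.98, 0.5),
    "postcode": (0.90, 0.01),
    "school": (0.95, 0.05),
}

def weight_vector(rec_a, rec_b, m_u=M_U):
    """Return one log-odds weight per field for a pair of records."""
    weights = {}
    for field, (m, u) in m_u.items():
        if rec_a[field] == rec_b[field]:
            weights[field] = math.log2(m / u)              # agreement weight
        else:
            weights[field] = math.log2((1 - m) / (1 - u))  # disagreement weight
    return weights

def match_score(rec_a, rec_b, m_u=M_U):
    """Sum the field weights; higher totals suggest a more likely match."""
    return sum(weight_vector(rec_a, rec_b, m_u).values())
```

Pairs would then be compared on their total score, with cut-offs for "match", "possible match" and "non-match" chosen by inspecting the score distribution.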
Any ideas how to go about this? My command of syntax, although evolving,
is not up to this yet!
Or references?
Thanks
Muir
Dr M. Houston
DACE
University of Glasgow
0141-330-4699
=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD