LISTSERV at the University of Georgia
Date:         Tue, 18 Nov 2008 09:10:08 -0600
Reply-To:     "Peck, Jon" <peck@spss.com>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         "Peck, Jon" <peck@spss.com>
Subject:      Re: Probability matching of two files
Comments: To: Muir Houston <m.houston@educ.gla.ac.uk>
In-Reply-To:  A<9399C524728DDD45B1FBC5FE8CF4153F1F25AD@exchange-be3.centre.ad.gla.ac.uk>
Content-Type: text/plain; charset="us-ascii"

There is no built-in way to do probability matching, but there is an extension command (usable with version 16 or 17) that will do case-control exact matching. You can specify a set of variables that must match exactly, and it will sample randomly for one or more cases from those that match exactly on the specified variables. The command is CASECTRL, and it can be downloaded from SPSS Developer Central. It requires the Python programmability plug-in, but no knowledge of Python is needed to use it.

If an exact match can't be found, the matching case will be, natch, missing. Sometimes collapsing fine-grained variables into slightly broader categories is enough to produce a match.
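To illustrate the idea of exact matching with random sampling, and how coarsening a variable can rescue a failed match, here is a minimal Python sketch. It is not the CASECTRL implementation; the function names, the record layout, and the choice of coarsening date of birth to birth year are all assumptions made for the example.

```python
import random
from collections import defaultdict

def birth_year(dob):
    """Coarsen an ISO date string 'YYYY-MM-DD' to its year (assumed format)."""
    return dob[:4]

def exact_match(cases, controls, key, n_controls=1, seed=0):
    """For each case, draw up to n_controls controls at random, without
    replacement, from those with an identical key. No match gives None."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for control in controls:
        pool[key(control)].append(control)
    matched = []
    for case in cases:
        candidates = pool[key(case)]
        rng.shuffle(candidates)
        n = min(n_controls, len(candidates))
        picks = [candidates.pop() for _ in range(n)]
        matched.append((case, picks or None))
    return matched

# Hypothetical records for illustration only.
cases = [{"id": 1, "gender": "F", "dob": "1993-04-02"}]
controls = [
    {"id": 10, "gender": "F", "dob": "1993-11-20"},
    {"id": 11, "gender": "M", "dob": "1993-04-02"},
]

# Matching on the full date of birth finds nothing for case 1.
fine = exact_match(cases, controls, key=lambda r: (r["gender"], r["dob"]))

# Collapsing date of birth to birth year lets control 10 match.
coarse = exact_match(
    cases, controls, key=lambda r: (r["gender"], birth_year(r["dob"]))
)
```

The trade-off is the usual one: broader categories yield more matches but make each match less specific.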

HTH, Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Muir Houston
Sent: Tuesday, November 18, 2008 3:19 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: [SPSSX-L] Probability matching of two files

Hi all, I have two datasets. The first is a baseline of school pupils containing the usual suspects (dob, gender, postcode (zip in US), school name) plus a motivational inventory and a number of items that ask about career influence and future plans.

The second dataset contains dob, gender, postcode and school, and was collected at various events or activities related to a career in the health sector, from pupils drawn from the first sample.

What I would like to do is match respondents from the second dataset to the first on the basis of probability matching. I think I need to create a vector of log odds, one for each component of a record (the variables noted above: gender, dob, postcode and school name), relating to the probability that the component matches. So, if birth date matches in a comparison of records from each dataset, that would provide one score or weight in the vector. The other variables (gender, postcode and school name) would likewise be scored according to whether they match or not, giving a vector covering all four variables.
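The per-field log-odds weighting described here resembles the Fellegi-Sunter approach to probabilistic record linkage. A minimal Python sketch of that scoring follows. The m and u probabilities are made-up illustrative values, not estimates from any data, and the function names, threshold, and sample records are hypothetical.

```python
import math

# Assumed per-field probabilities (illustrative values only):
#   m = P(field agrees | the two records are the same person)
#   u = P(field agrees | the two records are different people)
FIELDS = {
    "dob": (0.95, 0.003),
    "gender": (0.98, 0.5),
    "postcode": (0.90, 0.01),
    "school": (0.95, 0.05),
}

def field_weights(m, u):
    """Return (agreement weight, disagreement weight) as log2 odds."""
    return math.log2(m / u), math.log2((1 - m) / (1 - u))

def match_score(rec_a, rec_b):
    """Sum the agreement or disagreement weight over all four fields."""
    total = 0.0
    for field, (m, u) in FIELDS.items():
        agree, disagree = field_weights(m, u)
        total += agree if rec_a[field] == rec_b[field] else disagree
    return total

def best_match(record, candidates, threshold=0.0):
    """Return (candidate, score) for the top-scoring candidate, or
    (None, score) if even the best score falls below the threshold."""
    if not candidates:
        return None, float("-inf")
    scored = [(match_score(record, c), c) for c in candidates]
    score, candidate = max(scored, key=lambda pair: pair[0])
    return (candidate, score) if score >= threshold else (None, score)

# Hypothetical baseline records and one event record.
baseline = [
    {"id": 1, "dob": "1993-04-02", "gender": "F",
     "postcode": "G12 8QQ", "school": "Hillhead"},
    {"id": 2, "dob": "1993-04-02", "gender": "M",
     "postcode": "G3 6NH", "school": "Hillhead"},
]
event = {"dob": "1993-04-02", "gender": "F",
         "postcode": "G12 8QQ", "school": "Hillhead High"}

# The school name is spelled differently, so it counts as a disagreement,
# but agreement on dob, gender and postcode still favours record 1.
match, score = best_match(event, baseline)
```

In practice the m and u values would need to be estimated from the data or from a hand-checked sample, and free-text fields such as school name usually benefit from standardisation before comparison.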

Any ideas how to go about this? My command of syntax, although evolving, is not up to this yet!

Or references?

Thanks Muir

Dr M. Houston
DACE
University of Glasgow
0141-330-4699

===================== To manage your subscription to SPSSX-L, send a message to LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
