Date: Tue, 24 Jul 2007 08:00:42 -0500
Reply-To: "Peck, Jon" <peck@spss.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: "Peck, Jon" <peck@spss.com>
Subject: Re: Adding Cases
In-Reply-To: A<F6A60FEE75522640AD9B229EABC6CBAE010EFADA@mailer.tangozebra.com>
Content-Type: text/plain; charset="UTF-8"
So, if I understand you, you want to add cases from two files by matching up the variable labels instead of the names. The following little Python function does the job. Note that it assumes that the labels are unique. (I can email you the py file if the listserv mangles this too badly.) Explanation below.
# match files based on variable labels
import spss, spssaux

def mergeByLabel(firstfile, secondfile):
    """Merge cases from two sav files matching up the variables by the variable labels
    instead of the names. The result is based on the names in firstfile.
    The labels are assumed to be unique so that the match is unambiguous."""

    spssaux.OpenDataFile(secondfile)
    labeldict2 = dict([(item.VariableLabel, item.VariableName) for item in spssaux.VariableDict()])
    spssaux.OpenDataFile(firstfile)
    labeldict1 = dict([(item.VariableLabel, item.VariableName) for item in spssaux.VariableDict()])
    renamein = []
    renameout = []
    for (label, name) in labeldict1.items():
        renamein.append(labeldict2[label])
        renameout.append(name)
    renamesubcmd = "(" + " ".join(renamein) + "=" + " ".join(renameout) + ")"
    cmd = r"""ADD FILES /FILE=*
/FILE='%(secondfile)s'
/RENAME = %(renamesubcmd)s.""" % locals()
    spss.Submit([cmd, "EXECUTE."])

secondfile = "c:/temp/dataset2.sav"
firstfile = "c:/temp/dataset1.sav"
mergeByLabel(firstfile, secondfile)
spss.Submit("save outfile = 'c:/temp/firstPlusSecond.sav'")
This code, which would be enclosed in BEGIN PROGRAM/END PROGRAM, first builds a dictionary for the second file where the keys are the variable labels and the values are the variable names.
Then it goes through the first file and, for each variable, looks up the label in the second file's dictionary and retrieves the corresponding variable name.
Finally, it submits an ADD FILES command with a RENAME subcommand that changes the variable names in the second file so that they match up with the first one.
The last four lines just illustrate the usage.
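To see what the lookup step produces without running SPSS, here is a rough sketch in plain Python. The two dictionaries are made-up stand-ins for what the VariableDict scans would return, and build_rename is a hypothetical helper name, not part of spss or spssaux.

```python
# Hypothetical label -> name maps standing in for the two files.
# In the real function these come from spssaux.VariableDict().
labeldict1 = {"Age": "n_1", "Income": "n_2", "Region": "n_3"}  # first file
labeldict2 = {"Income": "n_1", "Region": "n_2", "Age": "n_3"}  # second file


def build_rename(labeldict1, labeldict2):
    """Build the RENAME list that maps second-file names onto first-file names."""
    renamein = []   # names as they appear in the second file
    renameout = []  # names they should have after the rename
    for label, name in labeldict1.items():
        renamein.append(labeldict2[label])  # KeyError if a label is missing
        renameout.append(name)
    return "(" + " ".join(renamein) + "=" + " ".join(renameout) + ")"


rename = build_rename(labeldict1, labeldict2)
print(rename)
```

Each name on the left of the equals sign is a second-file name, and the name in the same position on the right is the first-file name that carries the same label.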
It wouldn't be hard to generalize this so that variables with unmatched labels are ignored.
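One possible way to do that, sketched in plain Python with made-up data: skip first-file labels that have no partner, and list second-file variables with no partner so they can be dropped before the rename. The helper name build_subcommands is hypothetical, and using /DROP ahead of /RENAME for the second file is my assumption about how you would want to handle the leftovers, so check it against the ADD FILES documentation before relying on it.

```python
# Hypothetical label -> name maps; "Height" and "Weight" have no partner.
labeldict1 = {"Age": "n_1", "Income": "n_2", "Height": "n_3"}  # first file
labeldict2 = {"Income": "n_1", "Age": "n_2", "Weight": "n_3"}  # second file


def build_subcommands(labeldict1, labeldict2):
    """Return (drop, rename) subcommand strings for the second file.

    Labels found in only one file are ignored in the rename. Second-file
    variables with no matching label are listed for dropping so that their
    names cannot collide with first-file names.
    """
    renamein = []
    renameout = []
    for label, name in labeldict1.items():
        other = labeldict2.get(label)
        if other is None:
            continue  # label not in the second file: leave this variable alone
        renamein.append(other)
        renameout.append(name)
    unmatched = [name for label, name in labeldict2.items()
                 if label not in labeldict1]
    drop = "/DROP=" + " ".join(unmatched) if unmatched else ""
    rename = ""
    if renamein:
        rename = "/RENAME=(" + " ".join(renamein) + "=" + " ".join(renameout) + ")"
    return drop, rename


drop, rename = build_subcommands(labeldict1, labeldict2)
print(drop)
print(rename)
```

The two strings would go after the /FILE subcommand for the second file in place of the single RENAME used above. First-file variables with no match simply come out system-missing for the added cases.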
HTH,
Jon Peck
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Michael Pearmain
Sent: Tuesday, July 24, 2007 6:28 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: [SPSSX-L] Adding Cases
Hi All,
Not sure if this is a tricky question or if I'm not thinking about it
properly,
I have two datasets a 2006 and a 2007, all of the variables are called
n_"something numeric"
I need to merge cases from 2006 onto the 2007 dataset
The problem is the client hasn't been too clever and has used a different
order in this year's, however the var labels are still the same, does
anyone have any good potential solutions? To match up the correct n_
from 2006 with the correct one for 2007?
My initial thoughts are using python to inspect the Var labels and then
do something clever, but didn't know if someone had a nicer solution?
Cheers for your time
Mike