Date: Mon, 13 Sep 2010 17:23:31 -0400
Reply-To: Richard Ristow <wrristow@mindspring.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Richard Ristow <wrristow@mindspring.com>
Subject: Re: Merge problems
In-Reply-To: <040101cb5352$0614ef10$123ecd30$@com>
Content-Type: text/html; charset="us-ascii"
<html>
<body>
At 10:43 AM 9/13/2010, Mike Pritchard wrote:<br><br>
<blockquote type=cite class=cite cite="">There is a subset of variables
in the latest working file (that has been modified through
coding/recoding and labeling) that is also in the other file. The
other file is an earlier version with a
few additional variables that were dropped inadvertently from the working
file when some operations - primarily SAVE with a different order - were
done. So I needed to recover these variables.</blockquote><br>
To start with (and it's not what you asked), you have no 'BY' clause in
either of your <tt><font size=2>MATCH FILES</font></tt> commands. From the
<i>Command Syntax Reference</i>,<br><br>
<blockquote type=cite class=cite cite=""><font size=1>..
</font><font face="Courier New, Courier" size=1>If
</font><font face="Courier, Courier" size=1>BY
</font><font face="Courier New, Courier" size=1>is not used, the program
performs a parallel (sequential) match, combining the first case from
each file, then the second case from each file, and so on, without regard
to any identifying values that may be present.</font></blockquote><br>
So one extra case, or one missing case, in either file is enough to throw
off every case after it, and your result can have values attached to cases
they don't belong to at all. Do you have any set of variables that can form
a record key within your files? If so, use them.<br>
<br>
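To see the misalignment concretely, here is a small sketch (untested). The dataset names <tt><font size=2>newfile</font></tt> and <tt><font size=2>oldfile</font></tt> and the variables <tt><font size=2>id</font></tt>, <tt><font size=2>score</font></tt> and <tt><font size=2>extra</font></tt> are made up for illustration. The old file is missing the case with id 1, but a match without BY pairs the first case of each file anyway:<br><br>
<tt><font size=2>* Hypothetical data: newfile has ids 1-3, oldfile lacks id 1.<br>
DATA LIST LIST / id score.<br>
BEGIN DATA<br>
1 10<br>
2 20<br>
3 30<br>
END DATA.<br>
DATASET NAME newfile.<br>
DATA LIST LIST / id extra.<br>
BEGIN DATA<br>
2 200<br>
3 300<br>
END DATA.<br>
DATASET NAME oldfile.<br>
* Parallel match: no BY, so cases are paired by position only.<br>
MATCH FILES /FILE='newfile' /FILE='oldfile'.<br>
LIST.<br><br>
</font></tt>Because <tt><font size=2>id</font></tt> is taken from the first file, the listing should show <tt><font size=2>extra</font></tt> = 200 on the row with id 1, 300 on id 2, and system-missing on id 3: every value from the old file sits one case too early. Both inline files are already sorted by <tt><font size=2>id</font></tt>, so adding <tt><font size=2>/BY id</font></tt> to the <tt><font size=2>MATCH FILES</font></tt> would put 200 and 300 on ids 2 and 3 instead.<br><br>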
But as to what you asked about, your syntax<br><br>
<tt><font size=2>MATCH FILES <br>
/FILE=*<br>
/FILE='DataSet10'.<br><br>
</font></tt>works because (<i>CSR</i> again)<br>
<blockquote type=cite class=cite cite="">
<font face="Courier New, Courier" size=1>If the same variable name is
used in more than one input file, data are taken from the file specified
first. Dictionary information is taken from the first file containing
value labels, missing values, or a variable label for the common
variable. If the first file has no such information,
</font><font face="Courier, Courier" size=1>MATCH FILES
</font><font face="Courier New, Courier" size=1>checks the second file,
and so on, seeking dictionary information.</font></blockquote><br>
So, for all the variables that appear in both files, you get the value
from the active file ('<tt><font size=2>FILE=*</font></tt>'). Fine, if
that's what you want, but make sure it <i>is</i> what you want.<br><br>
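One way to check, sketched below (untested): bring in <tt><font size=2>DataSet10</font></tt>'s copy of a shared variable under a new name and count the cases where the two copies disagree. The key <tt><font size=2>id</font></tt>, the shared variable <tt><font size=2>q1</font></tt>, and the new names <tt><font size=2>q1_old</font></tt> and <tt><font size=2>q1_diff</font></tt> are hypothetical, and both datasets are assumed to be sorted by <tt><font size=2>id</font></tt> already:<br><br>
<tt><font size=2>* Hypothetical names: id is the record key, q1 occurs in both files.<br>
* Both datasets must already be sorted by id.<br>
MATCH FILES /FILE=*<br>
/FILE='DataSet10' /RENAME=(q1 = q1_old)<br>
/BY id.<br>
* 1 = values differ, 0 = same; missing if either copy is missing.<br>
COMPUTE q1_diff = (q1 NE q1_old).<br>
FREQUENCIES VARIABLES=q1_diff.<br><br>
</font></tt>If you want the old file's values to win instead, list <tt><font size=2>DataSet10</font></tt> first; by the rule quoted above, the values then come from it for every shared variable.<br><br>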
(And, by the way, this syntax will blow up if any variables from the two
files have the same name but are type-incompatible: that is, one numeric
and one string, or two strings of different lengths. But your files don't
have that problem.)<br><br>
Now, as you write, the GUI generates syntax,<br>
<blockquote type=cite class=cite cite=""><tt><font size=2>MATCH FILES
/FILE=*<br>
/FILE='DataSet10'<br>
/RENAME (var1 var2 ... var1442 = d0 d1 ... d1442)<br>
/DROP = d0 d1 ... d1442.</font></tt></blockquote>That's because
the GUI's code-generating logic assumes that all variables (except key
variables) are different between the files, and treats any shared name
as a conflict. So the GUI generates this awkward code: it renames every
variable in <tt><font size=2>DataSet10</font></tt> that also occurs in
the active file to a temporary name (d0, d1, ...) and then drops those
temporaries from the result.<br><br>
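In miniature, with three shared variables whose names (<tt><font size=2>age</font></tt>, <tt><font size=2>sex</font></tt>, <tt><font size=2>income</font></tt>) are made up for illustration, the GUI's approach amounts to this sketch (untested). The data you end up with are the same as from your plain version, since values for shared variables come from the active file either way; the difference is that here no dictionary information (labels, missing values) for those variables can come from <tt><font size=2>DataSet10</font></tt>:<br><br>
<tt><font size=2>* Hypothetical shared variables age, sex and income.<br>
* RENAME applies to the FILE subcommand just before it.<br>
MATCH FILES /FILE=*<br>
/FILE='DataSet10'<br>
/RENAME (age sex income = d0 d1 d2)<br>
/DROP = d0 d1 d2.<br><br>
</font></tt>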
<blockquote type=cite class=cite cite="">If I run the merge from the GUI
I get a bunch of errors about temporary variables. The errors are all
about undefined variable names.</blockquote><br>
You'd have to give us a few examples of which variable names are
'undefined'. 1,442 variables is a very long RENAME list, but there's no
documented limit on the number of variables in a RENAME. It looks like
there could be 1,442 source variables and 1,44<u>3</u> target variables:
as written, <tt><font size=2>var1 ... var1442</font></tt> is 1,442 names,
while <tt><font size=2>d0 ... d1442</font></tt> is 1,443. Might that be
true? The GUI's code generator should be smart enough not to let that
happen, though.<br><br>
Anyway, go ahead and use your simple syntax. However, if I were doing
this, I'd load the old file, keeping only key variables and the 'lost'
variables I wanted to recover; sort both files by the set of key
variables; and use something like (untested),<br><br>
<tt><font size=2>MATCH FILES<br>
/FILE=&lt;newfile&gt;<br>
/FILE=&lt;oldfile&gt;<br>
/BY &lt;keyvars&gt;.<br><br>
</font></tt><blockquote type=cite class=cite cite="">The other file is an
earlier version with a few variables that were dropped inadvertently from the working file
when some operations - primarily SAVE with a different order - were
done.</blockquote><br>
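Putting the suggested recovery together, a fuller sketch (untested) might look like this. The file names <tt><font size=2>old.sav</font></tt>, <tt><font size=2>new.sav</font></tt> and <tt><font size=2>merged.sav</font></tt>, the key <tt><font size=2>id</font></tt>, and the lost variables <tt><font size=2>lostvar1</font></tt> and <tt><font size=2>lostvar2</font></tt> are all hypothetical. The final <tt><font size=2>SAVE</font></tt> names a few variables to put first and ends the <tt><font size=2>KEEP</font></tt> list with <tt><font size=2>ALL</font></tt>, so everything else is still saved after them:<br><br>
<tt><font size=2>* Hypothetical names throughout: substitute your own files and variables.<br>
* Old file: keep only the key and the variables to recover.<br>
GET FILE='old.sav' /KEEP=id lostvar1 lostvar2.<br>
SORT CASES BY id.<br>
DATASET NAME oldvars.<br>
* New (working) file becomes the active dataset.<br>
GET FILE='new.sav'.<br>
SORT CASES BY id.<br>
* Keyed match: cases are paired on id, not on position.<br>
MATCH FILES /FILE=* /FILE='oldvars' /BY id.<br>
* ALL at the end keeps every variable not named earlier in the list.<br>
SAVE OUTFILE='merged.sav' /KEEP=id lostvar1 lostvar2 ALL.<br><br>
</font></tt>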
One moral: always end a KEEP list with the keyword 'ALL', unless you're
trying to drop some variables. That way, any variables you forget to name
will still be there, at the end of the file.</body>
</html>
=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD