I think that's spot-on, Bruce, although when I read it, a couple of statements confused me. It could be me, but if not they'll likely confuse the OP.
So I hope you don't mind, but I've made a couple of alterations. If I've got it wrong, I'm sure you'll tell me.
1. Merge (via ADD FILES) the original data set with the new (cross-validation) data set for which you want predictions.
2. If the outcome variables also exist in the new data set, fit the model using only the cases from the original data set (for example, by modelling copies of the outcome variables that are filled in only for the original cases). Save the fitted values for the entire merged file: the fitted values for the cases that came from the new data set are your predictions for that data set.
3. Because only the cases from the original data set have outcome values available to the model, only the original data are used to build it; fitted values are nevertheless saved for *all* cases in the file, so you can read off the predicted values for the cases of interest.
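For anyone who finds it easier to follow as code, here is a rough sketch of the three steps above. It is plain Python, not SPSS syntax, and it only illustrates the logic: the single predictor, the names (`stated`, `actual`, `source`, `actual_copy`) and the toy numbers are all my own assumptions, whereas the OP's real model has several predictors and two outcomes.

```python
# Illustration only: names and numbers are hypothetical, not from the thread.

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

original = [
    {"stated": 60.0, "actual": 62.0},
    {"stated": 70.0, "actual": 71.0},
    {"stated": 80.0, "actual": 83.0},
    {"stated": 90.0, "actual": 92.0},
]
# The new file also has outcomes, deliberately odd ones, to show they are ignored.
new = [
    {"stated": 65.0, "actual": 100.0},
    {"stated": 85.0, "actual": 40.0},
]

# Step 1: stack the two files, flagging where each case came from.
merged = [dict(row, source="original") for row in original]
merged += [dict(row, source="new") for row in new]

# Step 2: copy the outcome for original cases only; None stands in for "missing".
for row in merged:
    row["actual_copy"] = row["actual"] if row["source"] == "original" else None

# Step 3: fit on cases with a non-missing copy, then save fitted values for all cases.
train = [row for row in merged if row["actual_copy"] is not None]
intercept, slope = fit_line(
    [row["stated"] for row in train],
    [row["actual_copy"] for row in train],
)
for row in merged:
    row["fitted"] = intercept + slope * row["stated"]

# The fitted values for the new cases are the predictions of interest.
predictions = [row["fitted"] for row in merged if row["source"] == "new"]
```

The outcomes in the new file never enter the fit, yet every row of the merged file ends up with a fitted value.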
I hope I'm not being a pedant. By writing this out I could check my understanding.
From: Bruce Weaver <email@example.com>
Sent: Friday, 21 May, 2010 12:31:24
Subject: Re: Predictions from glm modelling?
Robert Lundqvist-3 wrote:
> I have got this set of data where the goal is to build a glm model with
> actual weight and height as dependent variables and questionnaire data
> (stated weight and height, gender,...) as predictors. When the modelling
> is done, I want to use another dataset with the same variables to make
> predictions, using the model from the first step. Is there any neat way to
> do such predictions? It could naturally be done by setting up a linear
> combination using the coefficients from the modelling step in the new
> dataset, but it seems a bit awkward. Any suggestions for a simpler
> solution? A matrix approach should work, but it doesn't seem that easy to
> build the needed matrices from the SPSS output. Am I missing something
> obvious here?
1. Merge (via ADD FILES) the original data set with the new
(cross-validation) data set.
2. If the outcome variables exist in the new data set, compute copies of the
outcome variables, but only for the original data set.
3. Run your model using the copies of the outcome variables, and save the
fitted values from the model. By using the copies of the outcome variables,
you ensure that only the original data are used for building the model; but
fitted values will be saved for all cases in the file.
I should add that some procedures (e.g., REGRESSION) may allow you to choose
via a sub-command which cases to use in building the model; but I'm not sure
if all procedures have this. The method of setting the outcome variables to
missing for non-selected cases will always work though.
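
As a rough illustration of why the two routes should agree, here is a small sketch in plain Python (not SPSS syntax; the toy numbers and names are assumptions). It presumes the procedure drops cases with a missing outcome from estimation, which is what the missing-value method relies on.

```python
# Illustration only: toy data, one predictor, hypothetical names.

def ols(pairs):
    """Least-squares (intercept, slope) from a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Each case: (predictor, outcome, selected-for-model-building flag).
cases = [
    (60.0, 62.0, True),
    (70.0, 71.0, True),
    (80.0, 83.0, True),
    (90.0, 92.0, True),
    (65.0, 100.0, False),
    (85.0, 40.0, False),
]

# Route A: the procedure is told which cases to use via a selection flag.
fit_selected = ols([(x, y) for x, y, keep in cases if keep])

# Route B: the outcome is set to missing for non-selected cases,
# and cases with a missing outcome are dropped from estimation.
blanked = [(x, y if keep else None) for x, y, keep in cases]
fit_blanked = ols([(x, y) for x, y in blanked if y is not None])

same_model = fit_selected == fit_blanked
```

Both routes leave exactly the same cases in the estimation, so the coefficients are identical.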
"When all else fails, RTFM."