Date: Tue, 1 May 2012 15:02:46 +0000
Reply-To: "Poes, Matthew Joseph" <mpoes@UILLINOIS.EDU>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: "Poes, Matthew Joseph" <mpoes@UILLINOIS.EDU>
Subject: Re: Multivariate Multilevel Mixed-Effects Model
Content-Type: text/plain; charset="iso-8859-1"
Below are my knee-jerk responses to these issues:
Matthew J Poes
Research Data Specialist
Center for Prevention Research and Development
University of Illinois
510 Devonshire Dr.
Champaign, IL 61820
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of torvon
Sent: Tuesday, May 01, 2012 6:05 AM
Subject: Multivariate Multilevel Mixed-Effects Model
In response to my last multiple imputation request, Art encouraged me to write down my problem in detail. He also contributed a multitude of questions, most of which I will try to answer here.
*Design & sample:*
- 9 response variables, ordinal (0,1,2,3), intercorrelated (between .2 and .3). They are 9 single questions from a psychological screening instrument and load onto one factor. I suspect great heterogeneity, though, and want to look at these items individually (yes, they are just single questions and were never designed to be used independently, but that's what I will have to do, for lack of other available data). The DVs are skewed: about 40% have 0 (no problems), 30% have 1, 20% have 2, and 10% have 3 (problems nearly every day). Obviously, since they are ordinal, I cannot log transform. One could argue, however, that they could be considered continuous (0 = at no point during the last 2 weeks, 1 = 2 days, 2 = 4 days, 3 = nearly every day).
*MP: You could use these as linear continuous variables, but that does not seem like a tenable assumption here. You could also treat them as a set of dummy-coded categorical variables, which may yield less biased estimates.
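*MP: A minimal sketch of what the dummy-coding option might look like (Python/pandas here purely for illustration; the variable name y1 and the toy values are made up):

```python
import pandas as pd

# Hypothetical data: one ordinal screening item scored 0-3 for six subjects
df = pd.DataFrame({"y1": [0, 1, 2, 3, 1, 0]})

# Treat the ordinal item as categorical: one dummy per level,
# dropping level 0 ("no problems") as the reference category
dummies = pd.get_dummies(df["y1"], prefix="y1", drop_first=True)
print(dummies.columns.tolist())  # ['y1_1', 'y1_2', 'y1_3']
```

The same coding is what SPSS does internally when a predictor is declared categorical; the point is just that each level gets its own effect rather than forcing a linear 0-1-2-3 trend.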
*MP: When you say response variable, you mean this is your DV? If so, treating them as separate means running 9 separate analyses, which is a bit crazy. On top of that, error in single-item responses used as DVs is going to be high and will create problems for the analysis. I would personally recommend not doing this. I think SEM would be a better approach if this is what you want to do.
- 5 measurement points
- 800 subjects. No groups, so every person has 1 data point on each response variable on each measurement point.
- every subject had major stress while the study ran, so overall, the response variables increase drastically (some more, some less)
- 10 baseline covariates that are all interesting in terms of explaining the increase of response variables over time (e.g. personality facets, gender)
*MP: Are these covariates important explanatory variables, or true covariates? If they are important explanatory variables, you may want to develop a set of hypotheses about how each is expected to affect the outcome, and what that function would look like. Besides being important for interpretation, this may help reduce the modeling complexity somewhat.
*MP: Are any of these expected to interact with any others, or with any other factors in the model?
- time-varying covariates (e.g. workload between this and the last measurement)
*MP: That's fine, just make sure you put them in the right location in your model.
- missing data: data are not missing at random. Dropout occurs for people who have higher scores on the response variables at the measurement point before dropout.
This is typical for a psychological study. Missingness occurs on all variables, including the DVs: 3% missing on the DVs at the first measurement point, 40% missing at the 5th.
*MP: You have a bunch of issues to work on here. First, NMAR means you need to add to your model the predictive reason for the missing data; in this case, the score at the previous time point. I might consider a probability score at each time point that the next time point will be missing. I'd also consider adding a dummy variable flagging the missing observations. I've seen various large-scale models of psychological distress with large amounts of missing data at the final time points make it to publication; in fact, this was a specialty of a past instructor of mine. HLM itself will handle this fine, but the accuracy and generalizability of the estimates decrease at the time points with more missing data.
*MP: I'd also look into some of the other approaches to NMAR data for this. I'm not an expert, but I understand pattern-mixture modeling to be a common good approach.
*MP: Little suggests using last observation carried forward (LOCF) as a pattern-mixture model. This would seem like the best approach to me, though I'm really not an expert on this.
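*MP: Mechanically, LOCF plus a missingness dummy are both one-liners on long-format data. A sketch (Python/pandas for illustration; column names and toy values are invented):

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: subject id, time point, one DV with dropout
df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 2, 2],
    "time": [1, 2, 3, 1, 2, 3],
    "y1":   [2.0, 3.0, np.nan, 1.0, np.nan, np.nan],
})

# Dummy flagging the originally-missing observations (can enter the model directly)
df["y1_missing"] = df["y1"].isna().astype(int)

# Last observation carried forward, within each subject only
df["y1_locf"] = df.groupby("id")["y1"].ffill()
```

Keeping the missingness dummy alongside the imputed values lets you check whether the dropout pattern itself predicts the outcome, which is the spirit of the pattern-mixture idea.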
- Does predictor x1 have differential effects on the outcome variables? This is exploratory. E.g., x1 could affect only y1, y4, y5, and y6, and x2 only y5, whereas x3 affects only y1-y4 and y6-y8. This is still unclear, because usually people use the sum score of y1-y9 and just calculate ONE (e.g.) regression from x1 to Y(total).
*MP: And they probably do this for a reason. Like I said, the implicit idea behind summative scales is that they reduce error in measuring the construct that the set of items collectively measures, which no single item can measure accurately.
*MP: Have you considered a factor analysis of the items to see if they load as you intend? Even if they don't, what about combining the items into summative scales as described above? Running 9 separate models is just nuts, and the accuracy of the estimates will not be very good.
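*MP: If you do go the sum-scale route, it's worth checking the internal consistency of the composite first. A small sketch with Cronbach's alpha (Python/numpy for illustration; the score matrix is invented):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n subjects x k items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()   # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the sum score
    return k / (k - 1) * (1 - item_vars / total_var)

# Hypothetical scores for 5 subjects on 3 items (0-3 scale)
scores = np.array([
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 2, 3],
    [1, 0, 1],
], dtype=float)

sum_score = scores.sum(axis=1)   # the usual Y(total) people regress on
alpha = cronbach_alpha(scores)   # internal consistency of the composite
```

With inter-item correlations of only .2-.3 across 9 items, alpha may come out modest, which would itself be an argument for the factor analysis.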
(1) Now, one could use 9 univariate tests (repeated-measures GLMM, currently in SPSS 20, with AR1 and random effects for "subject" and "time"), and predict each of y1 to y9 from x1 to x15. But that (a) doesn't account for the fact that the response variables are correlated, and (b) invites type-I error. I did this as a first step, however, and found that some x predict only some y, whereas other x predict all y, so this seems worth exploring further.
(I might eventually have to do it this way, because multivariate response models with 9 outcomes seem to be impossible to compute.)
*MP: Type I error would be reduced by correctly applying a correction for the fact that you are running so many tests.
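*MP: For instance, the Holm step-down correction controls the familywise error rate and is uniformly less conservative than plain Bonferroni. A self-contained sketch (Python/numpy; statsmodels' multipletests with method='holm' computes the same thing, and the p-values below are invented):

```python
import numpy as np

def holm_adjust(pvals):
    """Holm step-down adjusted p-values."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    adj = np.empty(m)
    # Multiply the i-th smallest p-value by (m - i), enforce monotonicity,
    # and cap at 1; then write results back in the original order.
    adj[order] = np.minimum(1.0, np.maximum.accumulate(p[order] * (m - np.arange(m))))
    return adj

# e.g. 3 of the per-outcome tests for one predictor
raw = [0.01, 0.03, 0.04]
adjusted = holm_adjust(raw)  # Holm-adjusted: 0.03, 0.06, 0.06
```

So with 9 outcomes per predictor, a raw p of .01 would survive at the .05 level but .03 and .04 would not, which is exactly the discipline the exploratory per-item analyses need.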
*MP: The correlation among the outcomes would be handled if you used a sum scale as I've suggested, though the correlations aren't so high as to matter much either way.
*MP: To fully account for things as you seem to want to, the only thing that makes sense to me is a fully specified SEM, which will account for measurement error, correlation amongst terms, etc.
(2) The second option is running multivariate models and going for interaction effects between the predictors and the multivariate response. I'm currently trying to do this in R (MCMCglmm), but it's pretty hard to set up the priors, and the interpretation is messy in a model with 15 predictors * Y(multivariate).
I'd be happy about any kind of input on how I could try to answer my research question.
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Multivariate-Multilevel-Mixed-Effects-Model-tp5677844.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
To manage your subscription to SPSSX-L, send a message to LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD