Date: Thu, 9 Sep 2010 09:55:39 -0500
Reply-To: Warren Schlechte <Warren.Schlechte@TPWD.STATE.TX.US>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Warren Schlechte <Warren.Schlechte@TPWD.STATE.TX.US>
Subject: Re: ESTIMATE statement in Proc MIXED
In surveys, we often think that those who return the surveys likely
respond differently than those who do not. As such, we try to correct for
this non-response bias. If we can detect trends in the response rates
associated with time of return, we can model the missing data
(non-returns). I wonder if you couldn't do something similar here,
where the missing data are imputed based on a model of what you do
observe. For example, maybe patients stop coming in once they get well,
so missing data reflect the resolved cases. Or, maybe they get too sick
to come in, so they reflect the terminal cases. In either case, checking
whether some model explains the pattern of missingness may help you
account for it.
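A minimal sketch of that kind of check, using the variable names from
John's code (test, treat, visit, patient, result) and assuming visit is
numeric 1-20. The names miss_summary, early_mean and late_miss are
hypothetical, introduced only for illustration. It asks whether having
any missing result in visits 17-20 is associated with treatment or with
the mean response over visits 1-16:

```sas
/* Sketch only. One row per patient: mean of the early responses   */
/* and a flag for fewer than 4 observed results in visits 17-20.   */
/* Works whether missing visits are absent rows or rows with a     */
/* missing RESULT. EARLY_MEAN and LATE_MISS are hypothetical names. */
proc sql;
   create table miss_summary as
   select patient, treat,
          mean(case when visit le 16 then result else . end)
             as early_mean,
          (sum(case when visit gt 16 and result is not missing
                    then 1 else 0 end) < 4) as late_miss
   from test
   group by patient, treat;
quit;

/* Logistic model for the probability of late missingness */
proc logistic data=miss_summary;
   class treat / param=ref;
   model late_miss(event='1') = treat early_mean;
run;
```

A clear association here would suggest the data are not missing
completely at random. It cannot, by itself, show whether missingness
also depends on the values that were never observed.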
I'm no expert in this field, but I recently saw a talk where the authors
used the idea of having model hyperparameters within a Bayesian setting
to help account for the missing data, where missing data were not MAR.
Warren Schlechte
_____________________________________________
From: Dale McLerran [mailto:stringplayer_2@YAHOO.COM]
Sent: Wednesday, September 08, 2010 11:49 AM
Subject: Re: ESTIMATE statement in Proc MIXED
John,
It sounds like the assumption of missing at random is violated in your
data. Perhaps responses which are missing in the last four time periods
are more likely to have had a high (low) value of the response in earlier
time periods than responses which are observed. I might construct
side-by-side box plots of the response in week 1, week 2, ..., week 16
with week 17 observed on one side and week 17 missing on the other. Do
the same for week 18 observed and week 18 missing, week 19
observed/missing, and week 20 observed/missing. You could also generate
the 16 side-by-side box plots where the first box plot in each weekly
pair represents no missing information during the last four weeks and
the second box plot represents any missing information in the last four
weeks. Since you have assumed a compound symmetric residual covariance
structure, you could produce side-by-side box plots for all data from
week 1 through week 16 stratified according to missing information in
weeks 17 through 20.
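A rough sketch of the second set of plots (any missing information in
weeks 17 through 20 versus none), using the variable names from the code
in this thread and assuming visit is numeric. The names test_flagged and
late_miss are hypothetical. As far as I know, the GROUP= option on VBOX
needs SAS 9.3 or later; on earlier releases a BY-group or panel approach
would be needed instead.

```sas
/* Sketch only. Flag each patient with fewer than 4 observed       */
/* results in visits 17-20, then attach the flag to every record.  */
/* TEST_FLAGGED and LATE_MISS are hypothetical names.              */
proc sql;
   create table test_flagged as
   select t.*, f.late_miss
   from test as t
        inner join
        (select patient, treat,
                (sum(case when visit gt 16 and result is not missing
                          then 1 else 0 end) < 4) as late_miss
         from test
         group by patient, treat) as f
        on t.patient = f.patient and t.treat = f.treat;
quit;

/* Paired box plots for weeks 1-16: late_miss=0 vs late_miss=1 */
proc sgplot data=test_flagged (where=(visit le 16));
   vbox result / category=visit group=late_miss;
run;
```

Dropping category=visit and using category=late_miss instead would give
the single pooled comparison over weeks 1 through 16.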
I am not well versed on how to deal with the issue of informative
missingness. I do know that it can be a difficult problem to produce
appropriate point estimates when MAR cannot be assumed.
I might question the model which you have fit. Typically, with data
collected over time (especially over a rather long time period), the
assumption that the residual covariance between adjacent time periods is
the same as the residual covariance between distant time periods is not
maintained. Typically, an AR(1) model holds better for the residuals
than does a compound symmetric covariance structure. There could also
be a strong person-specific (random intercept) effect to account for; in
the absence of any further residual covariance structure, such an effect
would produce a compound symmetric covariance.
Quite likely, there might be a person-specific random effect along with
a within-person AR(1) residual covariance structure.
Thus, I might try the model:
proc mixed data=test;
  class treat visit patient;
  model result = treat|visit;
  random intercept / subject=patient(treat);
  repeated visit / subject=patient(treat) type=ar(1);
  estimate "Between treatments, Visits 17-20"
    treat -1 1
    treat*visit
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.25 -0.25 -0.25 -0.25
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  0.25  0.25  0.25  0.25 / CL;
run;
I don't know how much this alternative model for the covariance
structure will ameliorate problems of non-random missingness.
The better the entire data covariance structure is modeled, the better
your estimates should be. But there might still be some issues
associated with violation of a MAR assumption.
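One way to judge which covariance structure fits better is to fit the
same mean model under both and compare the information criteria. The
following is only a sketch; the data set names fit_cs, fit_ar1 and
fit_compare are hypothetical. Because the fixed effects are identical in
the two fits, the default REML-based criteria should be comparable.

```sas
/* Sketch only: capture the Fit Statistics table from each fit */
ods output FitStatistics=fit_cs;
proc mixed data=test;
   class treat visit patient;
   model result = treat|visit;
   repeated visit / subject=patient(treat) type=cs;
run;

ods output FitStatistics=fit_ar1;
proc mixed data=test;
   class treat visit patient;
   model result = treat|visit;
   random intercept / subject=patient(treat);
   repeated visit / subject=patient(treat) type=ar(1);
run;

/* Stack the two tables and label each row by its structure */
data fit_compare;
   length structure $20;
   set fit_cs (in=in_cs) fit_ar1;
   if in_cs then structure = "CS";
   else structure = "RI + AR(1)";
run;

proc print data=fit_compare noobs;
   var structure descr value;
run;
```

Smaller AIC/BIC values favor the corresponding structure, though a
better-fitting covariance model does not by itself settle the
missingness question.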
Dale
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
--- On Tue, 9/7/10, John Whittington <John.W@MEDISCIENCE.CO.UK> wrote:
From: John Whittington <John.W@MEDISCIENCE.CO.UK>
Subject: ESTIMATE statement in Proc MIXED
To: SAS-L@LISTSERV.UGA.EDU
Date: Tuesday, September 7, 2010, 5:02 AM
Hi Folks,
This is really an extension to my query (for which I got some very
helpful responses) about repeated measures models a couple of weeks ago.
With similar data to that which I described before, I am now exploring
it more extensively. However, I fear that I may be either doing it all
wrong or misunderstanding what is going on....
For illustration, my test data consists of 20 serial measurements in a
group of subjects split into two treatment groups. My immediate
interest (which is typical of other things I want to do with this and
similar data) is in the difference between treatments in the mean of the
last 4 measurements for each subject. I thought that the following code
(with pretty self-explanatory variable names) should achieve
that:
proc mixed data=test;
  class treat visit patient;
  model result = treat|visit;
  repeated visit / subject=patient(treat) type=cs;
  estimate "Between treatments, Visits 17-20"
    treat -1 1
    treat*visit
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.25 -0.25 -0.25 -0.25
      0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  0.25  0.25  0.25  0.25 / CL;
run;
This certainly produces very credible answers. If I've 'gone wrong'
already, then I obviously need help. However, if the above code should
do what the doctor wants it to do, then I found it interesting to
contrast this with a situation in which I modelled ONLY the last 4
measurements, with similar code:
proc mixed data=test (where=(visit gt 16));
  class treat visit patient;
  model result = treat|visit;
  repeated visit / subject=patient(treat) type=cs;
  estimate "Between treatments, Visits 17-20 (b)"
    treat -1 1
    treat*visit
      -0.25 -0.25 -0.25 -0.25
       0.25  0.25  0.25  0.25 / CL;
run;
I was not surprised that the SE of the estimate differed a bit with
these two approaches, but I had expected the magnitude of the reported
estimate to be the same. This seems to be the case when there is no
missing data. When, in tests, I introduced some missing data at random,
the two approaches usually gave slightly different magnitudes for the
estimate - which, in itself, suggests that I do not fully understand
what the estimates represent. However, the more important point is that
(although I have yet to be able to reproduce anything like this with
test data), I have a set of real data of this sort in which the two
approaches appear to give very different values for the estimate.
Either (a) I'm doing this all wrong, (b) I have done something silly (in
which case I will look further for my silliness) or (c) there is
something I don't understand about these 'estimates'.
Can someone help me? TIA.
Kind Regards,
John
----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: John.W@mediscience.co.uk
Buckingham MK18 4EL, UK
----------------------------------------------------------------