LISTSERV at the University of Georgia
Date:         Thu, 9 Sep 2010 09:55:39 -0500
Reply-To:     Warren Schlechte <Warren.Schlechte@TPWD.STATE.TX.US>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Warren Schlechte <Warren.Schlechte@TPWD.STATE.TX.US>
Subject:      Re: ESTIMATE statement in Proc MIXED
Content-Type: text/plain; charset="us-ascii"

In surveys, we often think that those who return the surveys likely respond differently than those who do not. As such, we try to correct for this non-response bias. If we can detect trends in the response rates associated with time of return, we can model the missing data (non-returns). I wonder if you couldn't do something similar here, where the missing data are imputed based on a model of what you do observe. For example, maybe patients stop coming in once they get well, so missing data reflect the resolved cases. Or maybe they get too sick to come in, so missing data reflect the terminal cases. In either case, looking at the data to see if some model explains the missingness may help you model the missingness.
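One way to look for such a pattern is sketched below. It is only a rough illustration, not a recommended analysis: it borrows the dataset and variable names from the original post quoted further down (test, treat, patient, visit, result), assumes the data are in long form with one row per patient visit, and uses hypothetical names (dropout, miss_late, early_mean) for everything it creates. It asks whether having any missing result in visits 17-20 is related to treatment and to the mean of the results observed in visits 1-16.

```sas
/* One row per patient: a flag for any missing result in visits 17-20
   and the mean of the results observed in visits 1-16.
   Counting non-missing values handles both missing results and absent rows.
   dropout, miss_late and early_mean are hypothetical names. */
proc sql;
  create table dropout as
  select treat, patient,
         (sum(visit >= 17 and visit <= 20 and not missing(result)) < 4)
           as miss_late,
         mean(case when visit <= 16 then result else . end)
           as early_mean
  from test
  group by treat, patient;
quit;

/* Does missingness late in follow-up depend on treatment or early response? */
proc logistic data=dropout;
  class treat / param=ref;
  model miss_late(event='1') = treat early_mean;
run;
```

A clear association with early_mean would suggest the missingness depends on the observed response. It cannot, by itself, show whether the missingness also depends on the values that were never observed.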

I'm no expert in this field, but I recently saw a talk where the authors used the idea of having model hyperparameters within a Bayesian setting to help account for the missing data, where missing data were not MAR.

Warren Schlechte

_____________________________________________
From: Dale McLerran [mailto:stringplayer_2@YAHOO.COM]
Sent: Wednesday, September 08, 2010 11:49 AM
Subject: Re: ESTIMATE statement in Proc MIXED


It sounds like the assumption of missing at random is violated in your data. Perhaps responses which are missing in the last four time periods are more likely to have had a high (or low) value of the response in earlier time periods than responses which are observed. I might construct side-by-side box plots of the response in week 1, week 2, ..., week 15 with week 17 observed on one side and week 17 missing on the other. Do the same for week 18 observed and week 18 missing, week 19 observed/missing, and week 20 observed/missing. You could also generate the 16 side-by-side box plots where the first box plot in each weekly pair represents no missing information during the last four weeks and the second box plot represents any missing information in the last four weeks. Since you have assumed a compound symmetric residual covariance structure, you could produce side-by-side box plots for all data from week 1 through week 16 stratified according to missing information in weeks 17 through 20.
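A minimal sketch of the "16 side-by-side box plots" variant follows. It assumes long-form data in the dataset test with the variables used in the models in this thread, and the names late_flag, test_flagged and miss_late are hypothetical. It also assumes a SAS release in which the VBOX statement of PROC SGPLOT accepts GROUP= (9.3 or later, as far as I know).

```sas
/* Flag each patient: 1 if any result in visits 17-20 is missing, else 0.
   late_flag and miss_late are hypothetical names. */
proc sql;
  create table late_flag as
  select treat, patient,
         (sum(visit >= 17 and visit <= 20 and not missing(result)) < 4)
           as miss_late
  from test
  group by treat, patient;

  /* Attach the flag to the visit 1-16 records */
  create table test_flagged as
  select t.*, f.miss_late
  from test as t inner join late_flag as f
    on t.treat = f.treat and t.patient = f.patient
  where t.visit <= 16;
quit;

/* One pair of boxes per visit: complete vs. any missing in visits 17-20 */
proc sgplot data=test_flagged;
  vbox result / category=visit group=miss_late;
  xaxis label="Visit";
  yaxis label="Result";
run;
```

Dropping category=visit from the VBOX statement would give the pooled comparison of all week 1-16 data by missingness status.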

I am not well versed on how to deal with the issue of informative missingness. I do know that it can be a difficult problem to produce appropriate point estimates when MAR cannot be assumed.

I might question the model which you have fit. With data collected over time (especially over a rather long time period), the assumption that the residual covariance between adjacent time periods is the same as the residual covariance between distant time periods usually does not hold. Typically, an AR(1) model fits the residuals better than a compound symmetric covariance structure does. There could be a strong person-specific (random intercept) effect to account for, which, in the absence of any further residual covariance structure, would produce a compound symmetric residual covariance. Quite likely, there is a person-specific random effect along with a within-person AR(1) residual covariance structure. Thus, I might try the model:

proc mixed data=test;
  class treat visit patient;
  model result = treat|visit;
  random intercept / subject=patient(treat);
  repeated visit / subject=patient(treat) type=ar(1);
  estimate "Between treatments, Visits 17-20"
           treat -1 1
           treat*visit 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.25 -0.25 -0.25 -0.25
                       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  0.25  0.25  0.25  0.25 / CL;
run;

I don't know how much this alternative model for the covariance structure will ameliorate problems of non-random missingness. The better the entire data covariance structure is modeled, the better your estimates should be. But there might still be some issues associated with violation of a MAR assumption.
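One possible way to judge whether the alternative covariance structure describes the data better is to fit both models by REML with the same fixed effects and compare the information criteria. This comparison step is an illustration added here, not something either poster ran, and the output dataset names (fit_cs, fit_ar1, fit_compare) are hypothetical.

```sas
/* Compound symmetry, as in the original model */
proc mixed data=test method=reml;
  class treat visit patient;
  model result = treat|visit;
  repeated visit / subject=patient(treat) type=cs;
  ods output FitStatistics=fit_cs;
run;

/* Random intercept plus within-patient AR(1) residuals */
proc mixed data=test method=reml;
  class treat visit patient;
  model result = treat|visit;
  random intercept / subject=patient(treat);
  repeated visit / subject=patient(treat) type=ar(1);
  ods output FitStatistics=fit_ar1;
run;

/* Stack the two fit-statistics tables for a side-by-side look */
data fit_compare;
  length structure $ 12;
  set fit_cs(in=from_cs) fit_ar1;
  if from_cs then structure = "CS";
  else structure = "RI + AR(1)";
run;

proc print data=fit_compare noobs;
  var structure Descr Value;
run;
```

Smaller AIC or BIC values favour a structure. A better-fitting covariance model still does not repair estimates if the missingness is informative.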


---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto:
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------

--- On Tue, 9/7/10, John Whittington <John.W@MEDISCIENCE.CO.UK> wrote:

From: John Whittington <John.W@MEDISCIENCE.CO.UK>
Subject: ESTIMATE statement in Proc MIXED
To: SAS-L@LISTSERV.UGA.EDU
Date: Tuesday, September 7, 2010, 5:02 AM

Hi Folks,

This is really an extension to my query (for which I got some very helpful responses) about repeated measures models a couple of weeks ago. With similar data to that which I described before, I am now exploring it more extensively. However, I fear that I may be either doing it all wrong or misunderstanding what is going on....

For illustration, my test data consists of 20 serial measurements in a group of subjects split into two treatment groups. My immediate interest (which is typical of other things I want to do with this and similar data) is in the difference between treatments in the mean of the last 4 measurements for each subject. I thought that the following code (with pretty self-explanatory variable names) should achieve that:

proc mixed data=test;
  class treat visit patient;
  model result = treat|visit;
  repeated visit / subject=patient(treat) type=cs;
  estimate "Between treatments, Visits 17-20"
           treat -1 1
           treat*visit 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.25 -0.25 -0.25 -0.25
                       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  0.25  0.25  0.25  0.25 / CL;
run;

This certainly produces very credible answers. If I've 'gone wrong' already, then I obviously need help. However, if the above code should do what the doctor wants it to do, then I found it interesting to contrast this with a situation in which I modelled ONLY the last 4 measurements, with similar code:

proc mixed data=test (where = (visit gt 16));
  class treat visit patient;
  model result = treat|visit;
  repeated visit / subject=patient(treat) type=cs;
  estimate "Between treatments, Visits 17-20 (b)"
           treat -1 1
           treat*visit -0.25 -0.25 -0.25 -0.25
                        0.25  0.25  0.25  0.25 / CL;
run;

I was not surprised that the SE of the estimate differed a bit with these two approaches, but I had expected the magnitude of the reported estimate to be the same. This seems to be the case when there are no missing data. When, in tests, I introduced some missing data at random, the two approaches usually gave slightly different magnitudes for the estimate - which, in itself, suggests that I do not fully understand what the estimates represent. However, the more important point is that (although I have yet to be able to reproduce anything like this with test data), I have a set of real data of this sort in which the two approaches appear to give very different values for the estimate. Either (a) I'm doing this all wrong, (b) I have done something silly (in which case I will look further for my silliness) or (c) there is something I don't understand about these 'estimates'.
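The kind of test described above could be set up roughly as follows. This is a sketch under invented assumptions: the group sizes, effect sizes, variances, 15% missingness rate and seed are all made up for illustration, and the dataset names (sim, est_all, est_last4, est_compare) are hypothetical. Results are deleted completely at random, and the two ESTIMATE statements are then captured for comparison.

```sas
/* Simulated long-form data: 2 treatments x 30 patients x 20 visits.
   All numeric settings here are illustrative assumptions. */
data sim;
  call streaminit(20100907);
  do treat = 1 to 2;
    do patient = 1 to 30;
      u = rand("normal", 0, 2);                  /* patient-level effect */
      do visit = 1 to 20;
        result = 10 + (treat = 2) * 0.1 * visit + u + rand("normal", 0, 1);
        if rand("uniform") < 0.15 then result = .;  /* delete at random */
        output;
      end;
    end;
  end;
  drop u;
run;

/* Approach 1: model all 20 visits */
proc mixed data=sim;
  class treat visit patient;
  model result = treat|visit;
  repeated visit / subject=patient(treat) type=cs;
  estimate "Between treatments, Visits 17-20"
           treat -1 1
           treat*visit 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 -0.25 -0.25 -0.25 -0.25
                       0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0  0.25  0.25  0.25  0.25 / CL;
  ods output Estimates=est_all;
run;

/* Approach 2: model only visits 17-20 */
proc mixed data=sim (where = (visit gt 16));
  class treat visit patient;
  model result = treat|visit;
  repeated visit / subject=patient(treat) type=cs;
  estimate "Between treatments, Visits 17-20 (b)"
           treat -1 1
           treat*visit -0.25 -0.25 -0.25 -0.25
                        0.25  0.25  0.25  0.25 / CL;
  ods output Estimates=est_last4;
run;

/* Stack the two estimates; the LENGTH avoids truncating the longer label */
data est_compare;
  length Label $ 60;
  set est_all est_last4;
run;

proc print data=est_compare noobs;
  var Label Estimate StdErr Lower Upper;
run;
```

With complete, balanced data both estimates reduce to the same difference in raw cell means, which matches the observation above. Once results are missing, the 20-visit model uses the correlated earlier visits when estimating the visit 17-20 means, so the two point estimates need not agree.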

Can someone help me? TIA.

Kind Regards,


----------------------------------------------------------------
Dr John Whittington,       Voice:    +44 (0) 1296 730225
Mediscience Services       Fax:      +44 (0) 1296 738893
Twyford Manor, Twyford,    E-mail:
Buckingham  MK18 4EL, UK
----------------------------------------------------------------
