|Date: ||Tue, 28 Feb 2006 17:41:48 -0800|
|Reply-To: ||David L Cassell <davidlcassell@MSN.COM>|
|Sender: ||"SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>|
|From: ||David L Cassell <davidlcassell@MSN.COM>|
|Subject: ||Re: Multilevel weights in PROC MIXED|
|Content-Type: ||text/plain; format=flowed|
>There is a claim that PROC MIXED does not handle weighting correctly in
>Using TIMSS to Analyze Correlates of Performance Variation in Mathematics
>(See Appendix D in particular for a comparison of results using their
>estimator and the PROC MIXED estimates.)
>In the paper, the authors claim that PROC MIXED produces biased estimates
>of the fixed effects. They then go on to use a correction attributed to
>Pfeffermann et al.:
>Pfeffermann, D., Skinner, C. J., Holmes, D. J., Goldstein, H., and
>Rasbash, J. (1998). Weighting for Unequal Selection Probabilities in
>Multilevel Models. JRSS, Series B, 60, 123-40.
>There is also some discussion about this on:
>This seems to be the usual argument that ordinary statistical procedures
>are not suitable for survey data and possibly a need for a SURVEYMIXED
>procedure. However, survey procedures and non-survey procedures produce
>unbiased estimates for fixed effects (just the standard errors are at
>issue). Here, the claim is that PROC MIXED produces biased estimates with
>sampling weights. Any thoughts or any other evaluations using own data?
Before anyone starts lobbying for PROC SURVEYMIXED, I want to address
the key point:
If you go read my SUGI paper, I have some things to say on the subject.
The key point for *me* is that many HLMs and multi-level models that need
to be analyzed with survey sampling techniques don't really HAVE random
effects.
Let me repeat that.
We perform a cluster sample at stage one, so we have some (random) subset
of the primary sampling units. Then we perform further cluster sampling to
get to stage two, or even further, and end up with multi-stage sampling. If
we look at this from a survey sample perspective, there's no problem. We
have clustering, which can be addressed by the simple method of using the
CLUSTER statement.
We do not have random effects. The fact that we have some subset of the
total number of primary sampling units does not give us a random effect. So
we don't necessarily have a mixed model. At all.
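A minimal sketch of the two routes being contrasted, assuming a data set
STUDENTS with variables SCHOOL (the primary sampling unit), SAMPWT (the
sampling weight), MATH and SES. All of these names are hypothetical and are
not taken from any paper mentioned in this thread:

```sas
/* Design-based route: schools are the primary sampling units.     */
/* Data set and variable names are hypothetical, for illustration. */
proc surveyreg data=students;
   cluster school;          /* first-stage clusters (PSUs)         */
   weight  sampwt;          /* sampling weight                     */
   model   math = ses;      /* regression coefficients only        */
run;

/* Model-based route, for contrast: school is declared a random    */
/* effect, which is the step being questioned here.                */
proc mixed data=students;
   class  school;
   model  math = ses / solution;
   random intercept / subject=school;
   weight sampwt;
run;
```

In the first step the schools appear only on the CLUSTER statement, so the
clustering affects the standard errors without adding a variance component
to the model. In the second step the same schools are modeled as draws from
a distribution of school effects.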
But if we ignore the sample survey issue, then suddenly we have to start
pretending that we have a random effect. We have taken K of the K' possible
subsets, and we want to be able to infer up to the fullness of all K'.
That sure sounds like a random effect. But it isn't! It's cluster sampling,
an entirely different paradigm, from an entirely different area of
statistics.
So we have to weed out all the not-really-random-effects problems before
we get to any data structures which have *real* random effects buried in
real multi-stage samples.
As for the issue of biased parameter estimates, the problem is the mixed
model. We can get unbiased estimates for the fixed effects, as you noted.
This is because of the way that we estimate fixed effects using Taylor
series linearization in sampling theory. But we don't do the computations
for random effects in the same way, so we have other problems creeping in.
I have seen a case for real random effects modeling in a survey sample.
But far more often, what I see are hierarchical linear models and
multi-level models which should not be treated as if they have random
effects.
David L. Cassell
3115 NW Norwood Pl.
Corvallis OR 97330