Date: Fri, 30 Dec 2005 16:16:00 -0800
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: Post-Stratification Variance Question
In-Reply-To: <200512291607.jBTFKp6g022562@mailgw.cc.uga.edu>
Content-Type: text/plain; format=flowed
Warren.Schlechte@TPWD.STATE.TX.US wrote:
>All (especially David Cassell, since he seems to be an expert in survey
>statistics),
Or at least he likes to *think* that he is. :-) :-)
>Hope all is well with you and yours this holiday season. I thought you
>might be able to help me with this question, even though it really isn't
>a SAS question. I have posted the same question to sci.stat.consult and
>sci.stat.math, but after several days, no-one has responded, so I
>thought I might inquire within this group.
Hey, we do stats here too. SAS is more than data warehousing, you
know! :-)
>I have some data that I post-stratified. Looking at equations in
>Cochran (1977) Sampling Techniques (p.135) and Levy and Lemeshow (1991)
>Sampling of Populations (p.137), they give the following for the
>formula of variance for the mean in a post-stratified sample :
>Var(x-bar post-strat) = ((N-n)/N) * (1/n) * sum[ (Nh/N) * VARhx] +
>((N-n)/N) * (1/n^2) * sum[VARhx*((N-Nh)/N)]
>{Notice, almost everything is given in terms of overall sample (n) and
>pop (N) size}
>The formula for variance for the mean in a pre-stratified sample is:
>Var(x-bar pre-strat) = ((Nh-nh)/Nh) * (1/N^2) * sum[ (Nh^2) * VARhx/nh]
>{Notice, almost everything is given in terms of stratum sample (nh) and
>pop (Nh) size}
>I have an example below, where I compare the variance computed if the
>sample was pre or post-stratified. When the sample sizes are equal,
>the pre-stratified variances (within strata) are always smaller than
>the post-stratified. Further, the overall variance of the mean is
>smaller as well. (This is as I anticipated)
>However, when the sample sizes are different, the pre-stratified
>variances (within strata) are sometimes larger than the post-stratified
>variances (within), and further, the total pre-stratified variances can
>be larger than the total post-stratified variance.
>Strata  Var    n    N   Pre-Strat  Post-Strat
>1         4    2   20   0.2        0.27
>2         4    2   20   0.2        0.27
>3         4    2   20   0.2        0.27
>SUM            6   60   0.6        0.8
>Strata  Var    n    N   Pre-Strat  Post-Strat
>1         4   10   20   0.02       0.083
>2         4    2   20   0.2        0.083
>3         4    2   20   0.2        0.083
>SUM           14   60   0.42       0.25
>This confuses me. I thought we should be penalized for
>post-stratification.
>The issue is, in pre-stratified, the increase in sample size in a
>single stratum improves only that stratum's variance estimate, whereas
>in post-stratified, it improves all estimates.
>Is this right?
Actually there should NOT be that much penalty for post-stratification. But
post-stratification works from a single sample across your strata, so adding
sample points anywhere helps the whole variance-estimation process.
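To make the comparison concrete, here is a minimal Python sketch that recomputes the two quoted tables from the formulas in the question. The function names are made up for illustration, the finite population correction in the stratified formula is applied per stratum (which is how the table's numbers come out), and the inputs are the example's own values (Var = 4 and Nh = 20 in every stratum).

```python
# Recompute the quoted tables from the two formulas in the question.
# Function names are hypothetical; inputs are the example's own numbers.

def pre_strat_terms(var_h, n_h, N_h):
    """Per-stratum terms of the stratified ('pre-stratified') variance."""
    N = sum(N_h)
    return [((Nh - nh) / Nh) * (Nh ** 2) * v / nh / N ** 2
            for v, nh, Nh in zip(var_h, n_h, N_h)]

def post_strat_terms(var_h, n, N_h):
    """Per-stratum terms of the quoted post-stratified variance.

    Only the total sample size n enters, not the per-stratum counts.
    """
    N = sum(N_h)
    fpc = (N - n) / N
    return [fpc / n * (Nh / N) * v + fpc / n ** 2 * v * (N - Nh) / N
            for v, Nh in zip(var_h, N_h)]

var_h = [4, 4, 4]
N_h = [20, 20, 20]
for n_h in ([2, 2, 2], [10, 2, 2]):
    pre = pre_strat_terms(var_h, n_h, N_h)
    post = post_strat_terms(var_h, sum(n_h), N_h)
    print(n_h,
          [round(x, 3) for x in pre], round(sum(pre), 3),
          [round(x, 3) for x in post], round(sum(post), 3))
```

In the quoted post-stratified formula the realized stratum counts never appear, only the total n. That is why moving from n = 6 to n = 14 lowers every stratum's term at once, while the stratified formula only rewards the stratum that got the extra units.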
And yes, in stratified sampling you are, in essence, drawing K different
independent samples. So increasing n1 helps that stratum's variance
estimation directly, but it should also help any overall estimation
indirectly, in much the same way that you can improve the pooled-variance
estimation in ANOVA when you increase the sample size in only one of the
treatment groups.
Also, there are plenty of situations where post-stratification can do a lot
better than using a stratified sample (your 'pre-stratification').
Stratified samples do better (in a statistical sense) than ordinary SRS
(weighted or not) when two conditions are met:
[1] The variance within strata is (mostly) much smaller than the variance
across the strata.
[2] Your (stratum variable) misclassification rate is low, say well under
20%. (See Olsen and Urquhart, Proceedings of the American Statistical
Association, round about 1991 or so.)
If these two conditions are not met, then stratified sampling is not helping
you. In this case, ordinary SRS is likely to yield better variance estimates,
and post-stratification is going to be preferable. Period.
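Condition [1] can be checked on a toy finite population. The sketch below uses the textbook variances of the sample mean under SRS without replacement and under stratified sampling with proportional allocation; the nine population values and both stratum assignments are invented purely for illustration, and condition [2] (misclassification) is not modeled here.

```python
from statistics import variance  # sample variance, divisor (size - 1)

def srs_var(pop, n):
    """Variance of the sample mean under SRS without replacement."""
    N = len(pop)
    return (1 - n / N) * variance(pop) / n

def prop_strat_var(strata, n):
    """Variance of the stratified mean under proportional allocation."""
    N = sum(len(s) for s in strata)
    return (1 - n / N) / n * sum(len(s) / N * variance(s) for s in strata)

# Made-up population of 9 values, grouped two different ways.
informative = [[1, 2, 3], [11, 12, 13], [21, 22, 23]]    # strata differ a lot
uninformative = [[1, 12, 23], [2, 13, 21], [3, 11, 22]]  # all stratum means 12
pop = [y for s in informative for y in s]

n = 6
print("SRS:                  ", round(srs_var(pop, n), 3))
print("stratified, good key: ", round(prop_strat_var(informative, n), 3))
print("stratified, poor key: ", round(prop_strat_var(uninformative, n), 3))
```

With the informative grouping the stratified variance is far below the SRS variance. With the uninformative grouping, where the stratum means are all equal, the stratified variance comes out larger than the SRS variance for the same n.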
Stratified sampling is perhaps the most over-used tool in the survey sampling
toolbox, since people get told that 'it gives improved error estimates'
without much in the way of caveats. However, in a lot of cases, there are
other non-statistical reasons to use stratified sampling. Like budget and
logistics.
In lots of surveys, I have opted to use multipliers (the SIZE variable in
PROC SURVEYSELECT) instead of stratified sampling to get the right sizes of
subsets. It doesn't yield *exact* subset sizes, but then stratified sampling
seldom does either, due to fieldwork issues.
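As a rough illustration of the multiplier idea, the Python sketch below assumes a design in which each unit's inclusion probability is proportional to its size multiplier, and shows how multipliers could be chosen so that the *expected* subset sizes hit a target. This is not PROC SURVEYSELECT code and makes no claim about how SAS implements its selection methods; the group labels, frame counts, targets, and function names are all made up.

```python
def size_multipliers(group_counts, targets):
    """One multiplier per group: target sample count per frame unit."""
    return {g: targets[g] / group_counts[g] for g in group_counts}

def expected_counts(group_counts, multipliers, n):
    """Expected sample count per group when each unit's inclusion
    probability is proportional to its multiplier (an assumption here)."""
    total = sum(group_counts[g] * multipliers[g] for g in group_counts)
    return {g: n * group_counts[g] * multipliers[g] / total
            for g in group_counts}

group_counts = {"A": 500, "B": 300, "C": 200}  # hypothetical frame counts
targets = {"A": 20, "B": 20, "C": 20}          # hypothetical desired sizes

m = size_multipliers(group_counts, targets)
print(expected_counts(group_counts, m, n=60))  # about 20 per group
```

The realized counts in any one draw would vary around these expectations, which is the sense in which the subset sizes are not exact.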
HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330