```
Date: Fri, 30 Dec 2005 16:16:00 -0800
Reply-To: David L Cassell
Sender: "SAS(r) Discussion"
From: David L Cassell
Subject: Re: Post-Stratification Variance Question
Comments: To: Warren.Schlechte@TPWD.STATE.TX.US
In-Reply-To: <200512291607.jBTFKp6g022562@mailgw.cc.uga.edu>
Content-Type: text/plain; format=flowed

Warren.Schlechte@TPWD.STATE.TX.US wrote:
>All (especially David Cassell, since he seems to be an expert in survey
>statistics),

Or at least he likes to *think* that he is. :-) :-)

>Hope all is well with you and yours this holiday season. I thought you
>might be able to help me with this question, even though it really isn't
>a SAS question. I have posted the same question to sci.stat.consult and
>sci.stat.math, but after several days, no-one has responded, so I
>thought I might inquire within this group.

Hey, we do stats here too. SAS is more than data warehousing,
you know! :-)

>I have some data that I post-stratified. Looking at equations in
>Cochran (1977) Sampling Techniques (p.135) and Levy and Lemeshow (1991)
>Sampling of Populations (p.137), they give the following for the
>formula of variance for the mean in a post-stratified sample :
>
>Var(x-bar post-strat) = ((N-n)/N) * (1/n) * sum[ (Nh/N) * VARhx] +
>((N-n)/N) * (1/n^2) * sum[VARhx*((N-Nh)/N)]
>
>{Notice, almost everything is given in terms of overall sample (n) and
>pop (N) size}
>
>The formula for variance for the mean in a pre-stratified sample is:
>
>Var(x-bar pre-strat) = ((Nh-nh)/Nh) * (1/N^2) * sum[ (Nh^2) * VARhx/nh]
>
>{Notice, almost everything is given in terms of stratum sample (nh) and
>pop (Nh) size}
>
>I have an example below, where I compare the variance computed if the
>sample was pre or post-stratified. When the sample sizes are equal,
>the pre-stratified variances (within strata) are always smaller than
>the post-stratified. Further, the overall variance of the mean is
>smaller as well.
(This is as I anticipated)
>However, when the sample sizes are different, the pre-stratified
>variances (within strata) are sometimes larger than the post-stratified
>variances (within), and further, the total pre-stratified variances can
>be larger than the total post-stratified variance.
>
>Strata  Var   n    N   Pre-Strat  Post-Strat
>1        4    2   20     0.2        0.27
>2        4    2   20     0.2        0.27
>3        4    2   20     0.2        0.27
>SUM           6   60     0.6        0.8
>
>Strata  Var   n    N   Pre-Strat  Post-Strat
>1        4   10   20     0.02       0.083
>2        4    2   20     0.2        0.083
>3        4    2   20     0.2        0.083
>SUM          14   60     0.42       0.25
>
>This confuses me. I thought we should be penalized for
>post-stratification.
>
>The issue is, in pre-stratified, the increase in sample size in a
>single stratum improves only that stratum's variance estimate, whereas
>in post-stratified, it improves all estimates.
>
>Is this right?

Actually there should NOT be that much penalty for
post-stratification. But post-stratification works from a single
sample across your strata, so adding sample points anywhere helps the
whole variance-estimation process.

And yes, in stratified sampling, you are - in essence - performing K
different independent samples. So increasing n1 helps that stratum's
variance estimation directly, but should help any overall estimation
indirectly, in much the same way that you can improve the
pooled-variance estimation in ANOVA when you increase the sample size
in only one of the treatment groups.

Also, there can be plenty of situations where post-stratification can
do a lot better than using a stratified sample (your
'pre-stratification'). Stratified samples do better (in a statistical
sense) than ordinary SRS (weighted or not) when you have two
conditions met:

[1] The variance within strata is (mostly) much smaller than the
    variance across the strata.

[2] Your (stratum variable) misclassification rate is low, say well
    under 20%. (see Olsen and Urquhart, Proceedings of the American
    Statistical Association, round about 1991 or so.)

If these two conditions are not met, then stratified sampling is not
helping you. In this case, ordinary SRS is likely to yield better
variance estimates, and post-stratification is going to be
preferable. Period.

Stratified sampling is perhaps the most over-used tool in the survey
sampling toolbox, since people get told that 'it gives improved error
estimates' without much in the way of caveats.

However, in a lot of cases, there are other non-statistical reasons
to use stratified sampling. Like budget and logistics. In lots of
surveys, I have opted to use multipliers (the SIZE variable in PROC
SURVEYSELECT) instead of stratified sampling to get the right sizes
of subsets. It doesn't yield *exact* subset sizes, but then
stratified sampling seldom does either, due to fieldwork issues.

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
```
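The arithmetic in the quoted tables can be checked directly. The following is a minimal Python sketch, not part of the original thread, that applies the two variance formulas exactly as Warren quoted them. It assumes the finite-population factor `(Nh-nh)/Nh` in the pre-stratified formula is applied per stratum inside the sum, since that reading reproduces his numbers. The function and variable names are illustrative only.

```python
def pre_strat_terms(var_h, n_h, N_h):
    """Per-stratum terms of the quoted pre-stratified variance:
    ((Nh - nh)/Nh) * (1/N^2) * Nh^2 * VARh / nh.
    Assumption: the (Nh - nh)/Nh factor is applied stratum by stratum."""
    N = sum(N_h)
    return [((Nh - nh) / Nh) * (Nh ** 2) * v / nh / N ** 2
            for v, nh, Nh in zip(var_h, n_h, N_h)]


def post_strat_terms(var_h, n_h, N_h):
    """Per-stratum terms of the quoted post-stratified variance:
    ((N-n)/N)*(1/n)*(Nh/N)*VARh + ((N-n)/N)*(1/n^2)*VARh*((N-Nh)/N).
    Only the total sample size n enters, never the stratum sizes nh."""
    N = sum(N_h)
    n = sum(n_h)
    fpc = (N - n) / N
    return [fpc * (1 / n) * (Nh / N) * v + fpc * (1 / n ** 2) * v * (N - Nh) / N
            for v, Nh in zip(var_h, N_h)]


var_h = [4, 4, 4]      # within-stratum variances from the quoted tables
N_h = [20, 20, 20]     # stratum population sizes

# First table: equal sample sizes in every stratum
pre_equal = pre_strat_terms(var_h, [2, 2, 2], N_h)
post_equal = post_strat_terms(var_h, [2, 2, 2], N_h)

# Second table: stratum 1 sampled more heavily
pre_unequal = pre_strat_terms(var_h, [10, 2, 2], N_h)
post_unequal = post_strat_terms(var_h, [10, 2, 2], N_h)

for label, terms in [("pre, equal n", pre_equal),
                     ("post, equal n", post_equal),
                     ("pre, unequal n", pre_unequal),
                     ("post, unequal n", post_unequal)]:
    print(label, [round(t, 3) for t in terms], "sum:", round(sum(terms), 2))
```

Under these assumptions the rounded sums agree with the SUM rows of both tables (0.6 vs 0.8 with equal samples, 0.42 vs 0.25 with n = 10, 2, 2). The post-stratified terms depend only on the total n, which is why enlarging the sample in one stratum lowers all three of them, while the pre-stratified terms change only for the stratum whose nh changed.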
