Date: Fri, 21 Sep 2001 11:41:51 -0400
Reply-To: "Elmaache, Hamani" <Hamani.Elmaache@CCRA-ADRC.GC.CA>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Elmaache, Hamani" <Hamani.Elmaache@CCRA-ADRC.GC.CA>
Subject: Urgent: Proc MEANS and SURVEYMEANS problem!
Hi there.
Hi David, and thank you for all you have done for me.
To calculate some means I used both PROC MEANS and PROC SURVEYMEANS, and
I got the same means but NOT the same standard errors. Look at these two tables.
Here are the results obtained with PROC MEANS:
                            Std Error
MARIT_ST        Mean        of Mean
======================================
Divorced      0.1114163    0.0141138
Marri_Cl      0.0823918    0.0158222
Separated     0.2323455    0.0189060
======================================
And here are the results obtained with PROC SURVEYMEANS:
                            Std Error
MARIT_ST        Mean        of Mean
======================================
Divorced      0.111416     0.021776
Marri_Cl      0.082392     0.020003
Separated     0.232345     0.032735
======================================
Can somebody explain to me why?
Thanks in advance.
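One likely reason for the gap is that the two procedures use different variance models. PROC MEANS computes its standard error as if the observations were independent draws from an infinite population. PROC SURVEYMEANS uses a design-based variance that accounts for the STRATA and WEIGHT information. It can also apply a finite population correction, which according to the SAS documentation it uses only when TOTAL= or RATE= is specified.

The Python sketch below is a simplified illustration on made-up data, not either procedure's exact algorithm. `iid_se` is the textbook s/sqrt(n). `stratified_se` is the standard estimator for simple random sampling without replacement within strata: estimate sum_h W_h * ybar_h and variance sum_h W_h^2 * (1 - n_h/N_h) * s_h^2 / n_h, with W_h = N_h/N. The function names and data are hypothetical.

```python
import math
from statistics import mean, variance


def iid_se(values):
    """SE of the mean treating every observation as an i.i.d. draw: s / sqrt(n)."""
    return math.sqrt(variance(values) / len(values))


def stratified_se(strata):
    """Estimate and SE of a stratified mean, assuming simple random sampling
    without replacement within each stratum.

    strata: dict mapping a stratum label to (N_h, list of sampled y values).
    Each stratum needs at least 2 sampled values for variance().
    """
    N = sum(N_h for N_h, _ in strata.values())
    est = var = 0.0
    for N_h, ys in strata.values():
        n_h = len(ys)
        W_h = N_h / N                  # population share of stratum h
        est += W_h * mean(ys)
        fpc = 1 - n_h / N_h            # finite population correction
        var += W_h ** 2 * fpc * variance(ys) / n_h
    return est, math.sqrt(var)


# Hypothetical data, NOT the data behind the tables above.
strata = {"AB": (100, [1.0, 3.0]), "BC": (300, [2.0, 4.0])}
pooled = [y for _, ys in strata.values() for y in ys]
print("i.i.d. SE:", iid_se(pooled))
print("stratified estimate and SE:", stratified_se(strata))
```

On these toy numbers the stratified SE (about 0.79) is larger than the i.i.d. SE (about 0.65), the same direction as in the tables above. The direction is not guaranteed in general, because it depends on the allocation and on how the stratum means differ. The unweighted i.i.d. calculation here also does not reproduce the weighted mean; it only contrasts the two SE formulas.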
==================
-----Original Message-----
From: Cassell.David@epamail.epa.gov [mailto:Cassell.David@epamail.epa.gov]
Sent: September 10, 2001 4:24 PM
To: barrere Bendia
Cc: SAS-L@LISTSERV.UGA.EDU
Subject: Re: SURVEYMEANS_Problem
[personal email cc also]
barrere Bendia wrote [in part]:
> Here N = POPULATION SIZE
>
> Here we have 3x4 = 12 strata.
>
> From the stratum (Divorced,AB) a sample of size
> n11=50 was drawn (the first cell), and from the stratum
> (Divorced,BC) a sample of size n12=50 was drawn,
> ... and so on until the last cell (or stratum), (Widowed,EX),
> from which a sample of size n43=4 was drawn.
So they have a stratified sample drawn without replacement, I guess.
Did they tell you how they did the sampling within each stratum?
Did they tell you why they used strata at all [it is not the optimal
approach in many situations]? Did they say why they were working with
strata of such vastly different sizes? Did they tell you whether there was
any problem with unavailable data [I mean, why 4 of 7 in the last stratum
instead of all 7? Why only 15 in stratum 6? Were there non-response
issues?]
> Then I have to weight the observations in each cell with equal
> probability of inclusion; for example, all the observations in
> the first stratum (Divorced,AB) will have the
> weight = N11/N = 19827/532147
> and all the observations in the second stratum
> (Divorced,BC) will have the weight = N12/N = 24879/532147
But they're not sampled with equal inclusion probabilities. Stratum
12 is obviously sampled much more densely than stratum 1 (4 of 7 units
versus 50 of 19,827), for example.
Can you find out if there were non-response issues that led to the
sample sizes you showed? If there were, you may need to make further
weight adjustments.
You know what the population size is [well, at least you supposedly do].
I would speculate [based on what you have learned so far] that the correct
weights to use are *not* the ones you wrote above, but weights equal to
the size of the stratum divided by the size of the sample from that
stratum. Then each weight literally becomes the number of people in the
population represented by a point in your sample. So:
weight_1  =  19827 / 50 ;
weight_2  =  24879 / 50 ;
weight_3  =     44 / 30 ;
weight_4  = 199423 / 24 ;  * Weird! Why a low sample size for a big stratum? ;
weight_5  = 236401 / 29 ;  * Ditto! ;
weight_6  =    665 / 15 ;
.
.
.
weight_12 =      7 / 4  ;
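The arithmetic behind those weights can be checked with a short Python sketch. It uses only the strata whose figures are quoted in this message; strata 7 to 11 were elided, so they are left out. For each stratum it prints the inclusion probability n_h/N_h next to its reciprocal, the weight N_h/n_h.

With the figures exactly as quoted, stratum 3 (44/30, about 1.47) comes out smaller than stratum 12's 1.75. The printed max/min ratio is therefore larger than the one quoted in the next paragraph, so one of those figures may be a typo.

```python
# (N_h, n_h) as quoted in the thread; strata 7 to 11 were elided and are omitted.
strata = {
    1: (19827, 50),    # (Divorced,AB)
    2: (24879, 50),    # (Divorced,BC)
    3: (44, 30),
    4: (199423, 24),
    5: (236401, 29),
    6: (665, 15),
    12: (7, 4),        # (Widowed,EX)
}


def design_weights(strata):
    """Return {stratum: (inclusion probability n_h/N_h, weight N_h/n_h)}."""
    return {h: (n_h / N_h, N_h / n_h) for h, (N_h, n_h) in strata.items()}


w = design_weights(strata)
for h, (pi, weight) in w.items():
    print(f"stratum {h:>2}: pi = {pi:.5f}, weight = {weight:.5f}")

weights = [weight for _, weight in w.values()]
print("max/min weight ratio:", max(weights) / min(weights))
```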
These weights will give you a way to collapse back to your original
categories. But the vastly differing weights will be a BAD thing. By my
calculations, your sample weights range from 1.75 up to 8,309.29167.
That's a ratio of nearly 5000. A ratio like that will cause some of your
groups to have essentially NO NOTICEABLE EFFECT on your estimates. But
you're stuck with someone else's sample, so that's not your fault. It
*is* your problem, unfortunately.
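One standard way to put a number on how much unequal weights hurt is Kish's approximate design effect from weighting: deff = n * sum(w_i^2) / (sum(w_i))^2 over the n sampled units. The ratio n/deff is then an effective sample size. This diagnostic does not come from the thread, and it is only a rough guide. It ignores any relation between the weights and the variable being estimated, and it describes whole-sample estimates rather than each marital-status domain.

The Python sketch below applies it to the quoted strata only, leaving out the elided strata 7 to 11.

```python
def kish_deff(groups):
    """Kish's approximate design effect due to unequal weighting.

    groups: iterable of (weight, number of sampled units carrying that weight).
    Computes n * sum(w^2) / (sum(w))^2 over all sampled units.
    """
    n = sum_w = sum_w2 = 0.0
    for w, count in groups:
        n += count
        sum_w += w * count
        sum_w2 += w * w * count
    return n * sum_w2 / sum_w ** 2


# (N_h / n_h, n_h) for the strata quoted in the thread (7 to 11 omitted).
quoted = [(19827 / 50, 50), (24879 / 50, 50), (44 / 30, 30),
          (199423 / 24, 24), (236401 / 29, 29), (665 / 15, 15), (7 / 4, 4)]
deff = kish_deff(quoted)
n = sum(count for _, count in quoted)
print(f"deff = {deff:.2f}, effective sample size = {n / deff:.1f} of {n}")
```

Equal weights give deff = 1. The more spread out the weights, the larger deff becomes and the fewer "effective" observations the sample is worth.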
Could you check with the creators of the sample on the above issues, and
see if they have any resolutions? Then we can start to address your SAS
questions when you come back next time...
HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician