LISTSERV at the University of Georgia
Date:         Fri, 21 Sep 2001 11:41:51 -0400
Reply-To:     "Elmaache, Hamani" <Hamani.Elmaache@CCRA-ADRC.GC.CA>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Elmaache, Hamani" <Hamani.Elmaache@CCRA-ADRC.GC.CA>
Subject:      Urgent: Proc MEANS and SURVEYMEANS problem!
Comments: To: "Cassell.David@epamail.epa.gov" <Cassell.David@epamail.epa.gov>,
          barrere Bendia <bendiabare@NETSCAPE.NET>
Content-Type: text/plain; charset="iso-8859-1"

Hi there. Hi David, and thank you for all you did for me. To calculate some means I used both PROC MEANS and PROC SURVEYMEANS, and I got the same means but NOT the same standard errors. Look at these two tables:

Here are the results obtained with PROC MEANS:

                            Std Error
MARIT_ST         Mean         of Mean
======================================
Divorced    0.1114163       0.0141138
Marri_Cl    0.0823918       0.0158222
Separated   0.2323455       0.0189060
======================================

And here are the results obtained with PROC SURVEYMEANS:

                            Std Error
MARIT_ST         Mean         of Mean
======================================
Divorced     0.111416        0.021776
Marri_Cl     0.082392        0.020003
Separated    0.232345        0.032735
======================================

Can somebody explain why? Thanks in advance.

-----Original Message-----
From: Cassell.David@epamail.epa.gov [mailto:Cassell.David@epamail.epa.gov]
Sent: September 10, 2001 4:24 PM
To: barrere Bendia
Cc: SAS-L@LISTSERV.UGA.EDU
Subject: Re: SURVEYMEANS_Problem

[personal email cc also]
barrere Bendia wrote [in part]:
> Here N = POPULATION SIZE
>
> HERE WE HAVE STRATA 3x4=12
>
> From the stratum (Divorced,AB) they have drawn
> a sample of size n11=50 (the first cell)
> and from the stratum (Divorced,BC) they have
> drawn a sample of size n12=50
> .... and so on until the last cell (or stratum) (Widowed,EX),
> from which they have drawn a sample of size n43=4

So they have a stratified sample drawn without replacement, I guess. Did they tell you how they did the sampling within each stratum? Did they tell you why they used strata at all [it is not the optimal approach in many situations]? Did they say why they were working with strata of such vastly different sizes? Did they tell you whether there was any problem with unavailable data [I mean, why 4 of 7 in the last stratum instead of all 7? Why only 15 in stratum 6? Were there non-response issues?]?

> Then I have to weight the observations in each cell with equal
> probability of inclusion; for example, all the observations in
> the first stratum (Divorced,AB) will have the
> weight=N11/N =19827/532147
> and all the observations in the second stratum
> (Divorced,BC) will have the weight=N12/N =24879/532147

But they're not sampled with equal inclusion probabilities. Stratum 12 is obviously sampled much more densely than stratum 1, for example. Can you find out whether there were non-response issues that led to the sample sizes you showed? If there were, you may need to make further weight adjustments.
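To put numbers on the point above, here is a small illustrative sketch (written in Python only for convenience; nothing here is SAS code). It contrasts the proposed quantity N_h/N with the within-stratum inclusion probability n_h/N_h, using only the three strata whose sizes appear in the quoted message. It assumes simple random sampling without replacement inside each stratum, which the thread only guesses at; the stratum numbers are just labels.

```python
# Population size quoted in the thread.
N = 532147
# stratum number -> (N_h, n_h); only strata whose sizes appear in the
# quoted message are included (strata 1, 2 and the last one, 12).
strata = {1: (19827, 50), 2: (24879, 50), 12: (7, 4)}

# N_h / N: the stratum's share of the population (the proposed "weight").
share = {h: N_h / N for h, (N_h, n_h) in strata.items()}

# n_h / N_h: chance that a given person in stratum h is sampled,
# ASSUMING simple random sampling without replacement within each stratum.
inclusion_prob = {h: n_h / N_h for h, (N_h, n_h) in strata.items()}

for h in strata:
    print(f"stratum {h:2d}: N_h/N = {share[h]:.6f}, "
          f"n_h/N_h = {inclusion_prob[h]:.6f}")
```

Under that assumption a person in stratum 12 is about 227 times as likely to be sampled as a person in stratum 1 (4/7 versus 50/19827), so the inclusion probabilities are far from equal and the N_h/N shares do not describe them.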

You know what the population size is [well, at least you supposedly do]. I would speculate [based on what you have learned so far] that the correct weights to use are *not* the ones you wrote above, but weights equal to the size of the stratum divided by the size of the sample drawn from it. Then each weight literally becomes the number of people in the population represented by one point in your sample. So:

    weight_1  =  19827 / 50 ;
    weight_2  =  24879 / 50 ;
    weight_3  =     44 / 30 ;
    weight_4  = 199423 / 24 ;  * Weird! Why a low sample size for a big stratum? ;
    weight_5  = 236401 / 29 ;  * Ditto! ;
    weight_6  =    665 / 15 ;
    .
    .
    .
    weight_12 =      7 / 4  ;
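A minimal illustrative sketch of the suggested weights, again in Python purely for the arithmetic. It uses the N_h and n_h values from the list above; strata 7 to 11 are elided in the message and are simply left out. The loop checks the "number of people represented" reading: adding the weight N_h/n_h over the n_h sampled people in a stratum gives back N_h.

```python
# stratum number -> (N_h, n_h), taken from the weight list in the message.
# Strata 7-11 are elided there ("..."), so they are omitted here too.
strata = {1: (19827, 50), 2: (24879, 50), 3: (44, 30), 4: (199423, 24),
          5: (236401, 29), 6: (665, 15), 12: (7, 4)}

# Design weight for each sampled person in stratum h: N_h / n_h,
# i.e. how many population members one sampled person stands for.
weights = {h: N_h / n_h for h, (N_h, n_h) in strata.items()}

for h, w in weights.items():
    N_h, n_h = strata[h]
    # Summing the weight over the n_h sampled people recovers N_h.
    print(f"stratum {h:2d}: weight = {w:10.5f}, "
          f"n_h * weight = {n_h * w:.0f} (N_h = {N_h})")
```

Because the weights add up to the stratum sizes, summing them within a marital-status group also gives that group's population count, which is what makes collapsing back to the original categories possible.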

This will give you a way to collapse back to your original categories. But the vastly differing weights will be a BAD thing. By my calculations, your sample weights range from 1.75 up to 8,309.29167. That's a ratio of nearly 5000. A ratio like that will cause some of your groups to have essentially NO NOTICEABLE EFFECT on your estimates. But you're stuck with someone else's sample, so that's not your fault. It *is* your problem, unfortunately.
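The range quoted above can be checked with a few lines of arithmetic (Python again, only as an illustration). Using just the weights listed in the message, the largest is weight_4 = 199423/24, and its ratio to weight_12 = 1.75 is about 4,748. Taken at face value, weight_3 = 44/30 (about 1.47) is a little smaller than 1.75, which would push the ratio to about 5,665; either way the spread runs into the thousands. Strata 7 to 11 are not listed, so their weights could change these figures.

```python
# stratum number -> (N_h, n_h), as listed in the message (strata 7-11 elided).
strata = {1: (19827, 50), 2: (24879, 50), 3: (44, 30), 4: (199423, 24),
          5: (236401, 29), 6: (665, 15), 12: (7, 4)}
weights = {h: N_h / n_h for h, (N_h, n_h) in strata.items()}

w_max = max(weights.values())   # largest listed weight: stratum 4, 199423 / 24
w_min = min(weights.values())   # smallest listed weight: stratum 3, 44 / 30
w_12 = weights[12]              # last stratum: 7 / 4 = 1.75

print(f"largest weight           = {w_max:.5f}")
print(f"largest / weight_12      = {w_max / w_12:.1f}")
print(f"smallest listed weight   = {w_min:.5f}")
print(f"largest / smallest       = {w_max / w_min:.1f}")
```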

Could you check with the creators of the sample on the above issues, and see if they have any resolutions? Then we can start to address your SAS questions when you come back next time...

HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

