Date: Fri, 11 Apr 2003 16:52:40 -0700
Reply-To: cassell.david@EPAMAIL.EPA.GOV
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: Proc Surveyselect and Minimal Cell Sizes - Was RE: Proc Sort Random
Content-type: text/plain; charset=us-ascii
"Gerstle, John" <yzg9@CDC.GOV> replied [in part]:
> David (not Dale right?),
Right. :-)
> It does seem to be sample crazy on the list the past few days.
And it's probably not due to my speech at SUGI either. :-)
> I read in the documentation on Proc Surveyselect about the CONTROL statement
> etc. Intuitively, one would think that if it's a random sampling procedure,
> one would not need to sort the dataset - test each record for its inclusion
> in one of the levels of the strata and place in the resulting dataset, and
> continue until the sample size requirements are met. And I repeat,
> intuitively. Oh well, not a big deal. But wait, what I mention above, is
> this what the CONTROL and SIZE statements do? I need to read more on this.
But the CONTROL statement is not designed for simple random sampling (SRS).
It is designed so that you can do sequential or systematic sampling
(in the manner of Chromy). For these, you need a defined order.
PROC SURVEYSELECT even lets you do 'serpentine' ordering on multiple
variables, and lets you output the resulting re-ordered frame using
the OUTSORT= option.
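
As a rough sketch of what that looks like (not something from the original
exchange): the 10% sampling rate and the output data set names FRAME_SORTED
and SYS_SAMPLE below are placeholders chosen for illustration, and the frame
and variable names are borrowed from the code quoted further down.

```sas
/* Systematic sample from a frame ordered by the CONTROL variables.   */
/* The 10% rate and the output data set names are placeholders.       */
proc surveyselect data=matchSN_BS2
                  method=sys            /* systematic selection            */
                  samprate=0.1          /* assumed rate, for illustration  */
                  sort=serp             /* serpentine ordering             */
                  outsort=frame_sorted  /* copy of the re-ordered frame    */
                  out=sys_sample
                  seed=123;
   control quadrant quad4;              /* defines the frame order         */
run;
```

SORT=NEST would request nested rather than serpentine ordering of the
CONTROL variables.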
> proc sort data=matchSN_BS2; by quadrant quad4; run;
> proc surveyselect data=matchSN_BS2 out=matchedSN_BS
> n = (200 200 200 50 50 50 10 10 10 10 10 10) seed=123 ;
> strata quadrant quad4;
> id _all_;
> run;
>
> This did not work because the dataset that I'm using to test the code does
> NOT have these cell numbers, i.e. there are only 10 cases where quadrant=1,
> even though I need to sample towards 200. So I need to figure out a way to
> run the sampling where it takes into account the minimum value between the
> actual n and the maximum n (200).
>
> I was reading about SAMPSIZE=SAS-dataset but I'm not sure how that dataset
> should be organized. There aren't examples showing this. I could use ODS
> Output on a Proc Freq, get the actual counts for the data, transpose the
> dataset into one row. Would that be what is required here?
Actually, you want to organize that auxiliary data set as if you were
about to merge the frame with the auxiliary. So put in three variables:
QUADRANT, QUAD4, and _NSIZE_ (use that exact variable name). Put in a
record for each value of quadrant and quad4, with the desired/feasible
sample size for that stratum. Do that PROC FREQ, so you know to put in
values of sample size which are achievable within each stratum, even if
you are selecting *every* record in the stratum (that's legal, SAS will
compute the right sample weight for you). When you have a requirement
like SRS sampling and limitations like frame sizes that cannot meet your
specs, SAS will not know what to do. You need to put in workable sample
sizes by hand here.
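
One way to build that auxiliary data set, as a minimal sketch only. It
assumes the frame has exactly 12 non-empty strata, that the targets apply in
sorted stratum order exactly as in your N=(...) list, and that QUADRANT and
QUAD4 have no missing values. The data set names STRATUM_COUNTS and
STRATUM_SIZES are made up for the example.

```sas
/* Count the records actually available in each stratum.              */
proc freq data=matchSN_BS2 noprint;
   tables quadrant*quad4 / out=stratum_counts(keep=quadrant quad4 count);
run;

/* One record per stratum: the smaller of the target and the count.   */
/* Targets are assumed to follow the sorted stratum order.            */
data stratum_sizes;
   set stratum_counts;
   array target[12] _temporary_
         (200 200 200 50 50 50 10 10 10 10 10 10);
   _NSIZE_ = min(count, target[_n_]);
   drop count;
run;

proc sort data=matchSN_BS2;
   by quadrant quad4;
run;

/* Stratified SRS using the per-stratum sizes from the auxiliary set. */
proc surveyselect data=matchSN_BS2 out=matchedSN_BS
                  method=srs sampsize=stratum_sizes seed=123;
   strata quadrant quad4;
run;
```

The PROC FREQ output comes back sorted by QUADRANT and QUAD4, so it lines up
with the sorted frame. If the number of strata in the real frame is not 12,
the TARGET array has to be changed to match.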
BTW, you can specify a seed basically as any integer between 1 and 2**31-1,
so you don't need to stick with '123'. I know you knew that, but I
thought I'd say so for the benefit of the home audience. :-)
HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician