=========================================================================
Date: Fri, 7 Jul 2006 16:15:10 -0500
Reply-To: "Marks, Jim" <Jim.Marks@lodgenet.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: "Marks, Jim" <Jim.Marks@lodgenet.com>
Subject: Re: PPS sampling
Is this what you want?
** Every 4th case, starting on case 3, with the sample proportionate to
the school size.
** sample data (use your sample of individual students).
data list free /school (f8.0).
BEGIN DATA
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
END DATA.
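** Baseline counts per school.
FREQ school.
SORT CASES BY school.
** Remainder of the case number divided by 4 (cycles 1, 2, 3, 0).
COMPUTE sel_order = mod($CASENUM,4).
** Flag every 4th case, starting with case 3.
COMPUTE sample = sel_order EQ 3.
FILTER BY sample.
** Counts per school in the selected sample.
FREQ school.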
The operations: in your file of individual students, 1) sort on the
variable of interest, 2) calculate the modulus of the case number,
3) select one value of the modulus for your sample, 4) filter or delete
unselected cases.
Notice that the samples are not exact proportions. In my sample data, we
have
5/20 (25%) for school = 1
7/30 (23.3%) for school = 2
7/25 (28%) for school = 3
19/75 (25.3%) for the complete sample.
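As a cross-check on these counts, here is a minimal Python sketch (not
SPSS syntax; the function name systematic_sample is made up for
illustration). It replays the same steps on the sample data: sort by
school, number the cases from 1, and keep the cases whose case number
leaves remainder 3 when divided by 4.

```python
from collections import Counter


def systematic_sample(schools, interval=4, keep=3):
    """Return the school codes of the selected cases.

    Mirrors the SPSS steps: sort, number cases from 1 (like $CASENUM),
    keep cases whose case number modulo `interval` equals `keep`.
    """
    ordered = sorted(schools)
    return [
        school
        for casenum, school in enumerate(ordered, start=1)
        if casenum % interval == keep
    ]


# Same sample data as in the DATA LIST example: 20, 30 and 25 students.
schools = [1] * 20 + [2] * 30 + [3] * 25
selected = systematic_sample(schools)

totals = Counter(schools)
picked = Counter(selected)
for school in sorted(totals):
    share = 100 * picked[school] / totals[school]
    print(f"school {school}: {picked[school]}/{totals[school]} ({share:.1f}%)")
print(f"overall: {len(selected)}/{len(schools)}")
```

The uneven percentages come from where each school's block of cases
happens to fall relative to the fixed interval, not from the data
themselves.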
If you want to randomize within schools, calculate a random value for
each student, and use that in the sort command:
COMPUTE rndize = uniform(1).
SORT CASES BY school rndize.
Continue with the rest of the COMPUTE statements.
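To illustrate what the random sort changes, here is a small Python
sketch (again not SPSS; randomized_systematic_sample and the student
ids are hypothetical). Each student gets a random key, the file is
sorted by school and then by that key, and the same every-4th selection
is applied.

```python
import random
from collections import Counter


def randomized_systematic_sample(students, interval=4, keep=3, seed=None):
    """students: list of (student_id, school) pairs.

    Sort by school, then by a random key within school, and keep the
    cases whose 1-based position modulo `interval` equals `keep`.
    """
    rng = random.Random(seed)
    keyed = [(school, rng.random(), sid) for sid, school in students]
    keyed.sort()  # school first, random key second
    return [
        (sid, school)
        for casenum, (school, _key, sid) in enumerate(keyed, start=1)
        if casenum % interval == keep
    ]


# Hypothetical ids 1..75 attached to the 20/30/25 sample schools.
school_codes = [1] * 20 + [2] * 30 + [3] * 25
students = list(enumerate(school_codes, start=1))

sample_a = randomized_systematic_sample(students, seed=1)
sample_b = randomized_systematic_sample(students, seed=2)
print(Counter(school for _sid, school in sample_a))
print(Counter(school for _sid, school in sample_b))
```

Because each school still occupies the same block of case numbers, the
number selected per school does not change; only which students land on
the selected positions does.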
--jim
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Cathal McCrory
Sent: Friday, July 07, 2006 10:06 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: PPS sampling
I have the enrolment numbers for 3200 schools and I want to select
13,000 pupils from within these schools on a PPS basis. For example:
School   Pupil Enrolment   Cumulative Population
1        116               116
2        163               279
3        232               511
4        204               715
5        274               989
6        188               1177
7        210               1387
8        407               1794
9        298               2092
I want to generate a random start within SPSS and select every 4th pupil
from within the cumulative population total (rather than a set number of
cases) and have the program iterate until I have drawn my sample of
13,000.
I was wondering whether anyone could give me any pointers (to existing
routines) or offer any guidance on this. Many thanks.
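
For the selection described in this question (a random start, then
every 4th pupil along the cumulative enrolment), a minimal Python
sketch could look like the following. It is not an existing SPSS
routine; pps_systematic and its parameters are hypothetical names, and
it uses only the nine schools listed in the example.

```python
import random
from bisect import bisect_left
from itertools import accumulate


def pps_systematic(enrolments, interval=4, start=None, seed=None):
    """Return the number of pupils selected from each school.

    Pupils are numbered 1..N along the cumulative enrolment. Selection
    begins at `start` (random in 1..interval if not given) and then
    takes every `interval`-th pupil.
    """
    cumulative = list(accumulate(enrolments))
    if start is None:
        start = random.Random(seed).randint(1, interval)
    counts = [0] * len(enrolments)
    for position in range(start, cumulative[-1] + 1, interval):
        # First school whose cumulative total reaches this position.
        counts[bisect_left(cumulative, position)] += 1
    return counts


# Enrolments for the nine example schools.
enrolments = [116, 163, 232, 204, 274, 188, 210, 407, 298]
counts = pps_systematic(enrolments, seed=2006)
for school, (size, n) in enumerate(zip(enrolments, counts), start=1):
    print(f"school {school}: {n} of {size}")
print("total selected:", sum(counts))
```

With a fixed interval the total drawn is the population divided by the
interval, so reaching a target sample size means choosing the interval
to match rather than iterating.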