Date: Tue, 20 May 2008 23:37:26 -0400
Reply-To: Richard Ristow <wrristow@mindspring.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Richard Ristow <wrristow@mindspring.com>
Subject: Re: Random Sample by Counselor
In-Reply-To: <51F45499BCFE674F8A4EA4841F9CE6920B98E1B9B2@STLEXVN01P.centene.com>
Content-Type: text/plain; charset="us-ascii"; format=flowed
At 02:57 PM 5/20/2008, Allen Frommelt wrote:
>I have a database of participant id, and health counselor. I need
>to create a 25% random sample by counselor. Is there a way to do
>this in SPSS? Thanks!
All the participants for 25% of the counselors, or 25% of the
participants for all counselors?
If you want all participants for 25% of the counselors, build a
list of the counselors, sample 25% of them, and merge the result back
with the original file. For the sampling itself, use any method you
please. Note, though, that the SAMPLE command requires you to
hard-code the exact number of counselors if you want a sample as near
as possible to an exact 25%.
The following code is tested, but this is simply the code, not a
listing. (The LIST commands should be removed for production use.) It
uses dataset logic (SPSS 14 and later), and assumes that the data is
in an active dataset named PartiList.
DATASET DECLARE CounsList.
AGGREGATE OUTFILE=CounsList
/BREAK=Counselor
/CaseLoad 'No. of clients for counselor' = NU.
DATASET ACTIVATE CounsList WINDOW=FRONT.
* Sample 25% of the counselors by the 'K/N' method: .
COMPUTE NOBREAK = 1.
DATASET DECLARE CounsCount.
AGGREGATE OUTFILE=CounsCount
/BREAK=NOBREAK
/N 'Number of counselors' = NU.
DATASET ACTIVATE CounsCount WINDOW=FRONT.
NUMERIC K (F3).
VAR LABEL K 'Counselors to sample'.
COMPUTE K = RND(0.25*N).
FORMATS N K (F3).
DATASET ACTIVATE CounsList WINDOW=FRONT.
MATCH FILES
/FILE =*
/TABLE=CounsCount
/BY NOBREAK.
DO IF $CASENUM EQ 1.
. COMPUTE #K = K.
. COMPUTE #N = N.
END IF.
NUMERIC InSample (F2).
VAR LABELS InSample 'Indicator: Counselor is in sample'.
COMPUTE InSample = RV.BERNOULLI(#K/#N).
COMPUTE #K = #K - InSample.
COMPUTE #N = #N - 1.
LIST Counselor InSample.
DATASET ACTIVATE PartiList WINDOW=FRONT.
* Attach 'Sampled' flag to participant records .
MATCH FILES
/FILE =PartiList
/TABLE=CounsList
/BY Counselor
/DROP = NOBREAK K N CaseLoad.
. /**/ LIST /*-*/.
SELECT IF InSample.
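For comparison, here is a minimal, untested sketch of the hard-coded
SAMPLE alternative mentioned above, to be used in place of the K/N
steps rather than after them. It assumes the counselor list has
exactly 12 rows, so that 3 counselors make up 25%; both numbers are
illustrative and would have to be edited by hand for real data. It
also assumes PartiList is sorted by Counselor. The dataset name
CounsPick is hypothetical, chosen only to avoid colliding with
CounsList.

```
* Sketch only: the counts 3 and 12 are assumed, not computed .
DATASET ACTIVATE PartiList WINDOW=FRONT.
DATASET DECLARE  CounsPick.
AGGREGATE OUTFILE=CounsPick
   /BREAK=Counselor
   /CaseLoad 'No. of clients for counselor' = NU.
DATASET ACTIVATE CounsPick WINDOW=FRONT.
* Keep 3 of the 12 counselors, and flag the ones kept .
SAMPLE 3 FROM 12.
COMPUTE InSample = 1.
EXECUTE.
DATASET ACTIVATE PartiList WINDOW=FRONT.
* Unmatched participants get a system-missing InSample .
MATCH FILES
   /FILE =PartiList
   /TABLE=CounsPick
   /BY Counselor
   /DROP = CaseLoad.
SELECT IF InSample EQ 1.
```

This is shorter than the K/N code, but the counts have to be edited
whenever the number of counselors changes, which is the drawback
noted above.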
============================
APPENDIX: Test data and code
============================
* ................................................................. .
* ................. Test data ..................... .
SET RNG = MT /* 'Mersenne twister' random number generator */ .
SET MTINDEX = 9518 /* A phone number in Maryland */ .
INPUT PROGRAM.
.  NUMERIC Counselor   (N3)
           Participant (F5).
.  LEAVE Counselor.
.  LOOP #I_Couns = 1 TO 12.
.     COMPUTE Counselor = TRUNC(RV.UNIFORM(100,1000)).
.     COMPUTE #N_Client = RV.POISSON(5).
.     LOOP #I_Client = 1 TO #N_Client.
.        COMPUTE Participant = TRUNC(RV.UNIFORM(1E4,1E5)).
.        END CASE.
.     END LOOP.
.  END LOOP.
END FILE.
END INPUT PROGRAM.
SORT CASES BY Counselor Participant.
DATASET NAME PartiList WINDOW=FRONT.
LIST.
* ................. Post after this point ..................... .
* ................................................................. .
DATASET DECLARE CounsList.
AGGREGATE OUTFILE=CounsList
/BREAK=Counselor
/CaseLoad 'No. of clients for counselor' = NU.
DATASET ACTIVATE CounsList WINDOW=FRONT.
* ................. Post after this point ..................... .
* Sample 25% of the counselors by the 'K/N' method: .
COMPUTE NOBREAK = 1.
DATASET DECLARE CounsCount.
AGGREGATE OUTFILE=CounsCount
/BREAK=NOBREAK
/N 'Number of counselors' = NU.
DATASET ACTIVATE CounsCount WINDOW=FRONT.
NUMERIC K (F3).
VAR LABEL K 'Counselors to sample'.
COMPUTE K = RND(0.25*N).
FORMATS N K (F3).
DATASET ACTIVATE CounsList WINDOW=FRONT.
MATCH FILES
/FILE =*
/TABLE=CounsCount
/BY NOBREAK.
DO IF $CASENUM EQ 1.
. COMPUTE #K = K.
. COMPUTE #N = N.
END IF.
NUMERIC InSample (F2).
VAR LABELS InSample 'Indicator: Counselor is in sample'.
COMPUTE InSample = RV.BERNOULLI(#K/#N).
COMPUTE #K = #K - InSample.
COMPUTE #N = #N - 1.
LIST Counselor InSample.
DATASET ACTIVATE PartiList WINDOW=FRONT.
* Attach 'Sampled' flag to participant records .
MATCH FILES
/FILE =PartiList
/TABLE=CounsList
/BY Counselor
/DROP = NOBREAK K N CaseLoad.
. /**/ LIST /*-*/.
SELECT IF InSample.