Date: Wed, 16 Jan 2008 03:34:19 -0500
Reply-To: Richard Ristow <wrristow@mindspring.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Richard Ristow <wrristow@mindspring.com>
Subject: Re: drawing samples for hundreds of workers
At 08:33 PM 1/14/2008, Raffe, Sydelle, SSA wrote:
>In my file, there are unique case records. These are apportioned to
>hundreds of different workers such that each worker has multiple cases.
>
>We want to make a random selection of each worker's cases. I don't
>think that's what I led [John Norton] to understand.
And at 01:40 PM 1/15/2008, Raffe, Sydelle, SSA wrote:
>Actually, we want 6 cases randomly selected for each worker.
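King Douglas gave a nice implementation using SORT CASES and RANK. As
an alternative, here's an implementation using AGGREGATE and 'k/n'
logic. Working through each worker's records in order, with k records
still to be drawn from the n not yet examined, take the current record
with probability k/n. (This requires that the file be grouped by ID,
meaning each worker's records are contiguous, but not necessarily
sorted.) I'm selecting three records per worker. For six, change the 3
in MIN(3,#N) to 6.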
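Why the k/n rule gives exactly k records per worker, each equally likely to be chosen: here is a short argument, my own sketch rather than anything from the thread. Write k and n for the values of #K and #N when a record is reached. If k = n, every remaining record is taken with probability 1. If k = 0, none is taken. So the sample can never fall short or run over. For equal inclusion, the first two records of a worker with n records and k to draw give:

```latex
\begin{aligned}
P(\text{record 1 taken}) &= \frac{k}{n} \\
P(\text{record 2 taken}) &= \frac{k}{n}\cdot\frac{k-1}{n-1}
  + \frac{n-k}{n}\cdot\frac{k}{n-1}
  = \frac{k(k-1) + (n-k)k}{n(n-1)}
  = \frac{k(n-1)}{n(n-1)} = \frac{k}{n}
\end{aligned}
```

The same conditioning step carries through to every later record. This is the sequential "selection sampling" scheme (Knuth's Algorithm S).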
Test data, as listed (output created 16-JAN-2008 03:32:16):
ID   Fname    Lname       RecdDate
A35  Aaron    Aardvark    18-DEC-2004
A35  Aaron    Aardvark    25-MAY-2005
A35  Aaron    Aardvark    16-JUL-2005
A42  Bethany  Birkinwell  30-OCT-2004
A42  Bethany  Birkinwell  05-DEC-2004
A42  Bethany  Birkinwell  24-DEC-2004
A42  Bethany  Birkinwell  25-DEC-2004
C19  Charles  Cubbage     25-JUL-2003
C19  Charles  Cubbage     02-SEP-2003
C21  Dorothy  Dickens     14-NOV-2002
D98  Ellis    Etheridge   19-SEP-2000
Number of cases read: 11    Number of cases listed: 11
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
/BREAK=ID
/NRecords 'Number of records for employee'=NU.
NUMERIC #K #N (F3).
DO IF $CASENUM EQ 1
OR ID NE LAG(ID).
. COMPUTE #N = NRecords /* Total records, per worker */.
. COMPUTE #K = MIN(3,#N) /* Number to sample, per worker */.
END IF.
. /*-- PRINT / 'Record ' ID Fname Lname RecdDate ': ' /*-*/
/*-- 'K=' #K ', N=' #N /*-*/.
COMPUTE #Take_It = RV.BERNOULLI(#K/#N) /* Take with probability k/n */.
COMPUTE #K = #K - #Take_It /* Records still needed     */.
COMPUTE #N = #N - 1        /* Records still to be seen */.
. /*-- PRINT / ' TAKE=' #Take_It /*-*/.
SELECT IF #Take_It.
. /*-- EXECUTE /*-*/.
LIST.
Selected records (output created 16-JAN-2008 03:32:17):
ID   Fname    Lname       RecdDate     NRecords
A35  Aaron    Aardvark    18-DEC-2004         3
A35  Aaron    Aardvark    25-MAY-2005         3
A35  Aaron    Aardvark    16-JUL-2005         3
A42  Bethany  Birkinwell  30-OCT-2004         4
A42  Bethany  Birkinwell  05-DEC-2004         4
A42  Bethany  Birkinwell  25-DEC-2004         4
C19  Charles  Cubbage     25-JUL-2003         2
C19  Charles  Cubbage     02-SEP-2003         2
C21  Dorothy  Dickens     14-NOV-2002         1
D98  Ellis    Etheridge   19-SEP-2000         1
Number of cases read: 10    Number of cases listed: 10
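For checking the logic outside SPSS, here is a minimal Python sketch of the same single pass. It is my translation, not SPSS. `sample_per_worker` and its arguments are hypothetical names. `Counter` stands in for AGGREGATE ... MODE=ADDVARIABLES, and comparing with the previous ID stands in for LAG(ID). Python's random numbers won't reproduce the SPSS draws, so the particular records kept may differ from the listing above. The per-worker counts (3, 3, 2, 1, 1 = 10) should not.

```python
import random
from collections import Counter

# The eleven test records listed above, as (ID, RecdDate); names omitted.
RECORDS = [
    ("A35", "18-DEC-2004"), ("A35", "25-MAY-2005"), ("A35", "16-JUL-2005"),
    ("A42", "30-OCT-2004"), ("A42", "05-DEC-2004"),
    ("A42", "24-DEC-2004"), ("A42", "25-DEC-2004"),
    ("C19", "25-JUL-2003"), ("C19", "02-SEP-2003"),
    ("C21", "14-NOV-2002"), ("D98", "19-SEP-2000"),
]


def sample_per_worker(rows, k_max=3, rng=random):
    """Keep min(k_max, n) randomly chosen records for each worker (ID).

    rows must be grouped by ID (each worker's records contiguous),
    but need not be sorted.
    """
    # Like AGGREGATE ... /BREAK=ID /NRecords=NU: records per worker.
    n_records = Counter(worker for worker, _ in rows)
    kept = []
    prev_id = None
    k = n = 0  # carried from record to record, like SPSS scratch variables
    for worker, recd_date in rows:
        if worker != prev_id:  # $CASENUM EQ 1 OR ID NE LAG(ID)
            n = n_records[worker]
            k = min(k_max, n)
        take = rng.random() < k / n  # RV.BERNOULLI(#K/#N)
        k -= take                    # records still needed
        n -= 1                       # records still to be seen
        if take:                     # SELECT IF #Take_It
            kept.append((worker, recd_date, n_records[worker]))
        prev_id = worker
    return kept
```

With k_max=6, as in the original request, every record in this small test file is kept, since no worker has more than four. If a worker's records were split into two separate runs, the counts would be reset at the start of each run. That worker could then end up with the wrong number of records, which is why the file must be grouped.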
===================
APPENDIX: Test data
===================
* ................................................................. .
* ................. Test data ..................... .
SET RNG = MT /* 'Mersenne twister' random number generator */ .
SET MTINDEX = 3605 /* Providence, RI telephone book */ .
INPUT PROGRAM.
. DATA LIST LIST
/ID Fname Lname
(A4,A8, A12).
. LEAVE ID Fname Lname.
. NUMERIC RecdDate (DATE11).
. LEAVE RecdDate.
. COMPUTE RecdDate=RV.UNIFORM(DATE.MDY(01,01,2000),
DATE.MDY(01,01,2005)).
. COMPUTE RecdDate=XDATE.DATE(RecdDate).
. NUMERIC #NRecrds #RecdNum (F3).
. COMPUTE #NRecrds = TRUNC(RV.UNIFORM(1,5)) /* 1 to 4 records per worker */.
. LOOP #RecdNum = 1 TO #NRecrds.
. COMPUTE RecdDate = RecdDate + RV.EXP(1/TIME.DAYS(45)) /* Gaps average 45 days */.
. COMPUTE RecdDate=XDATE.DATE(RecdDate).
. END CASE.
. END LOOP.
END INPUT PROGRAM.
BEGIN DATA
A35 Aaron Aardvark
A42 Bethany Birkinwell
C19 Charles Cubbage
C21 Dorothy Dickens
D98 Ellis Etheridge
END DATA.
LIST.
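A rough Python analogue of the test-data generator above may help anyone reproducing the test outside SPSS. It is a sketch under my own assumptions: `make_test_data` is a hypothetical name, and dates are handled as whole-day ordinals rather than SPSS's seconds. Python's `random` module also uses the Mersenne Twister, but seeding it with 3605 is not expected to reproduce SPSS's draws, so the dates won't match the listing. The structure should: each worker gets 1 to 4 records, and the dates start somewhere in 2000-2004 and increase by exponential gaps averaging 45 days.

```python
import random
from datetime import date


def make_test_data(workers, rng=random):
    """Hypothetical analogue of the INPUT PROGRAM in the appendix.

    workers: list of (ID, Fname, Lname) tuples. Returns rows
    (ID, Fname, Lname, RecdDate), grouped by worker.
    """
    lo = date(2000, 1, 1).toordinal()
    hi = date(2005, 1, 1).toordinal()
    rows = []
    for worker_id, fname, lname in workers:
        # Uniform start date, truncated to a whole day (like XDATE.DATE).
        day = int(rng.uniform(lo, hi))
        # 1 to 4 records, the same range as TRUNC(RV.UNIFORM(1,5)).
        n_recs = rng.randint(1, 4)
        for _ in range(n_recs):
            # Exponential gap with a mean of 45 days, then truncate to a day.
            day = int(day + rng.expovariate(1 / 45))
            rows.append((worker_id, fname, lname, date.fromordinal(day)))
    return rows
```

For example, `make_test_data([("A35", "Aaron", "Aardvark")], random.Random(3605))` returns between one and four rows for A35, in date order.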