Date: Thu, 29 Sep 2005 11:17:24 -0400
Reply-To: Richard Ristow <email@example.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Richard Ristow <firstname.lastname@example.org>
Subject: Re: Random number without replacement
In-Reply-To: <5CFEFDB5226CB54CBB4328B9563A12EE02BF8FF2@hqemail2.spss.com>
Content-Type: text/plain; charset="us-ascii"; format=flowed
At 08:20 AM 9/29/2005, Peck, Jon wrote:
>It is worth noting that the SPSS Select Cases dialog can, in effect,
>do this for you. Go to Data/Select Cases/Random Sample of
>Cases. Choose "exactly m cases from the first n cases". If you paste
>the code from the dialog, you will see a striking resemblance.
My goodness! You're right, of course. That's neat, to have the "k/n"
algorithm clicked up from the menus. GOOD for you guys.
I'm a syntax jock, and I haven't explored everything in the menus. I'd
simply assumed that Data/Select Cases/Random Sample of Cases
implemented the "approximately xx% of cases" option, and nothing else. Too
>If you have the integers as a variable over those cases, it is doing
>the same thing as below.
Yes. I adapted the logic to loop internally over the set of integers,
directly, thinking there was no need to write and read back the
10,000-case file. But selecting cases from a file is much more common.
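
For readers who want the "k/n" logic spelled out, here is a minimal
sketch in Python rather than SPSS syntax. It is not the code the dialog
pastes, nor what I posted earlier; the function name sample_k_of_n and
the seed value are made up for illustration. Each integer is kept with
probability (number still needed)/(number not yet examined), which
yields exactly k selections.

```python
import random

def sample_k_of_n(k, n, rng):
    """Select exactly k of the integers 1..n, without replacement."""
    chosen = []
    needed = k  # selections still to be made
    for i in range(1, n + 1):
        remaining = n - i + 1  # integers not yet examined, including i
        # Keep i with probability needed / remaining.
        if rng.random() * remaining < needed:
            chosen.append(i)
            needed -= 1
    return chosen

# Hypothetical seed, chosen only so the run is repeatable.
rng = random.Random(20050929)
picked = sample_k_of_n(10, 10000, rng)
print(picked)
```

The result comes out in ascending order, since the integers are
examined in sequence; shuffle it afterward if a random order matters.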
>By default, the random number generators use a random seed, so it is
>only necessary to set the seed explicitly if you need the calculation
>to be exactly repeatable.
Again, I'm doing what I learned was wise, and what is probably too
elaborate with modern software.
Having the calculation exactly repeatable isn't too elaborate in itself.
It's advised so you can tell whether a different sample on a different
run came from normal random selection or from a change in logic. On the
other hand, it's overdone for a simple k/n program.
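
To illustrate the repeatability point, here is a small Python sketch
(again, not SPSS syntax; the helper name draw and the seed 2005 are
assumptions for the example). With the seed fixed, two runs of
unchanged logic give the same sample, so any difference between runs
points to a change in the logic.

```python
import random

def draw(seed, k=5, n=100):
    """Draw k of the integers 1..n using a generator seeded explicitly."""
    rng = random.Random(seed)
    return rng.sample(range(1, n + 1), k)

first = draw(2005)
second = draw(2005)   # same seed, same logic
print(first == second)
```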
I was taught to mistrust automatic seeding. Traditionally, it was done
off the system clock. Depending on what bits of the clock counter were
used, you could get correlated sequences from different seedings,
especially if you did them too close together, before the clock had a
chance to change much. (That's also one of the many reasons not to
re-seed the random number generator during a run. I've posted on that,
a couple of times.)
SPSS and other modern systems probably do something more sophisticated.
I'll guess the system clock comes into it, since that's still a good
source of lots of bit sequences. (You could, of course, look one up on
page 769 of the Providence telephone book. :-s)
Similarly, I'm not sure whether you'd now recommend discarding a number
of variates after seeding. (Perhaps SPSS even does this, after
Back on 6 August, Richard Oliver gave a reference to the random-number
algorithms SPSS uses: "The new random number generator is the Mersenne
Twister, and the old one is now referred to as the 'SPSS 12.0
compatible' random number generator." Could you say how SPSS
random-initializes the generator when no seed is set, and what you now
think about throwing away a few variates after seeding?
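
By "throwing away a few variates" I mean something like the sketch
below, in Python, whose standard random module is also a Mersenne
Twister. The helper name seeded_rng and the discard count of 100 are
arbitrary choices for illustration; I am not claiming SPSS does this or
that it is needed.

```python
import random

def seeded_rng(seed, discard=100):
    """Seed a generator, then draw and ignore the first `discard` variates."""
    rng = random.Random(seed)
    for _ in range(discard):
        rng.random()  # value deliberately thrown away
    return rng

rng = seeded_rng(12345)
print(rng.random())  # first variate actually used
```

The stream is still exactly repeatable for a given seed and discard
count; it simply starts further along the sequence.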
And thanks for correcting what I'd assumed about the menus.