LISTSERV at the University of Georgia
Date:         Thu, 29 Sep 2005 11:17:24 -0400
Reply-To:     Richard Ristow <>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Richard Ristow <>
Subject:      Re: Random number  without  replacement
Comments: To: "Peck, Jon" <>
In-Reply-To:  < >
Content-Type: text/plain; charset="us-ascii"; format=flowed

At 08:20 AM 9/29/2005, Peck, Jon wrote:

>It is worth noting that the SPSS Select Cases dialog can, in effect, do this for you. Go to Data/Select Cases/Random Sample of Cases. Choose "exactly m cases from the first n cases". If you paste the code from the dialog, you will see a striking resemblance.

My goodness! You're right, of course. That's neat, to have the "k/n" algorithm clicked up from the menus. GOOD for you guys.
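For readers outside SPSS, the "k/n" logic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the syntax the dialog pastes: `select_k_of_n` is a hypothetical name, and the generator is Python's own `random.Random`. Each case is taken with probability (cases still needed)/(cases still remaining), so exactly k of the first n cases are selected, in file order.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def select_k_of_n(items: Iterable[T], k: int, n: int,
                  rng: Optional[random.Random] = None) -> List[T]:
    """Hypothetical helper: pick exactly k of the first n items, keeping order."""
    if not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n")
    if rng is None:
        rng = random.Random()  # unseeded: a different sample on each run
    chosen: List[T] = []
    needed, remaining = k, n
    for item in items:
        if needed == 0:
            break  # quota filled; the rest of the cases are skipped
        # Take this case with probability (still needed) / (still remaining).
        if rng.random() < needed / remaining:
            chosen.append(item)
            needed -= 1
        remaining -= 1
    if needed:
        raise ValueError("fewer than n items were supplied")
    return chosen


# k distinct integers from 1..10000, without writing a file of cases first.
picks = select_k_of_n(range(1, 10001), k=20, n=10000, rng=random.Random(12345))
print(picks)
```

Because the selection probability reaches 1 whenever the cases still needed equal the cases remaining, the loop can never come up short. Passing any sequence of cases instead of `range(...)` gives the "select from a file" form.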

I'm a syntax jock, and I haven't explored everything in the menus. I'd simply assumed that Data/Select Cases/Random Sample of Cases implemented "approximately xx% of cases," and nothing else. Too little confidence.

>If you have the integers as a variable over those cases, it is doing the same thing as below.

Yes. I adapted the logic to loop directly over the set of integers, thinking there was no need to write out and read back the 10,000-case file. But selecting cases from a file is much more common.

>By default, the random number generators use a random seed, so it is only necessary to set the seed explicitly if you need the calculation to be exactly repeatable.

Again, I'm doing what I learned was wise, which is probably too elaborate with modern software.

Having the calculation exactly repeatable isn't too elaborate. It's advised so you can tell whether a different sample on a different run was normal random selection, or a change in logic. On the other hand, it's overdone for a simple k/n program.
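A sketch of why an explicit seed helps, assuming Python's `random.Random` (which happens to be a Mersenne Twister, MT19937) stands in for the SPSS generator; `draw_sample` and the seed value are hypothetical. Two runs with the same seed draw the same sample, so a different sample on a later run points to a changed seed or changed logic rather than chance.

```python
import random
from typing import List


def draw_sample(seed: int, k: int = 10, n: int = 10000) -> List[int]:
    """Hypothetical helper: k distinct integers from 1..n, repeatable for a given seed."""
    rng = random.Random(seed)  # explicit seed -> same Mersenne Twister stream every run
    return sorted(rng.sample(range(1, n + 1), k))


first_run = draw_sample(20050929)
second_run = draw_sample(20050929)
print(first_run == second_run)  # True: same seed, same sample
```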

I was taught to mistrust automatic seeding. Traditionally, it was done off the system clock. Depending on what bits of the clock counter were used, you could get correlated sequences from different seedings, especially if you did them too close together, before the clock had a chance to change much. (That's also one of the many reasons not to re-seed the random number generator during a run. I've posted on that, a couple of times.)
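The clock-seeding hazard is easy to reproduce with a deliberately coarse clock. In this sketch `seed_from_clock` is a hypothetical stand-in for old-style auto-seeding that keeps only whole seconds; it is not how SPSS or Python seeds. (Python's `random.Random()` with no argument uses operating-system randomness when available and falls back to the system time otherwise.) Two seedings a fifth of a second apart land on the same tick and produce identical streams.

```python
import random
from typing import List

NS_PER_SECOND = 1_000_000_000


def seed_from_clock(clock_ns: int) -> random.Random:
    """Hypothetical old-style auto-seeding: only whole seconds of the clock are used."""
    return random.Random(clock_ns // NS_PER_SECOND)


def first_draws(rng: random.Random, count: int = 5) -> List[float]:
    """Return the first `count` uniform variates from a freshly seeded generator."""
    return [rng.random() for _ in range(count)]


t0 = 1_128_006_644_000_000_000  # an arbitrary clock reading, in nanoseconds
a = first_draws(seed_from_clock(t0))
b = first_draws(seed_from_clock(t0 + 200_000_000))  # 0.2 s later, same whole second
print(a == b)  # True: both seedings saw the same clock tick
```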

SPSS and other modern systems probably do something more sophisticated. I'll guess the system clock comes into it, since that's still a good source of lots of bit sequences. (You could, of course, look one up on page 769 of the Providence telephone book. :-s)

Similarly, I'm not sure whether you'd now recommend discarding a number of variates after seeding. (Perhaps SPSS even does this, after auto-seeding.)
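To show what discarding variates after seeding was meant to fix, here is the Park-Miller "minimal standard" multiplicative LCG, used purely as an example of an older generator; it is not claimed to be SPSS's "12.0 compatible" generator, and the `MinStd` class and its `discard` parameter are hypothetical. Seeds 1000 and 1001, as from two adjacent clock ticks, give first variates that differ by only 16807/(2^31 - 1), about 7.8e-6; throwing one variate away makes the next pair differ by about 0.13.

```python
M = 2**31 - 1  # modulus of the Park-Miller "minimal standard" LCG
A = 16807      # its multiplier (7**5)


class MinStd:
    """Hypothetical illustration: x_{k+1} = A * x_k mod M, scaled to (0, 1)."""

    def __init__(self, seed: int, discard: int = 0) -> None:
        if not 0 < seed < M:
            raise ValueError("seed must be between 1 and M - 1")
        self.x = seed
        for _ in range(discard):  # throw away the first `discard` variates
            self.draw()

    def draw(self) -> float:
        self.x = (A * self.x) % M
        return self.x / M


# Seeds from two adjacent clock ticks.
u_a, u_b = MinStd(1000).draw(), MinStd(1001).draw()
print(abs(u_a - u_b))  # about 7.8e-06: the first variates are nearly equal

w_a, w_b = MinStd(1000, discard=1).draw(), MinStd(1001, discard=1).draw()
print(abs(w_a - w_b))  # about 0.13 after discarding one variate
```

Discarding does not make the two streams independent: for this generator the k-th states from seeds s and s+1 always differ by A^k modulo M, so a burn-in only hides the most visibly correlated early values.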

Back on 6 August, Richard Oliver gave a reference to the random-number algorithms SPSS uses: "The new random number generator is the Mersenne Twister, and the old one is now referred to as the 'SPSS 12.0 compatible' random number generator." Could you say how SPSS initializes it randomly, and what you now think about throwing away a few variates after seeding?

And thanks for correcting what I'd assumed about the menus.

-Cheers, Richard
