Date: Wed, 17 Nov 2004 09:47:38 -0800
Reply-To: Dale McLerran
Sender: "SAS(r) Discussion"
Comments: DomainKeys? See http://antispam.yahoo.com/domainkeys
From: Dale McLerran
Subject: Re: RE : Random selection
Comments: To: Sigurd Hermansen
In-Reply-To: <446DDE75CFC7E1438061462F85557B0F0613E7BF@remail2.westat.com>
Content-Type: text/plain; charset=us-ascii

Sig,

I am not David, but I will still put my foot in here.

--- Sigurd Hermansen wrote:

> David:
> I know that you enjoy playing with our minds. OK. So the distribution of
> ranuni(0)<150/totalobs
> is binomial (and I assume that you mean for totalobs>150). What does
> that mean?

The above sampling approach results in M observations selected, where M
is a random variable distributed BINOM(N, 150/N) for N=totalobs. The
random variable M has expectation N*P = 150 and variance
N*P*(1-P) = 150*(1-150/N), where P=150/N.

> For the constant 150, does that approximate a normal
> distribution?

For all practical purposes, that is almost certainly true. If we
perform the experiment ad infinitum, then we would observe that the
distribution is a little long in the tails, regardless of the value of
totalobs. However, the departure from a normal distribution will be
quite small, such that if we repeated the experiment only 30 times, we
would almost certainly not reject the hypothesis that the values were
normally distributed. One would observe, though, that the values are
all integers, which in and of itself is some indication that the random
variable M is not truly normally distributed.

> What happens to the distribution if you use 200/totalobs to
> generate approximately 200 observations and then truncate the output
> to exactly 150 obs?

That I don't want to contemplate. You certainly do not get any
standard probability sample.

> What number x/totalobs would guarantee that the
> process would generate at least 150 obs?

Oh, that is easy. If we take x=totalobs, we are guaranteed that we
will observe M>=150 (for totalobs>=150).
Otherwise, there is no guarantee of returning at least 150 observations.
We can have an extremely high probability of returning at least 150
observations, but M>=150 is guaranteed only when every observation is
selected with probability 1, that is, when we give up on drawing a
probability sample.

> What value does knowing that the
> number of
> rows generated by the program has a binomial distribution?

The value of this knowledge is that one recognizes immediately that
the number of observations returned by that sampling method is not
fixed at the desired sample size. If we sample more than 150
observations, we may go over budget in the subsequent use of those
samples. If we obtain fewer than 150 observations, then we may not
have the desired precision for some subsequent statistic which we
compute from the observed sample.

=====
---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
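
P.S. For anyone who wants to see the binomial behavior numerically,
here is a minimal simulation sketch. It is written in Python rather
than SAS, and it only mimics the selection rule ranuni(0)<150/totalobs
with a uniform draw per row. The function names
(bernoulli_sample_size, simulate_sizes), the population size of
10,000, the number of replications, and the seed are all assumptions
chosen for illustration; none of them come from the thread.

```python
import random
import statistics


def bernoulli_sample_size(total_obs, target, rng):
    """Return how many of total_obs rows are kept when each row is kept
    independently with probability target/total_obs (hypothetical helper
    mimicking the rule ranuni(0) < 150/totalobs)."""
    p = target / total_obs
    return sum(1 for _ in range(total_obs) if rng.random() < p)


def simulate_sizes(total_obs=10_000, target=150, replications=300, seed=12345):
    """Repeat the selection several times and collect the realized sample sizes M."""
    rng = random.Random(seed)  # fixed seed only so that reruns are reproducible
    return [bernoulli_sample_size(total_obs, target, rng)
            for _ in range(replications)]


if __name__ == "__main__":
    n, target = 10_000, 150  # assumed population size and desired sample size
    p = target / n
    sizes = simulate_sizes(n, target)

    # Theoretical moments of M ~ BINOM(N, P)
    print("theoretical mean     :", n * p)
    print("theoretical variance :", n * p * (1 - p))

    # Simulated counterparts; these vary with the seed and replication count
    print("simulated mean       :", statistics.mean(sizes))
    print("simulated variance   :", statistics.variance(sizes))

    # Share of runs that fall short of, or exceed, the desired size
    short = sum(s < target for s in sizes) / len(sizes)
    over = sum(s > target for s in sizes) / len(sizes)
    print("share with M < 150   :", short)
    print("share with M > 150   :", over)
```

Setting target equal to total_obs makes the selection probability 1, so
every row is kept. That is the x=totalobs case above, the only choice
that guarantees at least 150 rows.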
