Date: Wed, 17 Nov 2004 09:47:38 -0800
Reply-To: Dale McLerran <stringplayer_2@YAHOO.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Dale McLerran <stringplayer_2@YAHOO.COM>
Subject: Re: RE : Random selection
Content-Type: text/plain; charset=us-ascii
I am not David, but I will still put my foot in here.
--- Sigurd Hermansen <HERMANS1@WESTAT.COM> wrote:
> I know that you enjoy playing with our minds. OK. So the distribution
> is binomial (and I assume that you mean for totalobs>150). What does
> that mean?
The above sampling approach results in M observations selected,
where M is a random variable distributed BINOM(N, 150/N) for
N=totalobs. The random variable M has expectation 150 and
variance N*P*(1-P), where P=150/N.
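
The mean and variance above can be checked with a small simulation. What follows is only a sketch in Python rather than SAS, and it assumes the sampling approach under discussion keeps each record independently with probability 150/totalobs. The function names, the seed, and totalobs=1000 are made up for illustration.

```python
import random

def selected_count(totalobs, target=150, rng=random):
    """Count the records kept when each one is kept independently
    with probability target/totalobs (assumed sampling scheme)."""
    p = target / totalobs
    return sum(1 for _ in range(totalobs) if rng.random() < p)

def binomial_moments(totalobs, target=150):
    """Theoretical mean N*P and variance N*P*(1-P) of M, with P = target/N."""
    p = target / totalobs
    return totalobs * p, totalobs * p * (1 - p)

# Repeat the selection many times and compare with the theory.
rng = random.Random(2004)  # arbitrary seed, for repeatability only
counts = [selected_count(1000, rng=rng) for _ in range(500)]
sim_mean = sum(counts) / len(counts)
sim_var = sum((c - sim_mean) ** 2 for c in counts) / (len(counts) - 1)

print("theoretical mean, variance:", binomial_moments(1000))
print("simulated   mean, variance:", (sim_mean, sim_var))
```

The simulated counts scatter around 150 rather than landing on it every time, which is the point at issue in this thread.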
> For the constant 150, does that approximate a normal distribution?
For all practical purposes, that is almost certainly true. If
we perform the experiment ad infinitum, then we would observe
that the distribution is a little long in the tails, regardless
of the value of totalobs. However, the amount of departure from
a normal distribution will be quite small, such that if we
repeated the experiment only 30 times, we would almost
certainly not reject the hypothesis that the values were
normally distributed. One would observe, though, that the
values are all integers, which in and of itself is some
indication that the random variable M is not truly normally
distributed.
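
To put a number on how small the departure from normality is, one can compute the skewness and excess kurtosis of a binomial distribution from the standard closed forms. This is a sketch in Python, not SAS; the function name and the example values of totalobs are my own choices. A normal distribution has zero for both quantities.

```python
import math

def binomial_shape(totalobs, target=150):
    """Skewness and excess kurtosis of BINOM(N, P) with P = target/N.

    Standard closed forms:
      skewness        = (1 - 2P) / sqrt(N*P*(1-P))
      excess kurtosis = (1 - 6*P*(1-P)) / (N*P*(1-P))
    """
    p = target / totalobs
    q = 1.0 - p
    npq = totalobs * p * q
    skewness = (q - p) / math.sqrt(npq)
    excess_kurtosis = (1.0 - 6.0 * p * q) / npq
    return skewness, excess_kurtosis

for n in (1000, 10000, 1000000):
    skew, kurt = binomial_shape(n)
    print(n, round(skew, 4), round(kurt, 5))
```

Both values are close to zero for any totalobs well above 150. The excess kurtosis is positive (tails slightly longer than normal) whenever P*(1-P) < 1/6, which with P=150/totalobs holds once totalobs is about 710 or more.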
> What happens to the distribution if you use 200/totalobs to
> generate approximately 200 observations and then truncate the output
> to exactly 150 obs?
That I don't want to contemplate. You certainly do not get any
standard probability sample.
> What number x/totalobs would guarantee that the
> process would generate at least 150 obs?
Oh, that is easy. If we take x=totalobs, we are guaranteed that
we will observe M>=150 (for totalobs>=150). Otherwise, there
is no guarantee of returning at least 150 observations. We can
have an extremely high probability of returning at least 150
observations, but we have a guarantee of returning M>=150 only
when we do not allow a probability sample.
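
For anyone who wants to see how high "extremely high" gets, here is a sketch in Python (not SAS) that computes the exact binomial probability of getting at least 150 observations when each record is kept with probability x/totalobs. The function name and the example values are hypothetical; the terms are summed in log space through math.lgamma so that large totalobs does not overflow.

```python
import math

def prob_at_least(totalobs, x, need=150):
    """Exact P(M >= need) for M ~ BINOM(totalobs, x/totalobs)."""
    p = x / totalobs
    if p >= 1.0:
        # Every record is kept: M = totalobs with certainty.
        return 1.0 if totalobs >= need else 0.0
    if p <= 0.0:
        return 1.0 if need <= 0 else 0.0
    log_p, log_q = math.log(p), math.log1p(-p)

    def log_pmf(k):
        # log of C(N, k) * p^k * (1-p)^(N-k)
        return (math.lgamma(totalobs + 1) - math.lgamma(k + 1)
                - math.lgamma(totalobs - k + 1)
                + k * log_p + (totalobs - k) * log_q)

    # P(M < need) is the sum of the pmf over k = 0 .. need-1.
    below = sum(math.exp(log_pmf(k)) for k in range(min(need, totalobs + 1)))
    return max(0.0, 1.0 - below)

for x in (150, 175, 200, 250, 10000):
    print(x, prob_at_least(10000, x))
```

The probability gets closer and closer to 1 as x grows, but it is exactly 1 only at x=totalobs.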
> What value does knowing that the number of rows generated by the
> program has a binomial distribution have?
The value of this knowledge is that one recognizes immediately
that the number of observations returned by that sampling
method is not fixed at the desired sample size. If we sample
more than 150 observations, we may go over budget in the
subsequent use of those samples. If we do not obtain 150
observations, then we may not have the desired precision for
some subsequent statistic which we compute from the observed
sample.
Fred Hutchinson Cancer Research Center
Ph: (206) 667-2926
Fax: (206) 667-5977