**Date:** Sat, 9 Dec 2006 15:30:17 -0500
**Reply-To:** Richard Ristow <wrristow@mindspring.com>
**Sender:** "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
**From:** Richard Ristow <wrristow@mindspring.com>
**Subject:** Re: guessing mean of bounded variable with 1:30 sampling ratio
**In-Reply-To:** <7.0.0.16.2.20061201115818.021fb5e0@unibo.it>
**Content-Type:** text/plain; charset="us-ascii"; format=flowed
At 06:11 AM 12/8/2006, Nicola Baldini asked:

>I have a population of N=12000. I want to know the mean (and possibly
>the standard deviation) of a variable x, bounded between 1 and 7. I
>took a (let's suppose random) sample of n=400 and estimated mean =
>3.14 (standard error = .15) and standard deviation = 2.28. Can I trust
>such estimates?

To which, at 10:41 AM 12/9/2006, Stephen Brand replied:

>You have the Central Limit Theorem working for you here. Even though
>the distribution of individual cases is not normal, the distribution of
>sample means (with a sample size of 400) will approximate the normal
>distribution and should provide you with a reasonable estimate of the
>population mean and the standard error of the means of samples of 400
>cases.

To which I'll add, the Central Limit Theorem has an important ally
here. Because your variable is bounded, so are its population mean and
standard deviation (1<=mean<=7; 0<=SD<=3, if my arithmetic's right*), so
convergence should be rapid, plenty good enough with n=400. THAT's not
your problem.
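
As a rough check on that arithmetic, here is a minimal Python sketch. It assumes simple random sampling without replacement and uses only the figures quoted above (N=12000, n=400, SD=2.28, range 1 to 7); the variable names are illustrative. For a variable confined to an interval, the SD is largest when half the cases sit at each endpoint, which gives (7-1)/2 = 3.

```python
import math

lo, hi = 1.0, 7.0      # bounds of x, from the question
N, n = 12000, 400      # population and sample sizes, from the question
sample_sd = 2.28       # sample standard deviation, from the question

# Largest possible SD on [lo, hi]: half the cases at each endpoint.
max_sd = (hi - lo) / 2

# Plug-in standard error of the mean, assuming a simple random sample.
se_srs = sample_sd / math.sqrt(n)

# Finite population correction, since 400 of 12000 is not negligible.
fpc = math.sqrt((N - n) / (N - 1))
se_fpc = se_srs * fpc

# Worst case: the standard error if the SD were at its upper bound.
worst_case_se = max_sd / math.sqrt(n)

print(f"max SD            = {max_sd:.2f}")
print(f"SE (plug-in)      = {se_srs:.3f}")
print(f"SE (with FPC)     = {se_fpc:.3f}")
print(f"SE (worst case)   = {worst_case_se:.3f}")
```

Under these assumptions the plug-in standard error comes to about 0.11, and even the worst case for a 1-to-7 variable is 3/sqrt(400) = 0.15. None of this says anything about whether the sample is random; it only shows that sample size is not the weak point.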

Here's your problem: "I took a (let's suppose random) sample of n=400."
Nope; no supposing. The arguments using the Law of Large Numbers and
Central Limit Theorem only apply if the sample is random. You need to
have a decent argument that you have a random sample, or at least that
whether a case lands in your sample is independent of the variable x.

You wave a big red flag: "I need to state formally that, despite a
ridiculous response rate, my research is not that bad." 'Response
rate': your 400 are respondents to a survey? How many did you survey -
all 12,000? If so, there's no practical chance that the 3% who responded
are a random sample of the population, not even approximately.
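
To make the concern concrete, here is a small simulation sketch in Python. Everything in it is hypothetical: the population weights and the response mechanism (`response_prob`, where cases with higher x are more likely to answer) are invented for illustration and are not the poster's data. It compares a true random sample of 400 with roughly 400 self-selected respondents out of 12,000 surveyed.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

N, n = 12000, 400

# Hypothetical population of integer scores 1..7 (weights are made up).
values = [1, 2, 3, 4, 5, 6, 7]
weights = [30, 15, 10, 10, 10, 10, 15]
population = random.choices(values, weights=weights, k=N)
true_mean = sum(population) / N

# (a) A genuine simple random sample of n cases.
srs = random.sample(population, n)
srs_mean = sum(srs) / n

# (b) Survey everyone, but let the chance of responding depend on x.
#     Assumed mechanism: 1% at x=1 rising linearly to 6.7% at x=7,
#     which yields an overall response rate near 3%.
def response_prob(x):
    return 0.01 + 0.0095 * (x - 1)

respondents = [x for x in population if random.random() < response_prob(x)]
resp_mean = sum(respondents) / len(respondents)

print(f"population mean      = {true_mean:.2f}")
print(f"random-sample mean   = {srs_mean:.2f}  (n = {n})")
print(f"respondent mean      = {resp_mean:.2f}  (n = {len(respondents)})")
```

In this made-up setup the random sample should land close to the population mean, while the respondents' mean is pulled well above it, even though both groups are about the same size. The standard error computed from the respondents would look just as reassuring in either case, which is why it cannot vouch for the estimate.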

If you sampled a fraction, selected randomly, and had a higher response
rate within that fraction, you may have a good argument. Otherwise, I'm
afraid it's not likely.