Date: Mon, 14 May 2007 22:43:12 -0700
Reply-To: David L Cassell <davidlcassell@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: David L Cassell <davidlcassell@MSN.COM>
Subject: Re: When you have the population
>I have a question for the stat geeks. When one has data on the whole
>population, say more than one million records, is it OK to run logistic
>regression or hierarchical modeling on all of it without sampling? (Well,
>even if you don't think it is the population as statisticians would say,
>assume it is.) Aggregating the records would lose much information
>at the individual level. Is random sampling the way to go even when the
>population data are there? Could you point me to some literature that
>deals with very large databases / populations? People at the Census
>Bureau must have something to say. *.:.*
Sig has covered most of the major points. You may recognize one
of the crotchety old statisticians he quoted. :-)
If you have all the observations from your target population, then
your estimates of population quantities will have zero sampling
variance. By definition. The FPCF (Finite Population Correction Factor),
the (1 - n/N) multiplier on the variance, is zero when n = N, so it
scales the variances to zero for you. So there is no point in bothering
with hypothesis tests, or confidence intervals, or . . . .
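To make the FPCF point concrete, here is a minimal sketch, assuming simple random sampling without replacement and the textbook estimator Var(ybar) = (1 - n/N) * s^2 / n. The function name srs_mean_variance and the simulated population are hypothetical, for illustration only; this is not SAS's implementation.

```python
import random
import statistics


def srs_mean_variance(sample, population_size):
    """Estimated variance of the sample mean under simple random sampling
    without replacement, including the finite population correction."""
    n = len(sample)
    N = population_size
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    if n < 2:
        raise ValueError("need at least two observations to estimate s^2")
    s2 = statistics.variance(sample)  # sample variance, n - 1 divisor
    fpc = 1 - n / N                   # finite population correction factor
    return fpc * s2 / n


# Hypothetical population of 10,000 values.
random.seed(1)
population = [random.gauss(50, 10) for _ in range(10_000)]
N = len(population)

# As n approaches N the correction shrinks the variance; at n == N it is zero.
for n in (100, 1_000, 9_000, N):
    sample = random.sample(population, n)
    print(n, srs_mean_variance(sample, N))
```

With the full population in hand (n == N) the correction factor is exactly 0, so the estimated variance is 0 no matter how spread out the data are.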
So what is the purpose of the project? What are your data sources,
and what scope do you want for your estimates/predictions?
BTW, nobody gets *all* of their target population. Even the U.S.
Census has an undercount. And the reasons why the data are not
recorded are often directly related to the variables people wanted to
study.
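A small simulation shows why that matters even with a huge N. This is a sketch under made-up assumptions: a hypothetical lognormal "income" population and an assumed nonresponse rule in which larger values are less likely to be recorded (missing not at random). Neither the numbers nor the mechanism come from any real census.

```python
import random
import statistics

random.seed(2007)

# Hypothetical population: one "income" value per person.
N = 200_000
population = [random.lognormvariate(10, 0.8) for _ in range(N)]
true_mean = statistics.fmean(population)


def responds(value):
    """Assumed nonresponse mechanism: high values are recorded less often."""
    return random.random() < (0.95 if value < 40_000 else 0.60)


# The "census" we actually get: most records, but not a random subset.
observed = [v for v in population if responds(v)]
observed_mean = statistics.fmean(observed)

print(f"records observed: {len(observed):,} of {N:,}")
print(f"true mean:        {true_mean:,.0f}")
print(f"observed mean:    {observed_mean:,.0f}")
```

Even though most of the population is observed, the observed mean is biased low, and collecting more records under the same mechanism would not remove that bias.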
David L. Cassell
3115 NW Norwood Pl.
Corvallis OR 97330