LISTSERV at the University of Georgia
Date:         Mon, 14 May 2007 22:43:12 -0700
Reply-To:     David L Cassell <davidlcassell@MSN.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         David L Cassell <davidlcassell@MSN.COM>
Subject:      Re: When you have the population
In-Reply-To:  <d096b5010705141334q3c1a46deq8305088e82ec52b2@mail.gmail.com>
Content-Type: text/plain; format=flowed

imamx8@GMAIL.COM wrote:
>
>Dear all,
>
>I got a question to ask the stat geeks. When one has a data of the
>population, say more than one million records, logistic regression
>analysis or hierarchical modeling without sampling is OK? (Well, even
>if you don't think it is the population as statisticians would say,
>assume it is.) Aggregation of the records would loose many information
>at individual level. Is random sampling the way to go even when the
>population data is there? Could you link to some literature that deals
>with very large database / population? People at census must have
>something to say. *.:.*
>
>Thanks,
>Imam

Sig has covered most of the major points. You may recognize one of the crotchety old statisticians he quoted. :-)

If you have all the observations from your target population, then your estimates will have zero sampling variance. By definition. The FPCF (Finite Population Correction Factor) will scale the variances to zero for you. So there will be no point in bothering with hypothesis tests, or confidence intervals, or . . . .
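
To make the FPCF point concrete, here is a rough sketch in Python (not SAS, and not from the original thread). It assumes simple random sampling without replacement, where the estimated variance of the sample mean is (1 - n/N) * s^2 / n. The function name and the numbers (N of one million, s^2 of 4.0) are made up for illustration only.

```python
def srs_mean_variance(sample_variance, n, N):
    """Estimated variance of a sample mean under simple random sampling
    without replacement: (1 - n/N) * s^2 / n.

    sample_variance (s^2), n (sample size) and N (population size) are
    illustrative inputs, not values from any real data set.
    """
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    fpc = 1 - n / N  # finite population correction factor
    return fpc * sample_variance / n


if __name__ == "__main__":
    N = 1_000_000   # hypothetical population size
    s2 = 4.0        # hypothetical sample variance
    for n in (1_000, 100_000, 500_000, 1_000_000):
        var = srs_mean_variance(s2, n, N)
        print(f"n = {n:>9,}  fpc = {1 - n / N:.3f}  var(mean) = {var:.3e}")
```

When n equals N the correction factor is exactly zero, so the variance of the estimate is zero no matter what s^2 is. That only covers sampling error; it says nothing about undercount or measurement problems.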

So what is the purpose of the project? What are your data sources, and what scope do you want for your estimates/predictions?

BTW, nobody gets *all* of their target population. Even the U.S. Census has an undercount. And the reasons why the data are not recorded are often directly related to the variables people wanted to collect...

HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
