LISTSERV at the University of Georgia
Date:         Tue, 15 Jan 2008 13:58:28 -0500
Reply-To:     susie.li@BOEHRINGER-INGELHEIM.COM
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Susie C Y Li <susie.li@BOEHRINGER-INGELHEIM.COM>
Subject:      Re: Question about the relationship between statistical
              Population, Sample, training set, validation set and test set.
In-Reply-To:  A<CA8F89971ADA9F47A6C915BA2397844207B423C1@MAILBE2.westat.com>
Content-Type: text/plain; charset="us-ascii"

I have a somewhat simplistic, yet practical answer to your question.

Say I equate SECONDARY DATA with data on 100% of the population, and PRIMARY DATA with data from sample surveys (some % of the population). Then my choices are:

Secondary/observational data sources
  Pro - The entire universe of data (100% sample, or census) is available for analysis, so there is no sampling error.
  Con - Can answer only "What" and "Who" types of questions. For "Why" or "How" types of questions, you have to build a model to estimate or guess at the answers, which introduces estimation error.

Primary/survey data sources
  Pro - Can ask "How" and "Why" types of questions directly.
  Con - Surveys are expensive to conduct. Survey sample sizes tend to be small, resulting in sampling errors and biases (e.g., non-response, untruthful answers, biased samples). Adjusting the results for these is usually complicated.
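To make the sampling-error point concrete, here is a minimal Python sketch (not SAS, and not from the original thread). It assumes a made-up population of 10,000 normally distributed values and compares the census mean with the means of repeated simple random samples of 100. The function name sampling_demo and every parameter value are hypothetical, chosen only for illustration.

```python
import random
import statistics


def sampling_demo(pop_size=10_000, sample_size=100, n_samples=200, seed=0):
    """Compare a census mean with the means of repeated random samples.

    All values are illustrative assumptions, not real data.
    """
    rng = random.Random(seed)

    # Hypothetical "population": every unit's value is known (the census case).
    population = [rng.gauss(50, 10) for _ in range(pop_size)]
    census_mean = statistics.fmean(population)

    # Survey case: each survey sees only sample_size units, drawn without
    # replacement, so its mean differs from the census mean by sampling error.
    sample_means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]

    # Spread of the sample means around each other = empirical standard error.
    spread = statistics.stdev(sample_means)
    return census_mean, sample_means, spread


census_mean, sample_means, spread = sampling_demo()
print(f"census mean:           {census_mean:.2f}")
print(f"sample means range:    {min(sample_means):.2f} to {max(sample_means):.2f}")
print(f"std. dev. of the means: {spread:.2f}")
```

The census mean is a single fixed number, while the sample means scatter around it. The sketch covers sampling error only; it does not model non-response or untruthful answers, which would add bias on top of that scatter.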

Susie CY Li

-----Original Message-----
From: Minze Su [mailto:slhappyls@GMAIL.COM]
Sent: Tuesday, January 15, 2008 7:53 AM
Subject: Re: Question about the relationship between statistical Population, Sample, training set, validation set and test set.

Thank you!

You said that sometimes we observe the outcome for the whole population. So I want to add a question: if we have the outcomes for every observation in the population, what is the goal of building a predictive model? What else is our model building for, since we have no observations left to predict?

