Date: Tue, 15 Jan 2008 13:58:28 -0500
Reply-To: susie.li@BOEHRINGER-INGELHEIM.COM
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Susie C Y Li <susie.li@BOEHRINGER-INGELHEIM.COM>
Subject: Re: Question about the relationship between statistical
Population, Sample, training set, validation set and test set.
In-Reply-To: A<CA8F89971ADA9F47A6C915BA2397844207B423C1@MAILBE2.westat.com>
Content-Type: text/plain; charset="us-ascii"
I have a somewhat simplistic, yet practical answer to your question.
Say I equate SECONDARY DATA with data on 100% of the population, and PRIMARY
DATA with data from sample surveys (some % of the population). Then the
trade-offs are:
Secondary/observational data sources -
Pro - The entire universe of data (100% sample, or census) is
available for analysis, therefore no sampling error or sampling bias.
Con - Can answer only "What" and "Who" type of questions. For "Why"
or "How" type of questions, you'll have to build a model to
estimate or guess at the answers, which introduces errors in
estimation.
Primary/survey data sources -
Pro - Can ask "How" and "Why" types of questions directly.
Con - Expensive to conduct surveys. A survey sample size tends to be
small, resulting in sampling errors and biases (e.g., non-responses,
lies, biased samples). Adjusting the results for these is usually
complicated.
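To make the sampling-error point concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the population is synthetic, and the function names (make_population, census_mean, survey_means) are made up. It shows only the sampling-error part of the trade-off, not non-response or lying: a census gives one exact mean, while repeated small surveys scatter around it, and larger surveys scatter less.

```python
import random
import statistics

def make_population(size=10_000, seed=0):
    """Synthetic population values (hypothetical data, for illustration only)."""
    rng = random.Random(seed)
    return [rng.gauss(50.0, 10.0) for _ in range(size)]

def census_mean(population):
    """Mean over 100% of the population: no sampling error by construction."""
    return statistics.fmean(population)

def survey_means(population, sample_size, n_surveys=200, seed=1):
    """Means of repeated simple random samples drawn without replacement."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_surveys)
    ]

population = make_population()
small = survey_means(population, sample_size=25)
large = survey_means(population, sample_size=2500)

print("census mean:", round(census_mean(population), 2))
print("spread of survey means, n=25:  ", round(statistics.pstdev(small), 3))
print("spread of survey means, n=2500:", round(statistics.pstdev(large), 3))
```

The spread of the survey means is the sampling error; the census has none, but it can still only describe what was recorded.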
Susie CY Li
-----Original Message-----
From: Minze Su [mailto:slhappyls@GMAIL.COM]
Sent: Tuesday, January 15, 2008 7:53 AM
Subject: Re: Question about the relationship between statistical
Population, Sample, training set, validation set and test set.
Thank you!
You said that sometimes we observe the outcome for the whole population.
Then I want to add a question: if we have the outcome for every
observation in the population, what is the goal of building a predictive
model? What else is our model building for, since we don't have any
observations left to predict?