Date: Sat, 9 May 2009 07:02:28 -0700
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
Subject: Oversampling Questions. Please help
1. When selecting observations from the good data, how many
observations is ideal? In other words, what is the best good vs. bad
ratio after oversampling?
2. I always deal with very rare event projects. I am always concerned
about whether my random sample from the good data represents the whole
population well (let's say 3k out of 500k).
Is there a way to select a sample that has characteristics similar to
those of the whole population, for example a similar mean and a
similar variance?
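
To make question 2 concrete, here is a minimal sketch of one way to
check a drawn sample against the population on a single numeric
variable. It is in Python rather than SAS, the data are synthetic, and
the names (goods, compare_sample, std_mean_diff, var_ratio) are
hypothetical. It only measures how close a simple random sample is; it
does not by itself force the sample to match.

```python
import random

def mean_var(values):
    """Return (mean, sample variance) of a sequence of numbers."""
    n = len(values)
    m = sum(values) / n
    v = sum((x - m) ** 2 for x in values) / (n - 1)
    return m, v

def compare_sample(population, sample):
    """Standardized mean difference and variance ratio, sample vs population."""
    pm, pv = mean_var(population)
    sm, sv = mean_var(sample)
    return {"std_mean_diff": (sm - pm) / pv ** 0.5, "var_ratio": sv / pv}

rng = random.Random(2009)

# Synthetic stand-in for one numeric variable on the 500k good records
# (assumption: the real data would be read from the modeling table).
goods = [rng.gauss(50.0, 10.0) for _ in range(500_000)]

# Simple random sample of 3k goods, drawn without replacement.
sample = rng.sample(goods, 3_000)

report = compare_sample(goods, sample)
print(report)
```

A std_mean_diff near 0 and a var_ratio near 1 suggest the sample tracks
the population on that variable; the same check could be repeated for
each input of interest.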
3. Currently, I deal with the issue above by bootstrapping the data to
get n samples, running the regression/decision tree n times, and
ensembling the n models in EM. But the results are not as expected. I
am wondering whether anyone has had the same experience and can help
me out.
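
For reference, the bootstrap-and-ensemble idea in question 3 can be
sketched roughly as below. This is Python rather than EM, and it rests
on several assumptions that are not in the post: all bads are kept in
every run, only the goods are resampled with replacement, and a toy
one-variable cut-off model stands in for the regression/decision tree.
All names are hypothetical.

```python
import random

def fit_cutoff(goods, bads):
    """Toy one-variable model: cut-off halfway between the two class means."""
    return (sum(goods) / len(goods) + sum(bads) / len(bads)) / 2.0

def bagged_cutoffs(goods, bads, n_models, n_goods, rng):
    """Fit n_models toy models, each on all bads plus a bootstrap draw of goods."""
    cutoffs = []
    for _ in range(n_models):
        draw = rng.choices(goods, k=n_goods)  # sampling with replacement
        cutoffs.append(fit_cutoff(draw, bads))
    return cutoffs

def ensemble_score(x, cutoffs):
    """Fraction of models that flag x as bad (bads assumed to score higher)."""
    return sum(x > c for c in cutoffs) / len(cutoffs)

rng = random.Random(42)

# Synthetic rare-event data: many goods, few bads (assumed distributions).
goods = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
bads = [rng.gauss(2.0, 1.0) for _ in range(300)]

cutoffs = bagged_cutoffs(goods, bads, n_models=25, n_goods=3_000, rng=rng)
print(min(cutoffs), max(cutoffs))
print(ensemble_score(3.0, cutoffs), ensemble_score(-1.0, cutoffs))
```

The spread between the smallest and largest cut-off gives a rough sense
of how much the fitted model depends on which goods were drawn.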
Can any expert help?
Thanks in advance.