Date: Mon, 17 Oct 2005 17:07:08 -0400
Reply-To: Talbot Michael Katz <topkatz@MSN.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Talbot Michael Katz <topkatz@MSN.COM>
Subject: Re: PROC SURVEYSELECT with weights--How to do
Hi.
Well, since I got you into this mess...
* separate the responders and non-responders, if not already done (this may
be faster than sorting);
DATA view_resp_order / VIEW = view_resp_order;
SET InputDS (where = (response = 0)) InputDS (where = (response = 1));
RUN;
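* generate 3:1 random sample by response type, with sample sizes listed in
stratum order (3000 non-responders with response = 0, then 1000 responders
with response = 1);
PROC SURVEYSELECT DATA = view_resp_order OUT = mod_samp
  SAMPSIZE = (3000 1000) METHOD = SRS;
  STRATA response;
RUN;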
* logistic model weighted by the SamplingWeight variable that SURVEYSELECT
generates automatically;
PROC LOGISTIC DATA = mod_samp OUTEST = mod_samp_outest;
  MODEL response = &predictor_list. / SELECTION = S;
  WEIGHT SamplingWeight;
RUN;
You may have a lot of other options you want to set in PROC LOGISTIC, and
you may need to run it with several different scenarios, but this
demonstrates the use of the SamplingWeight variable that PROC SURVEYSELECT
generates for you. By the way, I WAS JUST KIDDING about "SELECTION = S"!
That is officially prohibited on SAS-L.
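
If you want to convince yourself that the weights undo the oversampling, here is a quick check. This is only a sketch: it assumes mod_samp was created exactly as above, and that InputDS holds just the 537,881 mailed records with 1,199 responders (your made-up numbers). Under that assumption the weights should come out near 536,682 / 3,000 = 178.9 for non-responders and 1,199 / 1,000 = 1.199 for responders.

```sas
* inspect the weight SURVEYSELECT assigned within each stratum;
PROC MEANS DATA = mod_samp N MIN MAX;
  CLASS response;
  VAR SamplingWeight;
RUN;

* unweighted, the sample is 25% responders by construction;
PROC FREQ DATA = mod_samp;
  TABLES response;
RUN;

* weighted, the percentages should return to the population response rate;
PROC FREQ DATA = mod_samp;
  TABLES response;
  WEIGHT SamplingWeight;
RUN;
```

One more thing to watch: by default PROC LOGISTIC models the lower ordered value of the response (0 here). If you want the model to predict responders, write the MODEL statement as MODEL response (EVENT = '1') = ... instead.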
Good luck!
-- TMK --
"The Macro Klutz"
On Mon, 17 Oct 2005 15:15:06 -0500, Nick . <ni14@MAIL.COM> wrote:
>Hello,
>
>This question pertains to bank-related direct marketing campaigns. We
>sent out a campaign and we wait for the results: RESPONDERS and
>NONRESPONDERS.
>
>I would like to learn how to do the following. Here is an example so
>I can understand. The numbers are made up but realistic enough.
>
>We sent out a campaign randomly to 537,881 prospects. [Side Note: We also
>put aside a control group of 31,982 prospects who will not be
>campaigned to; we will record those who respond on their own.]
>
>After a few months the results come back and out of the 537,881 prospects
>only 1,199 respond. Hence a response rate of about 0.23%. The control
>group produces 68 responders, which is a 0.24% response.
>
>We see that the campaign materials (creative, offer/product extended,
>etc.) didn’t do much of anything compared to the control.
>
>I want to use the 537,881 records above along with the 1,199 RESPONDERS
>to build a predictive model to be used in the next campaign of the same
>product, creative, etc.
>
>Here is the idea per (-- TMK -- "The Macro Klutz" ):
>
>He suggests that instead of building the model with such a low response
>rate of 0.23%, why not use PROC SURVEYSELECT and tell this
>procedure to turn the 0.23% into, say, 20%. This procedure will then
>output the appropriate weights to be used in the modeling process.
>
>Can someone please show me how you tell PROC SURVEYSELECT to do this?
>Also, using PROC LOGISTIC, how are the weights from PROC SURVEYSELECT
>used in PROC LOGISTIC? Are they used as a class variable? I do not know
>what to do with the weights. I do know that the weights must be used
>correctly so that the true RESPONDER rate is not 20% but the real one,
>which is 0.23%. That’s why I need to be careful with the weights to
>make sure the model coefficients don’t reflect a 20% response but
>rather a 0.23% response.
>
>Thanks.
>
>NICK