Date: Wed, 25 Jan 2006 11:21:07 -0500
Reply-To: Pavlo Row <pavlo@INORBIT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Pavlo Row <pavlo@INORBIT.COM>
Subject: Re: SAS Data Step as prelude to logistic modeling
Content-Type: text/plain; charset="iso-8859-1"
Thank you so much Jiann-Shiun for your code to my question. You'rwe very kind.
Pavlo
>>>
Pavlo:
The following will give you what you want and the output is at the
end. A macro variable FCount is defined as the number of fields that
you intend to calculate DIFF and AVG. In the example given, it is 3.
.
.
.
>>>
>>> Pavlo Row <pavlo@INORBIT.COM> 1/20/2006 2:38:43 PM >>>
Hello All,
Please copy and paste the following example data set. In this example
data set I have a MONTH field (MONTH takes on only APRIL & MAY 2005
values), a unique identifier, ID, TARGET variable (takes on NO/YES or
0/1 representing customer response to MAY 2005 campaign). Next I show
only three fields out of very many fields like 400 fields or so. I chose
three representative fields to show here: # of credit cards customer had
in APRIL and MAY 2005, customer bank balance in APRIL and MAY 2005, and
the # of months the customer had lived in his house in APRIL & MAY 2005.
Again, in the real data set I have many such fields.
data foo;
input MONTH $ ID TARGET NUM_CREDIT_CARDS BALANCE MONTHS_RESIDENCE;
cards;
APR05 1 . 1 57103 24
MAY05 1 0 2 1516 25
APR05 2 . 2 2468 92
MAY05 2 0 1 309 93
APR05 3 . 1 7672 74
MAY05 3 0 0 0 .
APR05 4 . 1 53073 127
MAY05 4 1 1 6379 128
APR05 5 . 4 24894 36
MAY05 5 0 0 9859 37
APR05 6 . 0 12 164
MAY05 6 0 1 1699 165
APR05 7 . 2 30248 24
MAY05 7 1 3 1625 25
APR05 8 . 2 45516 345
MAY05 8 1 1 591 346
APR05 9 . 2 5391 216
MAY05 9 0 3 6262 217
APR05 10 . 2 4857 252
MAY05 10 1 0 14457 .
;
run;
I use the above example data set to compute additional fields which I
call DIFF1 & AVG1, DIFF2 & AVG2, DIFF3 & AVG3, ..., DIFF400 & AVG400
(400 since I have 400 such numeric fields)
If you run the following code it will become clear how I come up with
these fields. For DIFF1, for example, I just take the difference between
MAY and APRIL and I assign the answer to MAY becasue that's where I have
the TARGET variable populated. Later on, once I compute DIFF1 & AVG1,
etc., I will delete APRIL and I will be left with only MAY. Then I can
build a logistic model based on DIFF1 & AVG1, etc.
data TEST(drop=TEMP1-TEMP3);
set foo;
by ID;
if last.ID then temp1=NUM_CREDIT_CARDS;
DIFF1=temp1-lag(NUM_CREDIT_CARDS);
AVG1=(temp1+lag(NUM_CREDIT_CARDS))/2;
if last.ID then temp2=BALANCE;
DIFF2=temp2-lag(BALANCE);
AVG2=(temp2+lag(BALANCE))/2;
if last.ID then temp3=MONTHS_RESIDENCE;
DIFF3=temp3-lag(MONTHS_RESIDENCE);
AVG3=(temp3+lag(MONTHS_RESIDENCE))/2;
run;
This is my question: How can I automate the above step? I mean, can you
imagine sitting here typing the above lines 400 times. That would be 400
times for
DIFF1=temp1-lag(NUM_CREDIT_CARDS);
AVG1=(temp1+lag(NUM_CREDIT_CARDS))/2;
400 times for
DIFF2=temp2-lag(BALANCE);
AVG2=(temp2+lag(BALANCE))/2;
.
.
.
400 times for
DIFF400=temp400-lag(BALANCE);
AVG400=(temp400+lag(BALANCE))/2;
Again, the answer fields
DIFF1=temp1-lag(BALANCE);
AVG1=(temp1+lag(BALANCE))/2;
and so on will be used in logistic modeling.
Thanks.
P.
--
___________________________________________________
Play 100s of games for FREE! http://games.mail.com/