LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (January 2006, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:   Wed, 25 Jan 2006 11:21:07 -0500
Reply-To:   Pavlo Row <pavlo@INORBIT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Pavlo Row <pavlo@INORBIT.COM>
Subject:   Re: SAS Data Step as prelude to logistic modeling
Content-Type:   text/plain; charset="iso-8859-1"

Thank you so much Jiann-Shiun for your code to my question. You'rwe very kind. Pavlo

>>> Pavlo:

The following will give you what you want and the output is at the end. A macro variable FCount is defined as the number of fields that you intend to calculate DIFF and AVG. In the example given, it is 3. . . . >>>

>>> Pavlo Row <pavlo@INORBIT.COM> 1/20/2006 2:38:43 PM >>> Hello All,

Please copy and paste the following example data set. In this example data set I have a MONTH field (MONTH takes on only APRIL & MAY 2005 values), a unique identifier, ID, TARGET variable (takes on NO/YES or 0/1 representing customer response to MAY 2005 campaign). Next I show only three fields out of very many fields like 400 fields or so. I chose three representative fields to show here: # of credit cards customer had in APRIL and MAY 2005, customer bank balance in APRIL and MAY 2005, and the # of months the customer had lived in his house in APRIL & MAY 2005. Again, in the real data set I have many such fields.

data foo; input MONTH $ ID TARGET NUM_CREDIT_CARDS BALANCE MONTHS_RESIDENCE; cards; APR05 1 . 1 57103 24 MAY05 1 0 2 1516 25 APR05 2 . 2 2468 92 MAY05 2 0 1 309 93 APR05 3 . 1 7672 74 MAY05 3 0 0 0 . APR05 4 . 1 53073 127 MAY05 4 1 1 6379 128 APR05 5 . 4 24894 36 MAY05 5 0 0 9859 37 APR05 6 . 0 12 164 MAY05 6 0 1 1699 165 APR05 7 . 2 30248 24 MAY05 7 1 3 1625 25 APR05 8 . 2 45516 345 MAY05 8 1 1 591 346 APR05 9 . 2 5391 216 MAY05 9 0 3 6262 217 APR05 10 . 2 4857 252 MAY05 10 1 0 14457 . ; run;

I use the above example data set to compute additional fields which I call DIFF1 & AVG1, DIFF2 & AVG2, DIFF3 & AVG3, ..., DIFF400 & AVG400 (400 since I have 400 such numeric fields)

If you run the following code it will become clear how I come up with these fields. For DIFF1, for example, I just take the difference between MAY and APRIL and I assign the answer to MAY becasue that's where I have the TARGET variable populated. Later on, once I compute DIFF1 & AVG1, etc., I will delete APRIL and I will be left with only MAY. Then I can build a logistic model based on DIFF1 & AVG1, etc.

data TEST(drop=TEMP1-TEMP3); set foo; by ID; if last.ID then temp1=NUM_CREDIT_CARDS;

DIFF1=temp1-lag(NUM_CREDIT_CARDS);

AVG1=(temp1+lag(NUM_CREDIT_CARDS))/2; if last.ID then temp2=BALANCE; DIFF2=temp2-lag(BALANCE); AVG2=(temp2+lag(BALANCE))/2; if last.ID then temp3=MONTHS_RESIDENCE;

DIFF3=temp3-lag(MONTHS_RESIDENCE);

AVG3=(temp3+lag(MONTHS_RESIDENCE))/2; run;

This is my question: How can I automate the above step? I mean, can you imagine sitting here typing the above lines 400 times. That would be 400 times for DIFF1=temp1-lag(NUM_CREDIT_CARDS); AVG1=(temp1+lag(NUM_CREDIT_CARDS))/2; 400 times for DIFF2=temp2-lag(BALANCE); AVG2=(temp2+lag(BALANCE))/2; . . .

400 times for DIFF400=temp400-lag(BALANCE); AVG400=(temp400+lag(BALANCE))/2;

Again, the answer fields DIFF1=temp1-lag(BALANCE); AVG1=(temp1+lag(BALANCE))/2; and so on will be used in logistic modeling.

Thanks.

P.

-- ___________________________________________________ Play 100s of games for FREE! http://games.mail.com/


Back to: Top of message | Previous page | Main SAS-L page