Date: Wed, 25 Jan 2006 12:09:38 -0600
Reply-To: Jiann-Shiun Huang <Jiann-Shiun.Huang@AMERUS.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jiann-Shiun Huang <Jiann-Shiun.Huang@AMERUS.COM>
Subject: Re: SAS Data Step as prelude to logistic modeling
Content-Type: text/plain; charset=ISO-8859-1
Pavlo:
I like the idea about automation. It will be a good exercise for me to use dictionary table in PROC SQL for the extension after work.
J S Huang
1-515-557-3987
fax 1-515-557-2422
>>> Pavlo Row <pavlo@INORBIT.COM> 1/25/2006 10:58:10 AM >>>
Hi and thanks for the idea Gerhard. Another responder to this question, Jiann-Shiun (below) showed me this way with a macro like you say here. One idea that has also crossed my mind is if you or Jiann-Shiun or someone else could actually advance the code a step higher in terms of automation: Instead of listing all fields in an array (followed by a loop, of course), would it be possible to grab the field name from the dataset then do the computations DIFF1 & AVG1. When done with this one, then grab the next field name from the data set and compute DIFF2 & AVG2, and so on.
This would save additional steps of actually typing (well not actually typing but just a little work needs to be done) the 400 or 1,000 or how ever many field names in the array.
P.
>>>>>>
From: "Gerhard Hellriegel" <ghellrieg@T-ONLINE.DE> Save Address Block Sender This Is Spam
To: SAS-L@LISTSERV.UGA.EDU
CC:
Subject: Re: SAS Data Step as prelude to logistic modeling
Date: Wed, 25 Jan 2006 11:35:55 -0500
Show Full Headers Back To [INBOX]
That seems to be a thing for a macro.
I assume, you have already a dataet with the numeric variables you want to
use in your calculation.
I assume also, you know about the exact number (you can also make that
dynamic, but it's easier if you don't need to...)
So the problem is, how I understand that, to repeat things like
DIFF1=temp1-lag(NUM_CREDIT_CARDS);
AVG1=(temp1+lag(NUM_CREDIT_CARDS))/2
and others 400 times.
Make a macro:
%macro x;
data ...
set ....
if....
...
%do i=1 %to 400;
DIFF&i=temp&i-lag(NUM_CREDIT_CARDS);
AVG&i=(temp&i+lag(NUM_CREDIT_CARDS))/2;
%end;
...
run;
%mend; %x;
That simply does the repeating and numbering of the variables for you.
>>>>>>
On Wed, 25 Jan 2006 11:21:07 -0500, Pavlo Row <pavlo@INORBIT.COM> wrote:
> Thank you so much Jiann-Shiun for your code to my question. You'rwe
> very kind.
> Pavlo
>
>>>>
> Pavlo:
>
> The following will give you what you want and the output is at the
> end. A macro variable FCount is defined as the number of fields that
> you intend to calculate DIFF and AVG. In the example given, it is 3.
> .
> .
> .
>>>>
>
>>>> Pavlo Row <pavlo@INORBIT.COM> 1/20/2006 2:38:43 PM >>>
> Hello All,
>
> Please copy and paste the following example data set. In this example
> data set I have a MONTH field (MONTH takes on only APRIL & MAY 2005
> values), a unique identifier, ID, TARGET variable (takes on NO/YES or
> 0/1 representing customer response to MAY 2005 campaign). Next I show
> only three fields out of very many fields like 400 fields or so. I chose
> three representative fields to show here: # of credit cards customer had
> in APRIL and MAY 2005, customer bank balance in APRIL and MAY 2005, and
> the # of months the customer had lived in his house in APRIL & MAY 2005.
> Again, in the real data set I have many such fields.
>
> data foo;
> input MONTH $ ID TARGET NUM_CREDIT_CARDS BALANCE MONTHS_RESIDENCE;
> cards;
> APR05 1 . 1 57103 24
> MAY05 1 0 2 1516 25
> APR05 2 . 2 2468 92
> MAY05 2 0 1 309 93
> APR05 3 . 1 7672 74
> MAY05 3 0 0 0 .
> APR05 4 . 1 53073 127
> MAY05 4 1 1 6379 128
> APR05 5 . 4 24894 36
> MAY05 5 0 0 9859 37
> APR05 6 . 0 12 164
> MAY05 6 0 1 1699 165
> APR05 7 . 2 30248 24
> MAY05 7 1 3 1625 25
> APR05 8 . 2 45516 345
> MAY05 8 1 1 591 346
> APR05 9 . 2 5391 216
> MAY05 9 0 3 6262 217
> APR05 10 . 2 4857 252
> MAY05 10 1 0 14457 .
> ;
> run;
>
>
> I use the above example data set to compute additional fields which I
> call DIFF1 & AVG1, DIFF2 & AVG2, DIFF3 & AVG3, ..., DIFF400 & AVG400
> (400 since I have 400 such numeric fields)
>
> If you run the following code it will become clear how I come up with
> these fields. For DIFF1, for example, I just take the difference between
> MAY and APRIL and I assign the answer to MAY becasue that's where I have
> the TARGET variable populated. Later on, once I compute DIFF1 & AVG1,
> etc., I will delete APRIL and I will be left with only MAY. Then I can
> build a logistic model based on DIFF1 & AVG1, etc.
>
> data TEST(drop=TEMP1-TEMP3);
> set foo;
> by ID;
> if last.ID then temp1=NUM_CREDIT_CARDS;
>
> DIFF1=temp1-lag(NUM_CREDIT_CARDS);
>
> AVG1=(temp1+lag(NUM_CREDIT_CARDS))/2;
> if last.ID then temp2=BALANCE;
> DIFF2=temp2-lag(BALANCE);
> AVG2=(temp2+lag(BALANCE))/2;
> if last.ID then temp3=MONTHS_RESIDENCE;
>
> DIFF3=temp3-lag(MONTHS_RESIDENCE);
>
> AVG3=(temp3+lag(MONTHS_RESIDENCE))/2;
> run;
>
>
> This is my question: How can I automate the above step? I mean, can you
> imagine sitting here typing the above lines 400 times. That would be 400
> times for
> DIFF1=temp1-lag(NUM_CREDIT_CARDS);
> AVG1=(temp1+lag(NUM_CREDIT_CARDS))/2;
> 400 times for
> DIFF2=temp2-lag(BALANCE);
> AVG2=(temp2+lag(BALANCE))/2;
> .
> .
> .
>
> 400 times for
> DIFF400=temp400-lag(BALANCE);
> AVG400=(temp400+lag(BALANCE))/2;
>
>
> Again, the answer fields
> DIFF1=temp1-lag(BALANCE);
> AVG1=(temp1+lag(BALANCE))/2;
> and so on will be used in logistic modeling.
>
> Thanks.
>
> P.
>
>
> --
> ___________________________________________________
> Play 100s of games for FREE! http://games.mail.com/
As AttachmentInline Text MyFolder Previous | Next | Back To [INBOX]
--------------------------------------------------------------------------------
© Copyright 2006 mail.com Corp. All rights reserved.
--
___________________________________________________
Play 100s of games for FREE! http://games.mail.com/