Date: Wed, 25 Jan 2006 11:36:47 -0500
Reply-To: Pavlo Row <pavlo@INORBIT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Pavlo Row <pavlo@INORBIT.COM>
Subject: Re: SAS Data Step as prelude to logistic modeling (Conditional
Logistic Regression ?)
Content-Type: text/plain; charset="iso-8859-1"
Thank you for your comments and the attached sas code Zack. But, just to say, I haven't received any previous comment from you pertaining to this post. I was out for a few days, so I don't know what was going on with SAS-L list. This is the first time I get a note for this post from you. Maybe there was another person with a similar post or maybe you've answered to my previous posts very related to this post. Also, my real data set is as shown below (foo) but it has like 1 million records in it and like 400 numeric fields and almost all fields have some data missing. What I posted below is anly 20 obs as for example only. But, what you say below about using PROC PHREG to perform
conditional logistic regression on my data sounds very educational to me. I will study your code and I will repost simce I've never used PROC PHREG before in my life. Again, thank you very much for your input. It is so helpful in terms of enhancing my stat. knowledge.
Pavlo
>>>>>>
From: "Zack, Matthew M." <mmz1@cdc.gov>
To: "Pavlo Row" <pavlo@INORBIT.COM>
CC:
Subject: RE: SAS Data Step as prelude to logistic modeling
Date: Sun, 22 Jan 2006 17:03:57 -0500
As I e-mailed you previously, you can use PROC PHREG to perform
conditional logistic regression on your data (cf., attachments).
You do not have to create difference variables for this to work.
Because your data has so few observations, and because some of these
had missing values, PROC PHREG had difficulty converging if all
three variables were specified as independent variables; however,
it would converge if only one of these variables were specified
as an independent variable. Adding more observations would permit
convergence with all three independent variables.
Matthew Zack
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
Pavlo Row
Sent: Friday, January 20, 2006 3:39 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: SAS Data Step as prelude to logistic modeling
Hello All,
Please copy and paste the following example data set. In this example
data set I have a MONTH field (MONTH takes on only APRIL & MAY 2005
values), a unique identifier, ID, TARGET variable (takes on NO/YES or
0/1 representing customer response to MAY 2005 campaign). Next I show
only three fields out of very many fields like 400 fields or so. I chose
three representative fields to show here: # of credit cards customer had
in APRIL and MAY 2005, customer bank balance in APRIL and MAY 2005, and
the # of months the customer had lived in his house in APRIL & MAY 2005.
Again, in the real data set I have many such fields.
data foo;
input MONTH $ ID TARGET NUM_CREDIT_CARDS BALANCE MONTHS_RESIDENCE;
cards;
APR05 1 . 1 57103 24
MAY05 1 0 2 1516 25
APR05 2 . 2 2468 92
MAY05 2 0 1 309 93
APR05 3 . 1 7672 74
MAY05 3 0 0 0 .
APR05 4 . 1 53073 127
MAY05 4 1 1 6379 128
APR05 5 . 4 24894 36
MAY05 5 0 0 9859 37
APR05 6 . 0 12 164
MAY05 6 0 1 1699 165
APR05 7 . 2 30248 24
MAY05 7 1 3 1625 25
APR05 8 . 2 45516 345
MAY05 8 1 1 591 346
APR05 9 . 2 5391 216
MAY05 9 0 3 6262 217
APR05 10 . 2 4857 252
MAY05 10 1 0 14457 .
;
run;
I use the above example data set to compute additional fields which I
call DIFF1 & AVG1, DIFF2 & AVG2, DIFF3 & AVG3, ..., DIFF400 & AVG400
(400 since I have 400 such numeric fields)
If you run the following code it will become clear how I come up with
these fields. For DIFF1, for example, I just take the difference between
MAY and APRIL and I assign the answer to MAY becasue that's where I have
the TARGET variable populated. Later on, once I compute DIFF1 & AVG1,
etc., I will delete APRIL and I will be left with only MAY. Then I can
build a logistic model based on DIFF1 & AVG1, etc.
data TEST(drop=TEMP1-TEMP3);
set foo;
by ID;
if last.ID then temp1=NUM_CREDIT_CARDS;
DIFF1=temp1-lag(NUM_CREDIT_CARDS);
AVG1=(temp1+lag(NUM_CREDIT_CARDS))/2;
if last.ID then temp2=BALANCE;
DIFF2=temp2-lag(BALANCE);
AVG2=(temp2+lag(BALANCE))/2;
if last.ID then temp3=MONTHS_RESIDENCE;
DIFF3=temp3-lag(MONTHS_RESIDENCE);
AVG3=(temp3+lag(MONTHS_RESIDENCE))/2;
run;
This is my question: How can I automate the above step? I mean, can you
imagine sitting here typing the above lines 400 times. That would be 400
times for
DIFF1=temp1-lag(NUM_CREDIT_CARDS);
AVG1=(temp1+lag(NUM_CREDIT_CARDS))/2;
400 times for
DIFF2=temp2-lag(BALANCE);
AVG2=(temp2+lag(BALANCE))/2; .
.
.
400 times for
DIFF400=temp400-lag(BALANCE);
AVG400=(temp400+lag(BALANCE))/2;
Again, the answer fields
DIFF1=temp1-lag(BALANCE);
AVG1=(temp1+lag(BALANCE))/2; and so on will be used in
logistic modeling.
Thanks.
P.
--
___________________________________________________
Play 100s of games for FREE! http://games.mail.com/