Date: Fri, 4 Aug 2006 10:41:21 -0400
Reply-To: Jonas Bilenas <jonas.bilenas@CHASE.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Jonas Bilenas <jonas.bilenas@CHASE.COM>
Subject: Re: Multiple regression help-dummy variables vs class variables
You can try taking the number stopped per year and week and running some
time series analysis on that variable. A good reference for Box-Jenkins is
the book by Alan Pankratz, "Forecasting with Univariate Box-Jenkins Models:
Concepts and Cases" (Wiley Series in Probability and Statistics).
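
As a rough sketch of what that could look like in SAS: collapse the
account-level rows to one count per year and week, then hand that series
to PROC ARIMA. The data set ACCOUNTS and the variables YEAR, WEEK and
STOPPED are made-up names for illustration, and the ARIMA orders are
placeholders, not a recommended model.

```sas
/* Hypothetical input: ACCOUNTS has one row per account per week,     */
/* with YEAR, WEEK (1-52) and STOPPED (1 = stopped that week, else 0). */
proc sql;
  create table weekly as
  select year, week, sum(stopped) as n_stopped
  from accounts
  group by year, week
  order by year, week;
quit;

/* Box-Jenkins steps: identify, estimate, forecast.                   */
proc arima data=weekly;
  identify var=n_stopped(52) nlag=24;  /* seasonal difference, lag 52 */
  estimate p=1 method=ml;              /* placeholder AR(1) term      */
  forecast lead=52 out=fcst;           /* project the next 52 weeks   */
run;
quit;
```

Look at the ACF and PACF from the IDENTIFY step before settling on p and
q. Five years of weekly data is only about 260 points, and the seasonal
difference at lag 52 uses up the first year of them.
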
On Fri, 4 Aug 2006 09:57:45 -0400, Karen Intrachat <intrachat@GMAIL.COM>
wrote:
>I have five years, and that is all I have, but I have around 10,000
>individual accounts that I am researching. In addition, the reason I am
>using weeks is because I need to project out in weeks how many people are
>going to stop for the year.
>
>And the thing is, the data that I am looking at is not time series
>data. I am basically looking over 5 years' worth of data for individual
>accounts. Therefore I can't necessarily do lags in my series, could I?
>I am not sure how to go about the Box-Jenkins approach...
>
>
>
>On 8/4/06, Jonas Bilenas <jonas.bilenas@chase.com> wrote:
>>
>> Have you tried a Box-Jenkins approach to the data? Have you tried to look
>> at month of duration as opposed to week?
>>
>> For seasonality, you will need a couple of years of data. One year will
>> not tell you anything about seasonality.
>>
>> On Fri, 4 Aug 2006 09:07:12 -0400, Karen Intrachat <intrachat@GMAIL.COM>
>> wrote:
>>
>> >Yes, I have looked at the relationship between duration and my dependent
>> >variable. I am making duration a dummy variable to see the patterns or
>> >movements of the stop rates in each week, and when they are significant.
>> >Basically I want to look at the patterns, to see when the stop rate
>> >increases for a certain week, and when it is lower. It is basically
>> >saying, I want to look at seasonality, treating months as a dummy
>> >variable.
>> >
>> >However, my problem is that I seem to be getting insignificant values for
>> >certain duration weeks, and therefore when I graph the movements or
>> >patterns, it does not correctly mimic what is actually going on. In other
>> >words, when the coefficient is insignificant, then the value is zero, and
>> >my graph is just flat-lined during those weeks.
>> >
>> >is there another way to go about this model that would more accurately
>> >portray stop rates over 52 weeks, given the price and term...etc?
>> >
>> >Karen
>> >
>> >On 8/4/06, Jonas Bilenas <jonas.bilenas@chase.com> wrote:
>> >>
>> >> The first question I have is why do you want to convert duration, which
>> >> seems to be an interval or ratio number, into 52 dummy variables? Have
>> >> you looked at the relationship between duration and your dependent
>> >> variable?
>> >>
>> >> Jonas Bilenas
>> >> JP Morgan Chase
>> >>
>> >>
>> >> On Thu, 3 Aug 2006 17:00:46 -0400, Karen Intrachat
>> >> <intrachat@GMAIL.COM> wrote:
>> >>
>> >> >I am trying to predict a stop rate given the Price, Term, Year and
>> >> >duration of the promotion. I am using duration as a dummy variable.
>> >> >
>> >> >Suppose I do a multiple regression using a variable named "dur" that
>> >> >has values 0-52.
>> >> >
>> >> >If I make "dur" into a class variable and run my regression, will its
>> >> >coefficient estimates be different than if I did a regression and made
>> >> >"dur" a set of dummy variables "dur0-dur52"?
>> >> >
>> >> >Is this true? What is the difference?
>> >> >
>> >> >Also, what if I want to use an interaction term?
>> >> >How do I use it in my model? For example:
>> >> >
>> >> >proc glm data=data;
>> >> > model stop = price term dur1 dur2 dur1*term dur2*term;
>> >> >run;
>> >> >
>> >> >where dur1 and dur2 are dummy variables. I am given coefficient
>> >> >estimates, and let's say price=1 and term=2.
>> >> >
>> >> >Is this what I do?
>> >> >
>> >> >stop for dur1=1:
>> >> >stop = b1*(1) + b2*(2) + b3*(1) + b4*(0) + b5*(1)*(2) + b6*(0)*(2)
>> >> >
>> >> >Or, for the dur1*term interaction terms, would I have to multiply
>> >> >duration*term on the side? Would the equation be this, given the
>> >> >table below?
>> >> >
>> >> >duration  term --> duration*term
>> >> >1         2        (1*2)
>> >> >2         2        (2*2)
>> >> >
>> >> >then would my equation be
>> >> >
>> >> >stop= b1*(1) + b2*(2) + b3*(1) + b4*(0) + b5*(1*2) + b6*(2*2)?
>> >> >
>> >> >For some reason I have been getting a very bad R-squared value when I
>> >> >do these regressions, and most of the time the coefficient estimates
>> >> >have high p-values. If I set their coefficient estimates to zero for
>> >> >duration, then my model is not very reflective of what is going on.
>> >> >Does anyone have suggestions?
>> >>
>>
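
On the original question of a class variable versus hand-built dummies,
here is a minimal sketch. It assumes a data set PROMO with variables
STOP, PRICE, TERM and DUR (0-52); the data set name and the DURTERM
variables are hypothetical. With a CLASS statement, PROC GLM builds the
indicator columns itself and sets the parameter for the last level (52
here) to zero. Building dur0-dur51 by hand and leaving out dur52 gives the
same fit. The estimates only look different if a different level is left
out as the reference.

```sas
/* (a) Let GLM build the indicators; DUR=52 is the reference level. */
proc glm data=promo;
  class dur;
  model stop = price term dur dur*term / solution;
run;
quit;

/* (b) Build the same columns by hand, omitting level 52. */
data promo_d;
  set promo;
  array d{0:51}  dur0-dur51;
  array dt{0:51} durterm0-durterm51;
  do k = 0 to 51;
    d{k}  = (dur = k);       /* 1 if this row is at duration k, else 0 */
    dt{k} = d{k} * term;     /* interaction column = dummy times term  */
  end;
  drop k;
run;

proc reg data=promo_d;
  model stop = price term dur0-dur51 durterm0-durterm51;
run;
quit;
```

When scoring by hand, an interaction column is just the product of the two
columns for that row. With term=2 and dur1=1, dur1*term is 1*2 = 2 and
dur2*term is 0*2 = 0. That is the first of the two equations in the
original post, plus the intercept.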