Date: Sat, 27 Dec 2008 07:26:37 -0500
Reply-To: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Muthia Kachirayan <muthia.kachirayan@GMAIL.COM>
Subject: Re: How to save the slope of a regression model to a dataset?
Content-Type: text/plain; charset=WINDOWS-1252
On Fri, Dec 26, 2008 at 11:04 PM, Arthur Tabachneck <email@example.com> wrote:
> For the OP's requirements, your suggested solution is clearly the most
> efficient in terms of the computational resources required.
> However, if the request had required multiple regression, then I would opt
> for Howard's originally quoted approach. I think it would be a lot less
> prone to potential computational error, as one would only have to sum the
> betas produced by outest to get the slope. Howard's solution, if it
> hasn't already been posted in this thread, was:
> *create sample data for 10,000 days;
> data sample (keep = date x y);
>   do date = '10AUG1981'd to '25DEC2008'd;
>     format date date9.;
>     x = ceil( 10 * ranuni(123) );
>     y = 2 + 3 * x + round( rannor(123) );
>     output;
>   end;
> run;
>
> *Suppose that each individual regression is to operate on at most 20 days;
> %let ndays=20;
>
> *Expand the data so that each obs appears once per window it belongs to;
> data expanded(drop = nn start);
>   length span $ 19;
>   format start_date date9.;
>   set sample;
>   if _n_ lt &ndays then start = date - _n_ + 1;
>   else start = date - (&ndays - 1);
>   do nn = start to date;
>     start_date = nn;
>     span = catx( '-'
>                , put(nn, date9.)
>                , put(nn + &ndays - 1, date9.) );
>     output;
>   end;
> run;
>
> proc sort data=expanded;
>   by start_date date;
> run;
>
> *Finally, run all of the regressions independently in a single step;
> *(outest= added here to capture the estimates);
> proc reg data=expanded noprint outest=est;
>   by start_date;
>   model y = x;
> run;
> quit;
> On Thu, 25 Dec 2008 12:06:50 -0500, Muthia Kachirayan
> <muthia.kachirayan@GMAIL.COM> wrote:
> >On Thu, Dec 25, 2008 at 11:08 AM, Bill West <firstname.lastname@example.org> wrote:
> >> On Wed, 24 Dec 2008 18:42:30 -0500, Muthia Kachirayan
> >> <muthia.kachirayan@GMAIL.COM> wrote:
> >> >On Wed, Dec 24, 2008 at 2:11 PM, ST <email@example.com> wrote:
> >> >
> >> >> Hi there,
> >> >>
> >> >> I have a dataset which has 10000 observations. For each obs, I want to
> >> >> create a regression model (based on variables x and y in the dataset)
> >> >> using the previous 20 observations and save the estimated slope as a new
> >> >> variable in the dataset. In total, I will have 9980 slopes.
> >> >>
> >> >> Can someone show me how to do this efficiently?
> >> >>
> >> >> Many thanks,
> >> >>
> >> >> Hu
> >> >>
> >> >
> >> Hi Hu,
> >> I see you've received several suggested methods, and I may be showing my
> >> ignorance, but I'm not sure what you mean by "save the estimated slope as
> >> a new variable."
> >> If you mean a predicted value, I believe the regout dataset below will
> >> include a predicted value for each obs. Is that what you want?
> >> data expanded;
> >>   set file;
> >>   if  1 <= _n_ <= 20 then span = 1;
> >>   if 21 <= _n_ <= 30 then span = 2;
> >>   if 31 <= _n_ <= 40 then span = 3;
> >>   * etc., up to 500;
> >> run;
> >> proc reg data=expanded noprint outest=regout;
> >>   by span;
> >>   model y = x;
> >> run;
> >The OP wanted the slope for each of the sets of observations (_n_) from
> >1 to 20,
> >2 to 21,
> >3 to 22,
> >4 to 23,
> >...
> >9,981 to 10,000.
> >This is similar to finding moving averages, with overlapping windows:
> >one observation leaves at the left as one is added at the right.
> >Thus there will be 9980 slopes (10,000 - 20).
> >The use of
> >  if  1 <= _n_ <= 20 then span = 1;
> >  if 21 <= _n_ <= 30 then span = 2;
> >  if 31 <= _n_ <= 40 then span = 3;
> >  * etc., up to 500;
> >will result in slopes for non-overlapping segments of observations, which
> >is essentially not a moving-window slope.
> >My code computes the moving-window slope by maintaining 4 sums, Sum(X),
> >Sum(Y), Sum(X * X) and Sum(X * Y), on a sliding basis.
> >Hope this clarifies.
> >Kind regards,
> >Muthia Kachirayan
As noted, I did not give a general solution to deal with Multiple
Regression. Howard's solution using Proc Reg with a BY statement is far
better for Multiple Regression. If required, any applicable regression
analysis can be done the same way, e.g. Logistic Regression (which needs a
binary dependent variable).
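To make the contrast concrete, here is a rough sketch of that recompute-per-window approach (Python is used only as a language-neutral illustration; the data, the `window_slope` name, and the window count are my own, not from the thread):

```python
import random

random.seed(123)

# Hypothetical data in the spirit of the thread's example: y = 2 + 3*x + noise.
ndays = 20
xs = [random.randint(1, 10) for _ in range(100)]
ys = [2 + 3 * x + random.gauss(0, 1) for x in xs]

def window_slope(x, y):
    """Ordinary least-squares slope, recomputed from scratch for one window."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(a * b for a, b in zip(x, y))
    return (n * sxy - sx * sy) / (n * sxx - sx * sx)

# One slope per overlapping window: obs 1-20, 2-21, ..., 81-100.
slopes = [window_slope(xs[i:i + ndays], ys[i:i + ndays])
          for i in range(len(xs) - ndays + 1)]
```

Every slope here redoes all four sums over the full window; that repeated work is exactly what a sliding update avoids.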
In the OP's context (a moving window), Proc Reg prepares the four sums,
Sum(X), Sum(Y), Sum(X**2) and Sum(X*Y), besides other computations (I do
not know exactly what they are), afresh from beginning to end for each
segment of data, ignoring that at each step only one observation leaves the
segment and one enters it. My code takes note of this and adjusts the sums
incrementally, which saves the user time.
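The sliding adjustment described above can be sketched as follows (Python as a language-neutral illustration; the `rolling_slopes` name and the test data are mine, not from the thread):

```python
def rolling_slopes(x, y, n):
    """Slope of y on x over every length-n window, keeping the four
    running sums (Sum(X), Sum(Y), Sum(X*X), Sum(X*Y)) and adjusting
    them as one observation leaves and one enters the window."""
    sx, sy = sum(x[:n]), sum(y[:n])
    sxx = sum(v * v for v in x[:n])
    sxy = sum(a * b for a, b in zip(x[:n], y[:n]))
    slopes = [(n * sxy - sx * sy) / (n * sxx - sx * sx)]
    for i in range(n, len(x)):
        xo, yo = x[i - n], y[i - n]   # observation leaving the window
        xi, yi = x[i], y[i]           # observation entering the window
        sx += xi - xo
        sy += yi - yo
        sxx += xi * xi - xo * xo
        sxy += xi * yi - xo * yo
        slopes.append((n * sxy - sx * sy) / (n * sxx - sx * sx))
    return slopes

# On exact data y = 2 + 3*x, every window's slope is 3.
xs = list(range(1, 31))
ys = [2 + 3 * v for v in xs]
print(rolling_slopes(xs, ys, 20))   # 11 windows, each slope 3.0
```

Each step does a constant amount of arithmetic regardless of the window width, which is the saving over refitting every window from scratch.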
< I think it would be a lot less prone to potential computational error, as
one would only have to sum the betas produced by outest to get the slope >

That is true as regards computational error at the hands of a user who does
not know the mechanics of Linear Regression, or of a user who has no option
other than to use a SAS Proc.
But the other part of your statement, summing the betas from outest to get
the slope, needs your clarification.
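For what it is worth, here is a small numeric check of the distinction (Python as a language-neutral illustration; the data are my own). For a one-regressor model the unstandardized coefficient that Proc Reg writes to OUTEST= already is the slope, so no summing is involved; a standardized beta is that slope rescaled by the ratio of the standard deviations:

```python
import statistics

# Exact line y = 2 + 3x: the raw (unstandardized) slope is 3, while the
# standardized beta equals the correlation, which is 1 here.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 3.0 * v for v in x]

n = len(x)
sx, sy = sum(x), sum(y)
sxx = sum(v * v for v in x)
sxy = sum(a * b for a, b in zip(x, y))

slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)            # 3.0
beta = slope * statistics.pstdev(x) / statistics.pstdev(y)   # 1.0
```

The point is that for a simple regression there is a single coefficient, and it is the slope itself.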
I understand that betas are standardized regression coefficients, and that
slope is another name for such coefficients, and summing the betas to get
the slope is