Date: Tue, 23 Feb 2010 20:40:06 -0800
Reply-To: Tong Wang <wangtong@USC.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Tong Wang <wangtong@USC.EDU>
Subject: Re: an efficiency issue in proc sql
In-Reply-To: <445d9dbe1002231946u1e3f0285sf4fd2ee447011d88@mail.gmail.com>
Content-type: text/plain; charset=us-ascii
Hi, Thanks so much for the help.
It seems I was mislead to believe that sql is the best way to do this.
Actually it would be great if someone can should me how to do this with array, because I would like to run a regression on delayed returns which I can't do in sql. For example:
ID date return1 return2
......
I would like to create a new column whose value is the regression coefficient (slope) of return1 on return2 using
the past k days returns for the same ID.
Thanks Again !!
----- Original Message -----
From: Yu Zhang <zhangyu05@gmail.com>
Date: Tuesday, February 23, 2010 7:46 pm
Subject: Re: an efficiency issue in proc sql
To: Tony Wang <wangtong@usc.edu>
Cc: SAS-L@listserv.uga.edu
> Hi, Tony,
>
> this solution will eliminate the duplicates issue. the subquery
> will be
> executed for every record in your table. I think it is a bit more
> efficientthan the one you proposed. IMHO, I think you can be more
> satisfied by using
> Array in a datastep. I am sure some one on the list will show you
> how to do
> it.
>
>
> proc sql;
> select *,(select sum(return) from test as ins
> where 0<=out.date-ins.date<=1) from test as out;
> quit;
>
>
> HTH
>
> Yu
>
> On Tue, Feb 23, 2010 at 8:14 PM, Tony Wang <wangtong@usc.edu> wrote:
>
> > Hi,all
> > I am looking for a more efficient method to do the following:
> >
> > I have a data set (called mydata) with columns date and return
> > and I would like to create a new column that contains the sum of
> past k
> > returns
> > for the simplest case (k=2):
> > date return
> > 2000/1/1 1
> > 2000/1/2 1.1
> > 2000/1/3 .9
> > then the new data set should be
> > date return totalret
> > 2000/1/2 1.1 2.1 (namely 1+1.1)
> > 2000/1/3 .9 2
> > (note that the sum operation and the range of the data in sum are
> only> provided as an simple example, in my work, I actually have to
> perform more
> > complicated computation and over some range that satisfies more
> conditions,> thus a method only works for this particular case is
> not quite helpful)
> >
> > I used the following sql to do this
> >
> > proc sql;
> > create table newdata
> > as select distinct a.date, a.return, sum(b.return) as totalret
> > from mydata as a , mydata as b
> > where 0<=intck('day',b.date,a.date)<=1
> > having count(b.return)=2;
> > quit;
> >
> >
> > This does the job, but I realized that it probably does it twice
> (k times
> > in
> > general) !
> > Because if I drop 'distinct', I see two copies of the desired
> result.> I am guessing that ,with 'distinct' ,the program still
> compute everything
> > twice, then delete redundent copies.
> >
> > As you can image, this is really wasteful when k is large.
> >
> >
> > Is there anyway to get around this ? Any help would be
> appreciated. thanks
> > .
> >
>
|