Date: Tue, 23 Feb 2010 23:00:30 0600
ReplyTo: Yu Zhang <zhangyu05@GMAIL.COM>
Sender: "SAS(r) Discussion" <SASL@LISTSERV.UGA.EDU>
From: Yu Zhang <zhangyu05@GMAIL.COM>
Subject: Re: an efficiency issue in proc sql
InReplyTo: <ddd0ae2133a98.4b843d26@usc.edu>
ContentType: text/plain; charset=ISO88591
there were some similar topics discussed before on the list. here is the one
might help.
http://www.listserv.uga.edu/cgibin/wa?A2=ind0812a&L=sasl&D=0&P=8754
Yu
On Tue, Feb 23, 2010 at 10:40 PM, Tong Wang <wangtong@usc.edu> wrote:
> Hi, Thanks so much for the help.
> It seems I was mislead to believe that sql is the best way to do this.
>
> Actually it would be great if someone can should me how to do this with
> array, because I would like to run a regression on delayed returns which I
> can't do in sql. For example:
>
> ID date return1 return2
> ......
>
> I would like to create a new column whose value is the regression
> coefficient (slope) of return1 on return2 using
> the past k days returns for the same ID.
>
>
> Thanks Again !!
>
>  Original Message 
> From: Yu Zhang <zhangyu05@gmail.com>
> Date: Tuesday, February 23, 2010 7:46 pm
> Subject: Re: an efficiency issue in proc sql
> To: Tony Wang <wangtong@usc.edu>
> Cc: SASL@listserv.uga.edu
>
> > Hi, Tony,
> >
> > this solution will eliminate the duplicates issue. the subquery
> > will be
> > executed for every record in your table. I think it is a bit more
> > efficientthan the one you proposed. IMHO, I think you can be more
> > satisfied by using
> > Array in a datastep. I am sure some one on the list will show you
> > how to do
> > it.
> >
> >
> > proc sql;
> > select *,(select sum(return) from test as ins
> > where 0<=out.dateins.date<=1) from test as out;
> > quit;
> >
> >
> > HTH
> >
> > Yu
> >
> > On Tue, Feb 23, 2010 at 8:14 PM, Tony Wang <wangtong@usc.edu> wrote:
> >
> > > Hi,all
> > > I am looking for a more efficient method to do the following:
> > >
> > > I have a data set (called mydata) with columns date and return
> > > and I would like to create a new column that contains the sum of
> > past k
> > > returns
> > > for the simplest case (k=2):
> > > date return
> > > 2000/1/1 1
> > > 2000/1/2 1.1
> > > 2000/1/3 .9
> > > then the new data set should be
> > > date return totalret
> > > 2000/1/2 1.1 2.1 (namely 1+1.1)
> > > 2000/1/3 .9 2
> > > (note that the sum operation and the range of the data in sum are
> > only> provided as an simple example, in my work, I actually have to
> > perform more
> > > complicated computation and over some range that satisfies more
> > conditions,> thus a method only works for this particular case is
> > not quite helpful)
> > >
> > > I used the following sql to do this
> > >
> > > proc sql;
> > > create table newdata
> > > as select distinct a.date, a.return, sum(b.return) as totalret
> > > from mydata as a , mydata as b
> > > where 0<=intck('day',b.date,a.date)<=1
> > > having count(b.return)=2;
> > > quit;
> > >
> > >
> > > This does the job, but I realized that it probably does it twice
> > (k times
> > > in
> > > general) !
> > > Because if I drop 'distinct', I see two copies of the desired
> > result.> I am guessing that ,with 'distinct' ,the program still
> > compute everything
> > > twice, then delete redundent copies.
> > >
> > > As you can image, this is really wasteful when k is large.
> > >
> > >
> > > Is there anyway to get around this ? Any help would be
> > appreciated. thanks
> > > .
> > >
> >
>
