LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (February 2010, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 23 Feb 2010 23:00:30 -0600
Reply-To:     Yu Zhang <zhangyu05@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Yu Zhang <zhangyu05@GMAIL.COM>
Subject:      Re: an efficiency issue in proc sql
Comments: To: Tong Wang <wangtong@usc.edu>
In-Reply-To:  <ddd0ae2133a98.4b843d26@usc.edu>
Content-Type: text/plain; charset=ISO-8859-1

there were some similar topics discussed before on the list. here is the one might help. http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0812a&L=sas-l&D=0&P=8754

Yu

On Tue, Feb 23, 2010 at 10:40 PM, Tong Wang <wangtong@usc.edu> wrote:

> Hi, Thanks so much for the help. > It seems I was mislead to believe that sql is the best way to do this. > > Actually it would be great if someone can should me how to do this with > array, because I would like to run a regression on delayed returns which I > can't do in sql. For example: > > ID date return1 return2 > ...... > > I would like to create a new column whose value is the regression > coefficient (slope) of return1 on return2 using > the past k days returns for the same ID. > > > Thanks Again !! > > ----- Original Message ----- > From: Yu Zhang <zhangyu05@gmail.com> > Date: Tuesday, February 23, 2010 7:46 pm > Subject: Re: an efficiency issue in proc sql > To: Tony Wang <wangtong@usc.edu> > Cc: SAS-L@listserv.uga.edu > > > Hi, Tony, > > > > this solution will eliminate the duplicates issue. the subquery > > will be > > executed for every record in your table. I think it is a bit more > > efficientthan the one you proposed. IMHO, I think you can be more > > satisfied by using > > Array in a datastep. I am sure some one on the list will show you > > how to do > > it. > > > > > > proc sql; > > select *,(select sum(return) from test as ins > > where 0<=out.date-ins.date<=1) from test as out; > > quit; > > > > > > HTH > > > > Yu > > > > On Tue, Feb 23, 2010 at 8:14 PM, Tony Wang <wangtong@usc.edu> wrote: > > > > > Hi,all > > > I am looking for a more efficient method to do the following: > > > > > > I have a data set (called mydata) with columns date and return > > > and I would like to create a new column that contains the sum of > > past k > > > returns > > > for the simplest case (k=2): > > > date return > > > 2000/1/1 1 > > > 2000/1/2 1.1 > > > 2000/1/3 .9 > > > then the new data set should be > > > date return totalret > > > 2000/1/2 1.1 2.1 (namely 1+1.1) > > > 2000/1/3 .9 2 > > > (note that the sum operation and the range of the data in sum are > > only> provided as an simple example, in my work, I actually have to > > perform more > > > complicated computation and over some range that satisfies more > > conditions,> thus a method only works for this particular case is > > not quite helpful) > > > > > > I used the following sql to do this > > > > > > proc sql; > > > create table newdata > > > as select distinct a.date, a.return, sum(b.return) as totalret > > > from mydata as a , mydata as b > > > where 0<=intck('day',b.date,a.date)<=1 > > > having count(b.return)=2; > > > quit; > > > > > > > > > This does the job, but I realized that it probably does it twice > > (k times > > > in > > > general) ! > > > Because if I drop 'distinct', I see two copies of the desired > > result.> I am guessing that ,with 'distinct' ,the program still > > compute everything > > > twice, then delete redundent copies. > > > > > > As you can image, this is really wasteful when k is large. > > > > > > > > > Is there anyway to get around this ? Any help would be > > appreciated. thanks > > > . > > > > > >


Back to: Top of message | Previous page | Main SAS-L page