LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (February 2010, week 4)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Tue, 23 Feb 2010 01:46:22 -0500
Reply-To:     Francois van der Walt <francoisw@GJI.COM.AU>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Francois van der Walt <francoisw@GJI.COM.AU>
Subject:      Re: SAS SQL subquery efficiency
Comments: To: "Sigurd W. Hermansen" <HERMANS1@WESTAT.COM>

Hi Claudia,

I believe the following code will give you what you want:

data x; input dealno date ddmmyy10. id; datalines; 1 01/01/1995 77 1 01/01/1995 22 1 01/01/1995 33 1 01/01/1995 44 1 01/01/1995 55 2 01/01/1998 77 2 01/01/1998 22 2 01/01/1998 17 2 01/01/1998 3 3 01/01/1999 77 3 01/01/1999 22 3 01/01/1999 33 3 01/01/1999 4 ; run; proc sort data=x out=xs; by dealno id; data xsc; set xs; run; PROC SQL; CREATE TABLE xs2 AS SELECT t1.dealno, t1.date, t1.id, t2.id AS id1 FROM WORK.XS AS t1, WORK.XSC AS t2 WHERE (t1.dealno = t2.dealno AND t1.id >= t2.id) AND t1.id NOT = t2.id; QUIT; PROC SQL; CREATE TABLE xs3 AS SELECT t1.dealno, t1.date, t1.id, t1.id1, /* COUNT_of_dealno */ (COUNT(t1.dealno)) AS COUNT_of_dealno FROM xs2 AS t1, XS2 AS t2 WHERE (t1.dealno >= t2.dealno AND t1.id = t2.id AND t1.id1 = t2.id1) GROUP BY t1.dealno, t1.date, t1.id, t1.id1; QUIT;

Kind Regards Francois van der Walt

www.gji.com.au

On Tue, 23 Feb 2010 00:14:39 -0500, Sigurd Hermansen <HERMANS1@WESTAT.COM> wrote:

>In > >Proc sql; >Create table count_past as >select ...... > ...... >from table1 a, table1 b >where a.CIE_ID <> b.CIE_ID >''''''' >; >Quit; > > the constraint, where a.CIE_ID <> b.CIE_ID, doesn't do what you likely have in mind. Because almost all of the pairings of rows will fail to match on two key values, the condition will almost always succeed and the query as written will generate a near Cartesian product. That will use up a lot of disk space. > >I'd worry first about the logic of the query. You are probably looking something more along the lines of > Select ... from table1 where NOT CIE_ID IN (select CIE_ID from table2) > >Neither the query or the subquery appear to have correct syntax (although difficult to follow as formatted), so it may have made it past syntax checking by accident. Perhaps you could begin with simple statements of the logical rules that you are attempting to declare in a SQL query. >S > >-----Original Message----- >From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Claudia Champagne >Sent: Monday, February 22, 2010 11:31 PM >To: SAS-L@LISTSERV.UGA.EDU >Subject: SAS SQL subquery efficiency > >Hi everyone, > >I'm trying to use a subquery to count past observations common to variables. > However, I must be doing something wrong because I run out of disk space >even though I have over 60 Go available! > >I have 3 variables in table1 (and > 1 million lines): DEAL_ID, DATE and >CIE_ID. Since the same deal can involve more than 1 cie, the same DEAL_ID >can be repeated on more than 1 line with a different CIE_ID: > >Deal_ID Date CIE_ID >1 01/01/1995 77 >1 01/01/1995 22 >1 01/01/1995 33 >1 01/01/1995 44 >1 01/01/1995 55 >2 01/01/1998 77 >2 01/01/1998 22 >2 01/01/1998 17 >2 01/01/1998 3 >3 01/01/1999 77 >3 01/01/1999 22 >3 01/01/1999 33 >3 01/01/1999 4 > > >For a deal on a given date, I want to count the number of past deals between >the cies involved in the deal. I want past deals in the previous 5-year >period. For example, for DEAL_ID = 3, I need to obtain the following: > >Deal_ID CIE_ID CIE_ID2 Count >3 77 22 2 >3 77 33 1 >3 77 4 0 >3 22 33 1 >3 22 4 0 >... > >At the time of DEAL_ID = 3, Cies 77 & 22 were in 2 previous deals together >over the past 5 years, #77 & 33 were in 1 previous deal together and 77 & 4 >were in 0 past deal. > >I tried using the following sql command but it uses too much disk space! > >Proc sql; >Create table count_past as >select distinct a.PackageID, a.CIE_ID, b.CIE_ID as CIE_ID2, > (select (count(distinct b.PackageID)) from table1 b where > a.CIE_ID <> CIE_ID and b.PackageID in (select b.PackageID from > table1 b where b.CIE_ID = a.CIE_ID) and > 0 < intck('DAY', b.Date, a.Date)<= 1825 ) >from table1 a, table1 b >where a.CIE_ID <> b.CIE_ID >group by a.PackageID, a.CIE_ID, b.CIE_ID; >quit; > > >Any ideas on how to make it more efficient? > >Thanks!


Back to: Top of message | Previous page | Main SAS-L page