LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (November 2007, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 9 Nov 2007 11:38:06 -0500
Reply-To:     Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:      Re: Which is quicker: view on view or several lookups?
Comments: To: Sander Burggraaff <info@burgie-T.nl>
In-Reply-To:  <1194608493.926273.77270@o3g2000hsb.googlegroups.com>
Content-Type: text/plain; charset="us-ascii"

Sander: I don't have any detail on how DI Studio works. Views in SAS PROC SQL run behind the scene, so much depends on sources of data (Oracle, OLEdb, DB2 SAS/Access engines as well as SAS datasets). Judging from what I have seen from the outside, the SAS SQL compiler determines how data pipe through a series of stored and in-line views. Views execute in sequence, much like nested sub-queries, so SAS caches data in a way that fits within an execution plan.

In SAS SQL the LEFT JOIN almost always forces sorting operations. Hash object equivalents almost always work much quicker, often dramatically so. Why not test what you are trying to do in SAS PROC SQL? The invaluable SAS-L Archives contain many postings on SQL performance, including the LEFT JOIN, hash object look-up and summarization, and data transformations. Some of the implementations of declarative programs come very close to the theoretical O(N) limit of performance on current platforms.

If you are depending on DI to optimize declarative programs, you'll pretty much have to settle for whatever execution plan DI writes. SAS PROC SQL tests could show whether it makes sense to depend on DI, or whether intervening might improve performance significantly. S

-----Original Message----- From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of Sander Burggraaff Sent: Friday, November 09, 2007 6:42 AM To: sas-l@uga.edu Subject: Which is quicker: view on view or several lookups?

Example: We have one dataset containing 10 million records. We also have 3 other datasets that contain product information. Data of each of those 3 datasets has to be added to the 10 million records by not only a single key variable but a date as well. So we have to check if a certain date is between 2 dates in each record of those 3 datasets. Which method would be better for using in DI Studio?

We can create 3 views with SQL left joins. View 2 would be based on view 1 and view 3 would be based on view 2. The idea behind this is that we can reduce I/O operations by using views. SQL joins don't have a great performance however.

Another way would be to use the lookup transformation in DI Studio which uses hashing. This is a lot quicker than SQL join but the downside is that we would be reading those 10 million records three times instead of only once (!?!) when using the 3 views.

Now it's that last statement that has got us thinking. Will those 10 million records really be read only once when using views? We have our doubts...

Unfortunately we haven't been able to find any information that clarifies this point. We have found a thread in this newsgroup and we read the SUGI 19-27 paper and we found out about the spill file but we're still not convinced. Is there anyone with a definite answer to this question?


Back to: Top of message | Previous page | Main SAS-L page