LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (September 2005, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Wed, 21 Sep 2005 08:43:56 -0700
Reply-To:     "Pardee, Roy" <pardee.r@GHC.ORG>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Pardee, Roy" <pardee.r@GHC.ORG>
Subject:      Re: huge sql join, fast way to sample first?
Comments: To: Pic Martin <pmart@NOSPAM.CBC.TV>
Content-Type: text/plain; charset="US-ASCII"

I think the trouble w/the OBS, INOBS and OUTOBS options is that, if your query has a WHERE clause, the WHERE conditions will get processed (against the entire dset) first before either of those options kicks in.

I've seen people recommend using PROC SURVEYSELECT to create random subsamples of tables that are stratified on relevant fields--not sure if that would take so long as to make it not worth it. No doubt others could give advice on how to use that proc most efficiently.

FWIW--if you specify the NOEXEC option on the call to proc sql, your basic syntax will be evaluated--that's good for catching typos. You can likely also get an execution plan w/the _method or _tree options, if that's any use. (I don't know how to interpret those, so they wouldn't be much use to me).

HTH,

-Roy

-----Original Message----- From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Pic Martin Sent: Wednesday, September 21, 2005 8:29 AM To: SAS-L@LISTSERV.UGA.EDU Subject: huge sql join, fast way to sample first?

I have a ginormous sql join job (data comes from 2,3 and 4 tables)

what's the easiest way to first test the data, before actually letting it rip? the goal would be to first look at a very small piece (maybe thousand rows) but the actual job is dependent on joining tens of millions of rows from multiple tables

subsetting may also (?) create a condition that nothing is created in output with too little data in the join to evaluate

ie. proc sql (obs=1000); ...lots of create, select, where, having and containing statements

or options obs=1000; proc sql ; ...lots of create, select, where, having and containing statements

something totally different?


Back to: Top of message | Previous page | Main SAS-L page