Date: Wed, 21 Sep 2005 08:43:56 -0700
Reply-To: "Pardee, Roy" <pardee.r@GHC.ORG>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Pardee, Roy" <pardee.r@GHC.ORG>
Subject: Re: huge sql join, fast way to sample first?
Content-Type: text/plain; charset="US-ASCII"
I think the trouble w/the OBS, INOBS and OUTOBS options is that, if your
query has a WHERE clause, the WHERE conditions will get processed
(against the entire dset) first before either of those options kicks in.
I've seen people recommend using PROC SURVEYSELECT to create random
subsamples of tables that are stratified on relevant fields--not sure if
that would take so long as to make it not worth it. No doubt others
could give advice on how to use that proc most efficiently.
FWIW--if you specify the NOEXEC option on the call to proc sql, your
basic syntax will be evaluated--that's good for catching typos. You can
likely also get an execution plan w/the _method or _tree options, if
that's any use. (I don't know how to interpret those, so they wouldn't
be much use to me).
HTH,
-Roy
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Pic
Martin
Sent: Wednesday, September 21, 2005 8:29 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: huge sql join, fast way to sample first?
I have a ginormous sql join job (data comes from 2,3 and 4 tables)
what's the easiest way to first test the data, before actually letting
it rip?
the goal would be to first look at a very small piece (maybe thousand
rows) but the actual job is dependent on joining tens of millions of
rows from multiple tables
subsetting may also (?) create a condition that nothing is created in
output with too little data in the join to evaluate
ie.
proc sql (obs=1000);
...lots of create, select, where, having and containing statements
or
options obs=1000;
proc sql ;
...lots of create, select, where, having and containing statements
something totally different?
|