| Date: | Tue, 2 Sep 2008 11:09:09 -0700 |
| Reply-To: | Jay Soule <jay.soule@GMAIL.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Jay Soule <jay.soule@GMAIL.COM> |
| Subject: | Re: Hash lookup not quicker than SQL inner join? |
| In-Reply-To: | <98634D42B37B2E4E96CF3A3BD3CD9AE60254E93F@EX1VS2.nyced.org> |
| Content-Type: | text/plain; charset=ISO-8859-1 |
On Tue, Sep 2, 2008 at 9:00 AM, Bucher Scott <SBucher@schools.nyc.gov> wrote:
> You could try the "PROC SQL _METHOD;" or "PROC SQL _TREE;" options to
> get more details on how SQL is actually performing the joins. In some
> cases, SQL actually performs a hash join. I believe there are one or two
> conference papers on this topic and the aforementioned options.
Right - if the note in the SAS log after the SQL step shows that the
method used was "sqlxjhsh" then SAS used a hash join.
- Jay Soule
> Sander Burggraaff
> Sent: Tuesday, September 02, 2008 11:42 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: Hash lookup not quicker than SQL inner join?
>
> I always assumed that a hash lookup would be a quicker way to join two
> tables than a SQL join. But I recently did some testing and the
> results were not what I expected.
>
> Let's say we have two tables: Table_1 and Table_2. These two tables
> have 4 variables in common: B, C, D and E. B is a character variable
> with a length of 20 and the other 3 are all numeric with a length of
> 8. Table_1 also has a variable called A1 and Table_2 has a variable
> called A2.
>
> Of course the variables B, C, D and E in both tables are keys we want
> to join the tables on. The result table will hold the variables A1 and
> A2.
>
> Table_1 holds the same variables as Table_2. At first we tried
> 25.000.000 observations but we ran out of memory when using the Hash
> lookup. Apparently the 1.6Gb memory available was not enough to store
> the hash table in. So we lowered the amount of observations to
> 20.000.000.
>
> To our surprise the inner join performed just as quickly as the hash
> lookup. I expected SAS to start sorting when using the inner join but
> apparently it doesn't. During some tests it even looked like the inner
> join was quicker than the hash lookup.
>
> Can anyone explain why the inner join is just as quick? No sorting?
> Why?
>
> I searched this discussion group but couldn't find any postings about
> it. Have older posts been deleted?
>
|