LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous (more recent) messageNext (less recent) messagePrevious (more recent) in topicNext (less recent) in topicPrevious (more recent) by same authorNext (less recent) by same authorPrevious page (November 2004, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 18 Nov 2004 08:58:52 -0500
Reply-To:     Peter Crawford <peter.crawford@BLUEYONDER.CO.UK>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Peter Crawford <peter.crawford@BLUEYONDER.CO.UK>
Subject:      Re: proc format vs. merge
Comments: To: Jack Hamilton <JackHamilton@FIRSTHEALTH.COM>

On Fri, 12 Nov 2004 19:00:34 -0700, Jack Hamilton <JackHamilton@FIRSTHEALTH.COM> wrote:

> >I suppose there are too many lookup records to put them all into a v9 >associative array?

Absolutely... the table is 14GB and growing daily

> > >Enhanced generic lookup capabilities using indexes in the data step >would be welcome, yes. > >What about > >===== > proc sql; > create table claims2 as > select claims1.* > from claims1, elig > where elig.policy_no = claims1.policy_no > ; > quit; >===== > >Doesn't that give the same result as your SQL? I think it would be >faster. >-- >JackHamilton@FirstHealth.com

When I first tried this, it still took far too long. Adding _method shows that "where" was still optimised to a hash join. 24 create table userd.select_gios_policy_sql as 25 select contract, version, partition, pono ............... more columns 29 , dtechang, firstday, lastday 30 from gios.policy 31 , gios.changedpolicies( rename=( contract=changed )) 32 where contract = changed /********* COMMENTS *********/ 36 order by contract, firstday, version 37 ; NOTE: SQL execution methods chosen are: sqxcrta sqxsort sqxjhsh sqxsrc( GIOS.POLICY ) sqxsrc( GIOS.CHANGEDPOLICIES )

Setting a low value for buffersize, stopped the hash join, but then it tried to sort ! 99 proc sql _method _tree buffersize=10000; 100 create table userd.select_gios_policy_sql as 101 select contract, version, partition, pono ................. 105 , dtechang, firstday, lastday 106 from gios.policy as p 107 join gios.changedpolicies( rename=( contract=changed )) 108 on contract = changed 112 order by contract, firstday, version 113 ;

NOTE: SQL execution methods chosen are: sqxcrta sqxsort sqxjm sqxsort sqxsrc( GIOS.CHANGEDPOLICIES ) sqxsort sqxsrc( GIOS.POLICY(alias = P) )

The changedPolicies table is in order, so I added sortedby gios.changedpolicies( rename=( contract=changed )sortedby = changed ) The _method picture became NOTE: SQL execution methods chosen are: sqxcrta sqxsort sqxjm sqxsrc( GIOS.CHANGEDPOLICIES ) sqxsort sqxsrc( GIOS.POLICY(alias = P) )

The Policy table is indexed on POLID, a composite of contract and two other columns, which aren't all in the changedPolicies table. However, Contract comes first in the key, so I can try gios.policy( idxname= polid sortedby= contract) as p Then _method shows NOTE: SQL execution methods chosen are: sqxcrta sqxsort sqxjm sqxsrc( GIOS.CHANGEDPOLICIES ) sqxsrc( GIOS.POLICY(alias = P) )

So I can eliminate the sql sorts, but it just forces a merge join.

So it is no surprise that it still seems to take a very long time !

Many thanks for the ideas, about partial key set statement support.

I'll be adding my support on the ballot, and persuading my clients' developers how much they should involve themselves, too.

Regards Peter

> >Doesn't that give the same result as your SQL? I think it would be >faster. > > > > >-- >JackHamilton@FirstHealth.com >Manager, Technical Development >Metrics Department, First Health >West Sacramento, California USA > >>>> "Peter Crawford" <peter.crawford@BLUEYONDER.CO.UK> 11/12/2004 12:38 >PM >>> >On Fri, 12 Nov 2004 18:51:28 GMT, Matthew Wilson ><matt@OVERLOOK.HOMELINUX.NET> asks about using user formats >for 1 to 1 look-ups rather than use a data step merge. >Matt's full post follows my sig lines below. > >Matt >On the internals of user formats you've had the definitions from Roger >Lustig and Jack Hamilton, so may I discuss data step merges and their >shortcomings for look-up processing. > >I'm working on replacing some data step merges with almost anything I >can >find........ Unfortunately in my context, formats are not suitable.. > >Imagine collecting details from a large(40m rows) denormalised table, >for a few hundred (or even thousand) cases. >Anywhere I can obtain index keys for the selection list, an sql join >wins, >especially when the key is non-unique. >Sadly, new business analysts are attracted by the simple syntax >(apol's for adopting -most- of your sample code )


Back to: Top of message | Previous page | Main SAS-L page