Date: Wed, 6 Aug 2003 03:16:00 +0000
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paul Dorfman <paul_dorfman@HOTMAIL.COM>
Subject: Re: Proc Format performance question
Content-Type: text/plain; format=flowed
You hit it right on the head. However, why would you think that the number
of ranges in the format (I assume this is what you meant by "members") would
*not* affect performance?
The underlying format structure is an AVL tree, whose search cost grows as
O(log2(N)), so in your case you should expect about log2(300000/20000) ~ 4
more comparisons per search (roughly 18 instead of 14). In terms of search
time the difference will not be stark, but it is still appreciable.
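The arithmetic behind that estimate can be checked with a short DATA step. This is only a sketch: it assumes a perfectly balanced tree and counts comparisons, not elapsed time, and the variable names are made up for illustration.

```sas
data _null_;
   n_old = 20000;                      /* approximate number of ranges before */
   n_new = 300000;                     /* approximate number of ranges now    */
   depth_old = log2(n_old);            /* ~ comparisons per search, before    */
   depth_new = log2(n_new);            /* ~ comparisons per search, now       */
   extra     = log2(n_new / n_old);    /* equals depth_new - depth_old        */
   ratio     = depth_new / depth_old;  /* relative growth in comparisons      */
   put depth_old= 6.2 depth_new= 6.2 extra= 6.2 ratio= 6.2;
run;
```

The log should show a depth of about 14.3 before and 18.2 now, so log2(300000/20000) ~ 3.9 is the number of extra comparisons per search, while the ratio of the two depths is only about 1.27.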
(In)formats were actually *not* originally designed to serve as huge
lookup tables. It is the resourcefulness of SAS programmers that has
pushed formats to this extreme. Nor is the AVL tree the fastest lookup
structure; I suspect it was chosen because it scales very well and
guarantees O(log2(N)) search behavior *regardless* of the
input key distribution. Hence the advent of the hash object
in the V9 DATA step. If you are still running V8 and performance is a
paramount consideration, try hand-coded hash schemes. You can find plenty of
them already SAS-implemented and tested in SUGI 26-28 papers by Gregg Snell
and yours truly; the one from 26 gets the most nitty-gritty with the
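For reference, a minimal sketch of a V9 hash-object lookup doing the job of BOB=PUT(JANE,$ARRDECO.) might look like the following. The data set and variable names (WORK.LOOKUP, CODE, DECODE, WORK.KEYS) and the sample values are made up for illustration and are not from the original question; the real lookup table would hold the 300k key/value pairs behind $ARRDECO.

```sas
/* Hypothetical lookup table: one row per key with its decoded value. */
data work.lookup;
   length code $8 decode $20;
   input code $ decode $;
   datalines;
A001 Alpha
B002 Bravo
;
run;

/* Hypothetical input: the values to be decoded. */
data work.keys;
   length jane $8;
   input jane $;
   datalines;
A001
Z999
B002
;
run;

data work.decoded;
   length code $8 decode $20 bob $20;
   if _n_ = 1 then do;
      /* Load the lookup table into memory once, on the first iteration. */
      declare hash h(dataset: 'work.lookup');
      h.defineKey('code');
      h.defineData('decode');
      h.defineDone();
      call missing(code, decode);   /* avoids "uninitialized" notes */
   end;
   set work.keys;
   /* FIND returns 0 when the key exists and then fills DECODE. */
   if h.find(key: jane) = 0 then bob = decode;
   else bob = ' ';
   drop code decode;
run;
```

Note one behavioral difference in this sketch: an unmatched key leaves BOB blank here, whereas PUT with a format that has no OTHER range passes the input value through, so adjust the ELSE branch if you rely on that.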
Paul M. Dorfman
>From: RICH0850 <rich0850@AOL.COM>
>Poor performance question.
>I've got a format library. It's got a single member "$ARRDECO". It used to be
>small (less than 20k entries). Now it's 300k plus entries.
>Now, referencing this format takes a lot of time (both CPU and real time).
>Such as BOB=PUT(JANE,$ARRDECO.);
>Prior to this it was fast, now it's slow.
>Is the number of members in a format a problem... or what?