| Date: | Tue, 10 Jan 2006 11:46:41 -0800 |
| Reply-To: | David L Cassell <davidlcassell@MSN.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | David L Cassell <davidlcassell@MSN.COM> |
| Subject: | Re: Hash Table Memory Limitations |
| In-Reply-To: | <200601101450.k0ABkDgS009578@mailgw.cc.uga.edu> |
| Content-Type: | text/plain; format=flowed |
|---|
sezzy@IHCIS.COM wrote back:
>Thank you everyone for your comments on this. I need to apologize for
>the inadequacy of my example to model the real-life application. The
>example is deficient in two aspects:
>
>1. I didn't make the size of the data table (med_test1) nearly large
>enough in my example. The real data set has about 5,000,000 keys to
>look up, not 5,000.
Perhaps you could explain the full size of your problem. If you have 5M
keys to look up, then how big is your base data set? If you have a
trillion-record base data set and 5 million keys to look up, then you
may benefit from re-designing your process. For now, if this is more
your problem, then try something simple, like split-and-combine.
Split your 'keys' data set into 5 pieces, each of a million keys,
assuming your system will handle this. Run the code you have already
designed, assigning keys and satellite data to the huge data set each
time, until you have assigned all 5M keys using hashes. Now you have 5
data steps instead of one, but no more hideous crashes.
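
To make the split-and-combine idea concrete, here is a rough sketch of the logic in Python rather than SAS. It is only an illustration under assumptions: the names `chunked`, `split_and_combine`, `key`, and `satellite` are made up for the example, the toy data stands in for the real tables, and a dict stands in for the hash object. In the real job each pass would be its own data step reading the huge data set.

```python
from itertools import islice


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def split_and_combine(base_rows, key_rows, chunk_size):
    """Attach satellite data to base_rows, one small lookup table per pass."""
    # Copy the rows so the caller's data is not modified.
    result = [dict(row) for row in base_rows]
    for chunk in chunked(key_rows, chunk_size):
        # Only this piece of the keys table is held in memory at a time.
        lookup = {key: satellite for key, satellite in chunk}
        for row in result:
            # Rows matched in an earlier pass keep their value;
            # rows with no match in this piece are left unchanged.
            if row["key"] in lookup:
                row["satellite"] = lookup[row["key"]]
    return result


# Hypothetical toy data: 4 base rows, 5 keys, split into pieces of 2 keys.
base = [{"key": k} for k in ("a", "b", "c", "z")]
keys = [("a", 1), ("b", 2), ("c", 3), ("d", 4), ("e", 5)]
combined = split_and_combine(base, keys, chunk_size=2)
```

The trade-off is the one described above: more passes over the big table in exchange for a lookup table that never holds more than one piece of the keys.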
>2. Although the lookup table is sorted for this lookup, we actually
>have two more look-ups to do with the data set (with fields other than
>member & dtsc_cd); in order to use Paul's technique, these lookups
>would require the data set to be sorted two additional times. This
>might be the way to go, depending on how long it would take to do the
>sorting and look up 5M keys.
>
>The way the program is written, it would be difficult to switch the
>tables, but, we are considering scaling-down the hash table.
Well, try the above idea and see if it saves you any grief. If not, then
throw it away.
>FYI, I just learned from the folks at SAS tech support that "... when a
>large amount of memory is requested by the hash object, the hash object
>structure which requests the memory can be overloaded. This cause[s]
>the premature out of memory error to occur." They said that there will
>be a fix in SAS 9.1.3 service pack 4 (should be available in March)
>which prevents the overload. The fix will include a new error message
>which tells how many keys were loaded into the hash table before the
>out of memory condition occured.
Great. Then I can know how big a hash I got before the hideous crash.
No, actually that could be helpful. As in my above weird suggestion, you
would know how big the pieces of your lookup table could be and still
maintain a (close to) stable system.
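
As a back-of-the-envelope illustration of how that count could be used, here is a small Python sketch. The figure of 1,500,000 keys loaded before the failure is purely hypothetical, as are the function name `plan_pieces` and the 80% safety margin; only the 5,000,000 total comes from the thread.

```python
def plan_pieces(total_keys, keys_loaded_before_failure, safety_percent=80):
    """Pick a piece size below the observed limit and count the passes needed."""
    # Stay under the point where the out-of-memory error was reported.
    piece_size = keys_loaded_before_failure * safety_percent // 100
    if piece_size < 1:
        raise ValueError("piece size must be at least one key")
    # Integer ceiling division: a partial last piece still needs its own pass.
    passes = (total_keys + piece_size - 1) // piece_size
    return piece_size, passes


# 5,000,000 keys in total; 1,500,000 is an assumed, made-up failure point.
piece_size, passes = plan_pieces(5_000_000, 1_500_000)
```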
HTH,
David
--
David L. Cassell
mathematical statistician
Design Pathways
3115 NW Norwood Pl.
Corvallis OR 97330
|