Date: Fri, 10 Sep 2010 15:43:06 -0400
Reply-To: Andy Arnold <awasas@COX.NET>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Andy Arnold <awasas@COX.NET>
Subject: Hash - 6 Questions
I apologize if this is a multiple post. I think I sent this earlier today,
but I can't find it on the list. So here it is again.
I considered making this into multiple posts, but decided to keep it all
together. The overall goal is to improve the performance of jobs that have
suffered from a recent, massive increase in data volume (from a few million
to 50 million and growing). I've already made significant reductions in run
time, but I need more. So hash tables are my next learning curve. (BTW,
the 50M records get exploded into 300M records, which then go through a Proc
Summary/nway with 6 classes.) I've done some test code and experiments, but
I still have a few questions about hash objects; so here they are.
Thanks for your reading time.
1. How large is large? Many Hash Object discussions on the web indicate
that large datasets should see performance improvements when using Hash
Objects instead of Proc Summary or a dataset merge. I've converted 3 steps
(2 Summary and a merge) to a single step using 3 Objects, and the usage
numbers show little or no improvement. However, I see a definite
improvement using a Proc Summary/nway replacement to reduce 50M records into
2. Do Hash Objects perform better with a single, large key than with a
composite key of comparable size? If so, is the improvement enough to
offset the 'cost' of concatenating and de-catenating the components?
3. How do Hash Objects handle null/missing values in a component key? I
expect a completely missing key will fail to store. Will a 3-component key
store the item if the keys are A=1 B=missing C=3?
4. Does HITER get lost/confused when it REMOVEs an item from the Object? I
assume the answer is no, but I need to be told. I've found nothing on the
web indicating that I need to take special care, so I assume that after
removing the 4th item in the hash the HI_mail.next will look at the new 4th
item (old 5th). I know that I usually get lost and screw up the REMOVE
function when I write my own iterators.
5. Does REPLACE also do an ADD if the key/item is missing from the hash
object? Is this always true or are there special cases?
6. Given that the Program Data Vector and a hash object are separate
storage areas, are all of the following hash function descriptions correct?
Check: Verify that the item exists in the hash; PDV and hash are both
unchanged.
Find: Locate item in hash; if the item is found, copy its data fields to
the PDV, otherwise make no change to the PDV.
Add: Create item in the hash and copy PDV data to the item.
Replace: Locate item in the hash and copy PDV data to the item.
Remove: Locate the item in hash and remove its key and data from the hash.
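
For reference, a minimal DATA step sketch of the kind of experiment that could check the descriptions in question 6, and the REPLACE behaviour asked about in question 5. The names h, k, d, rc and n are made up for illustration, and the step only prints return codes and values to the log; it does not assume what the answers are.

```sas
data _null_;
  length k 8 d $8;

  /* Hypothetical one-key, one-data-field hash object */
  declare hash h();
  rc = h.defineKey('k');
  rc = h.defineData('d');
  rc = h.defineDone();
  call missing(k, d);

  /* ADD: try to store the current PDV values of k and d */
  k = 1; d = 'one';
  rc = h.add();
  put 'add       ' rc=;

  /* CHECK: change d in the PDV first, then look up k=1;
     printing d afterwards shows whether the PDV was touched */
  d = 'pdv';
  rc = h.check();
  put 'check     ' rc= d=;

  /* FIND with a key that was added above */
  rc = h.find();
  put 'find hit  ' rc= d=;

  /* FIND with a key that was never added */
  k = 2; d = 'pdv';
  rc = h.find();
  put 'find miss ' rc= d=;

  /* REPLACE with a key that is not in the hash (question 5);
     the item count shows whether anything was stored */
  rc = h.replace();
  n = h.num_items;
  put 'replace   ' rc= n=;

  /* REMOVE the key currently in the PDV (k=2) */
  rc = h.remove();
  n = h.num_items;
  put 'remove    ' rc= n=;
run;
```

Comparing the rc=, d= and n= values in the log against the five descriptions should show where, if anywhere, they are off. Every method call is assigned to rc so that a failed lookup is reported as a return code rather than as an error in the log.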