LISTSERV at the University of Georgia
Date:         Fri, 10 Sep 2010 15:43:06 -0400
Reply-To:     Andy Arnold <awasas@COX.NET>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Andy Arnold <awasas@COX.NET>
Subject:      Hash - 6 Questions

Greetings, All.

I apologize if this is a multiple post. I think I sent this earlier today, but I can't find it on the list. So here it is again.

I considered making this into multiple posts, but decided to keep it all together. The overall goal is to improve the performance of jobs that have suffered from a recent, massive increase in data volume (from a few million to 50 million and growing). I've already made significant reductions in run time, but I need more. So hash tables are my next learning curve. (BTW, the 50M records get exploded into 300M records, which then go through a Proc Summary/nway with 6 classes.) I've done some test code and experiments, but I still have a few questions about hash objects; so here they are.

Thanks for your reading time. --Andy

1. How large is large? Many Hash Object discussions on the web indicate that large datasets should see performance improvements when using Hash Objects instead of Proc Summary or a dataset merge. I've converted 3 steps (2 Summary and a merge) to a single step using 3 Objects, and the usage numbers show little or no improvement. However, I see a definite improvement using a Proc Summary/nway replacement to reduce 50M records into 50K.
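For reference, a minimal sketch of the kind of Proc Summary/nway replacement meant in question 1. It is not the production code: the dataset names (work.detail, work.summary) and variables (class1, class2, amount, total) are hypothetical, and it uses two class variables where the real job has six.

```sas
/* Hypothetical detail data; all names here are illustrative only. */
data work.detail;
  input class1 $ class2 $ amount;
  datalines;
A X 10
A X 5
B Y 7
A Z 1
;
run;

data _null_;
  /* ordered:'a' asks for ascending key order when the table is output */
  declare hash agg(ordered: 'a');
  agg.defineKey('class1', 'class2');
  agg.defineData('class1', 'class2', 'total');
  agg.defineDone();

  do until (done);
    set work.detail end=done;
    /* find() loads the running total for this key; start from 0 if absent */
    if agg.find() ne 0 then total = 0;
    total = sum(total, amount);
    /* write the updated total back to the hash */
    agg.replace();
  end;

  /* one output row per distinct key combination */
  agg.output(dataset: 'work.summary');
  stop;
run;
```

The hash table holds one item per distinct key combination, so memory use depends on the number of output groups rather than the number of input records.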

2. Do Hash Objects perform better with a single, large key than with a composite key of comparable size? If so, is the improvement enough to offset the 'cost' of concatenating and de-catenating the components?

3. How do Hash Objects handle null/missing values in a component key? I expect a completely missing key will fail to store. Will a 3-component key store the item if the keys are A=1 B=missing C=3?

4. Does HITER get lost/confused when it REMOVEs an item from the Object? I assume the answer is no, but I need to be told. I've found nothing on the web indicating that I need to take special care, so I assume that after removing the 4th item in the hash the HI_mail.next will look at the new 4th item (old 5th). I know that I usually get lost and screw up the REMOVE function when I write my own iterators.

5. Does REPLACE also do an ADD if the key/item is missing from the hash object? Is this always true, or are there special cases?

6. Given that the Program Data Vector and a hash object are separate storage areas, are all of the following hash function descriptions correct?

   Check:   Verify that the item exists in the hash; PDV and hash are both unchanged.
   Find:    Locate the item in the hash; if the item is found, copy its data fields to the PDV, otherwise make no change to the PDV.
   Add:     Create the item in the hash and copy PDV data to the item.
   Replace: Locate the item in the hash and copy PDV data to the item.
   Remove:  Locate the item in the hash and remove its key and data from the hash.
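A small test step along the following lines could be used to check the descriptions in question 6 against the log. This is only a sketch: the variable names (k, v, rc, n) are made up for illustration, and no expected log output is claimed here.

```sas
data _null_;
  length k 8 v $8;

  declare hash h();
  h.defineKey('k');
  h.defineData('v');
  h.defineDone();

  /* Add: store key 1 with the data currently in the PDV */
  k = 1; v = 'one';
  rc = h.add();      put 'add      ' rc= k= v=;

  /* Change v in the PDV only, then see whether check() touches it */
  v = 'changed';
  rc = h.check();    put 'check    ' rc= k= v=;

  /* Find on an existing key: compare v before and after */
  rc = h.find();     put 'find(1)  ' rc= k= v=;

  /* Find on a key that was never added: is v left alone? */
  k = 2; v = 'two';
  rc = h.find();     put 'find(2)  ' rc= k= v=;

  /* Replace on that same absent key (also relevant to question 5) */
  rc = h.replace();
  n = h.num_items;   put 'replace  ' rc= k= v= n=;

  /* Remove key 1 and confirm the item count drops */
  k = 1;
  rc = h.remove();
  n = h.num_items;   put 'remove   ' rc= k= n=;
run;
```

Each method's return code is captured in rc, so a failed lookup is reported by the put statement instead of raising an error in the log.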

