LISTSERV at the University of Georgia
Date:         Thu, 3 Jul 2003 16:21:06 -0700
Reply-To:     cassell.david@EPAMAIL.EPA.GOV
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject:      Re: A computation question
Content-type: text/plain; charset=us-ascii

Ken Keung <1800okla@HANMAIL.NET> wrote [in part]:
> OBS  C1  C2  C3  HOW_MANY  TEMP
>   1   1   0   0         1     4
>   2   0   1   0         1     1
>   3   0   0   1         1     3
>   4   1   1   0         2    16
>   5   1   0   1         2    13
>   6   0   1   1         2     9
>   7   1   1   1         3    22
>
> . . . .
>
> Now, I want to produce the sas output like this.
>
> OBS  C1  C2  C3  HOW_MANY
>   1   4   .   .         1
>   2   .   1   .         1
>   3   .   .   3         1
>   4  15  12   .         2
>   5  10   .   9         2
>   6   .   6   8         2
>   7  13   9   6         3
>
> Now the challenge is that I have 20 components, NOT 3 as shown above.
> That means there are 1,048,575 observations
> (2 to the power of 20 minus 1) in the dataset.
> After approximately 20 hours (YES! hours) waiting, my computer (1.7GHZ and
> 512MB memory) couldn't produce the output.

Umm, I hate to sound too critical, but you have made this more difficult than you desire.

[1] You did not state your problem clearly enough. I cannot see precisely how you plan to get from your first data set to your second. Do you actually have all (2**20-1) * K entries in a single data set, and just want to do the subtractions?
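
One reading that happens to reproduce all seven sample rows is that each non-missing entry is TEMP for the row minus TEMP for the same combination with that one component switched off. That rule is inferred from the sample only; the original post does not state it. A minimal sketch in Python (not SAS), where `marginal_row` is a hypothetical helper name and the TEMP of the empty combination is assumed to be 0:

```python
# TEMP for each (C1, C2, C3) combination, copied from the quoted sample table.
temp = {
    (1, 0, 0): 4,
    (0, 1, 0): 1,
    (0, 0, 1): 3,
    (1, 1, 0): 16,
    (1, 0, 1): 13,
    (0, 1, 1): 9,
    (1, 1, 1): 22,
}

def marginal_row(combo, temp):
    """Assumed rule: TEMP(combo) minus TEMP(combo with one component off)."""
    row = []
    for i, flag in enumerate(combo):
        if not flag:
            row.append(None)  # component absent: SAS would show a missing value
            continue
        reduced = combo[:i] + (0,) + combo[i + 1:]
        # The all-zero combination is not in the data; its TEMP is assumed to be 0.
        row.append(temp[combo] - temp.get(reduced, 0))
    return row

for combo in temp:
    print(combo, marginal_row(combo, temp), sum(combo))
```

If this reading is right, each output row needs only a handful of keyed lookups into the same table, not a match against every other record.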

[2] You did *not* show your code, so we cannot see what is going wrong. You didn't, by chance, try to do a Cartesian join to do all the matching, thus making your problem increase vastly in size? Trying to merge a million records with a million records is going to take a *LOT* longer than trying to merge 2**15-1 records with itself.
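
To put rough numbers on that, here is a small Python sketch (not SAS) of how many row pairs an unrestricted self-join would have to consider. Whether a Cartesian join was actually used is not known from the original post; `record_count` and `cartesian_pairs` are hypothetical helper names.

```python
def record_count(k):
    """Number of non-empty on/off combinations of k components."""
    return 2**k - 1

def cartesian_pairs(k):
    """Row pairs generated by joining the table with itself, with no key."""
    n = record_count(k)
    return n * n

for k in (3, 15, 20):
    print(f"k={k:2d}  records={record_count(k):>9,}  pairs={cartesian_pairs(k):>17,}")
```

With 20 components that is about 1.1 trillion pairs, roughly a thousand times the 15-component case.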

[3] You are making this increase exponentially with the number of components, so of course it is taking significantly longer as the number of components goes up. Your algorithm rapidly becomes crucial, and you didn't show it to us.

[4] Your algorithm does not appear to have unique solutions, since the differences are not constant. You appear to have interactions between components which your 'difference' table is omitting. The interactions may be the most important part of the process. Are you sure you really want to do the analysis like this?
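
The non-constant differences can be read straight off the desired-output table in the quoted post. A short Python sketch (not SAS) that collects the distinct values listed for each component; `distinct_differences` is a hypothetical helper name, and `None` stands in for a SAS missing value:

```python
# Desired output rows (C1, C2, C3), copied from the quoted post.
desired = [
    (4, None, None),
    (None, 1, None),
    (None, None, 3),
    (15, 12, None),
    (10, None, 9),
    (None, 6, 8),
    (13, 9, 6),
]

def distinct_differences(rows, j):
    """Sorted distinct non-missing values in column j."""
    return sorted({row[j] for row in rows if row[j] is not None})

for j, name in enumerate(("C1", "C2", "C3")):
    print(name, distinct_differences(desired, j))
```

If the components acted additively, each column would hold a single repeated value. Here every component shows four different values, one per combination it appears in, which is what points to interactions.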

[5] There is no way of telling from your description what your process is, or what you are trying to achieve, or why you need these differences, or what good they will do you when you are looking at over a million of them.

Please write back to the list (not to me personally) with some answers to the above questions, and try to help us out, so that we may try to help you out.

HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

