Date:         Mon, 28 Oct 1996 10:58:49 -0800
Reply-To:     matwood@ix.netcom.com
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         Max Atwood <matwood@IX.NETCOM.COM>
Subject:      Interesting Problem (Long)

Hi Y'all:

I am running a SAS program (v6.11) on a SPARC 1000 under Solaris 5.5. The data set I am trying to summarize has 5.9 million obs and is about 400 MB in size. Originally I wrote the main data step to take advantage of the array features in SAS. Basically, the data step reads an observation, does a couple of tests to see if the obs can be used, looks up a couple of parameters from a user-defined format, and then buckets a numeric field into an array. When it has gone through all the obs, it outputs the array in the form of a SAS data set.
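
In outline, the data step does something like this (names are simplified and hypothetical; the real tests and bucketing arithmetic are more involved):

   data summary;
      array bucket{200} _temporary_;          /* accumulator array          */
      set bigdata end=eof;
      if status ne 'A' then delete;           /* usability tests            */
      rate = input(put(code, ratefmt.), 8.);  /* user-defined format lookup */
      idx = ceil(amt / 100);                  /* map numeric field to a cell */
      if 1 <= idx <= dim(bucket) then
         bucket{idx} = sum(bucket{idx}, amt);
      if eof then do idx = 1 to dim(bucket);  /* write the array out at EOF */
         total = bucket{idx};
         if total ne . then output;
      end;
      keep idx total;
   run;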

I originally wrote the program last spring. At the time, it ran well on a data set that had the same structure but 7.2 million obs, and the CPU time for the entire program was between 2 and 3 hours. The code and results were thoroughly tested (at least as much as is humanly possible). All was fine.

Now when I run the process on a new data set, the CPU time has jumped to over 20 hours. In addition, the final output (summary file) has zero observations. I have checked the code and the data, and the problem is not there. (The code is identical, and the data has the same structure and characteristics as the old data.) There have been some changes to the system, however. Another area of our company has been working more intensely with SYBASE and has made a few unspecified "modifications" to improve the SYBASE performance. (The SYSOP assures me that these should not affect me or my SAS applications.) The operating system has also been upgraded recently from the old Solaris 5.3 to the newer Solaris 5.5. We have added disk arrays, a total of 50 GB in new storage. (During one of the hardware upgrades, a controller card failed and required replacement.) Finally, the system was physically moved from San Francisco to Sacramento. (I now access it through a LAN to a T1 telephone connection to a token ring. Formerly, the server was connected directly to the LAN.)

There are two changes in the process that I have noticed between the old and new runs. First is the enormous change in the run time. I am at a total loss to explain this. The entire process is run on the server; I have access via an X-window session set up through eXceed 5.1. (My local system acts strictly as a dumb terminal.) If anything, the resources available to the process have improved. The array in the process is admittedly huge; my calculations place it at about 10 to 15 MB. I have used the -memsize option to increase the available RAM from the default 32 MB to 64 MB, with no effect. (There are 288 MB of RAM available to the system in total.) There are more users logged into the system than there generally were last spring; the average seems to have increased from about 4 at any one time to 6-10 now. (However, running the process at night, when most other users are not logged in, does not seem to help.) I have also spent some time looking at the code in detail. I have used the debugger to step through the process under very tightly controlled conditions (test data, etc.). The code runs as expected, with no unusually long processes.
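
For reference, the invocation is essentially this (program name changed; FULLSTIMER just makes the log report memory and time for each step):

   sas -memsize 64M -fullstimer summarize.sas

and the setting can be confirmed from inside the session with:

   proc options option=memsize;
   run;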

The other result is the lack of data in the summary file. My first thoughts about this problem had to do with the data and code, but as I said above, checking this situation out proved fruitless. To make the situation more confusing, the lack of data seems more dependent on the amount of data read than anything else. When I reduced the data to the first 1,000 lines, the program produced 128 lines in the summary. I then increased the input to 10,000 lines and got only 97 lines of output. I increased the input to the first 50,000 lines and got 908 lines of output. The implication is that there are fewer levels in 10,000 lines than in 1,000. (???) (I expect about 30,000 lines of output for the entire data set.)
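
One cross-check that would pin the level counts down directly (the key variable here is hypothetical) is to de-dup the keys from the same slice of input and count what survives:

   proc sort data=bigdata(obs=10000) out=levels nodupkey;
      by bucketkey;
   run;

The observation count of LEVELS should match the summary line count for that slice; if it does not, the drop-off is happening inside the data step rather than in the data itself.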

Finally, I did not overlook the obvious test of rerunning the old data set under the more recent system conditions. However, somewhere along the way my old data file was corrupted, and I am unable to replace it. So I have to live without it for now.

After having racked my brain on this for some time now, I have run out of theories and options to test. I am hoping that by describing the situation in this kind of detail, it will strike a familiar chord with someone out there in SAS-land. I am now in the process of rewriting the program so that it breaks the data up into smaller segments and processes them over many iterations. This has improved the situation: my run time is now down to less than 5 hours, and the number of lines in the output is up to about 16,000 (about half of what is expected).
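
The rewrite slices the file with the FIRSTOBS= and OBS= data set options, roughly like this (segment counts hypothetical; note that OBS= names the last observation to read, not a count):

   %macro segsum(nseg=12, segsize=500000);
      %do i = 1 %to &nseg;
         %let first = %eval((&i - 1) * &segsize + 1);
         %let last  = %eval(&i * &segsize);
         data seg&i;                                /* same tests, lookup,  */
            set bigdata(firstobs=&first obs=&last); /* and bucketing as     */
            /* ... */                               /* in the original step */
         run;
      %end;
   %mend segsum;
   %segsum;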

I believe I am operating within the limits of SAS (???). Are there any parameters I should look at to improve the processing? Is anyone aware of any parameters that might be limiting me from Solaris (or UNIX in general)? (I did check to see that there are no limits placed on the amount of RAM I can access, both using the UNIX command "limit" and talking to the SYSOP.) Does SAS have any limits on how big an array can get or how much data it can hold? Any clues that can be sent my way would be very much appreciated!!!
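
For the record, the resource check amounted to the following (csh syntax); neither showed a restrictive setting:

   % limit              # list all per-process resource limits
   % limit datasize     # the ceiling most relevant to -memsize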

I appreciate your attention and look forward to the wonderful response I usually get from this group.

Thanks,

Max Atwood (matwood@ix.netcom.com)

