LISTSERV at the University of Georgia
Date:         Fri, 20 Oct 2000 04:34:48 -0400
Reply-To:     Gerhard Hellriegel <ghellrieg@T-ONLINE.DE>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Gerhard Hellriegel <ghellrieg@T-ONLINE.DE>
Subject:      Re: Any MVS efficiency tips (for large files, etc.)?

On Fri, 20 Oct 2000 02:52:23 GMT, Brad <b_branford@HOTMAIL.COM> wrote:

>Hi,
>
>I'd like to compile a list of MVS efficiencies which allow for faster
>file processing when dealing with HUGE datasets. I'd appreciate it if
>the people on this newsgroup, with their extensive experience, would
>share their tips on various aspects of using SAS with large datasets.
>Once I have a list I'll post it for everybody's benefit.
>
>Thanks for sharing.
>
>Brad
>
>
>Sent via Deja.com http://www.deja.com/
>Before you buy.

There are many such things!! First of all, there is a SAS book with a lot of tips. It was written when V6 was current, but most of the tips still apply to V8. There are many things you can do with program logic, and that is often more efficient than all other tuning tips. It depends on what you have to do. Sometimes you have to experiment: e.g. it is sometimes more efficient to use PROC APPEND instead of DATA ...; SET file1 file2; On the other hand, you can possibly avoid a sort with the second method. Which one is more efficient depends on the logic and the data. All in all: there are no tips which are "universal". Everything depends on the data, on what you do with the data, and on where you do it.
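A rough sketch of that PROC APPEND versus SET trade-off. The names work.file1, work.file2, work.both and the key variable id are made up for illustration, and both inputs are assumed to be sorted by id already. The two variants are alternatives, not steps of one program:

```sas
/* Hypothetical names: work.file1, work.file2, work.both, id.          */
/* Assumption: both input datasets are already sorted by id.           */

/* Variant A: the DATA step reads BOTH files completely, but with BY   */
/* it interleaves them, so work.both comes out in id order and needs   */
/* no PROC SORT afterwards.                                            */
data work.both;
  set work.file1 work.file2;
  by id;
run;

/* Variant B: PROC APPEND reads only work.file2 and adds its obs to    */
/* the end of work.file1. Less I/O, but work.file1 is no longer in id  */
/* order afterwards.                                                   */
proc append base=work.file1 data=work.file2;
run;
```

Which variant wins depends on the relative sizes of the two files and on whether a later step needs the sorted order.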

The main thing is: CPU operations are fast, I/O operations are slow. So avoiding I/O operations is worth much more than avoiding CPU consumption. The first thing is the BLKSIZE for SAS datasets. Depending on your device and your data, half-track is usually the best choice (27648 for 3390). For special applications another value may be better, so it is useful to try it out, e.g. try a multiple of the record length below half-track. If you have more than one dataset, put each of them in a separate library on a separate device (have a look at the specific volumes: how much traffic is there?). If you can, use only primary allocation, because allocating extents costs the OS time. If you can, use hiperspace in expanded memory for temporary datasets! Try working with compressed datasets. With those you can transport more data in one EXCP (depending on the compression rate). The cost is higher CPU consumption. You can experiment with the BUFNO option to get more buffers, but you will not see big effects. Between 4 and 10 buffers it may be a bit faster (elapsed time); with bigger numbers it gets slower again, because the overhead for the buffer handling increases. Most of that is true for DASD devices like 3380, 3390, ... but not for HIPERSPACE in memory.
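As a minimal sketch, these settings might be written as follows. The libref big, the physical name 'USERID.BIG.SASLIB', the dataset names and the space values are all assumptions for illustration, and the exact host options of the LIBNAME statement should be checked against the MVS companion for your release:

```sas
/* Hypothetical library: half-track blocks on a 3390 and primary       */
/* space only (secondary quantity 0), so no extents are allocated.     */
libname big 'USERID.BIG.SASLIB' disp=(new,catlg)
        blksize=27648 space=(cyl,(500,0));

/* A few more I/O buffers per dataset; a value to experiment with,     */
/* not a recommendation.                                               */
options bufno=5;

/* Compressed copy of a hypothetical dataset big.raw: fewer blocks to  */
/* transfer, more CPU per block.                                       */
data big.packed (compress=yes);
  set big.raw;
run;
```

Compare EXCP counts and elapsed time in the SAS log before and after each change, one change at a time.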

For big datasets and sorts you should always use external sort utilities, like SYNCSORT, DFSORT, ... Those utilities can also use hiperspace for their temporary buffers (SORT-WORK datasets!).

Give the SAS region plenty of memory. That avoids I/O operations for loading SAS modules and keeps more data in memory. SAS uses it, and the efficiency of many PROCs increases!

There are too many programming tips to list them all. They are almost the same as in other programming environments: use CPU, avoid I/Os, use the available memory... In SAS that means:
- throw away everything you don't need as early as possible (KEEP lists for input datasets, WHERE instead of IF, ...)
- avoid unnecessary steps (sorts, DATA steps, ...)
- avoid the use of "mighty" PROCs if you can do it with a small DATA step: some aggregations are much faster in a DATA step than with PROC SUMMARY.
- keep the code in loops (remember: each DATA step IS a kind of loop!) as small as possible.

%let n=0;
data a.b;
  set x.y;
  call symput("n",_n_);
  ....
run;

puts the number of obs in a.b into the macro variable &n (among other things). Better:

%let n=0;
data a.b;
  set x.y;
  /* call symput("n",_n_); */
  ....
run;

data _null_;
  set a.b nobs=n;
  call symput("n",n);
  stop;
run;

because in the first solution the call symput is executed once for every obs in a.b. OK, you could force it to be executed only once:

%let n=0;
data a.b;
  set x.y nobs=n;
  if _n_=1 then call symput("n",n);
  ....
run;

but in this case the branch (the IF test) is still executed for every obs.
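For the earlier points in the list (throw data away early, aggregate in a small DATA step), a minimal sketch could look like this. The variables region, amount and year and the output datasets are made up for illustration, and x.y is assumed to be sorted by region:

```sas
/* Hypothetical variables: region, amount, year.                       */
/* Assumption: x.y is already sorted by region.                        */

/* KEEP= drops unneeded variables and WHERE= filters the obs while     */
/* the dataset is being read, before they reach the DATA step logic.   */
data work.small;
  set x.y (keep=region amount year where=(year=2000));
run;

/* Simple total per region in a DATA step instead of PROC SUMMARY.     */
data work.totals (keep=region total);
  set work.small (keep=region amount);
  by region;
  if first.region then total = 0;   /* reset at the start of a group   */
  total + amount;                   /* sum statement: total is retained */
  if last.region then output;       /* one obs per region              */
run;
```

Whether the DATA step really beats PROC SUMMARY here depends on the data, so it is worth timing both.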

OK, you can say you want MVS tips for big datasets without any thoughts about programming. I think that is not the only right way! The sort problem is a good example: a quicksort or mergesort is always more efficient than a simple bubblesort. Always? No, only for big datasets and only if you do it more than once! To sort a dataset once, I'd always prefer the bubblesort which I can write in 5 minutes, even if I have to wait 1 hour until it's finished. If I need 2 hours to construct a sort which sorts my dataset in 5 minutes, that is only worthwhile for me if I can use it more than once!

When I have to do something in SAS, I always use the "quick and dirty" way first. If I have to do the same thing often, I try to optimize it a bit. If I have to do it very often and it is important for me to get it fast, I invest time to optimize it more! But not if I'm paid for producing results and it's cheaper for my site to buy a bigger machine than to pay me for reducing the resource consumption!

That is the context in which I see the things above: e.g. using the half-track size for DASD, without any experiments to squeeze out another millisecond, is OK. Using WHERE instead of IF is OK too, as is keeping a few rules in the back of my mind so that the code is not unnecessarily inefficient. A limit for me is for example: do I use the slow SASHELP views, or do I use a utility PROC (PROC CATALOG, DATASETS, ...) to get some information? First I ALWAYS use the SASHELP view. If I need that program in production and it is running every day, I replace the information extraction with a faster solution (if I have time). For a program which runs once a month, I'll do that only if someone tells me that it has only 5 minutes to run but needs 10.

So you see: always be careful with the expensive resources, not with the cheap ones!

Gerhard

