LISTSERV at the University of Georgia
Date:         Wed, 12 Jan 2000 12:40:28 -0500
Reply-To:     Douglas Dame <dougdame@HPE.UFL.EDU>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Douglas Dame <dougdame@HPE.UFL.EDU>
Subject:      Re: Anyone with a macro to split file (w a by statement)?
Comments: To: Howard Schreier <Howard_Schreier@ITA.DOC.GOV>
Comments: cc: cruguel@DMS.UMONTREAL.CA
In-Reply-To:  <s87c663a.064@ita.doc.gov>
Content-Type: text/plain; charset="us-ascii"

Howard Schreier wrote:

> I would try to avoid running any SAS step or
> series of steps that many times.
>
> <snip snip>
>
> You did not give a lot of particulars, and perhaps something
> rules out this approach. But it's always worth a look before
> getting into something messy.

In the words of (?) Oscar Wilde, I wish I had said that.

While running through a bunch of stuff repetitively in a macro-powered loop sometimes works well, bear in mind that the CPU (and clock) time associated with running any proc or data step has two main parts: a more-or-less linear variable component associated with the size of the dataset you're processing, and a more-or-less fixed up-front "overhead" cost of loading the proc into memory so it's ready for use. (There's also some hardware-related latency associated with the disk drive getting its head/s to the data; I'll ignore that, since I try to live way above the hardware level.)

The cumulative time it takes to invoke proc summary, as an example, 1700+ times is not inconsequential by any means. At a guess, say the overhead cost is 0.7 CPU seconds per invocation. The extra overhead needed to invoke that proc alone 1700 times, instead of once on a much larger dataset, is 1700 x 0.7 = 1190 CPU seconds, or about 19.8 MINUTES, with exactly the same amount of data being processed. Throw a few more procs or data steps of various kinds into the loop, and you're easily looking at an hour or two of additional CPU time being chewed up. Elapsed wall-time would depend on your computing environment: maybe 1.2 to 1.5 times the CPU seconds on a one-person workstation, perhaps as bad as 10 times as much for a medium/low priority job in a busy batch mainframe environment where it gets swapped out a lot for fairly extended periods.

If you're forced to loop through some section of code 100's or 1000's of times to deal individually with subsets of your data, it pays to think hard about how much of your pre-processing can be accomplished PRIOR to MrMacroLoop, using "by-group" processing.
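
To make that concrete, here is a minimal sketch of the two styles side by side. All the names in it are hypothetical (a dataset work.claims, a numeric grouping variable site_id running from 1 to N, and an analysis variable charge); they are not from the original question, so adjust to taste.

```sas
/* Hypothetical names throughout: work.claims, site_id, charge. */

/* MrMacroLoop style: one PROC SUMMARY invocation per subset.   */
/* Each pass pays the fixed start-up overhead again.            */
%macro per_site(n);
  %do i = 1 %to &n;
    proc summary data=work.claims(where=(site_id = &i));
      var charge;
      output out=work.stats_&i mean=mean_charge sum=total_charge;
    run;
  %end;
%mend per_site;
/* %per_site(1700)   <- 1700 separate proc invocations */

/* By-group style: sort once, invoke the proc once.             */
proc sort data=work.claims out=work.claims_sorted;
  by site_id;
run;

proc summary data=work.claims_sorted;
  by site_id;
  var charge;
  /* one output row per value of site_id */
  output out=work.stats_all mean=mean_charge sum=total_charge;
run;
```

The by-group version produces one dataset with a row per site_id instead of 1700 little datasets, so whatever genuinely has to stay inside the loop can pull its numbers from work.stats_all rather than recomputing them on every pass.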

As an extreme but real example, I once re-engineered the macro-loops in a production job that was generating complaints due to a long run time. By moving pre-processing to by-groups, and leaving only the bare minimum of stuff in MrMacroLoop that absolutely had to be there, a 6-hour-plus run-time was reduced to 9 minutes. (Following that, my reputation was sterling for the rest of the day. Ah, those were the good days.)

(The floor will now entertain discussion on today's debate topic: "SAS macro do/end loops are available to the community of SAS programmers due to a diabolical long-standing conspiracy between Mr. Watson, Mr. Grove, Mr. Gates, Dr. Goodnight, and the National Association of Public Utilities, with the express secret objective of increasing sales of computer hardware and usage of electricity; True or False?" </little_joke>)

HTH/somebody/someplace/sometime

Douglas Dame
Shands HealthCare
Gainesville FL

