LISTSERV at the University of Georgia
Date:         Fri, 20 Jun 2003 10:31:10 -0700
Reply-To:     cassell.david@EPAMAIL.EPA.GOV
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject:      Re: How to speed up a SAS process?
Content-type: text/plain; charset=iso-8859-1

SAS User <sasuser@GUILDENSTERN.DYNDNS.ORG> sagely replied:
>
> on Thu, Jun 19, 2003 at 05:22:54PM -0700, Annie Chang (chang5a@YAHOO.COM) wrote:
> > When I deal with large datasets, say with 10 millions of observations
> > of 200 variables, and try to sort it, sometimes it works reasonably
> > fine (I used option "tagsort"), sometime for the same datasets, it
> > takes forever to run and never finishes. I was sort of puzzled since
> > the CPU doesn't look busy at all and somehow SAS just decided to take
> > a break in this kind of situations.
> >
> > Is something I could do in addition to get a better computer? (I do
> > have one with very large HD and 500 MB though).

> Don't sort your data.
>
> Revisit your processing. Identify what you need to do with the dataset.
> SAS offers a number of tools (CLASS processing, KEEP & DROP statements,
> WHERE= dataset options) which can reduce dataload or eliminate need for
> sorts. Array processing is another trick (there are examples in the
> SAS-L archives of people creating arrays with tens of millions of
> elements). Unconventional thinking can have impressive payoffs.
> Googling "sas efficiency" will provide a number of references.
>
> Tagsort in particular is optimized for "fat" datasets -- fewer rows,
> many columns.

Exactly. The eponymous "SAS User" knows of what s/he speaks.
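To make the CLASS processing point in the quoted reply concrete: in SAS, PROC MEANS or PROC SUMMARY with a CLASS statement does not require sorted input, while a BY statement does. As a language-neutral sketch of why a grouped summary needs no sort (an analogy, not how SAS implements CLASS), the Python below makes one pass over unsorted rows and keeps a running sum and count per group in a dictionary. The function name and the field names are hypothetical.

```python
from collections import defaultdict


def class_means(rows, class_var, analysis_var):
    """Mean of analysis_var for each level of class_var, in one pass, no sort.

    Rough analogue of a CLASS-style summary: each row updates a running
    (sum, count) pair stored in a dict keyed by the class value.
    """
    totals = defaultdict(lambda: [0.0, 0])  # class value -> [sum, count]
    for row in rows:
        acc = totals[row[class_var]]
        acc[0] += row[analysis_var]
        acc[1] += 1
    return {level: s / n for level, (s, n) in totals.items()}


# Hypothetical unsorted data: 'region' plays the role of a CLASS variable.
data = [
    {"region": "west", "sales": 10.0},
    {"region": "east", "sales": 4.0},
    {"region": "west", "sales": 20.0},
    {"region": "east", "sales": 6.0},
]
print(class_means(data, "region", "sales"))  # {'west': 15.0, 'east': 5.0}
```

The memory cost grows with the number of distinct class levels rather than the number of rows, which is why this approach works well when there are few groups and many observations.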

If you're hitting a boundary like this, you may have just barely enough work space on your hard drive to sort the file. SAS likes to have roughly 3 times the size of the file available for optimal sorting, and the sort itself is done primarily as reads and writes on that drive. So you would expect to see minimal CPU usage while the disk drives struggle mightily to write and rewrite the data.
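As a back-of-the-envelope check of the 3x rule of thumb against the dataset in the original question, the sketch below assumes every variable is an 8-byte numeric (the SAS default numeric length) and ignores dataset headers, page overhead and compression. The function name and the multiplier parameter are illustrative, not part of any SAS API.

```python
def sort_work_space_bytes(n_obs, n_vars, bytes_per_var=8, multiplier=3.0):
    """Rough disk space to budget for a sort, using the ~3x-file-size rule.

    Assumes fixed-width variables (8 bytes each by default) and ignores
    dataset headers, page overhead and compression.
    Returns (estimated file size, estimated space to budget for the sort).
    """
    file_bytes = n_obs * n_vars * bytes_per_var
    return file_bytes, file_bytes * multiplier


file_bytes, work_bytes = sort_work_space_bytes(10_000_000, 200)
GB = 10**9
print(f"data file ~{file_bytes / GB:.0f} GB, budget ~{work_bytes / GB:.0f} GB for the sort")
# data file ~16 GB, budget ~48 GB for the sort
```

Under those assumptions, 10 million observations of 200 numeric variables come to about 16 GB, so a sort wants roughly 48 GB of free work space. Because the estimate scales linearly with the number of variables, a KEEP= list that cuts 200 variables down to 20 cuts it tenfold.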

If you really need to do complex sorting, may I suggest my paper:

David L. Cassell, "A Sort of a Mess -- Sorting Large Datasets on Multiple Keys", SUGI 26, Paper 121-26. http://www2.sas.com/proceedings/sugi26/p121-26.pdf

Alternatively, I really recommend that you consider restructuring your entire process. If you find yourself sorting and re-sorting your data, then don't. Look at indexing instead. If you find yourself sorting just to pull out small pieces based on some combination of variables, then don't. Look at indexing, or at DATA step programming, as an alternative. There are tons of ways out of this bind if you have the time to sit and think.

I wrote the above paper when I was faced with sorting gigabytes of data, adding random keys, then re-sorting, and so on. When the job grew to 13 consecutive sort-and-DATA-step passes, I was forced to take the old code and rethink the process. It now runs as one PROC DATASETS (to build an index), one DATA step, and then another index build. No sorting.
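The idea behind "index instead of sort" can be shown outside SAS: build a lookup structure once, then fetch the rows for any key combination directly, leaving the data in its original order. The Python sketch below uses a dictionary of row positions as a stand-in for a composite SAS index. It is an analogy only, not how SAS stores its index files, and the function and variable names are hypothetical.

```python
from collections import defaultdict


def build_index(rows, key_vars):
    """Map each composite key to the positions of the rows that carry it.

    One pass over the data; the rows themselves are never reordered.
    """
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[tuple(row[v] for v in key_vars)].append(pos)
    return index


def lookup(rows, index, key):
    """Return the rows for one key combination, in their original order."""
    return [rows[pos] for pos in index.get(key, [])]


# Hypothetical data in arbitrary (unsorted) order.
rows = [
    {"site": "A", "year": 2002, "value": 1.5},
    {"site": "B", "year": 2003, "value": 2.0},
    {"site": "A", "year": 2003, "value": 3.5},
    {"site": "A", "year": 2002, "value": 4.0},
]
idx = build_index(rows, ["site", "year"])
print([r["value"] for r in lookup(rows, idx, ("A", 2002))])  # [1.5, 4.0]
```

Building the index costs one pass and some storage up front; after that, each small subset is a direct lookup rather than another full sort, which is the trade-off behind replacing repeated sort-and-DATA-step passes with an index plus a single DATA step.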

HTH,
David
--
David Cassell, CSC
Cassell.David@epa.gov
Senior computing specialist
mathematical statistician

