Date: Fri, 20 Jun 2003 10:31:10 -0700
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "David L. Cassell" <cassell.david@EPAMAIL.EPA.GOV>
Subject: Re: How to speed up a SAS process?
Content-type: text/plain; charset=iso-8859-1
SAS User <sasuser@GUILDENSTERN.DYNDNS.ORG> sagely replied:
> > on Thu, Jun 19, 2003 at 05:22:54PM -0700, Annie Chang
> > When I deal with large datasets, say with 10 million records
> > of 200 variables, and try to sort them, sometimes it works reasonably
> > well (I used the option "tagsort"); sometimes, for the same datasets, it
> > takes forever to run and never finishes. I was sort of puzzled, since
> > the CPU doesn't look busy at all and somehow SAS just decided to take
> > a break in this kind of situation.
> > Is there something I could do other than getting a better computer? (I do
> > have one with a very large HD and 500 MB, though.)
> Don't sort your data.
> Revisit your processing. Identify what you need to do with the data.
> SAS offers a number of tools (CLASS processing, KEEP & DROP and
> WHERE= dataset options) which can reduce dataload or eliminate the need for
> sorts. Array processing is another trick (there are examples in the
> SAS-L archives of people creating arrays with tens of millions of
> elements). Unconventional thinking can have impressive payoffs.
> Googling "sas efficiency" will provide a number of references.
> Tagsort in particular is optimized for "fat" datasets -- fewer rows,
> many columns.
Exactly. The eponymous "SAS User" knows of what s?he speaks.
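As a minimal sketch of the kind of thing SAS User is describing, here is a summary done with CLASS processing plus KEEP= and WHERE= dataset options, so that no PROC SORT is needed first. The dataset and variable names (WORK.BIG, REGION, YEAR, AMOUNT, STATUS) are hypothetical, not taken from the original question.

```sas
/* Hypothetical names throughout: WORK.BIG, REGION, YEAR, AMOUNT, STATUS. */
proc means data=work.big(keep=region year amount status   /* read only the columns needed */
                         where=(status = 'A'))             /* read only the rows needed    */
           noprint nway;
   class region year;   /* CLASS does not require sorted input; a BY statement would */
   var amount;
   output out=work.summary(drop=_type_ _freq_)
          sum=total_amount n=n_obs;
run;
```

CLASS processing builds its groups in memory, so this trade only pays off when the number of distinct REGION*YEAR combinations is modest relative to available RAM.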
If you're hitting a boundary like this, then you may be in a situation where
you have just barely enough work room on your hard drive for sorting the file.
SAS likes to have roughly 3 times the size of the file for optimal sorting.
And the work of the sorting is done primarily as read/write on that drive.
So you would expect to see minimal CPU usage while the disk drives labor
mightily to cope with writing and re-writing data.
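A back-of-the-envelope version of that 3x rule, applied to the file in the original question, might look like the sketch below. The row and variable counts come from the question; the 8-byte length per variable is purely an assumption (character variables or shortened numerics would change the answer).

```sas
data _null_;
   n_obs   = 10e6;   /* rows, from the original question                   */
   n_vars  = 200;    /* variables, from the original question              */
   var_len = 8;      /* ASSUMPTION: every variable is an 8-byte numeric    */
   file_gb = n_obs * n_vars * var_len / 1024**3;   /* approximate file size */
   work_gb = 3 * file_gb;                          /* rough sort work space */
   put file_gb= 8.1 work_gb= 8.1;
run;
```

Under that assumption the file is roughly 15 GB and the sort would like roughly 45 GB of free work space, which is why a drive that merely looks "very large" can still be too tight.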
If you really need to do complex sorting, may I suggest my paper:
SUGI 26, Paper 121-26: A Sort of a Mess -- Sorting Large Datasets on Multiple Keys
David L. Cassell
http://www2.sas.com/proceedings/sugi26/p121-26.pdf
Alternatively, I really recommend that you consider re-structuring your
entire process. If you find yourself sorting and re-sorting your data,
don't. Look at indexing instead. If you find yourself sorting in order to
pull out small pieces based on some combination of variables, then
look at indexing, or DATA step programming as an alternative. There are
tons of ways out of this bind if you have the time to sit and think. I wrote
the above paper when I was faced with sorting Gigabytes of data and
random keys, then re-sorting, and... So, when faced with doing 13
sort-and-DATA-step processes, I was forced to take the old code and restructure
the process. It now runs as one PROC DATASETS (to do an index), one DATA step,
and then another indexing. No sorting.
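For anyone who has not built an index before, a minimal sketch of the index-instead-of-sort idea follows. It is not the code from the paper; the library, dataset, and key names (MYLIB, BIG, SITE_ID, SAMPLE_DATE) are hypothetical.

```sas
/* Hypothetical names throughout: MYLIB.BIG, SITE_ID, SAMPLE_DATE. */
proc datasets library=mylib nolist;
   modify big;
   index create site_id;                              /* simple index    */
   index create site_date = (site_id sample_date);    /* composite index */
quit;

/* A WHERE clause can use the index to pull a small piece without sorting. */
data work.subset;
   set mylib.big;
   where site_id = 1234;
run;

/* BY processing on an indexed key also works on unsorted data. */
data work.by_site;
   set mylib.big;
   by site_id;
   if first.site_id then n = 0;
   n + 1;                          /* sum statement: N is retained across rows */
   if last.site_id then output;
   keep site_id n;
run;
```

Reading an entire large file in index order can be slower than reading a sorted copy, so the index pays off mainly when you pull out small subsets or want to avoid repeated re-sorting on different keys.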
David Cassell, CSC
Senior computing specialist