Date: Tue, 31 Aug 2004 14:06:06 -0700
Reply-To: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: "Choate, Paul@DDS" <pchoate@DDS.CA.GOV>
Subject: Re: Conserving cpu & real time in datasteps involving large datasets
Keith -
Toby just reminded me of loading a SAS dataset into memory (if it fits and
you need to read it more than once).
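sasfile stuff open;      /* hold work.stuff in memory                 */
data stuff2;             /* output under a new name: a data set held  */
set stuff;               /* by SASFILE can't be replaced while open   */
run;
sasfile stuff close;     /* release the memory when done              */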
This avoids repeated I/O.
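A sketch of the multi-pass case, under assumptions that are not in the thread: the wide table is first cut down with KEEP= so that the extract fits in memory, and two later steps reuse it. work.big is a small, made-up stand-in for perm.dat, and work.dat, work.stats and work.flagged are hypothetical names. SASFILE ... LOAD reads every page in up front; OPEN, as in the snippet above, brings pages in as they are first read and then keeps them.

```sas
/* Hypothetical stand-in for perm.dat: 1,000 rows x 1,000 numeric vars */
data work.big;
  array v{1000} var1-var1000;
  do id = 1 to 1000;
    do i = 1 to 1000;
      v{i} = mod(id + i, 100);
    end;
    output;
  end;
  drop i;
run;

/* One pass over the wide table to pull out the 13 needed columns */
data work.dat;
  set work.big (keep=var1-var13);
run;

sasfile work.dat load;               /* read every page into memory */

proc means data=work.dat noprint;    /* pass 1: served from memory  */
  var var1-var13;
  output out=work.stats mean=;
run;

data work.flagged;                   /* pass 2: also read from memory */
  set work.dat;
  flag = (var1 > 50);
run;

sasfile work.dat close;              /* release the buffers         */
```

The point of the ordering is that the expensive read of the 1,000-variable table happens once, and only the small extract is held in memory.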
Hth!
Paul Choate
DDS Data Extraction
(916) 654-2160
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Dunn, Toby
Sent: Tuesday, August 31, 2004 1:49 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Conserving cpu & real time in datasteps involving large datasets
I just remembered: if you are running on the big iron (an IBM mainframe),
you might want to look into using hiperspaces.
Toby
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Dennis Diskin
Sent: Tuesday, August 31, 2004 6:17 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Conserving cpu & real time in datasteps involving large datasets
Keith,
The second data step (with the KEEP as an option on the input dataset) is
the one you want.
WHERE definitely is more efficient than a subsetting IF. It shouldn't
matter whether it is an option on the input dataset or a separate statement.
You can use both dataset options (KEEP= and WHERE=) together, including in
PROC SQL.
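A minimal sketch of combining KEEP= and WHERE= on the input data set, both in a DATA step and in the FROM clause of PROC SQL. work.big is a small, made-up stand-in for perm.dat, and the filter var1 < 10 is arbitrary. The variable tested by WHERE= (var1) is also in the KEEP= list, which keeps the sketch safe whatever order the two options are processed in.

```sas
/* Hypothetical stand-in for perm.dat: 1,000 rows x 1,000 numeric vars */
data work.big;
  array v{1000} var1-var1000;
  do id = 1 to 1000;
    do i = 1 to 1000;
      v{i} = mod(id + i, 100);
    end;
    output;
  end;
  drop i;
run;

/* DATA step: unwanted columns and rows are screened out on input */
data work.dat;
  set work.big (keep=var1-var13 where=(var1 < 10));
run;

/* PROC SQL: the same data set options are accepted in the FROM clause */
proc sql;
  create table work.dat_sql as
    select *
    from work.big (keep=var1-var13 where=(var1 < 10));
quit;
```

Both steps should produce the same 13-column, 100-row table from this stand-in data.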
HTH,
Dennis Diskin
Keith Dunnigan <dunnigan_k@YAHOO.COM> wrote:
Hi all,
I don't usually work with datasets so large that time is an issue,
but I am presently working on a project that deals with hundreds of
millions of wide records, so time is important.
Any advice on how to run a few basic data steps, merges, etc. more
time-efficiently is appreciated.
For instance, take reading data from a large permanent dataset into a
temporary one. Let's say we have a 100-million-observation permanent
dataset; call it perm.dat. Let's say it has one thousand variables; call
them var1, var2, ..., var1000. If I want to read only 13 variables into
a work dataset, what's the quickest way to do that? Possibly one of the
following:
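/* (a) KEEP statement inside the step */
Data dat;
set perm.dat;
keep var1-var13;
run;
/* (b) KEEP= option on the input data set */
Data dat;
set perm.dat (keep = var1-var13);
run;
/* (c) KEEP= option on the output data set */
Data dat(keep = var1-var13);
set perm.dat;
run;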
... Or are there others? Also, would a PROC SQL statement be quicker
than a DATA step? If so, what form would work the quickest?
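Because the answer can depend on the platform and engine, one way to settle it is to run the variants side by side with the FULLSTIMER system option on and compare the real time, CPU time and memory notes in the log. work.big is a small, made-up stand-in for perm.dat; timings on a 1,000-row table will be tiny, so the comparison only means something on the real data. The comments reflect how the options work: the KEEP statement and KEEP= on the output data set control what is written, so all 1,000 variables still pass through the program data vector (PDV).

```sas
options fullstimer;   /* log real time, CPU time and memory per step */

/* Hypothetical stand-in for perm.dat: 1,000 rows x 1,000 numeric vars */
data work.big;
  array v{1000} var1-var1000;
  do id = 1 to 1000;
    do i = 1 to 1000;
      v{i} = mod(id + i, 100);
    end;
    output;
  end;
  drop i;
run;

/* (a) KEEP statement: every column still enters the PDV */
data work.dat1;
  set work.big;
  keep var1-var13;
run;

/* (b) KEEP= on the input: only 13 columns enter the PDV */
data work.dat2;
  set work.big (keep=var1-var13);
run;

/* (c) KEEP= on the output: same reading cost as (a) */
data work.dat3 (keep=var1-var13);
  set work.big;
run;

/* (d) PROC SQL with KEEP= in the FROM clause, for comparison */
proc sql;
  create table work.dat4 as
    select * from work.big (keep=var1-var13);
quit;
```

All four produce the same table; only the log notes differ.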
On a similar note, if I want to read in only a subset of the
observations, I take it a WHERE statement works quicker than an IF
statement. Where should it be placed (again, in the DATA line, the SET
line, or below)?
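A sketch contrasting the placements asked about, with made-up names (work.big stands in for perm.dat). A WHERE statement can go anywhere in the step and, like WHERE= on the SET data set, screens rows before they reach the PDV; a subsetting IF reads every row first. WHERE= on the DATA statement only filters what is written out, so it saves nothing on the read. The last step shows where IF is still needed: a condition on a value computed inside the step.

```sas
/* Hypothetical stand-in for perm.dat: 1,000 rows x 1,000 numeric vars */
data work.big;
  array v{1000} var1-var1000;
  do id = 1 to 1000;
    do i = 1 to 1000;
      v{i} = mod(id + i, 100);
    end;
    output;
  end;
  drop i;
run;

/* WHERE statement: rows screened before they reach the PDV */
data work.w_stmt;
  set work.big (keep=id var1-var13);
  where var1 < 10;
run;

/* Subsetting IF: every row is read into the PDV, then discarded */
data work.w_if;
  set work.big (keep=id var1-var13);
  if var1 < 10;
run;

/* WHERE= on the output: all rows are read, only matches are written */
data work.w_out (where=(var1 < 10));
  set work.big (keep=id var1-var13);
run;

/* WHERE cannot see TOTAL, which only exists inside this step */
data work.w_calc;
  set work.big (keep=id var1-var13);
  total = sum(of var1-var13);
  if total > 600;
run;
```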
Comments on match merges would also be welcome.
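The same trimming advice carries over to match merges: cut each input down with KEEP= (and WHERE= where possible) before sorting, since PROC SORT has to move every column it is given. A minimal sketch with made-up tables; work.big stands in for perm.dat, and work.lookup with its region variable is purely hypothetical.

```sas
/* Hypothetical stand-in for perm.dat: 1,000 rows x 1,000 numeric vars */
data work.big;
  array v{1000} var1-var1000;
  do id = 1 to 1000;
    do i = 1 to 1000;
      v{i} = mod(id + i, 100);
    end;
    output;
  end;
  drop i;
run;

/* Hypothetical lookup table: only odd ids have a match */
data work.lookup;
  do id = 1 to 1000 by 2;
    region = mod(id, 7);
    output;
  end;
run;

/* Sort only the columns the merge needs */
proc sort data=work.big (keep=id var1-var13) out=work.big_s;
  by id;
run;

proc sort data=work.lookup out=work.lookup_s;
  by id;
run;

/* Match merge; IN= flags keep only ids found in both inputs */
data work.merged;
  merge work.big_s (in=inbig) work.lookup_s (in=inlook);
  by id;
  if inbig and inlook;
run;
```

With this stand-in data the result has 500 rows (the odd ids) and 15 variables.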
Alternatively, if there is a section online or in the common SAS
documentation that deals with this, perhaps you could refer me to it.
Many thanks in advance!
Keith Dunnigan
Consulting Statistician
Systems Seminar Consultants