LISTSERV at the University of Georgia
Date:         Tue, 31 Aug 2004 15:48:36 -0500
Reply-To:     "Dunn, Toby" <Toby.Dunn@TEA.STATE.TX.US>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         "Dunn, Toby" <Toby.Dunn@TEA.STATE.TX.US>
Subject:      Re: Conserving cpu & real time in datasteps involving large
              datasets
Content-Type: text/plain; charset="us-ascii"

I just remembered: if you are running on the big iron, you might want to look into using hiperspace.

Toby

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Dennis Diskin
Sent: Tuesday, August 31, 2004 6:17 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Conserving cpu & real time in datasteps involving large datasets

Keith,

The second data step (with KEEP= as an option on the input dataset) is the one you want.

WHERE definitely is more efficient than a subsetting IF. It shouldn't matter whether it is an option on the input dataset or a separate statement.
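To make the comparison concrete, here is a minimal sketch of the three forms. The libref and dataset name perm.dat come from the question below; the condition var1 > 0 is a made-up filter used only for illustration.

```sas
/* Hypothetical filter: var1 > 0 is an assumption for illustration. */

/* WHERE= dataset option on the input dataset */
data dat;
    set perm.dat (where=(var1 > 0));
run;

/* WHERE statement: selects the same observations */
data dat;
    set perm.dat;
    where var1 > 0;
run;

/* Subsetting IF: each observation is read into the data step
   before the condition is tested */
data dat;
    set perm.dat;
    if var1 > 0;
run;
```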

You can use both dataset options together (including in a PROC SQL).
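As a rough sketch of what that could look like, the two steps below combine KEEP= and WHERE= on the input dataset, once in a data step and once in PROC SQL. The name perm.dat and the variable list are taken from the question below; the condition var1 > 0 is an assumed filter, not something from the original post.

```sas
/* Hypothetical filter: var1 > 0 is an assumption for illustration. */

/* Data step: only var1-var13 and only the matching rows are read */
data dat;
    set perm.dat (keep=var1-var13 where=(var1 > 0));
run;

/* PROC SQL: the same dataset options on the table in the FROM clause */
proc sql;
    create table dat as
    select *
    from perm.dat (keep=var1-var13 where=(var1 > 0));
quit;
```

Note that a variable used in WHERE= must also appear in the KEEP= list when both options are given on the same input dataset.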

HTH,
Dennis Diskin

Keith Dunnigan <dunnigan_k@YAHOO.COM> wrote:

Hi all,

I don't usually work with such large datasets that time is an issue, but I am presently working on a project that deals with hundreds of millions of wide records, hence time is important.

Any advice on how to run a few basic datasteps, merges, etc. more time-efficiently is appreciated.

For instance, take reading data from a large permanent dataset into a temporary one. Let's say we have a 100 million observation permanent dataset, call it perm.dat. Let's say it has one thousand variables, call them var1, var2, ..., var1000. If I want to read only 13 variables into a work dataset, what's the quickest way to do that? Possibly one of the following:

Data dat; set perm.dat; keep var1-var13; run;

Data dat; set perm.dat (keep = var1-var13); run;

Data dat(keep = var1-var13); set perm.dat; run;

... Or are there others? Also, would using a proc sql statement be quicker than using a data statement? If so, what form would work the quickest?

On a similar note, if I want to read in only a subset of the observations, I take it a 'where' statement works quicker than an 'if' statement. Where should it be placed (again, in the data line, the set line, or below)?

Similar comments on match merges would be welcomed also. Alternatively, if there is a section online or in the common SAS documentation that deals with this, perhaps you could refer me to it.

Many thanks in advance!

Keith Dunnigan
Consulting Statistician
Systems Seminar Consultants


