LISTSERV at the University of Georgia
Date:   Tue, 31 Aug 2004 17:43:38 -0400
Reply-To:   Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:   Re: Conserving cpu & real time in datasteps involving large datasets
Comments:   To: Keith Dunnigan <dunnigan_k@YAHOO.COM>
Content-Type:   text/plain

Keith: Absolutely the fastest way to subset from N to n column variables?

proc sql;
  create view subsetVW as
    select var1,var2,....
    from superset
    where ..... ;
quit;

This view definition does not read data from superset. It defines a 'virtual dataset' that SAS SQL, other SAS procedures, or Data steps can read as subsetVW. One can even define new column variables in a view.

Certain limitations apply. In general one would not want to execute the view more than once, because each reference to the view reads superset again.

Sig
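
A minimal sketch of the view approach applied to the dataset described in the original question. It assumes a libref perm is already assigned and that var1 and var2 are numeric; the WHERE condition and the PROC MEANS step are hypothetical and only illustrate when the view is executed.

```sas
/* Sketch only: perm.dat and var1-var13 come from the question below; */
/* the WHERE condition is a hypothetical filter.                      */
proc sql;
  create view work.subsetVW as
    select var1, var2, var3, var4, var5, var6, var7,
           var8, var9, var10, var11, var12, var13
    from perm.dat
    where var1 > 0;
quit;

/* Nothing has been read from perm.dat yet. The view is executed */
/* here, when a procedure or Data step reads it.                 */
proc means data=work.subsetVW;
  var var2;
run;
```

The SELECT clause names the columns one by one because it does not take a var1-var13 style range.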

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Keith Dunnigan
Sent: Monday, August 30, 2004 3:49 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Conserving cpu & real time in datasteps involving large datasets

Hi all,

I don't usually work with such large datasets that time is an issue, but I am presently working on a project that deals with hundreds of millions of wide records, hence time is important.

Any advice on how to run a few basic datasteps, merges, etc more time efficiently is appreciated.

For instance, take reading data from a large permanent dataset into a temporary one. Let's say we have a 100 million observation permanent dataset, call it perm.dat. Let's say it has one thousand variables, call them var1, var2, ..., var1000. If I want to read only 13 variables into a work dataset, what's the quickest way to do that? Possibly one of the following:

Data dat; set perm.dat; keep var1-var13; run;

Data dat; set perm.dat (keep = var1-var13); run;

Data dat(keep = var1-var13); set perm.dat; run;

... Or are there others? Also, would using a proc sql statement be quicker than using a data step? If so, what form would work the quickest?

On a similar note, if I want to read in only a subset of the observations, I take it a 'where' statement works quicker than an 'if' statement. Where should it be placed (again, on the data line, on the set line, or below)?
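
The placements asked about above can be written out as follows. This is a sketch only, not a timing comparison; it assumes a libref perm is assigned, and the condition var1 > 0 is a hypothetical filter that assumes var1 is numeric.

```sas
/* WHERE statement below the SET statement */
data dat1;
  set perm.dat (keep = var1-var13);
  where var1 > 0;
run;

/* WHERE= data set option on the SET line */
data dat2;
  set perm.dat (keep = var1-var13 where = (var1 > 0));
run;

/* WHERE= data set option on the DATA line: observations are */
/* filtered as they are written to dat3                      */
data dat3 (where = (var1 > 0));
  set perm.dat (keep = var1-var13);
run;

/* Subsetting IF: every observation is read into the step, */
/* then tested                                             */
data dat4;
  set perm.dat (keep = var1-var13);
  if var1 > 0;
run;
```

All four steps produce the same observations for this filter; they differ in the point at which the condition is applied.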

Similar comments on match merges would be welcomed also. Alternatively, if there is a section online or in the common SAS documentation that deals with this, perhaps you could refer me to it.

Many thanks in advance!

Keith Dunnigan
Consulting Statistician
Systems Seminar Consultants

