LISTSERV at the University of Georgia
Date:         Tue, 31 Aug 2004 14:42:07 +0100
Reply-To:     steve.wills@BARCLAYS.CO.UK
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Steve Wills <steve.wills@BARCLAYS.CO.UK>
Subject:      Re: Conserving cpu & real time in datasteps involving large datasets
Content-Type: text/plain; charset="iso-8859-1"

How do I unsubscribe from this list server?

-----Original Message-----
From: Dunn, Toby [mailto:Toby.Dunn@TEA.STATE.TX.US]
Sent: 31 August 2004 14:38
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Conserving cpu & real time in datasteps involving large datasets

I would add that I find it is better to drop or keep variables on the SET statement, unless I need those variables for some processing; in that case, once I no longer need them, I drop them on the output data set side. I have found that on our system it does make a difference when processing large, wide datasets. Also, always drop, keep, and sort as soon as possible, and sort to the highest level of granularity possible (I think that is right). There have been several papers, including one written by Michael Raithel, that cover sorting large data sets efficiently. And as Chang so wisely said, never try to do in more than one data step what can be done in one.
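A minimal sketch of the pattern described above, not code from the thread: KEEP= on the SET statement limits what is read, and DROP= on the output data set removes a variable that was needed only during processing. The names perm.dat and var1-var13 come from Keith's question; the variable total and the calculation are hypothetical, and the variables are assumed to be numeric.

```sas
/* Sketch only: total and the calculation are made up for illustration. */
data dat (drop = var13);                 /* var13 is not written to the output  */
    set perm.dat (keep = var1-var13);    /* read only 13 of the 1000 variables  */
    total = sum(of var1-var12) * var13;  /* var13 is used here, then dropped    */
run;
```

Whether this makes a measurable difference depends on the system and on how wide the records are, as noted above.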

HTH,
Toby Dunn

-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Chang Y. Chung
Sent: Tuesday, August 31, 2004 8:29 AM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Re: Conserving cpu & real time in datasteps involving large datasets

On Tue, 31 Aug 2004 04:16:43 -0700, Dennis Diskin <popdiskin-sas1@YAHOO.COM> wrote:

>Keith,
>
>The second data step (with the KEEP as an option on the input dataset) is the one you want.
>
>WHERE definitely is more efficient than a subsetting IF. It shouldn't matter if it is an option on the input dataset or a separate statement.
>
>You can use both dataset options together (including in a PROC SQL).
>
>HTH,
>Dennis Diskin
>
>
>Keith Dunnigan <dunnigan_k@YAHOO.COM> wrote:
>Hi all,
>
>I don't usually work with such large datasets that time is an issue,
>but I am presently working on a project that deals with hundreds of
>millions of wide records, hence time is important.
>
>Any advice on how to run a few basic datasteps, merges, etc more time
>efficiently is appreciated.
>
>For instance, in the case of the reading in of data from a large
>permanent dataset into a temporary one. Let's say we have a 100
>million observation permanent dataset, call it perm.dat. Let's say it
>has one thousand variables, call them var1, var2, ..., var1000. If I
>want to only read in 13 variables into a work dataset, what's the
>quickest way to do that? Possibly one of the following:
>
>Data dat;
>set perm.dat;
>keep var1-var13;
>run;
>
>Data dat;
>set perm.dat (keep = var1-var13);
>run;
>
>Data dat(keep = var1-var13);
>set perm.dat;
>run;
>
>... Or are there others? Also would using a proc sql statement be
>quicker than using a data statement? If so, what form would work the
>quickest?
>
>On a similar take, if I want to read in only a subset of the
>observations, I take it a 'where' statement works quicker than an 'if'
>statement. Where should it be placed (again, in the data line, the set
>line, or below?).
>
>Similar comments on match merges would be welcomed also.
>Alternately, if there is a section online or in the common sas
>documentation that deals with this, perhaps you could refer me to it.
>
>Many thanks in advance!

Hi, Keith,

I agree with Dennis. Using the drop= or keep= data set options on the set statement keeps unneeded variables out of the PDV (program data vector). This is also what Tip 4.6 on page 27 of the book "SAS Programming Tips: A Guide to Efficient SAS Processing" by SAS (ISBN 1-55544-431-8) recommends.
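As a rough illustration of the point above (a sketch, not code from the book or the thread), the KEEP= and WHERE= data set options can be combined on the input data set so that both unneeded variables and unneeded observations are filtered as the data is read. The names perm.dat and var1-var13 come from Keith's question; the condition var1 > 0 is a made-up example and assumes var1 is numeric.

```sas
/* Sketch only: the WHERE= condition is hypothetical. */
data dat;
    set perm.dat (keep  = var1-var13     /* only these variables enter the PDV */
                  where = (var1 > 0));   /* rows are filtered as they are read */
run;
```

The WHERE= condition here refers only to a variable that is also listed in KEEP=, which is the safe choice when both options are used on the same input data set.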

Frankly, though, I would say the choice among the three may make a very small difference compared to the main bottleneck of just going through the hundred million records at least once, which you cannot avoid in any case.

So, the biggest issue is to make sure that you plan ahead and make this pass over the data as few times as possible. The goal should be that you do it once and after that you live with the extract, a goal I have never achieved myself, unfortunately.... :-/

Good luck.

Cheers,
Chang


