| Date: | Tue, 31 Aug 2004 17:43:38 -0400 |
| Reply-To: | Sigurd Hermansen <HERMANS1@WESTAT.COM> |
| Sender: | "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU> |
| From: | Sigurd Hermansen <HERMANS1@WESTAT.COM> |
| Subject: | Re: Conserving cpu & real time in datasteps involving large datasets |
| Content-Type: | text/plain |
|---|
Keith:
Absolutely the fastest way to subset from N to n column variables?
proc sql;
   create view subsetVW as
   select var1,var2,....
   from superset
   where .....
   ;
quit;
Defining this view does not read any data from superset. It defines a
'virtual dataset' that SAS SQL, other SAS procedures, or Data steps can read
as subsetVW. One can even define new column variables in a view.
Certain limitations apply. In general one would not want to execute the view
more than once, since SAS re-reads superset each time the view is referenced.
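If the subset is needed in several later steps, one option is to read the view a single time into a table and work from that table afterwards. A rough sketch, again with a made-up toy superset and a hypothetical output table named subset:

```sas
/* Toy stand-in for the large table (hypothetical data). */
data superset;
   do id = 1 to 5;
      var1 = id;
      var2 = id * 10;
      output;
   end;
run;

proc sql;
   create view subsetVW as
   select var1, var2
   from superset
   where var1 > 1                  /* hypothetical filter */
   ;

   /* Executes the view once and stores the result. */
   create table subset as
   select * from subsetVW;
quit;

/* Later steps read the stored table, not superset. */
proc means data=subset;
   var var1 var2;
run;

proc freq data=subset;
   tables var1;
run;
```

The trade-off is the disk space and write time for the stored table against repeated passes through superset.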
Sig
-----Original Message-----
From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of Keith
Dunnigan
Sent: Monday, August 30, 2004 3:49 PM
To: SAS-L@LISTSERV.UGA.EDU
Subject: Conserving cpu & real time in datasteps involving large datasets
Hi all,
I don't usually work with such large datasets that time is an issue, but I
am presently working on a project that deals with hundreds of millions of
wide records, hence time is important.
Any advice on how to run a few basic datasteps, merges, etc. more
time-efficiently would be appreciated.
For instance, consider reading data from a large permanent dataset into a
temporary one. Let's say we have a 100 million observation permanent
dataset, call it perm.dat. Let's say it has one thousand variables, call
them var1, var2, ..., var1000. If I want to read only 13 variables into a
work dataset, what's the quickest way to do that? Possibly one of the
following:
Data dat;
   set perm.dat;
   keep var1-var13;
run;

Data dat;
   set perm.dat (keep = var1-var13);
run;

Data dat(keep = var1-var13);
   set perm.dat;
run;
... Or are there others? Also, would using a proc sql statement be quicker
than using a data statement? If so, what form would work the quickest?
On a similar note, if I want to read in only a subset of the observations,
I take it a 'where' statement works quicker than an 'if' statement. Where
should it be placed (again, in the data line, the set line, or below)?
Similar comments on match merges would be welcomed also. Alternatively, if
there is a section online or in the common SAS documentation that deals
with this, perhaps you could refer me to it.
Many thanks in advance!
Keith Dunnigan
Consulting Statistician
Systems Seminar Consultants
|