Date: Thu, 19 Mar 1998 01:14:20 -0800
Reply-To: "email@example.com" <firstname.lastname@example.org>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: "Karsten M. Self" <kmself@IX.NETCOM.COM>
Organization: Self Analysis
Subject: Re: Overview of creating and using indexes
Content-Type: text/plain; charset="us-ascii"
I also ran some experimentation of different methods of updating indexed
and non-indexed datasets. While both SQL (INSERT INTO) and PROC APPEND
will automatically update an existing SAS index as records are added to an
existing dataset, the process is very inefficient when dealing with large
appends (append obs are approximately the same as base), compared with
breaking and recreating the index.
Using sample data (two variables, 100,000 records, first var sequential,
second random integers 1 <= var <= 1000), timings were approximately as
(Timings on an un-loaded Sun Enterprise 3000, approximate
average-from-memory of three runs)
Create data: 1.5 seconds
Copy data (data new; set old; run;): 0.75 seconds
Index data (simple index on random var): 3 seconds
APPEND or INSERT into indexed dataset: 12 minutes
APPEND or INSERT into non-indexed dataset: 3 seconds
- for large datasets to which records are added transactionally (append
size is very small fraction of base), inserting or appending to an indexed
dataset makes sense.
- for bulk loading (append size is a significant fraction of base),
breaking and rebuilding the index makes more sense. This is particularly
true if loading new data can be scheduled during regular (nightly,
weekly,...) system downtime.
This is consistent with general guidelines and recommendations for most
On Monday, March 16, 1998 10:54 AM, Jack Hamilton
> The use of indexes to subset data has been discussed recently. Some
> sample code to create and use indexes is shown below.
<snip (-- see the original post, it's good) >
Karsten M. Self (email@example.com)
What part of "gestalt" don't you understand?
(Welchen Teil von "Gestalt" verstehen Sie nicht?)