LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (June 2006, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Mon, 19 Jun 2006 13:29:41 -0400
Reply-To:     Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject:      Re: Another question about indexes
Comments: To: Eric Eyre <eeyre@concentric.net>
Content-Type: text/plain; charset="us-ascii"

Erico: Guess that you are referring to what I said about indexes in this thread: 'As for 1., indexes on subsetting variables may speed up selection of subsets, but will probably not be needed on the date and foo variables ...'

While I agree that indexing can in some situations greatly improve efficiency of data access, I don't see that indexing has a very good chance of speeding up searches in the situation given: 1. five different datasets that would have to be indexed individually or read row by row into a dataset prior to indexing; 2. three search constraints on three different keys; 3. filters can be applied as the five datasets are being read initially.

If the three subsetting indexes already exist on the five datasets, then SAS will likely use them as an alternative to sorting on three different orders; nonetheless, creating the indexes for one application will likely cost more than scanning and filtering of the five datasets. Should the proportion of rows that elude the filter turn out to greater than 20%, SAS will likely not use indexes in subsetting operations. I'd start by using WHERE statements to subset each of the five datasets individually, then combine the yields.

When datasets have many columns and an application defined on them requires far fewer columns, then a KEEP option or SELECT clause in a UNION of queries may also reduce I/O, sometimes dramatically. Only when searches on subsidiary keys occur often would indexing of many datasets improve efficiency. Sig -----Original Message----- From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu] On Behalf Of Eric Eyre Sent: Sunday, June 18, 2006 4:16 AM To: sas-l@uga.edu Subject: Re: Another question about indexes

Are you sure about that SH?

If the input is a sas data set (as opposed to a raw data file) as in

data two;

set one;

where x<5;

The WHERE will select qualifying logical obs for movement to the PDV after the data has reached the page buffer in memory, thus ultimately saving cpu resources, but no i/o.

Backing up a little bit to the i/o department, if DATA ONE residing in disk storage has an index x on var x, and sas ends up using the index, then sas will use the index to move only the PAGES of data that contain qualifying logical obs into the page buffer in memory, thus conserving i/o, since sas has fewer pages to load. This is commensurate with the fact that sas data sets are arranged internally in pages and in sas page size is conceptually and often literally aligned with buffer size.

So with judicious index use you can often get a 2fer - cpu savings from the where, i/o savings from the index

Erico

"Scott Bass" <sas_l_739@yahoo.com.au> wrote in message news:449161a9$0$7147$afc38c87@news.optusnet.com.au... > Hi, > > Say I have datasets data_2001 data_2002 data_2003 data_2004 data_2005.

> I index them on the more commonly used subsetting variables. They are

> simple (not composite) indexes. > > 1. If a where clause is something like: > > where date between start and end and foo in ("foo", "bar", "blah") and

> code=5 > > should I assume that the indexes will greatly speed up the where > clause processing? Sorry I know this is a bit vague - does anyone > know where to look in the doc for details on how indexes are used with

> where clauses? > > 2. If I create a data step view such as: > > data total / view=total; > set data_2001 data_2002 data_2003 data_2004 data_2005; run; > > then use the same where clause as above, but on the DSV, will the > indexes help performance? > > Thanks, > Scott >


Back to: Top of message | Previous page | Main SAS-L page