Date: Mon, 23 Apr 2007 23:55:21 +0000
Reply-To: toby dunn <tobydunn@HOTMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: toby dunn <tobydunn@HOTMAIL.COM>
Subject: Re: Index file too large- is this a problem?
In-Reply-To: <200704232300.l3NMCkhw023847@mailgw.cc.uga.edu>
Content-Type: text/plain; format=flowed
Sounds like you shouldn't use ID as part of your index. ID will yield no
better results than if you just sorted and merged on ID, because it is
too discrete. So I doubt you will find much of an increase in speed.
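
The sort-and-merge alternative mentioned above might look roughly like the following. This is only a sketch under assumptions: the dataset names (work.returns, work.wanted, work.subset) and variable names (id, date, ret) are hypothetical stand-ins, and the tiny generated table merely makes the example self-contained.

```sas
/* Hypothetical stand-in for the real table: 3 IDs x 10 dates */
data work.returns;
  do id = 1 to 3;
    do date = '02JAN2006'd to '11JAN2006'd;
      ret = ranuni(1) - 0.5;   /* placeholder return values */
      output;
    end;
  end;
  format date date9.;
run;

/* Hypothetical list of the IDs we want to pull */
data work.wanted;
  id = 2;
run;

/* Both inputs must be sorted by the merge key */
proc sort data=work.returns; by id date; run;
proc sort data=work.wanted;  by id;      run;

/* Keep only rows whose ID appears in both datasets */
data work.subset;
  merge work.returns(in=in_ret) work.wanted(in=in_want);
  by id;
  if in_ret and in_want;
run;
```

Note that a merge like this reads the whole returns table sequentially, so whether it beats an indexed lookup depends on how many IDs are requested at once.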
Toby Dunn
You can see a lot by just looking. ~Yogi Berra
Do not seek to follow in the footsteps of the wise. Seek what they sought.
~Matsuo Basho
You never know what is enough, until you know what is more than enough.
~William Blake, Proverbs of Hell
From: Raj <rajasekhargo@YAHOO.COM>
Reply-To: Raj <rajasekhargo@YAHOO.COM>
To: SAS-L@LISTSERV.UGA.EDU
Subject: Index file too large- is this a problem?
Date: Mon, 23 Apr 2007 19:00:15 -0400
Hello,
We have a SAS dataset with just 3 variables: an ID, a date, and a return
field. There are about 20,000 unique IDs and each of them has a return
value on each date (ranging over the last 30 years). We stored this data
cumulatively. So the table has 20,000 * 280 business days = 5.6 million
records. We usually query a small percentage of data from this table (e.g.
1 year of data for one ID ~ 280 records). So I thought it would make sense
to create a composite index on ID and date. But since there are so many
unique key values, the index file turned out to be quite large (~1GB).
The original dataset is not significantly larger than this (because
there is just one additional variable).
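
For reference, a composite index of the kind described above, and a query that could use it, can be sketched as follows. This is not the poster's actual code: the names work.returns, id, date, ret, and id_date are assumptions, and the small generated table is only there so the example runs on its own.

```sas
/* Ask SAS to report index usage in the log */
options msglevel=i;

/* Hypothetical stand-in for the real table: 3 IDs x 10 dates */
data work.returns;
  do id = 1 to 3;
    do date = '02JAN2006'd to '11JAN2006'd;
      ret = ranuni(1) - 0.5;   /* placeholder return values */
      output;
    end;
  end;
  format date date9.;
run;

/* Composite index on (id, date); id_date is an arbitrary index name */
proc datasets library=work nolist;
  modify returns;
  index create id_date = (id date);
quit;

/* Typical query: one ID over a date range */
data work.one_id;
  set work.returns;
  where id = 2 and date between '03JAN2006'd and '06JAN2006'd;
run;
```

With MSGLEVEL=I set, the log should indicate whether the index was selected for the WHERE clause. On a table this small, SAS may well decide that a sequential read is cheaper, so the sketch shows the syntax rather than a performance result.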
When a query is submitted in SAS, I know that the index file is first
loaded into memory and then the selected records are displayed. But since
our index file is so huge, will this cause any problem? FYI, we have a
client-server setting where all SAS processing happens on one central
server. If multiple clients are querying the same dataset simultaneously,
will SAS load multiple copies of this index file per user, OR can it use
the same index file already loaded into memory?
If there is any better design of table that can store the same data, I
welcome any suggestions.
Thanks in advance,
Raj