Date: Thu, 9 Sep 2010 08:50:39 -0700
Reply-To: Sterling Paramore <gnilrets@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sterling Paramore <gnilrets@GMAIL.COM>
Subject: Re: monthly update
Content-Type: text/plain; charset=ISO-8859-1
Good advice in general. However, I'd just like to add a few counterpoints
from my own experience.
I maintain a very large table (10^8 rows) that has a similar structure,
e.g., something like:
item stuff m0 m1 m2 m3
1 stuff blah blah blah blah
2 stuff blah blah blah blah
3 stuff blah blah blah blah
My problem is that "stuff" is fairly long (300 bytes) and a new "m" column
comes along every month. If I were to put the m's in rows rather than
columns, my table would grow by 10^8 rows every month and most of it would
be redundant ("stuff" does not change with "m").
I could normalize the data and put it in two tables, one with stuff and a
key linking to a table of m's. The problem with that solution is (1) users
very frequently have to select rows based on one of the m's so a merge would
be required nearly every time this table was used (2) my user base does not
like merging and (3) any merge would require a sort, and sorting a 30 GB
table is not fun.
On Thu, Sep 9, 2010 at 8:30 AM, Terjeson, Mark <Mterjeson@russell.com>wrote:
> Yes, renaming variables is possible, however,
> when data handling starts requiring repetitive
> manual intervention or gets awkward and clumsy
> to physically re-arrange it or to update it,
> then it is most likely a poor design or scheme
> to maintain or update your data.
> A much more elegant, efficient, and easier to
> use approach would be to accumulate and store
> the data in a vertical more normalized fashion
> instead of horizontally like a spreadsheet. So
> instead of
> item m0 m1 m2 m3
> 1 blah blah blah blah
> 2 blah blah blah blah
> 2 blah blah blah blah
> you would have
> date item attributes
> 01FEB2010 1 blahblah
> 01FEB2010 2 blahblah
> 01FEB2010 3 blahblah
> 01MAR2010 1 blahblah
> 01MAR2010 2 blahblah
> 01MAR2010 3 blahblah
> 01APR2010 1 blahblah
> 01APR2010 2 blahblah
> 01APR2010 3 blahblah
> This is for several reasons:
> 1. there is no physical restructuring necessary.
> 2. just append each month with its date.
> 3. the SAS datastep, SAS procs, and SQL are
> designed to process vertically stored data
> much faster and easier, by merely aggregating
> on different values such as date or item, etc.,
> or subsetting on selected values.
> 4. i.e. you can do a lot more with it this way.
> 5. IF, for output requirements or for display
> purposes, you can always use PROC TRANSPOSE
> to turn it into horizontal ledger columns,
> but always keep it stored vertically.
> 6. You can very easily get your m0 m1 m2 m3
> increments computed on the fly for display
> or for computation by using the INTCK()
> function to compute the increment from the
> date field. This automatically keeps the
> dynamic adjustment for each month intact.
> 7. etc., etc., etc.
> Hope this is helpful.
> Mark Terjeson
> Investment Business Intelligence
> Investment Management & Research
> Russell Investments
> Global Leaders in Multi-Manager Investing
> -----Original Message-----
> From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
> Chang Chung
> Sent: Thursday, September 09, 2010 7:52 AM
> To: SAS-L@LISTSERV.UGA.EDU
> Subject: monthly update
> An interesting question posted somewhere else. It goes, in part:
> "I've got a data table with text variables named m0, m1 up to m62. Each
> this dataset is updated, with the latest month's data stored in m0 while
> the rest need to move up on index, i.e. m0 should be renamed m1, m1
> should be
> renamed m2, etc. up to m62 which should be renamed m63."