LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (September 2010, week 2)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 9 Sep 2010 08:50:39 -0700
Reply-To:     Sterling Paramore <gnilrets@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Sterling Paramore <gnilrets@GMAIL.COM>
Subject:      Re: monthly update
In-Reply-To:  <16FD64291482A34F995D2AF14A5C932C09632173@MAIL002.prod.ds.russell.com>
Content-Type: text/plain; charset=ISO-8859-1

Good advice in general. However, I'd just like to add a few counterpoints from my own experience.

I maintain a very large table (10^8 rows) that has a similar structure, e.g., something like:

item stuff m0 m1 m2 m3 1 stuff blah blah blah blah 2 stuff blah blah blah blah 3 stuff blah blah blah blah

My problem is that "stuff" is fairly long (300 bytes) and a new "m" column comes along every month. If I were to put the m's in rows rather than columns, my table would grow by 10^8 rows every month and most of it would be redundant ("stuff" does not change with "m").

I could normalize the data and put it in two tables, one with stuff and a key linking to a table of m's. The problem with that solution is (1) users very frequently have to select rows based on one of the m's so a merge would be required nearly every time this table was used (2) my user base does not like merging and (3) any merge would require a sort, and sorting a 30 GB table is not fun.

-Sterling

On Thu, Sep 9, 2010 at 8:30 AM, Terjeson, Mark <Mterjeson@russell.com>wrote:

> Hi, > > Yes, renaming variables is possible, however, > when data handling starts requiring repetitive > manual intervention or gets awkward and clumsy > to physically re-arrange it or to update it, > then it is most likely a poor design or scheme > to maintain or update your data. > > A much more elegant, efficient, and easier to > use approach would be to accumulate and store > the data in a vertical more normalized fashion > instead of horizontally like a spreadsheet. So > instead of > item m0 m1 m2 m3 > 1 blah blah blah blah > 2 blah blah blah blah > 2 blah blah blah blah > > you would have > date item attributes > 01FEB2010 1 blahblah > 01FEB2010 2 blahblah > 01FEB2010 3 blahblah > 01MAR2010 1 blahblah > 01MAR2010 2 blahblah > 01MAR2010 3 blahblah > 01APR2010 1 blahblah > 01APR2010 2 blahblah > 01APR2010 3 blahblah > > > > This is for several reasons: > 1. there is no physical restructuring necessary. > > 2. just append each month with its date. > > 3. the SAS datastep, SAS procs, and SQL are > designed to process vertically stored data > much faster and easier, by merely aggregating > on different values such as date or item, etc., > or subsetting on selected values. > > 4. i.e. you can do a lot more with it this way. > > 5. IF, for output requirements or for display > purposes, you can always use PROC TRANSPOSE > to turn it into horizontal ledger columns, > but always keep it stored vertically. > > 6. You can very easily get your m0 m1 m2 m3 > increments computed on the fly for display > or for computation by using the INTCK() > function to compute the increment from the > date field. This automatically keeps the > dynamic adjustment for each month intact. > > 7. etc., etc., etc. > > > > Hope this is helpful. > > > Mark Terjeson > Investment Business Intelligence > Investment Management & Research > Russell Investments > 253-439-2367 > > > Russell > Global Leaders in Multi-Manager Investing > > > > > > > -----Original Message----- > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of > Chang Chung > Sent: Thursday, September 09, 2010 7:52 AM > To: SAS-L@LISTSERV.UGA.EDU > Subject: monthly update > > An interesting question posted somewhere else. It goes, in part: > > "I've got a data table with text variables named m0, m1 up to m62. Each > month > this dataset is updated, with the latest month's data stored in m0 while > all > the rest need to move up on index, i.e. m0 should be renamed m1, m1 > should be > renamed m2, etc. up to m62 which should be renamed m63." >


Back to: Top of message | Previous page | Main SAS-L page