Date: Thu, 9 Sep 2010 09:58:57 -0700
Reply-To: Sterling Paramore <gnilrets@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sterling Paramore <gnilrets@GMAIL.COM>
Subject: Re: monthly update
In-Reply-To: <AANLkTikc6O0pfG5u6NvAOUaC9FftNx-QMfpXksWr727g@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1
A significant portion of my user base are basic EG or AMO-only users, so a
macro is not an option. I've tried creating views several times, but never
been successful. The view version is always significantly slower than the
dataset version. Sure, the view will save disk space, but disk space is
cheap (although the bureaucracy that goes into getting the IT department to
give us new space is not!).
-Sterling
On Thu, Sep 9, 2010 at 9:07 AM, Joe Matise <snoopy369@gmail.com> wrote:
> Just use an index on 'item' and 'month' (a composite index)? What you're
> describing sounds like a very common database structure to me - two
> normalized tables, both indexed on the key that joins them, join them
> together to find which from one table you need based on value in other
> table. Not difficult and not expensive time-wise. Assuming you are
> typically filtering down significantly (ie, a common query takes 20% or
> something of the data, or even less) then the two table structure and join
> is actually faster than the one table structure, as filtering the one table
> that's just m0 m1 etc. is easier than filtering the combined table (again,
> if you index correctly).
>
> Then write a view or a macro or something (Depending on how these queries
> work) to do the 'merge' for your users so they don't have to do the work
> themselves, if they're that lazy. Or kick them in the rear and tell them to
> grow up and join the big boy's world if they want to use large data
> structures ;)
>
> -Joe
>
>
> On Thu, Sep 9, 2010 at 10:50 AM, Sterling Paramore <gnilrets@gmail.com>wrote:
>
>> Good advice in general. However, I'd just like to add a few counterpoints
>> from my own experience.
>>
>> I maintain a very large table (10^8 rows) that has a similar structure,
>> e.g., something like:
>>
>> item stuff m0 m1 m2 m3
>> 1 stuff blah blah blah blah
>> 2 stuff blah blah blah blah
>> 3 stuff blah blah blah blah
>>
>> My problem is that "stuff" is fairly long (300 bytes) and a new "m" column
>> comes along every month. If I were to put the m's in rows rather than
>> columns, my table would grow by 10^8 rows every month and most of it would
>> be redundant ("stuff" does not change with "m").
>>
>> I could normalize the data and put it in two tables, one with stuff and a
>> key linking to a table of m's. The problem with that solution is (1)
>> users
>> very frequently have to select rows based on one of the m's so a merge
>> would
>> be required nearly every time this table was used (2) my user base does
>> not
>> like merging and (3) any merge would require a sort, and sorting a 30 GB
>> table is not fun.
>>
>> -Sterling
>>
>> On Thu, Sep 9, 2010 at 8:30 AM, Terjeson, Mark <Mterjeson@russell.com
>> >wrote:
>>
>> > Hi,
>> >
>> > Yes, renaming variables is possible, however,
>> > when data handling starts requiring repetitive
>> > manual intervention or gets awkward and clumsy
>> > to physically re-arrange it or to update it,
>> > then it is most likely a poor design or scheme
>> > to maintain or update your data.
>> >
>> > A much more elegant, efficient, and easier to
>> > use approach would be to accumulate and store
>> > the data in a vertical more normalized fashion
>> > instead of horizontally like a spreadsheet. So
>> > instead of
>> > item m0 m1 m2 m3
>> > 1 blah blah blah blah
>> > 2 blah blah blah blah
>> > 2 blah blah blah blah
>> >
>> > you would have
>> > date item attributes
>> > 01FEB2010 1 blahblah
>> > 01FEB2010 2 blahblah
>> > 01FEB2010 3 blahblah
>> > 01MAR2010 1 blahblah
>> > 01MAR2010 2 blahblah
>> > 01MAR2010 3 blahblah
>> > 01APR2010 1 blahblah
>> > 01APR2010 2 blahblah
>> > 01APR2010 3 blahblah
>> >
>> >
>> >
>> > This is for several reasons:
>> > 1. there is no physical restructuring necessary.
>> >
>> > 2. just append each month with its date.
>> >
>> > 3. the SAS datastep, SAS procs, and SQL are
>> > designed to process vertically stored data
>> > much faster and easier, by merely aggregating
>> > on different values such as date or item, etc.,
>> > or subsetting on selected values.
>> >
>> > 4. i.e. you can do a lot more with it this way.
>> >
>> > 5. IF, for output requirements or for display
>> > purposes, you can always use PROC TRANSPOSE
>> > to turn it into horizontal ledger columns,
>> > but always keep it stored vertically.
>> >
>> > 6. You can very easily get your m0 m1 m2 m3
>> > increments computed on the fly for display
>> > or for computation by using the INTCK()
>> > function to compute the increment from the
>> > date field. This automatically keeps the
>> > dynamic adjustment for each month intact.
>> >
>> > 7. etc., etc., etc.
>> >
>> >
>> >
>> > Hope this is helpful.
>> >
>> >
>> > Mark Terjeson
>> > Investment Business Intelligence
>> > Investment Management & Research
>> > Russell Investments
>> > 253-439-2367
>> >
>> >
>> > Russell
>> > Global Leaders in Multi-Manager Investing
>> >
>> >
>> >
>> >
>> >
>> >
>> > -----Original Message-----
>> > From: SAS(r) Discussion [mailto:SAS-L@LISTSERV.UGA.EDU] On Behalf Of
>> > Chang Chung
>> > Sent: Thursday, September 09, 2010 7:52 AM
>> > To: SAS-L@LISTSERV.UGA.EDU
>> > Subject: monthly update
>> >
>> > An interesting question posted somewhere else. It goes, in part:
>> >
>> > "I've got a data table with text variables named m0, m1 up to m62. Each
>> > month
>> > this dataset is updated, with the latest month's data stored in m0 while
>> > all
>> > the rest need to move up on index, i.e. m0 should be renamed m1, m1
>> > should be
>> > renamed m2, etc. up to m62 which should be renamed m63."
>> >
>>
>
>
|