Date: Sat, 18 Apr 1998 19:30:04 GMT
Reply-To: cbbrowne@hex.net
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: "Christopher B. Browne" <cbbrowne@NEWS.BROWNES.ORG>
Organization: Hex.Net Superhighway, DFW Metroplex 817-329-3182
Subject: Re: Linux version of SAS -- Technical issues
On 17 Apr 1998 22:28:50 GMT, Andreas Dilger <adilger@enel.ucalgary.ca> posted:
>Contrary to popular belief, you DON'T need a 64-bit CPU to work with
>64-bit numbers, it's just a lot easier to do. AIX has supported > 2GB
>files since AIX 4.2 came out (2 years?), and it most definitely uses
>a 32-bit processor. I'm not positive, but even with ext2 on Alphas,
>they may be limited to 2GB files because that's how ext2 was DEFINED.
>However, there is a lot of work being done to improve on ext2 (eg
>journalled filesystem, logical volume manager), and I'm sure that > 2GB
>files will fit in there somehow.
"easier" is not quite the right word; the two terms that cover it nicely
are:
a) "More natural" and
b) "More efficient."
On a 64 bit processor, you can "naturally" manipulate 64 bit values
with normal instructions, and don't have to use a multiplicity
of instructions to load, save, add, and test values. That makes
algorithms a little simpler and easier to get correct, and likely
a fair bit faster.
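To make the "multiplicity of instructions" point concrete, here is a
minimal sketch in Python of what a 32 bit machine has to do to add two
64 bit values: add the low words, catch the carry, then add the high
words plus that carry. The function names (add64_on_32bit, split64,
join64) are made up for illustration; real compilers emit an
add/add-with-carry pair of instructions rather than anything like this.

```python
MASK32 = 0xFFFFFFFF  # all ones in a 32-bit word


def split64(x):
    """Split a 64-bit value into (low, high) 32-bit words."""
    return x & MASK32, (x >> 32) & MASK32


def join64(lo, hi):
    """Reassemble a 64-bit value from its (low, high) 32-bit words."""
    return (hi << 32) | lo


def add64_on_32bit(a_lo, a_hi, b_lo, b_hi):
    """Add two 64-bit values using only 32-bit-wide steps."""
    lo_sum = a_lo + b_lo
    lo = lo_sum & MASK32
    carry = lo_sum >> 32                  # 1 if the low words overflowed
    hi = (a_hi + b_hi + carry) & MASK32   # wraps around, as 64-bit hardware would
    return lo, hi


a, b = 0x00000001FFFFFFFF, 0x0000000000000001
lo, hi = add64_on_32bit(*split64(a), *split64(b))
print(hex(join64(lo, hi)))
```

On a 64 bit processor the whole of add64_on_32bit collapses into a
single add; the same kind of doubling-up applies to loads, stores, and
comparisons.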
It's not clear whether there is *in fact* a lot of work being done
to improve on ext2; there has been a lot of *TALK* about LVMs and
logging/journalled file systems, and a lot of purported projects started,
but about the only one that seems to be seeing active continuing work
is the Reiserfs, which is designed more for efficient use of *small*
files than for handling huge file systems...
>It surprises me that SAS would NEED > 2GB files, unless their database
>is designed in such a way that it holds all of its tables in a single
>file. Bad design IMHO.
Highly arguable whether or not it's "bad design."
The "the database is this enormous ``blob'' of filespace" approach is
quite typical in relational database designs these days.
Recently arguments have been going on between proponents of FreeBSD
and Linux concerning the semantics of the way file system updates take
place, particularly relating to the handling of metadata (i.e.,
the information *about* the file, such as its name and location on
disk).
I will stay well out of who's right and who's wrong; what is a given
is that on different platforms, different components of files get updated
on slightly different bases.
If you drop all of the data into one big "blob file," then that gives
some degree of guarantee about what gets updated when, and a guarantee
that is pretty portable across platforms.
If, on the other hand, you are off updating 500 separate files "all
at once," the "metadata synchronization" can take place in all sorts
of different orders and fashions depending on what system you're on.
That makes it much more difficult to maintain the integrity of a
transactional update log. Supporting many platforms rapidly gets
really complex: the AIX version of the "integrity subsystem" ends up
coded substantially differently from the Digital UNIX version, which is
coded substantially differently from ... Hopefully you get the picture.
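As a rough illustration of the "one big blob" approach (and only that;
I am not claiming this is how SAS or any particular database does it),
the sketch below preallocates a single file, overwrites fixed-size
records in place, and then issues one flush-to-disk request. The record
size, the slot layout, and the function names are all hypothetical.

```python
import os
import tempfile

RECORD_SIZE = 16  # hypothetical fixed record size, in bytes


def write_record(f, slot, payload):
    """Overwrite one fixed-size record inside the single blob file."""
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload too large for a slot")
    f.seek(slot * RECORD_SIZE)
    f.write(payload.ljust(RECORD_SIZE, b"\0"))


def read_record(f, slot):
    """Read one record back, dropping the zero padding."""
    f.seek(slot * RECORD_SIZE)
    return f.read(RECORD_SIZE).rstrip(b"\0")


def commit(f):
    """Push buffered data to the OS, then ask the OS to flush it to disk."""
    f.flush()
    os.fsync(f.fileno())


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "blob.dat")
        # Preallocate four slots so later updates never change the file size.
        with open(path, "wb") as f:
            f.write(b"\0" * RECORD_SIZE * 4)
        with open(path, "r+b") as f:
            write_record(f, 2, b"table-b row 7")
            write_record(f, 0, b"table-a row 1")
            commit(f)  # one sync request covers both updates
            return read_record(f, 0), read_record(f, 2), os.path.getsize(path)


print(demo())
```

With 500 separate files you would need a sync per file (and arguably per
directory), and the order in which each platform commits those is exactly
the part that varies.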
Part of the point is that SAS is no longer being primarily sold as
"just another statistical package" (which is what I always thought of
it as). It is apparently being increasingly sold as a relational
database system for data warehousing.
Arguably this is something that Linux advocates should get *real*
interested in pushing for; data warehouse applications are not things
that come in high quantity, but they *do* result in the building of very
large, powerful computer systems. And those sorts of applications require
beefing up support for large/robust file systems, which substantially
enhances Linux...
A neat option would be for a data warehousing vendor to sponsor the
creation of the "64 bit" file system support, possibly in combination
with LVM/Journalling/Logging support. An idle thought would be for SAS
(or some such organization) to pay people such as Hans Reiser or Theodore
Ts'o to work full time for a year on this. (I can name those two as
being "people that understand how to implement file systems"; there are
probably others that don't come automatically to mind.)
--
Those who do not understand Unix are condemned to reinvent it, poorly.
-- Henry Spencer <http://www.hex.net/~cbbrowne/lsf.html>
cbbrowne@hex.net - "What have you contributed to Linux today?..."