LISTSERV at the University of Georgia
Date:         Sat, 18 Apr 1998 19:30:04 GMT
Reply-To:     cbbrowne@hex.net
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         "Christopher B. Browne" <cbbrowne@NEWS.BROWNES.ORG>
Organization: Hex.Net Superhighway, DFW Metroplex 817-329-3182
Subject:      Re: Linux version of SAS -- Technical issues

On 17 Apr 1998 22:28:50 GMT, Andreas Dilger <adilger@enel.ucalgary.ca> posted:
>Contrary to popular belief, you DON'T need a 64-bit CPU to work with
>64-bit numbers, it's just a lot easier to do. AIX has supported > 2GB
>files since AIX 4.2 came out (2 years?), and it most definitely uses
>a 32-bit processor. I'm not positive, but even with ext2 on Alphas,
>they may be limited to 2GB files because that's how ext2 was DEFINED.
>However, there is a lot of work being done to improve on ext2 (eg
>journalled filesystem, logical volume manager), and I'm sure that > 2GB
>files will fit in there somehow.

"easier" is not quite the right word; the two terms that cover it nicely are: a) "More natural" and b) "More efficient."

On a 64 bit processor, you can "naturally" manipulate 64 bit values with normal instructions, and don't have to use a multiplicity of instructions to load, save, add, and test values. That makes algorithms a little simpler and easier to get correct, and likely a fair bit faster.
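To make the "multiplicity of instructions" point concrete, here is a rough sketch of what a 32 bit machine effectively has to do to add two 64 bit values: add the low words, detect the carry, then add the high words plus the carry. It is written in Python purely for illustration; the names `add64`, `MASK32` and `MAX_OFFSET_32` are made up for this example, and real code would be a couple of add/add-with-carry instructions in assembler. The last line shows where the 2GB figure comes from if file offsets are kept in a signed 32 bit integer.

```python
MASK32 = 0xFFFFFFFF  # one 32 bit machine word (hypothetical name)


def add64(a_hi, a_lo, b_hi, b_lo):
    """Add two 64 bit values, each held as a (high, low) pair of 32 bit words."""
    lo = a_lo + b_lo
    carry = lo >> 32                      # 1 if the low words overflowed, else 0
    lo &= MASK32                          # keep only the low 32 bits
    hi = (a_hi + b_hi + carry) & MASK32   # wraps modulo 2**64, as hardware would
    return hi, lo


# Largest byte offset a signed 32 bit integer can hold: 2GB minus one byte.
MAX_OFFSET_32 = 2**31 - 1
```

On a 64 bit processor the whole of `add64` collapses into a single add, which is the "more natural, more efficient" part.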

It's not clear whether there is *in fact* a lot of work being done to improve on ext2; there has been a lot of *TALK* about LVMs and logging/journalled file systems, and a lot of purported projects started, but about the only one that seems to be seeing active continuing work is the Reiserfs, which is designed more for efficient use of *small* files than for handling huge file systems...

>It surprises me that SAS would NEED > 2GB files, unless their database
>is designed in such a way that it holds all of its tables in a single
>file. Bad design IMHO.

Highly arguable whether or not it's "bad design."

The "the database is this enormous ``blob'' of filespace" approach is quite typical in relational database designs these days.

Recently arguments have been going on between proponents of FreeBSD and Linux concerning the semantics of the way file system updates take place, particularly relating to the handling of metadata (i.e. the information *about* the file, such as its name and location on disk).

I will stay well out of who's right and who's wrong; what is a given is that on different platforms, different components of files get updated on slightly different bases.

If you drop all of the data into one big "blob file," then that gives some degree of guarantee about what gets updated when, and that guarantee will be pretty portable across platforms.

If, on the other hand, you are off updating 500 separate files "all at once," the "metadata synchronization" can take place in all sorts of different orders and fashions depending on what system you're on. That makes it much more difficult to maintain the integrity of a transactional update log. Supporting many platforms rapidly gets really complex. And the AIX version of the "integrity subsystem" is coded substantially differently from the Digital UNIX version, which is coded substantially differently from ... Hopefully you get the picture.
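As a rough illustration of why the single "blob file" is easier to reason about, here is a minimal sketch of an append-only update log kept in one file. I'm not claiming this is how SAS or any particular database does it; the function names and the record layout (a 4 byte length prefix followed by the payload) are invented for the example. The point is only that every update goes through one file descriptor, so one `fsync()` call covers it, instead of 500 files whose data and metadata may reach the disk in a system-dependent order.

```python
import os
import struct
import tempfile


def append_record(fd, payload):
    """Append one length-prefixed record, then force it to stable storage."""
    data = struct.pack("<I", len(payload)) + payload
    while data:                      # os.write may write fewer bytes than asked
        written = os.write(fd, data)
        data = data[written:]
    os.fsync(fd)                     # one file, one sync point


def read_records(path):
    """Read back every complete record from the log, in order."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:      # end of file (or a torn header)
                break
            (length,) = struct.unpack("<I", header)
            records.append(f.read(length))
    return records


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "blob.log")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    append_record(fd, b"begin")
    append_record(fd, b"commit")
    os.close(fd)
    print(read_records(path))
```

With many separate files you would need a sync per file, and the order in which their metadata is committed is where the platforms differ.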

Part of the point is that SAS is no longer being primarily sold as "just another statistical package" (which is what I always thought of it as). It is apparently being increasingly sold as a relational database system for data warehousing.

Arguably this is something that Linux advocates should get *real* interested in pushing for; data warehouse applications are not things that come in high quantity, but they *do* result in the building of very large, powerful computer systems. And those sorts of applications require beefing up support for large/robust file systems, which substantially enhances Linux...

A neat option would be for a data warehousing vendor to sponsor the creation of the "64 bit" file system support, possibly in combination with LVM/Journalling/Logging support. An idle thought would be for SAS (or some such organization) to pay people such as Hans Reiser or Theodore Ts'o to work full time for a year on this. (I can name those two as being "people that understand how to implement file systems"; there are probably others that don't come automatically to mind.)

--
Those who do not understand Unix are condemned to reinvent it, poorly.
-- Henry Spencer <http://www.hex.net/~cbbrowne/lsf.html>
cbbrowne@hex.net - "What have you contributed to Linux today?..."

