Date:         Fri, 12 Mar 1999 10:52:41 +1100
Reply-To:     Tim CHURCHES <TCHUR@DOH.HEALTH.NSW.GOV.AU>
Sender:       "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From:         Tim CHURCHES <TCHUR@DOH.HEALTH.NSW.GOV.AU>
Subject:      SAS, BASS, and Linux -Reply
Comments: To: kmself@IX.NETCOM.COM
Content-Type: text/plain; charset=US-ASCII

(Karsten's post is reproduced below)

Karsten,

I agree almost entirely with your summary of the most useful bits of SAS (we can argue over details later..), and I agree that it would be better to clone the functionality of SAS rather than the exact syntax and features. I also agree that it would be vital to provide a bridge from existing SAS code (and expertise) to any free SAS alternative. A bridge from other products such as SPSS would also be possible, which means that the PSPP (free SPSS clone) project might join forces with an alternative-to-SAS project. And yes, there is a hell of a lot of good statistical code and procedures out there which could be incorporated into (or interfaced to) a SAS alternative, and yes, the monolithic approach is not the way to go. I think that this is the lesson to be learnt from Linux, which started out as just a lot of free Unix components assembled around a new (free) Unix-like kernel. The Linux project didn't slavishly clone an existing Unix, but it didn't try to totally re-invent the wheel either, so it could build on and incorporate what went before. That's one of the reasons why Linux is growing so rapidly while BeOS and other totally new operating systems are not. A free alternative to SAS could follow the same path.

And yes, it is a shame that the BASS system was written in Pascal and assembler, but having the sources available would still be a help. Hmmm, 10 person-years of effort shared between 120 people is only 1 month each, or more realistically, about 2 years' worth of rainy Sunday afternoons...

I don't think that SI need worry about such an initiative - just as people are not ripping out their IBM mainframes and replacing them with Linux boxes, I don't think that any of SI's big clients would replace SAS with a free alternative, at least not for quite a while!

Tim Churches

>>> Karsten Self <kmself@IX.NETCOM.COM> 12/March/1999 07:54am >>>

I just spoke with Jeff Bass of Amgen, creator of the BASS system (a 1980s PC implementation of the SAS language and several procedures), about what possibilities BASS might open up for a free SAS on Linux. There's mixed news.

First, Jeff would be interested in seeing SAS-like capabilities on Linux. He would be willing to help by way of providing the BASS sources, and providing some guidance in their interpretation. He would not be interested in doing the development directly.

That said, there are some aspects of BASS which both help and hinder:

- SAS is based on publicly available foundations -- the original NCSU project was an FDA-funded research project, and SAS versions up to about SAS 74 or 76 are available with sources, AFAIK (though this would be PL/I and MVS assembler).

- BASS implemented the DATA step and about 20 commonly used procedures.

- Many of the algorithms used in BASS are based on documentation of early versions of the SAS system, or on other published algorithms. It should be possible to reimplement these, or newer, improved versions.

- Due to PC limitations of the time, BASS was coded in Microsoft Pascal and assembler, about 80% and 20% respectively. BASS is probably less portable than SAS itself. I don't know what tool support exists for cross-compiling or porting Pascal (or MS Pascal) code to gcc or related compilers. The resulting code would probably be unmaintainable, even if it ran. However, GNU does provide a number of porting tools. I have no experience in this area.

- BASS was a code-compatible, but not a data-compatible, system. Its transport format was ASCII files. These were sneakernet days, and the possibility of wide-scale data distribution was not anticipated.

- The sources are available from Jeff. The algorithms used are frequently documented in the source. Some work may be required to pull the sources from archival media.

- The DATA step and basic I/O were a fairly elementary coding effort. The full BASS system represented about 4 man-years of development. Jeff anticipates a similar project today would require 10 man-years.

My own comments follow.

What I find most useful about SAS are:

- A simple but powerful procedural data language with a decent function library, raw data I/O abilities (format/informat), convenient methods for working with sorted data (FIRST., LAST.), and other miscellaneous features: SET/UPDATE/MERGE/MODIFY, FILE and INFILE options, SET options, etc.

- Process accounting -- resource utilization, plus record and variable counts, reported after each process step.

- Persistent data attribute associations: name, type, length, format/informat, label, and metadata about these attributes (DICTIONARY tables).

- A set of integrated PROCs which provide trivial access to basic data manipulation and reporting functions. I could accomplish virtually all my work with SQL, DATA STEP, FREQ, MEANS/SUMMARY, PRINT, UNIVARIATE, FORMAT, COMPARE, SORT, and CONTENTS. Of this list, DATASETS, FREQ, MEANS/SUMMARY, SORT, and CONTENTS largely roll easily into a sufficiently featured SQL. PRINT can be accomplished in a DATA _NULL_. This leaves DATA, SQL, and a statistics library.
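As a rough illustration of the claim that FREQ, MEANS/SUMMARY, SORT, and CONTENTS largely roll into SQL, here is a minimal sketch using SQLite through Python's standard sqlite3 module. SQLite, Python, the table name `visits`, and its columns and data are assumptions for illustration only, and the PROC equivalences are approximate (for instance, SQLite has no built-in standard deviation, which PROC MEANS reports by default).

```python
import sqlite3

# A small in-memory table standing in for a SAS data set (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (region TEXT, age INTEGER, cost REAL)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [("North", 34, 120.0), ("North", 51, 80.0), ("South", 29, 200.0),
     ("South", 62, 150.0), ("South", 45, 100.0)],
)

# Roughly PROC FREQ; TABLES region;  -> count and percent per level.
freq = conn.execute(
    """SELECT region,
              COUNT(*) AS n,
              100.0 * COUNT(*) / (SELECT COUNT(*) FROM visits) AS pct
       FROM visits GROUP BY region ORDER BY region"""
).fetchall()

# Roughly PROC MEANS; CLASS region; VAR cost;  -> N, mean, min, max per class.
means = conn.execute(
    """SELECT region, COUNT(cost), AVG(cost), MIN(cost), MAX(cost)
       FROM visits GROUP BY region ORDER BY region"""
).fetchall()

# Roughly PROC SORT; BY region DESCENDING cost;  -> ORDER BY.
sorted_rows = conn.execute(
    "SELECT * FROM visits ORDER BY region, cost DESC"
).fetchall()

# Roughly PROC CONTENTS -> SQLite's per-table column metadata.
contents = conn.execute("PRAGMA table_info(visits)").fetchall()

for row in freq:
    print(row)
```

Each PROC becomes a single query here; what SQL does not cover is the row-by-row logic of the DATA step and the statistics library, which matches the list's conclusion.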

...I realize other users' needs differ. Additional features include graphics and statistical procedures, database connectivity, remote connectivity, OS hooks, code generation (CALL EXECUTE, MACRO), data browsing (the _only_ reason to use interactive SAS). The remaining features of SAS provide less than 1% of my needs.
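The sorted-data processing (FIRST., LAST.) mentioned in the first item of the list above is one of the features any replacement would need to reproduce. Here is a minimal Python sketch, assuming rows arrive as dictionaries already sorted by the BY variable; the helper name `by_group_flags` and the sample data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def by_group_flags(rows, key):
    """Yield (row, first, last) for rows already sorted by `key`,
    mimicking FIRST.var / LAST.var in a DATA step with a BY statement."""
    for _, group in groupby(rows, key=itemgetter(key)):
        group = list(group)  # buffer the group so its last row is known
        for i, row in enumerate(group):
            yield row, i == 0, i == len(group) - 1


rows = [  # must already be sorted by "id", as SAS requires for BY processing
    {"id": "A", "amount": 10},
    {"id": "A", "amount": 5},
    {"id": "B", "amount": 7},
]

totals = []
for row, first, last in by_group_flags(rows, "id"):
    if first:                 # like: IF FIRST.id THEN total = 0;
        total = 0
    total += row["amount"]
    if last:                  # like: IF LAST.id THEN OUTPUT;
        totals.append((row["id"], total))

print(totals)  # [('A', 15), ('B', 7)]
```

Unlike SAS, which stops with an error when the data are not sorted by the BY variable, this sketch does not check the ordering, and it buffers each group in memory where a one-row lookahead would avoid that.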

What I'm disenchanted with are:

- Macro. As much as I use and appreciate it, it is a kludge. It is a preprocessor, not a true programming language. Debug support is horrible. This is addressed to an extent by SCL. I'd much prefer seeing a real control language, along the lines of Perl.

- Disconnect with other development tools. It is relatively difficult to wed SAS with other programming tools or environments. The fact that SAS is monolithic does not help much in this regard. Using SAS as a server is somewhat better, but it certainly doesn't fall into the Unix shell tools model. It doesn't have to, but many very powerful tools do.

The SAS NIH syndrome has led to a monolithic tool incorporating a data language, a macro/scripting language, an SQL implementation, a statistical library, an application development environment, a graphics generation facility, an integrated development environment, a data browsing/editing environment, ... _none_ of which are of any use outside of SAS, and all of which require an annual investment in SAS products in order to be used. My own use of SAS tools (above) is geared largely toward what is required to get work done in SAS, and what translates more broadly into other areas of programming application. Hence, data step, SQL, fundamental utilities. I've rather pointedly neglected learning tools such as TABULATE and REPORT due to their limited and idiosyncratic aspects.

- Lack of user definable functions / procedures (addressed to an extent by SAS/Toolkit).

- Lack of long variable names (added in v7).

- Lack of access to higher programming features: finer grained use of arrays, more data types (boolean, integer, long character), better (or more standardized) regular expression tools.

- Standalone/runtime capability.

- Integration with third-party tools.

- Current level of Linux support.

I'll say again that I'm not particularly interested in building yet another SAS; I'd rather work with existing tools available for Unix/Linux.

Still, one approach which might be worth exploring is a language translation utility which would translate SAS code into an equivalent in, say, Perl. The addition of a module to provide the type of accounting and persistent data attributes available with SAS would be a plus. Procedures could be mapped to close equivalents in existing statistical languages. A colleague suggests that many of the SAS statistical procedures are validated in IML; if so, it might be possible to use an existing matrix language as the basis for rapid development of a statistical procedure library. Not being particularly versed in matrix languages or advanced statistics, I can't comment on viability, but it sounds interesting.
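To make the translation idea a little more concrete, here is a hypothetical sketch of what translated output for a tiny DATA step might look like, with a small module supplying persistent variable attributes and a log-style accounting note. It uses Python rather than the Perl suggested above, and every name in it (`Column`, `Dataset`, `translated_step`, `log_step`) is invented for illustration; it is not an existing tool.

```python
import time
from dataclasses import dataclass, field


@dataclass
class Column:
    """Per-variable attributes kept with a data set (hypothetical model)."""
    name: str
    type: str          # "num" or "char", mirroring SAS's two types
    length: int
    label: str = ""
    format: str = ""


@dataclass
class Dataset:
    name: str
    columns: list                              # list of Column, in order
    rows: list = field(default_factory=list)   # each row is a dict


def log_step(out, started):
    """Print an accounting note loosely modelled on the SAS log."""
    elapsed = time.perf_counter() - started
    print(f"NOTE: The data set {out.name} has {len(out.rows)} observations "
          f"and {len(out.columns)} variables.")
    print(f"NOTE: Step used {elapsed:.4f} seconds of real time.")


# Hypothetical translation of:
#   data work.out; set work.in; where x > 0; y = x * 2; label y = "Twice x"; run;
def translated_step(src):
    started = time.perf_counter()
    # Input attributes carry forward; the new variable gets its own.
    columns = src.columns + [Column("y", "num", 8, label="Twice x")]
    out = Dataset("WORK.OUT", columns)
    for row in src.rows:
        if row["x"] > 0:                                 # where x > 0;
            out.rows.append({**row, "y": row["x"] * 2})  # y = x * 2;
    log_step(out, started)
    return out


src = Dataset("WORK.IN", [Column("x", "num", 8)], [{"x": 3}, {"x": -1}, {"x": 5}])
out = translated_step(src)
print([c.label for c in out.columns])
```

Running it prints "NOTE: The data set WORK.OUT has 2 observations and 2 variables." followed by a timing line, and the "Twice x" label travels with the output data set, which is the kind of persistence the earlier list credits SAS with.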

What would really help a project like this along would be an identified sponsor or sponsors. Again, I'm playing the role of data conduit here, not advocate. I'd be interested but not obsessed with such a project.

-- Karsten M. Self (kmself@ix.netcom.com)

What part of "gestalt" don't you understand?

