Date: Mon, 19 Apr 2004 14:19:45 -0600
Reply-To: Alan Churchill <EmailDirect@erratix.us>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Alan Churchill <EmailDirect@ERRATIX.US>
Subject: Re: Translating SAS code to a C extension module for Python
I fully agree with this sentiment, and I have been gen'ing SAS code for a
couple of decades... no problem there. From what I read, it seemed that a
decompilation was going on to get to the 'C' code that SAS generates.
Perhaps I misunderstood the application. If the person is in fact able to
get to the generated 'C' code, and if someone then used that code to create a
generic 'proc sort', for example, and discarded SAS, that is where my concern
lies.
"Sigurd Hermansen" <HERMANS1@WESTAT.COM> wrote in message
> Though no fan of the legal weaselling that appears in software license
> agreements, I would agree that reverse engineering of, say, PROC GAM, would
> be illegal, unethical, and not very smart to boot (since SAS began with
> published algorithms for the statistical procedure). Nonetheless, I see no
> basis for a legal or ethical claim that would prohibit converting the logic
> expressed in the text of any SAS program to another program. That would be
> akin to claiming that a formula in one system of notation cannot be
> translated to a formula in another system of notation. Further, a software
> vendor could not defend a claim that someone is violating a copyright of a
> programming language (effectively '... stealing the mind of an author ...')
> because one can easily demonstrate that, for example, the SAS Data step has
> deep roots in PL/1.
> -----Original Message-----
> From: Alan Churchill
> To: SAS-L@LISTSERV.UGA.EDU
> Sent: 4/17/2004 6:36 AM
> Subject: Re: Translating SAS code to a C extension module for Python
> My guess is that there is a high probability that this approach violates
> licensing restrictions. Also, regardless of whether it is technically
> possible, it strikes me as unethical.
> "Tim Churches" <email@example.com> wrote in message
> On Sat, 2004-04-17 at 10:14, Jack Hamilton wrote:
> > The really interesting thing to know would be "Did SAS and this new
> > program provide the same answers?"
> The author notes in his article:
> "Some 700 more lines later and 430 lines of regression tests and I'm ..."
> I am familiar with some of the author's work in bio-informatics, and he
> always writes nearly as much test code as actual business logic code.
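To make Jack's question (do SAS and the converted code give the same answers?) concrete, here is a minimal sketch of the kind of regression check involved. It assumes, hypothetically, that the SAS job can export its scored records to CSV with the model inputs plus a column named `sas_score` holding the SAS prediction; the column name and the `score` function passed in are illustrative assumptions, not details from the article.

```python
import csv
import io
import math
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]


def compare_with_sas(csv_text: str,
                     score: Callable[[Row], float],
                     rel_tol: float = 1e-9,
                     abs_tol: float = 1e-12) -> List[Tuple[int, float, float]]:
    """Return (row index, SAS value, converted value) for every mismatch.

    Assumes a hypothetical CSV export where each row holds the model inputs
    plus a 'sas_score' column with the prediction SAS produced.
    """
    mismatches = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        expected = float(row["sas_score"])
        actual = score(row)
        # Use a tolerance: two implementations of the same arithmetic can
        # differ in the last few bits of a floating-point result.
        if not math.isclose(actual, expected, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches.append((i, expected, actual))
    return mismatches


if __name__ == "__main__":
    sample = "x,sas_score\n1,2.0\n2,4.0\n3,6.5\n"
    # Toy "converted model": 2 * x. Only the third row disagrees with SAS.
    print(compare_with_sas(sample, lambda r: 2.0 * float(r["x"])))
```

An empty list means every record agreed within tolerance; anything else points at the exact records to investigate.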
> > If you don't care about correct answers, I
> > can get you results really fast and cheap. We don't know whether the
> > results from SAS are correct, but at least we know that they have a
> > quality control staff. It's possible that the Python module was written
> > by professional statisticians with computer training (or professional
> > programmers with statistical training), but we can't assume that without
> > knowing more than we do.
> Obviously everyone cares about getting the correct answers, or the best
> possible answers (since we are talking about statistical models here).
> I think that the SAS code in question was just SAS data step code which
> implemented a fairly complex predictive model - in other words, it
> implements an equation or set of equations containing various estimated
> parameters which constitute the predictive model - and it just processes
> values from records through the model to derive a prediction for each
> record. Thus, the development of the statistical model is still done
> using SAS, but running on a PC. It is just the deployment of that model
> on a large production server which is done using the results of the
> SAS-to-Python C module converter.
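For readers unfamiliar with this kind of deployment, here is a minimal sketch of what "processing records through the model" amounts to once the SAS code has been converted. The article does not say what kind of model was used; this assumes, purely for illustration, a logistic regression with hypothetical variable names and coefficients, and it is written in plain Python for readability rather than as the C extension module the actual tool generated.

```python
import math
from typing import Iterable, Iterator, Mapping

# Hypothetical parameter estimates; in the real system these would be the
# values estimated in SAS and embedded in the generated code.
INTERCEPT = -2.0
COEFFICIENTS = {"age": 0.03, "prior_claims": 0.4}


def score(record: Mapping[str, float]) -> float:
    """Predicted probability for one record under a logistic model."""
    eta = INTERCEPT + sum(beta * record[name]
                          for name, beta in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-eta))


def score_all(records: Iterable[Mapping[str, float]]) -> Iterator[float]:
    """Score records one at a time, as a SAS data step processes observations."""
    for record in records:
        yield score(record)


if __name__ == "__main__":
    batch = [{"age": 40, "prior_claims": 0}, {"age": 60, "prior_claims": 2}]
    for p in score_all(batch):
        print(round(p, 4))
```

The estimation itself stays in SAS; only the fixed equation and its parameters travel to the production server.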
> So, if we assume that SAS and the statistical model are error-free, then
> the other sources of error are the SAS-to-C module converter, the C
> compiler used, and Python. Errors in the converter are most likely, but
> it sounds like the author has been defensive in his programming and
> thorough in his automated testing of the results. Almost certainly the
> free, open source GNU C compiler was used - after 15 years of very
> widespread use, it is very unlikely to be a source of error. Python is
> also free and open source and has been in continuous and widespread use
> for well over a decade. Again, an unlikely source of error. As an aside,
> the team behind Python strike me as top-notch computer scientists,
> mathematicians and programmers: Python is widely used by computer
> science departments in universities everywhere, and such users tend to
> be very discerning. Python does not claim any great statistical prowess
> out of the box (you need to look to R for that), although many people
> have written statistical routines in Python - but as discussed, all the
> statistics in this project were handled by SAS.
> > We don't know, from what the article said, how complex the model was,
> > or what SAS procedure was being duplicated, so you might be
> > overestimating the abilities of the conversion process. (You might be
> > correct, on the other hand.)
> My understanding is that the model was non-trivial and that only SAS
> data step code was involved.
> > In general, though, I think that such a conversion effort is misguided.
> I suspect that this particular effort saved the client many tens of
> thousands of dollars in SAS license fees per annum. Whether that is
> misguided depends on your point of view. However, I agree that such
> efforts are at best tactical, rather than strategic.
> > I'm not saying that converting from SAS to Python is a bad idea, only
> > that automated conversions are a bad idea. It would be much better to
> > start over, writing code that is designed with Python's strengths in
> > mind.
> Sure, but remember that the article describes the automated conversion
> of automatically generated SAS code. That's easier than automatic
> conversion of hand-written SAS code. Nevertheless, a useful trick, and
> as I said, it is probably not a huge leap to automatic conversion of a
> constrained subset of hand-written SAS code - which would be an even
> better trick.
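As a rough illustration of how small a "constrained subset" of hand-written data step code can be, here is a toy translator written for this note (it is not the article's converter). It handles only two statement forms, `var = expr;` and `if cond then var = expr;`, and assumes the arithmetic in `expr` means the same thing in Python. It maps the SAS comparison mnemonics (`eq`, `ne`, `lt`, `gt`, `le`, `ge`), `^=`, and the single `=` used for comparison inside a condition to Python operators; SAS missing values, functions, and string literals that happen to contain those keywords are not handled, and anything it does not recognise is rejected rather than guessed at.

```python
import re

# SAS comparison/logical mnemonics and their Python equivalents.
_OPS = {"eq": "==", "ne": "!=", "lt": "<", "gt": ">", "le": "<=",
        "ge": ">=", "and": "and", "or": "or", "not": "not"}

_IF_THEN = re.compile(r"if\s+(.+?)\s+then\s+(\w+)\s*=\s*(.+)", re.IGNORECASE)
_ASSIGN = re.compile(r"(\w+)\s*=\s*(.+)")


def convert_condition(cond: str) -> str:
    """Rewrite a SAS condition using Python comparison operators."""
    cond = re.sub(r"\b(eq|ne|lt|gt|le|ge|and|or|not)\b",
                  lambda m: _OPS[m.group(1).lower()], cond, flags=re.IGNORECASE)
    cond = cond.replace("^=", "!=")
    # In a SAS condition a lone '=' is a comparison, so turn it into '=='.
    return re.sub(r"(?<![<>=!])=(?!=)", "==", cond)


def convert_statement(stmt: str) -> str:
    """Translate one SAS assignment or IF-THEN assignment to Python."""
    stmt = stmt.strip().rstrip(";").strip()
    m = _IF_THEN.fullmatch(stmt)
    if m:
        cond, target, expr = m.groups()
        return f"if {convert_condition(cond)}:\n    {target} = {expr}"
    m = _ASSIGN.fullmatch(stmt)
    if m:
        target, expr = m.groups()
        return f"{target} = {expr}"
    raise ValueError(f"unsupported SAS statement: {stmt!r}")


if __name__ == "__main__":
    for line in ["x = 1.5 * age + 2;",
                 "IF sex = 'F' AND age < 30 THEN grp = 2;"]:
        print(convert_statement(line))
```

Rejecting unsupported statements outright is the point of a constrained subset: the converter either translates a line it fully understands or stops, which keeps the regression tests meaningful.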
> > That assumes that you understand what the program is intended to
> > accomplish, of course - but if you don't know *that*, you shouldn't
> > be doing any kind of conversion.
> Of course. In this particular case, the correct approach would be for
> the SAS predictive modelling program to emit C and Python code directly,
> rather than SAS code. However, I gather that the SAS predictive
> modelling program was proprietary (it may have been the SAS data mining
> product, for instance), and that such changes were not possible.
> Tim C
> > >>> "Tim Churches" <firstname.lastname@example.org> 04/16/2004 4:13 PM >>>
> > On Sat, 2004-04-17 at 09:38, Jack Hamilton wrote:
> > > Interesting article, but it doesn't sound like the program is
> > > flexible - it handles one particular type of SAS program written in a
> > > particular style, and might not be generalizable. Also, it sounds like
> > > the author doesn't really understand SAS.
> > Correct on all counts. However, it does illustrate:
> > a) That it is possible to deploy even complex predictive models
> > developed on PCs and written in SAS on large production servers without
> > needing to license SAS on those servers, thus saving substantial amounts
> > of money. It sounds like the consultancy described involved perhaps one
> > or two weeks of work. The result was a tool written in Python which can
> > re-convert the SAS predictive module to a Python C module each time the
> > model (and hence the SAS code) is updated. Compare that to the initial
> > and annual cost of licensing even just Base SAS on a large Unix or other
> > server.
> > b) If the fairly specific SAS-to-C module converter could be written in
> > a person-week or two, then it may well be possible, with a bit more
> > effort, to build a more general SAS-to-C module converter. A completely
> > general SAS-to-C module converter would be a huge effort, but even a
> > somewhat general, albeit constrained, data step code converter would
> > save a lot of SAS users a great deal of money when it comes to deploying
> > production code on large production servers. Note that both Python and
> > C are of course free and cross-platform - they can run anywhere SAS can
> > run.
> > Food for thought. I suspect that Andrew Dalke is available for hire...
> Tim C