LISTSERV at the University of Georgia
Date:         Mon, 19 Apr 2004 14:19:45 -0600
Reply-To:     Alan Churchill <EmailDirect@erratix.us>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Alan Churchill <EmailDirect@ERRATIX.US>
Subject:      Re: Translating SAS code to a C extension module for Python

I fully agree with this sentiment and I have been gen'ing SAS code for a couple of decades...no problem there. From what I read, it seemed that there was a decompilation going on to get to the 'C' code that SAS generates internally.

Perhaps I misunderstood the application. If the person is in fact able to get to the generated 'C' code, and someone then used that code to create a generic 'proc sort', for example, and discarded SAS, that is where my concern was.

Alan

"Sigurd Hermansen" <HERMANS1@WESTAT.COM> wrote in message news:446DDE75CFC7E1438061462F85557B0F221B00@remail2.westat.com... > Alan: > Though no fan of the legal weaselling that appears in software license > agreements, I would agree that reverse engineering of, say, PROC GAM, would > be illegal, unethical, and not very smart to boot (since SAS began with > published algorithms for the statistical procedure). Nonetheless, I see no > basis for a legal or ethical claim that would prohibit converting the logic > expressed in the text of any SAS program to another program. That would be > akin to claiming that formula in one system of notation cannot be converted > to formula in another system of notation. Further, a software vendor could > not defend a claim that someone is violating a copyright of a programming > language (effectively '... stealing the mind of an author ...') because one > can easily demonstrate that, for example, the SAS Data step has deep roots > in PL/1. > Sig > > -----Original Message----- > From: Alan Churchill > To: SAS-L@LISTSERV.UGA.EDU > Sent: 4/17/2004 6:36 AM > Subject: Re: Translating SAS code to a C extension module forPython > > My guess is that there is a high probability that this approach violates > licensing restrictions. Also, regardless of whether it is technically > possible, it strikes me as unethical. > > Alan > > > "Tim Churches" <tchur@optushome.com.au> wrote in message > news:1082161063.1217.143.camel@emilio... > On Sat, 2004-04-17 at 10:14, Jack Hamilton wrote: > > The really interesting thing to know would be "Did SAS and this new > tool > > provide the same answers?" > > The author notes in his article > (http://www.dalkescientific.com/writings/diary/archive/2004/04/12/site_u > pdat > ed.html): > > " Some 700 more lines later and 430 lines of regression tests and I'm > done." > > I am familiar with some of the author's work in bio-informatics, and he > always writes nearly as much test code as actual business logic code. 
> > > If you don't care about correct answers, I > > can get you results really fast and cheap. We don't know whether the > > results from SAS are correct, but at least we know that they have a > > quality control staff. It's possible that the Python module was > written > > by professional statisticians with computer training (or professional > > programmers with statistical training), but we can't assume that > without > > knowing more than we do. > > Obviously everyone cares about getting the correct answers, or the best > possible answers (since we are talking about statistical models here). > > I think that the SAS code in question was just SAS data step code which > implemented a fairly complex predictive model - in other words, it > implements an equation or set of equations containing various estimated > parameters which constitute the predictive model - and it just processes > values from records through the model to derive a prediction for each > record. Thus, the development of the statistical model is still done > using SAS, but running on a PC. It is just the deployment of that model > on a large production server which is done using the results of the > SAS-to-Python C module converter. > > So, if we assume that SAS and the statistical model are error-free, then > the other sources of error are the SAS-to-C module converter, the C > compiler used, and Python. Errors in the converter are most likely, but > it sounds like the author has been defensive in his programming and > thorough in his automated testing of the results. Almost certainly the > free, open source GNU C compiler was used - after 15 years of very > widespread use, it is very unlikely to be a source of error. Python is > also free and open source and has been in continuous and widespread use > fro well over a decade. Again, an unlikely source of error. 
As an aside, > the team behind Python strike me as top-notch computer scientists, > mathematicians and programmers: Python is widely used by computer > science departments in universities everywhere, and such users tend to > be very discerning. Python does not claim any great statistical prowess > out of the box (you need to look to R for that), although many people > have written statistical routines in Python - but as discussed, all the > statistics in this project were handled by SAS. > > > We don't know, from what the article said, how complex the model was, > > or what SAS procedure was being duplicated, so you might be > overstating > > the abilities of the conversion process. (You might be correct, on > the > > other hand.) > > My understanding is that the model was non-trivial and that only SAS > data step code was involved. > > > In general, though, I think that such a conversion effort is > misguided. > > I suspect that this particular effort saved the client many tens of > thousands of dollars in SAS license fees per annum. Whether that is > misguided depends on your point of view. However, I agree that such > efforts are at best tactical, rather than strategic. > > > I'm not saying that converting from SAS to Python is a bad idea, only > > that automated conversions are a bad idea. It would be much better to > > start over, writing code that is designed with Python's strengths in > > mind. > > Sure, but remember that the article describes the automated conversion > of automatically generated SAS code. That's easier than automatic > conversion of hand-written SAS code. Nevertheless, a useful trick, and > as I said, it is probably not a huge leap to automatic conversion of a > constrained subset of hand-written SAS code - which would be an even > better trick. > > > That assumes that you understand what the program is intended to > > accomplished, of course - but if you don't know *that*, you shouldn't > be > > doing any kind of conversion. > > Of course. 
In this particular case, the correct approach would be for > the SAS predictive modelling program to emit C and Python code directly, > rather than SAS code. However, I gather that the SAS predictive > modelling program was proprietary (it may have been the SAS data mining > product, for instance), and that such changes were not possible. > > Tim C > > > >>> "Tim Churches" <tchur@optushome.com.au> 04/16/2004 4:13 PM >>> > > On Sat, 2004-04-17 at 09:38, Jack Hamilton wrote: > > > Interesting article, but it doesn't sound like the program is > > flexible - > > > it handles one particular type of SAS program written in a > > particular > > > style, and might not be generalizable. Also, it sounds like the > > author > > > doesn't really understand SAS. > > > > Correct on all counts. However, it does illustrate: > > > > a) That it is possible to deploy even complex predictive models > > developed on PCs and written in SAS on large, production servers > > without > > needing to license SAS on those servers, thus saving substantial > > amounts > > of money. It sounds like the consultancy described involved perhaps > > one > > or two weeks of work. The result was a tool written in Python which > > can > > re-convert the SAS predictive module to a Python C module each time > > the > > model (and hence the SAS code) is updated. Compare that to the initial > > and annual cost of licensing even just Base SAS on a large Unix or > > other > > server. > > > > b) If the fairly specific SAS-to-C module converter could be written > > in > > a person-week or two, then it may well be possible, with a bit more > > effort, to build a more general SAS-to-C module converter. A > > completely > > general SAS-to-C module converter would be a huge effort, but even > > somewhat general albeit constrained data step code converter would > > save > > a lot of SAS users a great deal of money when it comes to deploying > > production code on large production servers. 
Note that both Python and > > C > > are of course free and cross-platform - they can run anywhere SAS can > > run. > > > > Food for thought. I suspect that Andrew Dalke is available for hire... > -- > > Tim C > > PGP/GnuPG Key 1024D/EAF993D0 available from keyservers everywhere > or at http://members.optushome.com.au/tchur/pubkey.asc > Key fingerprint = 8C22 BF76 33BA B3B5 1D5B EB37 7891 46A9 EAF9 93D0

