Date: Sat, 16 Jan 1999 16:42:43 +0000
Reply-To: John Whittington <medisci@POWERNET.COM>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: John Whittington <medisci@POWERNET.COM>
Subject: Re: SUGGESTION: Re: Reordering variables in a new data set
Content-Type: text/plain; charset="us-ascii"
At 17:17 14/01/99 -0700, Jack Hamilton wrote:
>So, here's my suggestion: ...
>Add a new dataset option which specifies order of variables in an output
>dataset. For example:
> data x (order=a b c)
>...
>I'm not sure that this functionality is really needed, but a lot of people
>ask for it.
Jack, yes, I agree with you. As we are incessantly discussing here, the
functionality can be achieved in plenty of other ways - but the fact that
the question gets asked so often suggests to me that it would make sense to
have it as an explicit feature of the SAS language.
The argument that functionality can be achieved in other ways is, IMO, a
very poor reason for omitting an explicit feature of a 4GL - for example, I
find it perfectly reasonable that we have a SQRT() function! For the same
reason, I would also support a request for a FACTORIAL() function - even
though we have GAMMA(), the question gets asked often enough - and it's not
a good language which causes people to waste time asking questions! This
overlaps to some extent with the question of quality (and particularly,
indexing) of documentation (particularly on-screen Help etc.). Many
questions with very simple answers get somewhat unjustified 'RTFM' responses
- if someone looking for the functionality of a (non-existent) 'FACTORIAL()'
function cannot find the answer by looking up 'factorial' in the index, then
the documentation (if not the language) has failed them - it's no use having
to know the answer ('gamma' in this case) before one can use the index!
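For anyone who does not already know the 'gamma' answer, a minimal sketch of the workaround follows. It relies only on the identity n! = GAMMA(n+1) for non-negative integers; the dataset and variable names are made up for illustration.

```sas
/* Factorials via GAMMA(): n! = gamma(n + 1) for n = 0, 1, 2, ... */
data factorials;                  /* hypothetical dataset name */
    do n = 0 to 10;
        factorial = gamma(n + 1); /* e.g. n = 5 gives 120 */
        output;
    end;
run;

proc print data=factorials noobs;
run;
```

The result is floating point, so it may be worth wrapping the call in ROUND() if an exact integer is needed for comparisons.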
I do, however, anticipate a potential technical problem with what you
suggest. As I understand it, dataset options relating to output datasets
are currently only applied at the time of writing the PDV to the output file
- i.e. after the PDV has been created (and ordered) - and do not make any
changes to the PDV, nor alter anything during the course of DATA/PROC step
execution. Hence, using your idea, if this 'convention' were not to change,
variables would not be re-ordered within the DATA step, and would hence not
come out 'in the desired order' with PUT statements etc. (which is one of
the common reasons for re-ordering). Also, I suspect the process of writing
the dataset would slow a little, since the process would have to 'jump
around' the PDV in order to do the re-ordering at that stage. Maybe, in
terms of current 'conventions', it would be more appropriate to implement
your idea as statements rather than dataset options - e.g. :
data x ;
order(x) a b c ;
OR
order x = a b c ; * or something like that ;
... or, indeed, as with DROP, KEEP, RENAME etc at present, allow *either*
dataset options or statements, with slightly different behaviours. The
statement would probably have to be positioned 'at the top' of a DATA step -
or, at least, before any other statements which would establish variable
order in the PDV.
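For comparison, one of the 'other ways' commonly used at present works on exactly this principle: a statement that names the variables is placed before the SET statement, so that it establishes their order in the PDV first. The sketch below uses RETAIN for this; the dataset OLD and its contents are invented for illustration.

```sas
/* Illustrative input whose variables are created in the order c, b, a */
data old;
    c = 3; b = 2; a = 1;
run;

/* RETAIN appears before SET, so a, b, c enter the PDV in that order */
data x;
    retain a b c;
    set old;
    put _all_;      /* the log should now list a, b, c first */
run;

/* List the variables of x in their stored order */
proc contents data=x varnum;
run;
```

Variables read by SET are retained anyway, so the RETAIN should not change their values; any variable named there that is not in the input dataset would, however, keep its value across iterations, which is a side effect to watch for.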
As an alternative or addition to this idea, I have often suggested that it
would make a lot of sense (to me) to increase the ability within SAS to do
all sorts of dataset manipulations (not just re-ordering variables) which
could be done solely within the header without having to actually 'process'
the data part of the dataset at all. Proc DATASETS seems an obvious vehicle
for this.
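As a rough sketch of the sort of header-only change that PROC DATASETS can already make (assuming, as generally understood, that MODIFY updates the descriptor portion without rewriting the observations), something like the following applies. The dataset X and the new names, label and format are purely illustrative, and re-ordering variables is not among the things MODIFY offers.

```sas
/* Illustrative dataset to modify */
data x;
    a = 1; b = 2; c = 3;
run;

/* Change variable attributes in the header only */
proc datasets library=work nolist;
    modify x;
        rename a = alpha;               /* hypothetical new name */
        label  b = 'Second variable';
        format c 8.2;
quit;
```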
Regards,
John
----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: medisci@powernet.com
Buckingham MK18 4EL, UK mediscience@compuserve.com
----------------------------------------------------------------