Date: Wed, 15 Jan 1997 12:20:00 EST
Reply-To: "Seeman, G. Matthew" <gcs7@ASI.EM.CDC.GOV>
Sender: "SAS(r) Discussion" <SAS-L@UGA.CC.UGA.EDU>
From: "Seeman, G. Matthew" <gcs7@ASI.EM.CDC.GOV>
Subject: Re: Changing the order/Use ARRAY not RETAIN
With regard to the HORRENDOUS snag in my approach........
In my last email I mentioned that the RETAIN, ARRAY, and LENGTH
statements all have uses in reordering the dataset vector.......
I admit the array statement would give a default length to vars that are
being newly created while existing variables would keep their
lengths.......
If I was actually reordering new variables that didn't already have a
pre-defined length.....guess which one of the above SAS statements I
would use to reorder the vector......(hint: it isn't ARRAY)
If I was actually reordering variables that were numerous and constantly
alternating between CHAR and NUM.....guess which one I would use (hint:
still not ARRAY)
Back to the HORRENDOUS snag.....John - Please reread your own words "It
is the last four words there that matter!!" referring to
"when re-ordering the variables in an existing dataset"
Correct me if I am wrong, but ARRAY statements don't reset the length of
"variables in an existing dataset".
Aren't we beating a dead horse?
(he says in order to get the last word)
- Matthew Seeman
----------
From: John Whittington[SMTP:johnw@MAG-NET.CO.UK]
Sent: Wednesday, January 15, 1997 3:53 AM
To: Multiple recipients of list SAS-L
Subject: Re: Changing the order/Use ARRAY not RETAIN
Date: Tue, 14 Jan 1997 14:47:00 -0500
Reply-To: Robert Schechter <robert_schechter@MERCK.COM>
>You owe me a beer.
>
>The problem with RETAIN is caused when you list all the variables you
>need in the new (re-arranged) data set in the order you need them AND
>then it turns out some of these variables were calculated within the
>data step, THEN the RETAIN statement COULD affect your result. I know,
>it just happened to me.
No - I don't owe anyone any beers yet. You have moved the goalpost. You
will recall that what I said was:
JW|As I keep trying to say, there are NO 'potential RETAIN problems' (at least
JW|one beer at SUG22 for anyone who can prove me wrong!) when re-ordering the
JW|variables in an existing dataset ....
It is the last four words there that matter!! I agree that there are
potential problems in terms of variables CREATED WITHIN the DATA step in
question, but then we are not talking about what I said, and we are not even
talking about RE-ordering variables.
In this new scenario you have posed, I reckon it is still often/usually
sensible to use RETAIN, but with manual resetting to missing at the 'top' of
the DATA step of any variables created within that DATA step. If there are
just a few of them, one can just do this explicitly with assignment
statements; more generally, one can do it with arrays (although one has to
do the resetting at the 'bottom' of the DATA step, after an OUTPUT
statement, in order for all variables created within the step to be included
in the arrays). For example:
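   data new ;
      /* RETAIN before SET fixes the order without setting type or length */
      retain char1 num1 char2 num2 char3 num3 /* etc. */ ;
      set old ;
      /* [ other statements, including some creating variables ] */
      output ;
      /* defined last, so the arrays also pick up variables created above */
      array char _character_ ;
      array num _numeric_ ;
      /* reset to missing; SET refills the input variables next iteration */
      do over char ; char = '' ; end ;
      do over num ; num = . ; end ;
   run ;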
One never needs more than these two arrays and two DO OVER loops (maybe only
one, if one knows that all variables created in the DATA step are character,
or all are numeric). In contrast, Matthew's approach (using ARRAY
statements to do the ordering) could need as many ARRAY statements as
variables if the desired ordering alternated character and numeric ones.
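To make that count concrete, here is a rough sketch of ordering by ARRAY for
three alternating pairs. The array names (a1 to a6) and the character length
of 20 are assumptions made up for illustration; they are not taken from
Matthew's post, and the lengths would have to match those in OLD:

   data new ;
      /* an array holds one type only, so each change of type */
      /* needs its own ARRAY statement: six for six variables */
      array a1 {*} $ 20 char1 ;
      array a2 {*}      num1 ;
      array a3 {*} $ 20 char2 ;
      array a4 {*}      num2 ;
      array a5 {*} $ 20 char3 ;
      array a6 {*}      num3 ;
      set old ;
   run ;

The equivalent RETAIN statement is the single line shown in the example
above, whatever the mix of types.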
In any event, I think that there is one HORRENDOUS snag of Matthew's
approach which has not yet been mentioned (I had overlooked it when I
responded to Matthew). Unlike RETAIN (with no initial value stated), an
ARRAY statement irrevocably defines the length of variables (defaulting to
8, for either numeric or character). This means that one has to define, in
the ARRAY statements (which *must* precede SET if they are to re-order), the
lengths of all character variables (unless one accepts the default of 8),
INCLUDING those that have come from a previous dataset via SET. If one does
not do this, any character variables, including those from the input
dataset, will be truncated to length 8 if longer. This, to my mind,
represents at least as much of a potential problem and 'hazard' as anything
about RETAIN !!
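A minimal sketch of that truncation, assuming a made-up input dataset OLD
with one character variable CITY of length 20 and one numeric variable NUM1
(all names and values here are hypothetical, chosen only for illustration):

   /* hypothetical input: CITY is character, length 20 */
   data old ;
      length city $ 20 ;
      city = 'Buckinghamshire' ;
      num1 = 1 ;
   run ;

   /* ARRAY before SET: CITY is created with the default length of 8, */
   /* so the value read from OLD no longer fits                       */
   data via_array ;
      array c {*} $ city ;
      set old ;
   run ;

   /* RETAIN before SET: only the order is fixed (NUM1, then CITY); */
   /* CITY takes its length of 20 from OLD                          */
   data via_retain ;
      retain num1 city ;
      set old ;
   run ;

   proc contents data=via_array ;  run ;
   proc contents data=via_retain ; run ;

PROC CONTENTS should report CITY with length 8 in VIA_ARRAY and length 20 in
VIA_RETAIN. Writing the array statement as "array c {*} $ 20 city ;" would
avoid the truncation, but only by restating a length that OLD already holds.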
The beer offer is still on, but only for those who leave the goalposts
unmoved!!
Regards
John
-----------------------------------------------------------
Dr John Whittington,         Voice:      +44 1296 730225
Mediscience Services         Fax:        +44 1296 738893
Twyford Manor, Twyford,      E-mail:     johnw@mag-net.co.uk
Buckingham  MK18 4EL, UK     CompuServe: 100517,3677
-----------------------------------------------------------