Date: Wed, 7 May 2003 15:19:42 -0400
Reply-To: Richard Ristow <email@example.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Richard Ristow <firstname.lastname@example.org>
Subject: RECODE techniques
At 11:47 AM 5/7/2003 -0400, Sheryl Horowitz wrote, in posting "Re: SPSS
syntax for SF-12":
>This is not very elegant but it is a direct translation from the SAS program.
I'm taking the code as a springboard to comment on some techniques for
using RECODE. This is NOT a comment on using the code that Sheryl
Horowitz posted. There are very good reasons to run code that works,
but has modest deficiencies, rather than doing major rewriting and
debugging for an improvement that may not actually benefit you much.
>* Cleaning /Reverse scoring.
> RECODE RP2 RP3 RE2 RE3 (1=1) (2=2) (3 thru Highest=SYSMIS) (Lowest thru
> .9999=SYSMIS) .
> RECODE PF02 PF04 (1=1) (2=2) (3=3) (Lowest thru .99 =SYSMIS) (3.01 thru
> Highest=SYSMIS) .
First off, it is *NOT* a good idea to put an EXECUTE after each RECODE.
Every EXECUTE causes the whole data file to be read again, and that's
one of the slower operations you encounter. If you just put in the
RECODEs, they are held until the next procedure (or a single EXECUTE)
and then all performed in *one* reading of the data file. That can be
noticeably faster, often by a lot.
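To make the pass-counting argument concrete, here is a toy Python model. It is an illustration under stated assumptions, not how SPSS is implemented internally: transformations are queued and run only when a data pass is forced, and each forced pass reads every case once. The names ToyDataset and set_missing_above are hypothetical.

```python
class ToyDataset:
    """Toy model of deferred transformations (illustrative only)."""

    def __init__(self, cases):
        self.cases = [dict(c) for c in cases]  # private copy of the data
        self.pending = []  # transformations waiting for a data pass
        self.passes = 0    # number of times the whole file has been read

    def transform(self, func):
        # Like RECODE: just queued, no data is read yet.
        self.pending.append(func)

    def execute(self):
        # Like EXECUTE or a procedure: one read of every case, applying
        # all queued transformations to each case in order.
        self.passes += 1
        for case in self.cases:
            for func in self.pending:
                func(case)
        self.pending = []


def set_missing_above(var, limit):
    """Hypothetical stand-in for one RECODE: values above limit become None (sysmis)."""
    def func(case):
        if case[var] is not None and case[var] > limit:
            case[var] = None
    return func


data = [{"RP2": 1, "PF02": 4}, {"RP2": 5, "PF02": 2}]

# EXECUTE after each RECODE: one pass per RECODE.
a = ToyDataset(data)
a.transform(set_missing_above("RP2", 2))
a.execute()
a.transform(set_missing_above("PF02", 3))
a.execute()

# All RECODEs first, then a single pass does them all.
b = ToyDataset(data)
b.transform(set_missing_above("RP2", 2))
b.transform(set_missing_above("PF02", 3))
b.execute()

print(a.passes, b.passes)
print(a.cases == b.cases)
```

Both styles end with the same data; only the number of passes differs, and in a real file each pass means reading every case from disk.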
Second, if you're trying to recode a range but leave out an endpoint, as in
"(Lowest thru .9999=SYSMIS)", using a value like .9999 to mean "the
largest number less than 1" is inelegant, and can cause errors: what
if .99995 occurs? Or .99999999987? Such values fall between .9999 and 1
and are not recoded at all.
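RECODE applies its clauses in the order written, and each value is
recoded only by the first clause that matches it. So you can specify the
whole range, endpoint included, even if that endpoint has already been
recoded by an earlier clause. In the statement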
> RECODE RP2 RP3 RE2 RE3 (1=1) (2=2) (3 thru Highest=SYSMIS)
> (Lowest thru .9999=SYSMIS) .
the apparent intent is to keep values 1 and 2 as valid, and make most
others system-missing. Since 1 has already been recoded (to itself, but
that doesn't matter), you can make all smaller numbers missing with the
clause "(Lowest thru 1 = SYSMIS)".
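The first-match rule can be modeled in a few lines of Python. This is a sketch, not SPSS itself: None stands in for SYSMIS, LOWEST and HIGHEST are modeled as infinities, and the helpers recode, thru and equals are hypothetical. It runs the original clauses and the revised ones side by side.

```python
SYSMIS = None  # stand-in for SPSS system-missing
LOWEST, HIGHEST = float("-inf"), float("inf")


def equals(x):
    return lambda v: v == x


def thru(lo, hi):
    # Inclusive range test, like "lo thru hi" in RECODE.
    return lambda v: v is not None and lo <= v <= hi


def recode(value, clauses):
    """Return the result of the first matching clause; unmatched values are kept.

    Models RECODE without ELSE or INTO: clauses are tried in the order
    written and a value is recoded at most once.
    """
    for test, new_value in clauses:
        if test(value):
            return new_value
    return value


original = [(equals(1), 1), (equals(2), 2),
            (thru(3, HIGHEST), SYSMIS), (thru(LOWEST, 0.9999), SYSMIS)]
revised = [(equals(1), 1), (equals(2), 2),
           (thru(3, HIGHEST), SYSMIS), (thru(LOWEST, 1), SYSMIS)]

for v in [1, 2, 0.5, 0.99995, 1.5, 7]:
    print(v, recode(v, original), recode(v, revised))
```

With the .9999 bound, 0.99995 passes through unchanged; with "(Lowest thru 1 = SYSMIS)" it becomes missing, while 1 itself is still kept because (1=1) matched first. A value such as 1.5 survives both versions.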
That still leaves fractional values between 1 and 3. If you want to
keep values 1 and 2, and eliminate ALL others, the best form is
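RECODE RP2 RP3 RE2 RE3 (1=1) (2=2)
  (ELSE = SYSMIS).
ELSE matches every value not caught by an earlier clause, so nothing
outside 1 and 2 slips through.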
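Under the same hypothetical first-match model (again a Python sketch, with None standing in for SYSMIS and recode as an illustrative helper), ELSE is simply a final clause that matches anything.

```python
SYSMIS = None  # stand-in for SPSS system-missing


def recode(value, clauses):
    # First matching clause wins; unmatched values are kept unchanged.
    for test, new_value in clauses:
        if test(value):
            return new_value
    return value


clauses = [
    (lambda v: v == 1, 1),
    (lambda v: v == 2, 2),
    (lambda v: True, SYSMIS),  # ELSE: matches whatever earlier clauses missed
]

results = {v: recode(v, clauses) for v in [1, 2, 0.5, 1.5, 2.999, 3, 99]}
print(results)
```

Because the ELSE clause comes last, it can never override (1=1) or (2=2), and no fractional value is left unrecoded.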
This also comes up in range recodings, for example for age. It's
common, having calculated age as years and fractions, to write something like
RECODE AGE (1 THRU 14.9 = 1)
(15 THRU 19.9 = 2)
[etc.] INTO AGE_RNG.
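That leaves gaps: an age of 14.95 falls into neither range. However,
since clauses are applied in order, you can recode with no gaps, while
including the low point of each range within the range, by writing the
ranges from the top down: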
RECODE AGE (65 THRU HI = 9)
(50 THRU 65 = 8)
(40 THRU 50 = 7)
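[etc.] INTO AGE_RNG.
Here an age of exactly 65 is taken by the first clause, so (50 THRU 65 = 8)
in effect covers 50 up to, but not including, 65; likewise for each lower
range.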
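A Python sketch of the two age recodes under the same first-match model. Assumptions: AGE_RNG is a new variable, so an age that no range matches ends up system-missing (None here); only the ranges actually written out above are modeled, and the "[etc.]" ranges are left out; recode_into is a hypothetical helper.

```python
HI = float("inf")


def recode_into(value, clauses):
    # Clauses are (low, high, code), tried in the order written; the first
    # inclusive range containing value wins. None means no range matched,
    # i.e. system-missing in a new target variable.
    for low, high, code in clauses:
        if low <= value <= high:
            return code
    return None


# Ascending ranges with .9 upper bounds, as in the first example.
gappy = [(1, 14.9, 1), (15, 19.9, 2)]

# Descending ranges sharing endpoints, as in the second example: a shared
# endpoint such as 65 is claimed by the earlier (higher) range.
gapless = [(65, HI, 9), (50, 65, 8), (40, 50, 7)]

for age in [14.9, 14.95, 15]:
    print(age, recode_into(age, gappy))
for age in [65, 64.99, 50, 49.5, 40]:
    print(age, recode_into(age, gapless))
```

In the first version 14.95 gets no code at all; in the second every age from 40 up lands in exactly one range, with each range's low point included.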