LISTSERV at the University of Georgia
Date:         Wed, 1 Nov 2006 18:07:46 -0500
Reply-To:     Richard Ristow <wrristow@mindspring.com>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Richard Ristow <wrristow@mindspring.com>
Subject:      Re: using scratch variables
Comments: To: Anton Balabanov <Anton.Balabanov@fup.unn.ru>
In-Reply-To:  <000001c6fd05$1c64a6e0$03d6a8c0@pcip>
Content-Type: text/plain; charset="us-ascii"; format=flowed

At 10:56 AM 10/31/2006, Anton Balabanov wrote, following up concerning scratch variables and LAG. Text from that posting is quoted where it is pertinent in the following discussion.

Before I start, thank you, Anton. You've raised deep and interesting questions in your earlier postings and here. I took several days to post back because I took several days to work it out - as far as I did.

To readers in general: this is long and technical, though I've made it as clear as I could manage. There are two sections:

   Regarding LAG and scratch variables (Example I):
   Regarding LAG, permanent, and scratch variables (Example II):

..............................................
Regarding LAG and scratch variables (Example I):

>I saw 2 explanations [of "LAG" for scratch variables] in your posting.
>But ... neither of your explanations seems satisfactory, IMHO.

See the following discussion. I think they're consistent, and accurate. In the analysis below, I argue that they're consistent both with SPSS documentation and with the behavior observed.

I think you fall into trouble when you think of re-initialization of scratch variables. From both the documentation and observed behavior, that re-initialization does not happen.

>1. "It sounds like the implementation is something like 'the value of
>the variable just before the Nth preceding [overall]
>re-initialization'".

Example I: The following is SPSS draft output; discussion follows. In this file, variables "b.1" and "b.2" are entered as data, from your posting.

* ...... Example I: Post from here on ............. .
NUMERIC #A (F2).
NUMERIC @#A_BFR @#A_AFT B (F2).
COMPUTE @#A_BFR = #A.
*  "the following syntax for 10-case file would return .
*  [b.1]; instead, we have [b.2]:"                      .
.  IF $casenum=1 #a=1.
.  IF RANGE($casenum,5,7) #a=$casenum.
.  COMPUTE b=LAG(#a,2).
COMPUTE @#A_AFT = #A.
LIST.
|-----------------------------|---------------------------|
|Output Created               |01-NOV-2006 15:03:52       |
|-----------------------------|---------------------------|
LINE_NUM  b.1  b.2  @#A_BFR  @#A_AFT   B

   01       0    .        0        1   .
   02       0    .        1        1   .
   03       0    1        1        1   1
   04       0    1        1        1   1
   05       0    1        1        5   1
   06       1    1        5        6   1
   07       5    5        6        7   5
   08       6    6        7        7   6
   09       6    7        7        7   7
   10       6    7        7        7   7

Number of cases read:  10    Number of cases listed:  10
* ...... Example I: End ............. .

In the above, B.1 is what you expected to see, and B.2 is what you saw. B is what was calculated, and matches B.2. Variables @#A_BFR and @#A_AFT record the values of scratch variable #A at the beginning and end of the transformation program, for that case.

Your reasoning:
>Zeros are [predicted] because scratches are initialized to 0, not to
>SYSMIS.

This doesn't apply for cases 01 and 02. You have

.  COMPUTE b=LAG(#a,2).

For cases 01 and 02, that's the value of #A from cases "-1" and "00", neither of which exists; so the result is missing. #A is initialized to 0, but only when it comes into existence, i.e. in case 01.

>Zeros up to the 6th case are because only at 6th case we have 2
>re-initializations of #a.

#A is initialized to 0 at case 1; you then compute it as 1. As shown, #A is 0 at the start of the transformation program for that case, and 1 at the end.

But you don't have "2 re-initializations of #a": "SPSS does not reinitialize scratch variables when reading a new case. Their values are always carried across cases." (SPSS 14 Command Syntax Reference, p.33).

In cases 02, 03, and 04 you don't change #A, so it keeps the value it had at the end of case 01: namely, 1. (See @#A_BFR and @#A_AFT for those cases.)

>Instead, we have [B.2, which matches variable B in the above listing].

Your code is

.  COMPUTE b=LAG(#a,2).

In the output, B.2 and B are missing for the first two cases, as discussed above. In later cases, they have the value of @#A_AFT from two cases before: "the value of the variable [#A] just before the Nth preceding [overall] re-initialization."
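
For readers who want to check that reading against the listing, here is a toy model in Python. It is not SPSS and says nothing about how SPSS is actually implemented; it only assumes the two rules argued above: #A is set to 0 once and then carried across cases, and LAG(#a,2) returns the value #A had at the end of the second preceding case. The function and variable names are made up for the illustration.

```python
def simulate_example_1(n_cases=10, lag=2):
    """Toy model of Example I (an assumption-laden sketch, not SPSS itself)."""
    scratch_a = 0       # #A: set to 0 once, when it comes into existence
    end_of_case = []    # value of #A at the close of each earlier case
    rows = []
    for casenum in range(1, n_cases + 1):
        a_bfr = scratch_a                  # COMPUTE @#A_BFR = #A.
        if casenum == 1:                   # IF $casenum=1 #a=1.
            scratch_a = 1
        if 5 <= casenum <= 7:              # IF RANGE($casenum,5,7) #a=$casenum.
            scratch_a = casenum
        # COMPUTE b=LAG(#a,2): missing (None) until two earlier cases exist.
        b = end_of_case[-lag] if len(end_of_case) >= lag else None
        a_aft = scratch_a                  # COMPUTE @#A_AFT = #A.
        end_of_case.append(scratch_a)      # #A is NOT reset before the next case
        rows.append((casenum, a_bfr, a_aft, b))
    return rows


if __name__ == "__main__":
    for casenum, a_bfr, a_aft, b in simulate_example_1():
        print(f"{casenum:02d}  {a_bfr}  {a_aft}  {'.' if b is None else b}")
```

Under those two assumptions the model gives the same @#A_BFR, @#A_AFT and B columns as the listing in Example I.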

>2. The second explanation "variable "#a", at the start of the case,
>has the value it had at the end of the preceding case"

I believe that's correct. As you can see, above, @#A_BFR for cases 2 and following, matches @#A_AFT for the immediately preceding case. In case 1, @#A_BFR is 0, which is the value of #A the ONLY time it is initialized.

>[This] is OK for LAG(#a) or LAG(#a,1)

It doesn't, that I can see, have anything to do with LAG; notice that it doesn't mention LAG. As noted above, b=LAG(#a,2) is what my hypothesis about LAG predicts.

>According to the CSR for SPSS13: "In a series of transformation
>commands without any intervening EXECUTE commands or other commands
>that read the data, lag functions are calculated after all other
>transformations, regardless of command order.", that is, #a in the
>current case had been already re-initialized

That's the mistake. As previously noted, #a is not re-initialized.

>Raynald Levesque [writes] "...if you assign ... a value to a scratch
>variable in case 1, then that value will remain the same for all
>subsequent cases UNLESS YOU change it yourself by syntax" brought me
>to another understanding of the process how SPSS works with scratch
>variables. The key word in the quotation above is "subsequent". It
>seems, SPSS REMEMBERS past values of scratch variables for each case.

Yes. You'll see that's exactly what is stated above.

>...just like it does with permanent variables,

Not quite; permanent variables are handled differently. Permanent variables are "remembered" in that they're written to the working file; scratch variables are "remembered" in that the values they had at the end of one case's computations, are available at the beginning of the next case's computation. (Permanent variables for which LEAVE is specified are "remembered" in both senses.)

>SPSS keeps the last [calculated, not] initialized value for every
>subsequent case, unless it will be [calculated] via syntax next time.
>That is, scratch variable exists only [until] the first EXECUTE [or
>other procedure, or SAVE] and only in RAM of the computer.

I believe that is correct.

>But it is not a scalar, and it is not an array of serial
>re-initialized values. Instead, it is a column vector just like the
>permanent variable, but with different mode of re-initialization.

It's not possible to tell from the documentation or the observed behavior, but I think scratch variables are probably scalars, i.e. not written even temporarily as column vectors to the working file. ("Column vector" is not standard SPSS terminology, but it is accurate.) That's OK for LAG, if it's implemented "in time" (your terminology), i.e. "counts re-initializations." I think that is accurate, except that the re-initialization is the *global* re-initialization, from which scratch variables, and permanent variables with LEAVE, are exempt.

>This 'hypothesis' explains why LAG works well with scratch variables
>with any lag order. What do you think?

I think so. But I don't know whether "this 'hypothesis'," as I've expounded it, should be considered consistent with yours, or not.

..............................................
Regarding LAG, permanent, and scratch variables (Example II):

>That is, LAG operates "in space" with permanent variable (i.e., in
>file sort order) and "in time" with scratch variable (i.e. counts
>re-initializations).

I would expect that both are implemented the same way, because it would be very awkward to maintain two different implementations of LAG. If I understand you, and interpret the following test correctly, both implementations are "in time", as you put it. That is, LAG(VAR,N) returns the value of variable 'VAR' from just before the Nth previous global initialization, where a global initialization takes place at the close of the transformation program, just before a new case is begun.

To review: at a global initialization, numeric variables are generally set to SYSMIS, and string variables to blank. However, this is not done for scratch variables, or for permanent variables for which LEAVE has been specified.
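
The rule just reviewed can be written down as a small Python sketch. This is only a restatement of the paragraph above, not SPSS code: the function name, the use of None for SYSMIS, the empty string for a blank string value, and the convention that a leading "#" marks a scratch variable are all assumptions made for the illustration.

```python
SYSMIS = None  # stand-in for the SPSS system-missing value (assumption)


def global_reinitialize(case, string_vars=(), leave_vars=()):
    """Return the variable values as they would stand at the start of the next case.

    Scratch variables (names starting with '#') and variables named in
    leave_vars keep their values; other strings become blank and other
    numerics become SYSMIS.
    """
    next_case = {}
    for name, value in case.items():
        if name.startswith("#") or name in leave_vars:
            next_case[name] = value      # exempt: carried across cases
        elif name in string_vars:
            next_case[name] = ""         # string variables: blank
        else:
            next_case[name] = SYSMIS     # numeric variables: system-missing
    return next_case


if __name__ == "__main__":
    end_of_case = {"#A": 5, "B": 3, "NAME": "x", "TOTAL": 9}
    print(global_reinitialize(end_of_case, string_vars={"NAME"}, leave_vars={"TOTAL"}))
```

In this sketch #A and TOTAL (treated as having LEAVE) survive into the next case, while B and NAME are reset.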

The following is SPSS draft output. Variables LINE_NUM and A have their values at the start of this transformation program. All other computations are shown.

* ...... Example II: Post from here on ............ .
NUMERIC #A ##A (F2).
NUMERIC @#A_BFR @#A_AFT (F2).
NUMERIC B_PERM B_SCRTCH (F2).

COMPUTE @#A_BFR = #A.

*  "That is, LAG operates 'in space. with permanent       .
*  variable (i.e., in file sort order) and "in time" with .
*  scratch variable (i.e. counts re-initializations)."    .
.  COMPUTE B_PERM   = LAG(A,2).
.  COMPUTE B_SCRTCH = LAG(#A,2).
.  COMPUTE #A = A.

*  Drop cases 5 and 7 (original numbering)                .
.  SELECT IF NOT ANY(LINE_NUM,5,7).

COMPUTE @#A_AFT = #A.
LIST.
|-----------------------------|---------------------------|
|Output Created               |01-NOV-2006 17:20:56       |
|-----------------------------|---------------------------|
LINE_NUM   A  @#A_BFR  @#A_AFT  B_PERM  B_SCRTCH

   01      1        0        1       .         .
   02      3        1        3       .         .
   03      5        3        5       1         1
   04      7        5        7       3         3
   06     11        9       11       5         5
   08     15       13       15       7         7
   09     17       15       17      11        11
   10     19       17       19      15        15

Number of cases read:  8    Number of cases listed:  8
* ...... Example II: End ............ .

.  #A is computed as the value of A. Observe that the value of @#A_AFT is the same as that of A, but the value of @#A_BFR is not.
.  B_PERM is LAG(A,2). B_SCRTCH is LAG(#A,2), i.e. of a scratch variable.
.  Cases 05 and 07 (original numbering) are deleted.

Notice that B_PERM, lagging the permanent variable A, and B_SCRTCH, lagging the scratch variable #A, are the same; and, in both cases, they are the value of A (the same as #A), two cases before AFTER the deletion.

With respect to lagging the scratch variable, this is consistent with LAG operating "in time", i.e. saving values as they were before global re-initialization, except that a global re-initialization following a deleted case isn't counted. My guess is that this is the case. In any case, there's no evidence that the logic for lagging A is different from that for lagging #A.

My guess is that what you call "in time" logic is used for both. But this test is certainly not definitive, and I can't think of one that would be.
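
As a rough check on that reading of Example II, here is a toy model in Python. It is a guess at the bookkeeping, not a description of SPSS internals, and like the test itself it cannot settle the question. It assumes that #A is never reset, that LAG returns the end-of-case value from the Nth preceding case that was kept, and that a case removed by SELECT IF leaves nothing behind for LAG to find. The values of A for the deleted cases (9 and 13) are inferred from the @#A_BFR column of the listing; all names are made up for the illustration.

```python
def simulate_example_2(dropped=(5, 7), lag=2):
    """Toy model of Example II (an assumption-laden sketch, not SPSS itself)."""
    scratch_a = 0                      # #A: set to 0 once, then carried across cases
    hist_a, hist_scratch = [], []      # end-of-case values of A and #A for KEPT cases
    rows = []
    for line_num in range(1, 11):
        a = 2 * line_num - 1           # A = 1, 3, 5, ..., 19 (9 and 13 are inferred)
        a_bfr = scratch_a                                                 # COMPUTE @#A_BFR = #A.
        b_perm = hist_a[-lag] if len(hist_a) >= lag else None             # LAG(A,2)
        b_scrtch = hist_scratch[-lag] if len(hist_scratch) >= lag else None  # LAG(#A,2)
        scratch_a = a                                                     # COMPUTE #A = A.
        if line_num in dropped:        # SELECT IF NOT ANY(LINE_NUM,5,7).
            continue                   # deleted case: #A keeps its value, LAG history does not grow
        a_aft = scratch_a                                                 # COMPUTE @#A_AFT = #A.
        hist_a.append(a)
        hist_scratch.append(scratch_a)
        rows.append((line_num, a, a_bfr, a_aft, b_perm, b_scrtch))
    return rows


if __name__ == "__main__":
    for row in simulate_example_2():
        print("  ".join("." if v is None else str(v) for v in row))
```

With those assumptions the model gives the same eight rows as the listing, including @#A_BFR values of 9 and 13 for cases 06 and 08, and identical B_PERM and B_SCRTCH columns.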

>Thank you for thorough explanation and pointing me out the example
>with INPUT PROGRAM, as well as "INPUT PROGRAM paradox" discussion.
>Indeed, LOOP within INPUT PROGRAM operates differently with scratch
>and permanent variables!

It does. However (not demonstrated) it operates the same for permanent variables with LEAVE specified, as it does for scratch variables.

-With very best wishes, Richard

