Date: Tue, 9 May 2006 09:25:45 -0400
Reply-To: Aaron Pearson <apearson@surveysciences.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Aaron Pearson <apearson@surveysciences.com>
Subject: Re: Deleting 'unanswered question' variables
Content-Type: text/plain; charset="us-ascii"
Richard (and other ListServ members),
Thank you very much for your reply (and I apologize for the delay in my
response). Unfortunately, I'm not sure that this solution will work for
my situation, though in retrospect, it is likely due to lack of
specificity on my part.
To begin, I should have been more specific in what the recoded values
look like - we are trying to make up for a limitation in our survey
system which does not differentiate questions that were seen and
purposely not answered by the respondent ('skipped' questions) from
questions that were never seen either because they were not relevant to
that respondent's case or the respondent dropped out of the survey early
without completing ('missed' questions). Right now, my code fills every
unanswered question that was seen with a unique value that can only have
come from the recode. That value is supposed to indicate that the
question was seen and not answered ('skipped'). However, for partials,
my code has no way of knowing when the respondent dropped out, so it
also fills in questions they never reached (but would have reached had
they completed the survey) with the same unique code for 'skipped'
questions.
Here is an example of what the data might look like
(pre-recoding for 'skipped'):

respID  Complete/Partial  Q1  Q2  Q3  Q4  Q4a (if Q4=1)  Q5  Q6  Q7
001     P                 1   2   1   2   .              3   .   .
002     C                 1   1   1   1   5              1   .   no
003     P                 1   2   .   .   .              .   .   'missing string field'
(after recoding for 'skipped'):

respID  Complete/Partial  Q1  Q2  Q3   Q4   Q4a (if Q4=1)  Q5   Q6   Q7
001     P                 1   2   1    2    .              3    999  999
002     C                 1   1   1    1    5              1    999  no
003     P                 1   2   999  999  999            999  999  999 (as a string field)
My recoding syntax accounts for question dependencies such as the Q4 &
Q4a situation above. It replaces the '.' or empty string fields with the
value '999' only if it is a question that the respondent actually
encountered (so in the example above, respID 001 would have Q6 and Q7
replaced with '999', but Q4a would be left as '.'). The issue is that
because respID 001 is a partial (they dropped out of the survey without
completing), recoding Q6 and Q7 is misleading: they never actually
skipped those questions (they likely did not see them at all).
Logically, the simplest way to solve this issue would be to loop
backwards from the final question if 'complete/partial' = 'P' and see if
it equals '999'. Continuing in this loop, one could look for the first
instance that is not '999' or '.', store the name of that variable in
a value, and then delete all values after that variable through the end.
However, I don't know how to store a variable name in a value. I realize
this may not be possible in SPSS at all, in which case I guess I'll have
to re-think the approach, maybe just iteratively deleting '999's until I
come across the first non-'999' value.
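
The backward scan described above can be sketched outside SPSS. What follows is a minimal Python illustration of the logic only, not SPSS syntax. The names `is_fill` and `clear_trailing_fill`, and the use of `None` to stand in for sysmis or a blank string, are assumptions made for the example. One caveat: a question that really was skipped immediately before the respondent dropped out looks the same as one that was never seen, so it would be cleared as well.

```python
def is_fill(value):
    """True for a missing value or the '999' fill code (numeric or string)."""
    if value is None:                      # stands in for sysmis / blank
        return True
    if isinstance(value, str):
        return value.strip() in ("", "999")
    return value == 999


def clear_trailing_fill(answers, is_partial):
    """Return a copy of answers; for partials, reset the trailing run of
    fill/missing values to missing. Completes are returned unchanged."""
    cleaned = list(answers)
    if not is_partial:
        return cleaned
    i = len(cleaned) - 1
    # Walk backwards until the first value that is a real answer.
    while i >= 0 and is_fill(cleaned[i]):
        cleaned[i] = None
        i -= 1
    return cleaned


# respID 001 from the example, after recoding (Q1..Q7, with Q4a in order)
resp_001 = [1, 2, 1, 2, None, 3, 999, "999"]
print(clear_trailing_fill(resp_001, is_partial=True))
```

In this sketch the Q4a value for respID 001 is untouched because the scan stops at Q5, the last question with a real answer.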
Anyway, that's my current dilemma. If you have further suggestions,
that would be great. If not, I very much appreciate your taking the time
to help.
-Aaron
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
Richard Ristow
Sent: Wednesday, May 03, 2006 10:00 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: Deleting 'unanswered question' variables
At 05:33 PM 5/1/2006, Aaron Pearson wrote:
>I currently have a data file filled with both 'complete' and 'partial'
>responses to [a] survey. The 'partial' cases will have left questions
>unanswered at the end of the data. [...] My current task is to recode
>some of the data (which I have already done). However, to properly
>recode the 'partials', I need to somehow find the last question each of
>these respondents answered (e.g. the last column for which there is a
>response instead of a sysmis or unfilled string variable).
There's an attempt, below.
>My recoding code simply fills these empty cells with a new value. I
>would like to be able to:
>1) search for the last column in which a value was not replaced by my
>earlier code.
That doesn't look possible, once you have replaced the missing values.
How can SPSS know whether a value is from a survey, or you filled it in?
>If I could figure out how to find the last column for which data was
>entered, this would probably be enough for me to write additional
>syntax to figure out the rest of my problem.
Here's a start. You'll see that it takes the variables as numbered (from
1 to 4, in the test data) and returns the number of the last variable
with data. Your code would have to be aware of the numbering, and use
the computed number to decide which values not to fill in.
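
As a rough illustration of what "the number of the last variable with data" means, here is a small Python sketch of the same logic, using the four test cases from the listing in this message. It is not SPSS syntax; `has_data` and `last_with_data` are hypothetical names, `None` stands in for a system-missing numeric, and an empty string stands in for a blank string variable.

```python
def has_data(value):
    """A numeric has data if it is not missing; a string if it is not blank."""
    if value is None:
        return False
    if isinstance(value, str):
        return value.strip() != ""
    return True


def last_with_data(row):
    """Return the 1-based position of the last variable with data, or None."""
    end = None
    for index, value in enumerate(row, start=1):
        if has_data(value):
            end = index
    return end


# NUMB01, ALPH02, NUMB03, ALPH04 for the four test cases
rows = [
    [1, "Aaron", 2, "Benjamin"],
    [3, "Charles", 4, ""],
    [5, "Peter", None, ""],
    [6, "", None, ""],
]
ends = [last_with_data(row) for row in rows]
print(ends)
```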
The worst problem is that, you say, you have both string and numeric
variables. The following works, but it tests ALL variables as both
numeric and as string. It will generate an error message for every
variable you have, because one of the tests will be syntactically
invalid. I've specified "PRINT" on the END REPEAT, so you can see the
generated statements that produce the error messages.
(There are probably better ways: in SPSS 14, with Python, one could look
at the data dictionary and generate the correct test statement for each
variable. Or maybe there is a better way altogether. Raynald? Anybody?)
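
To make the "generate the correct test statement for each variable" idea concrete, here is a hedged Python sketch that only builds the text of the statements. It does not use the SPSS Python integration: the variable types are typed in by hand as a hypothetical list of (name, is_string) pairs, where a real program would read them from the data dictionary. The function name `build_tests` is likewise an assumption.

```python
def build_tests(variables, target="END"):
    """Build one IF statement per variable, choosing the string test or the
    numeric test according to the declared type."""
    lines = []
    for index, (name, is_string) in enumerate(variables, start=1):
        if is_string:
            condition = f"{name} NE ' '"            # non-blank string
        else:
            condition = f"NOT MISSING({name})"      # non-missing numeric
        lines.append(f"IF ({condition}) {target} = {index}.")
    return lines


# Hand-written stand-in for what the data dictionary would report.
variables = [
    ("NUMB01", False),
    ("ALPH02", True),
    ("NUMB03", False),
    ("ALPH04", True),
]
syntax = build_tests(variables)
print("\n".join(syntax))
```

Because each variable gets only the test that matches its type, statements built this way would avoid the one-error-per-variable behaviour of testing every variable both ways.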
This is SPSS draft output:
* ..................... .
LIST.
List
|-----------------------------|---------------------------|
|Output Created |03-MAY-2006 09:56:38 |
|-----------------------------|---------------------------|
NUMB01 ALPH02   NUMB03 ALPH04
     1 Aaron         2 Benjamin
     3 Charles       4
     5 Peter         .
     6               .
Number of cases read: 4 Number of cases listed: 4
* The following code looks for the last non-missing (numeric) .
* or non-blank (alphanumeric) variable in a list. .
* The problem is code that works for both alpha and numeric .
* variables. The code below tests every variable both ways, .
* creating an error for one of the tests for each. .
NUMERIC END (F2).
NUMERIC HAS01 TO HAS04 (F2).
DO REPEAT
VARIABLE = NUMB01 TO ALPH04
/FLAG = HAS01 TO HAS04
/INDEX = 01 TO 04.
. COMPUTE #HASDATA = 0.
. IF (VARIABLE NE ' ') #HASDATA = 1.
. IF NOT MISSING(1*VARIABLE) #HASDATA = 1.
. COMPUTE FLAG = #HASDATA.
. IF (#HASDATA EQ 1) END = INDEX.
END REPEAT
/**/ PRINT /*-*/.
134 +COMPUTE #HASDATA = 0
135 +IF (NUMB01 NE ' ') #HASDATA = 1
>Error # 4305 in column 30. Text: )
>A relational operator may have two numeric operands or two character string
>operands. To compare a character string to a numeric quantity, consider
>using the STRING or NUMBER function.
>This command not executed.
136 +IF NOT MISSING(1*NUMB01) #HASDATA = 1
137 +COMPUTE HAS01 = #HASDATA
138 +IF (#HASDATA EQ 1) END = 01
139 +COMPUTE #HASDATA = 0
140 +IF (ALPH02 NE ' ') #HASDATA = 1
141 +IF NOT MISSING(1*ALPH02) #HASDATA = 1
>Error # 4307 in column 36. Text: )
>One of the operands for an arithmetic operation is other than a numeric
>variable or numeric expression.
>This command not executed.
142 +COMPUTE HAS02 = #HASDATA
143 +IF (#HASDATA EQ 1) END = 2
144 +COMPUTE #HASDATA = 0
145 +IF (NUMB03 NE ' ') #HASDATA = 1
>Error # 4305 in column 30. Text: )
>A relational operator may have two numeric operands or two character string
>operands. To compare a character string to a numeric quantity, consider
>using the STRING or NUMBER function.
>This command not executed.
146 +IF NOT MISSING(1*NUMB03) #HASDATA = 1
147 +COMPUTE HAS03 = #HASDATA
148 +IF (#HASDATA EQ 1) END = 3
149 +COMPUTE #HASDATA = 0
150 +IF (ALPH04 NE ' ') #HASDATA = 1
151 +IF NOT MISSING(1*ALPH04) #HASDATA = 1
>Error # 4307 in column 36. Text: )
>One of the operands for an arithmetic operation is other than a numeric
>variable or numeric expression.
>This command not executed.
152 +COMPUTE HAS04 = #HASDATA
153 +IF (#HASDATA EQ 1) END = 4
LIST.
List
|-----------------------------|---------------------------|
|Output Created |03-MAY-2006 09:56:38 |
|-----------------------------|---------------------------|
NUMB01 ALPH02   NUMB03 ALPH04   END HAS01 HAS02 HAS03 HAS04
     1 Aaron         2 Benjamin   4     1     1     1     1
     3 Charles       4            3     1     1     1     0
     5 Peter         .            2     1     1     0     0
     6               .            1     1     0     0     0
Number of cases read: 4 Number of cases listed: 4