See below.
Jon Peck
SPSS, an IBM Company
peck@us.ibm.com
312-651-3435
From: Richard Ristow <wrristow@mindspring.com>
To: SPSSX-L@LISTSERV.UGA.EDU
Date: 11/15/2009 07:15 PM
Subject: Re: [SPSSX-L] Advice regarding very large dataset
Sent by: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
At 10:23 AM 11/15/2009, Jon K Peck wrote:
I'm not clear on why vectors don't meet the requirements for this problem.
You read in your data as usual and define a vector that in effect
overlays the variable list. Then you can use ordinary SPSS transformation
looping commands such as LOOP and use the vector indexes as subscripts.
But, here's the data structure:
[There is] a single record for each visitor with up to 100 page views, and
each page view is represented by many variables. A simplified schematic
might be:
UserID X1 Y1 Z1 X2 Y2 Z2....X100 Y100 Z100
There are many more than 3 variables per page view.
It would be great to define vectors X, Y, and Z with indices 1-100. But
SPSS can't do that; it requires all elements of any vector to be
contiguous. You could, if all variables are numeric, define
VECTOR AllData X1 TO Z100.
but that leads to terribly clumsy code to calculate the index values.
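To make the index arithmetic concrete, here is a minimal sketch in plain Python. It assumes the simplified layout above (three variables per page view, interleaved as X1 Y1 Z1 X2 ...); the function name overlay_index and its parameters are hypothetical, introduced only for illustration.

```python
# Hypothetical helper: 1-based position of a variable inside a single vector
# that overlays the interleaved columns X1 Y1 Z1 X2 Y2 Z2 ...
def overlay_index(page_view, var_offset, vars_per_view=3):
    """page_view is 1-based; var_offset is 0 for X, 1 for Y, 2 for Z."""
    return (page_view - 1) * vars_per_view + var_offset + 1

print(overlay_index(1, 0))    # X1   -> 1
print(overlay_index(2, 1))    # Y2   -> 5
print(overlay_index(100, 2))  # Z100 -> 300
```

The same expression would have to appear in every subscript inside a LOOP, which is where the clumsiness comes from.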
>>>You could reorder the variables easily with a little Python code (to
avoid writing out the names). Or do the transformations with a small
Python program.
To reorder the variables (this requires the Python plugin from Developer
Central):
data list free /UserID X1 Y1 Z1 X2 Y2 Z2 X3 Y3 Z3.
begin data
999 1 11 111 2 22 222 3 33 333
end data.
dataset name xyz.
begin program.
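import spss, spssaux
# Select the variables by name pattern: the X, Y, and Z groups.
xvars = spssaux.VariableDict(pattern="X")
yvars = spssaux.VariableDict(pattern="Y")
zvars = spssaux.VariableDict(pattern="Z")
# Parentheses let the expression continue onto the next line.
keepers = (sorted(xvars.variables) + sorted(yvars.variables) +
           sorted(zvars.variables))
# Rewrite the active dataset with UserID first, then the X, Y, and Z groups.
spss.Submit("match files file=* /keep = UserID " + " ".join(keepers))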
end program.
Note:
- The names are sorted strictly alphabetically. That means that x10 comes
before x2.
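If numeric order matters, one possible workaround is a sort key that compares the numeric suffix as a number. This is only a sketch: the helper natural_key is hypothetical, and it assumes names of the form letters followed by digits, as in the example data.

```python
import re

def natural_key(name):
    """Turn 'X10' into ('x', 10) so the numeric suffix sorts as a number."""
    match = re.match(r"([A-Za-z_]+)(\d*)$", name)
    if not match:
        # Names that do not fit the assumed pattern sort by name alone.
        return (name.lower(), 0)
    prefix, digits = match.groups()
    return (prefix.lower(), int(digits) if digits else 0)

names = ["X1", "X10", "X2"]
print(sorted(names))                   # ['X1', 'X10', 'X2']
print(sorted(names, key=natural_key))  # ['X1', 'X2', 'X10']
```

In the program above, this would mean calling sorted(xvars.variables, key=natural_key), and likewise for the Y and Z groups.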
HTH,
Jon Peck
DO REPEAT does work. It's a lengthy statement, since you have to name
every variable:
DO REPEAT X = X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
X11 X12 X13 X14 [continuing to]
X91 X92 X93 X94 X95 X96 X97 X98 X99 X100
/Y = Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10
...
Y91 Y92 Y93 Y94 Y95 Y96 Y97 Y98 Y99 Y100
and the same for Z.
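The typing can be avoided by generating the statement text. The following is a rough sketch in plain Python; the function do_repeat_header is hypothetical, and it only builds the opening DO REPEAT statement, so the transformation body and END REPEAT still have to follow it.

```python
def do_repeat_header(prefixes, count):
    """Build 'DO REPEAT X = X1 ... /Y = Y1 ...' with one clause per prefix."""
    clauses = []
    for prefix in prefixes:
        names = " ".join("%s%d" % (prefix, i) for i in range(1, count + 1))
        clauses.append("%s = %s" % (prefix, names))
    return "DO REPEAT " + "\n /".join(clauses) + "."

# Three page views here to keep the output short; use 100 for the real data.
print(do_repeat_header(["X", "Y", "Z"], 3))
```

The resulting string could be pasted into a syntax window, or combined with the loop body and passed to spss.Submit inside a BEGIN PROGRAM block.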
As everybody knows, I usually advise 'unrolling' such structures to one
record per event:
UserID PageView X Y Z
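As an illustration of what unrolling does to one record, here is a small sketch in plain Python. The function unroll and the dict-based record are assumptions made for the example, not part of SPSS; within SPSS itself, the VARSTOCASES command is the usual tool for this kind of restructuring.

```python
def unroll(record, prefixes=("X", "Y", "Z")):
    """Turn one wide record into rows of [UserID, PageView, X, Y, Z]."""
    rows = []
    page_view = 1
    # Stop at the first page view for which any variable is absent.
    while all((p + str(page_view)) in record for p in prefixes):
        values = [record[p + str(page_view)] for p in prefixes]
        rows.append([record["UserID"], page_view] + values)
        page_view += 1
    return rows

wide = {"UserID": 999, "X1": 1, "Y1": 11, "Z1": 111,
        "X2": 2, "Y2": 22, "Z2": 222, "X3": 3, "Y3": 33, "Z3": 333}
for row in unroll(wide):
    print(row)
# [999, 1, 1, 11, 111]
# [999, 2, 2, 22, 222]
# [999, 3, 3, 33, 333]
```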
But it would be nice to have SPSS handle the original records more
gracefully; for example, with a construct like
VECTOR X,Y,Z =X1 TO Z100.