SPSS, an IBM Company
Richard Ristow <email@example.com>
11/15/2009 07:15 PM
Re: [SPSSX-L] Advice regarding very large dataset
"SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
At 10:23 AM 11/15/2009, Jon K Peck wrote:
I'm not clear on why vectors don't meet the requirements for this problem.
You read in your data as usual and define a vector that in effect
overlays the variable list. Then you can use ordinary SPSS transformation
looping commands such as LOOP and use the vector indexes as subscripts.
But, here's the data structure:
[There is] a single record for each visitor with up to 100 page views, and
each page view is represented by many variables. A simplified schematic
UserID X1 Y1 Z1 X2 Y2 Z2....X100 Y100 Z100
There are many more than 3 variables per page view.
It would be great to define vectors X, Y, and Z with indices 1-100. But
SPSS can't do that; it requires all elements of any vector to be
contiguous. You could, if all variables are numeric, define
VECTOR AllData X1 TO Z100.
but that leads to terribly clumsy code to calculate the index values.
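To make the clumsiness concrete, here is a minimal sketch in Python of the index arithmetic the single-vector approach would need. It assumes exactly three numeric variables per page view, stored in the order X, Y, Z as in the schematic; the names `VARS_PER_VIEW` and `alldata_index` are hypothetical, for illustration only.

```python
# Sketch: position of a page-view variable inside one flat vector
# (such as AllData) spanning X1 Y1 Z1 X2 Y2 Z2 ... X100 Y100 Z100.
# Assumes exactly three variables per page view, in the order X, Y, Z.

VARS_PER_VIEW = ("X", "Y", "Z")  # hypothetical layout taken from the schematic


def alldata_index(var, view):
    """Return the 1-based index in the flat vector of variable `var`
    ("X", "Y" or "Z") for the 1-based page view number `view`."""
    offset = VARS_PER_VIEW.index(var)  # 0 for X, 1 for Y, 2 for Z
    return (view - 1) * len(VARS_PER_VIEW) + offset + 1


if __name__ == "__main__":
    # Print the flat-vector position of a few sample variables.
    for var, view in [("X", 1), ("Z", 1), ("Y", 2), ("Z", 100)]:
        print(var + str(view), "->", alldata_index(var, view))
```

In SPSS syntax the same expression would have to be repeated in every subscript, something like AllData((#i - 1)*3 + 2) to reach the Y value of page view #i, and the constant changes whenever the number of variables per page view changes.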
>>>You could reorder the variables easily with a little Python code (to
avoid writing out the names). Or do the transformations with a small
To reorder the variables (this requires the Python plugin from Developer
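data list free /UserID X1 Y1 Z1 X2 Y2 Z2 X3 Y3 Z3.
begin data
999 1 11 111 2 22 222 3 33 333
end data.
dataset name xyz.
begin program.
import spss, spssaux
# select the variables by name pattern (a regular expression)
xvars = spssaux.VariableDict(pattern="X")
yvars = spssaux.VariableDict(pattern="Y")
zvars = spssaux.VariableDict(pattern="Z")
# all the X's first, then all the Y's, then all the Z's
keepers = sorted(xvars.variables) + sorted(yvars.variables) + sorted(zvars.variables)
spss.Submit("match files file=* /keep = UserID " + " ".join(keepers))
end program.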
- The names are sorted strictly alphabetically. That means that x10 comes before x2.
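The ordering caveat can be checked in plain Python, without the SPSS plugin. The sketch below contrasts the default alphabetical sort with a sort on the numeric suffix. The helper name `suffix_key` is hypothetical, and it assumes every name is a letter prefix followed by digits.

```python
import re


def suffix_key(name):
    """Sort key: (letter prefix, numeric suffix as an integer)."""
    match = re.fullmatch(r"([A-Za-z_]+)(\d+)", name)
    if match is None:
        raise ValueError("unexpected variable name: " + name)
    return (match.group(1), int(match.group(2)))


names = ["X1", "X2", "X10", "X3"]
alphabetical = sorted(names)             # plain string comparison
numeric = sorted(names, key=suffix_key)  # compares the suffix as a number
print(alphabetical)
print(numeric)
```

Passing `key=suffix_key` to the sorted() calls in the program above would be one way to keep the names in numeric order when there are ten or more page views.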
DO REPEAT does work. It's a lengthy statement, since you have to name
DO REPEAT X = X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
X11 X12 X13 X14 [continuing to]
X91 X92 X93 X94 X95 X96 X97 X98 X99 X100
/Y = Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10
Y91 Y92 Y93 Y94 Y95 Y96 Y97 Y98 Y99 Y100
and the same for Z.
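Typing out 100 names per stand-in variable is tedious, so the lists could be generated instead. The following is a minimal sketch in plain Python that only builds the opening of the command as text; the function name `do_repeat_header` and the 70-column wrap width are assumptions, and the body of the block and the END REPEAT still have to be supplied.

```python
import textwrap


def do_repeat_header(prefixes, n_views, width=70):
    """Build the opening of a DO REPEAT command with one stand-in
    variable per prefix, each listing prefix1 .. prefixN explicitly."""
    clauses = []
    for prefix in prefixes:
        names = " ".join(prefix + str(i) for i in range(1, n_views + 1))
        # Wrap long name lists; continuation lines are indented.
        clause = textwrap.fill(prefix + " = " + names, width=width,
                               subsequent_indent="    ")
        clauses.append(clause)
    return "DO REPEAT " + "\n  /".join(clauses) + "."


header = do_repeat_header(["X", "Y", "Z"], 3)
print(header)
```

Calling `do_repeat_header(["X", "Y", "Z"], 100)` gives the full-length version, which could then be pasted into a syntax window or passed on from a Python program block.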
As everybody knows, I usually advise 'unrolling' such structures to one
record per event:
UserID PageView X Y Z
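As an illustration of the 'unrolled' layout, here is a minimal sketch in plain Python (not using the SPSS plugin) that turns one wide record into one row per page view. The function name `unroll` and the dict-based record are assumptions for illustration; within SPSS itself, the VARSTOCASES command does this kind of restructuring.

```python
def unroll(record, prefixes=("X", "Y", "Z")):
    """Turn one wide record (a dict such as {"UserID": 999, "X1": 1, ...})
    into a list of long rows, one per page view present in the record."""
    rows = []
    view = 1
    # Stop at the first page view whose first variable is absent.
    while prefixes[0] + str(view) in record:
        row = {"UserID": record["UserID"], "PageView": view}
        for prefix in prefixes:
            row[prefix] = record.get(prefix + str(view))
        rows.append(row)
        view += 1
    return rows


# Same sample values as the one-case example earlier in this message.
wide = {"UserID": 999,
        "X1": 1, "Y1": 11, "Z1": 111,
        "X2": 2, "Y2": 22, "Z2": 222,
        "X3": 3, "Y3": 33, "Z3": 333}
long_rows = unroll(wide)
for row in long_rows:
    print(row)
```

Once the data are in this shape, X, Y, and Z are ordinary variables, and per-page-view logic no longer needs vectors or DO REPEAT at all.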
But it would be nice to have SPSS handle the original records more
gracefully; for example, with a construct like
VECTOR X,Y,Z =X1 TO Z100.