Jon Peck
SPSS, an IBM Company
peck@us.ibm.com
312-651-3435
From: Richard Ristow <wrristow@mindspring.com>
To: SPSSX-L@LISTSERV.UGA.EDU
Date: 11/15/2009 07:15 PM
Subject: Re: [SPSSX-L] Advice regarding very large dataset
Sent by: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
At 10:23 AM 11/15/2009, Jon K Peck wrote:
I'm not clear on why vectors don't meet the requirements
for this problem. You read in your data as usual and define a vector
that in effect overlays the variable list. Then you can use ordinary
SPSS transformation looping commands such as LOOP and use the vector indexes
as subscripts.
But, here's the data structure:
[There is] a single record for each visitor with up to
100 page views, and each page view is represented by many variables. A
simplified schematic might be:
UserID X1 Y1 Z1 X2 Y2 Z2 ... X100 Y100 Z100
There are many more than 3 variables per page view.
It would be great to define vectors X, Y, and Z with indices 1-100. But
SPSS can't do that; it requires all elements of any vector to be contiguous.
You could, if all variables are numeric, define
VECTOR AllData = X1 TO Z100.
but that leads to terribly clumsy code to calculate the index values.
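To make the "clumsy index" point concrete, here is a minimal sketch of the arithmetic in plain Python (not SPSS syntax). It assumes exactly three variables per page view, in the file order X, Y, Z; the function name alldata_index is made up for illustration.

```python
# Assumed layout: UserID X1 Y1 Z1 X2 Y2 Z2 ... X100 Y100 Z100
VARS = ["X", "Y", "Z"]  # variables per page view, in file order

def alldata_index(var, view, vars_per_view=VARS):
    """Return the 1-based position of e.g. Y7 inside a single
    vector that spans X1 TO Z100 (hypothetical helper)."""
    offset = vars_per_view.index(var)          # 0 for X, 1 for Y, 2 for Z
    return (view - 1) * len(vars_per_view) + offset + 1

print(alldata_index("X", 1))    # first element of the vector
print(alldata_index("Y", 7))
print(alldata_index("Z", 100))  # last element of the vector
```

Every reference to X, Y, or Z for page view k would need this calculation, and the stride changes whenever a variable is added per page view, which is why the single-vector approach gets awkward.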
>>>You could reorder the variables easily
with a little Python code (to avoid writing out the names). Or do
the transformations with a small Python program.
To reorder the variables (this requires the Python plugin
from Developer Central):
data list free /UserID X1 Y1 Z1 X2 Y2 Z2 X3 Y3 Z3.
begin data
999 1 11 111 2 22 222 3 33 333
end data.
dataset name xyz.
Note:
- The names are sorted strictly alphabetically. That
means that x10 comes before x2.
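A possible way around the alphabetical-order caveat, sketched in plain Python: sort on the name stem and then on the numeric suffix, so that X2 comes before X10. This is only an illustration, not the code from the original post. The helper names are made up, and the ADD FILES ... /KEEP command it builds is one common way to reorder variables; inside BEGIN PROGRAM the resulting string could presumably be passed to spss.Submit.

```python
import re

def split_name(name):
    """Split 'X10' into ('X', 10). Names with no numeric suffix
    (e.g. 'UserID') come back as (name, None)."""
    m = re.fullmatch(r"(.*?)(\d+)", name)
    if m and m.group(1):
        return m.group(1), int(m.group(2))
    return name, None

def reorder(names):
    """Unsuffixed names first (original order), then suffixed names
    sorted by stem and by numeric index."""
    plain = [n for n in names if split_name(n)[1] is None]
    indexed = sorted((n for n in names if split_name(n)[1] is not None),
                     key=split_name)
    return plain + indexed

def keep_syntax(names):
    """Build an SPSS command string that reorders the active file."""
    return "ADD FILES FILE=* /KEEP=" + " ".join(reorder(names)) + "."

names = "UserID X1 Y1 Z1 X2 Y2 Z2 X3 Y3 Z3".split()
print(keep_syntax(names))
```

Once the variables are contiguous by stem, VECTOR X = X1 TO X3 (and likewise for Y and Z) becomes possible.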
HTH,
Jon Peck
DO REPEAT does work. It's a lengthy statement, since you have to
name every variable:
DO REPEAT X = X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
              X11 X12 X13 X14 [continuing to]
              X91 X92 X93 X94 X95 X96 X97 X98 X99 X100
         /Y = Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10
              ...
              Y91 Y92 Y93 Y94 Y95 Y96 Y97 Y98 Y99 Y100
and the same for Z.
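Since nobody wants to type 300 names, the long DO REPEAT header can be generated. A rough Python sketch, assuming the stems are X, Y, Z and the suffixes run 1 to 100; do_repeat_header is a made-up name, and the output is only the opening statement (the loop body and END REPEAT still have to be written by hand).

```python
def do_repeat_header(stems, n):
    """Build the text of a DO REPEAT statement listing every
    variable explicitly, one clause per stem."""
    clauses = []
    for stem in stems:
        names = " ".join(f"{stem}{i}" for i in range(1, n + 1))
        clauses.append(f"{stem} = {names}")
    return "DO REPEAT " + "\n  /".join(clauses) + "."

# Small case for readability; use n=100 for the data described here.
print(do_repeat_header(["X", "Y", "Z"], 3))
```

The generated text can be pasted into a syntax window or written to a file and INCLUDEd.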
As everybody knows, I usually advise 'unrolling' such structures to one
record per event:
UserID PageView X Y Z
But it would be nice to have SPSS handle the original records more gracefully;
for example, with a construct like
VECTOR X,Y,Z =X1 TO Z100.
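For comparison, the "unrolling" to one record per page view mentioned above can be sketched in plain Python. This is illustrative only: it treats a case as a dict, assumes the stems X, Y, Z, and drops page views where every value is missing. Within SPSS itself, VARSTOCASES is the usual tool for this restructuring.

```python
def unroll(record, stems=("X", "Y", "Z"), max_views=100):
    """Turn one wide record (UserID X1 Y1 Z1 X2 ...) into a list of
    long records (UserID PageView X Y Z)."""
    rows = []
    for view in range(1, max_views + 1):
        values = {s: record.get(f"{s}{view}") for s in stems}
        if all(v is None for v in values.values()):
            continue  # no data for this page view
        rows.append({"UserID": record["UserID"], "PageView": view, **values})
    return rows

# The sample case from the DATA LIST example earlier in the thread.
wide = {"UserID": 999, "X1": 1, "Y1": 11, "Z1": 111,
        "X2": 2, "Y2": 22, "Z2": 222, "X3": 3, "Y3": 33, "Z3": 333}
for row in unroll(wide):
    print(row)
```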