Date:   Sun, 15 Nov 2009 19:43:57 -0700
Reply-To:   Jon K Peck <peck@us.ibm.com>
Sender:   "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:   Jon K Peck <peck@us.ibm.com>
Subject:   Re: Advice regarding very large dataset
Comments:   To: Richard Ristow <wrristow@mindspring.com>
In-Reply-To:   <7.0.1.0.2.20091115170707.038a1be0@mindspring.com>

See below.

Jon Peck SPSS, an IBM Company peck@us.ibm.com 312-651-3435

From: Richard Ristow <wrristow@mindspring.com>
To: SPSSX-L@LISTSERV.UGA.EDU
Date: 11/15/2009 07:15 PM
Subject: Re: [SPSSX-L] Advice regarding very large dataset
Sent by: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>

At 10:23 AM 11/15/2009, Jon K Peck wrote:

I'm not clear on why vectors don't meet the requirements for this problem. You read in your data as usual and define a vector that in effect overlays the variable list. Then you can use ordinary SPSS transformation looping commands such as LOOP and use the vector indexes as subscripts.

But, here's the data structure:

[There is] a single record for each visitor with up to 100 page views, and each page view is represented by many variables. A simplified schematic might be:

UserID X1 Y1 Z1 X2 Y2 Z2....X100 Y100 Z100

There are many more than 3 variables per page view.

It would be great to define vectors X, Y, and Z with indices 1-100. But SPSS can't do that; it requires all elements of any vector to be contiguous. You could, if all variables are numeric, define

VECTOR AllData X1 TO Z100.

but that leads to terribly clumsy code to calculate the index values.
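To make the "clumsy index" point concrete, here is a rough Python sketch of the arithmetic a single flat vector would require. It assumes exactly three variables per page view, stored in the order X, Y, Z; the helper name `flat_index` is made up for illustration and is not part of SPSS or its plugin.

```python
# Hypothetical helper: position of a variable inside one flat vector
# laid out as X1 Y1 Z1 X2 Y2 Z2 ... (1-based, like SPSS vector indexes).
VARS_PER_VIEW = ["X", "Y", "Z"]  # assumed order within each page view

def flat_index(var, view):
    """Return the 1-based index of e.g. Y17 within AllData."""
    offset = VARS_PER_VIEW.index(var)  # 0 for X, 1 for Y, 2 for Z
    return (view - 1) * len(VARS_PER_VIEW) + offset + 1

print(flat_index("X", 1))    # first element of the vector
print(flat_index("Z", 100))  # last element of the vector
```

In SPSS syntax the same expression would have to be written out at every reference, e.g. AllData(3*(#i - 1) + 2) for the Y value of page view #i, and it changes whenever the number of variables per page view changes.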

>>>You could reorder the variables easily with a little Python code (to avoid writing out the names). Or do the transformations with a small Python program.

To reorder the variables (this requires the Python plugin from Developer Central):

data list free /UserID X1 Y1 Z1 X2 Y2 Z2 X3 Y3 Z3.
begin data
999 1 11 111 2 22 222 3 33 333
end data.
dataset name xyz.

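begin program.
import spss, spssaux
# Collect the variables whose names match each pattern.
xvars = spssaux.VariableDict(pattern="X")
yvars = spssaux.VariableDict(pattern="Y")
zvars = spssaux.VariableDict(pattern="Z")
# Order the names: all X variables, then all Y, then all Z.
keepers = sorted(xvars.variables) + sorted(yvars.variables) + sorted(zvars.variables)
# Rewrite the active file with UserID first, followed by the reordered variables.
spss.Submit("match files file=* /keep = UserID " + " ".join(keepers))
end program.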

Note: The names are sorted strictly alphabetically. That means that x10 comes before x2.
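If numeric order matters (X2 before X10), one possible workaround is a sort key that compares the trailing digits as a number. This is only a sketch: `natural_key` is a made-up helper, and it assumes every name is a letter stem followed by an optional number.

```python
def natural_key(name):
    """Split 'X10' into ('x', 10) so the suffix compares numerically."""
    stem = name.rstrip("0123456789")   # the non-digit prefix, e.g. 'X'
    digits = name[len(stem):]          # the trailing digits, e.g. '10'
    return (stem.lower(), int(digits) if digits else 0)

names = ["X1", "X10", "X2", "X3"]
print(sorted(names))                   # plain alphabetical order
print(sorted(names, key=natural_key))  # numeric-suffix order
```

In the program above, this would mean passing key=natural_key to each of the three sorted() calls.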

HTH, Jon Peck

DO REPEAT does work. It's a lengthy statement, since you have to name every variable:

DO REPEAT X = X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11 X12 X13 X14 [continuing to] X91 X92 X93 X94 X95 X96 X97 X98 X99 X100
 /Y = Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10 ... Y91 Y92 Y93 Y94 Y95 Y96 Y97 Y98 Y99 Y100
and the same for Z.
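The long name lists do not have to be typed by hand. As a sketch only, a few lines of Python can generate the DO REPEAT header text; `do_repeat_header` is a hypothetical helper, and it assumes the variables are named stem plus a running number from 1 to n. The body of the loop and the END REPEAT still have to be supplied separately.

```python
def do_repeat_header(stems, n):
    """Build 'DO REPEAT X = X1 ... Xn /Y = Y1 ... Yn.' as a string."""
    parts = []
    for stem in stems:
        names = " ".join("%s%d" % (stem, i) for i in range(1, n + 1))
        parts.append("%s = %s" % (stem, names))
    return "DO REPEAT " + "\n /".join(parts) + "."

# Small example with three page views instead of 100.
print(do_repeat_header(["X", "Y", "Z"], 3))
```

With the Python plugin installed, the resulting text could be combined with the loop body and passed to spss.Submit.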

As everybody knows, I usually advise 'unrolling' such structures to one record per event:

UserID PageView X Y Z
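The idea of unrolling can be sketched in plain Python as follows. This is an illustration of the record layout, not a recommended implementation: `unroll` is a made-up function, the row is represented as a dict, and page views with no data at all are assumed to be droppable. Within SPSS itself, the VARSTOCASES command is the usual tool for this restructuring.

```python
def unroll(row, stems, n_views):
    """Turn one wide visitor record into one tuple per page view."""
    records = []
    for view in range(1, n_views + 1):
        values = [row.get("%s%d" % (stem, view)) for stem in stems]
        if all(v is None for v in values):
            continue  # this visitor has no such page view
        records.append((row["UserID"], view) + tuple(values))
    return records

# The sample case from the data list example: one visitor, three page views.
row = {"UserID": 999,
       "X1": 1, "Y1": 11, "Z1": 111,
       "X2": 2, "Y2": 22, "Z2": 222,
       "X3": 3, "Y3": 33, "Z3": 333}
for rec in unroll(row, ["X", "Y", "Z"], 3):
    print(rec)  # (UserID, PageView, X, Y, Z)
```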

But it would be nice to have SPSS handle the original records more gracefully; for example, with a construct like

VECTOR X,Y,Z =X1 TO Z100.
