LISTSERV at the University of Georgia
Date:         Fri, 25 Feb 2005 20:18:39 -0500
Reply-To:     Richard Ristow <>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         Richard Ristow <>
Subject:      Re: TOP 10 COMBINATIONS
Comments: To: Victor Tarrago <>
In-Reply-To:  <>
Content-Type: text/plain; charset="iso-8859-1"; format=flowed

At 03:37 AM 2/22/2005, Victor Tarrago wrote:

>I have the following problem and I don’t know if it could be solved with SPSS.

I don't think any of us have answered, because we haven't been sure what to say. To give a quick answer: probably, SPSS is as good a tool as any for doing what you say you want. But what you say you're doing seems a very clumsy way to represent, and handle, your data in SPSS, or any other system.

>Imagine five attributes (x, y, z, n & j): two with two levels, two with four levels and one with five levels, each utility level in a different variable, so we have the following 17 variables: x1 x2 y1 y2 z1 z2 z3 z4 n1 n2 n3 n4 j1 j2 j3 j4 j5

If I understand you: either x1 or x2 is 1, but not both; one, and only one, of z1 z2 z3 z4 is 1; etc.

Normally you'd have *one* variable 'x' for attribute x, with values 1 or 2 (or missing); one variable 'n' for attribute n, with values 1, 2, 3, 4 (or missing); etc. Among other things, it makes, say, FREQUENCIES much easier to run, and to understand.

Is there a reason you didn't do it that way? Your representation ('dummy-variable coding') can be useful for entering categorical attributes into regression models, and similar. But many such models will create the dummies for you; for the others, it's wiser to start with single variables with multiple levels, and create the multiple 'dummy' variables when you need them.
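As a rough, untested sketch of going from your 17 variables back to single variables: this assumes each of your variables is coded 0/1, with exactly one 1 within each attribute. That is my reading of your data, not something you've confirmed.

```spss
* Sketch only, assuming 0/1 coding with exactly one 1 per attribute.
* A result of 0 means no level was flagged, so check for that afterward.
* If any variable in a set is missing, the result is system-missing.
COMPUTE x = 1*x1 + 2*x2.
COMPUTE y = 1*y1 + 2*y2.
COMPUTE z = 1*z1 + 2*z2 + 3*z3 + 4*z4.
COMPUTE n = 1*n1 + 2*n2 + 3*n3 + 4*n4.
COMPUTE j = 1*j1 + 2*j2 + 3*j3 + 4*j4 + 5*j5.
EXECUTE.
```

Going the other way, when a procedure really needs the dummies, is one line per level, for example COMPUTE z3 = (z = 3).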

>We would like to create new variables, one variable for each possible combination of levels of all attributes (in the example would be 2*2*4*4*5 = 320 new variables). The value for each variable should be the sum of the levels being combined.

I assume that the last sentence is not what you mean; that you really mean, "the value of each variable should be 1 if the variables for the corresponding levels are 1". If I'm misunderstanding you, this is where it shows; please respond, if so.

>After those variables had been created (that is, all possible combinations), we would like to rank them and identify the top 10 preferred variables (or combinations) (with the highest mean value).

It looks like you want the most frequently occurring of your 320 combinations. (The 'mean value' of each of your 320 variables would be the fraction of the time the corresponding combination occurs in your data.)

If you had, as I've suggested, *one* variable for each of your attributes x, y, z, n & j, it's pretty easy to count occurrences of all combinations:

AGGREGATE
   /OUTFILE=*
   /BREAK= X Y Z N J
   /OCCUR 'Number of occurrences of combination' = N.

Then, you have one record for each combination (each one that's found in your data, that is), with "OCCUR" being the number of times the combination occurs. Sort, list, and report as you like.
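For the "top 10" part, a minimal, untested sketch might look like the following. It assumes the AGGREGATE on X Y Z N J has already replaced the working file, so that OCCUR exists.

```spss
* Sketch only: run after the AGGREGATE, when each case is one combination.
* Put the most frequent combinations first.
SORT CASES BY OCCUR (D).
* Show the first ten cases, which are the ten most frequent combinations.
LIST VARIABLES = X Y Z N J OCCUR
   /CASES = FROM 1 TO 10.
```

If several combinations tie for tenth place, this cuts among them arbitrarily; list a few more cases if that matters to you.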

You can even do it with your structure, though it's clumsier:

AGGREGATE
   /OUTFILE=*
   /BREAK= x1 x2 y1 y2 z1 z2 z3 z4 n1 n2 n3 n4 j1 j2 j3 j4 j5
   /OCCUR 'Number of occurrences of combination' = N.

NOW, here comes the final problem:

>In our case we have 11 attributes with different number of levels (4*4*5*2*3*4*5*4*5*4*3)

There, you're probably stuck, and neither SPSS nor any other program will help you. If I entered the data correctly, that's 2,304,000 different combinations of levels. It makes no sense at all to count occurrences by combination unless you have many, many times that many observations; otherwise nearly every combination will occur once or not at all. And if you did have that many, and did count occurrences by combinations of levels, how would you understand what you got?

>and most of the attributes are continuous as price.

which simply means, counting by 'cells', or combinations of values, makes even less sense.

>Any idea would be appreciated.
>Victor Tarragó Sanromà

Can you say more about what you're hoping to learn from your data? And, maybe, a little more about what the data represents?

Good luck to you,
Richard Ristow
