Date: Fri, 25 Feb 2005 20:18:39 -0500
Reply-To: Richard Ristow <wrristow@mindspring.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Richard Ristow <wrristow@mindspring.com>
Subject: Re: TOP 10 COMBINATIONS
In-Reply-To: <20050222083450.E65F93A7F44@smtp-01.servidoresdns.net>
Content-Type: text/plain; charset="iso-8859-1"; format=flowed
At 03:37 AM 2/22/2005, Victor Tarrago wrote:
>I have the following problem and I don’t know if it could be solved
>with SPSS.
I don't think any of us has answered, because we haven't been sure
what to say. The quick answer: SPSS is probably as good a tool as any
for doing what you say you want. But what you describe seems a very
clumsy way to represent and handle your data, in SPSS or any other
system.
>Imagine five attributes (x, y, z, n & j): two with two levels, two
>with four levels and one with five levels, each utility level in a
>different variable, so we have the following 17 variables: x1 x2 y1 y2
>z1 z2 z3 z4 n1 n2 n3 n4 j1 j2 j3 j4 j5
If I understand you: either x1 or x2 is 1, but not both;
one, and only one, of z1 z2 z3 z4 is 1; etc.
Normally you'd have *one* variable 'x' for attribute x, with values 1
or 2 (or missing); one variable 'n' for attribute n, with values 1, 2,
3, 4 (or missing); etc. Among other things, it makes, say, FREQUENCIES
much easier to run, and to understand.
Is there a reason you didn't do it that way? Your representation
('dummy-variable coding') can be useful for entering categorical
attributes into regression models, and similar. But many such models
will create the dummies for you; for the others, it's wiser to start
with single variables with multiple levels, and create the multiple
'dummy' variables when you need them.
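
If the data are already in dummy form, something like the following might collapse them into one variable per attribute. This is only a sketch: it assumes that, within each attribute, exactly one dummy is 1 and the others are 0 (a result of 0 would flag a case where no level is set), and the name z_is1 is made up for illustration.

```spss
* Collapse each set of 0/1 dummies into one multi-level variable.
COMPUTE x = 1*x1 + 2*x2.
COMPUTE y = 1*y1 + 2*y2.
COMPUTE z = 1*z1 + 2*z2 + 3*z3 + 4*z4.
COMPUTE n = 1*n1 + 2*n2 + 3*n3 + 4*n4.
COMPUTE j = 1*j1 + 2*j2 + 3*j3 + 4*j4 + 5*j5.
EXECUTE.

* Going the other way, a dummy can be rebuilt when a procedure needs one.
* z_is1 is a hypothetical name and is 1 when z equals 1, otherwise 0.
COMPUTE z_is1 = (z EQ 1).
EXECUTE.
```

Running FREQUENCIES on x, y, z, n and j afterwards is a quick check that no value falls outside the expected range of levels.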
>We would like to create new variables, one variable for each possible
>combination of levels of all attributes (in the example would be
>2*2*4*4*5 =320 new variables). The value for each variable should be
>the sum of the
>levels being combined.
I assume that the last sentence is not what you mean; that you really
mean, "the value of each variable should be 1 if the variables for the
corresponding levels are 1". If I'm misunderstanding you, this is where
it shows; please respond, if so.
>After those variables had been created (that is, all possible
>combinations), we would like to rank them and identify the top 10
>preferred variables (or combinations) (with the highest mean value).
It looks like you want the most frequently occurring of your 320
combinations. (The 'mean value' of each of your 320 variables would be
the fraction of the time the corresponding combination occurs in your
data.)
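
To make that reading concrete, here is a sketch of one of the 320 indicators, built from the existing dummies. The name c12314 is hypothetical, and the product form assumes every dummy is coded strictly 0 or 1.

```spss
* Indicator for the single combination x=1, y=2, z=3, n=1, j=4.
* The product is 1 only when all five dummies are 1.
COMPUTE c12314 = x1 * y2 * z3 * n1 * j4.
* The mean of a 0/1 variable is the fraction of cases where it is 1.
DESCRIPTIVES VARIABLES=c12314 /STATISTICS=MEAN.
```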
If you had, as I've suggested, *one* variable for each of your
attributes x, y, z, n & j, it would be pretty easy to count occurrences
of all combinations:

AGGREGATE /OUTFILE=*
   /BREAK= x y z n j
   /OCCUR 'Number of occurrences of combination' = N.

Then, you have one record for each combination (each one that's found
in your data, that is), with "OCCUR" being the number of times the
combination occurs. Sort, list, and report as you like.
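
For the "top 10" part, a sketch along these lines should do, run after the AGGREGATE so that the working file holds one record per combination. It assumes the single-variable layout (x, y, z, n, j) and the OCCUR variable created above.

```spss
* Put the most frequent combinations first.
SORT CASES BY OCCUR (D).
* Show the first ten records of the sorted file.
LIST VARIABLES=x y z n j OCCUR /CASES=FROM 1 TO 10.
```

Ties at the tenth position are cut off arbitrarily by this listing, so it may be worth looking a few records further down.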
You can even do it with your structure, though it's clumsier:

AGGREGATE /OUTFILE=*
   /BREAK= x1 x2 y1 y2 z1 z2 z3 z4 n1 n2 n3 n4 j1 j2 j3 j4 j5
   /OCCUR 'Number of occurrences of combination' = N.

NOW, here comes the final problem:
>In our case we have 11 attributes with different number of levels
>(4*4*5*2*3*4*5*4*5*4*3)
There, you're probably stuck, and neither SPSS nor any other program
will help you. If I entered the numbers correctly, that's 2,304,000
different combinations of levels. It makes no sense at all to count
occurrences by combination unless you have many, many times that many
observations. And if you did, and did count occurrences by combination
of levels, how would you understand what you got?
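
One way to see how thin the data are spread, sketched under the assumption that the 11 attributes are held as single variables with the made-up names a1 through a11:

```spss
* a1 through a11 are hypothetical names for the 11 attributes.
* This replaces the working file with one record per observed combination.
AGGREGATE /OUTFILE=*
   /BREAK= a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11
   /OCCUR 'Number of occurrences of combination' = N.
* Tabulate how many combinations occur once, twice, and so on.
FREQUENCIES VARIABLES=OCCUR.
```

The number of records left in the file is the number of distinct combinations actually observed, and the frequency table shows how many of them occur only once or twice.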
>and most of the attributes are continuous as price.
Which simply means that counting by 'cells', or combinations of values,
makes even less sense.
>Any idea would be appreciated.
>Victor Tarragó Sanromà
Can you say more about what you're hoping to learn from your data? And,
maybe, a little more about what the data represents?
Good luck to you,
Richard Ristow