=========================================================================
Date: Thu, 20 Jul 2006 15:28:12 -0500
Reply-To: "Beadle, ViAnn" <viann@spss.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: "Beadle, ViAnn" <viann@spss.com>
Subject: Re: 10 most frequent occurring values of a multiple response set
Content-Type: text/plain; charset="us-ascii"
Then ignore the whole concept of a multiple response set and just compute a single variable that combines all three values. For example, if z1, z2, and z3 take on two-digit values, you can give each one its own block of digits, so that every combination produces a distinct value of z, with something like:
Compute z=z1 + z2*1000 + z3*100000.
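To see why those multipliers keep combinations apart, here is a small plain-Python sketch (not SPSS) of the same arithmetic. It assumes the codes are non-negative whole numbers with z1 below 1000 and z2 below 100, which two-digit codes satisfy; under that assumption each combination maps to a different z and can be decoded back. The names encode and decode are hypothetical.

```python
def encode(z1: int, z2: int, z3: int) -> int:
    """Pack three codes into one key, mirroring COMPUTE z=z1 + z2*1000 + z3*100000."""
    # Assumption: non-negative whole numbers, z1 <= 999 and z2 <= 99.
    # Outside that range two different combinations could share a key.
    if not (0 <= z1 <= 999 and 0 <= z2 <= 99 and z3 >= 0):
        raise ValueError("codes outside the range this packing supports")
    return z1 + z2 * 1000 + z3 * 100000


def decode(z: int) -> tuple[int, int, int]:
    """Recover (z1, z2, z3) from a key built by encode()."""
    z3, rest = divmod(z, 100000)  # top digits hold z3
    z2, z1 = divmod(rest, 1000)   # middle block holds z2, last three digits z1
    return z1, z2, z3


if __name__ == "__main__":
    key = encode(12, 34, 56)
    print(key)          # 5634012
    print(decode(key))  # (12, 34, 56)
```

Because decoding always recovers the original triple, no two combinations can end up with the same z, which is what the counting step relies on.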
The second step is to rank the number of occurrences of each z value, not the z values themselves.
Use AGGREGATE to count the occurrences into a variable, with the N function and z as your break variable. This gives you a dataset with one row for each unique value of z, plus its count N. Sort that dataset in descending order of N and then compute nrank = $casenum after the sort, so the most frequent combination gets nrank 1.
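The count-sort-number step can be sketched in plain Python as follows, with collections.Counter standing in for AGGREGATE. The sketch assumes that tied counts end up in ascending z order, as they would if AGGREGATE writes its break groups sorted on z and the descending sort on N keeps that order for ties; the function name rank_by_frequency is hypothetical.

```python
from collections import Counter


def rank_by_frequency(keys):
    """Return {z: (n, nrank)}: n = occurrences of z, nrank = 1 for the most frequent z.

    Mirrors AGGREGATE /BREAK=z /N=N, then SORT CASES BY N (D) and
    COMPUTE nrank = $casenum. Python's sort is stable, so rows with equal n
    keep the ascending-z order they had before the sort on n.
    """
    counts = Counter(keys)                           # one entry per unique z
    by_z = sorted(counts.items())                    # break groups in ascending z order
    by_n = sorted(by_z, key=lambda item: -item[1])   # descending n, ties stay in z order
    # $casenum after the sort: position 1, 2, 3, ...
    return {z: (n, position) for position, (z, n) in enumerate(by_n, start=1)}


if __name__ == "__main__":
    z_values = [5634012, 1001, 5634012, 7, 1001, 5634012, 42]
    table = rank_by_frequency(z_values)
    for z, (n, nrank) in sorted(table.items(), key=lambda kv: kv[1][1]):
        print(z, n, nrank)
```

Here 7 and 42 each occur once, so they are tied; numbering by position gives them nrank 3 and 4 purely because of their z order.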
So your aggregated dataset has z, N, and nrank. You then have to get nrank onto your original dataset through a table match (MATCH FILES with a /TABLE subcommand). To do so, sort both the aggregated dataset and the original dataset on z and use z as the matching key. Once nrank is on your dataset, you can either filter or select cases with nrank less than or equal to 10.
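The table match plus filter amounts to a key lookup: each case looks up its nrank by z, and only cases with nrank <= 10 survive. In this plain-Python sketch a dictionary plays the role of MATCH FILES /TABLE, so no sorting is needed. The row layout (dicts with keys such as X, z1, z2, z3) and the function name keep_top are hypothetical.

```python
def keep_top(rows, nrank_by_z, top=10):
    """Attach nrank to each row via its z key and keep rows with nrank <= top.

    rows: list of dicts with integer keys 'z1', 'z2', 'z3' (hypothetical layout).
    nrank_by_z: {z: nrank} lookup table, like the aggregated ranked_data dataset.
    """
    kept = []
    for row in rows:
        z = row["z1"] + row["z2"] * 1000 + row["z3"] * 100000  # same key as COMPUTE z
        nrank = nrank_by_z.get(z)  # None if z is not in the table
        if nrank is not None and nrank <= top:
            kept.append({**row, "z": z, "nrank": nrank})
    return kept


if __name__ == "__main__":
    rows = [
        {"X": 1, "z1": 12, "z2": 34, "z3": 56},
        {"X": 2, "z1": 1, "z2": 1, "z3": 0},
        {"X": 3, "z1": 7, "z2": 0, "z3": 0},
    ]
    table = {5634012: 1, 1001: 2, 7: 11}
    for row in keep_top(rows, table):
        print(row)
```

The third row has z = 7 with nrank 11, so it is dropped, just as FILTER BY would exclude it.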
Here's some syntax, pasted from SPSS release 14 or later, that might do the trick:
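* Open the original data and build a single key z from z1, z2 and z3.
GET
FILE='C:\Program Files\SPSS\orginaldata.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
COMPUTE Z=z1+z2*1000+z3*100000.
* Count the cases for each value of z into a new dataset (one row per z).
DATASET DECLARE ranked_data.
AGGREGATE
/OUTFILE='ranked_data'
/BREAK=z
/N=N.
* Put the most frequent z first; nrank is the row position after the sort.
DATASET ACTIVATE ranked_data.
SORT CASES BY
N (D) .
COMPUTE nrank = $casenum .
EXECUTE .
* MATCH FILES with /BY needs both datasets sorted on the key z.
DATASET ACTIVATE DataSet1.
SORT CASES BY
z (A) .
DATASET ACTIVATE ranked_data.
SORT CASES BY
z (A) .
DATASET ACTIVATE DataSet1.
* Save a copy of the sorted original data.
SAVE OUTFILE='C:\Program Files\SPSS\originaldata.sav'
/COMPRESSED.
* Look up nrank for every case, using ranked_data as a table (keyed) file.
MATCH FILES /FILE=*
/TABLE='ranked_data'
/BY z.
EXECUTE.
* Keep only cases whose z is among the 10 most frequent combinations.
USE ALL.
COMPUTE filter_$=(nrank <= 10).
VARIABLE LABEL filter_$ 'nrank <= 10 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.
... rest of analysis goes here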
I think the big issue here is what to do about ties. In my example, the 10th most frequent count was shared by 5 values of z, and this code simply takes the first 10 rows after the sort on N; among the tied rows, those happen to be the ones that come first in z order.
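One way to handle ties, sketched here in plain Python rather than SPSS, is to give tied counts the same rank (standard competition ranking: 1, 2, 2, 4, ...), so a cutoff of 10 keeps every combination tied at the boundary and may therefore return more than 10 combinations. Within SPSS, the RANK command with a TIES subcommand (for example TIES=LOW, ranking N in descending order) looks like a comparable route; check the Command Syntax Reference before relying on it. The function name competition_ranks is hypothetical.

```python
from collections import Counter


def competition_ranks(keys):
    """Return {z: rank}; tied counts share the best (lowest) rank: 1, 2, 2, 4, ..."""
    counts = Counter(keys)
    ranks = {}
    for z, n in counts.items():
        # rank = 1 + number of distinct z values that occur strictly more often
        ranks[z] = 1 + sum(1 for other in counts.values() if other > n)
    return ranks


if __name__ == "__main__":
    keys = [101] * 3 + [202] * 2 + [303] * 2 + [404]
    ranks = competition_ranks(keys)
    print(ranks)
    # With a cutoff of 2, both tied values (202 and 303) are kept.
    print(sorted(z for z, r in ranks.items() if r <= 2))
```

The quadratic loop is fine for the handful of distinct combinations in a frequency table; with nrank = $casenum and the same cutoff of 2, only one of the two tied values would survive.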
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Edward Boadi
Sent: Thursday, July 20, 2006 2:25 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: 10 most frequent occurring values of a multiple response set
This is not RFM analysis.
Yes, I am looking for the 10 most frequently occurring combinations of the three variables as my initial step.
Then select X, y1, y2, z1, z2 and z3 where (z1, z2, z3) = z, i.e. where z1, z2 and z3 correspond to one of the 10 most frequently occurring combinations of z1, z2 and z3.
Regards.
-----Original Message-----
From: Beadle, ViAnn [mailto:viann@spss.com]
Sent: Thursday, July 20, 2006 3:15 PM
To: Edward Boadi; SPSSX-L@LISTSERV.UGA.EDU
Subject: RE: 10 most frequent occurring values of a multiple response set
I'm not quite sure what it means to rank z since it is a set of 3 values. Are you looking for the most frequently occurring combinations of the three variables?
Is this some sort of RFM analysis?
-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Edward Boadi
Sent: Thursday, July 20, 2006 2:03 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: 10 most frequent occurring values of a multiple response set
Dear List,
I have a data file with variables:
X, y1, y2, z1, z2 and z3
I want to accomplish the following task:
1. Create a multiple response set z from z1, z2 and z3.
2. Rank z and select cases with rank z <= 10.
3. Select cases from my original data file where z = z1, z2 or z3.
My objective is to create a new dataset restricted to the 10 most frequently occurring values of a multiple response set created from z1, z2 and z3.
Any ideas on how to accomplish this will be most welcome.