LISTSERV at the University of Georgia
=========================================================================
Date:         Thu, 20 Jul 2006 15:28:12 -0500
Reply-To:     "Beadle, ViAnn" <viann@spss.com>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         "Beadle, ViAnn" <viann@spss.com>
Subject:      Re: 10 most frequent occurring values of a multiple response  set
Comments: To: Edward Boadi <eboadi@abhct.com>
Content-Type: text/plain; charset="us-ascii"

Then ignore the whole concept of a multiple response set and just compute a single variable that combines all three values. For example, if z1, z2, and z3 take on two values you'll need something like:

Compute z=z1 + z2*1000 + z3*100000.
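As a rough illustration of why the multipliers matter, here is a small Python sketch (not SPSS syntax) of the same arithmetic. It assumes, and this is an assumption rather than something stated in the thread, that the values are non-negative integers with z1 below 1000 and z2 below 100; outside those ranges two different combinations can collapse into the same z. The function names are hypothetical.

```python
def combine(z1, z2, z3):
    # Same arithmetic as: COMPUTE z = z1 + z2*1000 + z3*100000.
    return z1 + z2 * 1000 + z3 * 100000


def split(z):
    # Inverse of combine(). Only valid under the assumed ranges
    # 0 <= z1 < 1000 and 0 <= z2 < 100 (non-negative integers).
    z3, rest = divmod(z, 100000)
    z2, z1 = divmod(rest, 1000)
    return z1, z2, z3


key = combine(12, 34, 56)
print(key, split(key))

# Outside the assumed ranges the key is no longer unique:
# z2 = 100 produces the same key as z3 = 1.
print(combine(0, 100, 0) == combine(0, 0, 1))
```

If the real values can be larger than that, the multipliers would need to be widened accordingly.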

The second step is to rank occurrences, not values.

You need to use AGGREGATE to capture the occurrences into a variable, using the N function and z as your break variable. This will give you a dataset with one row for each unique value of z, along with its count N. Sort that dataset in descending order on N and then compute nrank = $casenum after the sort. So your aggregated dataset has z, N, and nrank. You have to get nrank onto your original dataset through a table match. But to do so, you need to sort both the aggregated dataset and the original dataset on z and use z as the matching key. Once nrank is on your dataset, you can either filter or select cases with nrank less than or equal to 10.
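The aggregate, rank, and match steps described above can be sketched in plain Python as follows. This is only an illustration of the logic, not a replacement for the SPSS commands: it assumes the cases are held as a list of dictionaries, it uses the tuple (z1, z2, z3) as the key instead of a computed numeric z, and the function names are hypothetical.

```python
from collections import Counter


def rank_combinations(rows):
    # Count occurrences of each combination (the AGGREGATE step with N).
    counts = Counter((r["z1"], r["z2"], r["z3"]) for r in rows)
    # Sort by count, descending (SORT CASES BY N (D)). Python's sort is
    # stable, so tied combinations keep their first-seen order.
    ranked = sorted(counts.items(), key=lambda item: -item[1])
    # Number the sorted rows from 1 (the nrank = $casenum step).
    return {combo: i for i, (combo, _) in enumerate(ranked, start=1)}


def select_top(rows, k=10):
    # Look up each case's rank by its key (the table match) and keep
    # only the cases whose combination is ranked k or better.
    nrank = rank_combinations(rows)
    return [r for r in rows if nrank[(r["z1"], r["z2"], r["z3"])] <= k]


rows = (
    [{"x": i, "z1": 1, "z2": 1, "z3": 1} for i in range(3)]
    + [{"x": i, "z1": 2, "z2": 2, "z3": 2} for i in range(2)]
    + [{"x": 9, "z1": 3, "z2": 3, "z3": 3}]
)
kept = select_top(rows, k=2)
print(len(kept))
```

A dictionary lookup stands in for the sorted table match here, which is why no sort on the key is needed in this sketch.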

Here's some syntax that I pasted from SPSS, release 14+, that might do the trick:

GET FILE='C:\Program Files\SPSS\orginaldata.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
COMPUTE Z=z1+z2*1000+z3*100000.
DATASET DECLARE ranked_data.
AGGREGATE /OUTFILE='ranked_data' /BREAK=z /N=N.
DATASET ACTIVATE ranked_data.
SORT CASES BY N (D) .
COMPUTE nrank = $casenum .
EXECUTE .
DATASET ACTIVATE DataSet1.
SORT CASES BY z (A) .
DATASET ACTIVATE ranked_data.
SORT CASES BY z (A) .
DATASET ACTIVATE DataSet1.
SAVE OUTFILE='C:\Program Files\SPSS\originaldata.sav' /COMPRESSED.
MATCH FILES /FILE=* /TABLE='ranked_data' /BY z.
EXECUTE.
USE ALL.
COMPUTE filter_$=(nrank <= 10).
VARIABLE LABEL filter_$ 'nrank <= 10 (FILTER)'.
VALUE LABELS filter_$ 0 'Not Selected' 1 'Selected'.
FORMAT filter_$ (f1.0).
FILTER BY filter_$.

... rest of analysis goes here

I think the big issue here is what to do about ties. In my example the 10th most frequent count was shared by 5 values, and this code simply takes the first 10 rows, which within a tie happen to be sorted on the z variable.
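One possible way to handle ties, offered here only as an assumption and not as the approach taken in the syntax above, is to keep every value whose count is at least as large as the 10th-largest count, so that tied values are either all kept or all dropped. A minimal Python sketch, with a hypothetical function name:

```python
from collections import Counter


def top_k_with_ties(values, k=10):
    # Count how often each value occurs.
    counts = Counter(values)
    if not counts:
        return {}
    # The cutoff is the k-th largest count, or the smallest count
    # when there are fewer than k distinct values.
    ordered = sorted(counts.values(), reverse=True)
    cutoff = ordered[min(k, len(ordered)) - 1]
    # Keep every value at or above the cutoff, so ties at the
    # boundary are all retained; this may return more than k values.
    return {value: n for value, n in counts.items() if n >= cutoff}


values = ["a"] * 5 + ["b"] * 3 + ["c"] * 3 + ["d"]
print(top_k_with_ties(values, k=2))
```

The trade-off is that the result can contain more than 10 values; whether that is acceptable depends on the analysis.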

-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Edward Boadi
Sent: Thursday, July 20, 2006 2:25 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: 10 most frequent occurring values of a multiple response set

This is not RFM analysis.

Yes, I am looking for the 10 most frequently occurring combinations of the three variables as my initial step. Then select X, y1, y2, z1, z2 and z3 where (z1,z2,z3) = z, i.e. where z1, z2 and z3 correspond to one of the 10 most frequently occurring combinations of z1, z2 and z3.

Regards.

-----Original Message-----
From: Beadle, ViAnn [mailto:viann@spss.com]
Sent: Thursday, July 20, 2006 3:15 PM
To: Edward Boadi; SPSSX-L@LISTSERV.UGA.EDU
Subject: RE: 10 most frequent occurring values of a multiple response set

I'm not quite sure what it means to rank z since it is a set of 3 values. Are you looking for the most frequently occurring combinations of the three variables?

Is this some sort of RFM analysis?

-----Original Message-----
From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of Edward Boadi
Sent: Thursday, July 20, 2006 2:03 PM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: 10 most frequent occurring values of a multiple response set

Dear List,

I have a data file with variables: X, y1, y2, z1, z2 and z3

I want to accomplish the following tasks:
1. Create a multiple response set z from z1, z2 and z3.
2. Rank z and select cases for rank z <= 10.
3. Select cases from my original data file where z = z1, z2 or z3.

My objective is to create a new dataset restricted to the 10 most frequently occurring values of a multiple response set created from z1, z2 and z3.

Any ideas on how to accomplish this will be most welcome.

