Date: Mon, 24 Jul 2006 09:12:04 -0400
Reply-To: Edward Boadi <email@example.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Edward Boadi <firstname.lastname@example.org>
Subject: Re: 10 most frequent occurring values of a multiple response set
Content-Type: text/plain; charset="iso-8859-1"
I want to keep the 10 values that occur most often, over all three variables (z1, z2 and z3).
1. If z1=A, z2=B, z3=C, and A is among the top 10 but B and C are not: make z2 and z3 system-missing for that record (case).
2. If z1=A, z2=B, z3=C, and A and B are among the top 10 but C is not: make z3 system-missing for that record (case), and so on.
My objective is to set to system-missing any value of z1, z2 or z3 that is not in the top 10.
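One possible way to do this, offered as an untested sketch rather than a definitive solution: stack z1, z2 and z3 into a single column, count how often each value occurs, keep the ten most frequent values, and then blank out any z1, z2 or z3 value that finds no match in that list. The sketch assumes SPSS 14 or later (for the DATASET commands), that z1, z2 and z3 are numeric, and that the original file is the active dataset. The names id, original, stacked, top_values, nz, nrank and in1 to in3 are made up for the example.

```spss
* Sketch only. Assumes numeric z1, z2, z3 and SPSS 14 or later.
* The names id, original, stacked, top_values, nz, nrank, in1 to in3 are hypothetical.
COMPUTE id = $CASENUM.
EXECUTE.
DATASET NAME original.

* Stack z1, z2 and z3 into one column z (missing values are dropped by default).
DATASET COPY stacked.
DATASET ACTIVATE stacked.
VARSTOCASES /MAKE z FROM z1 z2 z3.

* Count how often each distinct value occurs.
DATASET DECLARE top_values.
AGGREGATE /OUTFILE=top_values /BREAK=z /nz=N.
DATASET ACTIVATE top_values.

* Rank the values by descending count and keep the top ten.
RANK VARIABLES=nz (D) /RANK INTO nrank /TIES=LOW /PRINT=NO.
SELECT IF (nrank LE 10).
SORT CASES BY z.

* Look up z1, z2 and z3 in the top-ten table in turn and blank out non-matches.
DATASET ACTIVATE original.
SORT CASES BY z1.
MATCH FILES /FILE=* /TABLE=top_values /RENAME=(z=z1) /DROP=nz nrank /IN=in1 /BY z1.
IF (in1 = 0) z1 = $SYSMIS.
SORT CASES BY z2.
MATCH FILES /FILE=* /TABLE=top_values /RENAME=(z=z2) /DROP=nz nrank /IN=in2 /BY z2.
IF (in2 = 0) z2 = $SYSMIS.
SORT CASES BY z3.
MATCH FILES /FILE=* /TABLE=top_values /RENAME=(z=z3) /DROP=nz nrank /IN=in3 /BY z3.
IF (in3 = 0) z3 = $SYSMIS.

* Restore the original case order and drop the helper variables.
SORT CASES BY id.
DELETE VARIABLES id in1 in2 in3.
```

With /TIES=LOW, values tied for tenth place all receive rank 10, so more than ten values can survive the SELECT IF. With /TIES=HIGH the tied values would all be dropped instead.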
From: Richard Ristow [mailto:email@example.com]
Sent: Friday, July 21, 2006 6:36 PM
To: Edward Boadi; SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: 10 most frequent occurring values of a multiple response
At 12:19 PM 7/21/2006, Edward Boadi wrote:
>I have a couple of questions.
>GET FILE='C:\Program Files\SPSS\originaldata.sav'.
>MATCH FILES /FILE=*
> /BY z.
>SELECT IF (nrank <= 10).
>1. ranked_data contains aggregated data with variables z and nrank
>2. originaldata.sav is the original data file with
>variables x,y1,y2,z1,z2 and z3
>3. z variable was created from the aggregation of z1,z2 and z3
>The syntax above is supposed to keep only cases whose z1, z2 and z3
>values are in the ranked data file (nrank <= 10). But after doing my analysis
>I still get values of z1, z2 and z3 that are not in the ranked data
>file. Please advise.
OK. The syntax that ViAnn Beadle suggested, and I modified, looks for
the top ten COMBINATIONS of the three values z1, z2, z3. (The code is
sensitive to order - the combination z1=A,z2=B,z3=C is counted as
different from, say z1=C,z2=B,z3=A.)
I wasn't sure from your postings, but it sounds like you want to keep
the 10 values that occur most often, over all three variables. OK, that
can be done, though let me know, first, if I've got that right. And
what do you do, if, say,
z1=A,z2=B,z3=C, A is among the top 10, and B and C are not? Keep the
combination, or make z2 and z3 system-missing, or what?
It's manageable; I'd just like to understand the problem better.
>Regards to all.
>From: Richard Ristow [mailto:firstname.lastname@example.org]
>Sent: Friday, July 21, 2006 12:46 AM
>Cc: Edward Boadi; Beadle, ViAnn
>Subject: Re: 10 most frequent occurring values of a multiple response
>At 04:28 PM 7/20/2006, Beadle, ViAnn wrote:
> >Compute some variable which is a combination of all three values.
> >For example, if z1, z2, and z3 take on two[-digit] values you'll need
> >something like:
> >Compute z=z1 + z2*1000 + z3*100000.
> >The second step is to rank occurrences, not values.
> >You need to use aggregate to capture the occurrences into a
> >using the N function and z as your break variable.
>Etc. I think this is exactly right, except why "compute some variable
>which is a combination of all three values"? AGGREGATE is perfectly
>happy with BREAKing on multiple variables. I'd suggest
>DATASET DECLARE ranked_data.
> /BREAK=z1 z2 z3
>DATASET DECLARE ranked_data.