Date: Wed, 24 Sep 2003 11:20:51 -0400
Reply-To: Mark Davenport <madavenp@OFFICE.UNCG.EDU>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Mark Davenport <madavenp@OFFICE.UNCG.EDU>
Subject: counting unique cases; yet another method
As has been said, there are dozens of solutions. This one may be simpler to use. Let's assume you want to identify duplicate id numbers (duplicate cases) for removal from the dataset, a needed activity if you want to match-merge by id number. The following syntax creates a duplicate identifier variable called count#.
COMPUTE casenum=$CASENUM.
EXECUTE.
RANK VARIABLES=CASENUM by id_numb
/RANK INTO count#.
EXECUTE.
Compute adds a new variable to the end of your set which is simply a case number (that is, a row number).
Rank then ranks the case numbers in ascending order within each group of identical values in the id_numb column (assuming id_numb is the variable of concern).
Then, a new variable called count# is created. In this column, a '1' will appear in the first row found for each id number. When a duplicate of that id number is found, it will be marked with a '2' in the count# column, a third occurrence with a '3', and so on. After you run this syntax, run a frequencies on the count# var. The frequency for '1' shows the number of unique id numbers, the freq for '2' shows the number of dups, the freq for '3' shows the number of triplicates, etc.
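For anyone who wants to try this before touching real data, here is a minimal sketch of the whole sequence. The seven id_numb values are made up for illustration, and the final SELECT IF step is my assumption about how you would do the removal; it is not the only way.

```spss
* Hypothetical sample data in which ids 101 and 103 are repeated.
DATA LIST FREE /id_numb.
BEGIN DATA
101 102 101 103 103 103 104
END DATA.

* Same two steps as in the syntax above.
COMPUTE casenum=$CASENUM.
EXECUTE.
RANK VARIABLES=casenum BY id_numb
  /RANK INTO count#.

* Tabulate the occurrence counter.
FREQUENCIES VARIABLES=count#.

* Keep only the first occurrence of each id number.
SELECT IF (count# = 1).
EXECUTE.
```

With these seven made-up cases, the frequencies table should show four 1s, two 2s, and one 3. SELECT IF drops the other cases from the working file for good, so save under a new file name if you want to keep the original.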
This actually does the same thing that Jim's syntax did. It just does it in a different way. Many rookies know what COMPUTE does; fewer know what AUTORECODE does. Additionally, you only have 2 new variables to deal with and only one of concern (count#).
Mark
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Mark A. Davenport Ph.D.
Asst to the Vice Chancellor for Student Affairs/Research and Evaluation
The University of North Carolina at Greensboro
149 Mossman Bldg.
Greensboro, NC 27402-6170
336.334.5099
madavenp@office.uncg.edu
'An approximate answer to the right problem is worth a good deal more than an exact
answer to an approximate problem' -- J. W. Tukey
>>> "Moffitt, James" <james.moffitt@thomson.com> 9/24/2003 10:27:22 AM >>>
Jim:
I know I'll appear to be a real imbecile, but I'm a syntax beginner and I
simply have to swallow my pride and ask the simplest of questions: how,
specifically, does one use the code you posted to test your routine? Does
the section that begins DATA LIST FREE /vara (A1) and ends with END DATA
create a file with a variable named vara containing the 24 records you've
listed or must we create such a file before we attempt to paste your code
into a syntax window and run it? I opened a new SPSS file, created a
variable named vara, entered the appropriate values in the first 24 rows,
opened a new syntax window, and pasted in your code so it appeared like
this:
DATA LIST FREE /vara (A1).
BEGIN DATA
1 1 1 1 2 2 3 3 4 4 5 5 5 6 6 6
7 7 7 7 7 8 9 9
END DATA.
AUTORECODE vara /INTO varnbr.
RANK varnbr /RANK INTO varcnt /TIES=CONDENSE.
DESCRIPTIVES varnbr /STAT=max.
I then placed my cursor in the syntax code, pressed ctrl+A and pressed
ctrl+R.
I got 3 columns labeled vara, varnbr, and varcnt. All three contained the
same value for each record with the exception that the value in varcnt was
displayed as the integer followed by a decimal and 3 zeros. What did I do
wrong? Thanks in advance.
-----Original Message-----
From: Marks, Jim [mailto:Jim.Marks@lodgenet.com]
Sent: Wednesday, September 24, 2003 8:16 AM
To: SPSSX-L@LISTSERV.UGA.EDU
Subject: Re: should be real simple... counting unique cases