LISTSERV at the University of Georgia
Date:   Mon, 29 Oct 2001 11:18:12 -0500
Reply-To:   Sean Carey <carey@FAS.HARVARD.EDU>
Sender:   "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:   Sean Carey <carey@FAS.HARVARD.EDU>
Subject:   Duplicate cases
Content-Type:   text/plain; charset="iso-8859-1"

I have a very large dataset of court cases. Some cases are duplicated, meaning that CASEID, STATE and COUNTY are identical across two or more records. I would like to create a dummy variable indicating whether or not a case is a duplicate (i.e., not unique), and another variable giving the sequence number of each duplicate within its group.

I have used SORT CASES and the LAG function so far (syntax below), but I can only seem to flag the 2nd, 3rd, etc. occurrences, not the first appearance of a duplicated case.

SORT CASES BY CASEID STATE COUNTY YEAR.
IF (CASEID = LAG(CASEID)) flag = 1.
IF (STATE = LAG(STATE)) flag1 = 1.
IF (COUNTY = LAG(COUNTY)) flag2 = 1.
IF (flag = flag1 = flag2) dummy = 1.
COMPUTE countvar = 0.
IF (CASEID = LAG(CASEID) & STATE = LAG(STATE) & COUNTY = LAG(COUNTY)) countvar = LAG(countvar) + 1.
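One possible way to reach the first occurrence as well is sketched below in SPSS syntax. It is a sketch under stated assumptions, not a tested recipe: it assumes that CASEID, STATE and COUNTY together identify a case and that YEAR orders the repeats, and seq and dup are hypothetical variable names. The idea is to number the occurrences with LAG as above, then re-sort each group in descending seq order so that a second LAG pass sees the first occurrence after its duplicates.

```spss
* Sketch only: assumes CASEID, STATE and COUNTY identify a case and YEAR orders repeats.
* seq and dup are hypothetical variable names.
SORT CASES BY CASEID STATE COUNTY YEAR.

* Number occurrences within each CASEID/STATE/COUNTY group (1, 2, 3 and so on).
* On the first case LAG() has no previous value, so the IF does not fire and seq stays 1.
COMPUTE seq = 1.
IF (CASEID = LAG(CASEID) AND STATE = LAG(STATE) AND COUNTY = LAG(COUNTY)) seq = LAG(seq) + 1.

* Reverse the order within each group so the highest seq comes first.
* Every other member of a duplicated group now follows a case with the same keys.
SORT CASES BY CASEID (A) STATE (A) COUNTY (A) seq (D).
COMPUTE dup = (seq > 1).
IF (CASEID = LAG(CASEID) AND STATE = LAG(STATE) AND COUNTY = LAG(COUNTY)) dup = 1.

* Restore the original order (this procedure also runs the pending transformations).
SORT CASES BY CASEID STATE COUNTY YEAR.
```

With this approach a unique case ends up with seq = 1 and dup = 0, while every record in a duplicated group, including the first one, gets dup = 1. In releases that support it, AGGREGATE with MODE=ADDVARIABLES and a /BREAK on the three key variables could provide a group count instead of the second sort.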

Any help would be gratefully received.

Best, Sean Carey

