Date: Mon, 29 Oct 2001 11:18:12 -0500
Reply-To: Sean Carey <carey@FAS.HARVARD.EDU>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Sean Carey <carey@FAS.HARVARD.EDU>
Subject: Duplicate cases
I have a very large dataset of court cases. Some cases appear more than
once: records are duplicates when CASEID, STATE and COUNTY are all identical.
I would like to create a dummy variable indicating whether a case is a
duplicate (i.e., not unique), and another variable giving the sequence
number of each record within its set of duplicates.
So far I have used the SORT CASES command and the LAG function (syntax
below), but I can only seem to flag the 2nd, 3rd, etc. occurrences, not the
first appearance of a duplicated case.
SORT cases by CASEID STATE COUNTY YEAR.
if (CASEID = lag(CASEID )) flag=1.
if (STATE = lag(STATE )) flag1=1.
if (COUNTY = lag(COUNTY )) flag2=1.
if (flag=flag1=flag2) dummy=1.
compute countvar = 0 .
if (CASEID = lag(CASEID) & STATE = lag(STATE) & COUNTY = lag(COUNTY))
   countvar = lag(countvar)+1.
Any help would be gratefully received.
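
For reference, here is a rough sketch of the kind of result being asked for.
It is not tested on the real file. It assumes the file can be sorted on the
three key variables, and it leans on MATCH FILES with the FIRST and LAST
subcommands to mark the first and last record of each group. The names
isfirst, islast, dupflag and seqnum are made up for illustration.

```spss
* Sketch only. isfirst, islast, dupflag and seqnum are hypothetical names.
SORT CASES BY CASEID STATE COUNTY YEAR.
* Mark the first and last record within each CASEID, STATE, COUNTY group.
MATCH FILES /FILE=* /BY CASEID STATE COUNTY /FIRST=isfirst /LAST=islast.
* A record is unique only when it is both first and last in its group.
COMPUTE dupflag = (isfirst = 0 OR islast = 0).
* Running sequence number within each group, starting at 1.
COMPUTE seqnum = 1.
IF (isfirst = 0) seqnum = LAG(seqnum) + 1.
EXECUTE.
```

If this behaves as intended, dupflag would be 1 for every record in a
duplicated group, including the first appearance, and seqnum would run
1, 2, 3, ... within each group in YEAR order.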