| Date: | Mon, 29 Oct 2001 11:18:12 -0500 |
| Reply-To: | Sean Carey <carey@FAS.HARVARD.EDU> |
| Sender: | "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU> |
| From: | Sean Carey <carey@FAS.HARVARD.EDU> |
| Subject: | Duplicate cases |
| Content-Type: | text/plain; charset="iso-8859-1" |
I have a very large dataset of court cases. Some cases are duplicated: two or
more records share identical values of CASEID, STATE and COUNTY. I would like
to create a dummy variable marking whether or not a case is a duplicate (i.e.,
not unique), and another variable giving the sequence number of each record
within its set of duplicates.
I have used the SORT CASES command and the LAG function so far (syntax below),
but can only seem to flag the 2nd, 3rd, etc. occurrences, not the original
appearance of a duplicated case.
SORT cases by CASEID STATE COUNTY YEAR.
if (CASEID = lag(CASEID )) flag=1.
if (STATE = lag(STATE )) flag1=1.
if (COUNTY = lag(COUNTY )) flag2=1.
if (flag=flag1=flag2) dummy=1.
compute countvar = 0 .
if (CASEID = lag(CASEID) & STATE = lag(STATE) & COUNTY = lag(COUNTY))
  countvar = lag(countvar)+1.
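One direction that might also catch the first occurrence, sketched here on the assumption that MATCH FILES is run on the sorted working file with the same three keys, is to let its /FIRST and /LAST subcommands mark the boundaries of each group. The variable names first, last, dupflag and seq are made up for illustration, and this has not been checked against the full dataset.

```spss
* Sketch only, with illustrative variable names.
SORT CASES BY CASEID STATE COUNTY YEAR.
* Mark the first and last record within each group of identical keys.
MATCH FILES /FILE=* /BY CASEID STATE COUNTY /FIRST=first /LAST=last.
* A record is unique only when it is both first and last in its group.
COMPUTE dupflag = NOT (first = 1 AND last = 1).
* Number the records within each group starting from 1.
COMPUTE seq = 1.
IF (first = 0) seq = LAG(seq) + 1.
EXECUTE.
```

With this, every record in a duplicated group, including the original appearance, would get dupflag = 1, and seq would restart at 1 whenever a new CASEID, STATE, COUNTY combination begins.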
Any help would be gratefully received.
Best,
Sean Carey