Date: Fri, 15 Sep 2006 22:05:02 -0300
Reply-To: Carlos Renato <renato.stat@gmail.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Carlos Renato <renato.stat@gmail.com>
Subject: Re: selecting most recent record
In-Reply-To: <E0428445172CB7469BF187FF4D00514903CF8C60@CHIEMAIL1.spss.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Dear Friend
If your problem is this I have a soluction.
Client Data
1 20/09/2010
1 21/09/2010
1 22/09/2010
2 01/01/1999
2 08/01/2000
3 10/10/2001
3 21/01/1999
This syntax is sufficient to decide the problem above
* Identify Duplicate Cases.
SORT CASES BY Client(A) Date(A) .
MATCH FILES /FILE = * /BY Client
/FIRST = PrimaryFirst /LAST = PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE MatchSequence = 1 - PrimaryLast.
ELSE.
COMPUTE MatchSequence = MatchSequence + 1.
END IF.
LEAVE MatchSequence.
FORMAT MatchSequence (f7).
COMPUTE InDupGrp = MatchSequence > 0.
SORT CASES InDupGrp(D).
MATCH FILES /FILE = * /DROP = PrimaryFirst InDupGrp MatchSequence.
VARIABLE LABELS PrimaryLast 'Indicator of each last matching case as
Primary' .
VALUE LABELS PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL PrimaryLast (ORDINAL).
FREQUENCIES VARIABLES = PrimaryLast .
EXECUTE.
FILTER OFF.
USE ALL.
SELECT IF(NOT(PrimaryLast =0)).
EXECUTE .
Only the last case of each client is in the current file now.
Good bye.
Carlos Renato
Statistician
+55 (81)35811246
2006/9/15, Oliver, Richard <roliver@spss.com>:
>
> Correct idea, but not quite correct implementation. I think this is what
> you mean:
>
> MATCH FILES FILE=* /BY empid / last = last .
>
> You don't want to use both id and date as BY variables; you just want to
> use id as the BY variable. And you want to use LAST (or sort in descending
> order) to get the most recent date.
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
> Trondsen, Norman(DHS-CLPPB)
> Sent: Friday, September 15, 2006 3:39 PM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: Re: selecting most recent record
>
> Sydelle:
>
> Try this. I use it regularly on Medi-Cal data and lead poisoning case
> management data. After sorting the data set on the key VARS, MATCH FILES
> flags the first record for each patient by the date VAR. You then have
> options. You can run freqs on FIRST without deleting anything. Or SELECT
> IF will delete all but the first record.
>
> An alternative after the SORT is to run RANK CASES instead of MATCH
> FILES. This will create a rank variable and number the records for each ID
> variable sequentially beginning with 1.
>
> SORT CASES BY
> empid (A) svcdt (A) .
>
> MATCH FILES FILE=* /BY empid svcdt / FIRST = first.
>
> SELECT IF first.
>
>
>
> Norm Trondsen
>
> CDHS-CLPPB
>
> Program Evaluation & Research
>
> 850 Marina Bay Parkway
>
> Building P, Third Floor
>
> Richmond, CA 94808-6403
>
> 510.620.5655
>
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
> Raffe, Sydelle, SSA
> Sent: Friday, September 15, 2006 11:25 AM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: selecting most recent record
>
>
> I have a data set in which there are multiple records for each client. I
> would like to select the record with the most recent date to keep. I've
> tried the lag function and restructuring the data. Can anyone help?
> Thanks as usual.
>
> Sydelle Raffe, Alameda County Social Services Agency Information Services
> Division
> e:mail: draffe@acgov.org
> phone: 510-271-9174 fax: 510-271-9107
> If you have a request for information, please submit an ERPP request form
> at:
> https://alamedasocialservices.org/staff/support_services/statistics_and_
> reports/odm/index.cfm
>
--
Carlos Renato
|