Date: Thu, 3 Jan 2008 22:20:32 -0500
Reply-To: Richard Ristow <wrristow@mindspring.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Richard Ristow <wrristow@mindspring.com>
Subject: Re: Fixing multiple records for same time interval...
In-Reply-To: <ffa35560712190700o2111ed0fv1fa7768ef8798c58@mail.gmail.com >
Content-Type: text/plain; charset="us-ascii"; format=flowed
A programming exercise that got put off, in hectic holiday travel.
At 10:00 AM 12/19/2007, rich reeves wrote:
>I have a data set where individuals have multiple rows with interval
>information. What I have looks like:
|-----------------------------|---------------------------|
|Output Created |03-JAN-2008 21:58:42 |
|-----------------------------|---------------------------|
id time_begin time_end
a 08/15/2006 12/15/2006
a 08/01/2006 12/11/2006
a 08/31/2006 11/20/2006
a 09/15/2006 12/31/2006
a 01/15/2007 05/01/2007
a 05/31/2007 08/01/2007
a 06/15/2007 08/31/2007
Number of cases read: 7 Number of cases listed: 7
(This is reformatted to what you probably actually have. Your example
came through with many line-breaks you probably didn't intend; I
wonder why this happens so often?)
>I want to create a maximum interval for each set of overlapping
>intervals for an individual. That is for a set of rows with
>overlapping time intervals I want the minimum begin date and the
>maximum end date:
id time_begin time_end
a 8/1/2006 12/31/2006
a 1/15/2007 5/1/2007
a 5/31/2007 8/31/2007
That is, you want one record for every interval in which all days
fell within the interval for some one of the records, by ID. Here's a
solution using LAG and AGGREGATE. Variable #LATEST is a scratch
variable (see the SPSS *Command Syntax Reference*), which means that
(a) it keeps its value from one case to the next (b) it isn't kept in
the file. SPSS 14 draft output (WRR:not saved separately):
* Assign each record to an interval of .
* continuous 'activity': .
SORT CASES BY time_begin.
NUMERIC INTERVAL (F3).
NUMERIC #LATEST (ADATE10).
DO IF $CASENUM EQ 1 /* First record in the file */.
. COMPUTE INTERVAL = 1.
. COMPUTE #LATEST = time_end.
ELSE IF id NE LAG(ID) /* First record for an id */.
. COMPUTE INTERVAL = 1.
. COMPUTE #LATEST = time_end.
ELSE IF time_begin LE #LATEST /* Continue current interval*/.
. COMPUTE INTERVAL = LAG(INTERVAL).
. COMPUTE #LATEST = MAX(#LATEST,time_end).
ELSE /* Start new interval */.
. COMPUTE INTERVAL = LAG(INTERVAL) + 1.
. COMPUTE #LATEST = time_end.
END IF.
. /**/ LIST /*-*/.
List
|-----------------------------|---------------------------|
|Output Created |03-JAN-2008 22:16:26 |
|-----------------------------|---------------------------|
id time_begin time_end INTERVAL
a 08/01/2006 12/11/2006 1
a 08/15/2006 12/15/2006 1
a 08/31/2006 11/20/2006 1
a 09/15/2006 12/31/2006 1
a 01/15/2007 05/01/2007 2
a 05/31/2007 08/01/2007 3
a 06/15/2007 08/31/2007 3
Number of cases read: 7 Number of cases listed: 7
* Get earliest and latest dates for .
* each interval: .
AGGREGATE OUTFILE=*
/BREAK = ID INTERVAL
/time_begin = MIN(time_begin)
/time_end = MAX(time_end).
LIST.
List
|-----------------------------|---------------------------|
|Output Created |03-JAN-2008 22:16:27 |
|-----------------------------|---------------------------|
id INTERVAL time_begin time_end
a 1 08/01/2006 12/31/2006
a 2 01/15/2007 05/01/2007
a 3 05/31/2007 08/31/2007
Number of cases read: 3 Number of cases listed: 3
===================
APPENDIX: Test data
===================
DATA LIST LIST/
id time_begin time_end
(A1 ADATE10 ADATE10).
BEGIN DATA
a 8/15/2006 12/15/2006
a 8/1/2006 12/11/2006
a 8/31/2006 11/20/2006
a 9/15/2006 12/31/2006
a 1/15/2007 5/1/2007
a 5/31/2007 8/1/2007
a 6/15/2007 8/31/2007
END DATA.
LIST.
=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD