Date: Wed, 10 Sep 2003 21:51:16 -0400
Reply-To: Richard Ristow <email@example.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Richard Ristow <firstname.lastname@example.org>
Subject: Re: data merge - possibly using dates?
At 03:05 PM 9/10/2003 -0700, Nico Peruzzi, Ph.D. wrote:
>I have two datasets that I need to combine. First one has an ID,
>date, and other data. Second one has ID, date (in a different format)
>and other data.
>Problem is that the ID is not unique to each case (a series of cases
>share the same ID), so when I try a basic merge, things don't line up
>as I'd expect. So, I thought I could try to use the dates somehow.
>Can I do a 'complex' merge that first looks at ID then date, or vice-versa?
Very easily: you list both keys on the /BY subcommand of MATCH FILES.
But there are several issues to settle before you do it.
First, and this has nothing to do with SPSS: it can only work if each
pair of values (ID, DATE) is unique within each file. And it's only
meaningful if that's what the records are 'about': each record has
information about whatever the IDs represent, and the two files cover
essentially the same set of dates.
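To make the uniqueness requirement concrete, here is a minimal sketch, in Python rather than SPSS, of a one-to-one match on the composite key (ID, DATE). The function names and the dictionary record layout are hypothetical illustrations. Like MATCH FILES with two /FILE subcommands, it keeps keys found in only one file (in SPSS the other file's variables would be system-missing); unlike MATCH FILES, it simply refuses to run if a key is duplicated, which is the situation where the match stops being meaningful.

```python
from datetime import date


def index_by_key(records, name):
    """Index records by their (ID, DATE) pair, refusing duplicate keys."""
    index = {}
    for rec in records:
        key = (rec["ID"], rec["DATE"])
        if key in index:
            raise ValueError(f"duplicate key {key} in {name}")
        index[key] = rec
    return index


def match_one_to_one(file1, file2):
    """Combine records from two files that share the same (ID, DATE) key.

    Keys present in only one file are kept, with just that file's fields.
    """
    idx1 = index_by_key(file1, "file 1")
    idx2 = index_by_key(file2, "file 2")
    merged = []
    for key in sorted(set(idx1) | set(idx2)):
        row = {}
        row.update(idx2.get(key, {}))
        row.update(idx1.get(key, {}))  # in this sketch, file 1 wins on name clashes
        merged.append(row)
    return merged


# Hypothetical sample data: one ID observed on two different days.
file1 = [
    {"ID": 1, "DATE": date(2003, 4, 7), "SCORE": 10},
    {"ID": 1, "DATE": date(2003, 4, 8), "SCORE": 12},
]
file2 = [
    {"ID": 1, "DATE": date(2003, 4, 7), "GROUP": "A"},
    {"ID": 1, "DATE": date(2003, 4, 8), "GROUP": "B"},
]
for row in match_one_to_one(file1, file2):
    print(row)
```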
I worry, because you write,
>Date in the first set is in this format: 07APR2003:12:51:41
that is, date and time, down to seconds; and
>Date in the second set is broken across three variables:
>year (4 digits), month (2 digits) and day (2 digits).
that is, NO time, just the date. Could, for example, the first dataset
have *several* records for the same day, for the same ID?
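A small Python illustration (not SPSS) of that risk: two time-stamped records that are distinct as (ID, date-time) pairs become duplicates once the time is dropped to match the day-only dates of the second file. The helper `duplicate_day_keys` and the sample IDs and times are made up for illustration.

```python
from collections import Counter
from datetime import datetime


def duplicate_day_keys(stamped_records):
    """Return the (ID, date) keys that occur more than once after
    the time of day is discarded from each time stamp."""
    counts = Counter((rec_id, stamp.date()) for rec_id, stamp in stamped_records)
    return sorted(key for key, n in counts.items() if n > 1)


records = [
    (101, datetime(2003, 4, 7, 12, 51, 41)),
    (101, datetime(2003, 4, 7, 16, 5, 0)),   # same ID, same day, later time
    (102, datetime(2003, 4, 7, 9, 30, 0)),
]
print(duplicate_day_keys(records))  # prints [(101, datetime.date(2003, 4, 7))]
```

Any key this reports would have to be resolved (for example, by deciding which of the day's records to keep) before a one-to-one match on ID and date makes sense.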
>Any thoughts on getting these together?
Well, if the questions I mentioned above aren't a problem, it's fairly
easy. The form "07APR2003:12:51:41" is a bit of a pain to read; see the
code at the end of this posting.
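/* Assume File 1 is the current file, containing variables */
/* ID, DST_DATE, DST_TIME, and others: */
SORT CASES BY ID DST_DATE.
/* Save it under a scratch name (FILE1.SAV is a hypothetical name), */
/* renaming DST_DATE to DATE so both files use the same key names: */
SAVE OUTFILE='c:\MY_SPSS\FILE1.SAV'
  /RENAME=(DST_DATE=DATE)
  /KEEP =ID DATE ALL.
/* Assume File 2 is c:\MY_SPSS\FILE2.SAV, with variables */
/* ID, DATE_YR, DATE_MO, DATE_DY, and others: */
GET FILE='c:\MY_SPSS\FILE2.SAV'.
/* DATE.DMY takes its arguments in day, month, year order: */
COMPUTE DATE = DATE.DMY(DATE_DY,DATE_MO,DATE_YR).
FORMATS DATE (DATE11).
SORT CASES BY ID DATE.
/* Here's how you merge them, as the current file: */
MATCH FILES /FILE='c:\MY_SPSS\FILE1.SAV'
  /FILE=*
  /BY ID DATE.
/* Now manipulate, save, etc. as you like. */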
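The essential step in the code above is that both files end up with a DATE variable holding a calendar date only, with no time of day, so records for the same day compare equal. A rough Python analogue (not SPSS) of that normalization follows; the function names are hypothetical. Note that Python's `datetime.date` takes year, month, day, whereas SPSS's DATE.DMY takes day, month, year.

```python
from datetime import date, datetime


def key_from_parts(rec_id, year, month, day):
    """File 2 style: the date arrives as separate year, month, day fields."""
    return (rec_id, date(year, month, day))


def key_from_stamp(rec_id, stamp):
    """File 1 style: keep only the date portion of a full time stamp."""
    return (rec_id, stamp.date())


k2 = key_from_parts(101, 2003, 4, 7)
k1 = key_from_stamp(101, datetime(2003, 4, 7, 12, 51, 41))
print(k1 == k2)  # prints True
```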
* APPENDIX: .
* To read the time-stamp value in the first file, .
* here's the best I could do (this is SPSS draft output) .
* (Adjust the columns to begin wherever the data does begin .
* in the input record) .
DATA LIST
   file='C:\c_testsp\spssx-l\Peruzzi - date merge\TimeStmp.DAT'
  /1 DST_DY 1-2(F) DST_MO 3-5(MONTH) DST_YR 6-9(F)
     DST_TIME 11-18(TIME).
Data List will read 1 records from C:\c_testsp\spssx-l\Peruzzi - date
Variable   Rec   Start   End   Format
DST_DY       1       1     2   F2.0
DST_MO       1       3     5   Month3
DST_YR       1       6     9   F4.0
DST_TIME     1      11    18   Time8.0
COMPUTE DST_DATE = DATE.DMY(DST_DY,DST_MO,DST_YR).
FORMATS DST_DATE (DATE11).
VARIABLE LABELS DST_DATE 'Date portion of date-time stamp'
DST_TIME 'Time portion of date-time stamp'.
LIST DST_DATE DST_TIME.
10 Sep 03
Number of cases read: 1 Number of cases listed: 1
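For comparison, outside SPSS the same time stamp can be split with a single format string. This is a Python sketch, not SPSS syntax; it assumes English month abbreviations and exactly the DDMMMYYYY:HH:MM:SS layout shown in the question, and the function name `split_stamp` is hypothetical.

```python
from datetime import datetime


def split_stamp(text):
    """Split a stamp like '07APR2003:12:51:41' into (date, time).

    %b matches English month abbreviations case-insensitively,
    so 'APR' is accepted.
    """
    stamp = datetime.strptime(text.strip(), "%d%b%Y:%H:%M:%S")
    return stamp.date(), stamp.time()


d, t = split_stamp("07APR2003:12:51:41")
print(d, t)  # prints 2003-04-07 12:51:41
```

The date part plays the role of DST_DATE and the time part the role of DST_TIME; only the date part would be used as a merge key.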