Date: Thu, 11 Sep 2003 09:52:51 -0700
Reply-To: "Nico Peruzzi, Ph.D." <nperuzzi@yahoo.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: "Nico Peruzzi, Ph.D." <nperuzzi@yahoo.com>
Subject: Re: data merge - possibly using dates? - part 2
In-Reply-To: <5.1.0.14.2.20030910193947.05f1bd90@pop.mindspring.com>
Content-Type: text/plain; charset=us-ascii
Thanks to all for the great tips. Dates have been
converted, but as mentioned, some duplication issues are
hindering the merge.
Here's the data structure.
Dataset 1
ID Date
1 A
1 B
1 C
1 D
2 E
2 F
2 A
2 G
3 H
3 B
3 I
...
So, you see that there are some of the same dates across
different IDs.
Dataset 2 (note that ID2 is unique for each case,
unfortunately it is not present in dataset 1)
ID Date ID2
1 A
1 B
1 B
1 B
1 B
1 C
1 C
1 C
1 D
2 E
2 E
2 E
2 E
2 F
2 F
2 A
2 G
3 H
3 B
3 B
3 B
3 I
3 I
4 A
4 A
4 B
...
There is 10 times more data in dataset 2. Think of set 2
as all the possible times something happened, and dataset
one as a particular intervention that only happened (at
most) once each date for each ID.
Another piece of info is that the intervention, if it
happened on a date, always happened "first thing in the
morning" - i.e., it did not occur for other cases later
that same day. Unfortunately the time marker is only
present in the smaller intervention dataset (set 1).
However, and this may help, ID2 in dataset 2 is a unique #
that is ordered based on time. I'm trying to figure out:
1. How I could use ID2 to sort out when an intervention
happened in dataset 2, but even then,
2. If my merge will even work given the duplication issues,
or if there is a work around.
I'm a little bogged in the details right now; I'd
appreciate any thoughts.
Thanks, Nico
--- Richard Ristow <wrristow@mindspring.com> wrote:
> At 03:05 PM 9/10/2003 -0700, Nico Peruzzi, Ph.D. wrote:
>
> >I have two datasets that I need to combine. First one
> has an ID,
> >date, and other data. Second one has ID, date (in a
> different format)
> >and other data.
> >
> >Problem is that the ID is not unique to each case (a
> series of cases
> >share the same ID), so when I try a basic merge, things
> don't line up
> >as I'd expect. So, I thought I could try use the dates
> somehow.
>
> >Can I do a 'complex' merge that first looks at ID then
> date, or vice-versa?
>
> Very easily: You include both keys on the /BY clause of a
> MATCH FILES.
> But there are a lot of issues to settle before you do it.
>
> First, and nothing to do with SPSS, this can only work if
> the pair of
> values, ID and DATE, are unique within both files; and
> it's only
> meaningful if that's what the records are 'about': that
> each record has
> information about whatever the IDs represent, for
> essentially the same
> set of dates.
>
> I worry, because you write,
>
> >Date in the first set is in this format:
> 07APR2003:12:51:41
>
> that is, date and time, down to seconds; and
>
> >Date in the second set is broken across three variables:
> >year (4 digits), month (2 digits) and day (2 digits).
>
> that is, NO time, just the date. Could, for example, the
> first dataset
> have *several* records for the same day, for the same ID?
>
> >Any thoughts on getting these together?
>
> Well, if the questions I mentioned above aren't a
> problem, it's fairly
> easy. The form "07APR2003:12:51:41" is a bit of a pain to
> read; see
> code at end of the posting.
>
> /* Assume File 1 is the current file, containing
> variables */
> /* ID, DST_DATE, DST_TIME, and others:
> */
>
> SORT CASES BY ID DST_DATE.
> SAVE OUTFILE='c:\tmp\FILE1.SAV'
> /RENAME=(DST_DATE=DATE)
> /KEEP =ID DATE ALL.
>
> /* Assume File 2 is c:\MY_SPSS\FILE2.SAV, with variables
> */
> /* ID, DATE_YR, DATE_MO, DATE_DY, and others:
> */
>
> GET FILE='c:\MY_SPSS\FILE2.SAV'.
> COMPUTE DATE = DATE.DMY(DATE_YR,DATE_MO,DATE_DY).
> FORMATS DATE (DATE11).
> SORT CASES BY ID DATE.
>
> /* Here's how you merge them, as the current file:
> */
> MATCH FILES
> /FILE='c:\tmp\FILE1.SAV'
> /FILE=*
> /BY ID DATE.
>
> /* Now manipulate, save, etc. as you like.
> */
>
*.............................................................
> * APPENDIX:
> .
> * To read the time-stamp value in the first file,
> .
> * here's the best I could do (this is SPSS draft output)
> .
> * (Adjust the columns to begin wherever the data does
> begin .
> * in the input record)
> .
>
> data list
> file='C:\c_testsp\spssx-l\Peruzzi - date
> merge\TimeStmp.DAT'
> fixed records=1
> /1 DST_DY 1-2(F)
> DST_MO 3-5(month)
> DST_YR 6-9(F)
> DST_TIME 11 -18(time).
>
> Data List will read 1 records from
> C:\c_testsp\spssx-l\Peruzzi - date
> merge\TimeStmp.DAT
>
> Variable Rec Start End Format
>
> DST_DY 1 1 2 F2.0
> DST_MO 1 3 5 Month3
> DST_YR 1 6 9 F4.0
> DST_TIME 1 11 18 Time8.0
>
> COMPUTE DST_DATE = DATE.DMY(DST_DY,DST_MO,DST_YR).
> FORMATS DST_DATE (DATE11).
> VARIABLE LABELS DST_DATE 'Date portion of date-time
> stamp'
> DST_TIME 'Time portion of date-time
> stamp'.
> LIST DST_DATE DST_TIME.
>
> List
> 10 Sep 03
> DST_DATE DST_TIME
> 07-APR-2003 12:51:41
> Number of cases read: 1 Number of cases listed: 1
>
__________________________________
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software
http://sitebuilder.yahoo.com
|