LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (September 2003)Back to main SPSSX-L pageJoin or leave SPSSX-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Thu, 11 Sep 2003 09:52:51 -0700
Reply-To:     "Nico Peruzzi, Ph.D." <nperuzzi@yahoo.com>
Sender:       "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:         "Nico Peruzzi, Ph.D." <nperuzzi@yahoo.com>
Subject:      Re: data merge - possibly using dates? - part 2
In-Reply-To:  <5.1.0.14.2.20030910193947.05f1bd90@pop.mindspring.com>
Content-Type: text/plain; charset=us-ascii

Thanks to all for the great tips. Dates have been converted, but as mentioned, some duplication issues are hindering the merge.

Here's the data structure.

Dataset 1

ID Date 1 A 1 B 1 C 1 D 2 E 2 F 2 A 2 G 3 H 3 B 3 I ...

So, you see that there are some of the same dates across different IDs.

Dataset 2 (note that ID2 is unique for each case, unfortunately it is not present in dataset 1)

ID Date ID2 1 A 1 B 1 B 1 B 1 B 1 C 1 C 1 C 1 D 2 E 2 E 2 E 2 E 2 F 2 F 2 A 2 G 3 H 3 B 3 B 3 B 3 I 3 I 4 A 4 A 4 B ...

There is 10 times more data in dataset 2. Think of set 2 as all the possible times something happened, and dataset one as a particular intervention that only happened (at most) once each date for each ID.

Another piece of info is that the intervention, if it happened on a date, always happened "first thing in the morning" - i.e., it did not occur for other cases later that same day. Unfortunately the time marker is only present in the smaller intervention dataset (set 1).

However, and this may help, ID2 in dataset 2 is a unique # that is ordered based on time. I'm trying to figure out:

1. How I could use ID2 to sort out when an intervention happened in dataset 2, but even then,

2. If my merge will even work given the duplication issues, or if there is a work around.

I'm a little bogged in the details right now; I'd appreciate any thoughts.

Thanks, Nico

--- Richard Ristow <wrristow@mindspring.com> wrote: > At 03:05 PM 9/10/2003 -0700, Nico Peruzzi, Ph.D. wrote: > > >I have two datasets that I need to combine. First one > has an ID, > >date, and other data. Second one has ID, date (in a > different format) > >and other data. > > > >Problem is that the ID is not unique to each case (a > series of cases > >share the same ID), so when I try a basic merge, things > don't line up > >as I'd expect. So, I thought I could try use the dates > somehow. > > >Can I do a 'complex' merge that first looks at ID then > date, or vice-versa? > > Very easily: You include both keys on the /BY clause of a > MATCH FILES. > But there are a lot of issues to settle before you do it. > > First, and nothing to do with SPSS, this can only work if > the pair of > values, ID and DATE, are unique within both files; and > it's only > meaningful if that's what the records are 'about': that > each record has > information about whatever the IDs represent, for > essentially the same > set of dates. > > I worry, because you write, > > >Date in the first set is in this format: > 07APR2003:12:51:41 > > that is, date and time, down to seconds; and > > >Date in the second set is broken across three variables: > >year (4 digits), month (2 digits) and day (2 digits). > > that is, NO time, just the date. Could, for example, the > first dataset > have *several* records for the same day, for the same ID? > > >Any thoughts on getting these together? > > Well, if the questions I mentioned above aren't a > problem, it's fairly > easy. The form "07APR2003:12:51:41" is a bit of a pain to > read; see > code at end of the posting. > > /* Assume File 1 is the current file, containing > variables */ > /* ID, DST_DATE, DST_TIME, and others: > */ > > SORT CASES BY ID DST_DATE. > SAVE OUTFILE='c:\tmp\FILE1.SAV' > /RENAME=(DST_DATE=DATE) > /KEEP =ID DATE ALL. > > /* Assume File 2 is c:\MY_SPSS\FILE2.SAV, with variables > */ > /* ID, DATE_YR, DATE_MO, DATE_DY, and others: > */ > > GET FILE='c:\MY_SPSS\FILE2.SAV'. > COMPUTE DATE = DATE.DMY(DATE_YR,DATE_MO,DATE_DY). > FORMATS DATE (DATE11). > SORT CASES BY ID DATE. > > /* Here's how you merge them, as the current file: > */ > MATCH FILES > /FILE='c:\tmp\FILE1.SAV' > /FILE=* > /BY ID DATE. > > /* Now manipulate, save, etc. as you like. > */ > *............................................................. > * APPENDIX: > . > * To read the time-stamp value in the first file, > . > * here's the best I could do (this is SPSS draft output) > . > * (Adjust the columns to begin wherever the data does > begin . > * in the input record) > . > > data list > file='C:\c_testsp\spssx-l\Peruzzi - date > merge\TimeStmp.DAT' > fixed records=1 > /1 DST_DY 1-2(F) > DST_MO 3-5(month) > DST_YR 6-9(F) > DST_TIME 11 -18(time). > > Data List will read 1 records from > C:\c_testsp\spssx-l\Peruzzi - date > merge\TimeStmp.DAT > > Variable Rec Start End Format > > DST_DY 1 1 2 F2.0 > DST_MO 1 3 5 Month3 > DST_YR 1 6 9 F4.0 > DST_TIME 1 11 18 Time8.0 > > COMPUTE DST_DATE = DATE.DMY(DST_DY,DST_MO,DST_YR). > FORMATS DST_DATE (DATE11). > VARIABLE LABELS DST_DATE 'Date portion of date-time > stamp' > DST_TIME 'Time portion of date-time > stamp'. > LIST DST_DATE DST_TIME. > > List > 10 Sep 03 > DST_DATE DST_TIME > 07-APR-2003 12:51:41 > Number of cases read: 1 Number of cases listed: 1 >

__________________________________ Do you Yahoo!? Yahoo! SiteBuilder - Free, easy-to-use web site design software http://sitebuilder.yahoo.com


Back to: Top of message | Previous page | Main SPSSX-L page