Date: Wed, 28 Mar 2012 13:36:08 -0700
Reply-To: Bruce Weaver <bruce.weaver@hotmail.com>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Bruce Weaver <bruce.weaver@hotmail.com>
Subject: Re: Database Management Help
In-Reply-To: <8D94A946E17E9941AE94FC3AE11CAB7E043FBB6CFD@everest.ad.pacificu.edu>
Content-Type: text/plain; charset=us-ascii
From what you've said, I think I'd try this:
1. Get the BMI file, keeping only the variables of interest (via /KEEP).
2. Add data from the Depression file via ADD FILES, and deleting any
variables not of interest. Also use the /IN sub-command (twice) to create
flags indicating cases from the BMI and Depression files.
3. In the resulting file, sort cases by ID and date of contact. (If you use
/BY ID Date on the ADD FILES in step 2, this should be done already.)
4. Use AGGREGATE to get the MAX values of the flag variables created on Step
2 (with ID as the BREAK variable, and writing new variables to the working
file). Keep only those IDs that have data from both files. Assuming the
Step 2 flags are called BMIFlag and DepFlag:
SELECT IF (BMIFlag and DepFlag).
EXE.
6. Assuming it is possible to have BMI and Depression data for the same
date, you need to check for that. You could use AGGREGATE again, breaking
on ID and Date this time, and using the FIRST function to keep the first
value value of the BMI and Depression variables. (I assume the maximum
number of records per IDxDate combination is two.) Note that this time, you
would replace the working file rather than writing new variables to the
working file.
I don't have time to concoct an example and test it right now.
HTH.
Michael, Paul G. wrote
>
> Hi All,
>
> I have two data sets that I would like to merge using ID as the keyed
> variable, and each data file has duplicates IDs. The variables of interest
> in the first dataset are ID, date of contact (DD-MM-YYYY), and Body Mass
> Index (BMI). The variables of interest in the second data set are ID, date
> of contact (DD-MM-YYYY), and depression score. The duplicate IDs occur
> because some subjects had more than 1 BMI score from different time points
> and/or more than 1 depression score at different time points.
>
> The contact dates from each file do not match up in all instances (in fact
> very few subjects have the same contact date in both files). I only want
> to keep subjects who have both BMI data and depression data but I need to
> preserve information from all the contact dates.
>
> When I try a simple merge by adding variables (e.g., depression score and
> contact date) to the BMI data set using ID as the keyed variable, I run
> into the problem of having to go through thousands of cases and delete
> those that don't have both BMI and depression data.
>
> For example I might have one subject whose BMI was collected 10 times on
> different dates and none of these dates match the depression data for this
> subject. So after the merge I have eleven rows for this subject that I
> would like to preserve (all BMI data and depression data). Another subject
> may have BMI scores collected at 7 different dates, but this subject has
> no depression data. I would like to remove all the BMI data for this
> subject since they have no depression data.
>
> Is there a way in which I can merge these files in a different way to get
> what I need or a method to delete cases in the merged data file based on
> duplicate IDs and BMI data but no depression data? Any help is greatly
> appreciated!
>
> Best,
>
> Paul
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
-----
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Database-Management-Help-tp5601508p5601689.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
=====================
To manage your subscription to SPSSX-L, send a message to
LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|