Date: Sun, 5 Jul 1998 16:18:23 MET
Reply-To: "M. MILLS" <m.mills@FRW.RUG.NL>
Sender: "SPSSX(r) Discussion" <SPSSX-L@UGA.CC.UGA.EDU>
From: "M. MILLS" <m.mills@FRW.RUG.NL>
Subject: Linking files revisited
Hi once again,
I would like to restate a question regarding linking files, this time
in a less narrative manner! Sorry for the confusion.
I have two files, household and individual. I will explain what they
look like first and then what I wish to do with them.
1. HOUSEHOLD DATA FILE
9-digit household id
aaabbbccc (id)..hhwater(etc)..hlin$01..hrel$01(etc)..hlin$02...hrel$02
aaabbbddd (id)..hhwater(etc)..hlin$01..hrel$01(etc)..hlin$02...hrel$02
eeefffggg (id)..hhwater(etc)..hlin$01..hrel$01(etc)..hlin$02...hrel$02
eeefffhhh (id)..hhwater(etc)..hlin$01..hrel$01(etc)..hlin$02...hrel$02
The first variable is a 9-digit household identification. Its three
components are also available as separate variables in both data files;
in the household file they are hhstate=aaa, hhtown=bbb, hhnumber=ccc.
This id is followed by a number of household-specific variables
(heating, water source, etc.). The file then holds one block of
variables per household member, for up to 38 household members.
NOTE: The household file was used as a 'filter' to collect basic
household information and then to select only ever-married women of
reproductive age. Therefore, not all individuals appear in the
individual file (e.g., no men and no older or younger women), and there
may be multiple women from one household.
2. INDIVIDUAL DATA FILE
11-digit case id
aaabbbcccxx (id)...state...town...number...line...education...work etc.
aaabbbdddzz (id)...state...town...number...line...education...work etc.
eeefffgggyy (id)...state...town...number...line...education...work etc.
eeefffhhhww (id)...state...town...number...line...education...work etc.
In this file, the individual case identification has an *additional*
2-digit 'line number' (in other words, the raw text data was probably
entered with the household as the first line and each individual
sequentially thereafter). ***HERE IS THE CLINCHER....
For example, if 'line=03' in the Individual Data File (these are also
the last 2 digits of the 11-digit case id), then all of the information
for this 3rd person in the household is stored under the variables
'hhlin$03, hhrel$03', etc. in the Household Data File.
Once again, each variable from the case id is available
separately in the individual data file under slightly different names
(state=aaa, town=bbb, number=ccc, line=xx).
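To make the naming rule concrete, here is a minimal sketch in Python
rather than SPSS syntax. The function names are hypothetical, and the
'hhlin$NN' / 'hhrel$NN' pattern is assumed from the 'line=03' example;
the real files may spell the stems differently.

```python
def split_case_id(case_id):
    """Split an 11-digit case id into (9-digit household id, 2-digit line)."""
    if len(case_id) != 11 or not case_id.isdigit():
        raise ValueError("expected an 11-digit case id, got %r" % case_id)
    return case_id[:9], case_id[9:]


def member_variables(line, stems=("hhlin", "hhrel")):
    """Names of the household-file variables that describe one member.

    `line` is the 2-digit line number as text, e.g. "03".
    `stems` is an assumed, incomplete list of variable stems.
    """
    return [f"{stem}${line}" for stem in stems]


# Made-up id: state 001, town 002, household 007, line 03.
household_id, line = split_case_id("00100200703")
print(household_id, line, member_variables(line))
```

Keeping the line number as text preserves the leading zero, so '03'
maps to 'hhlin$03' and not 'hhlin$3'.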
WHAT I WANT TO DO.
What I want to do is to link all of this 'basic' household data (e.g.,
water, heating) AND the 'individual-specific' household data (e.g.,
relation to head of household) to each individual woman in the
individual data file.
Therefore, for the first woman, with 'state=aaa / town=bbb /
number=ccc / line=xx', I want to link her to the household record with
'hhstate=aaa / hhtown=bbb / hhnumber=ccc' and pick up: a) the basic
household variables, which are listed only once (hhwater, etc.); and
b) the individual-specific household variables for her line number,
'hhlin$xx, hhrel$xx', etc.
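The result I am after can be sketched as follows, again in Python
rather than SPSS syntax and only as an illustration. The function name
link_individual, the sample values, and the short variable lists are
all invented for the example; the household records are assumed to be
keyed by the 9-digit household id.

```python
def link_individual(person, households,
                    basic=("hhwater",), stems=("hhlin", "hhrel")):
    """Return a copy of `person` with her household data attached.

    `households` maps the 9-digit household id to that household's record.
    `basic` and `stems` are assumed, incomplete variable lists.
    """
    hh_id = person["state"] + person["town"] + person["number"]
    household = households[hh_id]
    line = person["line"]              # 2-digit text, e.g. "03"

    linked = dict(person)
    for name in basic:                 # a) variables listed once per household
        linked[name] = household[name]
    for stem in stems:                 # b) only this woman's member block
        linked[stem] = household[stem + "$" + line]
    return linked


# Invented sample data: one household with three members.
households = {
    "001002007": {
        "hhwater": "well",
        "hhlin$01": "01", "hhrel$01": "head",
        "hhlin$02": "02", "hhrel$02": "son",
        "hhlin$03": "03", "hhrel$03": "daughter-in-law",
    }
}
woman = {"state": "001", "town": "002", "number": "007",
         "line": "03", "education": "primary"}

print(link_individual(woman, households))
```

Each woman keeps her own variables and gains the household-level ones
plus the single member block that matches her line number, so two women
from the same household share 'hhwater' but differ in 'hhrel'.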
This is needed both for information and to de-select multiple women
from one household during regression, so as not to violate the
assumption of independence of observations (but that's another
topic!). I have all of the pieces of the puzzle, but can't seem to
put them together!
I hope this is clearer - any suggestions are welcome.
Thanks for taking the time to read my question.
Melinda