Date: Fri, 25 Jan 2002 13:09:07 -0300
Reply-To: hmaletta@fibertel.com.ar
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Hector Maletta <hmaletta@fibertel.com.ar>
Subject: Re: multiple record matching
Content-Type: text/plain; charset=us-ascii
Kathy:
Matching many to many requires defining exactly what you want. Let us
think of an example that I actually encountered not long ago. In an
agricultural survey there were a number of variables concerning the
farmers' families (one record per person), some variables concerning the
farm and the household (one record per farm/household; each household
corresponded to exactly one farm), and some variables referred to
specific crops (one record per crop).
In this kind of situation I wanted to do the following:
1. Describe each person according to farm/household characteristics
(farm size, socioeconomic status, family size, type of housing, etc).
2. Describe farm/households according to personal characteristics of the
family members (sex, age, education, etc).
3. Describe farm/households by variables concerning crops grown and
their characteristics (area planted, yield, etc), and concerning
household characteristics.
4. Describe persons according to characteristics of the crops grown in
their farms.
The procedures are as follows:
1. MATCH FILES /FILE 'PERSONS.SAV' /TABLE 'FARMS_HOUSEHOLDS.SAV' /BY ID.
2. Open PERSONS.SAV. Use AGGREGATE with /BREAK ID (the household ID) to
create household-level variables based on personal characteristics
(e.g.: male members, female members, number of people with higher
education, number of children under 18, etc.). Then MATCH the resulting
file with FARMS_HOUSEHOLDS.SAV based on household ID.
3. Open CROPS.SAV and use AGGREGATE (again with /BREAK ID) to create
variables at farm/household level based on crop characteristics, e.g.
number of
crops, total value of production, total area under winter crops, total
area under cereals, etc.
Then MATCH the resulting file with FARMS_HOUSEHOLDS.SAV based on ID.
4. Use FARMS_HOUSEHOLDS.SAV as modified at steps 2 and 3 above, and
apply MATCH FILES /FILE 'PERSONS.SAV'/TABLE 'FARMS_HOUSEHOLDS.SAV'/BY
ID.
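Steps 2 to 4 can be sketched in syntax roughly as follows. This is only an illustration: SEX, AGE, AREA and PRODVAL are hypothetical variable names, HH_PERSONS.SAV, HH_CROPS.SAV and FARMS_HH_FULL.SAV are hypothetical file names, and all files are assumed to be already sorted by ID.

```spss
* Step 2: household-level counts built from the person records.
GET FILE='PERSONS.SAV'.
COMPUTE MALE = (SEX = 1).
COMPUTE FEMALE = (SEX = 2).
COMPUTE CHILD = (AGE < 18).
AGGREGATE OUTFILE='HH_PERSONS.SAV'
  /BREAK=ID
  /NMALE = SUM(MALE)
  /NFEMALE = SUM(FEMALE)
  /NCHILD = SUM(CHILD)
  /FAMSIZE = N.

* Step 3: household-level totals built from the crop records.
GET FILE='CROPS.SAV'.
AGGREGATE OUTFILE='HH_CROPS.SAV'
  /BREAK=ID
  /NCROPS = N
  /TOTAREA = SUM(AREA)
  /TOTVALUE = SUM(PRODVAL).

* Add both sets of aggregated variables to the farm/household file.
MATCH FILES /FILE='FARMS_HOUSEHOLDS.SAV'
  /FILE='HH_PERSONS.SAV'
  /FILE='HH_CROPS.SAV'
  /BY ID.
SAVE OUTFILE='FARMS_HH_FULL.SAV'.

* Step 4: spread the enlarged household record over every person.
MATCH FILES /FILE='PERSONS.SAV' /TABLE='FARMS_HH_FULL.SAV' /BY ID.
```

I save the enlarged household file under a new name here so that MATCH FILES never reads from and writes to the same file.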
Directly matching "many to many" (e.g. crops to people) could be done in
a different manner. Suppose farms grow no more than, say, three crops
each. Crops grown in all farms may be many, but no more than three per
farm. Create one file for first crop named, one file for second crop,
and one file for third crop. Suppose there is a variable CROPNUM in the
CROPS.SAV file valued 1, 2, or 3 for first, second or third crop in the
farm. Then:
GET FILE 'CROPS.SAV'.
TEMPORARY.
SELECT IF (CROPNUM=1).
SAVE OUTFILE 'CROP_1.SAV'.
USE ALL.
TEMPORARY.
SELECT IF (CROPNUM=2).
SAVE OUTFILE 'CROP_2.SAV'.
USE ALL.
TEMPORARY.
SELECT IF (CROPNUM=3).
SAVE OUTFILE 'CROP_3.SAV'.
MATCH FILES /FILE 'FARMS_HOUSEHOLDS.SAV' /FILE 'CROP_1.SAV'
  /FILE 'CROP_2.SAV' /FILE 'CROP_3.SAV' /BY ID.
SAVE OUTFILE 'FARMS_HOUSEHOLDS_CROPS.SAV'.
MATCH FILES /FILE 'PERSONS.SAV' /TABLE 'FARMS_HOUSEHOLDS_CROPS.SAV'
  /BY ID.
SAVE OUTFILE 'PERSONS_CROPS.SAV'.
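The resulting file contains one record per person, with personal
variables, farm/household variables, and variables for up to three
crops. Two conditions apply: all files must be sorted by ID before
matching, and the crop variables must carry different names in each of
the three crop files (use the RENAME subcommand when saving or
matching). Otherwise MATCH FILES keeps, for each shared name, only the
values from the first file in which that name appears.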
There are actually other, more elegant ways to add the three crops'
variables to the FARMS_HOUSEHOLDS file; I have used the simplest for the
sake of clarity.
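One such alternative, sketched here only as a possibility, is to restructure CROPS.SAV from one record per crop to one record per farm in a single step. This assumes a release of SPSS that includes the CASESTOVARS command; CROPS_WIDE.SAV is a hypothetical file name, and FARMS_HOUSEHOLDS.SAV is assumed to be sorted by ID.

```spss
* Turn the crop records of each farm into one wide record per farm.
GET FILE='CROPS.SAV'.
SORT CASES BY ID CROPNUM.
CASESTOVARS
  /ID=ID
  /INDEX=CROPNUM.
SAVE OUTFILE='CROPS_WIDE.SAV'.

* Attach the wide crop record to the farm/household file.
MATCH FILES /FILE='FARMS_HOUSEHOLDS.SAV' /FILE='CROPS_WIDE.SAV' /BY ID.
SAVE OUTFILE='FARMS_HOUSEHOLDS_CROPS.SAV'.
```

With INDEX, each crop variable should receive a suffix derived from the value of CROPNUM, so the three crops no longer share variable names and no separate CROP_1 to CROP_3 files are needed.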
Hector Maletta
Universidad del Salvador
Buenos Aires, Argentina
kmcdonald@dmhmrsas.state.va.us wrote:
>
> How do you deal with both files having more than one record?
>
> -----Original Message-----
> From: Hector Maletta [mailto:hmaletta@fibertel.com.ar]
> Sent: Thursday, January 24, 2002 3:05 PM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: Re: aggregate
>
> Jessica:
> Use the MATCH FILES command designating the non-duplicated file as a
> TABLE, as follows:
> MATCH FILES /TABLE 'FILE 1'/FILE 'FILE 2' /BY id.
>
> ID is the name I assign to the matching variable. The resulting file
> will have one record per case existing in File 2, and each record will
> have all variables from File 1 (repeated for all the duplicated cases
> with the same ID) plus all variables from File 2 (except those that bear
> the same name as in File 1, in which case SPSS prefers taking the values
> from the file that is mentioned first in the command). You can reverse the
> order of the files in the command (always taking FILE 1 as a TABLE), if
> you wish that the version in File 2 is preserved in the case of
> variables existing in both files.
>
> Hector Maletta
> Universidad del Salvador
> Buenos Aires, Argentina
>
> Wozniak wrote:
> >
> > I have two files that I am trying to match up on a certain variable. One
> > file has no duplicates for that variable, the other has more than one
> > instance of the same variable.
> > For example: file one has:
> > 1
> > 2
> > 3
> > 4
> > 5
> >
> > file two looks like this for that variable
> > 1
> > 1
> > 2
> > 3
> > 3
> > 3
> > 4
> > 5
> > 5
> > 5
> > I want to match them up so that the numbers look like this
> >
> > from File one    from File two
> >
> >       1                1
> >       1                1
> >       2                2
> >       3                3
> >       3                3
> >       3                3
> >       4                4
> >       5                5
> >       5                5
> >       5                5
> > any suggestions would be greatly appreciated. thanks
> > Jessica