Date: Wed, 1 Sep 2010 18:01:21 -0400
Reply-To: Mike Palij <mp26@nyu.edu>
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Mike Palij <mp26@nyu.edu>
Subject: Re: sorting out a nested data structure
Content-Type: text/plain; charset="iso-8859-1"
If each case represents a kid's data and each kid has a unique
identifier, then you can group all "cases" for a single kid. So,
if "unique_kid_ID" is the name of your variable, a listing of your
data might look like this:
unique_kid_ID (family variables) (agency vars) (funding vars) etc.
100000001 etc
100000001
100000001
100000002
100000002
100000003
100000004
100000004
Above, the first three lines correspond to the same kid but for
different agencies, funding sources, etc. With data in this format
you could come up with an aggregated datafile that summarizes
the variables for the "cases" of each kid, or possibly use
CASESTOVARS to make a single case for each kid.
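
To make the two options above concrete, here is a rough sketch outside
SPSS, in plain Python. It is only an illustration: the field names
"agency" and "cost" and all of the values are made up, and the two
functions merely mimic what an aggregate step and a cases-to-variables
restructure would produce per kid.

```python
from collections import defaultdict

# Hypothetical long-format rows: one row per kid per agency.
# "agency" and "cost" are assumed field names, not from the real data.
rows = [
    {"unique_kid_ID": 100000001, "agency": "A123", "cost": 50.0},
    {"unique_kid_ID": 100000001, "agency": "UA90", "cost": 25.0},
    {"unique_kid_ID": 100000002, "agency": "A123", "cost": 40.0},
]


def aggregate_by_kid(rows):
    """Return one summary record per kid (row count and total cost)."""
    summary = defaultdict(lambda: {"n_rows": 0, "total_cost": 0.0})
    for r in rows:
        s = summary[r["unique_kid_ID"]]
        s["n_rows"] += 1
        s["total_cost"] += r["cost"]
    return dict(summary)


def cases_to_vars(rows):
    """Return one wide record per kid: agency.1, cost.1, agency.2, ..."""
    wide = {}
    counts = defaultdict(int)
    for r in rows:
        kid = r["unique_kid_ID"]
        counts[kid] += 1
        i = counts[kid]  # index of this row within the kid's rows
        rec = wide.setdefault(kid, {"unique_kid_ID": kid})
        rec[f"agency.{i}"] = r["agency"]
        rec[f"cost.{i}"] = r["cost"]
    return wide
```

In the wide form, a kid with only one row simply has no "agency.2" or
"cost.2" entry, which corresponds to missing values in a rectangular
datafile.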
But you still need a unique identifier for family (i.e., unique_family_ID)
that will allow you to group different kids together. Once you
have a unique family ID you might use AGGREGATE to get summaries
or use CASESTOVARS to create a single case for each
family that contains all of the variables for all kids in the family that
you have data for (though this makes me think that your dataset
would be "sparse", that is, some families may have many kids
but most may have only one or two, which would leave variables
with missing values). Consider

Unique_Family_ID Kid01 (kid #1 vars) Kid02 (kid #2 vars) ... Kid0k (kth Kid's vars)

Presumably some of the info you used in creating unique_kid_ID will
allow you to create unique_family_ID and then use AGGREGATE or
create a case structure that has variables representing all kids in
the family (then using COMPUTE statements to get the summary info
you want).
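
One possible way to derive unique_family_ID, sketched in plain Python
rather than SPSS: treat every agency-specific family ID and every
unique kid as a node, link a family ID to each kid recorded under it,
and give every linked group one number (a union-find over the rows).
This is an assumption about how the linking could be done, not
something the thread confirms. The field names "agency", "fam_id" and
"kid_key" are hypothetical, with "kid_key" standing in for the
name/birthdate/zipcode string.

```python
# Hypothetical rows, loosely modeled on the example quoted below.
rows = [
    {"agency": "A123", "fam_id": "1002", "kid_key": "kidA"},
    {"agency": "WW89", "fam_id": "A003", "kid_key": "kidB"},
    {"agency": "A123", "fam_id": "3421", "kid_key": "kidC"},
    {"agency": "UA90", "fam_id": "1002", "kid_key": "kidA"},
    {"agency": "A123", "fam_id": "3421", "kid_key": "kidD"},
]


def assign_family_ids(rows):
    """Add a unique_family_ID to each row by linking shared kids and family IDs."""
    parent = {}

    def find(x):
        # Follow parent links to the group's root, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Link each agency-specific family ID to the kid recorded under it.
    for r in rows:
        union(("fam", r["agency"], r["fam_id"]), ("kid", r["kid_key"]))

    # Number the groups in order of first appearance.
    ids = {}
    out = []
    for r in rows:
        root = find(("kid", r["kid_key"]))
        true_id = ids.setdefault(root, len(ids) + 1)
        out.append({**r, "unique_family_ID": true_id})
    return out


for r in assign_family_ids(rows):
    print(r["agency"], r["fam_id"], r["kid_key"], r["unique_family_ID"])
```

Note the limitation: two agency-specific family IDs are merged only if
at least one kid appears under both. Siblings who are each served by a
different agency would stay in separate groups unless some other
shared key (for example, a parent name or address, if available) is
added as a further link.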
HTH
-Mike Palij
New York University
mp26@nyu.edu
----- Original Message -----
From: "Gene Maguin" <emaguin@buffalo.edu>
To: <SPSSX-L@LISTSERV.UGA.EDU>
Sent: Wednesday, September 01, 2010 4:48 PM
Subject: Re: sorting out a nested data structure
> -----Original Message-----
> From: Terry Westover [mailto:tnwestover@ucdavis.edu]
> Sent: Wednesday, September 01, 2010 4:34 PM
> To: 'Gene Maguin'
> Subject: RE: sorting out a nested data structure
>
> Yes, that's basically the situation. The data look like this (each case is
> an individual child):
>
> Family ID (different ones for different agencies) Kid ID (diff for diff
> agencies) Agency ID Funding Program ID
>
> We need to get to something like your structure below where we assign a
> "true" unique identifier for both the child and family. Our unit of
> analysis is family.
>
> We have developed a variable like your "identifying datastring" by combining
> name, birthdate, and zipcode. So we can identify multiple entries for the
> same child. Where we're a little "stuck" is in figuring out how to use that
> variable to modify the data structure so that we capture all the services
> information per child and then group all children into the proper "true" or
> unique families and assign a family unique identifier.
>
> We need to be able to capture what services each family uses for all of
> their children and the total out of pocket costs the family pays for
> childcare.
>
> Thanks much for your help
> Terry
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
> Gene Maguin
> Sent: Wednesday, September 01, 2010 1:10 PM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: Re: sorting out a nested data structure
>
> Terry,
>
> I'd like to make sure that I understand your data structure (and it might
> help to post some sample data with the relevant sections shown). I
> understand your data this way.
>
>
> AgencyID  FamID  KidID  TrueFamID  TrueKidID  IdentifyingDataString
> A123      1002   B234   100876     2
> WW89      A003   QWE2   101023     1
> A123      3421   A346   120945     3
> UA90      1002   RTYQ   100876     2
> A123      3421   A389   120945     2
>
> One family can have one kid at one agency.
> Same family can have the same kid at two (or more) different agencies.
> Same family can have two (or more) different kids at the same agency.
>
> Does this cover the possibilities?
>
> So you make up what I'm calling the IdentifyingDataString that is composed
> of, for instance,
> DOB, gender, kid birth certificate first name, etc. Exactly what doesn't
> matter per se but just that you have something that is unique to each kid.
>
> Is all this correct?
>
> Gene Maguin
>
>
>
>
> ________________________________
>
> From: SPSSX(r) Discussion [mailto:SPSSX-L@LISTSERV.UGA.EDU] On Behalf Of
> Terry Westover
> Sent: Wednesday, September 01, 2010 2:17 PM
> To: SPSSX-L@LISTSERV.UGA.EDU
> Subject: sorting out a nested data structure
>
> Hello SPSS users -
>
> We are working on a statewide study of subsidized childcare. We have a
> large data set with a nested structure - children nested within families.
> Each row (record) is an individual child. Each record contains a family ID
> and a variable identifying which agency is providing care for that
> individual child - each childcare agency assigns each family a unique (to
> that agency) identifier but there is no statewide system of issuing a unique
> family identifier. So, families may have children receiving services from
> more than one agency and thus have multiple family IDs and we need to be
> able to identify all programs/agencies that are serving each family. The
> family is our unit of analysis.
>
> We can, using concatenation and duplicate functions, identify duplicate
> children across agencies (e.g. with different family ids) - this unique
> child id is a string variable - but we still have a few problems to solve
> that I hope you can help with:
> 1. Because the dataset is so large, manually combing thru the
> duplicates to assign our own unique family or child identifiers is not
> practical.
> 2. Is there an "assign" function that will automate assigning unique
> ids to children using the string variable we have constructed?
> 3. Once we figure out how to assign unique child identifiers we are
> still faced with the problem of finding some automated way of grouping all
> the children within families so each family has a unique identifier,
> regardless of how many individual agencies/programs are providing services
> to the children w/i that family. Since the family is our unit of analysis
> this is critical. Any suggestions?
>
> I realize that most of the queries that come across this listserv are much
> more sophisticated and apologize for asking what is likely a simple question
> but it's one I can't seem to get my head around.
>
> Thanks for any assistance you can provide,
>
> Theresa (Terry) Westover, Director
> Center for Education and Evaluation Services
> CRESS, School of Education
> 1 Shields Ave.
> University of California, Davis 95616-8729
> 530-754-9523 office
> 530-752-6135 fax
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> LISTSERV@LISTSERV.UGA.EDU (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>