LISTSERV at the University of Georgia
Menubar Imagemap
Home Browse Manage Request Manuals Register
Previous messageNext messagePrevious in topicNext in topicPrevious by same authorNext by same authorPrevious page (December 2008, week 3)Back to main SAS-L pageJoin or leave SAS-L (or change settings)ReplyPost a new messageSearchProportional fontNon-proportional font
Date:         Fri, 19 Dec 2008 17:05:49 -0600
Reply-To:     Mary <mlhoward@avalon.net>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Mary <mlhoward@AVALON.NET>
Subject:      Re: Aggregate and individual-level data analysis
Comments: To: Rieza Soelaeman <rsoelaeman@GMAIL.COM>
Content-Type: text/plain; charset="iso-8859-1"

Actually, this is a bit better.

First what you'd need is the population of the zip codes that are in your disease surveillance dataset. If it isn't too many, then try http://www.city-data.com/ and it will give you the population, or there are other databases that you can obtain this from (I would think that they would originally come from the U.S. Gov. Census).

Then you've got this disease surveillance dataset, which is by individual (or could be even lower data, such as individual claims). If it is lower aggregate data than the individual level, I think you would want to aggregate it to one record per individual with that person's zipcode.

Then once you get it to that point you could count the number of ID's within zip code:

Now to get everybody that doesn't have the disease. Let's start with the population counts:

Then we can get rates per population.

Now once we've got that, we can turn to the other data set, and merge in your counts of child abuse.

Then merge that in with your population data on the diseases.

Then you'll have a table with the disease counts and abuse counts, along with their rates, in one table. Perhaps you can do a correlation on the two rates or other analysis at that point.

data detail; infile cards missover; input id zip_code; cards; 001 52402 001 52402 002 52402 002 52242 ; run;

proc sql; create table individual_level as select distinct id, zip_code from detail order by zip_code, id; quit;

proc sql; create table disease_counts as select '1' as disease_status, zip_code, count(id) as disease_count from individual_level group by zip_code order by zip_code; quit;

data pop_counts; infile cards missover; input zip_code population; cards; 52402 39912 52244 42212 52241 50000 52242 49000 ; run;

proc sort data=disease_counts; by zip_code; run;

proc sort data=pop_counts; by zip_code; run;

proc sql; create table disease_rates as select pop_counts.zip_code, pop_counts.population, disease_counts.disease_count, case when disease_counts.disease_count=. then 0 else disease_count/(population/10000) end as disease_rate format 7.2 from pop_counts left outer join disease_counts on pop_counts.zip_code=disease_counts.zip_code; quit;

data child_abuse_counts; infile cards missover; input zip_code abuse_cases; cards; 52402 1 52244 0 52241 2 52242 1 ; run;

proc sql; create table disease_rates_with_child as select disease_rates.zip_code, disease_rates.population, disease_rates.disease_count, child_abuse_counts.abuse_cases, disease_rates.disease_rate, case when child_abuse_counts.abuse_cases=. then 0 else child_abuse_counts.abuse_cases/(population/10000) end as abuse_rate format 7.2 from disease_rates left outer join child_abuse_counts on disease_rates.zip_code=child_abuse_counts.zip_code; quit;

proc corr data=disease_rates_with_child; var disease_rate abuse_rate; run;

-Mary

----- Original Message ----- From: Rieza Soelaeman To: SAS-L@LISTSERV.UGA.EDU Sent: Friday, December 19, 2008 3:44 PM Subject: Re: Aggregate and individual-level data analysis

Ok, so that was a silly little example. I apologize that it's not as good as my other silly little examples :(. I guess I just wanted to create an example that is similar enough to my situation, which involves confidential data. So I have this disease surveillance dataset, in which everyone in this dataset has the disease (this is what Dataset1 was alluding to). A colleague has asked me to run some analyses to see if there is a correlation between the prevalence of the disease with child abuse. Now the data on child abuse she gave me was Child Protective Services (CPS) data, which is aggregate data. It's a 1-pager Excel spreadsheet with the zipcodes as rows and as columns, the type of CPS intervention (5 categories, this is where I came up with the idea of BMI classification). Essentially, it's a proc freq of zipcode*(category of CPS intervention). That is all I have to work with. Like I said, I do have individual level data from the disease surveillance system, I just have very limited data to compare it against.

I hope this clarifies things a bit. I've forgotten everything I learned about aggregate data analysis since I left school 6 months ago :(.

Thanks guys!

On Fri, Dec 19, 2008 at 4:13 PM, Mary <mlhoward@avalon.net> wrote:

> Yes, Peter is right, you would need to have information on the people > at the football game who did not buy food (it is only food at the football > game, Peter, not food altogether--presumably this does not include > the food at tailgating and so forth before or after the game). You'd > have to have the zip codes of the people who didn't buy food in > order to compare those who bought food versus not. > > -Mary > ----- Original Message ----- > From: Peter Flom > To: SAS-L@LISTSERV.UGA.EDU > Sent: Friday, December 19, 2008 2:52 PM > Subject: Re: Aggregate and individual-level data analysis > > > Rieza wrote > > > Yet another question from me. > > > > Suppose I have 2 datasets: > > > > 1. Dataset1--Contains individual-level data on who bought food at a > > concession stand during a football game > > 2. Dataset2--Contains aggregate data on prevalence of obesity (bmi > >=30) > > and overweight (bmi >=25) by zip code > > > > > > Dataset1 looks roughly like this: > > name zip code John 78530 Jane 78531 Angie 78532 Eileen 78530 Tim > 78530 > > Bob 78532 > > > > et cetera...let's say there are 3000 people in this dataset, all of > these > > people bought food. > > > > > > Dataset2 looks roughly like this: > > zip code overwt obese 78530 500 200 78531 600 500 78532 100 50 > > > > Supposing I wanted to know if there was a correlation between buying > food > > and obesity, what procedure can I run? Notice that overweight and > obese > > are BMI classifications, so really, Dataset2 represents data from 1950 > > respondents. I get a feeling that I need to disaggregate Dataset2, > because > > I was kicking myself in the head when I tried to turn Dataset1 into an > > aggregate dataset, and finding it impossible (and stupid) to try to > plot the > > data... > > > > As always, I welcome and appreciate any suggestions on how to tackle > this. > > > I don't think there is much you can do here .... because everyone in > dataset 1 bought food. > > But maybe .... > > Are the people in data set 1 a random sample from each ZIP code? > Do you know the total population of each ZIP code? > Do you know what percentage of people buy food? (I'm thinking it's close > to everyone, one way or another .... How does someone never buy food?) > > Peter > > Peter L. Flom, PhD > Statistical Consultant > www DOT peterflom DOT com >

-- Rieza H Soelaeman, MPH


Back to: Top of message | Previous page | Main SAS-L page