Date: Fri, 19 Dec 2008 14:41:58 -0600
Reply-To: Mary <mlhoward@avalon.net>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Mary <mlhoward@AVALON.NET>
Subject: Re: Aggregate and individual-level data analysis
I'm not an expert, but I'll give this the old college try...
I don't think you can say that people who buy food are obese, because you don't
know the actual weights of the people who went to the football game.
I suppose you might hypothesize that the proportion of obesity in someone's
zip code affects whether people from that zip code buy food at the football game.
Do all zip codes have the same population? I would guess that they do not,
so convert your aggregate counts into rates (for example, 0.10, meaning
10 percent of the zip code's total population is obese).
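A minimal sketch of that conversion, assuming table2 has one row per zip
code and a hypothetical variable total_pop holding each zip code's
population. Dataset2 as described does not contain such a variable, so it
would have to come from somewhere else; overwt and obese are the count
columns shown in the original message.

```sas
/* Sketch: turn zip-code counts into proportions.                */
/* ASSUMPTION: total_pop is a hypothetical variable holding the  */
/* population of each zip code; it is not in Dataset2 as posted. */
data table2;
   set table2;
   /* A missing or zero total_pop leaves both rates missing */
   if total_pop > 0 then do;
      overweight_rate_in_zip_code = overwt / total_pop; /* 0.10 = 10 percent */
      obese_rate_in_zip_code      = obese  / total_pop;
   end;
run;
```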
Then you could join dataset 1 and 2:
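proc sql;
   /* Attach each zip code's rate to every person from that zip code */
   create table newtable as
   select table1.*,
          table2.overweight_rate_in_zip_code
   from table1
        left outer join table2
        on table1.zipcode = table2.zipcode;
quit;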
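Then, provided your data also include people who did not buy food (as you
describe Dataset1, everyone in it did), you could run a logistic regression
to predict whether people bought food or not: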
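proc logistic data=newtable;
   /* bought_food is assumed to be coded 1 = bought, 0 = did not;   */
   /* DESC makes the model predict the probability that it equals 1 */
   model bought_food(DESC) = overweight_rate_in_zip_code;
run;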
But note that even if you do this, you are not predicting whether
overweight *people* buy food, only whether the rate of obesity
in the area where people live affects whether they buy food.
-Mary
----- Original Message -----
From: Rieza Soelaeman
To: SAS-L@LISTSERV.UGA.EDU
Sent: Friday, December 19, 2008 1:12 PM
Subject: Aggregate and individual-level data analysis
Dear SAS-Lers,
Yet another question from me.
Suppose I have 2 datasets:
1. Dataset1--Contains individual-level data on who bought food at a
concession stand during a football game
2. Dataset2--Contains aggregate data on prevalence of obesity (BMI >= 30)
and overweight (BMI >= 25) by zip code
Dataset1 looks roughly like this:
name    zip code
John    78530
Jane    78531
Angie   78532
Eileen  78530
Tim     78530
Bob     78532
et cetera. Let's say there are 3000 people in this dataset, and all of them
bought food.
Dataset2 looks roughly like this:
zip code  overwt  obese
78530     500     200
78531     600     500
78532     100     50
Suppose I wanted to know whether there was a correlation between buying food
and obesity; what procedure could I run? Note that overweight and obese
are BMI classifications, so Dataset2 really represents data from 1950
respondents. I have a feeling that I need to disaggregate Dataset2: I was
kicking myself when I tried to turn Dataset1 into an aggregate dataset, and
found it impossible (and stupid) to try to plot the data...
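For reference, a minimal SAS sketch of collapsing Dataset1 to one row per
zip code (the number of food buyers) and placing Dataset2's counts next to
it, using the six example rows above. The dataset and variable names
(dataset1, dataset2, name, zipcode, overwt, obese, n_buyers) are
assumptions. Anything computed from this table describes zip codes, not
individual buyers.

```sas
/* Read the example rows (names are assumptions; zip codes are */
/* read as character values so leading zeros would survive).   */
data dataset1;
   input name $ zipcode $;
   datalines;
John 78530
Jane 78531
Angie 78532
Eileen 78530
Tim 78530
Bob 78532
;
run;

data dataset2;
   input zipcode $ overwt obese;
   datalines;
78530 500 200
78531 600 500
78532 100 50
;
run;

/* One row per zip code: count of buyers plus that zip code's */
/* overweight and obese counts from Dataset2.                 */
proc sql;
   create table buyers_by_zip as
   select a.zipcode,
          count(*) as n_buyers,
          b.overwt,
          b.obese
   from dataset1 as a
        left join dataset2 as b
        on a.zipcode = b.zipcode
   group by a.zipcode, b.overwt, b.obese
   order by a.zipcode;
quit;

proc print data=buyers_by_zip noobs;
run;
```

With only three example zip codes this just shows the mechanics; the same
steps would apply unchanged to the full file.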
As always, I welcome and appreciate any suggestions on how to tackle this.
--
Rieza H Soelaeman, MPH