Date: Wed, 5 Mar 2008 11:43:23 -0500
Reply-To: Robert Feyerharm <robertf@HEALTH.OK.GOV>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Robert Feyerharm <robertf@HEALTH.OK.GOV>
Subject: interpolating cumulative probabilities to categorical data
One of the public health surveys which I'm currently analyzing collects
information about the respondent's household income. The respondent is
asked which income category her household income falls in:
Less than $10,000,
$10,000 to $14,999,
$15,000 to $19,999,
$20,000 to $24,999,
$25,000 to $34,999,
$35,000 to $49,999, or
$50,000 or more.
I've been asked to calculate the % of mothers from various household sizes
who fall below 185% of the Federal Poverty Level (FPL) for 2006. Of course
the FPL incomes don't match the income categories in my public health
survey. What are the best techniques for interpolating %s between income
categories?
I've thought of two approaches to this problem:
1) Calculate a simple linear interpolation without resorting to
statistical models. For example, if 50% of single moms have incomes
<$10,000, and 70% have incomes <$15,000 according to the survey data, then
calculate the % of mothers with incomes <$12,500 by finding 50%+(70%-50%)*
($12,500-$10,000)/($15,000-$10,000)=60%. This method can be applied to all
income categories except the open-ended $50,000+ category.
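
To make the arithmetic in approach 1 concrete, here is a minimal sketch in
Python (not SAS). The function name interpolate_cum_pct and the two-point
input are hypothetical; the numbers are the illustrative 50% and 70% figures
from the example above, not survey results.

```python
def interpolate_cum_pct(income, bounds, cum_pct):
    """Linearly interpolate the cumulative % of respondents below `income`.

    bounds  : ascending income cut points (upper limits of the categories)
    cum_pct : cumulative % of respondents with income below each cut point
    """
    if income < bounds[0] or income > bounds[-1]:
        # No extrapolation, e.g. into the open-ended top category.
        raise ValueError("income lies outside the known cut points")
    for i in range(1, len(bounds)):
        lo, hi = bounds[i - 1], bounds[i]
        if income <= hi:
            share = (income - lo) / (hi - lo)
            return cum_pct[i - 1] + (cum_pct[i] - cum_pct[i - 1]) * share


# Illustrative figures from the example: 50% below $10,000, 70% below $15,000.
bounds = [10000, 15000]
cum_pct = [50.0, 70.0]
pct_below_12500 = interpolate_cum_pct(12500, bounds, cum_pct)
print(pct_below_12500)
```

The same function accepts the full list of cut points; the 185% FPL
threshold for each household size would be passed as `income`.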
2) Apply a logarithmic transformation y = -log10(1 - F(p)) to the
cumulative probabilities for each income category, then fit a least
squares regression line y = a + b*x to the 6 data points, where x=income.
To interpolate, evaluate the fitted line at the income of interest and
back-transform to find F(p)=1-10^-y. For large samples I've
achieved a good fit, suggesting that the income distribution can be
described by an exponential density function. The advantage of this method
over method 1 is that I can obtain 95% CIs for the mean to gauge the
precision of my estimate. The drawback is that for small samples (n<30)
the model doesn't achieve a significant fit.
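
The transform, fit, and back-transform steps of approach 2 can be sketched
as follows, again in Python rather than SAS. The function names are
hypothetical, and the cumulative proportions are synthetic values generated
from an exact exponential curve purely for illustration; they are not
survey data. The sketch assumes a base-10 log (to match the 10^-y
back-transform) and does not compute confidence intervals.

```python
import math


def fit_log_line(incomes, cum_props):
    """Ordinary least squares fit of y = a + b*x, with y = -log10(1 - F)."""
    ys = [-math.log10(1.0 - f) for f in cum_props]
    n = len(incomes)
    mean_x = sum(incomes) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in incomes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(incomes, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_cum_prop(income, a, b):
    """Back-transform the fitted line: F = 1 - 10^-(a + b*income)."""
    return 1.0 - 10.0 ** (-(a + b * income))


# Upper limits of the six bounded income categories.
upper_bounds = [10000, 15000, 20000, 25000, 35000, 50000]
# SYNTHETIC cumulative proportions (assumption, for illustration only).
cum_props = [1.0 - 10.0 ** (-2e-5 * x) for x in upper_bounds]

a, b = fit_log_line(upper_bounds, cum_props)
estimate = predict_cum_prop(12500, a, b)
print(a, b, estimate)
```

With real survey proportions the points will not fall exactly on a line, so
the quality of the fit should be checked before trusting the interpolated
value.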
I welcome any comments/suggestions for tackling this problem.
Thanks!
Robert