LISTSERV at the University of Georgia
Date:         Wed, 5 Mar 2008 11:43:23 -0500
Reply-To:     Robert Feyerharm <robertf@HEALTH.OK.GOV>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         Robert Feyerharm <robertf@HEALTH.OK.GOV>
Subject:      interpolating cumulative probabilities to categorical data

One of the public health surveys I'm currently analyzing collects information about the respondent's household income. The respondent is asked which income category her household income falls into:

Less than $10,000, $10,000 to $14,999, $15,000 to $19,999, $20,000 to $24,999, $25,000 to $34,999, $35,000 to $49,999, or $50,000 or more.

I've been asked to calculate the % of mothers from various household sizes who fall below 185% of the Federal Poverty Level (FPL) for 2006. Of course the FPL incomes don't match the income categories in my public health survey. What are the best techniques for interpolating %s between income categories?

I've thought of two approaches to this problem:

1) Calculate a simple linear interpolation without resorting to statistical models. For example, if 50% of single moms have incomes <$10,000 and 70% have incomes <$15,000 according to the survey data, then calculate the % of mothers with incomes <$12,500 as 50%+(70%-50%)*($12,500-$10,000)/($15,000-$10,000)=60%. This method can be applied to all income categories except $50,000+.

2) Perform a logarithmic transformation y = -log10(1 - F(p)) on the cumulative probabilities for each income category in order to fit a least squares regression line to the 6 data points. Then interpolate values of F(p) for any value of x=income by evaluating the regression formula y = a + b*x and then "detransforming" to find F(p)=1-10^-y. For large samples I've achieved a good fit, indicating that the income distribution can be described by an exponential density function. The advantage of this method over method 1 is that I can obtain 95% CIs for the mean predicted value to gauge the precision of my estimate. The drawback is that for small samples (n<30) the model doesn't achieve a significant fit.
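
For concreteness, here is a minimal sketch of both approaches, written in Python rather than SAS purely for illustration. The function names are hypothetical, and the cumulative percentages beyond the 50% and 70% quoted above are made-up placeholders, not survey results. The sketch also assumes the cumulative % is 0 at an income of $0, which the survey itself does not tell us, and it does not compute the confidence intervals mentioned in approach 2.

```python
import math

# Upper bounds of the closed income categories; the $50,000+ category is open-ended.
BOUNDS = [10000, 15000, 20000, 25000, 35000, 50000]

def interpolate_linear(income, bounds, cum_pct):
    """Approach 1: linear interpolation of the cumulative % below `income`.

    cum_pct[i] is the cumulative % of respondents with income below bounds[i].
    Assumes (hypothetically) that the cumulative % is 0 at income 0.
    """
    if income <= 0:
        return 0.0
    if income > bounds[-1]:
        raise ValueError("cannot interpolate inside the open-ended top category")
    lo_x, lo_p = 0.0, 0.0
    for hi_x, hi_p in zip(bounds, cum_pct):
        if income <= hi_x:
            return lo_p + (hi_p - lo_p) * (income - lo_x) / (hi_x - lo_x)
        lo_x, lo_p = hi_x, hi_p

def fit_log_line(bounds, cum_pct):
    """Approach 2: least squares fit of y = a + b*x, where y = -log10(1 - F)."""
    ys = [-math.log10(1.0 - p / 100.0) for p in cum_pct]
    n = len(bounds)
    mean_x = sum(bounds) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in bounds)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(bounds, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def predict_pct(income, a, b):
    """'Detransform' the fitted line: F = 1 - 10^-(a + b*x), returned as a %."""
    return 100.0 * (1.0 - 10.0 ** (-(a + b * income)))

# Only 50.0 and 70.0 come from the worked example; the rest are placeholders.
example_cum_pct = [50.0, 70.0, 80.0, 86.0, 93.0, 97.0]

print(interpolate_linear(12500, BOUNDS, example_cum_pct))  # 60.0, as in approach 1
a, b = fit_log_line(BOUNDS, example_cum_pct)
print(predict_pct(12500, a, b))
```

To use either function for the 185% FPL question, `income` would be replaced by the 185% FPL threshold for the household size of interest, with `example_cum_pct` replaced by the survey's cumulative percentages for that household size.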

I welcome any comments/suggestions for tackling this problem.


