Date: Fri, 20 Jun 2008 11:54:28 -0700
Reply-To: Ryan <Ryan.Andrew.Black@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ryan <Ryan.Andrew.Black@GMAIL.COM>
Subject: Re: GLIMMIX Question - Dependent Observations
Content-Type: text/plain; charset=ISO-8859-1
As usual, thank you.
On Jun 20, 1:27 pm, stringplaye...@yahoo.com (Dale McLerran) wrote:
> --- On Thu, 6/19/08, Ryan <Ryan.Andrew.Bl...@GMAIL.COM> wrote:
> > From: Ryan <Ryan.Andrew.Bl...@GMAIL.COM>
> > Subject: Re: GLIMMIX Question - Dependent Observations
> > To: SA...@LISTSERV.UGA.EDU
> > Date: Thursday, June 19, 2008, 7:29 PM
> > Thank you, Dale! You've helped me with so many questions
> > already. I
> > hope it's okay if I ask you two more...
> > 1. The dichotomous variable in my model was collected at the subjects
> > level (not city level), and the categories are not mutually exclusive--
> > there were people who fit into both categories. I'm not sure how to
> > handle this issue--one option I thought was to raise it to the city
> > level, and code the city as a particular category based on the higher
> > rate (by the way, DV (rate) and the continuous IV are functions of
> > data at the city level). So if the rate is higher in category one,
> > then that city is assigned category one. Would that work? Would you
> > recommend an alternative approach that can maintain the variable at
> > the city level?
> > 2. As mentioned above, the DV (rate) and the continuous IV in my model
> > are functions of aggregated data. After you mentioned that a city with
> > fewer observations would be weighted less, I realized that all cases
> > would actually have equal weights at the city level. Is there a way to
> > deal with unequal Ns per case while maintaining city as the unit of
> > analysis for all variables?
> > Anyway, I realize I've asked much of you. I completely understand if
> > you're too busy to respond. I appreciate your help. It's been a true
> > learning experience!
> > Ryan
> I'm confused now. I don't know how your dependent variable (collected
> at the individual level) can take on two values and those two values
> are not mutually exclusive. It sounds to me as though there are two
> boxes that the respondent can check off, and that there are no
> constraints that if they check box 1 then they cannot check box 2
> (and vice versa).
Yes! The DV is a rate of obtaining a category in the dichotomous IV
(when the dichotomous IV is raised to the city level).
Concretely, within each city, Rate = (# of people who contracted disease
A or B) / (total number of people at risk of the respective disease).
The dichotomous IV, which was collected at the subjects level,
reflects two diseases, and people can have one or both--most only have
one. I want to compare the relative risk of contracting disease A to
contracting disease B (Poisson type regression). As a result, I
thought of raising the dichotomous variable, disease type, to the city
level, and if more people have disease A than disease B that city
would be categorized as disease A--bad idea, I know.
If the dichotomous variable were in fact mutually exclusive, this
analysis would be fairly straightforward (after your help with spatial
analysis!). The primary goal is to run a statistical test comparing
the risk of contracting disease A to the risk of contracting disease
B, after controlling for a continuous variable. The challenge is that
any given participant could have contracted both diseases, and when you raise
it to the city level (which you have to do to obtain the rate),
certainly no city has a diagnosis of only one disease.
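
A minimal sketch of one way to set up that comparison without forcing each city into a single disease category: keep two rows per city, one per disease, and let the log of the number at risk enter as an offset so that cities with different Ns are handled through the denominator. The data set and variable names (city_counts, cases_a, risk_a, cases_b, risk_b, x) are assumptions for illustration only, and the spatial covariance structure is left out to keep the sketch short.

```sas
/* Assumed input: one row per city with counts and numbers at risk     */
/* for each disease, plus the continuous covariate x. All names here   */
/* are hypothetical.                                                   */
data city_long;
  set city_counts;
  length disease $1;
  disease = 'A'; cases = cases_a; log_risk = log(risk_a); output;
  disease = 'B'; cases = cases_b; log_risk = log(risk_b); output;
  keep city x disease cases log_risk;
run;

proc glimmix data=city_long;
  class city disease;
  /* Poisson rate model: log(at risk) is the offset */
  model cases = disease x / dist=poisson link=log offset=log_risk solution;
  /* Shared city intercept ties the two disease rows of a city together */
  random intercept / subject=city;
  /* Exponentiated contrast: rate ratio of disease A to disease B */
  estimate 'Rate ratio, A vs B' disease 1 -1 / exp cl;
run;
```

The exponentiated estimate is the adjusted rate ratio of A to B. Because people with both diseases are counted in both rows, the shared city intercept only partly accounts for the dependence between the two counts, so this is a starting point rather than a final model.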
I know I keep saying this, but just the fact that you've talked
through some of this stuff with me has been invaluable.
> To me, that would represent two (almost certainly correlated) binary
> responses. I would be looking at modeling the binary responses at the
> individual level with the person-specific IV as a predictor. At the
> same time, you can allow for variation across cities in the proportion
> who respond positively. In addition to allowing for the person-specific
> IV to relate directly to the person-specific response, this analysis
> preserves information about differences in number of subjects in
> the different cities. A city with only 10 respondents will have a
> city random effect estimate which has a much larger standard error
> than a city with 1000 respondents.
> If I am correct that there are two check boxes and hence two binary
> responses, then an appropriate model for check box 1 would be
> something like:
> model box1 = x / s dist=binary;
> random intercept / subject=city
> type=sp(pow)(lat long);
I'm not sure if this would answer my question regarding relative risk
of contracting one disease versus another.
> A similar model could be fit for check box 2 as a response. One could
> model check box 1 and check box 2 responses together as correlated
> within individuals. There may be quite a few ways that such an analysis
> could be constructed. It is not clear given the spatial covariance
> structure assumed for the city random effects along with correlated
> responses within individuals just what the appropriate code would be
> for such a model.
Yes. I think this is where I need to be headed.
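
One possible shape for such a joint model, offered only as a sketch: stack the two check-box responses so each person contributes two rows, and add a person-level random intercept nested within city. The names person_wide, id, box1, box2 and x are assumptions, and the spatial structure on the city effects is deliberately omitted, since how to combine it with the within-person correlation is exactly the open question above.

```sas
/* Assumed input: one row per person with id, city, x, box1, box2      */
/* (0/1 indicators for disease A and disease B). Names are hypothetical. */
data person_long;
  set person_wide;
  length disease $1;
  disease = 'A'; y = box1; output;
  disease = 'B'; y = box2; output;
  keep id city x disease y;
run;

proc glimmix data=person_long;
  class city id disease;
  model y(event='1') = disease x / dist=binary link=logit solution;
  random intercept / subject=city;       /* variation across cities          */
  random intercept / subject=id(city);   /* correlation within an individual */
  /* Exponentiated contrast: odds ratio of disease A to disease B */
  estimate 'Odds ratio, A vs B' disease 1 -1 / exp cl;
run;
```

With a logit link the contrast is an odds ratio rather than a relative risk, and with only two binary responses per person the person-level variance may be poorly estimated.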
> Statisticians have the habit of adding confusion to seemingly simple
> problems, don't we? Are you more or less confused than at the start
> of this dialogue?
This model is particularly confusing. Although I haven't finalized the
model, you have certainly moved me along tremendously!
> Dale McLerran
> Fred Hutchinson Cancer Research Center
> mailto: dmclerra@NO_SPAMfhcrc.org
> Ph: (206) 667-2926
> Fax: (206) 667-5977