Date: Sat, 21 Jun 2008 13:05:17 -0700
Reply-To: Ryan <Ryan.Andrew.Black@GMAIL.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Ryan <Ryan.Andrew.Black@GMAIL.COM>
Organization: http://groups.google.com
Subject: Re: GLIMMIX Question - Dependent Observations
Content-Type: text/plain; charset=ISO-8859-1
On Jun 20, 2:54 pm, Ryan <Ryan.Andrew.Bl...@gmail.com> wrote:
> As usual, thank you.
>
> On Jun 20, 1:27 pm, stringplaye...@yahoo.com (Dale McLerran) wrote:
>
> > --- On Thu, 6/19/08, Ryan <Ryan.Andrew.Bl...@GMAIL.COM> wrote:
>
> > > From: Ryan <Ryan.Andrew.Bl...@GMAIL.COM>
> > > Subject: Re: GLIMMIX Question - Dependent Observations
> > > To: SA...@LISTSERV.UGA.EDU
> > > Date: Thursday, June 19, 2008, 7:29 PM
> > > Thank you, Dale! You've helped me with so many questions already.
> > > I hope it's okay if I ask you two more...
>
> > > 1. The dichotomous variable in my model was collected at the subjects
> > > level (not city level), and the categories are not mutually exclusive--
> > > there were people who fit into both categories. I'm not sure how to
> > > handle this issue--one option I thought was to raise it to the city
> > > level, and code the city as a particular category based on the higher
> > > rate (by the way, DV (rate) and the continuous IV are functions of
> > > data at the city level). So if the rate is higher in category one,
> > > then that city is assigned category one. Would that work? Would you
> > > recommend an alternative approach that can maintain the variable at
> > > the city level?
>
> > > 2. As mentioned above, the DV (rate) and the continuous IV in my model
> > > are functions of aggregated data. After you mentioned that a city with
> > > fewer observations would be weighted less, I realized that all cases
> > > would actually have equal weights at the city level. Is there a way to
> > > deal with unequal Ns per case while maintaining city as the unit of
> > > analysis for all variables?
>
> > > Anyway, I realize I've asked much of you. I completely understand if
> > > you're too busy to respond. I appreciate your help. It's been a true
> > > learning experience!
>
> > > Ryan
>
> > Ryan,
>
> > I'm confused now. I don't know how your dependent variable (collected
> > at the individual level) can take on two values and those two values
> > are not mutually exclusive. It sounds to me as though there are two
> > boxes that the respondent can check off, and that there are no
> > constraints that if they check box 1 then they cannot check box 2
> > (and vice versa).
>
> Yes! The DV is a rate of obtaining a category in the dichotomous IV
> (when the dichotomous IV is raised to the city level).
>
> Concretely, Rate = # of people who contracted disease A or B / total
> number of people at risk of the respective disease within a city.
>
> The dichotomous IV, which was collected at the subjects level,
> reflects two diseases, and people can have one or both--most only have
> one. I want to compare the relative risk of contracting disease A to
> contracting disease B (Poisson type regression). As a result, I
> thought of raising the dichotomous variable, disease type, to the city
> level, and if more people have disease A than disease B that city
> would be categorized as disease A--bad idea, I know.
>
> If the dichotomous variable were in fact mutually exclusive, this
> analysis would be fairly straightforward (after your help with spatial
> analysis!). The primary goal is to run a statistical test comparing
> the risk of contracting disease A to the risk of contracting disease
> B, after controlling for a continuous variable. The challenge is that
> participant A could have contracted both diseases, and when you raise
> it to the city level (which you have to do to obtain the rate),
> certainly no city has a diagnosis of only one disease.
>
> I know I keep saying this, but just the fact that you've talked
> through some of this stuff with me has been invaluable.
>
> > To me, that would represent two (almost certainly correlated) binary
> > responses. I would be looking at modeling the binary responses at the
> > individual level with the person-specific IV as a predictor. At the
> > same time, you can allow for variation across cities in the proportion
> > who respond positively. In addition to allowing for the person-specific
> > IV to relate directly to the person-specific response, this analysis
> > preserves information about differences in number of subjects in
> > the different cities. A city with only 10 respondents will have a
> > city random effect estimate which has a much larger standard error
> > than a city with 1000 respondents.
>
> > If I am correct that there are two check boxes and hence two binary
> > responses, then an appropriate model for check box 1 would be
> > something like:
>
> > proc glimmix data=mydata;
> >   model box1 = x / s dist=binary;
> >   random intercept / subject=city
> >                      type=sp(pow)(lat long)
> >                      group=region;
> > run;
>
> I'm not sure if this would answer my question regarding relative risk
> of contracting one disease versus another.
>
> > A similar model could be fit for check box 2 as a response. One could
> > model check box 1 and check box 2 responses together as correlated
> > within individuals. There may be quite a few ways that such an analysis
> > could be constructed. It is not clear given the spatial covariance
> > structure assumed for the city random effects along with correlated
> > responses within individuals just what the appropriate code would be
> > for such a model.
>
> Yes. I think this is where I need to be headed.
>
> > Statisticians have the habit of adding confusion to seemingly simple
> > problems, don't we? Are you more or less confused than at the start
> > of this dialogue?
>
> This model is particularly confusing. Although I haven't finalized the
> model, you have certainly moved me along tremendously!
>
> > Dale
>
> > ---------------------------------------
> > Dale McLerran
> > Fred Hutchinson Cancer Research Center
> > mailto: dmclerra@NO_SPAMfhcrc.org
> > Ph: (20...
> > Fax: (206) 667-5977
> > ---------------------------------------
Sorry for the double post, but I think I've solved the problem (at
least one way), and wanted to share it.
I will run GLIMMIX with a repeated measures factor (disease) and a
covariate, and every variable in the model will be at the city level.
Each city will have the rate for disease A and the rate for disease
B.
Dataset

City  Disease  Rate  Covariate  Lat  Long
1     1        .04   23
1     2        .07   34
2     1        .45   23
2     2        .01   22
3     1        .02   45
3     2        .36   11
.
.
.
where,
-->"1" reflects disease A and "2" reflects disease B under "Disease"
-->values under "Rate" reflect the rate for that disease for that
particular city, which is the DV
-->values under "Covariate" are continuous and will adjust for disease
rate per city
-->values under lat and long will be in degrees and will be based on
the centroid of each city
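The six rows listed above could be read into SAS along these lines. This is only a minimal sketch: the dataset name cityrates is hypothetical, and Lat and Long are left out because no values are listed for them.

```sas
/* Long layout: one row per city-by-disease combination.        */
/* "cityrates" is a hypothetical dataset name.                   */
/* Lat and Long are not read here because the post gives no      */
/* values for them; they would be added as two more columns.     */
data cityrates;
   input city disease rate covariate;
   datalines;
1 1 .04 23
1 2 .07 34
2 1 .45 23
2 2 .01 22
3 1 .02 45
3 2 .36 11
;
run;
```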
I'll also include a spatial covariance structure to account for
correlations among cities:
type=sp(pow)(lat long)
If the disease effect is significant, this will answer my research
question of whether or not there is a significant difference in rates
between disease A and B, after controlling for the covariate.
I realize cities are being weighted equally in this model. At some
point, I may consider an adjustment based on the number of
observations per city.
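One common way to let the number of people at risk enter the model is to analyze the counts directly with a log offset instead of the precomputed rate. The following is a sketch under stated assumptions, not a vetted model: the dataset citycounts and the variables cases (number who contracted the disease) and atrisk (number at risk of that disease in that city) are hypothetical names, and the spatial term is left out to keep the example short.

```sas
/* Hypothetical input: one row per city and disease, holding     */
/* cases, atrisk, and covariate.                                  */
data citycounts_log;
   set citycounts;
   log_atrisk = log(atrisk);   /* offset on the log scale */
run;

proc glimmix data=citycounts_log;
   class city disease;
   /* Poisson model for counts; the offset turns it into a model */
   /* for the rate cases/atrisk, so cities with more people at   */
   /* risk carry more information.                                */
   model cases = disease covariate
         / dist=poisson link=log offset=log_atrisk solution;
   /* City-level random intercept shared by both diseases. */
   random intercept / subject=city;
run;
```

Under this parameterization, the exponentiated disease coefficient would be read as the rate ratio of one disease to the other, adjusted for the covariate.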
I'm not sure how the syntax will look, but I'll get to research my
books/online guides on Monday.
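One possible form of the syntax, offered as a minimal sketch rather than a checked model. It assumes the long layout described above is stored in a dataset named cityrates (a hypothetical name) with Lat and Long filled in, treats the rate as approximately normal, uses a city-level random intercept to tie together the two disease rates from the same city, and applies the spatial power structure among cities separately within each disease. Other specifications are certainly possible.

```sas
proc glimmix data=cityrates;
   class city disease;
   /* Fixed effects: disease (A vs. B) and the continuous covariate. */
   /* No DIST= option is given, so the normal distribution is used.  */
   model rate = disease covariate / solution;
   /* G-side: random intercept per city, inducing correlation        */
   /* between the two disease rates measured in the same city.       */
   random intercept / subject=city;
   /* R-side: spatial power correlation among cities, estimated      */
   /* within each disease, based on distance in (lat, long).         */
   random _residual_ / subject=disease type=sp(pow)(lat long);
run;
```

The test of the disease effect in the Type III table would then address the research question. Note that sp(pow) computes straight-line distance from the coordinates as supplied, so with raw degrees the distances are only approximate.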
Thanks again to everyone, and particularly Dale, for guidance.
Ryan