LISTSERV at the University of Georgia
Date:         Thu, 28 Aug 2008 13:51:50 -0400
Reply-To:     sudip chatterjee <sudip.memphis@GMAIL.COM>
Sender:       "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:         sudip chatterjee <sudip.memphis@GMAIL.COM>
Subject:      Re: Trend Test
Comments: To: stringplayer_2@yahoo.com
In-Reply-To:  <752779.69335.qm@web32204.mail.mud.yahoo.com>
Content-Type: text/plain; charset=ISO-8859-1

Dale, thank you.

Let me start "bottom up".

As you mentioned:

" Alternatively, since the infection counts are large and since the Poisson converges to a normal distribution for large expectation, you could compute a variable RATE = COUNT / (TARGET_POPULATION) and then use RATE as the response variable. There would be no need for the offset parameter and the distribution of the response would be assumed normal. You can plot these rates by year for each city and your audience will immediately see a trend in rates as well as city to city differences in rates."

I did this initially: I have RATE = (Count / city_pop) * 1000, i.e. cases per 1,000 population.
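For reference, the rate computation described above could be written as a data step along these lines. This is only a sketch: the dataset name mydata_rate and the variable name city_pop are assumptions (only mydata, city, year and people_infectd appear in the thread), and the plot is one possible way to draw the rates by year for each city.

```sas
/* Sketch only: mydata_rate and city_pop are assumed names. */
data mydata_rate;
  set mydata;
  /* cases per 1,000 population; left missing when population is missing or zero */
  if city_pop > 0 then rate = (people_infectd / city_pop) * 1000;
  else rate = .;
run;

/* one line per city, rate against year */
proc sgplot data=mydata_rate;
  series x=year y=rate / group=city;
run;
```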

The problem with the counts is that in some years some cities either did not report, or their case counts really were very low or very high. The population has increased in all cities, but only slightly, whereas the disease counts in some cities have either increased fivefold or decreased fivefold (the reasons for this are innumerable).

So I thought I should work with the counts as actually reported rather than create a Rate variable, which I think somehow does not capture the right process. Hence I decided to go for this model.

" Apart from differences in population, are there differences between cities in the infection rates? And if you believe that the answer to that question is yes, then as a follow-up, I would ask whether you are interested in trend in infection status only in these 5 cities, or do these cities represent a larger population of cities that you would want your inferences extended to? If there are (or could be) differences in infection rates across cities, then you need to include city in your model. Assuming that these 5 cities are not the only cities of interest and that you want inference about trend to extend to other cities, then city should enter the model as a random effect. If you are only interested in the 5 cities that appear in your data, then city should enter the model as a fixed effect."

Yes, there are sometimes huge differences in both counts and rates between cities, and sometimes within cities. I have more than 50 cities in the dataset.

" Do the city populations increase over the 21 years of data collection? If the populations increase over this time frame, then wouldn't the counts increase just because of the population increase? OK, that is two questions right there, but they are really part and parcel of the same fundamental problem that interest probably lies not in the absolute counts (which should increase over time), but in the rate of occurrence. That is, the fundamental question is probably not whether there is an increase in the number of infections over that time frame, but rather whether the infection rate has increased."

Yes, the city populations increased, but only modestly. The problem is that the disease count in some years in some cities has increased or decreased a lot.

My aim is to identify the cities which have had either an increase or a decrease in disease count. The only information I have at the city level is population.
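One possible way to express this aim in GLIMMIX, offered only as a sketch and not as something proposed in the thread, is to let each city have its own trend through a random slope on year. The names mydata_trend, city_pop, year_c and log_pop are assumptions, and centering year at 1990 is an illustrative choice to keep the intercept interpretable.

```sas
/* Sketch only: mydata_trend, city_pop, year_c and log_pop are assumed names. */
data mydata_trend;
  set mydata;
  year_c = year - 1990;                          /* center year near mid-series */
  if city_pop > 0 then log_pop = log(city_pop);  /* offset for the rate model   */
run;

proc glimmix data=mydata_trend;
  class city;
  model people_infectd = year_c / dist=poisson link=log
                                  offset=log_pop solution;
  /* city-specific intercepts and slopes; SOLUTION prints the per-city estimates */
  random intercept year_c / subject=city type=un solution;
run;
```

Under this sketch, a city's estimated trend on the log-rate scale would be the fixed year_c estimate plus that city's random slope estimate.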

I forgot to put the offset in my model, but you showed me a new way of using log population for each city-by-year combination as the offset (thank you!).

The problem is with the disease reporting, so I think that creating a Rate variable won't capture the actual process going on, but if I run the test on the actual counts, maybe it will.

I would appreciate feedback on this thought as well.

Regards

On Thu, Aug 28, 2008 at 1:26 PM, Dale McLerran <stringplayer_2@yahoo.com> wrote:
> --- On Thu, 8/28/08, sudip chatterjee <sudip.memphis@GMAIL.COM> wrote:
>
>> From: sudip chatterjee <sudip.memphis@GMAIL.COM>
>> Subject: Trend Test
>> To: SAS-L@LISTSERV.UGA.EDU
>> Date: Thursday, August 28, 2008, 8:59 AM
>> Dear All,
>>
>> In my dataset I have the counts or number of people infected with
>> disease A (only A). I have information of cities from year 1980 -
>> 2000. What I want is to do a simple trend test : My concern is that
>> can I do this test with counts ?
>>
>> My data looks like
>>
>> City   year   People_infectd
>> A      1980   120
>> A      1981   122
>> A      1982   133
>> ...
>> ....
>> A      2000   500
>> ....
>>
>> E      1981   250
>> ...
>> ....
>> ....
>>
>> E      2000   700
>>
>>
>> Should I use GLIMMIX for this kind of analysis ?
>>
>>
>> Like :
>>
>> Proc Glimmix data = mydata ;
>>   nloptions tech = nrridg ;
>>   class city ;
>>   people_infectd = year / link = log dist = poisson solution ;
>>   random _residual_ / subject = city type = ar(1);
>> run ;
>>
>> Need some feedback ???
>>
>> Regards
>
> Sudip,
>
> Yes, the GLIMMIX procedure is appropriate for this analysis. But
> I am not sure that your code is appropriate. Let me ask you a
> couple of questions that might shed light on the appropriateness
> of the code you present.
>
> Question 1: Do the city populations increase over the 21 years of
> data collection? If the populations increase over this time frame,
> then wouldn't the counts increase just because of the population
> increase? OK, that is two questions right there, but they are
> really part and parcel of the same fundamental problem that interest
> probably lies not in the absolute counts (which should increase over
> time), but in the rate of occurrence. That is, the fundamental
> question is probably not whether there is an increase in the number
> of infections over that time frame, but rather whether the infection
> rate has increased.
>
> In order to address whether the infection rate has increased, you
> need to include log(city population in year i) as an offset parameter
> in your model. Actually, if the infections are expected in just a
> particular population demographic, then you would ideally use the
> population of that demographic in each city in each year. Do you
> have or can you get such information?
>
> Question 2: Apart from differences in population, are there
> differences between cities in the infection rates? And if you
> believe that the answer to that question is yes, then as a follow-up,
> I would ask whether you are interested in trend in infection status
> only in these 5 cities, or do these cities represent a larger
> population of cities that you would want your inferences extended
> to? (OK, now I am up to four questions, but who's counting except
> me? Whoops, 5 questions!)
>
> If there are (or could be) differences in infection rates across
> cities, then you need to include city in your model. Assuming that
> these 5 cities are not the only cities of interest and that you want
> inference about trend to extend to other cities, then city should
> enter the model as a random effect. If you are only interested in
> the 5 cities that appear in your data, then city should enter the
> model as a fixed effect.
>
> Assuming that these cities are only representative of some larger
> population of cities and that city population (for some target
> demographic) is available in each year, then appropriate code
> would be:
>
> Proc Glimmix data=mydata ;
>   nloptions tech=nrridg ;
>   class city ;
>   model people_infectd = year / offset=log_pop_cityXyear
>                                 dist=poisson
>                                 solution ;
>   random intercept / subject=city;
>   random _residual_ / subject=city type=ar(1);
> run ;
>
>
> where log_pop_cityXyear=log(target_population) in a particular city
> by year combination is computed in a data step prior to invoking
> the GLIMMIX procedure.
>
> Alternatively, since the infection counts are large and since the
> Poisson converges to a normal distribution for large expectation,
> you could compute a variable RATE = COUNT / (TARGET_POPULATION)
> and then use RATE as the response variable. There would be no need
> for the offset parameter and the distribution of the response would
> be assumed normal. You can plot these rates by year for each city
> and your audience will immediately see a trend in rates as well as
> city to city differences in rates.
>
> HTH,
>
> Dale
>
> ---------------------------------------
> Dale McLerran
> Fred Hutchinson Cancer Research Center
> mailto: dmclerra@NO_SPAMfhcrc.org
> Ph: (206) 667-2926
> Fax: (206) 667-5977
> ---------------------------------------
>
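
The quoted reply says the offset variable is computed in a data step before calling GLIMMIX, but does not show that step. A minimal sketch, assuming the population variable is named target_population as in the quoted text and that the output dataset is called mydata_offset (a made-up name), might look like this:

```sas
/* Sketch only: mydata_offset is an assumed dataset name. */
data mydata_offset;
  set mydata;
  /* log of the target population for each city-by-year row;   */
  /* left missing when the population is missing or not positive */
  if target_population > 0 then
    log_pop_cityXyear = log(target_population);
run;
```

The GLIMMIX call in the quoted reply would then read data=mydata_offset instead of data=mydata.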

