```
Date: Sun, 29 Jul 2001 20:07:57 -0300
Reply-To: hmaletta@fibertel.com.ar
Sender: "SPSSX(r) Discussion"
From: Hector Maletta
Subject: Re: Christian RE: Survey analysis
Content-Type: text/plain; charset=us-ascii

Christian Bautista wrote:
> a) I have 309 houses in a community.
> b) We have collected history information about the cases of malaria (it
> is malaria, not TB) that occurred from 1996 to 2000 in these houses.
> That means that for each house I know how many people lived in this house
> during a specific month and how many of these people were malaria
> positive.
> c) Now, I have a record per house, see this example below:
> Month  IDhouse  No.people  No.positive
> Jan96  1001     6          2
> Feb96  1001     6          1
> Mar96  1001     6          0
> ....
> Jan97  1001     7          0
> Feb97  1001     7          2
> and so on....at this moment, I don't have all data sets.
> d) I have calculated the prevalence by house and then by month, too. I
> cannot calculate incidence because the disease is malaria and one person
> can get sick many times during a year.
>
> My questions are the following:
> a) I would like to calculate the prevalence by year?

Let us define:

  1 occurrence-month = one person sick in a certain month
  1 person-month     = one person resident in the house in a certain month

For example: if 6 persons live in the house in all 12 months of a year,
there would be 12 x 6 = 72 person-months in that house. You may define
prevalence per year at the house level as:

  P = (sum of occurrence-months in a house over 12 months)
      / (sum of person-months in the house over the same 12 months)

To do this you may proceed in several ways. I'll suggest one of them that
seems convenient. First, you need the file to be sorted by house, year and
month. That is, starting with house 001 in January 1996, house 001 in
February 1996, etc., and ending with house 309 in December 2000. This is
ordinarily done by the following command (ID here stands for the house
identifier, IDhouse in your example):

  SORT CASES BY ID YEAR MONTH.

But be careful about your "month" variable, which seems to be coded as a
string of letters and numbers such as "Jan96", including both year and
month: sorting would put these strings in alphabetical order, not in the
chronological order of the months within the year. What you need is two
variables, one for the year (varying from 1996 to 2000) and one for the
month (varying from 1 to 12). If necessary, your current "month-year"
variable could be recoded into a new variable (call it DATE) with numeric
codes such as 199601, 199602, 199603, ..., 199612, 199701, ..., 200011,
200012. Once this is done, you can create the year and month variables
with commands like these:

  COMPUTE YEAR=TRUNC(DATE/100).
  COMPUTE MONTH=DATE-(YEAR*100).
  SORT CASES BY ID YEAR MONTH.

Once you have the file in order, you'd create an aggregated file with only
one row per year per house. Each row will have annual totals instead of
monthly figures, for one particular year and one particular house. This
way, house 001 will generate five rows, one for each year from 1996 to
2000, and the same for the other houses (309 x 5 rows in total). This can
be done with the following command:

  AGGREGATE OUTFILE=*
    /BREAK=ID YEAR
    /PERMONTH 'Person months'=SUM(people)
    /OCMONTH 'Annual occurrences'=SUM(posit).

I assume the variables in your original file are called "people" (people
in the house in a certain month) and "posit" (number of positives in that
house in a certain month). Apart from creating these two variables, which
are needed to compute the annual prevalence, you may use the AGGREGATE
command to create other variables of interest. For instance, you may add
the following lines to the preceding command (before the final period):

    /PEOPLESD 'Standard deviation of people'=SD(PEOPLE)
    /OCCURSD 'Standard deviation of occurrences'=SD(POSIT)

These variables may be helpful to check the stability of each house's
total population and sick population, respectively, over each year.

In the monthly file (before aggregating, while MONTH is still available),
you may also combine year and month into a new "DATE" variable as follows:

  COMPUTE DATE=YEAR*100+MONTH.

In this new variable, January 1996 would be coded as 199601, and August
1997 as 199708.

At this point, in the aggregated file, you can compute the prevalence of
the disease at the house level:

  COMPUTE PREV = OCMONTH/PERMONTH.
  VAR LABEL PREV 'Prevalence of disease'.

> b) I would want to know if the prevalence is increasing over time or not?
> (in analyzing the time trend of the occurrences according to your
> suggestions)

First, recall you have the prevalence PER HOUSE. As this is affected by
many variables, it is likely to be very unstable, and hardly any trend
will be discernible over just five years. So a regression of prevalence on
year, or of prevalence on date, could be misleading. But you may want a
table showing the OVERALL prevalence in the community over the years:

  TABLES
    /OBSERVATION PERMONTH OCMONTH
    /FTOTAL=TOTAL
    /TABLE PERMONTH+OCMONTH BY YEAR+TOTAL
    /STAT SUM (PERMONTH (F8.0)'' OCMONTH (F8.0)'').

This will produce a table with two rows (one for person-months and another
for positives) and six columns of data (one per year, plus the total of
the five years). If you copy the numbers into an Excel spreadsheet you may
easily add a third row with the prevalence. You may also use the REPORT
command in SPSS to generate the same table (though transposed) with a
column for the prevalence computed directly by SPSS with the DIVIDE
function (an explanation of this could be sent on request, but you'll do
fine with Excel, I think). The table would allow you to judge how
prevalence changes over time at the community level (sum of the 309
houses).

> c) The mean of people living per house is 5.0 and I think that is
> homogeneous, is there a statistical test to see if this population is
> homogeneous or not?

Just produce a frequency distribution (add the Graphics - Histogram
option) to judge for yourself:

  FREQUENCIES PEOPLE /HISTOGRAM.

If you want a normal curve superimposed on the histogram of the data, for
comparison, then write /HISTOGRAM=NORMAL instead of just /HISTOGRAM.

Hope all this helps. Good luck.

Hector Maletta
Universidad del Salvador
Buenos Aires, Argentina
```
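
The reply leaves the recoding of strings such as "Jan96" to the reader, so the whole chain (split the month code into year and month, sum person-months and occurrence-months per house and year, then divide) may be easier to follow in a small self-contained sketch. The Python below is only an illustration outside SPSS, not the method used in the post. The function names, the tuple layout, and the rule that two-digit years from 90 to 99 mean 19xx are assumptions made for the example. The data are the five rows quoted in the question.

```python
from collections import defaultdict

# Assumption: months are coded with English three-letter abbreviations, as in "Jan96".
MONTHS = {name: i + 1 for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])}


def parse_month_code(code):
    """Turn a string such as 'Jan96' into (year, month), here (1996, 1)."""
    month = MONTHS[code[:3]]
    yy = int(code[3:])
    # Assumption: two-digit years 90-99 mean 19xx, lower values mean 20xx.
    year = 1900 + yy if yy >= 90 else 2000 + yy
    return year, month


def annual_totals(records):
    """Sum [person-months, occurrence-months] per (house, year)."""
    totals = defaultdict(lambda: [0, 0])
    for code, house, people, positive in records:
        year, _month = parse_month_code(code)
        totals[(house, year)][0] += people
        totals[(house, year)][1] += positive
    return totals


def prevalence_by_house_year(totals):
    """House-level prevalence: occurrence-months / person-months."""
    return {key: oc / pm for key, (pm, oc) in totals.items() if pm > 0}


def prevalence_by_year(totals):
    """Community-level prevalence: pool all houses within a year, then divide."""
    pooled = defaultdict(lambda: [0, 0])
    for (_house, year), (pm, oc) in totals.items():
        pooled[year][0] += pm
        pooled[year][1] += oc
    return {year: oc / pm for year, (pm, oc) in pooled.items() if pm > 0}


# The five example rows quoted in the question (house 1001 only).
records = [
    ("Jan96", 1001, 6, 2),
    ("Feb96", 1001, 6, 1),
    ("Mar96", 1001, 6, 0),
    ("Jan97", 1001, 7, 0),
    ("Feb97", 1001, 7, 2),
]

totals = annual_totals(records)
house_prev = prevalence_by_house_year(totals)
year_prev = prevalence_by_year(totals)

for (house, year), p in sorted(house_prev.items()):
    print(house, year, round(p, 4))
```

Pooling the sums across houses before dividing, as `prevalence_by_year` does, follows the reply's suggestion to look at the overall community figure rather than averaging the unstable house-level ratios.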

