Date: Sun, 29 Jul 2001 20:07:57 -0300
Reply-To: hmaletta@fibertel.com.ar
Sender: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From: Hector Maletta <hmaletta@FIBERTEL.COM.AR>
Subject: Re: Christian RE: Survey analysis
Christian Bautista wrote:
> a) I have 309 houses in a community.
> b) We have recollected history-information about the cases of Malaria (is
> malaria not TB) that occurred from 1996 to 2000 in these houses.
> That's means that per each house I know how many people lived in this house
> during a specific month and how many of these people were a Malaria
> positive.
> c) Now, I have a record per house, see this example below:
> Month IDhouse No.people No.positive
> Jan96 1001 6 2
> Feb96 1001 6 1
> Mar96 1001 6 0
> ....
> Jan97 1001 7 0
> Feb97 1001 7 2
> and so on....in this moment, I don't have all data sets.
> d)I have calculated the prevalence by house and then by month, too. I cannot
> calculate incidence because the disease is malaria and one person can get
> sick many times during a year.
>
> My questions are the following:
> a) I would like calculate the prevalence by year?
Let us define:
1 occurrence-month = one person sick in a given month.
1 person-month = one person resident in the house in a given month.
For example: If 6 persons live in the house in all the 12 months of a
year, there would be 12x6=72 person-months in that house.
You may define prevalence per year at the house level as:
P = Sum of occurrence-months in a house over 12 months / Sum of
person-months in the house over the same 12 months
To do this you may proceed in several ways. I'll suggest one of them
that seems convenient.
First, you need the file to be sorted by house, year and month. That is,
starting with house 001 in January 1996, house 001 in February 1996,
etc., and ending in house 309 in December 2000. This is ordinarily done
by the command: SORT CASES BY ID YEAR MONTH. But be careful about your
"month" variable, which seems to be coded in a string of letters and
numbers such as "Jan96", including both year and month: sorting would
sort these strings in alphabetical order, not in the chronological
ordering of the months within the year. What you need is two variables,
one for the year (varying from 1996 to 2000) and one for the month
(varying from 1 to 12). If necessary, your current "month-year" variable
could be recoded in a new variable (call it DATE) with numeric codes
such as 199601, 199602, 199603,...,199612,199701, ... , 200011, 200012.
Once this is done, you can create the year and month variables by
commands like these:
COMPUTE YEAR=TRUNC(DATE/100).
COMPUTE MONTH=DATE-(YEAR*100).
SORT CASES BY ID YEAR MONTH.
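The recoding of a string such as "Jan96" into the numeric DATE variable
could be sketched as below, to be run before the COMPUTE YEAR and COMPUTE
MONTH commands. This is only one possible way. It assumes the string
variable is called MONTHYR (a made-up name), that it always holds a
three-letter English month abbreviation with exactly this capitalization
followed by a two-digit year, and that two-digit years below 90 belong to
the 2000s. MM and YY are helper variables introduced just for this sketch.

```spss
* MONTHYR is an assumed name for the string variable holding e.g. "Jan96".
* MM is the month number 1 to 12, taken from the position of the abbreviation.
COMPUTE MM = (INDEX('JanFebMarAprMayJunJulAugSepOctNovDec', SUBSTR(MONTHYR,1,3)) + 2) / 3.
* YY is the two-digit year read from characters 4 and 5.
COMPUTE YY = NUMBER(SUBSTR(MONTHYR,4,2), F2.0).
* Assumption: years 90 to 99 are 19xx and years below 90 are 20xx.
COMPUTE DATE = (1900 + YY)*100 + MM.
IF (YY < 90) DATE = (2000 + YY)*100 + MM.
FORMATS DATE (F6.0).
EXECUTE.
```

With these assumptions "Jan96" becomes 199601 and "Dec00" becomes 200012.
An abbreviation that is not found gives a non-integer MM, so it is worth
running FREQUENCIES on DATE afterwards to catch odd codes.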
Once the file is in order, create an aggregated file with only one row
per year per house. Each row holds annual totals instead of monthly
figures, for one particular year and one particular house. This way,
house 001 will generate five rows, one for each year from 1996 to 2000,
and the same for the other houses (309 x 5 = 1545 rows in total). This
can be done with the following command:
AGGREGATE OUTFILE=* /break=ID YEAR
/PERMONTH 'Person months'=sum(people)
/OCMONTH 'Annual occurrences'=sum(posit).
The variables in your original file I assume are called "people" (people
in the house in a certain month) and "posit" (number of positives in
that house in a certain month).
Apart from creating these two variables, useful to compute the annual
prevalence, you may use the AGGREGATE command to create other variables
of interest. For instance, you may add the following lines to the
preceding command (before the final period):
/PEOPLESD 'Standard deviation of people'=SD(PEOPLE)
/OCCURSD 'Standard deviation of occurrences'=SD(POSIT)
These variables may be helpful to test the stability of the houses'
total population and sick population respectively, over each year.
You may also combine year and month into a new "DATE" variable as
follows:
COMPUTE DATE=year*100+MONTH.
In this new variable, January, 1996 would be coded as 199601, and August
1997 as 199708.
At this point, you can compute the prevalence of the disease at the
house level:
COMPUTE PREV = OCMONTH/PERMONTH.
VAR LABEL PREV 'Prevalence of disease'.
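As a small check of the arithmetic, the steps above can be tried on the
first three months of house 1001 from your example. This is only a sketch
on toy data: it uses the variable names I assumed (people, posit) and
covers three months instead of a full year.

```spss
* Toy data: house 1001, January to March 1996, taken from the example rows.
DATA LIST FREE / ID YEAR MONTH PEOPLE POSIT.
BEGIN DATA
1001 1996 1 6 2
1001 1996 2 6 1
1001 1996 3 6 0
END DATA.
* One row per house and year, with summed person-months and occurrence-months.
AGGREGATE OUTFILE=* /BREAK=ID YEAR
 /PERMONTH 'Person months'=SUM(PEOPLE)
 /OCMONTH 'Annual occurrences'=SUM(POSIT).
* Prevalence is occurrence-months divided by person-months.
COMPUTE PREV = OCMONTH/PERMONTH.
LIST.
```

Here the person-months are 6+6+6 = 18 and the occurrence-months are
2+1+0 = 3, so PREV is 3/18, or about 0.17.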
> b) I would want to know if the prevalence is increasiong over time or not?
> (in analizing the time trend of the occurrences according to your
> suggestions)
First, recall you have the prevalence PER HOUSE. As this is affected by
many variables, it is pretty likely to be very unstable, and hardly any
tendency will be discernible over just five years. So a regression of
prevalence by year or prevalence by date could be misleading. But you
may want to have a table with data showing the OVERALL prevalence in the
community over the years:
TABLES /OBSERVATION PERMONTH OCMONTH /FTOTAL=TOTAL/TABLE PERMONTH+OCMONTH
BY YEAR+TOTAL
/STAT SUM (PERMONTH (F8.0)'' OCMONTH (F8.0)'').
This will produce a table with two rows (one for people and another for
positives) and six columns of data (one per year from 1996 to 2000, plus
the total of the five years). If you copy the numbers into an Excel
spreadsheet you may easily add a third row with the prevalence.
You may also use the REPORT command in SPSS to generate the same table
(though transposed) with a column for the prevalence computed directly
by SPSS with the DIVIDE function (explanation of this could be sent on
request, but you'll do fine with Excel I think).
The table would allow you to judge how prevalence changes over time at
the community level (sum of the 309 houses).
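If you would rather stay inside SPSS than go through Excel or REPORT, a
second aggregation by year is one possible route. The sketch below assumes
the active file is the house-by-year file with PERMONTH and OCMONTH created
earlier. TOTPERS, TOTOCC and COMPREV are names made up for this example.
Note that OUTFILE=* replaces the active file, so save the house-level file
first if you still need it.

```spss
* Collapse the house-by-year file to one row per year for the whole community.
AGGREGATE OUTFILE=* /BREAK=YEAR
 /TOTPERS 'Person-months, all houses'=SUM(PERMONTH)
 /TOTOCC 'Occurrence-months, all houses'=SUM(OCMONTH).
* Community prevalence is total occurrence-months over total person-months.
COMPUTE COMPREV = TOTOCC/TOTPERS.
VARIABLE LABELS COMPREV 'Community prevalence'.
LIST.
```

This gives one row per year, with the community prevalence next to the
totals it was computed from.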
> c) The mean of people living per house is 5.0 and I think that is
> homogenuos, is there a statistical test to see if this population is
> homogeonuos or not?
Just produce a frequency distribution (add the Graphics - Histogram
option) to judge by yourself:
FREQUENCIES PEOPLE /HISTOGRAM.
If you want a normal curve superimposed on the histogram for
comparison, write HISTOGRAM=NORMAL instead of just HISTOGRAM.
Hope all this helps. Good luck.
Hector Maletta
Universidad del Salvador
Buenos Aires, Argentina