Date: Wed, 13 Sep 2006 18:03:18 -0400
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: Is there a danger for bias when using GAM procedure?
I recall reading about serious problems with variance estimates computed
using the S-Plus and SAS GAM procedures. I believe the fault lay in the
backfitting algorithms used to fit the model, not in the GAM
methodology. Afterwards SAS Proc GAM advanced from experimental to
production in V8.2. I hope that meant that SAS corrected any problems in
the GAM procedure.
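For readers who have not met backfitting, here is a minimal sketch of the idea in Python rather than SAS. It is a toy under loud assumptions. The group-mean smoother and the names `group_mean_smoother` and `backfit` are hypothetical stand-ins, whereas real GAM software uses spline or loess smoothers. The fixed iteration count also replaces a proper convergence test. It shows only how the additive components are fitted, not how variances are estimated.

```python
from collections import defaultdict


def group_mean_smoother(x, r):
    """Crude smoother (an assumption for illustration): replace each partial
    residual by the mean partial residual of observations sharing that x."""
    sums, counts = defaultdict(float), defaultdict(int)
    for xi, ri in zip(x, r):
        sums[xi] += ri
        counts[xi] += 1
    return [sums[xi] / counts[xi] for xi in x]


def backfit(y, predictors, n_iter=20):
    """Fit y ~ alpha + f_1(x_1) + ... + f_p(x_p) by backfitting.

    Each pass smooths the partial residuals for one predictor at a time,
    holding the other components fixed, then centres the new component.
    """
    n = len(y)
    alpha = sum(y) / n
    f = [[0.0] * n for _ in predictors]
    for _ in range(n_iter):
        for j, x in enumerate(predictors):
            # Partial residual: remove the intercept and every other component.
            partial = [
                y[i] - alpha - sum(f[k][i] for k in range(len(f)) if k != j)
                for i in range(n)
            ]
            fj = group_mean_smoother(x, partial)
            mean_fj = sum(fj) / n
            f[j] = [v - mean_fj for v in fj]  # centre for identifiability
    return alpha, f


# Usage on a small, exactly additive toy surface (made-up numbers).
lon = [0, 0, 0, 1, 1, 1]
lat = [0, 1, 2, 0, 1, 2]
y = [1.0 + 2.0 * lo + 0.5 * la for lo, la in zip(lon, lat)]
alpha, f = backfit(y, [lon, lat])
fitted = [alpha + f[0][i] + f[1][i] for i in range(len(y))]
print(fitted)  # reproduces y, since the toy data are exactly additive
```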
for interesting comments on use of GAMs. In particular,
...GAM assumes an additive model, that is, the difference
between SO4 at two longitude points is identical
across the whole range of latitudes, while the thin-plate
smoothing spline method used in PROC TPSPLINE
does not make that assumption. The benefit
of the assumption is that PROC GAM runs much
faster than PROC TPSPLINE. The downside is that
the additive assumption may not be appropriate ....
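To make the quoted additive assumption concrete, here is a small Python sketch with two made-up surfaces. These are not SO4 data, and the function names and coefficients are hypothetical. Under the additive surface, the difference between two longitudes is the same at every latitude. Adding a longitude-by-latitude interaction makes that difference change with latitude.

```python
def additive(lon, lat):
    # Hypothetical additive surface: f(lon) + g(lat), no interaction.
    return lon ** 2 + 2 * lat


def with_interaction(lon, lat):
    # Hypothetical non-additive surface: the same terms plus lon * lat.
    return additive(lon, lat) + lon * lat


def lon_difference(surface, lon_a, lon_b, lats):
    """Difference surface(lon_b, lat) - surface(lon_a, lat) at each latitude."""
    return [surface(lon_b, lat) - surface(lon_a, lat) for lat in lats]


lats = [40, 45, 50]
print(lon_difference(additive, -100, -90, lats))          # [-1900, -1900, -1900]
print(lon_difference(with_interaction, -100, -90, lats))  # [-1500, -1450, -1400]
```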
I haven't seen much evidence of SAS Proc GAM being used since shortly
after the release of V8.2. GAM supports scoring of test datasets, but test
observations with levels of classification predictors that did not occur
in the training dataset used to develop the scoring equations will get
missing predicted values. The results of estimating a GAM also tend to be
difficult to interpret. Assuming that you are more interested in
explanation than in prediction, you may need to work backwards from GENMOD
and the partial prediction graphs suggested by Friedman, Hastie, and
Tibshirani to semi-parametric and non-parametric methods.
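The scoring caveat above comes down to a lookup that has nothing to return. Below is a minimal Python sketch, not PROC GAM's actual scoring code. It assumes a model with a single classification predictor whose estimated effect is simply the training mean for each level. The names `fit_level_effects` and `score` and the region labels are hypothetical.

```python
from collections import defaultdict


def fit_level_effects(train_levels, train_y):
    """Estimate one effect per class level (here simply the level mean)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for level, yi in zip(train_levels, train_y):
        sums[level] += yi
        counts[level] += 1
    return {level: sums[level] / counts[level] for level in sums}


def score(effects, test_levels):
    """Score new rows; a level never seen in training has no estimated
    effect, so its prediction is missing (None)."""
    return [effects.get(level) for level in test_levels]


effects = fit_level_effects(["north", "north", "south"], [2.0, 4.0, 5.0])
print(score(effects, ["north", "south", "coast"]))  # [3.0, 5.0, None]
```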
Time-series and cross-section variations, of course, complicate
statistical modelling, and, with a large cohort and observations over a
ten-year span, it will be difficult to separate out different trends,
cycles, and seasonal variations that may occur in different geographic
locations. I'd guess that microclimates favored by winemakers have a
less attractive analogy in pockets of high exposures to pollutants.
I don't see anything in your study overview about the distribution of
CVD outcomes over time. If you have a prospective study that records a
CVD outcome per subject after 10 years, you'll need a much different
method of analysis than you would for a survival analysis of time to a
critical CVD outcome. An analysis of how annual CVD rates per region vary
over time with exposures and other factors requires yet other methods.
The statistical models that you are considering
imply binary outcomes or rates of an outcome.
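For the rate-type outcome, a Poisson rate model usually enters person-time as an offset. Below is a minimal Python sketch with made-up numbers, and the function names are hypothetical. In SAS such a model would typically be fitted with PROC GENMOD, using DIST=POISSON, LINK=LOG, and log person-years supplied through the OFFSET= option of the MODEL statement.

```python
import math


def crude_rate(events, person_years, per=1000.0):
    """Crude event rate per `per` person-years."""
    return events / person_years * per


def poisson_rate_mean(person_years, intercept, coefs, covariates):
    """Expected count under a log-link Poisson rate model:
    log(mu) = log(person_years) + intercept + sum(b_j * x_j),
    i.e. log(person_years) is an offset whose coefficient is fixed at 1."""
    linear = intercept + sum(b * x for b, x in zip(coefs, covariates))
    return math.exp(math.log(person_years) + linear)


# Made-up region-year: 25 CVD events over 50,000 person-years.
print(crude_rate(25, 50_000))  # about 0.5 per 1,000 person-years
# With no covariate effects, the expected count is rate * person-years.
print(poisson_rate_mean(50_000, math.log(25 / 50_000), [], []))  # about 25
```

Because of the offset, doubling person-time doubles the expected count while leaving the rate unchanged.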
From: email@example.com [mailto:firstname.lastname@example.org]
On Behalf Of Cornel Lencar
Sent: Wednesday, September 13, 2006 4:40 PM
Subject: Is there a danger for bias when using GAM procedure?
I am working on a population study that assesses the effect of air
pollution (certain pollutants) on cardiovascular outcomes in a cohort of
about 700K subjects between 1994 and 2003. So it is a longitudinal study,
and from what I have found in the literature, the most used analytical
approaches are:
2. Poisson regression combined with a LOESS function (does this work
   similarly to the GAM approach?)
3. Cross-over studies (but only for small and very small cohorts)
4. Poisson regression with variables for day of week, seasonality, and
   so on.
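Approach 4 in the list above amounts to building calendar covariates and fitting a log-link Poisson model. Below is a minimal Python sketch under stated assumptions. Monday as the reference day and two seasonal harmonics are arbitrary choices, and `time_covariates` and `expected_daily_count` are hypothetical names. The actual fit would be done in a GLM routine such as PROC GENMOD with DIST=POISSON.

```python
import datetime
import math


def time_covariates(day, n_harmonics=2):
    """Covariates for a daily-count Poisson regression (sketch):
    day-of-week indicators with Monday as the reference level, plus
    sine/cosine seasonal harmonics of the day of the year."""
    # date.weekday(): Monday == 0 ... Sunday == 6; Monday gets no indicator.
    dow = [1.0 if day.weekday() == k else 0.0 for k in range(1, 7)]
    t = day.timetuple().tm_yday / 365.25  # fraction of a year
    seasonal = []
    for h in range(1, n_harmonics + 1):
        seasonal += [math.sin(2 * math.pi * h * t), math.cos(2 * math.pi * h * t)]
    return dow + seasonal


def expected_daily_count(covariates, intercept, coefs):
    """Poisson mean with a log link: mu = exp(intercept + sum(b_j * x_j))."""
    return math.exp(intercept + sum(b * x for b, x in zip(coefs, covariates)))


x = time_covariates(datetime.date(1994, 1, 3))  # a Monday
print(x[:6])  # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]: Monday is the reference day
print(len(x))  # 10: six day-of-week indicators + two sine/cosine pairs
```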
Most of these analyses are reported as having been performed with S-Plus
(it would take me some time to work as fast in R as I do in SAS), and GAM
is reported to introduce some bias in the estimates. Is the same true
for the GAM procedure in SAS (that is, is there a conceptual problem
in the methodology)?
Within SAS, what would be the best methodological approach I could take,
given a reasonably good dataset (spatial location, weather records,
pollution, socio-economic variables, individual
Any suggestions would be welcome and will be shared with the community.
School of Occupational and Environmental Hygiene
The University of British Columbia