Date: Mon, 31 Mar 2008 14:52:50 -0400
Reply-To: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
Subject: Re: Rules for GLMSELECT
In-Reply-To: <26144118.1206987953461.JavaMail.root@mswamui-thinleaf.atl.sa.earthlink.net>
Content-Type: text/plain; charset="us-ascii"
Peter:
I meant that one should re-estimate whatever final model GLMSELECT
specifies (after taking advantage of shrinkage of parameter estimates)
using a standard regression procedure. With more observations GLMSELECT
will have a larger set of alternative models to consider. I would have
serious doubts about a model specification that selects predictors that
seem no more likely to have a true association with the DV than any of
the ones rejected. Blind selection of a few predictors risks selecting
those that fit well to one sample and not to other samples.
Underloading a model with predictors results in biased parameter estimates
(although perhaps with narrower confidence intervals than less biased
estimates).
Re-estimating a model using a standard regression program supports
analysis of residuals that may uncover bias in predictions.
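
To make the point about bias and residuals concrete, here is a minimal sketch in Python rather than SAS, run on simulated data only. The function names (`ols_fit`, `pearson`, `simulate_omitted_predictor`) and the chosen coefficients are assumptions made up for the illustration; nothing here reproduces what GLMSELECT or any SAS regression procedure does internally. It fits a one-predictor model when the outcome truly depends on two correlated predictors, then checks whether the residuals still track the predictor that was left out.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ols_fit(x, y):
    """Simple least-squares fit of y on one predictor; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def simulate_omitted_predictor(n=5000, seed=1):
    """Assumed data-generating model: y = 1*x1 + 1*x2 + noise, x2 = 0.8*x1 + noise."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    x2 = [0.8 * a + rng.gauss(0, 0.6) for a in x1]
    y = [a + b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

    # Underloaded model: regress y on x1 alone, leaving x2 out.
    slope, intercept = ols_fit(x1, y)

    # Residuals from the underloaded model, compared against the omitted x2.
    residuals = [yy - (intercept + slope * a) for yy, a in zip(y, x1)]
    return slope, pearson(residuals, x2)


if __name__ == "__main__":
    slope, resid_corr = simulate_omitted_predictor()
    # In expectation the slope on x1 absorbs x2's effect: 1 + 0.8, not the true 1.
    print("slope on x1 with x2 omitted:", round(slope, 3))
    print("correlation of residuals with omitted x2:", round(resid_corr, 3))
```

In this made-up setup the slope on x1 is pulled away from its true value of 1, and the residuals remain correlated with the omitted predictor, which is the kind of pattern a residual analysis after re-estimation can expose.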
S
-----Original Message-----
From: owner-sas-l@listserv.uga.edu [mailto:owner-sas-l@listserv.uga.edu]
On Behalf Of Peter Flom
Sent: Monday, March 31, 2008 2:26 PM
To: Sigurd Hermansen; SAS-L@LISTSERV.UGA.EDU
Subject: Re: Rules for GLMSELECT
I have to disagree with Sig here.
I think the point of GLMSELECT, or at least a large part of the point,
is that it penalizes you for having too few observations for your number
of variables by selecting a simple model. For continuous DV, the runs
that David Cassell and I did for our paper show that if you greatly
overload your model with variables, then STEPWISE in its various
flavors will mess up, but GLMSELECT will not.
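
A rough way to see why unpenalized selection struggles on an overloaded model is the following Python sketch on pure-noise data. It is not the simulation from the paper mentioned above, and it implements neither STEPWISE nor GLMSELECT; the function names and the sizes (30 observations, 200 candidate predictors, 20 replications) are assumptions chosen for illustration. It picks the single candidate most correlated with the outcome in one sample and then checks the same candidate in a fresh sample.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_selected_correlation(n_obs=30, n_pred=200, n_reps=20, seed=7):
    """Return (mean in-sample |r|, mean fresh-sample |r|) for the best-looking predictor.

    Every predictor is independent noise, so no candidate has a true
    association with the outcome.
    """
    rng = random.Random(seed)

    def draw():
        return [rng.gauss(0, 1) for _ in range(n_obs)]

    in_sample, fresh_sample = [], []
    for _ in range(n_reps):
        y_a, x_a = draw(), [draw() for _ in range(n_pred)]
        y_b, x_b = draw(), [draw() for _ in range(n_pred)]

        # Greedy, unpenalized choice: the candidate with the largest |r| in sample A.
        best = max(range(n_pred), key=lambda j: abs(pearson(x_a[j], y_a)))

        in_sample.append(abs(pearson(x_a[best], y_a)))
        fresh_sample.append(abs(pearson(x_b[best], y_b)))

    return sum(in_sample) / n_reps, sum(fresh_sample) / n_reps


if __name__ == "__main__":
    avg_in, avg_out = average_selected_correlation()
    print("mean |r| of selected predictor, selection sample:", round(avg_in, 3))
    print("mean |r| of same predictor, fresh sample:", round(avg_out, 3))
```

With many candidates and few observations, the selected predictor looks much stronger in the sample used to choose it than in a new sample, even though none of the candidates is related to the outcome.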
Peter
-----Original Message-----
>From: Sigurd Hermansen <HERMANS1@WESTAT.COM>
>Sent: Mar 31, 2008 1:40 PM
>To: SAS-L@LISTSERV.UGA.EDU
>Subject: Re: Rules for GLMSELECT
>
>Martin:
>Since GLMSELECT should be used only for exploratory modelling, and
>whatever model you select should be estimated using PROC LOGISTIC, PROC
>GENMOD, PROC MIXED, or another regression procedure, the same rules
>should apply whether or not you are using PROC GLMSELECT to help
>specify a model. I do think that automated exploration of predictive
>model specifications would require substantially more observations than
>specification of a model for the purpose of testing a specific
>hypothesis.
>S
>
>
>
>-----Original Message-----
>From: owner-sas-l@listserv.uga.edu
>[mailto:owner-sas-l@listserv.uga.edu]
>On Behalf Of martholt
>Sent: Monday, March 31, 2008 12:56 PM
>To: sas-l@uga.edu
>Subject: Rules for GLMSELECT
>
>
>When using logistic regression, the general advice is to have no fewer
>than 10 cases per variable. Do any such rules exist for GLMSELECT, or
>could you please point me to a document that discusses this.
>
>Thank you,
>
>Martin Holt
Statistical Consultant
www DOT peterflom DOT com