Date: Thu, 26 Sep 2002 10:48:46 -0500
Reply-To: Paul Thompson <paul@WUBIOS.WUSTL.EDU>
Sender: "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From: Paul Thompson <paul@WUBIOS.WUSTL.EDU>
Organization: Washington University in St. Louis
Subject: Re: Regression with Class Variables and Stepwise Selection
Content-Type: text/plain; charset=us-ascii; format=flowed
> This has to be a common question, but I haven't found an answer in the
> I want to do a linear regression with at least one class variable (more
> than just binary... let's use race as an example) as a predictor, and also
> do stepwise variable selection.
1) Stepwise selection has many defects, which I will not bore you with.
2) How many predictors do you have? With only a handful, you can actually
do this by hand: find the least effective predictor, drop it, refit, and
repeat.
3) Exercise for the reader: Write a macro to do stepwise selection with
PROC GLM for class variables. For 5 points, use ODS to put the
variables into a file. For 20 points, use PRINTTO. Real men/women will
not use ODS, but will use PRINTTO to output the results into a file, and
then read it in. Oh well, remnants of horrors gone by...
Actually, I was just kidding with the PRINTTO. Use ODS.
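
To make the ODS route concrete, here is a minimal sketch of a single
backward-elimination step. The data set work.study, the response y, and
the predictors race, age, and bmi are all made-up names for illustration;
ModelANOVA is the ODS table in which PROC GLM reports its per-term tests.
This is only one step, not a finished macro.

```sas
/* Hypothetical names: work.study, y, race (class), age, bmi. */
ods output ModelANOVA=work.tests;   /* capture the per-term test table */
proc glm data=work.study;
  class race;
  model y = race age bmi / ss3;     /* ss3: request only Type III tests */
run;
quit;

/* One row per model term; race is tested as a whole, on k-1 df. */
proc sort data=work.tests out=work.ranked;
  by descending ProbF;              /* largest p-value first */
run;

/* The first row is the weakest term, i.e. the candidate to drop. */
proc print data=work.ranked(obs=1);
  var Source DF FValue ProbF;
run;
```

A macro would wrap this in a loop: drop the weakest term if its p-value
exceeds your stay threshold, refit, and stop when every remaining term
passes.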
> PROC GLM has the class statement (yay!), but not the selection= option
> PROC REG has the selection= option (yay!) but not the class statement
> Certainly I could recode the k-level race variable to k-1 binary dummy
> variables (RaceWhite, RaceBlack, RaceAsian, etc.) and then run that through
> PROC REG with the selection= option... but that will test each individual
> dummy variable independently, and I want the whole race variable to be
> tested at once. (if only one binary variable were significant, and I chose
> that one as my reference level, I might miss it altogether! yuck.)
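
One possible way around the dummy-variable problem, assuming your release
of PROC REG supports grouping regressors in braces on the MODEL statement
(check the documentation for your version): variables inside one pair of
braces enter or leave the model together. The sketch below uses made-up
names (work.study, y, age, bmi) and a hypothetical 4-level character
variable race with 'White' as the reference level.

```sas
/* Hypothetical data: work.study with y, age, bmi, and a character  */
/* variable race taking the values White, Black, Asian, Other.      */
data work.study2;
  set work.study;
  /* k-1 = 3 dummies; White is the reference level.                 */
  /* Note: a missing race value is coded 0 on all three dummies.    */
  RaceBlack = (race = 'Black');
  RaceAsian = (race = 'Asian');
  RaceOther = (race = 'Other');
run;

/* Assumption: brace grouping is available, so the three dummies    */
/* are treated as one unit during stepwise selection.               */
proc reg data=work.study2;
  model y = {RaceBlack RaceAsian RaceOther} age bmi
        / selection=stepwise slentry=0.15 slstay=0.15;
run;
quit;

/* Without selection, a TEST statement gives the joint F test for   */
/* the whole race variable in a fixed model.                        */
proc reg data=work.study2;
  model y = RaceBlack RaceAsian RaceOther age bmi;
  RaceJoint: test RaceBlack = 0, RaceAsian = 0, RaceOther = 0;
run;
quit;
```

The 0.15 entry and stay levels are placeholders, not recommendations.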
> On the other hand I could use the PROC GLM and recreate the stepwise
> selection process manually:
> 1 Code and Run ALL 1-variable models
> 2 compare F-values and select which variable (V1) to put in first
> 3 Code and Run ALL 2-variable models including V1
> 4 compare F-values and select which variable (V2) to put in next
> 5 See if either V1 or V2 is no longer significant - if so, drop it
> 6 repeat steps 3-5 until no more variables are significant or caught in a
> The first option does not give the results that I want. The second option seems
> like it should already have been programmed to be done automatically (PROC
> LOGISTIC does it nicely).
> Anyone been here before? Maybe there are some nice macros to share? Maybe
> there's an option that I'm overlooking? Any help would be appreciated.