LISTSERV at the University of Georgia
Date:   Fri, 6 Jan 2006 12:03:30 -0800
Reply-To:   Dale McLerran <stringplayer_2@YAHOO.COM>
Sender:   "SAS(r) Discussion" <SAS-L@LISTSERV.UGA.EDU>
From:   Dale McLerran <stringplayer_2@YAHOO.COM>
Subject:   Re: Creating dummy variables automatically
Comments:   To: Hari <excel_hari@yahoo.com>
In-Reply-To:   <1136573469.380275.88750@g43g2000cwa.googlegroups.com>
Content-Type:   text/plain; charset=iso-8859-1

--- Hari <excel_hari@YAHOO.COM> wrote:

> Hi,
>
> I have a variable called "Mkt" in my data file named as
> "Big_Data".
>
> I have 53 different markets within "Mkt" and the number of
> rows could be lets say 10000.
>
> I want to create 52 dummy variables each corresponding to 52
> markets (Zero value in all 52 dummy variable would indicate
> that is the 53rd market).
>
> How do I automatically create these dummy variables in Big_data.

------------------------------SNIP---------------------------

>
> regards,
> Hari
> India
>
> PS : I have a hazy idea of using these in a simple regression.
> Proc mixed allows one to indicate class variables without usage
> of dummy variable creation. Irrespective of whether I do proc
> reg or not I want to learn the syntax.
>

Hari,

I would not create a set of 52 dummy variables. You want to employ the different levels of Mkt as dummy variables in your regression analysis, and various regression procedures will do that for you automatically. Contrary to what you state above, it is the CLASS statement which controls construction of these dummy variables: naming Mkt on a CLASS statement does not avoid dummy variables, it has the procedure build them for you. The purpose of the CLASS statement is to construct the design matrix (there is that phrase again) with a set of dummy (indicator) variables which represent the different levels of the categorical variables named on the CLASS statement. (Question: should these 53 different markets be represented as random effects? My a priori belief is that they should be.)
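
As a minimal sketch of the fixed-effects case, here is what that might look like in PROC GLM, one of the procedures that accepts a CLASS statement. The response variable Y and the continuous covariate X are hypothetical names; the original post names only Mkt and Big_Data.

```sas
/* Sketch only: Y and X are hypothetical variable names, not from the post. */
proc glm data=Big_Data;
   class Mkt;                    /* GLM expands Mkt into indicator columns  */
   model Y = Mkt X / solution;   /* SOLUTION prints the per-market estimates */
run;
quit;
```

With GLM's parameterization, the estimate for the last market level is set to zero, so the remaining 52 estimates are differences from that reference market. To my knowledge PROC REG has no CLASS statement, which is why the sketch uses GLM rather than REG.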

Regardless of whether Mkt is entering your model as a fixed or random effect, you do NOT NEED TO CONSTRUCT YOUR OWN DUMMY VARIABLES. The entire purpose of the CLASS statement when employed in one of the regression procedures is to construct the design matrix with the appropriate set of dummy variables.
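
To illustrate the fixed-versus-random distinction, a rough sketch in PROC MIXED follows. Again, Y and X are hypothetical variable names used only for illustration; in both steps Mkt appears on the CLASS statement and no dummy variables exist in Big_Data.

```sas
/* Sketch only: Y and X are hypothetical variable names, not from the post. */

/* Mkt as a fixed effect: it is listed on the MODEL statement. */
proc mixed data=Big_Data;
   class Mkt;
   model Y = Mkt X / solution;
run;

/* Mkt as a random effect: it moves from MODEL to RANDOM. */
proc mixed data=Big_Data;
   class Mkt;
   model Y = X / solution;
   random Mkt;                /* one random intercept per market */
run;
```

The only change between the two steps is where Mkt is named; the CLASS statement stays the same either way.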

There are several real advantages to keeping just the Mkt variable and not constructing your own dummy variables to represent the different markets.

First, your data volume will be much smaller. Mkt contains all of the information that is contained in separate dummy variables. Thus, rather than needing 52 (or 53) variables, you need only the one variable. The CLASS statement will expand the different market values as different dummy variables at the time that you need such information.

Second, the 52 (or 53) dummy variables which are constructed when you name Mkt on a CLASS statement are identified by the regression procedure as being related to one another and mutually exclusive of one another. That is, the regression procedure knows that these 52 (53) dummy variables represent a collection of effects for which you may want certain statistical tests. If these dummy variables are employed as fixed effects, then you may well want an overall F-test for mean differences across markets. In fact, you are likely to get this F-test whether you ask for it or not. Since the 52 (53) indicator variables in the design matrix were generated from a single CLASS variable, you will get a test for this effect automatically. If you code your own dummy variables, then you can usually construct such a test, but it would be a heck of a lot of work to do so.
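
If you ever want to look at the indicator columns that a CLASS statement generates, one option is PROC GLMMOD, which writes the design matrix to a data set without fitting a model. The sketch below assumes a hypothetical response variable Y; the output data set names Mkt_Design and Mkt_Parm are likewise made up for illustration.

```sas
/* Sketch only: Y, Mkt_Design and Mkt_Parm are hypothetical names. */
proc glmmod data=Big_Data
            outdesign=Mkt_Design   /* one row per observation, one column per design column */
            outparm=Mkt_Parm       /* maps each design column back to its effect and level  */
            noprint;
   class Mkt;
   model Y = Mkt;
run;
```

This is for inspection only. The point above still holds: Big_Data itself needs just the one Mkt variable, and the procedure rebuilds the columns whenever they are needed.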

Rethink your need to construct your own set of dummy variables for Mkt. You are almost certain to find that there is no need for you to construct your own set of dummy variables, and that it is in fact disadvantageous for you to use your own set of dummy variables.

Dale

---------------------------------------
Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph:  (206) 667-2926
Fax: (206) 667-5977
---------------------------------------
