 Hari <excel_hari@YAHOO.COM> wrote:
> Hi,
>
> I have a variable called "Mkt" in my data file named as
> "Big_Data".
>
> I have 53 different markets within "Mkt" and the number of
> rows could be lets say 10000.
>
> I want to create 52 dummy variables each corresponding to 52
> markets (Zero value in all 52 dummy variable would indicate
> that is the 53rd market).
>
> How do I automatically create these dummy variables in Big_data.
SNIP
>
> regards,
> Hari
> India
>
> PS : I have a hazy idea of using these in a simple regression.
> Proc mixed allows one to indicate class variables without usage
> of dummy variable creation. Irrespective of whether I do proc
> reg or not I want to learn the syntax.
>
Hari,

I would not create a set of 52 dummy variables. You want to
employ the different levels of Mkt as dummy variables in
your regression analysis. That can be done automatically
for you by various regression procedures. And it is the
CLASS statement which controls construction of these dummy
variables, contrary to what you state above. The purpose
of the CLASS statement is to construct the design matrix
(there is that phrase again) with a set of dummy (indicator)
variables which represent the different levels of the
categorical variables named on the CLASS statement.

(Question: should these 53 different markets be represented
as random effects? My a priori belief is that they should be.)
Regardless of whether Mkt is entering your model as a fixed
or random effect, you do NOT NEED TO CONSTRUCT YOUR OWN
DUMMY VARIABLES. The entire purpose of the CLASS statement
when employed in one of the regression procedures is to
construct the design matrix with the appropriate set of dummy
variables.
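
As a minimal sketch of what that looks like in practice, Mkt can
go straight onto the CLASS statement, either as a fixed effect in
PROC GLM or as a random effect in PROC MIXED. Only Big_Data and
Mkt come from your post; the response y and the covariate x are
placeholder names I am assuming for illustration.

```sas
/* Fixed effect: PROC GLM builds the Mkt indicator columns itself. */
/* y and x are hypothetical names for the response and a covariate. */
proc glm data=Big_Data;
   class Mkt;
   model y = x Mkt / solution;
run;
quit;

/* Random effect: same CLASS statement, but Mkt moves to RANDOM. */
proc mixed data=Big_Data;
   class Mkt;
   model y = x / solution;
   random Mkt;
run;
```

With the default parameterization in PROC GLM, the estimate for
the last sorted level of Mkt is set to zero, so that level plays
the role of the reference market you had in mind for "all 52
dummies equal to zero".
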
There are several real advantages to keeping just the Mkt
variable and not constructing your own dummy variables to
represent the different markets. First, your data volume
will be much smaller. Mkt contains all of the information
that is contained in separate dummy variables. Thus, rather
than needing 52 (or 53) variables, you need only the one
variable. The CLASS statement will expand the different
market values as different dummy variables at the time that
you need such information. Second, the 52 (or 53) dummy
variables which are constructed when you name Mkt on a
CLASS statement are identified by the regression procedure
as being related to one another and mutually exclusive of
one another. That is, the regression procedure knows that
these 52 (53) dummy variables represent a collection of
effects for which you may want certain statistical tests.
If these dummy variables are employed as fixed effects, then
you may well want an overall F-test for mean differences
across markets. In fact, you are likely to get this F-test
whether you ask for it or not. Since the 52 (53) indicator
variables in the design matrix were generated from a single
CLASS variable, you will get a test for this effect
automatically. If you code your own dummy variables, then
you can usually construct such a test, but it would be a
heck of a lot of work to do so.
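
To give a rough sense of that extra work, here is a sketch for a
hypothetical data set Small with only four markets, hand-coded as
indicators d1, d2, and d3. All of these names, along with y and x,
are made up for illustration. PROC REG has no CLASS statement, so
the joint test has to be spelled out term by term; with 53 markets
the TEST statement would need 52 such terms.

```sas
/* Small, y, x, and d1-d3 are hypothetical names for illustration. */
proc reg data=Small;
   model y = x d1 d2 d3;
   /* Joint F-test that all three market indicators are zero. */
   Markets: test d1=0, d2=0, d3=0;
run;
quit;
```

With Mkt named on a CLASS statement in PROC GLM, the corresponding
test should appear as the Mkt line of the Type III table without
any extra statements.
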
Rethink your need to construct your own set of dummy
variables for Mkt. You are almost certain to find that
there is no need for them, and that using them is in
fact disadvantageous.
Dale

Dale McLerran
Fred Hutchinson Cancer Research Center
mailto: dmclerra@NO_SPAMfhcrc.org
Ph: (206) 667-2926
Fax: (206) 667-5977

