Philip,
Thanks for your suggestions. I will pay attention to the
representativeness issue and try to find reasonable control variables.
Qinghai
>It's hard to say from the limited description, but my first guess is that
>you will not get a statistically significant coefficient for a group with
>only 11 cases. If the small groups are significantly different, make sure
>it's group membership that is really driving the difference. For instance,
>if your groups are contract workers, production workers, managers, and
>executives and you want to postulate that group membership has an impact of
>X on your dependent variable, then you should take at least one of these
>two measures to control for possible bias.
>1) add variables for the other things that also generally correlate with
>group membership (e.g., age, tenure, education level, etc.)
>or
>2) make sure that your subgroups are truly representative of the subgroup
>population on these measures (e.g., mean age of executives in Sweden is 47
>and mean age of my 11 executives in the sample is 47, mean tenure of
>executives in Sweden is 18 and mean tenure of my 11 executives in the
>sample is 18, etc.)
>
>As you can probably tell, the likelihood that check 2 will show
>representative subgroups declines as the subgroup gets smaller. So you are
>probably better off constructing your model with the additional control
>variables. That will make it less likely that you come to a spurious
>conclusion like "executives are more likely to be hospitalized than
>production workers" (when in reality older people are more likely to be
>hospitalized than younger people and, controlling for age, executives are
>less likely to be hospitalized) [just a guess]
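
The hospitalization example can be made concrete with a small sketch. The counts below are invented purely for illustration (they are not from anyone's data), and the code controls for age by simple stratification rather than by adding a control variable to a regression; the function names are hypothetical.

```python
# Hypothetical counts: (group, age_band) -> (hospitalized, total).
# All numbers are invented to illustrate the confounding argument.
counts = {
    ("executive", "older"): (3, 10),
    ("executive", "younger"): (0, 1),
    ("production", "older"): (4, 10),
    ("production", "younger"): (3, 60),
}


def raw_rate(group):
    """Hospitalization rate for a group, ignoring age."""
    hosp = sum(h for (g, _), (h, _n) in counts.items() if g == group)
    total = sum(n for (g, _), (_h, n) in counts.items() if g == group)
    return hosp / total


def stratum_rate(group, age_band):
    """Hospitalization rate for a group within one age band."""
    hosp, total = counts[(group, age_band)]
    return hosp / total


if __name__ == "__main__":
    for group in ("executive", "production"):
        print(group, "raw rate:", round(raw_rate(group), 3))
        for band in ("older", "younger"):
            print("  ", band, round(stratum_rate(group, band), 3))
```

In these made-up counts executives have the higher raw rate only because almost all of them fall in the older band; within each age band their rate is lower than that of production workers.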
>
>Philip Moore
>Market Research Manager
>(804) 747-0422 x4831
>(804) 935-4549 FAX
>
>From: Qinghai Huang <huangqh@psychology.su.se>
>Sent by: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
>Date: 07/08/2004 02:54 PM
>Please respond to: Qinghai Huang <huangqh@psychology.su.se>
>To: SPSSX-L@LISTSERV.UGA.EDU
>Subject: Re: dummy variable coding in regression
>
>Thanks very much for your message. Regarding sample size, I have
>n = 360, and nine categories will be used, but the sizes of the nine
>categories range from 70 down to 11. The largest group will be used
>as the reference group. Do the uneven group sizes introduce any bias?
>
>Thanks,
>Qinghai
>
>>And from a purely pragmatic and technical point of view...
>>
>>The significance of the rest of your dummy variables will be affected by
>>the size of the category you choose as the intercept (reference) set and
>>by how different it is from the others. If you had four categories like
>>contract workers, production-line workers, managers, and executives, where
>>contract workers were significantly different from the other three
>>categories, then using contract workers as the intercept set would produce
>>statistically significant coefficients for all three of your dummies. If,
>>on the other hand, you choose production-line workers as the intercept set
>>and they are not significantly different from managers, then only two of
>>your dummies (contract workers and executives) will have significant
>>coefficients.
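
A minimal sketch of why the choice of intercept set matters, assuming a model that contains only the group dummies. In that case the OLS intercept equals the mean of the omitted group and each dummy coefficient equals that group's mean minus the omitted group's mean. The scores and the function name `dummy_coefficients` are invented for illustration, and the sketch shows only the coefficients, not their significance tests.

```python
from statistics import mean

# Hypothetical outcome scores per group (invented for illustration only).
groups = {
    "contract": [2.0, 3.0, 4.0],
    "production": [7.0, 8.0, 9.0],
    "manager": [7.5, 8.0, 8.5],
    "executive": [11.0, 12.0, 13.0],
}


def dummy_coefficients(data, reference):
    """Return (intercept, coefficients) for a dummy-only regression.

    The intercept is the reference group's mean; each coefficient is the
    difference between a group's mean and the reference group's mean.
    """
    intercept = mean(data[reference])
    coefs = {
        name: mean(values) - intercept
        for name, values in data.items()
        if name != reference
    }
    return intercept, coefs


if __name__ == "__main__":
    for ref in ("contract", "production"):
        intercept, coefs = dummy_coefficients(groups, ref)
        print("reference:", ref, "intercept:", intercept, "coefficients:", coefs)
```

With contract workers as the reference, every coefficient in this toy data is far from zero; with production workers as the reference, the manager coefficient is zero because the two group means coincide.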
>>
>>Philip Moore
>>Market Research Manager
>>(804) 747-0422 x4831
>>(804) 935-4549 FAX
>>