Date:   Thu, 8 Jul 2004 23:04:12 +0200
Reply-To:   Qinghai Huang <>
Sender:   "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
From:   Qinghai Huang <>
Subject:   Re: dummy variable coding in regression
In-Reply-To:   <>
Content-Type:   text/plain; charset="us-ascii" ; format="flowed"


Thanks for your suggestions. I will pay attention to the representativeness issue and try to find reasonable control variables.


>It's hard to say from the limited description, but my first guess is
>that you will not get a statistically significant coefficient for a
>group with only 11 cases. If the small groups are significantly
>different, make sure it's group membership that is really driving the
>difference. For instance, if your groups are contract workers,
>production workers, managers, and executives, and you want to postulate
>that group membership has an impact of X on your dependent variable,
>then you should take at least one of these two measures to control for
>possible bias:
>
>1) add variables for the other things that also generally correlate
>   with group membership (e.g., age, tenure, education level)
>or
>2) make sure that your subgroups are truly representative of the
>   subgroup population on these measures (e.g., mean age of executives
>   in Sweden is 47 and mean age of my 11 executives in the sample is 47,
>   mean tenure of executives in Sweden is 18 and mean tenure of my 11
>   executives in the sample is 18, etc.)
>
>As you can probably tell, the likelihood that test 2 will show
>representative subgroups declines as the subgroup sample size shrinks.
>So you are probably better off constructing your model with the
>additional control variables. That will make it less likely that you
>come to a spurious conclusion such as "executives are more likely to be
>hospitalized than production workers" (because in reality older people
>are more likely to be hospitalized than younger people, and, controlling
>for age, executives are less likely to be hospitalized) [just a guess]
>
>Philip Moore
>Market Research Manager
>
>Qinghai Huang (sent by SPSSX-L@LISTSERV.UGA.EDU), 07/08/2004 02:54 PM
>To: "SPSSX(r) Discussion" <SPSSX-L@LISTSERV.UGA.EDU>
>Subject: Re: dummy variable coding in regression
>
> >Thanks very much for your message. Regarding the sample size, I have
> >n = 360, and nine categories will be used. But the sizes of the nine
> >categories range from 11 to 70. The largest group will be used as the
> >reference group. Do the uneven sample sizes across the groups
> >introduce any bias?
> >
> >Thanks,
> >Qinghai
> >
> >>and from a purely pragmatic and technical point of view...
> >>
> >>The significance of the rest of your dummy variables will be affected
> >>by the size and distinctness of the group you choose as the intercept
> >>(reference) set. If you had four categories like contract workers,
> >>production-line workers, managers, and executives, where contract
> >>workers were significantly different from the other three categories,
> >>then using contract workers as the intercept set will produce
> >>statistically significant coefficients for all three of your dummies.
> >>If, on the other hand, you choose production-line workers as the
> >>intercept set and they are not significantly different from managers,
> >>then only two of your dummies (contract workers and executives) will
> >>have significant coefficients.
> >>
> >>Philip Moore
> >>Market Research Manager
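
To make the reference-group point quoted above concrete, here is a minimal sketch in Python rather than SPSS syntax. It assumes a model with only an intercept and the group dummies, in which case each dummy coefficient equals that group's mean minus the reference group's mean, with standard error sqrt(MSE * (1/n_group + 1/n_reference)). The function name, the four group labels, and all outcome values are made up for illustration; they are not my data. The cutoff |t| > 1.96 is only a rough 5% rule of thumb.

```python
from math import sqrt
from statistics import mean


def dummy_coefficients(groups, reference):
    """Closed-form OLS results for y ~ intercept + one 0/1 dummy per
    non-reference group (no other predictors in the model).

    Returns {group: (coefficient, standard_error, t_statistic)}.
    """
    means = {g: mean(y) for g, y in groups.items()}
    n_total = sum(len(y) for y in groups.values())
    # Pooled residual variance (MSE) with N - k degrees of freedom.
    sse = sum((v - means[g]) ** 2 for g, y in groups.items() for v in y)
    mse = sse / (n_total - len(groups))
    n_ref = len(groups[reference])
    results = {}
    for g, y in groups.items():
        if g == reference:
            continue
        coef = means[g] - means[reference]
        se = sqrt(mse * (1 / len(y) + 1 / n_ref))
        results[g] = (coef, se, coef / se)
    return results


# Hypothetical outcome values; group sizes are deliberately uneven.
groups = {
    "contract": [10 + (i % 5) for i in range(70)],
    "production": [20 + (i % 5) for i in range(40)],
    "managers": [20.5 + (i % 5) for i in range(30)],
    "executives": [24 + (i % 5) for i in range(11)],
}

for ref in ("contract", "production"):
    print(f"reference group: {ref}")
    for g, (coef, se, t) in dummy_coefficients(groups, ref).items():
        flag = "significant" if abs(t) > 1.96 else "not significant"
        print(f"  {g:<11} b={coef:7.3f}  se={se:.3f}  t={t:7.2f}  {flag}")
```

With "contract" as the reference every dummy is flagged, while with "production" as the reference the "managers" dummy is not, which is the pattern described in the quoted message. The standard-error formula also bears on the uneven-sizes question: a small group makes 1/n_group large, so its coefficient is estimated less precisely, and using the largest group as the reference keeps the 1/n_reference term as small as possible.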

