Linear Model with independent categorical variable

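In summary, a model with a gender term but no interaction term yields different intercepts and a common slope for the two values of the gender variable; an interaction model or group-wise fits allow both the intercepts and the slopes to differ.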
  • #1
fog37
Hello,

I have been pondering the following: we have data for blood pressure BP (the response variable) and data about age and gender (a categorical variable with two levels). We can build two linear regression models: $$BP=b_0+b_1 age+b_2 gender$$ $$BP=b_0+b_1 age$$

The first model does not take gender into account and plots one single best-fit line disregarding that gender may have an effect.
The 2nd model includes ##gender## and two scenarios are possible: assuming no interaction term, the categorical variable ##gender## may shift the best-fit regression line up or down, depending on whether its value is ##1## or ##0## and on the sign of its corresponding coefficient. If the shift is very small, then ##gender## does not have an effect. But if the vertical shift of the best-fit line is meaningful, then ##gender## has an effect. That means that the ##BP## values for males and females form different clusters that would require two different best-fit lines (same slope, different intercepts).
The 2nd model, by including ##gender##, takes care of that difference. Would the 2nd model be exactly equivalent to creating two separate linear regression models and best-fit lines, one for the male group and one for the female group, once we recognize that males and females form different clusters of points w.r.t. blood pressure BP?
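For reference, a short worked version of what the model with the ##gender## term implies, assuming ##gender## is coded as ##0## or ##1## (which group gets which code is an arbitrary choice):

$$gender=0:\quad BP=b_0+b_1\, age$$
$$gender=1:\quad BP=(b_0+b_2)+b_1\, age$$

Both lines share the slope ##b_1##; only the intercepts differ, by ##b_2##.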

Thank you!
 
  • #2
No, those would not be exactly equivalent. The group-wise fitting would allow different intercepts and different slopes for the two groups. The no-interaction model (your model 2) allows different intercepts but not different slopes for the two groups. Also, the standard error for the slope in the no-interaction model will be smaller (if there is indeed no significant interaction) because it is estimated with twice the data of either of the group-wise fits.
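A minimal numerical sketch of this point, using made-up data. The true coefficients, noise level, sample sizes, and function names below are assumptions for illustration and do not come from any real blood pressure study. It fits the no-interaction model and the two group-wise lines with NumPy least squares:

```python
import numpy as np

def simulate(n_per_group=200, seed=0):
    """Made-up data: two groups with different intercepts AND different slopes."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(20, 80, size=2 * n_per_group)
    gender = np.repeat([0, 1], n_per_group)  # 0/1 dummy coding
    # Hypothetical true lines: gender 0 -> 100 + 0.5*age, gender 1 -> 90 + 0.8*age
    bp = np.where(gender == 0, 100 + 0.5 * age, 90 + 0.8 * age)
    bp = bp + rng.normal(0, 5, size=age.size)
    return age, gender, bp

def fit_no_interaction(age, gender, bp):
    """Least-squares fit of BP = b0 + b1*age + b2*gender; returns (b0, b1, b2)."""
    X = np.column_stack([np.ones_like(age), age, gender])
    coef, *_ = np.linalg.lstsq(X, bp, rcond=None)
    return coef

def fit_groupwise(age, gender, bp):
    """Separate straight-line fit per group; returns {group: (intercept, slope)}."""
    fits = {}
    for g in (0, 1):
        mask = gender == g
        slope, intercept = np.polyfit(age[mask], bp[mask], deg=1)
        fits[g] = (intercept, slope)
    return fits

age, gender, bp = simulate()
b0, b1, b2 = fit_no_interaction(age, gender, bp)
groupwise = fit_groupwise(age, gender, bp)

print("no-interaction model: common slope =", b1)
print("                      intercepts   =", b0, "and", b0 + b2)
for g, (intercept, slope) in groupwise.items():
    print(f"group-wise fit, gender={g}: intercept = {intercept}, slope = {slope}")
```

The group-wise fits return one slope per group, while the no-interaction model returns a single slope. That single slope is a weighted average of the two within-group slopes, so it always lies between them.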
 
  • #3
Dale said:
No, those would not be exactly equivalent. The group-wise fitting would allow different intercepts and different slopes for the two groups. The no-interaction model (your model 2) allows different intercepts but not different slopes for the two groups. Also, the standard error for the slope in the no-interaction model will be smaller (if there is indeed no significant interaction) because it is estimated with twice the data of either of the group-wise fits.
Thanks. I see. So the no-interaction model 2 would be a better model than creating two separate models, one for each group. Thanks for confirming.
 
  • #4
fog37 said:
Thanks. I see. So the no-interaction model 2 would be a better model than creating two separate models, one for each group. Thanks for confirming.
Yes. And if you think that there may be an interaction then I would use an interaction model instead of group-wise fits. It is a lot easier to control for multiple comparisons that way.
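As a rough sketch of what an interaction model looks like in practice, the snippet below fits $$BP=b_0+b_1 age+b_2 gender+b_3 (age)\times(gender)$$ to made-up data and compares the two lines it implies with separate group-wise fits. The data-generating coefficients and variable names are assumptions for illustration only:

```python
import numpy as np

# Made-up data (hypothetical coefficients, not from a real study).
rng = np.random.default_rng(1)
n = 150  # observations per group
age = rng.uniform(20, 80, size=2 * n)
gender = np.repeat([0, 1], n)  # 0/1 dummy coding
bp = 95 + 0.6 * age + 4 * gender + 0.2 * age * gender + rng.normal(0, 5, size=2 * n)

# Interaction model: BP = b0 + b1*age + b2*gender + b3*age*gender
X = np.column_stack([np.ones_like(age), age, gender, age * gender])
b0, b1, b2, b3 = np.linalg.lstsq(X, bp, rcond=None)[0]

# Lines implied by the single interaction fit, as (intercept, slope).
lines_from_interaction = {0: (b0, b1), 1: (b0 + b2, b1 + b3)}

# Separate straight-line fits, one per group, as (intercept, slope).
lines_from_groupwise = {}
for g in (0, 1):
    m = gender == g
    slope, intercept = np.polyfit(age[m], bp[m], deg=1)
    lines_from_groupwise[g] = (intercept, slope)

for g in (0, 1):
    print(g, lines_from_interaction[g], lines_from_groupwise[g])
```

The fitted lines agree up to floating-point error, so the interaction model loses nothing relative to the group-wise fits. What it adds is a single coefficient, ##b_3##, that measures the difference in slopes directly, so that difference can be tested within one model rather than by comparing two separate fits.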
 
  • #5
Dale said:
Yes. And if you think that there may be an interaction then I would use an interaction model instead of group-wise fits. It is a lot easier to control for multiple comparisons that way.
We may suspect an interaction between ##age## and ##gender##, but the proof would be to see that model 2 generates best-fit lines with different slopes for different values of the ##gender## variable. Once we see that, we should include the interaction term ##(age)\times(gender)##.
 
  • #6
fog37 said:
We can build two linear regression models: $$BP=b_0+b_1 age+b_2 gender$$ $$BP=b_0+b_1 age$$

The first model does not take gender into account and plots one single best-fit line disregarding that gender may have an effect.
The 2nd model includes ##gender## and two scenarios are possible: assuming no interaction term, the categorical variable ##gender## may shift the best-fit regression line up or down, depending on whether its value is ##1## or ##0## and on the sign of its corresponding coefficient. If the shift is very small, then ##gender## does not have an effect.
You should have your model equations and your description in the same order so there is no confusion about which model is "first" and which is "second". It looks like your model equations are in reverse order. Otherwise, I disagree with practically everything you said about those two models.
A third option is to separate the genders into two distinct data sets and do separate regressions on each one. It is not clear to me if that is what you had in mind for the model that does not include a "gender" factor. I recommend this approach if you have enough data for each gender to get adequate parameter estimates for each.
 
