- #1
fog37
- TL;DR Summary
- Linear Model with independent categorical variable
Hello,
I have been pondering the following: we have data for blood pressure BP (response variable), age, and gender (a categorical variable with two levels). We can build two linear regression models: $$BP=b_0+b_1 age$$ $$BP=b_0+b_1 age+b_2 gender$$
The first model does not take gender into account and fits a single best-fit line, disregarding any effect that gender may have.
The 2nd model includes ##gender##. Assuming no interaction term, the categorical variable ##gender## shifts the best-fit regression line up or down depending on whether its value is ##1## or ##0## and on the sign of its coefficient ##b_2##. Two scenarios are possible: if the shift is very small, then ##gender## has little or no effect; if the vertical shift of the best-fit line is meaningful, then ##gender## has an effect. That means that the ##BP## values for males and females form different clusters that would require two different best-fit lines (same slope, different intercepts).
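To make the shift explicit, the model with ##gender## can be written out for each level. This assumes ##gender## is coded as ##0## or ##1##; which group gets ##1## is an arbitrary choice:
$$\text{gender}=0:\quad BP=b_0+b_1\, age$$
$$\text{gender}=1:\quad BP=(b_0+b_2)+b_1\, age$$
Both lines share the slope ##b_1##, and their intercepts differ by exactly ##b_2##.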
The 2nd model, which includes ##gender##, takes care of that difference. Would the 2nd model be exactly equivalent to fitting two separate linear regression models and best-fit lines, one for the male group and one for the female group, once we recognize that males and females form different clusters of points w.r.t. blood pressure BP?
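One way to probe this numerically is the minimal sketch below. It uses synthetic data, not real BP measurements: the data-generating numbers, the 0/1 coding, and the helper name `ols` are all assumptions made up for illustration. It fits the model with ##gender##, then fits each group separately, and for comparison also fits a model with an added ##age \times gender## term.

```python
import numpy as np

rng = np.random.default_rng(0)  # synthetic data, purely illustrative
n = 200
age = rng.uniform(20, 70, size=n)
gender = rng.integers(0, 2, size=n)  # assumed 0/1 dummy coding

# Hypothetical data-generating process in which the two groups
# have different intercepts AND different slopes.
bp = 100 + 0.5 * age + 5 * gender + 0.3 * age * gender + rng.normal(0, 5, size=n)


def ols(X, y):
    """Least-squares coefficients for design matrix X (hypothetical helper)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


ones = np.ones(n)

# Model with gender as a dummy: one common slope, intercept shifted by b2.
b0, b1, b2 = ols(np.column_stack([ones, age, gender]), bp)

# Two separate regressions, one per group: each gets its own slope.
g0 = gender == 0
g1 = gender == 1
a0, s0 = ols(np.column_stack([ones[g0], age[g0]]), bp[g0])
a1, s1 = ols(np.column_stack([ones[g1], age[g1]]), bp[g1])

# Model with an added age*gender interaction term.
c0, c1, c2, c3 = ols(np.column_stack([ones, age, gender, age * gender]), bp)

print("dummy model:       slope", b1, "intercepts", b0, b0 + b2)
print("separate fits:     slopes", s0, s1, "intercepts", a0, a1)
print("interaction model: slopes", c1, c1 + c3, "intercepts", c0, c0 + c2)
```

In this sketch the separate fits produce two different slopes, whereas the model with only the ##gender## dummy is constrained to a single common slope, so the fitted lines do not coincide. The separate fits do reproduce the lines of the model that includes the ##age \times gender## term. Even then, only the coefficient estimates match: separate regressions estimate a residual variance per group, while a single model pools it, so standard errors can differ.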
Thank you!