- #1
fog37
- TL;DR Summary
- Handling categorical variables in R
Hello R users,
My general understanding is that, in R, a nominal categorical variable (with 2 or more levels) must first be converted to a factor and THEN to dummy variables (k-1 dummy variables for k levels). Is that correct?
Once we have gone categorical variable -> factor -> dummy variables, we can then use the dummy variables as independent or dependent variables in a statistical model. (P.S.: when using ##lm()## in R, the function does the dummy variable conversion automatically, but I am not sure whether that is true for other modeling functions.)
What if we converted the categorical variable to dummy variables without the intermediate factor step? Would that still work in R?
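To make the two routes concrete, here is a minimal sketch on made-up data (the data frame `df`, the column `color`, and the response `y` are invented for illustration only). It builds the design matrix once from an explicit factor and once from the raw character column, using base R's ##model.matrix()##, which as far as I can tell is the same machinery ##lm()## relies on:

```r
# Toy data, invented purely for illustration
df <- data.frame(
  y     = c(1.2, 2.3, 3.1, 1.0, 2.8, 3.5),
  color = c("red", "green", "blue", "red", "green", "blue"),
  stringsAsFactors = FALSE  # keep 'color' as a plain character column
)

# Route 1: explicit factor first, then the design (dummy) matrix
color_f  <- factor(df$color)
X_factor <- model.matrix(~ color_f)

# Route 2: hand the character column to the formula directly
X_char <- model.matrix(~ color, data = df)

# Each matrix has an intercept plus k - 1 = 2 dummy columns for k = 3 levels
dim(X_factor)
dim(X_char)

# lm() fitted straight from the character column
fit <- lm(y ~ color, data = df)
names(coef(fit))  # "(Intercept)" "colorgreen" "colorred"
```

With default treatment contrasts the first level in alphabetical order ("blue" here) becomes the baseline, which is why only two dummy columns appear. This only covers the formula interface; a function that expects a plain numeric matrix would presumably still need the dummy columns built beforehand.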
Python does not have factors, so that intermediate "factor" step does not apply there...
Thanks!