EngWiPy
Hello,
I was reading an example on binning data, where a continuous variable is transformed into a categorical variable. The dataframe is named df, and the continuous variable is stored in the column 'horsepower'. We would like to turn this continuous feature into a categorical feature with three values (low, medium, and high) and store the result in a new column called 'horsepower_cat'. The code for this is:
Python:
import numpy as np
import pandas as pd

binwidth = (df['horsepower'].max() - df['horsepower'].min()) / 4  # why 4, not 3?
# start at the minimum and step by binwidth; np.arange excludes the stop value
bins = np.arange(df['horsepower'].min(), df['horsepower'].max(), binwidth)
group_names = ['low', 'medium', 'high']
df['horsepower_cat'] = pd.cut(df['horsepower'], bins, labels=group_names, include_lowest=True)
The values are:
Python:
df['horsepower'].min(): 48.0
df['horsepower'].max(): 262.0
binwidth: 53.5
bins: array([48.0, 101.5, 155. , 208.5])
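For reference, the values above can be reproduced from the reported minimum and maximum alone. This is only a minimal sketch: it assumes the edges were built with `np.arange(min, max, binwidth)`, and the names `hp_min`, `hp_max`, and `edges` are placeholders that do not appear in the original example.

```python
import numpy as np

# Reported column statistics (stand-ins for df['horsepower'].min() / .max())
hp_min, hp_max = 48.0, 262.0

binwidth = (hp_max - hp_min) / 4             # (262 - 48) / 4 = 53.5
edges = np.arange(hp_min, hp_max, binwidth)  # np.arange excludes the stop value

print(binwidth)        # 53.5
print(edges)           # four edges: 48.0, 101.5, 155.0, 208.5
print(len(edges) - 1)  # four edges delimit 3 intervals
```

So dividing by 4 and then dropping the stop value leaves four edges, which is exactly what three bins need, but the last edge is 208.5 rather than 262.0.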
The variable bins does not cover the whole range: it defines only three bins, and the last one ends at 208.5, below the maximum of 262.0. Why? And why did we divide the range by 4 in the first line when we want 3 equal bins? The example says the following:
"We would like four bins of equal size bandwidth, the fourth is because the function "cut" includes the rightmost edge."
What does this mean?
Could anyone help me understand this?
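One way to probe the question is to compare the edges from the example with edges that span the full range. The sketch below is not the example's own method: the sample series `hp` is hypothetical (only 48.0 and 262.0 come from the values above), and `np.linspace` is swapped in as an assumed alternative for building four edges that include both endpoints.

```python
import numpy as np
import pandas as pd

# Hypothetical sample; only the minimum 48.0 and maximum 262.0 come from the post
hp = pd.Series([48.0, 100.0, 150.0, 200.0, 262.0])
group_names = ['low', 'medium', 'high']

# Edges as in the example: np.arange stops before the maximum (last edge 208.5)
arange_edges = np.arange(hp.min(), hp.max(), (hp.max() - hp.min()) / 4)
cat_arange = pd.cut(hp, arange_edges, labels=group_names, include_lowest=True)

# Assumed alternative: four evenly spaced edges from min to max, endpoints included
linspace_edges = np.linspace(hp.min(), hp.max(), 4)
cat_linspace = pd.cut(hp, linspace_edges, labels=group_names, include_lowest=True)

# Values above the last edge fall outside every bin and become NaN
print(cat_arange.isna().sum())
print(cat_linspace.isna().sum())
```

With this sample, the value 262.0 gets no label under the `np.arange` edges, while the `np.linspace` edges assign a label to every value. That suggests the three bins in the example leave the top of the range uncovered.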