SchroedingersLion
Hello guys,
I have some difficulties understanding the procedure of cross validation to estimate the hyperparameter ## \lambda ## in Ridge Regression.
Ridge regression yields the weight vector w from
$$ \min_w \left( \|Y - Xw\|^2 + \lambda \|w\|^2 \right) $$
X is the data matrix whose N rows are the data vectors, and Y is the N-vector of the corresponding targets.
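For concreteness, this objective has the closed-form minimizer ##w = (X^T X + \lambda I)^{-1} X^T Y##; here is a minimal numpy sketch of it (the toy data below is made up purely for illustration):

```python
import numpy as np

# Toy data: N = 5 samples, d = 2 features (made up for illustration only).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
Y = np.array([3.1, 2.9, 7.2, 6.8, 10.1])

lam = 0.5  # the ridge hyperparameter lambda

# Closed-form ridge solution: w = (X^T X + lambda * I)^(-1) X^T Y
d = X.shape[1]
w = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)
print(w)
```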
Now, as far as I understand it, the advantage of Ridge Regression as opposed to ordinary least squares, where ##\lambda=0##, is that we suppress the influence of statistical outliers in our given data.
However, I have read that a prominent way of finding the optimal ##\lambda ## is via cross-validation.
We split the data into training and test data, estimate w on the training data, and compute the mean-squared error of the model predictions ##Xw## on the test data. Then we repeat this for different training-test splits and average the resulting MSEs.
We do this for a range of ##\lambda## values and then choose the ##\lambda## that gives the smallest MSE in the cross-validation. Fine.
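To make the procedure concrete, here is a minimal sketch of that grid search with k-fold cross-validation, using scikit-learn's Ridge and KFold (the data and the ##\lambda## grid here are placeholders):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error

def cv_mse(X, Y, lam, n_splits=5):
    """Average test-fold MSE of ridge regression with penalty lam over k folds."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    errors = []
    for train_idx, test_idx in kf.split(X):
        model = Ridge(alpha=lam)          # scikit-learn calls lambda "alpha"
        model.fit(X[train_idx], Y[train_idx])
        pred = model.predict(X[test_idx])
        errors.append(mean_squared_error(Y[test_idx], pred))
    return np.mean(errors)

# Placeholder data; replace with the real X, Y.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)

lambdas = [0.01, 0.1, 1.0, 10.0]          # example grid of candidate lambdas
best = min(lambdas, key=lambda lam: cv_mse(X, Y, lam))
print("best lambda:", best)
```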
But how is it legitimate to judge the quality of our model by looking at the mean-squared error? If I wanted a minimal mean-squared error, I would have to set ##\lambda=0## and arrive back at ordinary least squares.
Second question: I just took a random data set from a machine learning website and performed ridge regression on it with the Python scikit-learn package, using the method linear_model.RidgeCV.
https://scikit-learn.org/stable/mod...del.RidgeCV.html#sklearn.linear_model.RidgeCV
It gives me a very small optimal ##\lambda=0.02## (found via cross-validation).
What does that mean? That ordinary least squares would have been good enough?
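For reference, a minimal sketch of the kind of RidgeCV call I mean (the data here is a stand-in, not the actual dataset, and the grid of candidate values is just an example):

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Stand-in data (the actual data set came from a machine learning website).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
Y = X @ rng.normal(size=8) + 0.1 * rng.normal(size=200)

# RidgeCV picks the best penalty from a user-supplied grid via cross-validation.
# Note: scikit-learn calls the ridge penalty lambda "alpha".
alphas = np.logspace(-3, 3, 50)   # example grid of candidate lambdas
model = RidgeCV(alphas=alphas, cv=5).fit(X, Y)
print("selected lambda (alpha):", model.alpha_)
```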
Regards!
SL