FEA add pinball loss to SGDRegressor #22043
Conversation
ping @lorentzenchr @glemaitre
)
from .._quantile import QuantileRegressor

benchmark = QuantileRegressor(alpha=1e-6, solver="highs").fit(X, y).score(X, y)
I assume that this will fail with older scipy because the solver is unknown.
Now that I am seeing the `alpha`, it makes me think that we only support the `l1` penalty in the `QuantileRegressor`. @lorentzenchr is it fine to support `l2` and `elasticnet` in SGD?
Yes, this needs a `skip_if` for scipy >= 1.6, IIRC.
And yes, SGD already comes with L2 and L1+L2, so why not support it. It would be more code to forbid it :wink:
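For concreteness, a version-gated skip could look like the sketch below. This is only an illustration (the constant and test names are hypothetical), assuming the comparison relies on `QuantileRegressor(solver="highs")`, whose LP backend requires scipy >= 1.6.

```python
import pytest
import scipy

# True when the installed scipy predates the "highs" linprog solver.
SCIPY_OLDER_THAN_1_6 = tuple(
    int(part) for part in scipy.__version__.split(".")[:2]
) < (1, 6)


@pytest.mark.skipif(
    SCIPY_OLDER_THAN_1_6,
    reason='QuantileRegressor(solver="highs") requires scipy >= 1.6',
)
def test_sgd_pinball_close_to_quantile_regressor():
    ...  # hypothetical test body: fit both estimators and compare their scores
```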
Do we need test cases for other penalties?
> And yes, SGD already comes with L2 and L1+L2, so why not support it. It would be more code to forbid it

@lorentzenchr so the reason not to support L2 and L1+L2 in `QuantileRegressor` is that we cannot reformulate the problem as a linear programming problem?
> Do we need test cases for other penalties?

Yes, that would be good. I don't know how easy this is, but it would be great to have an integration test checking that the penalty has the expected impact on the `coef_` and, at the same time, that we are indeed fitting the expected quantile.
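Such an integration test might look roughly like the sketch below. The `loss="pinball"` and `quantile` parameter names are assumptions about this PR's API (not confirmed), and the tolerances are purely illustrative.

```python
import numpy as np

from sklearn.linear_model import SGDRegressor


def test_sgd_pinball_quantile_and_penalty_effect():
    rng = np.random.RandomState(0)
    X = rng.normal(size=(1000, 5))
    y = X @ np.array([1.0, 2.0, 0.0, 0.0, 0.5]) + rng.normal(size=1000)

    for q in (0.25, 0.5, 0.75):
        reg = SGDRegressor(
            loss="pinball",  # hypothetical loss name introduced by this PR
            quantile=q,      # hypothetical parameter for the target quantile
            penalty="l2",
            alpha=1e-4,
            max_iter=10_000,
            tol=1e-5,
            random_state=0,
        ).fit(X, y)

        # Roughly a fraction q of the targets should fall below the prediction.
        coverage = np.mean(y <= reg.predict(X))
        assert abs(coverage - q) < 0.05  # loose, illustrative tolerance

        # A much stronger L2 penalty should shrink the coefficients towards zero.
        strong = SGDRegressor(
            loss="pinball", quantile=q, penalty="l2", alpha=10.0,
            max_iter=10_000, tol=1e-5, random_state=0,
        ).fit(X, y)
        assert np.linalg.norm(strong.coef_) < np.linalg.norm(reg.coef_)
```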
I'm not sure how to proceed further now; any pointers would be helpful. Thanks.
Here is the first round of review.
Regardless of the fact that I'm not an expert on SGD and don't know how well it works for non-smooth objectives, this PR would need unit tests similar to …
These are great pointers, I will look into it. Thanks.
Only the equivariance test is failing, in both the with-intercept and without-intercept cases.
Thank you for this contribution, @venkyyuvy.
Here are a few comments. If you can't continue this work, @ArturoAmorQ might be interested in pursuing it.
To follow up on one of the proposals made by @lorentzenchr in #20132 (comment), I am personally in favour of refactoring `_sgd_fast.pyx` to use the common loss private submodule and thus avoid duplication. This submodule not only implements the pinball loss, but also other losses that are used in several modules.
If we choose to perform this refactoring first, then all the terminal and concrete `LossFunction`s that are defined in this file must be moved to this private submodule and refactored to extend the `CyLossFunction` extension type and the `BaseLoss` base class:
scikit-learn/sklearn/_loss/_loss.pxd
Lines 27 to 30 in b571d64
cdef class CyLossFunction:
    cdef double cy_loss(self, double y_true, double raw_prediction) nogil
    cdef double cy_gradient(self, double y_true, double raw_prediction) nogil
    cdef double_pair cy_grad_hess(self, double y_true, double raw_prediction) nogil
scikit-learn/sklearn/_loss/loss.py
Lines 44 to 110 in b571d64
# Note: The shape of raw_prediction for multiclass classifications are
# - GradientBoostingClassifier: (n_samples, n_classes)
# - HistGradientBoostingClassifier: (n_classes, n_samples)
#
# Note: Instead of inheritance like
#
#     class BaseLoss(BaseLink, CyLossFunction):
#         ...
#
#     # Note: Naturally, we would inherit in the following order
#     #     class HalfSquaredError(IdentityLink, CyHalfSquaredError, BaseLoss)
#     # But because of https://github.com/cython/cython/issues/4350 we set BaseLoss as
#     # the last one. This, of course, changes the MRO.
#     class HalfSquaredError(IdentityLink, CyHalfSquaredError, BaseLoss):
#
# we use composition. This way we improve maintainability by avoiding the above
# mentioned Cython edge case and have easier to understand code (which method calls
# which code).
class BaseLoss:
    """Base class for a loss function of 1-dimensional targets.

    Conventions:
      - y_true.shape = sample_weight.shape = (n_samples,)
      - y_pred.shape = raw_prediction.shape = (n_samples,)
      - If is_multiclass is true (multiclass classification), then
        y_pred.shape = raw_prediction.shape = (n_samples, n_classes)
        Note that this corresponds to the return value of decision_function.

    y_true, y_pred, sample_weight and raw_prediction must either be all float64
    or all float32.
    gradient and hessian must be either both float64 or both float32.

    Note that y_pred = link.inverse(raw_prediction).

    Specific loss classes can inherit specific link classes to satisfy
    BaseLink's abstractmethods.

    Parameters
    ----------
    sample_weight : {None, ndarray}
        If sample_weight is None, the hessian might be constant.
    n_classes : {None, int}
        The number of classes for classification, else None.

    Attributes
    ----------
    closs: CyLossFunction
    link : BaseLink
    interval_y_true : Interval
        Valid interval for y_true
    interval_y_pred : Interval
        Valid Interval for y_pred
    differentiable : bool
        Indicates whether or not loss function is differentiable in
        raw_prediction everywhere.
    need_update_leaves_values : bool
        Indicates whether decision trees in gradient boosting need to uptade
        leave values after having been fit to the (negative) gradients.
    approx_hessian : bool
        Indicates whether the hessian is approximated or exact. If,
        approximated, it should be larger or equal to the exact one.
    constant_hessian : bool
        Indicates whether the hessian is one for this loss.
    is_multiclass : bool
        Indicates whether n_classes > 2 is allowed.
    """
What do maintainers think? Would you first perform such a refactoring?
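As a side note on the loss itself: whichever path is taken, the pinball loss and its subgradient are small to state. Below is a plain-NumPy sketch of the math such a Cython loss class (or `BaseLoss` wrapper) would encode; it is an illustration, not the private `_loss` API.

```python
import numpy as np


def pinball_loss(y_true, raw_prediction, quantile):
    # loss_i = quantile * (y - p)        if y >= p  (under-prediction)
    #        = (1 - quantile) * (p - y)  otherwise  (over-prediction)
    diff = y_true - raw_prediction
    return np.where(diff >= 0, quantile * diff, (quantile - 1.0) * diff)


def pinball_gradient(y_true, raw_prediction, quantile):
    # Derivative with respect to raw_prediction; a subgradient is used at y == p.
    diff = y_true - raw_prediction
    return np.where(diff >= 0, -quantile, 1.0 - quantile)
```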
- |Fix| :class:`linear_model.LassoLarsIC` now correctly computes AIC
  and BIC. An error is now raised when `n_features > n_samples` and
  when the noise variance is not provided.
  :pr:`21481` by :user:`Guillaume Lemaitre <glemaitre>` and
  :user:`Andrés Babino <ababino>`.
This is a spurious entry.
- |Feature| :class:`linear_model.SGDRegressor` now supports the pinball loss.
  :pr:`22043` by :user:`Venkatachalam Natchiappan <venkyyuvy>`.
This needs to be moved to `doc/whats_new/v1.2.rst`.
cdef double residual = fabs(y - p) - self.epsilon
return residual if residual > 0 else 0
This change makes sense, but that's out of the scope of this PR. Hence this should be reverted.
- cdef double residual = fabs(y - p) - self.epsilon
- return residual if residual > 0 else 0
+ cdef double ret = fabs(y - p) - self.epsilon
+ return ret if ret > 0 else 0
cdef double residual = fabs(y - p) - self.epsilon
return residual * residual if residual > 0 else 0
Same suggestion as previously.
- cdef double residual = fabs(y - p) - self.epsilon
- return residual * residual if residual > 0 else 0
+ cdef double ret = fabs(y - p) - self.epsilon
+ return ret * ret if ret > 0 else 0
I would prefer such a refactoring, but I'm not 100% sure it is feasible. It would certainly slow down the delivery of this PR's feature, so I'm OK both ways.
Thanks for the review comments.
Reference Issues/PRs
Partially addresses: #20132
What does this implement/fix? Explain your changes.
Added the pinball loss to sgd_fast.pyx.
Any other comments?
The results don't seem to be comparable with `QuantileRegressor`. I need help investigating this.
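One way to make that comparison concrete is to score both models with the same pinball metric. Below is a hedged sketch; the `loss="pinball"` and `quantile` parameters on the SGD side are assumptions about this PR's API, not the released scikit-learn interface.

```python
import numpy as np

from sklearn.datasets import make_regression
from sklearn.linear_model import QuantileRegressor, SGDRegressor
from sklearn.metrics import mean_pinball_loss

X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
q = 0.5

lp_model = QuantileRegressor(quantile=q, alpha=1e-6, solver="highs").fit(X, y)
sgd_model = SGDRegressor(
    loss="pinball",   # hypothetical loss name introduced by this PR
    quantile=q,       # hypothetical parameter for the target quantile
    alpha=1e-6,
    max_iter=100_000,
    tol=1e-6,
    random_state=0,
).fit(X, y)

# Both models target the same quantile, so their mean pinball losses on the
# training data should be of comparable magnitude if SGD converged.
print("QuantileRegressor:", mean_pinball_loss(y, lp_model.predict(X), alpha=q))
print("SGDRegressor     :", mean_pinball_loss(y, sgd_model.predict(X), alpha=q))
```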