FEA add pinball loss to SGDRegressor #22043


Open
wants to merge 13 commits into base: main
Conversation

venkyyuvy
Contributor

@venkyyuvy commented Dec 21, 2021

Reference Issues/PRs

Partially addresses: #20132

What does this implement/fix? Explain your changes.

Added the pinball loss to _sgd_fast.pyx.

Any other comments?

The results don't seem to be comparable with those of QuantileRegressor. I need help here.
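
For readers skimming the thread, here is a rough pure-Python sketch of the pinball loss and its (sub)gradient that a LossFunction in _sgd_fast.pyx would have to compute; the quantile argument name is an assumption for illustration, not necessarily the PR's API.

def pinball_loss(y, p, quantile=0.5):
    """Pinball (quantile) loss of prediction p for true value y."""
    diff = y - p
    return quantile * diff if diff >= 0 else (quantile - 1.0) * diff

def pinball_dloss(y, p, quantile=0.5):
    """Derivative of the loss with respect to p (a subgradient at y == p)."""
    return -quantile if y - p > 0 else 1.0 - quantile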

@venkyyuvy
Contributor Author

ping @lorentzenchr @glemaitre

@glemaitre self-requested a review December 21, 2021 11:10
@venkyyuvy changed the title from "adding pinball loss" to "Adding pinball loss to SGDRegressor" Dec 21, 2021
Resolved/outdated review threads on: sklearn/linear_model/_sgd_fast.pyx, doc/whats_new/v1.1.rst, sklearn/linear_model/tests/test_sgd.py
)
from .._quantile import QuantileRegressor

benchmark = QuantileRegressor(alpha=1e-6, solver="highs").fit(X, y).score(X, y)
Member

I assume that this will fail with older scipy because the solver is unknown.

Member

Now that I see the alpha, it reminds me that we only support the L1 penalty in QuantileRegressor.

@lorentzenchr is it fine to support l2 and elasticnet in SGD?

Member

Yes, this needs a skip_if for scipy >= 1.6, IIRC.

And yes, SGD already comes with L2 and L1+L2, so why not support it? It would be more code to forbid it :wink:
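
For illustration, the skip could look roughly like the following; the sp_version/parse_version helpers from sklearn.utils.fixes are an assumption about what the test module has available (any scipy version check would do).

import pytest
from sklearn.utils.fixes import parse_version, sp_version  # assumed helpers

@pytest.mark.skipif(
    sp_version < parse_version("1.6.0"),
    reason='the "highs" solver of scipy.optimize.linprog requires scipy >= 1.6.0',
)
def test_sgd_pinball_vs_quantile_regressor():
    # compare the SGD fit against the QuantileRegressor benchmark shown above
    ...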

Contributor Author

Do we need test cases for other penalties?

Member

And yes, SGD already comes with L2 and L1+L2, so why not support it? It would be more code to forbid it

@lorentzenchr so the reason to not support L2 and L1+L2 in QuantileRegressor is that we cannot reformulate the problem as a linear programming problem?
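
For context on the LP formulation: with slack variables u_i, v_i >= 0 the pinball term becomes linear, and splitting the coefficients as w = w+ - w- keeps the L1 penalty linear as well, so the whole problem is a linear program:

\min_{w^{+},\, w^{-},\, b,\, u,\, v \ge 0} \quad
    \frac{1}{n} \sum_{i=1}^{n} \left( q\, u_i + (1 - q)\, v_i \right)
    + \alpha \sum_{j} \left( w_j^{+} + w_j^{-} \right)
\quad \text{subject to} \quad
    y_i - x_i^\top (w^{+} - w^{-}) - b = u_i - v_i \quad \text{for all } i.

An L2 term would make the objective quadratic, so the linprog-based solver used by QuantileRegressor no longer applies; SGD has no such restriction.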

Do we need test cases for other penalties?

Yes, that would be good. I don't know how easy it is, but it would be great to have an integration test checking that the penalty has the expected impact on coef_ and, at the same time, that we are indeed fitting the expected quantile.
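
As a rough illustration of what such an integration test could check (the loss="pinball" name and quantile parameter below are assumptions about this PR's API, used only to sketch the idea):

import numpy as np
from sklearn.linear_model import SGDRegressor

def check_penalty_and_quantile(quantile=0.8):
    rng = np.random.RandomState(42)
    X = rng.normal(size=(1000, 5))
    y = X @ np.arange(1, 6) + rng.normal(size=1000)

    # hypothetical API: loss name and quantile parameter are assumptions
    weak = SGDRegressor(
        loss="pinball", quantile=quantile, penalty="l2", alpha=1e-6, random_state=0
    ).fit(X, y)
    strong = SGDRegressor(
        loss="pinball", quantile=quantile, penalty="l2", alpha=1e2, random_state=0
    ).fit(X, y)

    # a stronger penalty should shrink the coefficients towards zero ...
    assert np.linalg.norm(strong.coef_) < np.linalg.norm(weak.coef_)

    # ... while the weakly penalized model should still hit the requested quantile
    coverage = np.mean(y <= weak.predict(X))
    assert abs(coverage - quantile) < 0.05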

Contributor Author

I'm not sure how to proceed further now; any pointers would be helpful. Thanks.

Resolved/outdated review threads on: sklearn/linear_model/tests/test_sgd.py
@glemaitre
Member

The first round of review.

@glemaitre changed the title from "Adding pinball loss to SGDRegressor" to "FEA add pinball loss to SGDRegressor" Dec 22, 2021
@lorentzenchr
Member

Setting aside the fact that I'm not an expert on SGD and don't know how well it works for non-smooth objectives, this PR would need unit tests similar to

def test_asymmetric_error(quantile):
and
def test_equivariance(quantile):
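
For reference, the equivariance property those tests encode is roughly the following (again using a hypothetical loss="pinball"/quantile API for SGDRegressor):

import numpy as np
from sklearn.linear_model import SGDRegressor

def check_equivariance(quantile=0.5, a=2.5, b=-1.0):
    rng = np.random.RandomState(0)
    X = rng.normal(size=(500, 3))
    y = X @ np.array([1.0, -2.0, 3.0]) + rng.normal(size=500)

    params = dict(loss="pinball", quantile=quantile, alpha=0.0, random_state=0)
    model = SGDRegressor(**params).fit(X, y)
    # rescaling the target as a * y + b (with a > 0) should rescale the
    # predictions in exactly the same way
    model_scaled = SGDRegressor(**params).fit(X, a * y + b)
    np.testing.assert_allclose(
        model_scaled.predict(X), a * model.predict(X) + b, rtol=1e-2
    )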

@venkyyuvy
Contributor Author

venkyyuvy commented Jan 19, 2022

These are great pointers, I will look into it. Thanks

@venkyyuvy
Contributor Author

Only the equivariance test is failing, both with and without an intercept.

Member

@jjerphan left a comment


Thank you for this contribution, @venkyyuvy.

Here are a few comments. If you can't continue this work, @ArturoAmorQ might be interested in pursuing it.


To follow up on one of the proposals made by @lorentzenchr in #20132 (comment), I am personally in favour of refactoring _sgd_fast.pyx to use the common loss private submodule and thus avoid duplication. This submodule implements not only the pinball loss but also other losses that are used in several modules.

If we choose to perform this refactoring first, then all the terminal and concrete LossFunctions that are defined in this file must be moved to this private submodule and refactored to extend the CyLossFunction extension type and the BaseLoss base class:

cdef class CyLossFunction:
    cdef double cy_loss(self, double y_true, double raw_prediction) nogil
    cdef double cy_gradient(self, double y_true, double raw_prediction) nogil
    cdef double_pair cy_grad_hess(self, double y_true, double raw_prediction) nogil

# Note: The shape of raw_prediction for multiclass classifications are
# - GradientBoostingClassifier: (n_samples, n_classes)
# - HistGradientBoostingClassifier: (n_classes, n_samples)
#
# Note: Instead of inheritance like
#
# class BaseLoss(BaseLink, CyLossFunction):
# ...
#
# # Note: Naturally, we would inherit in the following order
# # class HalfSquaredError(IdentityLink, CyHalfSquaredError, BaseLoss)
# # But because of https://github.com/cython/cython/issues/4350 we set BaseLoss as
# # the last one. This, of course, changes the MRO.
# class HalfSquaredError(IdentityLink, CyHalfSquaredError, BaseLoss):
#
# we use composition. This way we improve maintainability by avoiding the above
# mentioned Cython edge case and have easier to understand code (which method calls
# which code).
class BaseLoss:
    """Base class for a loss function of 1-dimensional targets.

    Conventions:

        - y_true.shape = sample_weight.shape = (n_samples,)
        - y_pred.shape = raw_prediction.shape = (n_samples,)
        - If is_multiclass is true (multiclass classification), then
          y_pred.shape = raw_prediction.shape = (n_samples, n_classes)
          Note that this corresponds to the return value of decision_function.

    y_true, y_pred, sample_weight and raw_prediction must either be all float64
    or all float32.
    gradient and hessian must be either both float64 or both float32.

    Note that y_pred = link.inverse(raw_prediction).

    Specific loss classes can inherit specific link classes to satisfy
    BaseLink's abstractmethods.

    Parameters
    ----------
    sample_weight : {None, ndarray}
        If sample_weight is None, the hessian might be constant.
    n_classes : {None, int}
        The number of classes for classification, else None.

    Attributes
    ----------
    closs : CyLossFunction
    link : BaseLink
    interval_y_true : Interval
        Valid interval for y_true
    interval_y_pred : Interval
        Valid Interval for y_pred
    differentiable : bool
        Indicates whether or not loss function is differentiable in
        raw_prediction everywhere.
    need_update_leaves_values : bool
        Indicates whether decision trees in gradient boosting need to update
        leaf values after having been fit to the (negative) gradients.
    approx_hessian : bool
        Indicates whether the hessian is approximated or exact. If
        approximated, it should be larger or equal to the exact one.
    constant_hessian : bool
        Indicates whether the hessian is one for this loss.
    is_multiclass : bool
        Indicates whether n_classes > 2 is allowed.
    """

What do maintainers think? Would you first perform such a refactoring?
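
To make the composition idea concrete, here is a sketch of how a terminal loss class in the common loss module wires a Cython loss into BaseLoss; class names such as CyPinballLoss and IdentityLink mirror sklearn/_loss from memory and should be treated as assumptions rather than exact code:

class PinballLoss(BaseLoss):
    """Quantile loss, aka pinball loss, for regression."""

    differentiable = False
    need_update_leaves_values = True

    def __init__(self, sample_weight=None, quantile=0.5):
        if not 0 < quantile < 1:
            raise ValueError("quantile must be in (0, 1).")
        super().__init__(
            # composition instead of inheritance: the Cython loss is held as an
            # attribute (self.closs) rather than mixed in as a base class
            closs=CyPinballLoss(quantile=float(quantile)),
            link=IdentityLink(),
        )
        self.approx_hessian = True
        self.constant_hessian = sample_weight is None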

Comment on lines +541 to +546
- |Fix| :class:`linear_model.LassoLarsIC` now correctly computes AIC
and BIC. An error is now raised when `n_features > n_samples` and
when the noise variance is not provided.
:pr:`21481` by :user:`Guillaume Lemaitre <glemaitre>` and
:user:`Andrés Babino <ababino>`.

Member

This is a spurious entry.

Suggested change
- |Fix| :class:`linear_model.LassoLarsIC` now correctly computes AIC
and BIC. An error is now raised when `n_features > n_samples` and
when the noise variance is not provided.
:pr:`21481` by :user:`Guillaume Lemaitre <glemaitre>` and
:user:`Andrés Babino <ababino>`.

Comment on lines +547 to +549
- |Feature| :class:`linear_model.SGDRegressor` would support pinball loss now.
:pr:`22043` by :user:`Venkatachalam Natchiappan <venkyyuvy>`.

Member

This needs to be moved to doc/whats_new/v1.2.rst.

Comment on lines +321 to +322
cdef double residual = fabs(y - p) - self.epsilon
return residual if residual > 0 else 0
Member

This change makes sense, but that's out of the scope of this PR. Hence this should be reverted.

Suggested change
cdef double residual = fabs(y - p) - self.epsilon
return residual if residual > 0 else 0
cdef double ret = fabs(y - p) - self.epsilon
return ret if ret > 0 else 0

Comment on lines +348 to +349
cdef double residual = fabs(y - p) - self.epsilon
return residual * residual if residual > 0 else 0
Member

Same suggestion as previously.

Suggested change
cdef double residual = fabs(y - p) - self.epsilon
return residual * residual if residual > 0 else 0
cdef double ret = fabs(y - p) - self.epsilon
return ret * ret if ret > 0 else 0

@lorentzenchr
Member

What do maintainers think? Would you first perform such a refactoring?

I would prefer such a refactoring, but I'm not 100% sure it is feasible. It would certainly slow down the delivery of this PR's feature, so I'm OK either way.

@venkyyuvy
Contributor Author

Thanks for the review comments.
I am still interested in pursuing this.

Projects
Status: In Progress

4 participants