Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

ENH support sparse data input for QuantileRegressor #21086

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Dec 15, 2021

Conversation

venkyyuvy
Copy link
Contributor

@venkyyuvy venkyyuvy commented Sep 19, 2021

Reference Issues/PRs:

partially addresses #20132.

What does this implement/fix? Explain your changes.

Edit: This enables sparse X for QuantileRegressor with the highs* solvers.

Any other comments?

@venkyyuvy
Copy link
Contributor Author

coefs for sparse and dense array inputs differ when alpha =0.

@glemaitre
Any pointers would be helpful to fix it.

@glemaitre
Copy link
Member

Thanks @venkyyuvy It seems that the CIs are still not passing. Can you check what is the reason. Do not hesitate if it seems to be a tricky corner case.

@venkyyuvy
Copy link
Contributor Author

There is change in the coef value when alpha is 0 (between sparse and dense input).
But the same works when alpha value is set to 1.

@pytest.mark.parametrize("alpha", [1, 0])
    def test_compare_sparse_with_dense_input(X_y_data, alpha):
        X, y = X_y_data
        reg_dense = QuantileRegressor(alpha=alpha).fit(X, y)
        sparse_x = sparse.csr_matrix(X)
        reg_sparse = QuantileRegressor(alpha=alpha).fit(sparse_x, y)
>       assert_array_almost_equal(reg_dense.coef_, reg_sparse.coef_)
E       AssertionError: 
E       Arrays are not almost equal to 6 decimals
E       
E       Mismatched elements: 1 / 1 (100%)
E       Max absolute difference: 0.75343428
E       Max relative difference: 0.00959428
E        x: array([79.282985])
E        y: array([78.529551])

tests/test_quantile.py:268: AssertionError
====================================== 1 failed, 1 passed, 14 warnings in 0.41s ======================================

@lorentzenchr
Copy link
Member

@venkyyuvy Does this error persist with solver="highs"?

BTW, the whatsnew entry should go into 1.1 instead of 1.0.

@venkyyuvy
Copy link
Contributor Author

sure.
To my surprise, it works fine for highs solver.

doc/whats_new/v1.0.rst Outdated Show resolved Hide resolved
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the following comments have been addressed.

sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
@venkyyuvy
Copy link
Contributor Author

venkyyuvy commented Sep 29, 2021

Tests are failing in 3 envs alone. I'm not able to reproduce that in my local.
ping @glemaitre

@glemaitre glemaitre self-assigned this Dec 2, 2021
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I merge main to see if the errors are still issued. Otherwise, I give a couple of comments.

sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/_quantile.py Show resolved Hide resolved
Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another round. I still need to look at the tests.

sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/_quantile.py Show resolved Hide resolved
sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to decide which solver to allow for sparse input X. The 4 solvers "highs-ds", "highs-ipm", "highs", "interior-point" are possible.
Edit I, however, am in favor of starting with the highs* ones only.
I think, we should for sure add all highs solvers for sparse matrices as well as "interior-point" (in which case one needs to pass the extra option "sparse": True" in linprog). This then needs a test checking for a raised ValueError in the case of sparse input and solvers that don't support sparse.

sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Next iteration. The docstring of the parameter solver needs an update about which solvers support sparse input.

sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/_quantile.py Show resolved Hide resolved
sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
@pytest.mark.parametrize(
"sparse_format", [sparse.csc_matrix, sparse.csr_matrix, sparse.coo_matrix]
)
@pytest.mark.parametrize("solver", ["highs"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@pytest.mark.parametrize("solver", ["highs"])
@pytest.mark.parametrize("solver", ["highs-ds", "highs-ipm", "highs", "interior-point"])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for interior-point solver alone, the test cases are failing. can we allow only highs* solver as of now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But then we should raise a ValueError, if interior-point is used with sparse data and the docstring has to be updated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And add it to test_compatible_solver_sparse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error persist even if I set 'sparse': True in solver options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, will do.

sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/tests/test_quantile.py Outdated Show resolved Hide resolved
Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. 2 final nitpicks.
@venkyyuvy Thanks for the endurance to finish this. Still, we need one more reviewer approval.

sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
sklearn/linear_model/_quantile.py Outdated Show resolved Hide resolved
@lorentzenchr
Copy link
Member

@glemaitre @agramfort @rth @TomDLT might be interested in giving a 2nd review approval.

@lorentzenchr
Copy link
Member

Maybe also @avidale?

Copy link
Contributor

@avidale avidale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

sklearn/linear_model/tests/test_quantile.py Show resolved Hide resolved
@lorentzenchr
Copy link
Member

Info: I intend to merge if CI is green, @ogrisel wrote LGTM, @avidale and me approved, so I think we're fine.

@venkyyuvy
Copy link
Contributor Author

Thanks @lorentzenchr @glemaitre @ogrisel @avidale

@lorentzenchr lorentzenchr changed the title Sparse data input option for Quantile Regressor ENH support sparse data input for QuantileRegressor Dec 15, 2021
@lorentzenchr lorentzenchr merged commit 9cc494d into scikit-learn:main Dec 15, 2021
@glemaitre
Copy link
Member

Thanks @venkyyuvy

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.