ENH support sparse data input for QuantileRegressor #21086

venkyyuvy · Sep 19, 2021

Reference Issues/PRs:

partially addresses #20132.

What does this implement/fix? Explain your changes.

Edit: This enables sparse X for QuantileRegressor with the highs* solvers.

Any other comments?

venkyyuvy · Sep 20, 2021

The coefficients for sparse and dense array inputs differ when alpha=0.

@glemaitre
Any pointers would be helpful to fix it.

glemaitre · Sep 20, 2021

Thanks @venkyyuvy. It seems that the CI is still not passing. Can you check the reason? Do not hesitate to ask if it turns out to be a tricky corner case.

venkyyuvy · Sep 21, 2021

The coef value differs between sparse and dense input when alpha is 0.
The same test passes when alpha is set to 1.

    @pytest.mark.parametrize("alpha", [1, 0])
    def test_compare_sparse_with_dense_input(X_y_data, alpha):
        X, y = X_y_data
        reg_dense = QuantileRegressor(alpha=alpha).fit(X, y)
        sparse_x = sparse.csr_matrix(X)
        reg_sparse = QuantileRegressor(alpha=alpha).fit(sparse_x, y)
>       assert_array_almost_equal(reg_dense.coef_, reg_sparse.coef_)
E       AssertionError: 
E       Arrays are not almost equal to 6 decimals
E       
E       Mismatched elements: 1 / 1 (100%)
E       Max absolute difference: 0.75343428
E       Max relative difference: 0.00959428
E        x: array([79.282985])
E        y: array([78.529551])

tests/test_quantile.py:268: AssertionError
====================================== 1 failed, 1 passed, 14 warnings in 0.41s ======================================

lorentzenchr · Sep 21, 2021

@venkyyuvy Does this error persist with solver="highs"?

BTW, the whatsnew entry should go into 1.1 instead of 1.0.

venkyyuvy · Sep 22, 2021

Sure.
To my surprise, it works fine with the highs solver.

doc/whats_new/v1.0.rst

ogrisel

LGTM once the following comments have been addressed.

sklearn/linear_model/tests/test_quantile.py

venkyyuvy · Sep 29, 2021

Tests are failing in only 3 environments. I'm not able to reproduce this locally.
ping @glemaitre

glemaitre

I merged main to see if the errors still occur. Otherwise, I have a couple of comments.

sklearn/linear_model/_quantile.py

lorentzenchr

Another round. I still need to look at the tests.

sklearn/linear_model/_quantile.py

lorentzenchr

We need to decide which solvers to allow for sparse input X. The 4 solvers "highs-ds", "highs-ipm", "highs" and "interior-point" are possible.
Edit ~~I, however, am in favor of starting with the highs* ones only.~~
I think we should definitely add all highs solvers for sparse matrices as well as "interior-point" (in which case one needs to pass the extra option "sparse": True in linprog). This then needs a test checking that a ValueError is raised for sparse input combined with a solver that doesn't support sparse data.
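To make the linprog point concrete, here is a hedged sketch of how a quantile regression can be posed as a linear program whose equality-constraint matrix stays sparse when X is sparse. The function name fit_quantile_lp and the variable layout are assumptions made for illustration; this is not the code in sklearn/linear_model/_quantile.py, and it minimizes the summed pinball loss plus alpha times the L1 norm of the coefficients without any sample weights.

```python
import numpy as np
from scipy import sparse
from scipy.optimize import linprog


def fit_quantile_lp(X, y, quantile=0.5, alpha=0.0):
    """Illustrative LP formulation of L1-penalized linear quantile regression."""
    n_samples, n_features = X.shape
    n_params = n_features + 1  # intercept followed by the coefficients

    # Variables: [params_pos, params_neg, residual_pos, residual_neg], all >= 0.
    penalty = np.full(n_params, float(alpha))
    penalty[0] = 0.0  # the intercept is not penalized
    c = np.concatenate([
        penalty,
        penalty,
        np.full(n_samples, quantile),
        np.full(n_samples, 1.0 - quantile),
    ])

    # Constraint: [1, X] @ (params_pos - params_neg) + res_pos - res_neg = y
    if sparse.issparse(X):
        # Build every block as sparse so the matrix is never densified.
        ones = sparse.csc_matrix(np.ones((n_samples, 1)))
        eye = sparse.eye(n_samples, format="csc")
        A_eq = sparse.hstack([ones, X, -ones, -X, eye, -eye], format="csc")
    else:
        ones = np.ones((n_samples, 1))
        eye = np.eye(n_samples)
        A_eq = np.hstack([ones, X, -ones, -X, eye, -eye])

    # linprog's default bounds are (0, None) for every variable.
    result = linprog(c, A_eq=A_eq, b_eq=y, method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    params = result.x[:n_params] - result.x[n_params:2 * n_params]
    return params[0], params[1:]


rng = np.random.default_rng(0)
X = rng.normal(size=(61, 3))
y = 2.0 + X @ np.array([1.5, -2.0, 0.5]) + 0.1 * rng.normal(size=61)

intercept_dense, coef_dense = fit_quantile_lp(X, y)
intercept_sparse, coef_sparse = fit_quantile_lp(sparse.csr_matrix(X), y)
```

With method="highs" the sparse matrix is passed straight through. For "interior-point" the extra "sparse" option mentioned above would go into linprog's options dict, which this sketch does not exercise.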

sklearn/linear_model/_quantile.py

sklearn/linear_model/tests/test_quantile.py

lorentzenchr

Next iteration. The docstring of the parameter solver needs an update about which solvers support sparse input.

sklearn/linear_model/_quantile.py

lorentzenchr · Dec 14, 2021

sklearn/linear_model/tests/test_quantile.py

+@pytest.mark.parametrize(
+    "sparse_format", [sparse.csc_matrix, sparse.csr_matrix, sparse.coo_matrix]
+)
+@pytest.mark.parametrize("solver", ["highs"])


Suggested change

-@pytest.mark.parametrize("solver", ["highs"])
+@pytest.mark.parametrize("solver", ["highs-ds", "highs-ipm", "highs", "interior-point"])

For the interior-point solver alone, the test cases are failing. Can we allow only the highs* solvers for now?

Yes. But then we should raise a ValueError if interior-point is used with sparse data, and the docstring has to be updated.

And add it to test_compatible_solver_sparse.

The error persists even if I set 'sparse': True in the solver options.

sure, will do.
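The ValueError requested in this thread could look roughly like the following. It is a sketch under stated assumptions: the helper name check_solver_for_input, the constant SPARSE_CAPABLE_SOLVERS, and the error message are hypothetical and do not reproduce the merged implementation. Only the rule itself, that sparse X is allowed with the highs* solvers and rejected otherwise, comes from the discussion.

```python
import numpy as np
from scipy import sparse

# Assumption: only the HiGHS-based solvers accept sparse X, as agreed in this thread.
SPARSE_CAPABLE_SOLVERS = ("highs-ds", "highs-ipm", "highs")


def check_solver_for_input(X, solver):
    """Raise ValueError if `solver` cannot handle a sparse `X` (hypothetical helper)."""
    if sparse.issparse(X) and solver not in SPARSE_CAPABLE_SOLVERS:
        raise ValueError(
            f"Solver {solver!r} does not support sparse X. "
            f"Use one of {list(SPARSE_CAPABLE_SOLVERS)}."
        )


X_dense = np.eye(3)
X_sparse = sparse.csr_matrix(X_dense)

check_solver_for_input(X_dense, "interior-point")  # dense input: no error
check_solver_for_input(X_sparse, "highs")          # sparse input with highs: no error

try:
    check_solver_for_input(X_sparse, "interior-point")
except ValueError as exc:
    error_message = str(exc)
else:
    error_message = None
```

A matching test would assert that the error is raised for each solver outside the supported set when X is sparse.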

sklearn/linear_model/tests/test_quantile.py

lorentzenchr

LGTM. 2 final nitpicks.
@venkyyuvy Thanks for the endurance to finish this. Still, we need one more reviewer approval.

sklearn/linear_model/_quantile.py

lorentzenchr · Dec 15, 2021

@glemaitre @agramfort @rth @TomDLT might be interested in giving a 2nd review approval.

lorentzenchr · Dec 15, 2021

Maybe also @avidale?

avidale

LGTM!

sklearn/linear_model/tests/test_quantile.py

lorentzenchr · Dec 15, 2021

Info: I intend to merge if CI is green. @ogrisel wrote LGTM, and @avidale and I approved, so I think we're fine.

venkyyuvy · Dec 15, 2021

Thanks @lorentzenchr @glemaitre @ogrisel @avidale

glemaitre · Dec 15, 2021

Thanks @venkyyuvy

sparse support

8b038db

github-actions bot added the module:linear_model label Sep 19, 2021

venkyyuvy added 2 commits September 20, 2021 09:22

compare sparse and dense input models

6fab90c

change log

39f31d3

remove print

27c8623

venkyyuvy added 2 commits September 22, 2021 08:57

changing solver

77e7793

updating changelog

cdc1ce1

handling coo format indexing

b328296

ogrisel reviewed Sep 23, 2021

doc/whats_new/v1.0.rst (outdated)

ogrisel reviewed Sep 23, 2021

venkyyuvy added 2 commits September 26, 2021 16:24

fixing change log

34511aa

merge with main

a359572

github-actions bot added the cython label Sep 26, 2021

venkyyuvy added 6 commits September 26, 2021 16:44

Merge remote-tracking branch 'upstream/main' into sparse_qr

2a25ffb

more parametrize

4a52a1b

more parametrization

4d4e026

adjusting penalty

cd8c53a

skipping test for low sp_version

3bed0bc

correcting version check

e5a3d37

glemaitre self-assigned this Dec 2, 2021

glemaitre added 2 commits December 2, 2021 12:06

Merge remote-tracking branch 'origin/main' into pr/venkyyuvy/21086

f04e985

DOC update whats new

a5aa914

glemaitre reviewed Dec 2, 2021

sklearn/linear_model/_quantile.py (outdated)

sklearn/linear_model/_quantile.py (outdated)

sklearn/linear_model/_quantile.py (outdated)

sklearn/linear_model/_quantile.py

lorentzenchr reviewed Dec 10, 2021

venkyyuvy added 2 commits December 12, 2021 20:45

fixing nonzero sample weight indexing

33bf1e2

test cases for csc alone

updating the validate_data

c07a77b

venkyyuvy requested review from glemaitre and lorentzenchr December 13, 2021 12:01

lorentzenchr reviewed Dec 13, 2021

venkyyuvy added 2 commits December 14, 2021 12:53

allow all sparse formats

e871bf7

checking compatible solvers

5ce57e4

lorentzenchr reviewed Dec 14, 2021

venkyyuvy added 3 commits December 15, 2021 13:59

cleaning test cases

fad71b2

solver docstring update

db98459

doc update

aff659d

lorentzenchr reviewed Dec 15, 2021

sklearn/linear_model/tests/test_quantile.py (outdated)

venkyyuvy added 3 commits December 15, 2021 14:52

test_cleaning

49e663e

renaming test case

edb9747

removing interior-pt solver for sparse data

4b24346

lorentzenchr approved these changes Dec 15, 2021

sklearn/linear_model/_quantile.py (outdated)

sklearn/linear_model/_quantile.py (outdated)

avidale approved these changes Dec 15, 2021

doc fixs

ecb3e4e

lorentzenchr reviewed Dec 15, 2021

sklearn/linear_model/tests/test_quantile.py

scipy V check for tests

39a97d5

lorentzenchr changed the title ~~Sparse data input option for Quantile Regressor~~ ENH support sparse data input for QuantileRegressor Dec 15, 2021

lorentzenchr merged commit 9cc494d into scikit-learn:main Dec 15, 2021

lorentzenchr mentioned this pull request Dec 15, 2021

Alternative solvers for QuantileRegressor #20132

Open

3 tasks

venkyyuvy deleted the sparse_qr branch December 15, 2021 14:59

jeremiedbb mentioned this pull request Mar 13, 2023

TST sklearn/linear_model/tests/test_quantile.py::test_sparse_input fails on local windows #25838

Closed

