Add support for array API to RidgeCV #27961
Conversation
I think the test failures for Ridge and RidgeCV arise from r2_score and will be handled in #27904.
While I am thinking about it, please don't forget to update:
(branch force-pushed from 6be83f7 to 54dbac2)
sklearn/linear_model/_ridge.py (outdated):

```python
if sparse.issparse(X):
    dtype = np.float64
else:
    dtype = [xp.float64, xp.float32]
```
Contrary to what I said in this morning's meeting, I think we might want to implement the following logic:
- if the input namespace/device supports xp.float64, do the upcast (as we currently do with NumPy);
- if not (e.g. the PyTorch + MPS device combination), accept the degraded numerical performance, adjust the tolerance in the tests accordingly, and document this limited numerical precision guarantee in our Array API doc.
I think this is the strategy we are leaning towards in the review of #27113. During the review of the r2_score PR, I believe that @adrinjalali preferred that approach.
In a future PR, we might decide to drop the float32 -> float64 upcast in general for this estimator (as it silently triggers a potentially very large and unexpected memory allocation which is a usability problem in itself, even with NumPy) but I would rather make this decision independently of Array API support.
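The branching described above could be sketched like this. The helper name (borrowed from the #27113 discussion) and the MPS check are illustrative assumptions, not scikit-learn API:

```python
import numpy as np

def max_precision_float_dtype(xp, device=None):
    # Hypothetical helper: return float64 when the namespace/device pair
    # supports it, else fall back to float32 (e.g. PyTorch on an MPS
    # device has no float64 support).
    if getattr(device, "type", None) == "mps":
        return xp.float32
    return xp.float64

# With plain NumPy on CPU, upcasting keeps the current float64 behavior.
X = np.asarray([[1.0, 2.0]], dtype=np.float32)
X_up = X.astype(max_precision_float_dtype(np))
```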
How would you recommend I check whether the upcast is possible? Should I temporarily copy the max_precision_float_dtype and supported_float_dtypes changes from #27113 until it is merged? Or is there already a utility in scikit-learn for checking that which I missed?
Feel free to copy, with a TODO comment to remove the redundant code once #27113 is merged, so we can decouple the two reviews.
When we do the upcast, with what precision should we store the coefficients and intercept? I guess for prediction we do not need the extra precision, so we should use X's original dtype?
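A minimal sketch of the idea in the question above, assuming a closed-form ridge solve (this is not the scikit-learn implementation, and the function name is made up): compute in float64 for stability, then store the coefficients back in X's original dtype.

```python
import numpy as np

def fit_ridge_upcast(X, y, alpha=1.0):
    input_dtype = X.dtype
    # Upcast to float64 for the numerically sensitive part of the fit.
    X64 = X.astype(np.float64)
    y64 = y.astype(np.float64)
    n_features = X64.shape[1]
    # Closed-form ridge solution: (X^T X + alpha I) w = X^T y.
    coef = np.linalg.solve(
        X64.T @ X64 + alpha * np.eye(n_features), X64.T @ y64
    )
    # Prediction does not need the extra precision, so downcast the
    # stored coefficients to the input dtype.
    return coef.astype(input_dtype)

X = np.asarray([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], dtype=np.float32)
y = np.asarray([1.0, 2.0, 3.0], dtype=np.float32)
coef = fit_ridge_upcast(X, y)
```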
This is neat! From my point of view LGTM. But I haven't checked the tests or mathematical correctness.
```python
w = 1.0 / (eigvals + alpha)
if self.fit_intercept:
    # the vector containing the square roots of the sample weights (1
    # when no sample weights) is the eigenvector of XX^T which
    # corresponds to the intercept; we cancel the regularization on
    # this dimension. the corresponding eigenvalue is
    # sum(sample_weight).
    normalized_sw = sqrt_sw / np.linalg.norm(sqrt_sw)
    norm = xp.linalg.vector_norm if is_array_api else np.linalg.norm
```
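For context on the dispatch in the diff above: the array API spec names the function linalg.vector_norm, while NumPy's historical name is linalg.norm, so a small indirection is needed. A self-contained sketch (the wrapper function is illustrative, not scikit-learn code):

```python
import numpy as np

def vector_norm(x, xp=np, is_array_api=False):
    # Pick the spec name for array API namespaces, NumPy's legacy name
    # otherwise.
    norm = xp.linalg.vector_norm if is_array_api else np.linalg.norm
    return norm(x)

v = np.asarray([3.0, 4.0])
```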
I think one could write a lightweight narwhals version for array API
We already have array_api_compat and our own NumPy wrapper that we could leverage to help us do that.
There is a new array-api-extra project that has started to implement missing utilities on top of what is standardized in the spec and therefore implemented in the array-api-compat project:
I pushed 84535ef to actually make the tests pass with PyTorch and the MPS device.
I also pushed b64baaa to revert most of the extra complexity in decision_function that did not seem justified when running the tests locally (including with PyTorch and MPS).
Apart from this, here is a new round of feedback.
LGTM once coverage is improved a bit (for the easy-to-cover lines).
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
```python
# The RidgeGCV is not very numerically stable in float32. It casts the
# input to float64 unless the device and array api combination makes it
# impossible.
tols = {"rtol": 1e-3, "atol": 1e-3}
```
This can only be covered on devices where the maximum float precision is float32.
```python
        new_arrays.append(xp.asarray(array, device=device_))
        continue
    except Exception:
        # direct conversion to a different library may fail in which
```
There is a test here to cover those lines (I checked it does locally), but only when both torch and array-api-strict are installed. Is there such a configuration in one of the CI jobs?
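The fallback pattern in the diff above can be sketched as follows. The helper name is illustrative; the point is that converting an array from one library to another directly may fail, so the code falls back to round-tripping through NumPy on CPU:

```python
import numpy as np

def asarray_with_fallback(xp, array, device=None):
    # Only pass device when given, so this also works with namespaces
    # whose asarray does not accept a device keyword.
    kwargs = {} if device is None else {"device": device}
    try:
        # Direct conversion to the target namespace/device.
        return xp.asarray(array, **kwargs)
    except Exception:
        # Direct conversion to a different library may fail; go through
        # a NumPy array on CPU first, then move to the target namespace.
        return xp.asarray(np.asarray(array), **kwargs)

a = asarray_with_fallback(np, [1.0, 2.0, 3.0])
```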
Reference Issues/PRs
Towards #26024.
This PR extends the one for Ridge (still WIP, #27800) to use the array API in RidgeCV and RidgeClassifierCV (when cv="gcv").

What does this implement/fix? Explain your changes.
This could make those estimators faster, as an important part of their computational cost comes from computing either an eigendecomposition of XX^T or an SVD of X.
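The two decompositions in question can be illustrated with NumPy (shapes chosen arbitrarily). The nonzero eigenvalues of X X^T are the squared singular values of X, so both routes expose the same spectrum, and both are the kind of dense linear algebra that accelerator backends speed up:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))

# Gram path: eigendecomposition of X @ X.T (eigenvalues in ascending order).
eigvals = np.linalg.eigvalsh(X @ X.T)

# SVD path: singular values of X (returned in descending order).
s = np.linalg.svd(X, compute_uv=False)
```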
Any other comments?
The _RidgeGCV has numerical precision issues when computations are done in float32, which is why, at the moment, the main branch always uses float64. I'm not sure what should be done for array API inputs on devices that do not have float64.
Not handled yet: