ENH Allows disabling refitting of CV estimators #30463

AhmedThahir · Dec 11, 2024

Reference Issues/PRs

Addresses ENH Allow disabling refitting of cross-validation estimators #30396
Created PR as per @jeremiedbb's advice.

What does this implement/fix? Explain your changes.

What

Allows disabling the refit of cross-validation estimators (such as LassoCV and RidgeCV) on the full training set after the best hyperparameters are found.
Users can toggle this behavior with the keyword argument refit.

Why
Users should not have to spend resources on refitting when they only want one or more of the following, none of which requires refitting:

optimal hyperparameter
cv_results_ for the different hyperparameters
best_score_ of all the hyperparameters

This is especially impactful for large datasets.

How

Added a refit argument; refitting is skipped when refit=False.
The default is refit=True, so existing usage is unaffected.
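
To make the intended semantics concrete, here is a minimal pure-Python sketch. It is not the scikit-learn implementation: `ToyRidgeCV` and `ridge_1d` are hypothetical names, the model is a one-feature ridge without intercept, and the attribute names only mirror the ones mentioned above. The point is that the cross-validation results are identical either way, and only the final fit on the full data is skipped.

```python
from statistics import mean


def ridge_1d(xs, ys, alpha):
    """Closed-form ridge slope for one feature, no intercept (toy model)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


class ToyRidgeCV:
    """Hypothetical stand-in for a CV estimator with a `refit` flag."""

    def __init__(self, alphas, n_folds=3, refit=True):
        self.alphas = alphas
        self.n_folds = n_folds
        self.refit = refit

    def fit(self, xs, ys):
        n = len(xs)
        scores = {}
        for alpha in self.alphas:
            fold_scores = []
            for k in range(self.n_folds):
                # Every n_folds-th sample, starting at k, is held out.
                held_out = set(range(k, n, self.n_folds))
                train = [(xs[i], ys[i]) for i in range(n) if i not in held_out]
                test = [(xs[i], ys[i]) for i in range(n) if i in held_out]
                w = ridge_1d([x for x, _ in train], [y for _, y in train], alpha)
                # Negative mean squared error, so that higher is better.
                fold_scores.append(-mean((y - w * x) ** 2 for x, y in test))
            scores[alpha] = mean(fold_scores)
        self.cv_results_ = scores
        self.alpha_ = max(scores, key=scores.get)
        self.best_score_ = scores[self.alpha_]
        if self.refit:
            # Only this step fits on the full training set again.
            self.coef_ = ridge_1d(xs, ys, self.alpha_)
        return self


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
with_refit = ToyRidgeCV(alphas=(0.1, 1.0, 10.0), refit=True).fit(xs, ys)
without_refit = ToyRidgeCV(alphas=(0.1, 1.0, 10.0), refit=False).fit(xs, ys)
```

In this sketch `without_refit` still exposes `alpha_`, `best_score_` and `cv_results_`, but has no `coef_`, so it cannot be used for prediction.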

Any other comments?

This is my first (hopefully first of many) PR for scikit-learn. If you have any feedback on my implementation/PR documentation/etc, feel free to share - I'd really appreciate it.

Thanks to @paulAdrienMarie for all the support!

Addresses scikit-learn#30396

github-actions · Dec 11, 2024

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here

`black`

black detected issues. Please run black . locally and push the changes. Here you can see the detected issues. Note that running black might also fix some of the issues which might be detected by ruff. Note that the installed black version is black=24.3.0.


--- /home/runner/work/scikit-learn/scikit-learn/sklearn/linear_model/tests/test_ridge.py	2025-02-11 18:37:51.534575+00:00
+++ /home/runner/work/scikit-learn/scikit-learn/sklearn/linear_model/tests/test_ridge.py	2025-02-11 18:38:09.269688+00:00
@@ -629,25 +629,27 @@
     ridge = Ridge(alpha=penalties[:-1])
     err_msg = "Number of targets and number of penalties do not correspond: 4 != 5"
     with pytest.raises(ValueError, match=err_msg):
         ridge.fit(X, y)
 
+
 def test_ridgecv_regression_refit():
     rng = np.random.RandomState(0)
     alphas = (0.1, 1.0, 10.0)
 
     for n_samples, n_features in ((6, 5), (5, 10)):
         y = rng.randn(n_samples)
         X = rng.randn(n_samples, n_features)
-        
+
         refit_true = RidgeCV(alphas=alphas, refit=True)
         refit_true.fit(X, y)
 
         refit_false = RidgeCV(alphas=alphas, refit=False)
         refit_false.fit(X, y)
-        
+
         assert_array_almost_equal(refit_true.best_score_, refit_false.best_score_)
+
 
 def test_ridgecv_regression_refit():
     rng = np.random.RandomState(0)
     alphas = (0.1, 1.0, 10.0)
 
@@ -657,12 +659,13 @@
         refit_true = RidgeClassifierCV(alphas=alphas, refit=True)
         refit_true.fit(X, y)
 
         refit_false = RidgeClassifierCV(alphas=alphas, refit=False)
         refit_false.fit(X, y)
-        
+
         assert_array_almost_equal(refit_true.best_score_, refit_false.best_score_)
+
 
 @pytest.mark.parametrize("n_col", [(), (1,), (3,)])
 @pytest.mark.parametrize("csr_container", CSR_CONTAINERS)
 def test_X_CenterStackOp(n_col, csr_container):
     rng = np.random.RandomState(0)
would reformat /home/runner/work/scikit-learn/scikit-learn/sklearn/linear_model/tests/test_ridge.py

Oh no! 💥 💔 💥
1 file would be reformatted, 924 files would be left unchanged.

`ruff`

ruff detected issues. Please run ruff check --fix --output-format=full . locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.5.1.


sklearn/linear_model/tests/test_ridge.py:641:1: W293 [*] Blank line contains whitespace
    |
639 |         y = rng.randn(n_samples)
640 |         X = rng.randn(n_samples, n_features)
641 |         
    | ^^^^^^^^ W293
642 |         refit_true = RidgeCV(alphas=alphas, refit=True)
643 |         refit_true.fit(X, y)
    |
    = help: Remove whitespace from blank line

sklearn/linear_model/tests/test_ridge.py:647:1: W293 [*] Blank line contains whitespace
    |
645 |         refit_false = RidgeCV(alphas=alphas, refit=False)
646 |         refit_false.fit(X, y)
647 |         
    | ^^^^^^^^ W293
648 |         assert_array_almost_equal(refit_true.best_score_, refit_false.best_score_)
    |
    = help: Remove whitespace from blank line

sklearn/linear_model/tests/test_ridge.py:650:5: F811 Redefinition of unused `test_ridgecv_regression_refit` from line 634
    |
648 |         assert_array_almost_equal(refit_true.best_score_, refit_false.best_score_)
649 | 
650 | def test_ridgecv_regression_refit():
    |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ F811
651 |     rng = np.random.RandomState(0)
652 |     alphas = (0.1, 1.0, 10.0)
    |
    = help: Remove definition: `test_ridgecv_regression_refit`

sklearn/linear_model/tests/test_ridge.py:662:1: W293 [*] Blank line contains whitespace
    |
660 |         refit_false = RidgeClassifierCV(alphas=alphas, refit=False)
661 |         refit_false.fit(X, y)
662 |         
    | ^^^^^^^^ W293
663 |         assert_array_almost_equal(refit_true.best_score_, refit_false.best_score_)
    |
    = help: Remove whitespace from blank line

Found 4 errors.
[*] 3 fixable with the `--fix` option.

`mypy`

mypy detected issues. Please fix them locally and push the changes. Here you can see the detected issues. Note that the installed mypy version is mypy=1.9.0.


sklearn/externals/_arff.py:782: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
sklearn/utils/_testing.py:1317: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
sklearn/utils/_testing.py:1318: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
sklearn/utils/_testing.py:1319: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
sklearn/tree/_classes.py:1991: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
sklearn/utils/tests/test_tags.py:76: note: By default the bodies of untyped functions are not checked, consider using --check-untyped-defs  [annotation-unchecked]
sklearn/linear_model/tests/test_ridge.py:650: error: Name "test_ridgecv_regression_refit" already defined on line 634  [no-redef]
Found 1 error in 1 file (checked 559 source files)

Generated for commit: aa386ef. Link to the linter CI: here
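
For context on the F811 / no-redef reports above: when two functions in one module share a name, the second definition rebinds the name, so only one of the two tests remains reachable. The snippet below is a small illustration with made-up function bodies, not the code from test_ridge.py; it executes a module-like source string into a fresh namespace and inspects what survives.

```python
# Two definitions with the same name, as in the flagged test file.
source = '''
def test_refit():
    return "RidgeCV check"

def test_refit():
    return "RidgeClassifierCV check"
'''

namespace = {}
exec(source, namespace)

# Only one binding named test_refit exists after the module body has run.
surviving_tests = [name for name in namespace if name.startswith("test_")]
result = namespace["test_refit"]()
```

Giving the second test a distinct name (for example one that mentions the classifier) keeps both checks in the module.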

AhmedThahir · Dec 11, 2024

I think I'm done from my side with the code changes and documentation.

Code changes for ElasticNetCV and MultiTaskElasticNetCV will be done by @paulAdrienMarie

AhmedThahir · Dec 11, 2024

@kayo09, please do not make such reviews - it confuses us.

Only @paulAdrienMarie and I are assigned to work on this PR.

Edit: @kayo09, could you try to remove the review?

Update _coordinate_descent.py

AhmedThahir

@paulAdrienMarie The refit parameter was supposed to appear after the cv parameter. This is not a programming issue but a conflict with the sklearn docs automation: I followed the scheme used by GridSearchCV and wrote the documentation in the same order, i.e. refit after cv.

Just sharing this as feedback so that it may be useful for you in the future - not as criticism.

This was the error message from which I understood:

AssertionError: Docstring Error:
In function: sklearn.linear_model._coordinate_descent.ElasticNetCV.__init__
There's a parameter name mismatch in function docstring w.r.t. function signature, at index 8 diff: 'cv' != 'refit'
Full diff:
['l1_ratio',
'eps',
'n_alphas',
'alphas',
'fit_intercept',
'precompute',
'max_iter',
'tol',
+  'refit',
'cv',
'copy_X',
-  'refit',
'verbose',
'n_jobs',
'positive',
'random_state',
'selection']
In function: sklearn.linear_model._coordinate_descent.MultiTaskElasticNetCV.__init__
There's a parameter name mismatch in function docstring w.r.t. function signature, at index 7 diff: 'cv' != 'refit'
Full diff:
['l1_ratio',
'eps',
'n_alphas',
'alphas',
'fit_intercept',
'max_iter',
'tol',
+  'refit',
'cv',
'copy_X',
-  'refit',
'verbose',
'n_jobs',
'random_state',
'selection']
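
The kind of check behind this error can be approximated with the standard library. The sketch below is not scikit-learn's actual docstring test: `documented_params`, `first_order_mismatch` and `DemoCV` are hypothetical names, and the parser only handles a simple numpydoc-style Parameters section. It compares the order of parameters in `__init__` with the order in the class docstring and reports the first index where they differ.

```python
import inspect
import re


def documented_params(docstring):
    """Return parameter names listed in a numpydoc-style Parameters section."""
    lines = inspect.cleandoc(docstring).splitlines()
    names, in_params = [], False
    for line, nxt in zip(lines, lines[1:] + [""]):
        # A section header is a line followed by a row of dashes.
        is_header = bool(nxt.strip()) and set(nxt.strip()) == {"-"}
        if is_header:
            in_params = line.strip() == "Parameters"
            continue
        match = re.match(r"^(\w+) : ", line)
        if in_params and match:
            names.append(match.group(1))
    return names


def first_order_mismatch(cls):
    """Return (index, signature_name, docstring_name) or None if they agree."""
    sig = [p for p in inspect.signature(cls.__init__).parameters if p != "self"]
    doc = documented_params(cls.__doc__)
    for index, (s, d) in enumerate(zip(sig, doc)):
        if s != d:
            return index, s, d
    return None


class DemoCV:
    """Hypothetical estimator used only for this illustration.

    Parameters
    ----------
    tol : float, default=1e-4
        Tolerance.
    cv : int, default=5
        Number of folds.
    refit : bool, default=True
        Whether to refit on the full data.
    copy_X : bool, default=True
        Whether to copy X.
    """

    def __init__(self, tol=1e-4, cv=5, copy_X=True, refit=True):
        self.tol = tol
        self.cv = cv
        self.copy_X = copy_X
        self.refit = refit


mismatch = first_order_mismatch(DemoCV)
```

Here the docstring lists refit before copy_X while the signature has them the other way round, so a mismatch is reported at the first differing position. Keeping both in the same order makes the check pass.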

AhmedThahir · Dec 19, 2024

Hi @paulAdrienMarie

Could you help with fixing the code coverage issue?

What would be the best way to go about it?

paulAdrienMarie · Dec 19, 2024

Hi @AhmedThahir

I think we need to add unit tests for the feature we added to the models that use cross-validation. I can take a look at it!

AhmedThahir · Jan 19, 2025

Hi @paulAdrienMarie, were you able to make any progress?

@jeremiedbb and @alifa98, could you please support us? I have not been able to work out how to write appropriate tests.

OmarManzoor

Thanks for the PR @AhmedThahir

For the tests maybe you could try fitting once with refit=True and once with refit=False and check that the attributes of interest like the alphas are similar. You could do this for all the CV estimators that you modified to include this new behavior.

Once that is taken care of we might also need a ChangeLog entry.

sklearn/linear_model/_omp.py

Co-authored-by: Omar Salman <omar.salman2007@gmail.com>

AhmedThahir · Feb 11, 2025

@OmarManzoor, thanks for the response :)

> Thanks for the PR @AhmedThahir

:)

> For the tests maybe you could try fitting once with refit=True and once with refit=False and check that the attributes of interest like the alphas are similar. You could do this for all the CV estimators that you modified to include this new behavior.

Noted. Can you check whether comparing best_score_ is a good idea: aa386ef. If yes, I'll replicate the same for the other estimators as well.

> Once that is taken care of we might also need a ChangeLog entry.

Okay, will contact you once that's done

AhmedThahir · Mar 7, 2025

Hi there, can anyone support with this?

AhmedThahir · Jul 2, 2025

Hi @jeremiedbb, hope you are doing well.

If it is not a good idea to implement this in scikit-learn, we can close this issue to keep the issues list clean.

Those interested can create a custom estimator on their own using my proposed logic.

Allow disabling refitting of CV estimators

e3b6399

Addresses scikit-learn#30396

github-actions bot added the module:linear_model label Dec 11, 2024

AhmedThahir mentioned this pull request Dec 11, 2024

ENH Allow disabling refitting of cross-validation estimators #30396

Open

AhmedThahir added 4 commits December 11, 2024 18:15

Adding for Ridge, Lars, OrthogonalMatchingPursuitCV

777bb55

Documenting parameter

0650ec1

Update _least_angle.py

5204077

Formatting

3a0b781

paulAdrienMarie added a commit to paulAdrienMarie/scikit-learn that referenced this pull request Dec 11, 2024

scikit-learn#30463 Allowing disabling refitting of CV estimators

28bbd61

Setting parameter constraints

2b6557b

paulAdrienMarie added a commit to paulAdrienMarie/scikit-learn that referenced this pull request Dec 11, 2024

scikit-learn#30463 removing modifications on the test file

c0dcb89

Pop refit from path_params

8c8dacc

kayo09 approved these changes Dec 11, 2024

paulAdrienMarie and others added 4 commits December 11, 2024 19:42

Update _coordinate_descent.py

4759bb8

Merge pull request #1 from paulAdrienMarie/patch-1

ad719e8

Update _coordinate_descent.py

Merge

6177f9f

Reorder refit parameter

59712ad

AhmedThahir commented Dec 11, 2024

AhmedThahir changed the title ~~Allows disabling refitting of CV estimators~~ ENH Allows disabling refitting of CV estimators Dec 12, 2024

OmarManzoor reviewed Feb 10, 2025

sklearn/linear_model/_omp.py Outdated Show resolved Hide resolved

Fix bug: self.fit to self.refit

07d2691

Co-authored-by: Omar Salman <omar.salman2007@gmail.com>

add test for RidgeCV & RidgeClassifierCV

aa386ef
