FIX Draw indices using sample_weight in Bagging #31414

antoinebaker · May 22, 2025

Part of #16298 and alternative to #31165.

What does this implement/fix? Explain your changes.

In Bagging estimators, sample_weight is now used to draw the samples and is no longer forwarded to the underlying estimators. Bagging estimators now pass the statistical repeated/weighted equivalence test when bootstrap=True (the default, i.e. drawing with replacement).

Compared to #31165, it better decouples two different usages of sample_weight:

sample_weight in bagging_estimator.fit is used as probabilities to draw the indices/rows.
sample_weight in base_estimator.fit is used to represent the drawn indices (more memory efficient than indexing). This is possible only if base_estimator.fit supports sample_weight (through metadata routing or natively).
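The two usages can be illustrated with a minimal NumPy sketch. It is not the code from this PR: the helper names `draw_indices` and `indices_to_weights` are hypothetical, and the sketch only assumes that rows are drawn with replacement with probability proportional to `sample_weight`, then encoded as per-row counts.

```python
import numpy as np


def draw_indices(sample_weight, max_samples, rng):
    """Draw row indices with replacement, proportionally to sample_weight."""
    p = sample_weight / sample_weight.sum()
    return rng.choice(len(sample_weight), size=max_samples, replace=True, p=p)


def indices_to_weights(indices, n_samples):
    """Encode the drawn indices as per-row counts instead of indexing X."""
    return np.bincount(indices, minlength=n_samples).astype(float)


rng = np.random.default_rng(0)
X = np.arange(6, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0
sample_weight = np.array([0.0, 1.0, 2.0, 0.0, 3.0, 1.0])

# Usage 1: the weights passed to the bagging estimator act as draw probabilities.
indices = draw_indices(sample_weight, max_samples=50, rng=rng)

# Usage 2: the draw is handed to the base estimator as counts (weights).
counts = indices_to_weights(indices, n_samples=len(X))

# Both representations describe the same resampled dataset.
mean_by_indexing = y[indices].mean()
mean_by_weighting = np.average(y, weights=counts)
```

Rows with zero weight are never drawn, and the indexed and count-weighted views give the same statistics, which is why a base estimator that accepts sample_weight can skip materializing `X[indices]`.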

#31165 introduced a new sampling_strategy argument to choose between indexing and weighting for row sampling, but it would be better to do this in a dedicated follow-up PR.

cc @ogrisel @GaetandeCast

github-actions · May 22, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: bc4e499. Link to the linter CI: here

antoinebaker · May 22, 2025

BaggingRegressor(estimator=Ridge(), max_samples=100) now passes the statistical repeated/weighted equivalence test.

The same holds for BaggingClassifier(estimator=LogisticRegression(), max_samples=100) and for varying values of max_samples.

antoinebaker · May 22, 2025

However, it fails (as expected) for bootstrap=False (draw without replacement), for example with BaggingRegressor(estimator=Ridge(), bootstrap=False, max_samples=10).
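A small exact calculation suggests why the equivalence cannot hold without replacement. The sketch below is illustrative only and assumes that weighted sampling without replacement proceeds by sequential draws with renormalized weights; the two-row dataset and the helper names `dist_repeated` and `dist_weighted` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations, product

# Two rows with integer weights: "a" counts once, "b" counts twice.
weights = {"a": 1, "b": 2}
# The equivalent repeated dataset: ["a", "b", "b"].
repeated = [row for row, w in weights.items() for _ in range(w)]


def dist_repeated(n_draws, replace):
    """Exact distribution of the drawn multiset, uniform draws on repeated rows."""
    positions = range(len(repeated))
    if replace:
        draws = product(positions, repeat=n_draws)
    else:
        draws = permutations(positions, n_draws)
    counts = Counter(tuple(sorted(repeated[i] for i in d)) for d in draws)
    n_outcomes = sum(counts.values())
    return {k: Fraction(v, n_outcomes) for k, v in counts.items()}


def dist_weighted(n_draws, replace):
    """Exact distribution of the drawn multiset, weighted draws on original rows."""
    out = Counter()

    def recurse(drawn, prob, remaining):
        if len(drawn) == n_draws:
            out[tuple(sorted(drawn))] += prob
            return
        mass = sum(remaining.values())
        for row, w in remaining.items():
            if replace:
                nxt = remaining
            else:
                # Remove the drawn row and renormalize over what is left.
                nxt = {r: v for r, v in remaining.items() if r != row}
            recurse(drawn + [row], prob * Fraction(w, mass), nxt)

    recurse([], Fraction(1), dict(weights))
    return dict(out)


same_with_replacement = dist_repeated(2, True) == dist_weighted(2, True)
same_without_replacement = dist_repeated(2, False) == dist_weighted(2, False)
```

With replacement the two distributions coincide. Without replacement, the repeated dataset can yield the pair (b, b) with probability 1/3, whereas the weighted draw on the original rows can only ever return (a, b).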

ogrisel · May 23, 2025

> However it fails (as expected) for bootstrap=False (draw without replacement).

Could you please document this known limitation, both in the docstring of the __init__ method for the bootstrap parameter and in the docstring of the fit method for the sample_weight parameter?

Something like: "Note that the expected frequency semantics for the sample_weight parameter are only fulfilled when sampling with replacement (bootstrap=True)."

~~Maybe we should raise a warning when calling BaggingClassifier(bootstrap=False, max_samples=0.5).fit(X, y, sample_weight=sample_weight) when sample_weight is not None.~~ The warning is already implemented and tested: https://github.com/scikit-learn/scikit-learn/pull/31414/files#diff-b7c01e77fe68ded1e41868f4a7e142190f935261624d4abdb299913ef944cbbbR676-R682.

ogrisel

Here is a pass of review. Could you please add a non-regression test using a small dataset with specifically engineered weights? For instance, you could have a dataset with 100 data points: 98 with a null weight, 1 with a weight of 1, and 1 with a weight of 2:

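import numpy as np

X = np.arange(100).reshape(-1, 1)
# Flatten to 1D, the shape scikit-learn estimators expect for targets.
y = (X.ravel() < 99).astype(np.int32)
# All weights are zero except the first row (weight 1) and the last row (weight 2).
sample_weight = np.zeros(shape=X.shape[0])
sample_weight[0] = 1
sample_weight[-1] = 2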

Then you could fit a BaggingRegressor and a BaggingClassifier with a fake test estimator that just records the values passed as X, y and sample_weight as fitted attributes, so that the test can make assertions on them.

Ideally this test should pass both with metadata routing enabled and disabled.
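The kind of assertions such a test could make can be sketched without scikit-learn. In the sketch below, `RecordingEstimator` and `fit_on_weighted_draw` are hypothetical stand-ins: the helper mimics a bagging estimator that draws rows with replacement proportionally to sample_weight and fits on the indexed rows. The actual test would fit BaggingRegressor and BaggingClassifier instead of calling the helper.

```python
import numpy as np


class RecordingEstimator:
    """Hypothetical fake estimator: stores whatever fit receives."""

    def fit(self, X, y, sample_weight=None):
        self.X_ = X
        self.y_ = y
        self.sample_weight_ = sample_weight
        return self


def fit_on_weighted_draw(estimator, X, y, sample_weight, n_draws, seed):
    """Stand-in for bagging: draw rows by weight, then fit on the drawn rows."""
    rng = np.random.default_rng(seed)
    p = sample_weight / sample_weight.sum()
    idx = rng.choice(X.shape[0], size=n_draws, replace=True, p=p)
    # The weights are consumed by the draw and not forwarded to the estimator.
    return estimator.fit(X[idx], y[idx])


X = np.arange(100).reshape(-1, 1)
y = (X.ravel() < 99).astype(np.int32)
sample_weight = np.zeros(shape=X.shape[0])
sample_weight[0] = 1
sample_weight[-1] = 2

est = fit_on_weighted_draw(
    RecordingEstimator(), X, y, sample_weight, n_draws=1000, seed=0
)

# Only the two rows with non-zero weight may reach the base estimator.
seen_rows = set(est.X_.ravel().tolist())
# The row with weight 2 should be drawn about twice as often as the other.
fraction_last_row = float(np.mean(est.X_.ravel() == 99))
```

Because 98 of the 100 rows have zero weight, any row other than 0 or 99 reaching the recording estimator would immediately reveal that the weights were ignored during the draw.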

doc/whats_new/upcoming_changes/sklearn.ensemble/31414.fix.rst

sklearn/ensemble/_bagging.py

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

ogrisel · May 27, 2025

doc/whats_new/upcoming_changes/sklearn.ensemble/31414.fix.rst

@@ -0,0 +1,5 @@
+- :class:`ensemble.BaggingClassfier`, :class:`ensemble.BaggingRegressor`
+  and :class:`ensemble.IsolationForest` now use `sample_weight` to draw
+  the samples instead of forwarding them multiplied with a uniformly sampling


Sorry, I made a typo in my past suggestion:

Suggested change

-  the samples instead of forwarding them multiplied with a uniformly sampling
+  the samples instead of forwarding them multiplied by a uniformly sampled

draw indices using sample_weight

0092620

github-actions bot added the module:ensemble label May 22, 2025

changelog

289a3d2

antoinebaker added 2 commits May 23, 2025 09:41

move consumes sw

d0f3ddb

test warning

804d863

antoinebaker mentioned this pull request May 23, 2025

FIX Use sample weight to draw samples in Bagging estimators #31165

Closed

antoinebaker marked this pull request as ready for review May 23, 2025 13:15

ogrisel added this to Losses and solvers May 23, 2025

ogrisel mentioned this pull request May 23, 2025

List of estimators with known incorrect handling of sample_weight #16298

Open

54 tasks

docstring

f01c278

ogrisel reviewed May 26, 2025

doc/whats_new/upcoming_changes/sklearn.ensemble/31414.fix.rst (outdated, resolved)

sklearn/ensemble/_bagging.py (outdated, resolved)

sklearn/ensemble/_bagging.py (outdated, resolved)

Apply suggestions from code review

bc4e499

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

ogrisel reviewed May 27, 2025
