Description
When I calculate the out-of-bag score, it comes out quite high even when there is no connection between features and labels. I assume something goes wrong in keeping track of which samples are out of bag for each tree, so samples end up being evaluated on trees where they were in fact in the bag.
Steps/Code to Reproduce
Example:
import numpy as np
from imblearn import ensemble
X = np.arange(1000).reshape(-1, 1)
y = np.random.binomial(1, 0.5, size=1000)
rf = ensemble.BalancedRandomForestClassifier(oob_score=True)
rf.fit(X, y)
rf.oob_score_
The output is around 0.838 (it varies slightly between runs, since no random seed is set).
Expected Results
Since there is no relationship between X and y (the labels are independent coin flips), the OOB score should be around 0.5.
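For comparison (assuming scikit-learn is installed), running the same data through sklearn's plain RandomForestClassifier, whose OOB bookkeeping is correct, gives a score near chance level:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = np.arange(1000).reshape(-1, 1)
y = rng.binomial(1, 0.5, size=1000)  # labels independent of X

rf = RandomForestClassifier(oob_score=True, random_state=0)
rf.fit(X, y)
print(rf.oob_score_)  # close to 0.5, as expected for random labels
```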
Actual Results
An OOB score in the range of 0.8, which deviates from chance far beyond sampling noise for a sample size of 1000.
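For intuition, the suspected leak can be reproduced with a pure-NumPy toy sketch (this is an illustration of the mechanism, not the library's actual code). A "tree" that simply memorizes its bootstrap sample and guesses randomly elsewhere scores about 0.5 under a correct OOB evaluation, but about 0.5 + 0.632 × 0.5 ≈ 0.82 when in-bag samples leak into the vote, which is close to the observed value:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_trees = 1000, 100
y = rng.binomial(1, 0.5, size=n)  # random labels, no signal

oob_hits = np.zeros(n)    # correct votes counted only on OOB trees
oob_counts = np.zeros(n)
leaky_hits = np.zeros(n)  # correct votes counted on ALL trees (the bug)
leaky_counts = np.zeros(n)

for _ in range(n_trees):
    bag = rng.integers(0, n, size=n)       # bootstrap sample indices
    in_bag = np.zeros(n, dtype=bool)
    in_bag[bag] = True
    # a "tree" that memorizes its bag and flips a coin elsewhere
    pred = rng.binomial(1, 0.5, size=n)
    pred[in_bag] = y[in_bag]
    # proper OOB: a sample only receives votes from trees where it was OOB
    oob_hits[~in_bag] += pred[~in_bag] == y[~in_bag]
    oob_counts[~in_bag] += 1
    # leaky: every tree votes on every sample, including its own bag
    leaky_hits += pred == y
    leaky_counts += 1

oob_acc = (oob_hits / oob_counts).mean()
leaky_acc = (leaky_hits / leaky_counts).mean()
print(oob_acc)    # ~0.5: chance level, as OOB should be
print(leaky_acc)  # ~0.82: inflated, matching the reported score
```

About 63.2% of samples land in each bootstrap bag, so the leaky average is roughly 0.632 × 1.0 + 0.368 × 0.5 ≈ 0.82.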
Versions
Windows-10-10.0.18362-SP0
Python 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
NumPy 1.16.5
SciPy 1.3.1
Scikit-Learn 0.21.3
Imbalanced-Learn 0.5.0