Description
When I calculate the out-of-bag score, it comes out quite high even when there is no connection between features and labels. I assume something goes wrong in keeping track of which samples are out of bag for each tree, so samples end up being evaluated on trees where they were in fact in the bag.
Steps/Code to Reproduce
Example:
import numpy as np
from imblearn import ensemble
X = np.arange(1000).reshape(-1, 1)
y = np.random.binomial(1, 0.5, size=1000)
rf = ensemble.BalancedRandomForestClassifier(oob_score=True)
rf.fit(X, y)
rf.oob_score_
The output is around 0.838 (it varies slightly between runs, since no random seed is set).
Expected Results
Since there is no relationship between X and y (the labels are independent coin flips), the OOB score should be around 0.5.
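For comparison (assuming scikit-learn is installed), running the same data through sklearn's plain RandomForestClassifier, whose OOB bookkeeping is correct, gives a score near chance level:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = np.arange(1000).reshape(-1, 1)
y = rng.binomial(1, 0.5, size=1000)  # labels independent of X

rf = RandomForestClassifier(oob_score=True, random_state=0)
rf.fit(X, y)
print(rf.oob_score_)  # close to 0.5, as expected for random labels
```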
Actual Results
An OOB score in the range of 0.8, which deviates from chance far beyond sampling noise for a sample size of 1000.
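For intuition, the suspected leak can be reproduced with a pure-NumPy toy sketch (this is an illustration of the mechanism, not the library's actual code). A "tree" that simply memorizes its bootstrap sample and guesses randomly elsewhere scores about 0.5 under a correct OOB evaluation, but about 0.5 + 0.632 × 0.5 ≈ 0.82 when in-bag samples leak into the vote, which is close to the observed value:

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_trees = 1000, 100
y = rng.binomial(1, 0.5, size=n)  # random labels, no signal

oob_hits = np.zeros(n)    # correct votes counted only on OOB trees
oob_counts = np.zeros(n)
leaky_hits = np.zeros(n)  # correct votes counted on ALL trees (the bug)
leaky_counts = np.zeros(n)

for _ in range(n_trees):
    bag = rng.integers(0, n, size=n)       # bootstrap sample indices
    in_bag = np.zeros(n, dtype=bool)
    in_bag[bag] = True
    # a "tree" that memorizes its bag and flips a coin elsewhere
    pred = rng.binomial(1, 0.5, size=n)
    pred[in_bag] = y[in_bag]
    # proper OOB: a sample only receives votes from trees where it was OOB
    oob_hits[~in_bag] += pred[~in_bag] == y[~in_bag]
    oob_counts[~in_bag] += 1
    # leaky: every tree votes on every sample, including its own bag
    leaky_hits += pred == y
    leaky_counts += 1

oob_acc = (oob_hits / oob_counts).mean()
leaky_acc = (leaky_hits / leaky_counts).mean()
print(oob_acc)    # ~0.5: chance level, as OOB should be
print(leaky_acc)  # ~0.82: inflated, matching the reported score
```

About 63.2% of samples land in each bootstrap bag, so the leaky average is roughly 0.632 × 1.0 + 0.368 × 0.5 ≈ 0.82.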
Versions
Windows-10-10.0.18362-SP0
Python 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
NumPy 1.16.5
SciPy 1.3.1
Scikit-Learn 0.21.3
Imbalanced-Learn 0.5.0