Overestimation of OOB score, probable bug in resampling? #655

Closed

@SvenWarnke
Description

When I calculate the out-of-bag (OOB) score, it comes out quite high even when there is no connection between features and labels. I assume that something goes wrong in keeping track of which samples are out of bag for each tree, so samples get evaluated on trees where they were in fact in the bag.
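For reference, correct OOB bookkeeping can be sketched by hand: train each tree on a bootstrap sample, collect predictions for a sample only from the trees that did not see it, and score the aggregated votes. This is a minimal sketch of the general technique (not imbalanced-learn's actual implementation, and it ignores the balanced resampling step), using a fixed seed:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = np.arange(1000).reshape(-1, 1)
y = rng.binomial(1, 0.5, size=1000)

n_samples, n_trees = len(y), 100
votes = np.zeros((n_samples, 2))  # per-sample class votes, from OOB trees only

for _ in range(n_trees):
    # Bootstrap sample: draw n_samples indices with replacement.
    in_bag = rng.randint(0, n_samples, n_samples)
    oob_mask = np.ones(n_samples, dtype=bool)
    oob_mask[in_bag] = False  # samples never drawn are out of bag for this tree

    tree = DecisionTreeClassifier(random_state=0).fit(X[in_bag], y[in_bag])
    preds = tree.predict(X[oob_mask])
    votes[np.flatnonzero(oob_mask), preds.astype(int)] += 1

# Score only samples that were OOB for at least one tree.
covered = votes.sum(axis=1) > 0
oob_score = np.mean(votes[covered].argmax(axis=1) == y[covered])
print(oob_score)
```

With this bookkeeping the score lands near 0.5 on random labels, which is the behavior I would expect from `oob_score_`.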

Steps/Code to Reproduce

Example:

import numpy as np
from imblearn import ensemble

# Features are just the sample indices; labels are independent coin flips,
# so there is no signal for the forest to learn.
X = np.arange(1000).reshape(-1, 1)
y = np.random.binomial(1, 0.5, size=1000)

rf = ensemble.BalancedRandomForestClassifier(oob_score=True)
rf.fit(X, y)
rf.oob_score_

The output is around 0.838 (the exact value varies with the random labels).

Expected Results

Since there is no relationship between X and y (the labels are just independent coin flips), the OOB score should be around 0.5.
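As a control (my own addition, not part of the original report): scikit-learn's plain RandomForestClassifier on the same kind of random data gives an OOB score near chance, using a fixed seed:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

np.random.seed(0)
X = np.arange(1000).reshape(-1, 1)
y = np.random.binomial(1, 0.5, size=1000)

rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
rf.fit(X, y)
print(rf.oob_score_)  # close to 0.5, as expected for random labels
```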

Actual Results

Something in the range of 0.8, which is highly significant for a sample size of 1000.

Versions

Windows-10-10.0.18362-SP0
Python 3.7.4 (default, Aug 9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)]
NumPy 1.16.5
SciPy 1.3.1
Scikit-Learn 0.21.3
Imbalanced-Learn 0.5.0

Metadata

Labels: Type: Bug (indicates an unexpected problem or unintended behavior)
