Description
Describe the issue linked to the documentation
This issue concerns:
- sklearn.ensemble.RandomForestClassifier
- sklearn.ensemble.RandomForestRegressor
and probably:
- sklearn.ensemble.ExtraTreesClassifier
- sklearn.ensemble.ExtraTreesRegressor
- sklearn.ensemble.BaggingClassifier
- sklearn.ensemble.BaggingRegressor
The documentation of oob_decision_function_ in RandomForestClassifier is:
Decision function computed with out-of-bag estimate on the training set. If n_estimators is small it might be possible that a data point was never left out during the bootstrap. In this case, oob_decision_function_ might contain NaN. This attribute exists only when oob_score is True.
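To put a rough number on "if n_estimators is small", here is a back-of-the-envelope sketch. It assumes each tree draws n_samples points with replacement (the default bootstrap) and that the trees are drawn independently; the helper name `prob_never_oob` is made up for illustration and is not part of scikit-learn.

```python
def prob_never_oob(n_samples: int, n_estimators: int) -> float:
    """Probability that one given sample is in-bag for every tree.

    Assumes a bootstrap of size n_samples drawn with replacement,
    independently for each tree (illustrative assumption).
    """
    # Chance that a given sample appears at least once in one bootstrap draw.
    p_in_bag = 1.0 - (1.0 - 1.0 / n_samples) ** n_samples
    # The sample has no OOB prediction only if it is in-bag for all trees.
    return p_in_bag ** n_estimators


if __name__ == "__main__":
    for n_trees in (1, 5, 10, 50, 100):
        print(n_trees, prob_never_oob(1000, n_trees))
```

Under these assumptions the per-tree in-bag probability is close to 1 - 1/e, so the chance of a sample never being left out shrinks geometrically with the number of trees.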
The description of oob_prediction_ in RandomForestRegressor is:
Prediction computed with out-of-bag estimate on the training set. This attribute exists only when oob_score is True.
However, PR #19162 changed how OOB scores are computed, and NaN values have been replaced with 0. The description in RandomForestRegressor does not even mention that a data point might never be left out.
The PR also dropped support for multiclass-multioutput, so I'm not 100% sure whether the shape of oob_decision_function_ can still be (n_samples, n_classes, n_outputs).
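A quick way to check what an installed version actually does is a sketch along these lines. The toy dataset, `n_estimators=1`, and the variable names are arbitrary choices for illustration; the script only counts rows and makes no claim about which outcome a given release produces.

```python
import warnings

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Small synthetic binary problem (arbitrary choice for illustration).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# With a single tree, many samples are in-bag and get no OOB prediction.
clf = RandomForestClassifier(
    n_estimators=1, oob_score=True, bootstrap=True, random_state=0
)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence warnings about missing OOB scores
    clf.fit(X, y)

oob = clf.oob_decision_function_
row_sums = oob.sum(axis=1)

n_nan = int(np.isnan(row_sums).sum())           # rows reported as NaN
n_zero = int((row_sums == 0).sum())             # rows reported as all zeros
n_valid = int(np.isclose(row_sums, 1.0).sum())  # rows that are probabilities

print("shape:", oob.shape)
print("NaN rows:", n_nan, "zero rows:", n_zero, "valid rows:", n_valid)
```

If the zero-row count is non-zero and the NaN-row count is zero, the docstring mentioning NaN is out of date for that version.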
Suggest a potential alternative/fix
For random forest estimators, the new descriptions could be something like this:
oob_decision_function_ : ndarray of shape (n_samples, n_classes)
    Decision function computed with out-of-bag estimate on the training
    set. If ``n_estimators`` is small it might be possible that a data point
    was never left out during the bootstrap. In this case, the probabilities
    for such data points to belong to each class are 0. This attribute exists
    only when ``oob_score`` is True.
and
oob_prediction_ : ndarray of shape (n_samples,) or (n_samples, n_outputs)
    Prediction computed with out-of-bag estimate on the training set. If
    ``n_estimators`` is small it might be possible that a data point was
    never left out during the bootstrap. In this case, the predictions for
    such data points are 0. This attribute exists only when ``oob_score``
    is True.
The descriptions for the other estimators may also require some updates.