Segfault when passing Criterion object to Forest ensembles with n_jobs>1

Description

When a Criterion object, rather than a criterion string, is passed to RandomForest or ExtraTrees, I've observed segfaults during fitting when n_jobs > 1. In my case I've written a custom Criterion, but the problem also reproduces with one of sklearn's built-in criteria if you pass in the Criterion object instead of the string.

I believe the problem is that the parameters aren't copied when the list of estimators for the ensemble is created, so the same Criterion object is used for all the trees. When n_jobs=1, this is OK because the criterion is re-initialized at each split. When n_jobs>1, however, the same criterion is modified by multiple threads, which results in pointers being freed and then accessed.
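To make the suspected sharing concrete, here is a minimal pure-Python sketch. ToyCriterion, ToyTree and ToyForest are hypothetical stand-ins written for this illustration, not sklearn classes, and the sketch only assumes that parameters are forwarded to each sub-estimator with a plain getattr, as described above.

```python
class ToyCriterion:
    """Hypothetical stand-in for a stateful criterion object."""

    def __init__(self):
        self.state = None  # scratch state that fitting would overwrite


class ToyTree:
    """Hypothetical stand-in for a single tree estimator."""

    def __init__(self, criterion=None):
        self.criterion = criterion


class ToyForest:
    """Hypothetical ensemble that forwards parameters without copying."""

    estimator_params = ("criterion",)

    def __init__(self, criterion, n_estimators=3):
        self.criterion = criterion
        self.n_estimators = n_estimators

    def make_estimators(self):
        trees = []
        for _ in range(self.n_estimators):
            tree = ToyTree()
            # getattr returns a reference, so every tree receives the
            # very same criterion object rather than its own copy.
            for name in self.estimator_params:
                setattr(tree, name, getattr(self, name))
            trees.append(tree)
        return trees


shared = ToyCriterion()
trees = ToyForest(shared).make_estimators()

# Identity check: True means all trees hold one and the same object.
all_same = all(tree.criterion is shared for tree in trees)
print(all_same)  # prints True
```

With a string such as 'mse' this kind of sharing is harmless, since a string is immutable. A mutable object shared across parallel workers is where the trouble would start.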

Steps/Code to Reproduce

The following code reproduces the segfault:

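from sklearn.ensemble import ExtraTreesRegressor
# Import path as of sklearn 0.20.0 (see Versions below).
from sklearn.tree.tree import CRITERIA_REG
import numpy as np

X = np.random.random((1000, 3))
y = np.random.random((1000, 1))

n_samples, n_outputs = y.shape
# Build a Criterion instance instead of passing the string 'mse'.
mse_criterion = CRITERIA_REG['mse'](n_outputs, n_samples)
# n_jobs=-1 fits the trees in parallel.
rf = ExtraTreesRegressor(n_estimators=400, n_jobs=-1, criterion=mse_criterion)

rf.fit(X, y)  # the segfault is observed here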

Versions

System

python: 3.5.6 |Anaconda, Inc.| (default, Aug 26 2018, 21:41:56)  [GCC 7.3.0]

Python deps

sklearn: 0.20.0
setuptools: 40.2.0
pip: 10.0.1
Cython: 0.28.5
numpy: 1.13.3
pandas: 0.23.4
scipy: 1.1.0

Discussion

I've tried wrapping the getattr call in copy.deepcopy() for all the parameters accessed when making the estimators to fit, which seems to fix the problem. Would that be an acceptable fix, or are you interested in a deeper fix?
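As a rough sketch of the deepcopy idea, here is a toy model using only the standard library. ToyCriterion, ToyForest, copied_params and fit_tree are hypothetical names made up for this illustration, and this is not the actual sklearn patch. It only shows that once each worker gets its own deep copy, concurrent mutation no longer touches a shared object.

```python
import copy
import threading


class ToyCriterion:
    """Hypothetical stand-in for a criterion with mutable state."""

    def __init__(self):
        self.buffer = []  # mutated while a tree is being fitted


class ToyForest:
    """Hypothetical ensemble holding the user-supplied criterion."""

    def __init__(self, criterion):
        self.criterion = criterion


def copied_params(forest, names):
    # deepcopy wrapped around getattr: each call returns fresh copies.
    return {name: copy.deepcopy(getattr(forest, name)) for name in names}


def fit_tree(index, criterion):
    # Each worker mutates only the copy it was handed.
    criterion.buffer.append(index)


forest = ToyForest(ToyCriterion())
params_per_tree = [copied_params(forest, ("criterion",)) for _ in range(4)]

threads = [
    threading.Thread(target=fit_tree, args=(i, params["criterion"]))
    for i, params in enumerate(params_per_tree)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

buffers = [params["criterion"].buffer for params in params_per_tree]
print(buffers)                  # prints [[0], [1], [2], [3]]
print(forest.criterion.buffer)  # prints []; the original is untouched
```

The trade-off is one extra copy of every forwarded parameter per estimator, which is cheap for strings and numbers but could matter for large custom objects.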

Segfault when passing Criterion object to Forest ensembles with n_jobs>1 #12623
