Add a facility that allows random forest classifiers to be combined after training #26326

Opened by @davedice

Description
Describe the workflow you want to enable

In a federated environment, I have federation elements that build private random forest classifiers, which I would like to combine after the fact into a single random forest.

Describe your proposed solution

See the "alternatives" section.

Describe alternatives you've considered, if relevant

Stacking might suffice as a workaround, although I'd like to avoid it.

As a throw-away experiment, simply concatenating all the constituent decision tree estimators into a common estimators_ array (and adjusting n_estimators to match) seems to work superficially, but clearly isn't good practice.
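The throw-away experiment above can be sketched as follows. This is not a supported API, just poking at private state; it assumes both forests were trained on the same feature space with identical, identically ordered classes_:

```python
# Sketch of the naive concatenation workaround (NOT good practice):
# copy one fitted forest, splice in the other forest's trees, and
# adjust n_estimators. Assumes identical classes_ in both forests.
from copy import deepcopy

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

rf1 = RandomForestClassifier(n_estimators=10, random_state=1).fit(X, y)
rf2 = RandomForestClassifier(n_estimators=10, random_state=2).fit(X, y)

combined = deepcopy(rf1)
combined.estimators_ = combined.estimators_ + rf2.estimators_  # splice trees
combined.n_estimators = len(combined.estimators_)              # adjust the count

print(combined.n_estimators)  # 20
```

Prediction works because predict_proba simply averages over estimators_, but attributes such as oob_score_ and oob_decision_function_ on the copied forest no longer correspond to the combined ensemble.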

In addition, this approach can fail if, say, we try to combine random forest instance #1, which has classes_ of [dog, cat], with forest #2, which has classes_ of [cow, dog, cat]. To address that concern, I looked at forcing the union of all possible classes (over all the forests) into the resultant combined forest and the underlying trees. This appears to work at some level, but doesn't handle the misshapen oob_decision_function_, which is shaped according to n_classes_.
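One way to side-step mutating the fitted trees is to combine at prediction time instead: scatter each forest's predict_proba output into the union of all observed classes_ and average. A minimal sketch of that workaround (the helper name predict_proba_union is mine, not a scikit-learn API):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def predict_proba_union(forests, X):
    """Average predict_proba across forests whose classes_ may differ.

    Each forest's probability columns are scattered into the sorted
    union of all classes seen by any forest, then averaged."""
    union = np.unique(np.concatenate([f.classes_ for f in forests]))
    out = np.zeros((X.shape[0], union.size))
    for f in forests:
        cols = np.searchsorted(union, f.classes_)  # classes_ is sorted
        out[:, cols] += f.predict_proba(X)
    return union, out / len(forests)

# Two "federation" forests trained on different label subsets.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
rf1 = RandomForestClassifier(n_estimators=5, random_state=0).fit(X[y < 2], y[y < 2])
rf2 = RandomForestClassifier(n_estimators=5, random_state=1).fit(X, y)

union, proba = predict_proba_union([rf1, rf2], X[:5])
print(union)  # the union covers class 2 even though rf1 never saw it
```

This avoids the oob_decision_function_ shape problem entirely, at the cost of keeping the forests separate rather than producing one combined estimator.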

Another approach to dealing with classes_ heterogeneity is to make sure each federation forest is exposed to the full gamut of potential classes during training. (Even then, one worries about the order of the elements found in classes_: [dog, cat] vs [cat, dog].) It appears that classes_ is constructed before any bootstrap sampling, so, assuming we can rely on that implementation detail, and we expose each federation element to a consistently ordered and specially constructed "gamut" prepended to its X, we can (hopefully) expect all forest instances to have identical classes_ with the same elements in the same order. That, in turn, would make combining the forests easier. (The training "gamut" is a minimal set of X records that produces all possible y categorical values.) Ensuring complete exposure via the "gamut" might also impact accuracy.
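The "gamut" idea can be sketched as below. The helper fit_with_gamut is hypothetical; it prepends one shared record per possible class to each party's training set so every fitted forest ends up with an identical classes_. (Note that scikit-learn derives classes_ as the sorted unique labels of y, which addresses the ordering worry as long as every label is present.)

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_with_gamut(X, y, gamut_X, gamut_y, **rf_kwargs):
    # Hypothetical helper: prepend the shared "gamut" (one X record per
    # possible class) so classes_ covers every label in every forest.
    Xg = np.vstack([gamut_X, X])
    yg = np.concatenate([gamut_y, y])
    return RandomForestClassifier(**rf_kwargs).fit(Xg, yg)

# Shared gamut covering classes 0, 1, 2; each party only sees a subset.
gamut_X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
gamut_y = np.array([0, 1, 2])

rng = np.random.default_rng(0)
Xa, ya = rng.normal(size=(50, 2)), rng.integers(0, 2, size=50)  # labels {0, 1}
Xb, yb = rng.normal(size=(50, 2)), rng.integers(1, 3, size=50)  # labels {1, 2}

rf_a = fit_with_gamut(Xa, ya, gamut_X, gamut_y, n_estimators=5, random_state=0)
rf_b = fit_with_gamut(Xb, yb, gamut_X, gamut_y, n_estimators=5, random_state=0)

print(rf_a.classes_, rf_b.classes_)  # identical: [0 1 2] [0 1 2]
```

The gamut records do enter the bootstrap samples, so as noted above this perturbs training and may affect accuracy.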

Additional context

R's randomForest package provides a combine() operator for merging fitted forests.
