Add a facility that allows random forest classifiers to be combined after training #26326

Opened by @davedice

Description
Describe the workflow you want to enable

In a federated environment, I have federation elements that build private random forest classifiers, which I would like to combine after the fact into a single random forest.

Describe your proposed solution

See the "alternatives" section.

Describe alternatives you've considered, if relevant

Stacking might suffice as a workaround, although I'd like to avoid it.

As a throw-away experiment, simply concatenating all the constituent decision tree estimators into a common estimators_ array (and adjusting n_estimators to match) seems to work superficially, but clearly isn't good practice.
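The throw-away experiment above can be sketched as follows. This is not a supported API, just poking at private state; it assumes both forests were trained on the same feature space with identical, identically ordered classes_:

```python
# Sketch of the naive concatenation workaround (NOT good practice):
# copy one fitted forest, splice in the other forest's trees, and
# adjust n_estimators. Assumes identical classes_ in both forests.
from copy import deepcopy

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

rf1 = RandomForestClassifier(n_estimators=10, random_state=1).fit(X, y)
rf2 = RandomForestClassifier(n_estimators=10, random_state=2).fit(X, y)

combined = deepcopy(rf1)
combined.estimators_ = combined.estimators_ + rf2.estimators_  # splice trees
combined.n_estimators = len(combined.estimators_)              # adjust the count

print(combined.n_estimators)  # 20
```

Prediction works because predict_proba simply averages over estimators_, but attributes such as oob_score_ and oob_decision_function_ on the copied forest no longer correspond to the combined ensemble.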

In addition, this approach can fail if, say, we try to combine random forest instance #1, which has classes_ of [dog, cat], with forest #2, which has classes_ of [cow, dog, cat]. To address that concern, I looked at forcing the union of all possible classes (over all the forests) into the resultant combined forest and the underlying trees. This appears to work at some level, but doesn't handle the misshapen oob_decision_function_, which is shaped according to n_classes_.
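One way to side-step mutating the fitted trees is to combine at prediction time instead: scatter each forest's predict_proba output into the union of all observed classes_ and average. A minimal sketch of that workaround (the helper name predict_proba_union is mine, not a scikit-learn API):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def predict_proba_union(forests, X):
    """Average predict_proba across forests whose classes_ may differ.

    Each forest's probability columns are scattered into the sorted
    union of all classes seen by any forest, then averaged."""
    union = np.unique(np.concatenate([f.classes_ for f in forests]))
    out = np.zeros((X.shape[0], union.size))
    for f in forests:
        cols = np.searchsorted(union, f.classes_)  # classes_ is sorted
        out[:, cols] += f.predict_proba(X)
    return union, out / len(forests)

# Two "federation" forests trained on different label subsets.
X, y = make_classification(n_samples=300, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
rf1 = RandomForestClassifier(n_estimators=5, random_state=0).fit(X[y < 2], y[y < 2])
rf2 = RandomForestClassifier(n_estimators=5, random_state=1).fit(X, y)

union, proba = predict_proba_union([rf1, rf2], X[:5])
print(union)  # the union covers class 2 even though rf1 never saw it
```

This avoids the oob_decision_function_ shape problem entirely, at the cost of keeping the forests separate rather than producing one combined estimator.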

Another approach to dealing with classes_ heterogeneity is to make sure each federation forest is exposed to the full gamut of potential classes during training. (Even then, one worries about the order of the elements found in classes_: [dog, cat] vs [cat, dog].) It appears that classes_ is constructed before any bootstrap sampling, so, assuming we can rely on that implementation detail, and we expose each federation element to a consistently ordered and specially constructed "gamut" prepended to its X, we can (hopefully) expect all forest instances to have identical classes_ with the same elements in the same order. That, in turn, would make combining the forests easier. (The training "gamut" is a minimal set of X records that produces all possible y categorical values.) Ensuring complete exposure via the "gamut" might also impact accuracy.
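The "gamut" idea can be sketched as below. The helper fit_with_gamut is hypothetical; it prepends one shared record per possible class to each party's training set so every fitted forest ends up with an identical classes_. (Note that scikit-learn derives classes_ as the sorted unique labels of y, which addresses the ordering worry as long as every label is present.)

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def fit_with_gamut(X, y, gamut_X, gamut_y, **rf_kwargs):
    # Hypothetical helper: prepend the shared "gamut" (one X record per
    # possible class) so classes_ covers every label in every forest.
    Xg = np.vstack([gamut_X, X])
    yg = np.concatenate([gamut_y, y])
    return RandomForestClassifier(**rf_kwargs).fit(Xg, yg)

# Shared gamut covering classes 0, 1, 2; each party only sees a subset.
gamut_X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
gamut_y = np.array([0, 1, 2])

rng = np.random.default_rng(0)
Xa, ya = rng.normal(size=(50, 2)), rng.integers(0, 2, size=50)  # labels {0, 1}
Xb, yb = rng.normal(size=(50, 2)), rng.integers(1, 3, size=50)  # labels {1, 2}

rf_a = fit_with_gamut(Xa, ya, gamut_X, gamut_y, n_estimators=5, random_state=0)
rf_b = fit_with_gamut(Xb, yb, gamut_X, gamut_y, n_estimators=5, random_state=0)

print(rf_a.classes_, rf_b.classes_)  # identical: [0 1 2] [0 1 2]
```

The gamut records do enter the bootstrap samples, so as noted above this perturbs training and may affect accuracy.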

Additional context

R's randomForest package provides a combine() operator for merging fitted forests.
