Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

RFC: Cloning estimators in pipeline #8157

Copy link
Copy link
Open
@GaelVaroquaux

Description

@GaelVaroquaux
Issue body actions

This is an RFC on the lack of cloning of estimators in the pipeline.

Current situation

The Pipeline modifies in the fit method the estimators that are given in it's steps argument, in violation of the scikit-learn convention (bad bad coder).

Specific issue raised

To implement caching of the fit of transformers reliably, cloning of them is crucial (#7990), this way the caching is dependent only on the model parameters of the transformers.

In the PR (#7990), the first attempt was to implement a new class, CachedPipeline that would behave like the Pipeline but clone the transformers and cache them. The drawback is the multiplication of classes that are very similar, which makes features harder to discover and give surprises as there is a subttle difference between Pipeline and CachedPipeline.

Proposal

The proposal (put forward IRL by @ogrisel, but that has my favor too) is to deprecate the fact that pipeline.steps is modified and introduce a .steps_ attribute (and a .named_steps_ attribute). That way we could merge Pipeline and CachedPipeline.

Implementation

The difficulty is the deprecation path, as often. Two options:

  1. Make steps and named_steps be properties, and add a warning upon access. Make it so that, for two releases, they return the modified estimators, ie what is stored in steps_ and named_steps_. In two releases, named_steps dies in favor of named_steps_, and we remove the properties.

  2. Create a new class, for instance Pipe, that has the new logic with optional caching.

I have no strong opinion on which deprecation path is best. Option 1 is more work but may less intrusive on the users.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      Morty Proxy This is a proxified and sanitized view of the page, visit original site.