RFC: Cloning estimators in pipeline

This is an RFC on the lack of cloning of estimators in the pipeline.

Current situation

The Pipeline modifies in the fit method the estimators that are given in it's steps argument, in violation of the scikit-learn convention (bad bad coder).

Specific issue raised

To implement caching of the fit of transformers reliably, cloning of them is crucial (#7990), this way the caching is dependent only on the model parameters of the transformers.

In the PR (#7990), the first attempt was to implement a new class, CachedPipeline that would behave like the Pipeline but clone the transformers and cache them. The drawback is the multiplication of classes that are very similar, which makes features harder to discover and give surprises as there is a subttle difference between Pipeline and CachedPipeline.

Proposal

The proposal (put forward IRL by @ogrisel, but that has my favor too) is to deprecate the fact that pipeline.steps is modified and introduce a .steps_ attribute (and a .named_steps_ attribute). That way we could merge Pipeline and CachedPipeline.

Implementation

The difficulty is the deprecation path, as often. Two options:

Make steps and named_steps be properties, and add a warning upon access. Make it so that, for two releases, they return the modified estimators, ie what is stored in steps_ and named_steps_. In two releases, named_steps dies in favor of named_steps_, and we remove the properties.
Create a new class, for instance Pipe, that has the new logic with optional caching.

I have no strong opinion on which deprecation path is best. Option 1 is more work but may less intrusive on the users.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

RFC: Cloning estimators in pipeline #8157

Current situation

Specific issue raised

Proposal

Implementation

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Search code, repositories, users, issues, pull requests...

Uh oh!

RFC: Cloning estimators in pipeline #8157

Description

Current situation

Specific issue raised

Proposal

Implementation

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions