Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

[MRG+2] Pass predict attributes to last estimator in pipeline #9304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

brenolf
Copy link
Contributor

@brenolf brenolf commented Jul 9, 2017

Reference Issue

Fixes #9293.

What does this implement/fix? Explain your changes.

Simply passed the parameters from the predict call in the pipeline to the last estimator's predict call, just like it's done with other methods.

Any other comments?

No.

@brenolf brenolf force-pushed the add-prediction-params-pipeline branch from 16bd7ca to 542ce2a Compare July 9, 2017 04:01
@brenolf brenolf changed the title Pass predict attributes to last estimator in pipeline [MRG] Pass predict attributes to last estimator in pipeline Jul 9, 2017
@jnothman
Copy link
Member

jnothman commented Jul 10, 2017 via email

@brenolf
Copy link
Contributor Author

brenolf commented Jul 10, 2017

@jnothman I'm not sure I understood your concerns, do you mean uncertainties from the transformations propagated to the estimator?

@jnothman
Copy link
Member

jnothman commented Jul 10, 2017 via email

@brenolf
Copy link
Contributor Author

brenolf commented Jul 10, 2017

Apparently there's nothing we do to avoid such things, given that transform inside predict does not take any attributes. Should we be passing attributes to it as well?

@jnothman
Copy link
Member

jnothman commented Jul 10, 2017 via email

@brenolf
Copy link
Contributor Author

brenolf commented Jul 10, 2017

So we're not moving forward with that issue?

@jnothman
Copy link
Member

jnothman commented Jul 10, 2017 via email

@brenolf
Copy link
Contributor Author

brenolf commented Jul 11, 2017

I can definitely write down this caveats in the docs if you believe this fix is worth merging.

@jnothman
Copy link
Member

jnothman commented Jul 11, 2017 via email

@@ -296,7 +296,7 @@ def fit_transform(self, X, y=None, **fit_params):
return last_step.fit(Xt, y, **fit_params).transform(Xt)

@if_delegate_has_method(delegate='_final_estimator')
def predict(self, X):
def predict(self, X, **predict_params):
"""Apply transforms to the data, and predict with the final estimator

Parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parameter requires a description in any case

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Done

@brenolf brenolf force-pushed the add-prediction-params-pipeline branch from 542ce2a to 809c516 Compare July 13, 2017 23:50
@brenolf
Copy link
Contributor Author

brenolf commented Jul 13, 2017

@jnothman As for the caveat, where do you think the docs for it should be?

@jnothman
Copy link
Member

Caveat could be in the pipeline parameter docstring. But perhaps should also be in places where predict_std is implemented.

I'd really like to hear from other core devs. Atm I think we have no overarching plans with respect to prediction with uncertainties. But I don't really want to make it harder for us to implement a more principled approach in the future.

@brenolf
Copy link
Contributor Author

brenolf commented Jul 14, 2017

I'll hold on to make these changes so that we can get feedback from the other devs.

@jnothman
Copy link
Member

jnothman commented Jul 14, 2017 via email

@brenolf brenolf force-pushed the add-prediction-params-pipeline branch from 809c516 to d7464c8 Compare July 16, 2017 22:11
@brenolf
Copy link
Contributor Author

brenolf commented Jul 16, 2017

Added a description to the pipeline docstring.

@@ -50,7 +50,9 @@ class Pipeline(_BaseComposition):
steps : list
List of (name, transform) tuples (implementing fit/transform) that are
chained, in the order in which they are chained, with the last object
an estimator.
an estimator. Note that uncertainties that are generated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this either belongs in the class dosctring under Notes, or in the method docstring for predict, below. It seems strange to have it here in steps.

@brenolf brenolf force-pushed the add-prediction-params-pipeline branch from d7464c8 to 56286fe Compare July 17, 2017 12:09
@brenolf
Copy link
Contributor Author

brenolf commented Jul 17, 2017

Fixed @jnothman

@brenolf brenolf force-pushed the add-prediction-params-pipeline branch from 56286fe to 65ce588 Compare July 17, 2017 17:18
"""Mock classifier that takes params on predict"""

def fit(self, X, y):
return self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this line is not covered by any test, maybe it is enough to call pipe.fit in your test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lesteve Done 👍

@codecov
Copy link

codecov bot commented Oct 3, 2017

Codecov Report

Merging #9304 into master will decrease coverage by <.01%.
The diff coverage is 93.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9304      +/-   ##
==========================================
- Coverage   96.16%   96.16%   -0.01%     
==========================================
  Files         336      336              
  Lines       62442    62452      +10     
==========================================
+ Hits        60046    60055       +9     
- Misses       2396     2397       +1
Impacted Files Coverage Δ
sklearn/pipeline.py 100% <100%> (ø) ⬆️
sklearn/tests/test_pipeline.py 99.47% <90%> (-0.17%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 12cf7db...d73b311. Read the comment docs.

@lesteve lesteve force-pushed the add-prediction-params-pipeline branch from c0da326 to d73b311 Compare October 3, 2017 12:34
@brenolf brenolf force-pushed the add-prediction-params-pipeline branch from d73b311 to b025a82 Compare October 3, 2017 13:53
Parameters passed to the final ``predict`` in the pipeline. Note
that uncertainties that are generated by the transformations
in the pipeline are not propagated to the final estimator when
this method is called in a pipeline object.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is always called in a pipeline object.

@@ -305,6 +305,12 @@ def predict(self, X):
Data to predict on. Must fulfill input requirements of first step
of the pipeline.

**predict_params : dict of string -> object
Parameters passed to the final ``predict`` in the pipeline. Note
that uncertainties that are generated by the transformations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"while this may be used to return uncertainties from some models with return_std or return_cov, "

@brenolf brenolf force-pushed the add-prediction-params-pipeline branch from b025a82 to a3671e8 Compare October 9, 2017 15:19
@brenolf
Copy link
Contributor Author

brenolf commented Oct 9, 2017

@jnothman I changed the method description as requested

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this is okay. But there might be a reason not to do this that I have not thought of.
LGTM

@jnothman jnothman changed the title [MRG] Pass predict attributes to last estimator in pipeline [MRG+1] Pass predict attributes to last estimator in pipeline Oct 9, 2017
@jnothman
Copy link
Member

Btw, we'd much rather see the commits added to the PR over time so that we can see the interaction with review, and we can squash them upon merge very easily. You shouldn't have to force-push.

@@ -305,6 +305,13 @@ def predict(self, X):
Data to predict on. Must fulfill input requirements of first step
of the pipeline.

**predict_params : dict of string -> object
Parameters passed to the final ``predict`` in the pipeline. Note
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find the explanation a bit cryptic. It says it's only passed to the last step, right? Incidentally: why do we have that and not transform_params? And if we had transform_params would they be passed to all steps or only the last?

Separately, I might have preferred using a different method instead of having return_... in predict, but I guess that ship is sailed. I don't remember the argument for this.

Apart from the somewhat cryptic docstring lgtm, as it is a logical consequence of our current GP API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that if we had transform_params they would need to have be passed to all intermediate steps, yes. In this case, predict_params is passed for the only call of predict in the pipeline, which happens to be the last one. I tried making the docstring a little bit more clear. Thanks for the suggestion!

@jnothman
Copy link
Member

jnothman commented Oct 27, 2017 via email

@amueller
Copy link
Member

I'm ok to merge with a less cryptic docstring ;)

@brenolf brenolf force-pushed the add-prediction-params-pipeline branch from a3671e8 to feb96fa Compare November 3, 2017 14:41
@brenolf
Copy link
Contributor Author

brenolf commented Nov 3, 2017

@amueller @jnothman I'm sorry for the delay on having this fixed. I just updated the docstring.

@jnothman
Copy link
Member

jnothman commented Nov 4, 2017

All good @amueller?

   **predict_params : dict of string -> object
       Parameters to the ``predict`` called at the end of all
       transformations in the pipeline. Note that while this may be 
       used to return uncertainties from some models with return_std 
       or return_cov, uncertainties that are generated by the 
       transformations in the pipeline are not propagated to the 
       final estimator.

@brenolf
Copy link
Contributor Author

brenolf commented Nov 27, 2017

cc/ @amueller is there anything else blocking this?

@munro
Copy link

munro commented Dec 9, 2017

Just wanna say this is very timely for me, I'm doing a pairwise model and would like to see predict_params added to cross_validate & BaseSearchCV as well.

The way the CV stuff works is really neat, when I do fit_params={'gbm__groups': df['group_ids']}, it's smart enough to slice the df for only the fold, instead of passing the entire df in! 💖

My work around for the interim is just stacking a custom estimator on top of what I'm trying to configure predict_params for.

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@qinhanmin2014 qinhanmin2014 changed the title [MRG+1] Pass predict attributes to last estimator in pipeline [MRG+2] Pass predict attributes to last estimator in pipeline Feb 14, 2018
@qinhanmin2014
Copy link
Member

@brenolf
Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:.

@sklearn-lgtm
Copy link

This pull request introduces 37 alerts and fixes 18 - view on lgtm.com

new alerts:

  • 37 for Explicit export is not defined

fixed alerts:

  • 6 for Mismatch between signature and use of an overridden method
  • 3 for Missing call to init during object initialization
  • 3 for Potentially uninitialized local variable
  • 2 for Comparison using is when operands support eq
  • 2 for Wrong number of arguments for format
  • 1 for Conflicting attributes in base classes
  • 1 for Mismatch between signature and use of an overriding method

Comment posted by lgtm.com

@jnothman
Copy link
Member

jnothman commented Feb 14, 2018 via email

Fixes scikit-learn#9293 by passing the attributes provided in `predict` to the last estimator.
@brenolf brenolf force-pushed the add-prediction-params-pipeline branch from feb96fa to 296a631 Compare February 14, 2018 04:29
@brenolf
Copy link
Contributor Author

brenolf commented Feb 14, 2018

@qinhanmin2014 @jnothman ready for another review

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please avoid amending previous commits, as it makes it hard to track changes.

@@ -156,6 +156,10 @@ Model evaluation and meta-estimators
group-based CV strategies. :issue:`9085` by :user:`Laurent Direr <ldirer>`
and `Andreas Müller`_.

- A paramenter `predict_params` was added to :class:`pipeline.Pipeline` allowing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not really a parameter. How about "Pipeline's predict method now passes keyword arguments on to the pipeline's last estimator, enabling the use of return_std in a Pipeline (with caution)."? Single backticks do nothing by themselves in our documentation, btw. You need double backtics.

@qinhanmin2014
Copy link
Member

@brenolf Please also fix the flake8 errors (See https://travis-ci.org/scikit-learn/scikit-learn/jobs/341271936)

@qinhanmin2014
Copy link
Member

I've resolved the conflicts, addressed the comments by jnothman and resolved pep8 issue.
Will merge when green.

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @brenolf

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: the predict method of Pipeline object does not use the exact predict method of final step estimator
7 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.