[MRG+2] Pass predict attributes to last estimator in pipeline #9304

brenolf · Jul 9, 2017

Reference Issue

Fixes #9293.

What does this implement/fix? Explain your changes.

Simply passed the parameters from the predict call in the pipeline to the last estimator's predict call, just like it's done with other methods.
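A minimal sketch of the behaviour described above, using toy stand-ins rather than scikit-learn itself. `AddOne`, `EchoRegressor`, and `MiniPipeline` are hypothetical names invented for illustration; only the idea of forwarding `**predict_params` to the last step comes from this PR.

```python
class AddOne:
    """Hypothetical transformer: adds 1 to every value."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [x + 1 for x in X]


class EchoRegressor:
    """Hypothetical final estimator whose predict accepts an extra keyword."""

    def fit(self, X, y=None):
        return self

    def predict(self, X, return_std=False):
        mean = list(X)
        if return_std:
            # A constant stand-in for a per-sample standard deviation.
            return mean, [0.5] * len(mean)
        return mean


class MiniPipeline:
    """Toy stand-in for a pipeline; not scikit-learn's implementation."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        Xt = X
        for _, transformer in self.steps[:-1]:
            Xt = transformer.fit(Xt, y).transform(Xt)
        self.steps[-1][1].fit(Xt, y)
        return self

    def predict(self, X, **predict_params):
        Xt = X
        # Intermediate steps only transform; they receive no extra arguments.
        for _, transformer in self.steps[:-1]:
            Xt = transformer.transform(Xt)
        # Only the final estimator sees the keyword arguments.
        return self.steps[-1][1].predict(Xt, **predict_params)


pipe = MiniPipeline([("add", AddOne()), ("reg", EchoRegressor())]).fit([1, 2, 3])
plain = pipe.predict([1, 2, 3])
mean, std = pipe.predict([1, 2, 3], return_std=True)
```

Without the forwarding, the second call would fail with a `TypeError` because `predict` would not accept `return_std`.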

Any other comments?

No.

jnothman · Jul 10, 2017

now I'm a little concerned about this. How do we express the caveat that uncertainties from transformations are not propagated?

brenolf · Jul 10, 2017

@jnothman I'm not sure I understood your concern. Do you mean uncertainties from the transformations being propagated to the estimator?

jnothman · Jul 10, 2017

yes.

brenolf · Jul 10, 2017

Apparently there's nothing we do to avoid such things, given that transform inside predict does not take any attributes. Should we be passing attributes to it as well?

jnothman · Jul 10, 2017

no... Really it's a caveat to using predict_std at all.

brenolf · Jul 10, 2017

So we're not moving forward with that issue?

jnothman · Jul 10, 2017

I'm a bit ambivalent, but I do like our data structures to encourage best practices. Passing predict_cov to a GP will similarly return covariances with respect to the feature space at the end of the pipeline, not at the beginning of the pipeline. So if we adopt this fix, we need to document its caveats.
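To make the caveat concrete, here is a small sketch under stated assumptions. `MeanImputer` and `NearestPointRegressor` are hypothetical toy classes, and "distance to the nearest training point" is only a crude stand-in for a model's reported standard deviation. The point is that the final estimator treats a transformed value as exact, whatever uncertainty the transformation itself introduced.

```python
class MeanImputer:
    """Hypothetical transformer: replaces None with the training mean."""

    def fit(self, X, y=None):
        known = [x for x in X if x is not None]
        self.mean_ = sum(known) / len(known)
        return self

    def transform(self, X):
        return [self.mean_ if x is None else x for x in X]


class NearestPointRegressor:
    """Hypothetical estimator: predicts the target of the nearest training
    point and reports the distance to it as a crude uncertainty."""

    def fit(self, X, y):
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X, return_std=False):
        means, stds = [], []
        for x in X:
            dist, target = min(
                (abs(x - xi), yi) for xi, yi in zip(self.X_, self.y_)
            )
            means.append(target)
            stds.append(dist)
        return (means, stds) if return_std else means


X_train, y_train = [0.0, 2.0, 4.0], [0.0, 1.0, 2.0]
imputer = MeanImputer().fit(X_train)
model = NearestPointRegressor().fit(imputer.transform(X_train), y_train)

# The first query is missing and gets imputed; the second was truly observed.
means, stds = model.predict(imputer.transform([None, 2.0]), return_std=True)
```

Both queries report the same uncertainty, even though the first value was guessed by the imputer. That is the sense in which uncertainties from the transformations are not propagated.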

brenolf · Jul 11, 2017

I can definitely write these caveats down in the docs if you believe this fix is worth merging.

jnothman · Jul 11, 2017

I'll approve it if I cannot think of an alternative, but it won't be merged until another core contributor agrees.

jnothman · Jul 12, 2017

sklearn/pipeline.py

@@ -296,7 +296,7 @@ def fit_transform(self, X, y=None, **fit_params):
            return last_step.fit(Xt, y, **fit_params).transform(Xt)

    @if_delegate_has_method(delegate='_final_estimator')
-    def predict(self, X):
+    def predict(self, X, **predict_params):
        """Apply transforms to the data, and predict with the final estimator

        Parameters


The parameter requires a description in any case

brenolf · Jul 13, 2017

@jnothman As for the caveat, where do you think the docs for it should be?

jnothman · Jul 14, 2017

The caveat could be in the pipeline parameter docstring, but perhaps it should also be in places where predict_std is implemented.

I'd really like to hear from other core devs. Atm I think we have no overarching plans with respect to prediction with uncertainties. But I don't really want to make it harder for us to implement a more principled approach in the future.

brenolf · Jul 14, 2017

I'll hold off on making these changes until we get feedback from the other devs.

jnothman · Jul 14, 2017

Often it's better to make changes so reviewers have a concrete proposal to comment on.

brenolf · Jul 16, 2017

Added a description to the pipeline docstring.

jnothman · Jul 17, 2017

sklearn/pipeline.py

@@ -50,7 +50,9 @@ class Pipeline(_BaseComposition):
    steps : list
        List of (name, transform) tuples (implementing fit/transform) that are
        chained, in the order in which they are chained, with the last object
-        an estimator.
+        an estimator. Note that uncertainties that are generated


I think that this either belongs in the class docstring under Notes, or in the method docstring for predict, below. It seems strange to have it here in steps.

brenolf · Jul 17, 2017

Fixed @jnothman

lesteve · Oct 3, 2017

sklearn/tests/test_pipeline.py

+    """Mock classifier that takes params on predict"""
+
+    def fit(self, X, y):
+        return self


Looks like this line is not covered by any test, maybe it is enough to call pipe.fit in your test?

@lesteve Done 👍
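A rough sketch of what such a test could look like, written against a toy stand-in instead of the real `Pipeline` so that it runs on its own. `PassThrough`, `MockPredictParams`, and `run_pipeline` are hypothetical names; only the mock's docstring and its `fit` returning `self` appear in the diff above.

```python
class PassThrough:
    """Hypothetical no-op transformer that gives the pipeline a first step."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


class MockPredictParams:
    """Mock classifier that takes params on predict."""

    def fit(self, X, y):
        # Reset the flag so the test proves predict() set it, not a stale value.
        self.got_attribute = False
        return self

    def predict(self, X, got_attribute=False):
        self.got_attribute = got_attribute
        return self


def run_pipeline(steps, X, y, **predict_params):
    """Toy stand-in for calling fit and then predict on a pipeline."""
    *transformers, final = steps
    Xt = X
    for transformer in transformers:
        Xt = transformer.fit(Xt, y).transform(Xt)
    final.fit(Xt, y)  # calling fit here is what covers the mock's fit line
    return final.predict(Xt, **predict_params)


def test_predict_with_predict_params():
    clf = MockPredictParams()
    run_pipeline([PassThrough(), clf], None, None, got_attribute=True)
    assert clf.got_attribute


test_predict_with_predict_params()
```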

codecov · Oct 3, 2017

Codecov Report

Merging #9304 into master will decrease coverage by <.01%.
The diff coverage is 93.33%.

@@            Coverage Diff             @@
##           master    #9304      +/-   ##
==========================================
- Coverage   96.16%   96.16%   -0.01%     
==========================================
  Files         336      336              
  Lines       62442    62452      +10     
==========================================
+ Hits        60046    60055       +9     
- Misses       2396     2397       +1

Impacted Files	Coverage Δ
sklearn/pipeline.py	`100% <100%> (ø)`	⬆️
sklearn/tests/test_pipeline.py	`99.47% <90%> (-0.17%)`	⬇️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 12cf7db...d73b311.

jnothman · Oct 9, 2017

sklearn/pipeline.py

+            Parameters passed to the final ``predict`` in the pipeline. Note
+            that uncertainties that are generated by the transformations
+            in the pipeline are not propagated to the final estimator when
+            this method is called in a pipeline object.


This method is always called in a pipeline object.

jnothman · Oct 9, 2017

sklearn/pipeline.py

@@ -305,6 +305,12 @@ def predict(self, X):
            Data to predict on. Must fulfill input requirements of first step
            of the pipeline.

+        **predict_params : dict of string -> object
+            Parameters passed to the final ``predict`` in the pipeline. Note
+            that uncertainties that are generated by the transformations


"while this may be used to return uncertainties from some models with return_std or return_cov, "

brenolf · Oct 9, 2017

@jnothman I changed the method description as requested

jnothman

Yes, I think this is okay. But there might be a reason not to do this that I have not thought of.
LGTM

jnothman · Oct 17, 2017

Btw, we'd much rather see the commits added to the PR over time so that we can see the interaction with review, and we can squash them upon merge very easily. You shouldn't have to force-push.

amueller · Oct 26, 2017

sklearn/pipeline.py

@@ -305,6 +305,13 @@ def predict(self, X):
            Data to predict on. Must fulfill input requirements of first step
            of the pipeline.

+        **predict_params : dict of string -> object
+            Parameters passed to the final ``predict`` in the pipeline. Note


I find the explanation a bit cryptic. It says it's only passed to the last step, right? Incidentally: why do we have that and not transform_params? And if we had transform_params would they be passed to all steps or only the last?

Separately, I might have preferred using a different method instead of having return_... in predict, but I guess that ship has sailed. I don't remember the argument for this.

Apart from the somewhat cryptic docstring lgtm, as it is a logical consequence of our current GP API.

I believe that if we had transform_params they would need to be passed to all intermediate steps, yes. In this case, predict_params is passed to the only call of predict in the pipeline, which happens to be the last one. I tried making the docstring a little bit clearer. Thanks for the suggestion!

jnothman · Oct 27, 2017

In the sense of altering the output type, only the last step makes sense.

amueller · Oct 27, 2017

I'm ok to merge with a less cryptic docstring ;)

brenolf · Nov 3, 2017

@amueller @jnothman I'm sorry for the delay in getting this fixed. I just updated the docstring.

jnothman · Nov 4, 2017

All good @amueller?

   **predict_params : dict of string -> object
       Parameters to the ``predict`` called at the end of all
       transformations in the pipeline. Note that while this may be 
       used to return uncertainties from some models with return_std 
       or return_cov, uncertainties that are generated by the 
       transformations in the pipeline are not propagated to the 
       final estimator.

brenolf · Nov 27, 2017

cc/ @amueller is there anything else blocking this?

munro · Dec 9, 2017

Just wanna say this is very timely for me, I'm doing a pairwise model and would like to see predict_params added to cross_validate & BaseSearchCV as well.

The way the CV stuff works is really neat, when I do fit_params={'gbm__groups': df['group_ids']}, it's smart enough to slice the df for only the fold, instead of passing the entire df in! 💖
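The two ideas in the comment above, routing a `step__param` key to a named step and slicing a per-sample array down to one fold, can be sketched as follows. This is an illustrative approximation and not scikit-learn's code; `route_step_params` and `slice_for_fold` are hypothetical helpers, and the group ids and fold indices are made up.

```python
def route_step_params(step_names, params):
    """Split {'step__param': value} into {'step': {'param': value}}."""
    routed = {name: {} for name in step_names}
    for key, value in params.items():
        step, sep, param = key.partition("__")
        if not sep or step not in routed:
            raise ValueError(f"cannot route parameter {key!r} to a step")
        routed[step][param] = value
    return routed


def slice_for_fold(values, indices):
    """Keep only the entries that belong to one cross-validation fold."""
    return [values[i] for i in indices]


group_ids = ["a", "a", "b", "b", "c"]
routed = route_step_params(["scale", "gbm"], {"gbm__groups": group_ids})

# Pretend samples 0, 1 and 4 form the training part of one fold.
train_fold = slice_for_fold(routed["gbm"]["groups"], [0, 1, 4])
```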

My workaround for the interim is just stacking a custom estimator on top of what I'm trying to configure predict_params for.

qinhanmin2014

LGTM.

qinhanmin2014 · Feb 14, 2018

@brenolf
Please add an entry to the change log at doc/whats_new/v*.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:.

sklearn-lgtm · Feb 14, 2018

This pull request introduces 37 alerts and fixes 18 - view on lgtm.com

new alerts:

37 for Explicit export is not defined

fixed alerts:

6 for Mismatch between signature and use of an overridden method
3 for Missing call to init during object initialization
3 for Potentially uninitialized local variable
2 for Comparison using is when operands support eq
2 for Wrong number of arguments for format
1 for Conflicting attributes in base classes
1 for Mismatch between signature and use of an overriding method

Comment posted by lgtm.com

jnothman · Feb 14, 2018

@brenolf please ignore the lgtm.com mess... They're looking into it.

brenolf · Feb 14, 2018

@qinhanmin2014 @jnothman ready for another review

jnothman

Please avoid amending previous commits, as it makes it hard to track changes.

jnothman · Feb 14, 2018

doc/whats_new/v0.20.rst

@@ -156,6 +156,10 @@ Model evaluation and meta-estimators
  group-based CV strategies. :issue:`9085` by :user:`Laurent Direr <ldirer>`
  and `Andreas Müller`_.

+- A paramenter `predict_params` was added to :class:`pipeline.Pipeline` allowing


It's not really a parameter. How about "Pipeline's predict method now passes keyword arguments on to the pipeline's last estimator, enabling the use of return_std in a Pipeline (with caution)."? Single backticks do nothing by themselves in our documentation, btw. You need double backticks.

qinhanmin2014 · Feb 14, 2018

@brenolf Please also fix the flake8 errors (See https://travis-ci.org/scikit-learn/scikit-learn/jobs/341271936)

qinhanmin2014 · Mar 5, 2018

I've resolved the conflicts, addressed the comments by jnothman, and resolved the pep8 issue.
Will merge when green.

qinhanmin2014

LGTM, thanks @brenolf

brenolf force-pushed the add-prediction-params-pipeline branch from 16bd7ca to 542ce2a Compare July 9, 2017 04:01

brenolf changed the title ~~Pass predict attributes to last estimator in pipeline~~ [MRG] Pass predict attributes to last estimator in pipeline Jul 9, 2017

jnothman reviewed Jul 12, 2017

brenolf force-pushed the add-prediction-params-pipeline branch from 542ce2a to 809c516 Compare July 13, 2017 23:50

brenolf force-pushed the add-prediction-params-pipeline branch from 809c516 to d7464c8 Compare July 16, 2017 22:11

jnothman reviewed Jul 17, 2017

brenolf force-pushed the add-prediction-params-pipeline branch from d7464c8 to 56286fe Compare July 17, 2017 12:09

brenolf force-pushed the add-prediction-params-pipeline branch from 56286fe to 65ce588 Compare July 17, 2017 17:18

lesteve reviewed Oct 3, 2017

lesteve force-pushed the add-prediction-params-pipeline branch from c0da326 to d73b311 Compare October 3, 2017 12:34

brenolf force-pushed the add-prediction-params-pipeline branch from d73b311 to b025a82 Compare October 3, 2017 13:53

jnothman reviewed Oct 9, 2017

brenolf force-pushed the add-prediction-params-pipeline branch from b025a82 to a3671e8 Compare October 9, 2017 15:19

jnothman approved these changes Oct 9, 2017

jnothman changed the title ~~[MRG] Pass predict attributes to last estimator in pipeline~~ [MRG+1] Pass predict attributes to last estimator in pipeline Oct 9, 2017

jnothman mentioned this pull request Oct 17, 2017

Bug: the predict method of Pipeline object does not use the exact predict method of final step estimator #9293

Closed

amueller reviewed Oct 26, 2017

amueller mentioned this pull request Oct 27, 2017

Added kwargs to predict method of Pipeline #10030

Closed

brenolf force-pushed the add-prediction-params-pipeline branch from a3671e8 to feb96fa Compare November 3, 2017 14:41

qinhanmin2014 approved these changes Feb 14, 2018

qinhanmin2014 changed the title ~~[MRG+1] Pass predict attributes to last estimator in pipeline~~ [MRG+2] Pass predict attributes to last estimator in pipeline Feb 14, 2018

Pass predict attributes to last estimator in pipeline

296a631

Fixes scikit-learn#9293 by passing the attributes provided in `predict` to the last estimator.

brenolf force-pushed the add-prediction-params-pipeline branch from feb96fa to 296a631 Compare February 14, 2018 04:29

jnothman reviewed Feb 14, 2018

qinhanmin2014 added 2 commits March 5, 2018 15:39

Merge branch 'master' into 9304

0ba4dae

pep8

d9050a8

qinhanmin2014 approved these changes Mar 5, 2018

qinhanmin2014 merged commit 172d652 into scikit-learn:master Mar 5, 2018

Noctiphobia mentioned this pull request Sep 5, 2018

Inconsistency between predict and predict_proba parameters in Pipeline #12006

Open
