-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG+2] Pass predict attributes to last estimator in pipeline #9304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+2] Pass predict attributes to last estimator in pipeline #9304
Conversation
16bd7ca
to
542ce2a
Compare
now I'm a little concerned about this. How do we express the caveat that
uncertainties from transformations are not propagated?
…On 9 Jul 2017 1:54 pm, "Breno Freitas" ***@***.***> wrote:
Reference Issue
Fixes #9293 <#9293>.
What does this implement/fix? Explain your changes.
Simply passed the parameters from the predict call in the pipeline to the
last estimator's predict call, just like it's done with other methods.
Any other comments?
No.
------------------------------
You can view, comment on, or merge this pull request online at:
#9304
Commit Summary
- Pass predict attributes to last estimator in pipeline
File Changes
- *M* sklearn/pipeline.py
<https://github.com/scikit-learn/scikit-learn/pull/9304/files#diff-0>
(4)
- *M* sklearn/tests/test_pipeline.py
<https://github.com/scikit-learn/scikit-learn/pull/9304/files#diff-1>
(16)
Patch Links:
- https://github.com/scikit-learn/scikit-learn/pull/9304.patch
- https://github.com/scikit-learn/scikit-learn/pull/9304.diff
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#9304>, or mute the
thread
<https://github.com/notifications/unsubscribe-auth/AAEz64OlgmNodi3zg99sWx7NGQuhqLrUks5sME70gaJpZM4OR8i4>
.
|
@jnothman I'm not sure I understood your concerns, do you mean uncertainties from the transformations propagated to the estimator? |
yes.
…On 10 July 2017 at 14:01, Breno Freitas ***@***.***> wrote:
@jnothman <https://github.com/jnothman> I'm not sure I understood your
concerns, do you mean uncertainties from the transformations propagated to
the estimator?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9304 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6-LlijVOzW31wEQc7phqcp5gYIxpks5sMaIEgaJpZM4OR8i4>
.
|
Apparently there's nothing we do to avoid such things, given that |
no... Really it's a caveat to using predict_std at all.
On 10 Jul 2017 10:29 pm, "Breno Freitas" <notifications@github.com> wrote:
Apparently there's nothing we do to avoid such things, given that transform
inside predict does not take any attributes. Should we be passing
attributes to it as well?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9304 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6yroRrJ5J_APKmfwtJ8_pZ_SQ1l3ks5sMhkZgaJpZM4OR8i4>
.
|
So we're not moving forward with that issue? |
I'm a bit ambivalent, but I do like our data structures to encourage best
practices. passing predict_cov to a gp will similarly return covariances
with respect to the feature space at the end of the pipeline, not at the
beginning of the pipeline.
so if we adopt this fix, we need to document its caveats
…On 11 Jul 2017 8:47 am, "Breno Freitas" ***@***.***> wrote:
So we're not moving forward with that issue?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9304 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz65kI67sWHwzORS1LMe8OFYMSCkP4ks5sMqnzgaJpZM4OR8i4>
.
|
I can definitely write down this caveats in the docs if you believe this fix is worth merging. |
I'll approve it if i cannot think of an alternative, but it won't be merged
until another core contributor agrees
…On 11 Jul 2017 10:52 am, "Breno Freitas" ***@***.***> wrote:
I can definitely write down this caveats in the docs if you believe this
fix is worth merging.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9304 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz6_ZIhyw3Wiaq0UWzhUDRw0mwjeERks5sMsdpgaJpZM4OR8i4>
.
|
@@ -296,7 +296,7 @@ def fit_transform(self, X, y=None, **fit_params): | ||
return last_step.fit(Xt, y, **fit_params).transform(Xt) | ||
|
||
@if_delegate_has_method(delegate='_final_estimator') | ||
def predict(self, X): | ||
def predict(self, X, **predict_params): | ||
"""Apply transforms to the data, and predict with the final estimator | ||
|
||
Parameters |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The parameter requires a description in any case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Done
542ce2a
to
809c516
Compare
@jnothman As for the caveat, where do you think the docs for it should be? |
Caveat could be in the pipeline parameter docstring. But perhaps should also be in places where predict_std is implemented. I'd really like to hear from other core devs. Atm I think we have no overarching plans with respect to prediction with uncertainties. But I don't really want to make it harder for us to implement a more principled approach in the future. |
I'll hold on to make these changes so that we can get feedback from the other devs. |
often it's better to make changes so reviewers have a concrete proposal to
comment on.
…On 14 Jul 2017 11:37 am, "Breno Freitas" ***@***.***> wrote:
I'll hold on to make these changes so that we can get feedback from the
other devs.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#9304 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAEz62CJ-vXeobZswjiDBgaxYZfVbQ-dks5sNsZWgaJpZM4OR8i4>
.
|
809c516
to
d7464c8
Compare
Added a description to the pipeline docstring. |
sklearn/pipeline.py
Outdated
@@ -50,7 +50,9 @@ class Pipeline(_BaseComposition): | ||
steps : list | ||
List of (name, transform) tuples (implementing fit/transform) that are | ||
chained, in the order in which they are chained, with the last object | ||
an estimator. | ||
an estimator. Note that uncertainties that are generated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this either belongs in the class dosctring under Notes, or in the method docstring for predict, below. It seems strange to have it here in steps.
d7464c8
to
56286fe
Compare
Fixed @jnothman |
56286fe
to
65ce588
Compare
"""Mock classifier that takes params on predict""" | ||
|
||
def fit(self, X, y): | ||
return self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this line is not covered by any test, maybe it is enough to call pipe.fit
in your test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lesteve Done 👍
Codecov Report
@@ Coverage Diff @@
## master #9304 +/- ##
==========================================
- Coverage 96.16% 96.16% -0.01%
==========================================
Files 336 336
Lines 62442 62452 +10
==========================================
+ Hits 60046 60055 +9
- Misses 2396 2397 +1
Continue to review full report at Codecov.
|
c0da326
to
d73b311
Compare
d73b311
to
b025a82
Compare
sklearn/pipeline.py
Outdated
Parameters passed to the final ``predict`` in the pipeline. Note | ||
that uncertainties that are generated by the transformations | ||
in the pipeline are not propagated to the final estimator when | ||
this method is called in a pipeline object. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is always called in a pipeline object.
sklearn/pipeline.py
Outdated
@@ -305,6 +305,12 @@ def predict(self, X): | ||
Data to predict on. Must fulfill input requirements of first step | ||
of the pipeline. | ||
|
||
**predict_params : dict of string -> object | ||
Parameters passed to the final ``predict`` in the pipeline. Note | ||
that uncertainties that are generated by the transformations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"while this may be used to return uncertainties from some models with return_std or return_cov, "
b025a82
to
a3671e8
Compare
@jnothman I changed the method description as requested |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this is okay. But there might be a reason not to do this that I have not thought of.
LGTM
Btw, we'd much rather see the commits added to the PR over time so that we can see the interaction with review, and we can squash them upon merge very easily. You shouldn't have to force-push. |
sklearn/pipeline.py
Outdated
@@ -305,6 +305,13 @@ def predict(self, X): | ||
Data to predict on. Must fulfill input requirements of first step | ||
of the pipeline. | ||
|
||
**predict_params : dict of string -> object | ||
Parameters passed to the final ``predict`` in the pipeline. Note |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the explanation a bit cryptic. It says it's only passed to the last step, right? Incidentally: why do we have that and not transform_params
? And if we had transform_params
would they be passed to all steps or only the last?
Separately, I might have preferred using a different method instead of having return_...
in predict
, but I guess that ship is sailed. I don't remember the argument for this.
Apart from the somewhat cryptic docstring lgtm, as it is a logical consequence of our current GP API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that if we had transform_params
they would need to have be passed to all intermediate steps, yes. In this case, predict_params
is passed for the only call of predict in the pipeline, which happens to be the last one. I tried making the docstring a little bit more clear. Thanks for the suggestion!
in the sense of altering output type, last step only makes sense
|
I'm ok to merge with a less cryptic docstring ;) |
a3671e8
to
feb96fa
Compare
All good @amueller?
|
cc/ @amueller is there anything else blocking this? |
Just wanna say this is very timely for me, I'm doing a pairwise model and would like to see The way the CV stuff works is really neat, when I do My work around for the interim is just stacking a custom estimator on top of what I'm trying to configure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@brenolf |
This pull request introduces 37 alerts and fixes 18 - view on lgtm.com new alerts:
fixed alerts:
Comment posted by lgtm.com |
@Benolf please ignore the lgtm.com mess... They're looking into it.
|
Fixes scikit-learn#9293 by passing the attributes provided in `predict` to the last estimator.
feb96fa
to
296a631
Compare
@qinhanmin2014 @jnothman ready for another review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please avoid amending previous commits, as it makes it hard to track changes.
doc/whats_new/v0.20.rst
Outdated
@@ -156,6 +156,10 @@ Model evaluation and meta-estimators | ||
group-based CV strategies. :issue:`9085` by :user:`Laurent Direr <ldirer>` | ||
and `Andreas Müller`_. | ||
|
||
- A paramenter `predict_params` was added to :class:`pipeline.Pipeline` allowing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's not really a parameter. How about "Pipeline's predict method now passes keyword arguments on to the pipeline's last estimator, enabling the use of return_std
in a Pipeline (with caution)."? Single backticks do nothing by themselves in our documentation, btw. You need double backtics.
@brenolf Please also fix the flake8 errors (See https://travis-ci.org/scikit-learn/scikit-learn/jobs/341271936) |
I've resolved the conflicts, addressed the comments by jnothman and resolved pep8 issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @brenolf
Reference Issue
Fixes #9293.
What does this implement/fix? Explain your changes.
Simply passed the parameters from the
predict
call in the pipeline to the last estimator'spredict
call, just like it's done with other methods.Any other comments?
No.