-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
ENH Add decision_function, predict_proba and predict_log_proba for NearestCentroid estimator #26689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH Add decision_function, predict_proba and predict_log_proba for NearestCentroid estimator #26689
Conversation
…arestCentroid was fitted with feature names'
@adrinjalali can you please review this when you get a chance? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really partial review but I think factorizing some code with the discriminant analysis would be great.
@glemaitre Left comments on two issues, but everything else, OK. Thank you for taking the time to review this. |
Thanks for the suggestions, I've incorporated all of them. |
@glemaitre just checking in, what's the status of this? are there a lot of ongoing works on affected files? any chance you can approve and merge this before more potential merge conflicts for 1.6 release? let me know how I can help. |
My approval is still standing but we need a second review for merge. Could you solve the merge conflict. Let me ping @adrinjalali to try to get a second review on this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say this also requires an example to show how this works.
And I see some unresolved comments left by @glemaitre
sklearn/discriminant_analysis.py
Outdated
The decision function is equal (up to a constant factor) to the | ||
log-posterior of the model, i.e. `log p(y = k | x)`. In a binary | ||
classification setting this instead corresponds to the difference | ||
`log p(y = 1 | x) - log p(y = 0 | x)`. See :ref:`lda_qda_math`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of information is now removed from the docstrings. I think we should put back what the decision function does.
In terms of example, we could plot the same decision boundary but based on the estimated probabilities: https://scikit-learn.org/1.5/auto_examples/neighbors/plot_nearest_centroid.html#sphx-glr-auto-examples-neighbors-plot-nearest-centroid-py So it will be complementary. |
Hi all, I'm swamped with works but if you don't mind, I can take a look this week. Thanks for your comments, I'll read them again carefully later. |
I actually resolved most of them without checking the resolve checkboxes, but I just marked them as resolved for thoroughness. There's a few pending comments and I pinged you in them. |
Are you talking about simple example for API page or complex one like @glemaitre suggested for tutorial page? Because, a while back, @glemaitre said the updated tutorial one should be a separate PR. |
doc/whats_new/upcoming_changes/sklearn.neighbors/26689.enhancement.rst
Outdated
Show resolved
Hide resolved
Merging upon the two approvals. Thanks @NoPenguinsLand |
Thanks 🙏 @glemaitre @adrinjalali |
Reference Issues/PRs
closes #17711 and #26659
supersede #17711
What does this implement/fix? Explain your changes.
Add
decision_function
,predict_proba
andpredict_log_proba
methods for NearestCentroid estimator class for usage withroc_curve
function.The NearestCentroid class constructor now has an additional parameter called priors. By default, priors is set to
None
. When priors is set toNone
, class priors will be estimated using the sample data X. Thus, it is not backward compatible with the old version of NearestCentroid. The old and new NearestCentroid estimators will yield different results because the old NearestCentroid estimator assumed equal class priors.Any other comments?
I try to keep the design as consistent as possible with the LinearDiscriminantAnalysis estimator. So, the priors parameter in both classes are the same. Open to feedback and advise.
References
Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2002). Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microaarrays. Statistical Science 18(1), p. 104-117.