Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

ENH Add decision_function, predict_proba and predict_log_proba for NearestCentroid estimator #26689

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 77 commits into from
Oct 28, 2024

Conversation

NoPenguinsLand
Copy link
Contributor

@NoPenguinsLand NoPenguinsLand commented Jun 23, 2023

Reference Issues/PRs

closes #17711 and #26659
supersede #17711

What does this implement/fix? Explain your changes.

Add decision_function, predict_proba and predict_log_proba methods for NearestCentroid estimator class for usage with roc_curve function.

The NearestCentroid class constructor now has an additional parameter called priors. By default, priors is set to None. When priors is set to None, class priors will be estimated using the sample data X. Thus, it is not backward compatible with the old version of NearestCentroid. The old and new NearestCentroid estimators will yield different results because the old NearestCentroid estimator assumed equal class priors.

Any other comments?

I try to keep the design as consistent as possible with the LinearDiscriminantAnalysis estimator. So, the priors parameter in both classes are the same. Open to feedback and advise.

References

Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2002). Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microaarrays. Statistical Science 18(1), p. 104-117.

@github-actions
Copy link

github-actions bot commented Jun 23, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 4abc04f. Link to the linter CI: here

@NoPenguinsLand NoPenguinsLand changed the title Add decision_function, predict_proba and predict_log_proba for NearestCentroid estimator [WIP] Add decision_function, predict_proba and predict_log_proba for NearestCentroid estimator Jun 23, 2023
@NoPenguinsLand NoPenguinsLand changed the title [WIP] Add decision_function, predict_proba and predict_log_proba for NearestCentroid estimator WIP Add decision_function, predict_proba and predict_log_proba for NearestCentroid estimator Jun 23, 2023
@NoPenguinsLand
Copy link
Contributor Author

@adrinjalali can you please review this when you get a chance?

@NoPenguinsLand NoPenguinsLand changed the title WIP Add decision_function, predict_proba and predict_log_proba for NearestCentroid estimator MRG Add decision_function, predict_proba and predict_log_proba for NearestCentroid estimator Jun 26, 2023
@glemaitre glemaitre self-requested a review November 3, 2023 21:48
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really partial review but I think factorizing some code with the discriminant analysis would be great.

sklearn/neighbors/_nearest_centroid.py Outdated Show resolved Hide resolved
sklearn/neighbors/_nearest_centroid.py Outdated Show resolved Hide resolved
sklearn/neighbors/_nearest_centroid.py Outdated Show resolved Hide resolved
sklearn/neighbors/_nearest_centroid.py Outdated Show resolved Hide resolved
sklearn/neighbors/_nearest_centroid.py Outdated Show resolved Hide resolved
sklearn/neighbors/_nearest_centroid.py Outdated Show resolved Hide resolved
sklearn/neighbors/_nearest_centroid.py Outdated Show resolved Hide resolved
sklearn/neighbors/_nearest_centroid.py Outdated Show resolved Hide resolved
sklearn/neighbors/_nearest_centroid.py Outdated Show resolved Hide resolved
sklearn/neighbors/_nearest_centroid.py Outdated Show resolved Hide resolved
@NoPenguinsLand
Copy link
Contributor Author

@glemaitre Left comments on two issues, but everything else, OK. Thank you for taking the time to review this.

@NoPenguinsLand
Copy link
Contributor Author

Thanks for the suggestions, I've incorporated all of them.

@NoPenguinsLand
Copy link
Contributor Author

@glemaitre just checking in, what's the status of this? are there a lot of ongoing works on affected files? any chance you can approve and merge this before more potential merge conflicts for 1.6 release? let me know how I can help.

@glemaitre
Copy link
Member

any chance you can approve and merge this before more potential merge conflicts for 1.6 release?

My approval is still standing but we need a second review for merge. Could you solve the merge conflict. Let me ping @adrinjalali to try to get a second review on this PR.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say this also requires an example to show how this works.

And I see some unresolved comments left by @glemaitre

sklearn/discriminant_analysis.py Outdated Show resolved Hide resolved
sklearn/neighbors/_nearest_centroid.py Outdated Show resolved Hide resolved
sklearn/neighbors/_nearest_centroid.py Outdated Show resolved Hide resolved
Comment on lines 981 to 984
The decision function is equal (up to a constant factor) to the
log-posterior of the model, i.e. `log p(y = k | x)`. In a binary
classification setting this instead corresponds to the difference
`log p(y = 1 | x) - log p(y = 0 | x)`. See :ref:`lda_qda_math`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of information is now removed from the docstrings. I think we should put back what the decision function does.

@glemaitre
Copy link
Member

In terms of example, we could plot the same decision boundary but based on the estimated probabilities: https://scikit-learn.org/1.5/auto_examples/neighbors/plot_nearest_centroid.html#sphx-glr-auto-examples-neighbors-plot-nearest-centroid-py

So it will be complementary.

@NoPenguinsLand
Copy link
Contributor Author

Hi all, I'm swamped with works but if you don't mind, I can take a look this week. Thanks for your comments, I'll read them again carefully later.

@NoPenguinsLand
Copy link
Contributor Author

And I see some unresolved comments left by @glemaitre

I actually resolved most of them without checking the resolve checkboxes, but I just marked them as resolved for thoroughness. There's a few pending comments and I pinged you in them.

@NoPenguinsLand
Copy link
Contributor Author

I'd say this also requires an example to show how this works.

Are you talking about simple example for API page or complex one like @glemaitre suggested for tutorial page? Because, a while back, @glemaitre said the updated tutorial one should be a separate PR.

sklearn/discriminant_analysis.py Show resolved Hide resolved
@glemaitre glemaitre merged commit f3b1da3 into scikit-learn:main Oct 28, 2024
30 checks passed
@glemaitre
Copy link
Member

Merging upon the two approvals. Thanks @NoPenguinsLand

@NoPenguinsLand
Copy link
Contributor Author

Thanks 🙏 @glemaitre @adrinjalali

@NoPenguinsLand NoPenguinsLand deleted the nearestcentroidfeat_02 branch October 28, 2024 18:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

4 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.