ENH Add decision_function, predict_proba and predict_log_proba for NearestCentroid estimator #26689

NoPenguinsLand · Jun 23, 2023 · edited by glemaitre

Reference Issues/PRs

Closes #17711 and #26659
Supersedes #17711

What does this implement/fix? Explain your changes.

Add decision_function, predict_proba and predict_log_proba methods to the NearestCentroid estimator class so that it can be used with the roc_curve function.

The NearestCentroid class constructor now has an additional parameter called priors, which defaults to None. When priors is None, the class priors are estimated from the training data. This is therefore not backward compatible with the previous version of NearestCentroid: the old estimator assumed equal class priors, so the old and new estimators can yield different results.
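To illustrate why the choice of priors can change predictions, here is a minimal pure-Python sketch. It is not scikit-learn's implementation: it assumes the discriminant score from the Tibshirani et al. reference (squared Euclidean distance to each centroid minus twice the log prior) and leaves out within-class standardization and shrinkage. The function names `fit_centroids`, `decision_scores` and `predict_proba` are hypothetical helpers written for this illustration.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Return sorted class labels, per-class mean vectors and empirical priors."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    classes = sorted(groups)
    centroids = [
        [sum(col) / len(groups[c]) for col in zip(*groups[c])] for c in classes
    ]
    empirical = [len(groups[c]) / len(y) for c in classes]
    return classes, centroids, empirical


def decision_scores(x, centroids, priors):
    """Score per class: -||x - centroid||^2 + 2 * log(prior); higher is better."""
    return [
        -sum((a - b) ** 2 for a, b in zip(x, c)) + 2.0 * math.log(p)
        for c, p in zip(centroids, priors)
    ]


def predict_proba(x, centroids, priors):
    """Softmax of half the scores (assumed form, following the reference)."""
    half = [s / 2.0 for s in decision_scores(x, centroids, priors)]
    top = max(half)  # subtract the max for numerical stability
    weights = [math.exp(h - top) for h in half]
    total = sum(weights)
    return [w / total for w in weights]


# Imbalanced toy data: three samples of class 0, one sample of class 1.
X = [[0.0], [0.2], [0.4], [2.0]]
y = [0, 0, 0, 1]
classes, centroids, empirical = fit_centroids(X, y)
uniform = [1.0 / len(classes)] * len(classes)

x_new = [1.1]  # halfway between the two centroids (0.2 and 2.0)
proba_uniform = predict_proba(x_new, centroids, uniform)
proba_empirical = predict_proba(x_new, centroids, empirical)
```

In this toy setup the halfway point is a near-exact tie under equal priors (about 0.5 for each class), while priors estimated from the class frequencies tilt it toward the majority class (roughly 0.75 for class 0). That shift is the kind of behavior difference the description refers to.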

Any other comments?

I tried to keep the design as consistent as possible with the LinearDiscriminantAnalysis estimator, so the priors parameter is the same in both classes. Open to feedback and advice.

References

Tibshirani, R., Hastie, T., Narasimhan, B., & Chu, G. (2002). Class Prediction by Nearest Shrunken Centroids, with Applications to DNA Microarrays. Statistical Science 18(1), p. 104-117.

github-actions · Jun 23, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 4abc04f.

NoPenguinsLand · Jun 26, 2023

@adrinjalali can you please review this when you get a chance?

glemaitre

This is really a partial review, but I think factorizing some code with the discriminant analysis would be great.

sklearn/neighbors/_nearest_centroid.py

NoPenguinsLand · Nov 5, 2023

@glemaitre Left comments on two issues, but everything else, OK. Thank you for taking the time to review this.

NoPenguinsLand · May 21, 2024

Thanks for the suggestions, I've incorporated all of them.

NoPenguinsLand · Oct 9, 2024

@glemaitre just checking in: what's the status of this? Is there a lot of ongoing work on the affected files? Any chance you can approve and merge this before more potential merge conflicts for 1.6 release? Let me know how I can help.

glemaitre · Oct 11, 2024

> any chance you can approve and merge this before more potential merge conflicts for 1.6 release?

My approval is still standing, but we need a second review before merging. Could you resolve the merge conflict? Let me ping @adrinjalali to try to get a second review on this PR.

adrinjalali

I'd say this also requires an example to show how this works.

And I see some unresolved comments left by @glemaitre

sklearn/discriminant_analysis.py

sklearn/neighbors/_nearest_centroid.py

adrinjalali · Oct 15, 2024

sklearn/discriminant_analysis.py

-        The decision function is equal (up to a constant factor) to the
-        log-posterior of the model, i.e. `log p(y = k | x)`. In a binary
-        classification setting this instead corresponds to the difference
-        `log p(y = 1 | x) - log p(y = 0 | x)`. See :ref:`lda_qda_math`.


This kind of information is now removed from the docstrings. I think we should put back what the decision function does.
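As a small illustration of the relationship the removed docstring describes, the sketch below uses made-up per-class scores and assumes they differ from the log-posterior only by a shared normalizing term. Under that assumption, the binary decision value `log p(y = 1 | x) - log p(y = 0 | x)` equals the plain difference of the two scores. The helper `log_posteriors` and the score values are hypothetical and are not taken from scikit-learn.

```python
import math


def log_posteriors(scores):
    """Normalize unnormalized log scores into log p(y = k | x) via log-sum-exp."""
    top = max(scores)
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_norm for s in scores]


# Hypothetical unnormalized log scores for classes 0 and 1 at one sample x.
scores = [-1.3, 0.4]
log_post = log_posteriors(scores)

# Binary decision value: log p(y = 1 | x) - log p(y = 0 | x).
binary_decision = log_post[1] - log_post[0]

# The shared normalizer cancels, so the raw score difference is the same value.
raw_difference = scores[1] - scores[0]

# The probability of class 1 is the logistic sigmoid of the decision value.
p1 = 1.0 / (1.0 + math.exp(-binary_decision))
```

A positive decision value means class 1 is the more probable class, which is why a single column is enough in the binary case.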

glemaitre · Oct 16, 2024

In terms of example, we could plot the same decision boundary but based on the estimated probabilities: https://scikit-learn.org/1.5/auto_examples/neighbors/plot_nearest_centroid.html#sphx-glr-auto-examples-neighbors-plot-nearest-centroid-py

So it will be complementary.

NoPenguinsLand · Oct 16, 2024

Hi all, I'm swamped with work, but if you don't mind, I can take a look this week. Thanks for your comments, I'll read them again carefully later.

NoPenguinsLand · Oct 18, 2024

> And I see some unresolved comments left by @glemaitre

I had actually resolved most of them without checking the resolve checkboxes, but I have now marked them as resolved for thoroughness. There are a few pending comments, and I pinged you in them.

NoPenguinsLand · Oct 18, 2024

> I'd say this also requires an example to show how this works.

Are you talking about a simple example for the API page or a complex one for the tutorial page, like @glemaitre suggested? I ask because, a while back, @glemaitre said the updated tutorial example should be a separate PR.

doc/whats_new/upcoming_changes/sklearn.neighbors/26689.enhancement.rst

sklearn/discriminant_analysis.py

glemaitre · Oct 28, 2024

Merging upon the two approvals. Thanks @NoPenguinsLand

NoPenguinsLand · Oct 28, 2024

Thanks 🙏 @glemaitre @adrinjalali

NoPenguinsLand added 4 commits June 23, 2023 13:18

add decision_function, predidct_proba & predict_log_proba for Nearest…

06461d4

…Centroid class

add decision_function, predict_proba & predict_log_proba to NearestCe…

c9b12d2

…ntroid estimator class

add more tests for predict_proba for NearestCentroid estimator class

84148f7

Improve Doc 01

713651b

github-actions bot added the module:neighbors label Jun 23, 2023

Merge remote-tracking branch 'upstream/main' into nearestcentroidfeat_02

696963b

NoPenguinsLand changed the title ~~Add decision_function, predict_proba and predict_log_proba for NearestCentroid estimator~~ [WIP] Add decision_function, predict_proba and predict_log_proba for NearestCentroid estimator Jun 23, 2023

NoPenguinsLand changed the title ~~[WIP] Add decision_function, predict_proba and predict_log_proba for NearestCentroid estimator~~ WIP Add decision_function, predict_proba and predict_log_proba for NearestCentroid estimator Jun 23, 2023

NoPenguinsLand added 15 commits June 23, 2023 17:27

test pre-commit hooks attempt03

b7b2fea

update list contributors

6ba86cc

testing the 'UserWarning: X does not have valid feature names, but Ne…

f2b6343

…arestCentroid was fitted with feature names'

Fix formatting.

5b1a240

Fix formatting for changelog

bf8d89e

Fix 'WARNING: Title underline too short.'

dee6a7e

Fix formatting so links in changelog are working.

183a00c

Formatting _nearest_centroid.py

cd322db

Fix formatting in changelog.

93d6144

fix links in changelog again

45071c6

Fix documentation formatting.

8634248

Merge branch 'main' into nearestcentroidfeat_02

159e215

black formatting

44e9de8

add v1.4.rst to resolve merge conflict

f0c5df7

update changelog

e0812c3

NoPenguinsLand changed the title ~~WIP Add decision_function, predict_proba and predict_log_proba for NearestCentroid estimator~~ MRG Add decision_function, predict_proba and predict_log_proba for NearestCentroid estimator Jun 26, 2023

glemaitre self-requested a review November 3, 2023 21:48

glemaitre reviewed Nov 4, 2023

Resolved merge conflict in test_nearest_centroid.py

fbf12e3

NoPenguinsLand added 5 commits May 20, 2024 22:41

pull latest updates and test push/merge conflicts

13cf0e4

Merge branch 'main' into nearestcentroidfeat_02

c167ab1

pull latest updates and test push/merge conflict

6cbd5ea

minor fixes

6c701d3

fix class_prior_ attribute

912104e

adrinjalali reviewed Oct 15, 2024

NoPenguinsLand added 7 commits October 18, 2024 02:07

Merge branch 'main' into nearestcentroidfeat_02

f15cdd1

pre-commit hook auto-fix

3129b4e

fix merge conflict AGAIN

585f2c5

remove files

64b184b

import validate data

ccfc8c0

pre-commit hooks and whatnots

36c7f03

replace self._validate_data with validate_data

25c9910

adrinjalali reviewed Oct 23, 2024

doc/whats_new/upcoming_changes/sklearn.neighbors/26689.enhancement.rst (outdated, resolved)

sklearn/discriminant_analysis.py (resolved)

NoPenguinsLand added 3 commits October 23, 2024 23:30

Merge remote-tracking branch 'upstream/main' into nearestcentroidfeat_02

ca57d5a

pre-commit hooks and whatnots

f01c8d4

fix doc

4abc04f

adrinjalali approved these changes Oct 24, 2024

glemaitre approved these changes Oct 28, 2024

glemaitre merged commit f3b1da3 into scikit-learn:main Oct 28, 2024
30 checks passed

NoPenguinsLand deleted the nearestcentroidfeat_02 branch October 28, 2024 18:56
