Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

[MRG] Create public kmeans_plusplus including index output #17937

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Nov 11, 2020

Conversation

g-walsh
Copy link
Contributor

@g-walsh g-walsh commented Jul 16, 2020

Reference Issues/PRs

Fixes #17915
See also #11101

What does this implement/fix? Explain your changes.

Implement a public kmeans_plusplus function so that it can be used more widely (as explained in #17915) and for use as initialisation outside of KMeans (for example for gmm in #11101)

Any other comments?

Still very much a WIP but looking for quick review/suggestions on this before much further is done. It's currently a function in _kmeans.py but unsure if this is an appropriate implementation.

Still todo:

  • Docstrings (essentially the same as the private _k_init ?)
  • Tests
  • Full documentation and examples

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @g-walsh, looking good. I made a first pass of comments.

I think we can also rename it to be more explicit, like _kmeans_plusplus.
It will require some tests.

Also making it public also requires adding it to the doc, especially in doc/modules/classes.rst.
There's also a little paragraph about k-means++ in doc/module/clustering.rst, in KMeans' section. We could mention the kmeans_plusplus function here.

sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
@g-walsh
Copy link
Contributor Author

g-walsh commented Jul 17, 2020

Thanks for the review

I think we can also rename it to be more explicit, like _kmeans_plusplus.

@jeremiedbb just to be clear you mean renaming _k_init to _kmeans_plusplus?

One question for the docstrings, are you happy for them to be essentially the same text for both _k_init and kmeans_plusplus?

I'll work on the testing and full documentation/examples next.

@jeremiedbb
Copy link
Member

just to be clear you mean renaming _k_init to _kmeans_plusplus?

yes :)

One question for the docstrings, are you happy for them to be essentially the same text for both _k_init and kmeans_plusplus?

It should yes, the exception being the type of the parameters. For instance, random_state doesn't have to be a RandomState instance but can also be an int.

sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A new batch of comments.

sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
g-walsh and others added 11 commits July 22, 2020 09:24
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
@g-walsh
Copy link
Contributor Author

g-walsh commented Jul 22, 2020

If you're happy with the test coverage then I'll next start working on adding the documentation and examples

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Just small comments.

sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/tests/test_k_means.py Outdated Show resolved Hide resolved
sklearn/cluster/tests/test_k_means.py Outdated Show resolved Hide resolved
sklearn/cluster/tests/test_k_means.py Outdated Show resolved Hide resolved
sklearn/cluster/tests/test_k_means.py Outdated Show resolved Hide resolved
sklearn/cluster/tests/test_k_means.py Outdated Show resolved Hide resolved
@jeremiedbb
Copy link
Member

@g-walsh that's nice !

I'm not sure that it requires a example on its own but others might disagree, let's wait for the opinion of other reviewers.
It also needs an example in the docstring. We are trying to put examples in the docstring of each public function/class of sklearn. You could simply adapt this example:

other than that looks good to me, thanks !

@glemaitre glemaitre self-requested a review October 20, 2020 13:50
@ogrisel ogrisel requested review from ogrisel and removed request for glemaitre October 20, 2020 13:50
@ogrisel ogrisel changed the title [WIP] Create public kmeans_plusplus including index output [MRG] Create public kmeans_plusplus including index output Oct 20, 2020
@ogrisel
Copy link
Member

ogrisel commented Oct 20, 2020

I merged master to trigger a rebuild.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM besides the following comments.

sklearn/cluster/tests/test_k_means.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/_kmeans.py Outdated Show resolved Hide resolved
sklearn/cluster/tests/test_k_means.py Show resolved Hide resolved
sklearn/cluster/tests/test_k_means.py Show resolved Hide resolved
Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm also. @g-walsh please add a what's entry.

@jeremiedbb jeremiedbb merged commit 11f9742 into scikit-learn:master Nov 11, 2020
@jeremiedbb
Copy link
Member

Thanks @g-walsh !

alceballosa added a commit to alceballosa/scikit-learn that referenced this pull request Jun 26, 2021
…xture Model PR#11101, focused on allowing sample-based and k-means++ based GMM initialization.. Also made the enhancements compatible with PRs scikit-learn#17937 and scikit-learn#20030.
alceballosa added a commit to alceballosa/scikit-learn that referenced this pull request Aug 3, 2021
…sian Mixture Model PR#11101, focused on allowing sample-based and k-means++ based GMM initialization.. Also made the enhancements compatible with PRs scikit-learn#17937 and scikit-learn#20030.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

K-means++ initialization: return indices
3 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.