-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG] Create public kmeans_plusplus including index output #17937
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Create public kmeans_plusplus including index output #17937
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @g-walsh, looking good. I made a first pass of comments.
I think we can also rename it to be more explicit, like _kmeans_plusplus
.
It will require some tests.
Also making it public also requires adding it to the doc, especially in doc/modules/classes.rst
.
There's also a little paragraph about k-means++ in doc/module/clustering.rst
, in KMeans' section. We could mention the kmeans_plusplus
function here.
Thanks for the review
@jeremiedbb just to be clear you mean renaming One question for the docstrings, are you happy for them to be essentially the same text for both I'll work on the testing and full documentation/examples next. |
yes :)
It should yes, the exception being the type of the parameters. For instance, random_state doesn't have to be a RandomState instance but can also be an int. |
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
…eansplusplus_public
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A new batch of comments.
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
If you're happy with the test coverage then I'll next start working on adding the documentation and examples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Just small comments.
@g-walsh that's nice ! I'm not sure that it requires a example on its own but others might disagree, let's wait for the opinion of other reviewers. scikit-learn/sklearn/cluster/_kmeans.py Line 746 in f605021
other than that looks good to me, thanks ! |
…eansplusplus_public
I merged master to trigger a rebuild. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM besides the following comments.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm also. @g-walsh please add a what's entry.
Thanks @g-walsh ! |
…xture Model PR#11101, focused on allowing sample-based and k-means++ based GMM initialization.. Also made the enhancements compatible with PRs scikit-learn#17937 and scikit-learn#20030.
…sian Mixture Model PR#11101, focused on allowing sample-based and k-means++ based GMM initialization.. Also made the enhancements compatible with PRs scikit-learn#17937 and scikit-learn#20030.
Reference Issues/PRs
Fixes #17915
See also #11101
What does this implement/fix? Explain your changes.
Implement a public
kmeans_plusplus
function so that it can be used more widely (as explained in #17915) and for use as initialisation outside of KMeans (for example for gmm in #11101)Any other comments?
Still very much a WIP but looking for quick review/suggestions on this before much further is done. It's currently a function in
_kmeans.py
but unsure if this is an appropriate implementation.Still todo: