FIX GaussianMixture now normalizes weights_ directly instead of by n_samples #23034


Merged: 11 commits merged into scikit-learn:main on Apr 9, 2022

Conversation

@Micky774 (Contributor) commented on Apr 2, 2022

Reference Issues/PRs

Fixes #23032

What does this implement/fix? Explain your changes.

The implementation on main normalizes the `weights_` attribute (which should form a pmf) by `n_samples`, which led to floating-point precision errors when fitting a single-component Gaussian mixture on few samples (e.g. `n_samples < 32`). Now the `weights_` array is normalized directly w.r.t. its own sum. Added a non-regression test.
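
For illustration, here is a minimal NumPy sketch of the rounding issue and the fix (not the actual scikit-learn source; the one-ulp perturbation is contrived to stand in for the E-step's accumulated rounding error):

```python
import numpy as np

# Single component: the responsibility sum `nk` should equal n_samples
# exactly, but floating-point accumulation can leave it one ulp short.
eps = np.finfo(np.float64).eps
n_samples = 25
nk = np.array([n_samples * (1.0 - eps)])  # just below 25.0

weights_old = nk / n_samples   # old normalization: sums to just under 1.0
weights_new = nk / nk.sum()    # new normalization: x / x == 1.0 exactly

print(weights_old.sum() == 1.0)  # False
print(weights_new.sum() == 1.0)  # True
```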

Any other comments?

@Micky774 changed the title from "|FIX| GaussianMixture now normalizes weights_ directly instead of by n_samples" to "[FIX] GaussianMixture now normalizes weights_ directly instead of by n_samples" on Apr 3, 2022
@jeremiedbb (Member) left a comment:


Thanks for the PR @Micky774

Review threads on doc/whats_new/v1.1.rst and sklearn/mixture/tests/test_gaussian_mixture.py (resolved)
@jeremiedbb (Member) commented on Apr 4, 2022

Note for future reviewers:
The error came from `np.random.multinomial`. When there is more than one component, multinomial ignores the last probability and sets it so that the probabilities sum to 1, so there is no issue there. When there is a single component, the probability has to be exactly 1, which was not the case before due to rounding errors. When weights has length 1, doing `weights /= weights.sum()` gives exactly 1.
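
As a concrete illustration, a sketch of the scenario from the linked issue (the shapes and seeds are chosen for illustration; the actual non-regression test added in this PR may differ):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Roughly the failing scenario from issue #23032: one component
# fitted on fewer than 32 samples, then sampled from.
rng = np.random.RandomState(0)
X = rng.standard_normal(size=(10, 2))

gm = GaussianMixture(n_components=1, random_state=0).fit(X)

# With the fix, weights_ sums to exactly 1.0, so the
# np.random.multinomial call inside sample() no longer rejects it.
assert gm.weights_.sum() == 1.0
X_sampled, y_sampled = gm.sample(n_samples=5)
```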

Two further review threads on doc/whats_new/v1.1.rst (resolved)
Micky774 and others added 4 commits April 7, 2022 13:41
@jeremiedbb (Member) left a comment:


LGTM

@jeremiedbb added the label Quick Review (For PRs that are quick to review) on Apr 8, 2022
@thomasjpfan (Member) left a comment:


Thank you for the PR!

LGTM

@thomasjpfan changed the title from "[FIX] GaussianMixture now normalizes weights_ directly instead of by n_samples" to "FIX GaussianMixture now normalizes weights_ directly instead of by n_samples" on Apr 9, 2022
@thomasjpfan merged commit 3872cd2 into scikit-learn:main on Apr 9, 2022
@Micky774 deleted the gmix_norm branch on April 9, 2022
jjerphan pushed a commit to jjerphan/scikit-learn that referenced this pull request on Apr 29, 2022: FIX GaussianMixture now normalizes weights_ directly instead of by `n_samples` (scikit-learn#23034)

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Labels: module:mixture, Quick Review (For PRs that are quick to review)
Development

Successfully merging this pull request may close issue #23032: GaussianMixture sample() ValueError on Models with 1 Component Fitted on <32 samples