FIX Correct GaussianMixture.weights_ normalization #24119

Open · wants to merge 2 commits into base: main

Conversation

kasmith11
Contributor

Reference Issues/PRs

Fixes #24085

What does this implement/fix? Explain your changes.

Weights that were normalized using n_samples now use weights_.sum()
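
In code terms, based on the diff excerpts quoted in the review threads below, the change replaces the implicit division by n_samples with an explicit division by the weights' own sum (a condensed, hedged sketch, not the full diff):

```python
# On main: the per-component counts returned by _estimate_gaussian_parameters
# are divided by n_samples, which normalizes them only implicitly.
weights /= n_samples

# This PR: normalize explicitly by the weights' own sum, so the result sums
# to 1.0 regardless of how the counts were accumulated (the diff excerpts
# below show this applied to self.weights_ after initialization).
self.weights_ /= self.weights_.sum()
```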

Any other comments?

@Micky774 Micky774 added the Quick Review label Aug 5, 2022
@Micky774
Contributor

Micky774 commented Aug 5, 2022

Do we know if we can actually attribute any errors to the old method? While the new one is safer and more aligned with some recent changes, I'm just wondering whether there's a viable non-regression test to include in this PR.

@jjerphan jjerphan changed the title #24085 - Gaussian Mixture - weights now computed using self.weights_.sum() FIX Correct GaussianMixture.weights_ normalization Aug 8, 2022
@jjerphan
Member

jjerphan commented Aug 8, 2022

To me, we should choose a normalizer term such that after normalization np.abs(np.sum(self.weights_) - 1.0) is minimized. I think normalizing by the sum of weights makes sense in this case.
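
As a toy numeric illustration of that criterion (hypothetical numbers, not taken from the issue):

```python
import numpy as np

# Hypothetical per-component "counts" whose total has drifted slightly away
# from n_samples, e.g. through floating-point accumulation.
n_samples = 1000
weights = np.array([300.0, 400.0, 299.9999999])

by_n_samples = weights / n_samples   # implicit normalization (main)
by_sum = weights / weights.sum()     # explicit normalization (this PR)

print(abs(by_n_samples.sum() - 1.0))  # small but non-zero
print(abs(by_sum.sum() - 1.0))        # zero up to rounding
```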

Regarding a viable non-regression test, I think it's fair to come up with a minimal reproducible example showing the faulty behaviour (or to wait and see whether the issue's author already has one, in response to #24085 (comment)) and then turn it into a test that this PR would fix.

@kasmith11
Contributor Author

Would it be helpful if I develop a minimal reproducible example? My plan was to do something similar to the gist linked in #23032 under Steps/Code to Reproduce. Thoughts?

@jjerphan
Member

@kasmith11: Yes, definitely: it would be helpful!

@kasmith11
Contributor Author

@jjerphan I've created a minimal reproducible example that prints the self.weights_ of a GaussianMixture in the following gist, which leans heavily on the example outlined in the scikit-learn documentation. It prints the weights of a trained GaussianMixture model before any adjustments to the way self.weights_ is calculated.

However, I'm having trouble discerning how I can attribute any errors to the older method, as outlined by @Micky774. Would I need to add an additional gist of the weights after changing how self.weights_ is calculated?
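
(The gist itself is not reproduced in this thread; a minimal sketch of that kind of example, assuming the usual GaussianMixture workflow from the documentation, would look roughly like this:)

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy two-component data, similar in spirit to the documentation example.
rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(500, 2)),
    rng.normal(5.0, 1.0, size=(500, 2)),
])

gm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gm.weights_)        # fitted mixture weights
print(gm.weights_.sum())  # should be (very close to) 1.0
```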

@jjerphan jjerphan added the Waiting for Reviewer label and removed the Quick Review label Aug 23, 2022
@Micky774
Contributor

> @jjerphan I've created a minimal reproducible example that prints the self.weights_ of a GaussianMixture in the following gist, which leans heavily on the example outlined in the scikit-learn documentation. It prints the weights of a trained GaussianMixture model before any adjustments to the way self.weights_ is calculated.
>
> However, I'm having trouble discerning how I can attribute any errors to the older method, as outlined by @Micky774. Would I need to add an additional gist of the weights after changing how self.weights_ is calculated?

The goal is to have some script that demonstrates "that's where the old method went wrong". For example, even just finding a reasonable use-case where the weights sum to something other than 1 would be sufficient.

In my opinion, this is a really simple change made out of an abundance of caution and well motivated in the abstract, so I think it's worth including with or without a demonstrable failure. We still need input from core maintainers, though.
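
A hedged sketch of what such a non-regression test could look like, assuming a plain GaussianMixture fit is enough to expose a drift away from 1 (which is exactly what still needs to be demonstrated):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def test_gaussian_mixture_weights_sum_to_one():
    # Non-regression check: fitted weights_ should always sum to 1.0.
    rng = np.random.RandomState(42)
    X = rng.standard_normal(size=(200, 3))
    gm = GaussianMixture(n_components=4, random_state=0).fit(X)
    assert np.isclose(gm.weights_.sum(), 1.0)
```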

@betatim
Member

betatim commented Nov 1, 2022

Should we merge this? It seems like there is consensus that, even without an example of where the old behaviour is incorrect, this change is desirable. In particular, if other parts of this estimator normalise the weights to sum to 1, it makes sense to me to do this everywhere.

I can't construct an example off the top of my head either. However, in particle physics the question of how to normalise weights comes up a bit. I think the thing to get right there is that the normalisation of different groups of samples (e.g. for different classes) is consistent across groups; otherwise you end up giving one group a much higher weight relative to the other. But I can't make an example right now :-/

@Micky774
Contributor

Micky774 commented Nov 1, 2022

> Should we merge this? It seems like there is consensus that, even without an example of where the old behaviour is incorrect, this change is desirable.

I am fine with advancing this PR. I wonder whether the code on main produces incorrect behavior when log_resp contains zeros; I can check later, but if so, we may be able to add that as a non-regression test.

> In particular, if other parts of this estimator normalise the weights to sum to 1, it makes sense to me to do this everywhere.

For clarification, this still normalizes to one, since at this stage the weights are expected to sum to almost exactly n_samples; it's more of a semantic consideration of normalizing to one explicitly (this PR's change) vs. implicitly (what is on main).
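
A hedged sketch of how one could probe the log_resp-contains-zeros hypothesis above, using the private helper shown in the diff excerpts below (the import path and the hard, one-hot responsibility matrix are assumptions for illustration):

```python
import numpy as np
from sklearn.mixture._gaussian_mixture import _estimate_gaussian_parameters

rng = np.random.RandomState(0)
X = rng.standard_normal(size=(6, 2))

# Hard (one-hot) responsibilities, i.e. log_resp would contain exact zeros.
resp = np.zeros((6, 2))
resp[:3, 0] = 1.0
resp[3:, 1] = 1.0

weights, means, covariances = _estimate_gaussian_parameters(X, resp, 1e-6, "full")
print(weights, weights.sum())           # per-component counts, summing to ~n_samples
print((weights / weights.sum()).sum())  # exactly 1.0 after explicit normalization
```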


self.weights_ = weights if self.weights_init is None else self.weights_init
self.means_ = means if self.means_init is None else self.means_init
self.weights_ /= self.weights_.sum()
Member

Can weights be negative? If so, I think this might be more appropriate:

Suggested change:
- self.weights_ /= self.weights_.sum()
+ self.weights_ /= np.abs(self.weights_).sum()

cc @jeremiedbb, with whom we discussed normalization with negative weights.
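
A toy aside on what the two normalizers do when some weights are negative (hypothetical values, purely illustrative; whether negative weights can occur here at all is the question discussed below):

```python
import numpy as np

w = np.array([0.9, -0.3, 0.6])

print(w / w.sum())          # [ 0.75 -0.25  0.5 ], sums to exactly 1.0
print(w / np.abs(w).sum())  # [ 0.5  -0.167  0.333], entries stay bounded,
                            # but the sum is ~0.667, no longer 1.0
```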

Contributor

IIRC self.weights_ should be non-negative at this point. Not completely sure. I will try to take a look later, time permitting (though this isn't too high on my list)

Member

Side comment: does anyone know of widespread applications that use negative sample weights?

The only people I know who have negative sample weights are (some) researchers in High Energy Particle physics. From my memory, the interpretation of what those samples with negative weights mean is tricky. The consensus I remember is that a group of samples should always have a net positive weight (though this isn't always true), but that you shouldn't take the absolute value of an individual sample's weight. The reason for not using abs(weight) is that a possible interpretation of a negative sample weight is that you are trying to "subtract" that sample from your final group of samples. (I think the "deep physics" reason for how you end up with them is that quantum mechanics has constructive and destructive interference terms, and depending on how your sample generator is written you end up with samples with a negative weight that are meant to represent negative interference, or something like that.)

Basically, are negative weights something worth spending time on, or is it best to punt on them until there are use-cases?

Contributor

> Basically, are negative weights something worth spending time on, or is it best to punt on them until there are use-cases?

AFAIK not really, but I wonder what prompted @jjerphan and @jeremiedbb's conversations on normalizing negative weights.

Member

@jjerphan jjerphan Nov 14, 2022

What @betatim described prompted those discussions; I think @betatim is definitely more knowledgeable than me on this subject.

I think use-cases relying on negative weights are relatively rare, and this might not be something we want to maintain a dedicated normalization for if that normalization is complex.

Member

I think it is good to assume that weights are always positive and use that to reduce complexity/additional computation.

Contributor

As long as we avoid re-normalizing user-provided weights_init (see my review below), I don't see how it's possible to get negative weights with the current capabilities of Mixture classes like GMM. So I hope this isn't something we have to consider one way or another.

And I would really push not to re-normalize user-provided weights_init, and especially not to make rigid decisions on how to re-normalize negative vs. positive weights, etc. Otherwise we take away the ability to precisely define an initial state for the EM algorithm.
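
A hedged sketch of the concern, assuming the user fully specifies the EM starting state (illustrative values; as far as I recall, GaussianMixture already validates that weights_init sums to 1, so an extra re-normalization would at best be a no-op and at worst silently rescale a deliberately chosen starting point):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([
    rng.normal(-2.0, 1.0, size=(100, 1)),
    rng.normal(2.0, 1.0, size=(900, 1)),
])

# A precisely chosen initial state for the EM algorithm.
gm = GaussianMixture(
    n_components=2,
    weights_init=np.array([0.1, 0.9]),
    means_init=np.array([[-2.0], [2.0]]),
    precisions_init=np.array([[[1.0]], [[1.0]]]),
    max_iter=1,
    random_state=0,
).fit(X)

print(gm.weights_)  # first M-step result, starting from exactly the state above
```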

Member

The negative weights come back all the time ;). The last report was the following: #18934

If we really only want positive weights, we should use only_non_negative=True in _check_sample_weight. I assume that we should come up with a PR allowing negative weights for people who really want them, but without any guarantee that the normalization makes sense. If an algorithm was written to take negative weights into account, then we can always make sure that the normalization is done properly as per the literature.
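
For reference, a hedged sketch of the validation utility being referred to (the helper and its only_non_negative flag live in sklearn.utils.validation; how it would be wired into the mixture code is left open here):

```python
import numpy as np
from sklearn.utils.validation import _check_sample_weight

X = np.ones((4, 2))
sample_weight = np.array([1.0, 2.0, -0.5, 1.5])

# The negative entry triggers a ValueError when only_non_negative=True.
try:
    _check_sample_weight(sample_weight, X, only_non_negative=True)
except ValueError as exc:
    print(exc)
```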

Contributor

@emirkmo emirkmo left a comment

I don't think the normalization step should re-normalize user-provided weights. We expect the user to do this, and they may have normalization preferences. So I think the normalization should move back up and apply only to the initializer, not to pre-initialized user input.

However, if the user provides responsibilities (which is currently not possible but will hopefully be added in PR #24812), then I guess some normalization can be expected, similar to passing in 'kmeans' or whatever.


self.weights_ = weights if self.weights_init is None else self.weights_init
self.means_ = means if self.means_init is None else self.means_init
self.weights_ /= self.weights_.sum()
Contributor

@emirkmo emirkmo Nov 13, 2022

The normalization should not be done after the user-input initialization! We expect normalized weights to be given in weights_init. Its previous location looks correct to me.

Suggested change:
- self.weights_ /= self.weights_.sum()

@@ -715,10 +715,10 @@ def _initialize(self, X, resp):
weights, means, covariances = _estimate_gaussian_parameters(
X, resp, self.reg_covar, self.covariance_type
)
weights /= n_samples
Contributor

@emirkmo emirkmo Nov 13, 2022

I suggest that, as with the other comment, the weights normalization be done here. We expect normalized weights to be passed in weights_init.

Suggested change:
- weights /= n_samples
+ weights /= weights.sum()

As for negative weights: by moving the normalization step here, we largely avoid the problem, since the current inputs shouldn't really produce negative weights as far as I am aware (we only apply the normalization when we run the _estimate_gaussian_parameters step, and we don't re-normalize what might be carefully normalized user weights).

However, PR #24812, which makes GMM initialization more robust (allowing users to pass in their own responsibilities matrix, or to use a custom callable such as a pre-fitted K-means), will make this matter. So it's worth considering the issue here.
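
Putting the two suggestions in this review together, a hedged sketch of how the relevant lines of _initialize would read (context lines taken from the diff excerpts above; this is the reviewer's proposal, not the merged implementation):

```python
weights, means, covariances = _estimate_gaussian_parameters(
    X, resp, self.reg_covar, self.covariance_type
)
# Normalize only the weights estimated here; a user-provided weights_init is
# expected to be normalized already and is used as-is below.
weights /= weights.sum()

self.weights_ = weights if self.weights_init is None else self.weights_init
self.means_ = means if self.means_init is None else self.means_init
```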

@glemaitre glemaitre self-requested a review December 29, 2022 10:09
@glemaitre
Member

I agree with @emirkmo regarding the location of the normalization. I would not expect it to renormalize user-defined weights.

Member

@glemaitre glemaitre left a comment

As mentioned in the issue, I find it OK to delegate the responsibility of the normalization to the user. However, algorithmically, since each subsequent M-step will renormalize using the sum of the weights, I intuitively don't understand how a normalization that does not use this approach would be helpful.

If this is indeed hurting convergence, we should either raise a warning (if we want to allow unnormalized weights) or an error (if we don't).
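
A hedged sketch of the kind of check being suggested (the helper name and its placement are assumptions; it is only meant to illustrate the warning-vs-error choice):

```python
import warnings

import numpy as np

def _check_user_weights(weights, strict=True, atol=1e-8):
    """Hypothetical helper: reject, or merely warn about, user-provided
    initial weights that do not sum to 1."""
    weights = np.asarray(weights, dtype=float)
    total = weights.sum()
    if not np.isclose(total, 1.0, atol=atol):
        msg = f"Initial weights sum to {total}, not 1.0."
        if strict:
            raise ValueError(msg)
        warnings.warn(msg + " Convergence of EM may be affected.", UserWarning)
    return weights
```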

@adrinjalali adrinjalali added the Stalled, Needs Decision, and Moderate labels and removed the Waiting for Reviewer label Jul 27, 2023
Development

Successfully merging this pull request may close these issues.

Weights are being normalized using number of samples as opposed to sum in GaussianMixture