FIX Correct GaussianMixture.weights_ normalization #24119
Conversation
Do we know if we can actually attribute any errors to the old method? While the new one is safer and more aligned with some recent changes, I'm just wondering if there's a viable non-regression test to include in this PR.
To me, we should choose a normalizer term such that, after normalization, GaussianMixture.weights_ sums to one. Regarding a viable non-regression test, I think it's fair to come up with a minimal reproducible example (or wait for the issue's author to know if they already have one in response to #24085 (comment)) showing a faulty behaviour and then turn it into a test this PR would fix.
Would it be helpful if I develop a minimal reproducible example? My plan was to do something similar to the gist linked in #23032.
@kasmith11: Yes, definitely: it would be helpful!
@jjerphan I've created a minimal reproducible example that prints the weights. However, I'm having trouble discerning how I can attribute any errors to the older method, as outlined by @Micky774. Would I need to add an additional gist of the weights after changing how self.weights_ is calculated?
The goal is to have some script that demonstrates where the old method went wrong. For example, even just finding a reasonable use case where the weights sum to something other than 1 would be sufficient. In my opinion, this is a really simple change made out of an abundance of caution and well motivated abstractly, so I think it's worth including with or without a demonstrable failure. We still need input from core maintainers, though.
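A minimal sketch of the kind of non-regression check discussed above: fit a GaussianMixture and assert that the fitted weights sum to 1. The dataset and parameters are placeholders, not taken from the linked gist.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Placeholder data: three well-separated blobs.
X, _ = make_blobs(n_samples=500, centers=3, random_state=0)

gm = GaussianMixture(n_components=3, init_params="random", random_state=0).fit(X)

# With the normalization proposed in this PR, the mixing weights should sum
# to 1 (up to floating-point error) regardless of the initialization method.
assert np.isclose(gm.weights_.sum(), 1.0)
print(gm.weights_, gm.weights_.sum())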
Should we merge this? It seems like there is consensus that, even without an example of where the old behaviour is incorrect, this change is desirable. In particular, if other parts of this estimator normalise the weights to sum to 1, it makes sense to me to do this everywhere. I can't construct an example off the top of my head either. However, in particle physics the question of how to normalise weights comes up a bit. I think there the thing to get right is that the normalisation of different groups of samples (e.g. for different classes) is consistent with each other. Otherwise you end up giving one group a much higher weight relative to the other. But I can't make an example right now :-/
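A small numerical illustration of that last point, with made-up numbers: normalizing two groups of sample weights separately changes their relative contribution, whereas a single common normalizer preserves it.

import numpy as np

# Group A: many low-weight samples (total 80); group B: few high-weight samples (total 20).
w_a = np.full(160, 0.5)   # sums to 80
w_b = np.full(4, 5.0)     # sums to 20

# Normalizing each group separately makes each group contribute 1.0,
# destroying the original 80/20 balance between them.
sep_a, sep_b = w_a / w_a.sum(), w_b / w_b.sum()
print(sep_a.sum(), sep_b.sum())       # 1.0 1.0

# Normalizing with one common denominator preserves the 80/20 balance.
total = w_a.sum() + w_b.sum()
joint_a, joint_b = w_a / total, w_b / total
print(joint_a.sum(), joint_b.sum())   # 0.8 0.2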
I am fine with advancing this PR. I wonder if the code on main generates incorrect behavior when
For clarification, this still does normalize to one, since at this stage it is expected that weights are almost exactly |
self.weights_ = weights if self.weights_init is None else self.weights_init
self.means_ = means if self.means_init is None else self.means_init
self.weights_ /= self.weights_.sum()
Can weights be negative? If so, I think this might be more appropriate:
- self.weights_ /= self.weights_.sum()
+ self.weights_ /= np.abs(self.weights_).sum()
cc @jeremiedbb, with whom we discussed normalization with negative weights.
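For concreteness, a tiny sketch of how the two normalizers would differ if a weight were (hypothetically) negative; as discussed below, this is not a situation GaussianMixture is currently expected to produce.

import numpy as np

w = np.array([0.7, 0.5, -0.2])

print(w / w.sum())           # divides by 1.0: [0.7, 0.5, -0.2], sums to 1
print(w / np.abs(w).sum())   # divides by 1.4: signs preserved, but the sum is ~0.71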
IIRC self.weights_ should be non-negative at this point. Not completely sure. I will try to take a look later, time permitting (though this isn't too high on my list).
Side comment: does anyone know of widespread applications that use negative sample weights?
The only people I know who have negative sample weights are (some) researchers in high-energy particle physics. From my memory, the interpretation of what those samples with negative weights mean is tricky. The consensus I remember is that a group of samples should always have a net positive weight (but this isn't always true), but that you shouldn't take the absolute value of an individual sample's weight. The reason for not using abs(weight) is that a possible interpretation of a negative sample weight is that you are trying to "subtract" that sample from your final group of samples. (I think the "deep physics" reason for how you end up with them is that quantum mechanics has constructive and destructive interference terms, and depending on how your sample generator is written you end up with samples with a negative weight that are meant to represent negative interference, or something like that.)
Basically, are negative weights something worth spending time on or best to punt on it till there are use-cases?
Basically, are negative weights something worth spending time on or best to punt on it till there are use-cases?
AFAIK not really, but I wonder what prompted @jjerphan and @jeremiedbb's conversations on normalizing negative weights.
What @betatim described prompted those discussions; I think @betatim definitely is more knowledgeable than me on this subject.
I think use cases relying on negative weights are relatively rare, and if the relevant normalization is complex, it might not be something we want to maintain.
I think it is good to assume that weights are always positive and use that to reduce complexity/additional computation.
As long as we avoid re-normalizing user-provided weights_init (see my review below), with the current capabilities of mixture classes like GMM I don't see how it's possible to get negative weights. Thus I hope this isn't something we have to consider one way or another.
And I would really push to not re-normalize user-provided weights_init, and especially to not make rigid decisions on how to re-normalize negative vs. positive weights, etc. Otherwise we take away the ability to precisely define an initial state for the EM algorithm.
The negative weights come back all the time ;). The last report was the following: #18934
If we really only want positive weights, we should use only_non_negative=True in _check_sample_weight. I assume that we should come up with a PR allowing negative weights for people who really want them, but without any guarantee that the normalization makes any sense. If an algorithm was written to take negative weights into account, then we can always make sure that the normalization is properly done as per the literature.
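A sketch of that validation, assuming the private helper sklearn.utils.validation._check_sample_weight and its only_non_negative flag (available in recent scikit-learn versions):

import numpy as np
from sklearn.utils.validation import _check_sample_weight

X = np.random.RandomState(0).rand(5, 2)
sw = np.array([1.0, 2.0, -1.0, 0.5, 0.5])

try:
    _check_sample_weight(sw, X, only_non_negative=True)
except ValueError as exc:
    # Negative entries are rejected when only_non_negative=True.
    print(exc)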
I don't think the normalization step should re-normalize user-provided weights. We expect the user to do this and they may have normalization preferences. So I think the normalization should move back up and only apply to the initializer, not to pre-initialized user input.
However if the user provides responsibilities (which is currently not possible but hopefully added in PR #24812), then I guess some normalization can be expected similar to passing in 'kmeans' or whatever.
self.weights_ = weights if self.weights_init is None else self.weights_init
self.means_ = means if self.means_init is None else self.means_init
self.weights_ /= self.weights_.sum()
The normalization should not be done after the user-input initialization! We expect normalized weights to be given in weights_init. Its previous location looks correct to me.
self.weights_ /= self.weights_.sum()
@@ -715,10 +715,10 @@ def _initialize(self, X, resp):
weights, means, covariances = _estimate_gaussian_parameters(
    X, resp, self.reg_covar, self.covariance_type
)
weights /= n_samples
I suggest that, as with the other comment, the weights normalization is done here. We expect normalized weights to be input in weights_init.
- weights /= n_samples
+ weights /= weights.sum()
As for negative weights: by moving the normalization step here (only applying normalization when we run the _estimate_gaussian_parameters step, but not re-normalizing what might be carefully normalized user weights), we largely avoid this problem, as the current inputs shouldn't really produce negative weights as far as I am aware.
However, PR #24812, which makes GMM initialization more robust (allowing users to pass in their own responsibilities matrix, or to use a custom callable such as a pre-fitted K-means), will make this matter. So it's worth considering the issue here.
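As a sanity check of the suggestion above, a sketch under the assumption that the responsibility matrix resp has rows summing to 1 (as with the current 'kmeans' and 'random' initializations): the per-component weights computed from resp then sum to n_samples, so dividing by n_samples and dividing by weights.sum() coincide; with an unnormalized resp they would not.

import numpy as np

rng = np.random.RandomState(0)
n_samples, n_components = 200, 3

# Responsibility matrix with rows normalized to sum to 1.
resp = rng.rand(n_samples, n_components)
resp /= resp.sum(axis=1, keepdims=True)

# Per-component weights, analogous to nk in the M-step.
weights = resp.sum(axis=0)

print(np.isclose(weights.sum(), n_samples))                       # True
print(np.allclose(weights / n_samples, weights / weights.sum()))  # True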
I agree with @emirkmo regarding the location of the normalization. I would not expect the estimator to renormalize user-defined weights.
As mentioned in the issue, I find it OK to delegate the responsibility of the normalization to the user. However, since each subsequent M-step will renormalize using the sum of the weights, I intuitively don't understand how a normalization that does not use this approach would be helpful.
If this is indeed hurting convergence, we should either raise a warning (if we want to allow unnormalized weights) or an error (if we don't).
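Illustrative only: one way the "warn or raise" options could look for unnormalized, user-provided weights; the helper name and tolerance below are hypothetical and not scikit-learn API.

import warnings
import numpy as np

def check_weights_init(weights, strict=False, tol=1e-6):
    # Hypothetical helper: warn (or raise) when weights_init does not sum to 1.
    total = float(np.sum(weights))
    if not np.isclose(total, 1.0, atol=tol):
        msg = f"weights_init sums to {total}, not 1"
        if strict:
            raise ValueError(msg)
        warnings.warn(msg)
    return weights

check_weights_init(np.array([0.3, 0.3, 0.3]))                 # warns: sums to 0.9
# check_weights_init(np.array([0.3, 0.3, 0.3]), strict=True)  # raises ValueError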
Reference Issues/PRs
Fixes #24085
What does this implement/fix? Explain your changes.
Weights that were normalized using n_samples are now normalized using weights_.sum().
Any other comments?