Gaussian Mixture - weighted implementation #17130


Open · wants to merge 11 commits into main
Conversation


@dupuisIRT dupuisIRT commented May 5, 2020

PR created to work on #15647

Implementation of a weighted GMM version: if a sample has a weight w, it is treated as if it had been observed w times.

Based on the two papers:

The implementation replicates the structure of KMeans, which already accepts a sample_weight argument.
Two main tests are added:

  • samples with zero weight have no impact on the fit
  • samples with integer weights give the same result as duplicated samples (e.g. if a sample X1 has a weight of 3 and all other samples have unit weight, the weighted GMM should give the same result as the classical GMM fitted on three copies of X1 plus the other samples unchanged)
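The second invariance can be sketched at the NumPy level (an illustrative check of the weight semantics only, not the PR's test code): weighted statistics of X must equal plain statistics of the row-repeated sample.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(5, 2)
w = np.array([3, 1, 1, 1, 1])  # first sample observed 3 times

# Repeating each row w[i] times is the reference behavior that
# a weighted estimator is expected to reproduce.
X_repeat = np.repeat(X, w, axis=0)

mean_weighted = np.average(X, axis=0, weights=w)
mean_repeated = X_repeat.mean(axis=0)
np.testing.assert_allclose(mean_weighted, mean_repeated)
```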

For the moment, the work is still in progress, in particular regarding:

  • PEP8
  • the validity of the EM algorithm
  • the tests
  • the default behavior of sample_weight (if None)

@dupuisIRT dupuisIRT marked this pull request as draft May 5, 2020 10:45
@jnothman
Member

For the moment, the work is still in progress, in particular regarding ...

Are you looking for feedback at this point? At a glance this is looking quite nice.

- in test_weighted_vs_repeated
@dupuisIRT
Author

Hi @jnothman, quick feedback on the test test_weighted_vs_repeated in mixture/tests/test_gaussian_mixture.py would be welcome.

@jnothman
Member

(We have over time proposed generic testing for sample weight invariances, but it's not really happened outside of evaluation metrics.)

Member

@jnothman jnothman left a comment


Some comments!

reg_covar=0, random_state=rng,
covariance_type=covar_type)

g1 = GaussianMixture(n_components=n_components, n_init=20,


might be a bit neater to just use sklearn.base.clone rather than repeat the construction.
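For reference, sklearn.base.clone returns a fresh unfitted estimator with the same parameters, so the second instance need not repeat the constructor arguments (the parameter values below are placeholders, not the test's exact settings):

```python
from sklearn.base import clone
from sklearn.mixture import GaussianMixture

# placeholder settings, not the test's exact values
g = GaussianMixture(n_components=2, n_init=20, reg_covar=0,
                    random_state=0, covariance_type='full')
g1 = clone(g)  # same hyperparameters, independent unfitted object

assert g1.get_params() == g.get_params()
assert g1 is not g
```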

g_weighted = g.fit(X, sample_weight=sample_weight)
g_repeated = g1.fit(X_repeat)

assert_allclose(np.sort(g_weighted.weights_),


Is it possible to use the stricter assert_allclose(g_weighted.weights_[arg_idx1], g_repeated.weights_[arg_idx2])?
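The idea can be illustrated with made-up weight vectors: sorting each model's mixture weights gives index arrays (arg_idx1, arg_idx2) that align corresponding components, after which a direct elementwise comparison is possible:

```python
import numpy as np

# two fictitious models whose components agree up to ordering
w1 = np.array([0.5, 0.2, 0.3])
w2 = np.array([0.2, 0.3, 0.5])

arg_idx1 = np.argsort(w1)  # aligns components of model 1
arg_idx2 = np.argsort(w2)  # aligns components of model 2

# stricter than comparing np.sort(...) of each side: the same index
# arrays can then be reused to compare means_ and precisions_
# component by component
np.testing.assert_allclose(w1[arg_idx1], w2[arg_idx2])
```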

n_features = rand_data.n_features
n_components = rand_data.n_components

for covar_type in COVARIANCE_TYPE:


Using pytest.mark.parametrize might simplify your test a little.


@dupuisIRT dupuisIRT May 11, 2020


I reproduced this structure from other tests (for instance test_gaussian_mixture_fit_best_params).

I will generalize @pytest.mark.parametrize to the other tests.
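A sketch of what a parametrized test could look like (the body here is a simplified stand-in, not the PR's actual test):

```python
import numpy as np
import pytest
from sklearn.mixture import GaussianMixture

COVARIANCE_TYPE = ['full', 'tied', 'diag', 'spherical']

# pytest generates one test case per covariance type, replacing the
# explicit `for covar_type in COVARIANCE_TYPE:` loop inside the test
@pytest.mark.parametrize("covar_type", COVARIANCE_TYPE)
def test_fit_converges(covar_type):
    rng = np.random.RandomState(0)
    X = np.vstack([rng.randn(50, 2), rng.randn(50, 2) + 5])
    gmm = GaussianMixture(n_components=2, covariance_type=covar_type,
                          random_state=0).fit(X)
    assert gmm.converged_
```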

assert_allclose(g_weighted.means_[arg_idx1],
g_repeated.means_[arg_idx2])

if covar_type == 'full':


I can't say we'd always go to this level of detail in ensuring models are identical.

It would be less repetitive if you defined the transformation once, then applied it to both precisions_ attributes.

def uniform_precisions(precisions):
    if covar_type == "tied":
        ...
    ...
    return np.trace(precisions, axis1=1, axis2=2).argsort()

But I also wonder whether we should just more approximately evaluate that the two models have identical likelihood functions applied to held-out data.
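That looser check might look like the following (a sketch against the released GaussianMixture API, using two differently seeded fits as stand-ins for the weighted and repeated models): score() returns the mean log-likelihood on the given data, which is invariant to component ordering, so no index alignment is needed.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = np.vstack([rng.randn(100, 2), rng.randn(100, 2) + 4])
X_held_out = np.vstack([rng.randn(20, 2), rng.randn(20, 2) + 4])

# two fits standing in for the weighted and repeated models
g_a = GaussianMixture(n_components=2, random_state=0).fit(X)
g_b = GaussianMixture(n_components=2, random_state=1).fit(X)

# score() is the per-sample mean log-likelihood: order-invariant,
# so the two likelihood functions can be compared directly
np.testing.assert_allclose(g_a.score(X_held_out), g_b.score(X_held_out),
                           rtol=1e-2)
```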

ecov = EmpiricalCovariance()
ecov.covariance_ = prec_weighted[h]
# the accuracy depends on the number of data and randomness, rng
assert_allclose(ecov.error_norm(prec_repeated[k]), 0, atol=0.001)


are we not able to compare precisions directly once indexed by the appropriate arg_idx? I.e. can't we just assert_allclose(g_weighted.precisions_[arg_idx1], g_repeated.precisions_[arg_idx2]) except where tied? I don't know this code well, admittedly.

Author


Here, I used a conservative approach, as the Frobenius error norm is used in other tests, such as test_gaussian_mixture_fit. I switched to assert_allclose(g_weighted.precisions_[arg_idx1], g_repeated.precisions_[arg_idx2]) and it failed for some tests.
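One reason a direct comparison cannot be uniform across types: the shape of precisions_ depends on covariance_type, so per-component indexing only makes sense for 'full', 'diag' and 'spherical' (a small check on synthetic data, not part of the PR):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.RandomState(0)
X = rng.randn(100, 3)

# with n_components=2 and n_features=3, precisions_ has these shapes;
# 'tied' stores a single shared matrix, so it cannot be indexed per component
for covar_type, shape in [('full', (2, 3, 3)), ('tied', (3, 3)),
                          ('diag', (2, 3)), ('spherical', (2,))]:
    g = GaussianMixture(n_components=2, covariance_type=covar_type,
                        random_state=0).fit(X)
    assert g.precisions_.shape == shape
```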

dupuisIRT added 2 commits May 11, 2020 19:20
- Add clone to avoid repeating GMM instance
- Change the way to assert weights_
- Add parametrize
- Add uniform_precisions function
@dupuisIRT

Thanks @jnothman for the comments, I tried to take them into account as much as possible.

@dupuisIRT dupuisIRT marked this pull request as ready for review May 12, 2020 12:16
@dupuisIRT

Hey @jnothman, I have implemented the majority of your comments. Should I wait for another review, or should I try to reach some GMM developers?

@dupuisIRT dupuisIRT changed the title [WIP] Gaussian Mixture - weighted implementation Gaussian Mixture - weighted implementation Jun 24, 2020
Base automatically changed from master to main January 22, 2021 10:52
@cmarmo
Contributor

cmarmo commented Mar 28, 2022

Hi @dupuisIRT, I'm sorry your pull request got lost. If you are still interested in finalizing this work, do you mind synchronizing with upstream? Please note that in the meantime the branch master has been renamed to main.
Thank you very much for your work so far and your patience.

@dupuisIRT
Author

Sure, I'll synchronize with upstream.
