[WIP] Example of multiple imputation with IterativeImputer #13025

Open
wants to merge 59 commits into
base: main

Conversation

sergeyf
Contributor

@sergeyf sergeyf commented Jan 21, 2019

Adding to #11977. This PR is a restart of #11370, which got messy.

Here is a quote from #11370 that explains what this PR does:

This PR is an example that shows how to use IterativeImputer for Multiple Imputation.

As discussed in #11259, the defaults of IterativeImputer are such that single imputation is performed. Because the method is also quite powerful for Multiple Imputation, we agreed to make an example that shows the user how to use IterativeImputer to perform Multiple Imputation.

I made the document: examples/impute/plot_multiple_imputation.py and it shows 2 things:

  1. Estimation of beta coefficients and their standard errors: compare IterativeImputer used for single imputation with IterativeImputer used as a MICE imputer.
  2. How to use IterativeImputer as a MICE imputer when building a prediction model (with train and test datasets).
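
For readers who want a concrete picture of the first point, here is a minimal sketch. It is not the code from examples/impute/plot_multiple_imputation.py. The dataset (`load_diabetes`, first four features), the 20% missing-completely-at-random mask, `m = 5`, the helper `ols_coef_and_variance`, and the pooling with Rubin's rules are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.RandomState(0)
X_full, y = load_diabetes(return_X_y=True)
X_full = X_full[:, :4]  # keep the sketch small (assumption)

# Assumed setup: remove 20% of the first feature completely at random.
X_missing = X_full.copy()
X_missing[rng.rand(X_full.shape[0]) < 0.2, 0] = np.nan


def ols_coef_and_variance(X, y):
    """Hypothetical helper: OLS slopes and their sampling variances."""
    n, p = X.shape
    Xd = np.column_stack([np.ones(n), X])  # add intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - p - 1)
    cov = sigma2 * np.linalg.inv(Xd.T @ Xd)
    return beta[1:], np.diag(cov)[1:]


m = 5  # number of imputed datasets (assumption)
coefs, variances = [], []
for seed in range(m):
    # sample_posterior=True draws imputations instead of using the mean,
    # so each seed yields a different completed dataset.
    imputer = IterativeImputer(sample_posterior=True, max_iter=10, random_state=seed)
    X_imputed = imputer.fit_transform(X_missing)
    beta, var = ols_coef_and_variance(X_imputed, y)
    coefs.append(beta)
    variances.append(var)

coefs = np.array(coefs)
variances = np.array(variances)

# Rubin's rules: pool the estimates and combine within/between variance.
pooled_coef = coefs.mean(axis=0)
within = variances.mean(axis=0)
between = coefs.var(axis=0, ddof=1)
pooled_se = np.sqrt(within + (1 + 1 / m) * between)
```

The pooled standard error is never smaller than the average within-imputation standard error, because the between-imputation term accounts for the uncertainty introduced by the missing values.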

@sergeyf
Contributor Author

sergeyf commented Jan 21, 2019

Paging @jnothman and @RianneSchouten.

@jnothman
Member

It might be good to amend that first commit with --author 'Rianne Schouten' etc. Thanks for this! Will look soon!

@sergeyf sergeyf force-pushed the iterativeimputer_mice_example branch from ecfdfc5 to 4f59d37 Compare January 21, 2019 23:35
@sergeyf sergeyf force-pushed the iterativeimputer_mice_example branch from 4f59d37 to 999bfa0 Compare January 21, 2019 23:36
@sergeyf
Contributor Author

sergeyf commented Jan 21, 2019

OK, I think that worked.

@jnothman jnothman changed the title from "first commit" to "Example of multiple imputation with IterativeImputer" Jan 22, 2019
@jnothman
Member

I suspect that without examples/impute/README.txt the example won't render.

@jnothman
Member

I've created examples/impute/README.txt in the iterativeimputer branch.

@jnothman
Member

Oh, no. Did I break the doctest again?

@sergeyf sergeyf closed this Aug 26, 2019
@sergeyf sergeyf reopened this Aug 26, 2019
@sergeyf
Contributor Author

sergeyf commented Sep 25, 2020

@jnothman I wandered back to this PR, and now it passes tests!

Any interest in picking up work on this? I feel like it was in a pretty good place already and we were just unsure about the extremely long runtimes.

@sergeyf
Contributor Author

sergeyf commented Dec 4, 2020

@jnothman @glemaitre Any thoughts on my last comment? Repeated here: "Any interest in picking up work on this? I feel like it was in a pretty good place already and we were just unsure about the extremely long runtimes."

Base automatically changed from master to main January 22, 2021 10:50
@nxorable
Contributor

nxorable commented Jan 3, 2022

I found the current docs ambiguous and believe the community would value this work.

@sergeyf sergeyf closed this Jan 3, 2022
@sergeyf sergeyf reopened this Jan 3, 2022
@sergeyf
Contributor Author

sergeyf commented Jan 3, 2022

I'm not familiar with these build trigger checks. Can anyone please suggest how to fix it?

@thomasjpfan
Member

Can anyone please suggest how to fix it?

Syncing with the main branch should fix the build trigger error.

@sergeyf
Contributor Author

sergeyf commented Jan 4, 2022

Thank you @thomasjpfan! Any idea if we can get this merged once all the tests pass?

@thomasjpfan
Member

Any idea if we can get this merged once all the tests pass?

This PR still needs two approvals to get merged, and I do not have a time estimate for when that will happen.

At a glance, I see two big tasks for this PR:

  1. It needs to use another dataset instead of load_boston, because load_boston has been deprecated and will be removed in 1.2.
  2. In recent years, we have been moving to more narrative-driven examples, where text is placed together with code. For example: Common pitfalls in the interpretation of coefficients of linear models. This means moving text out of the big paragraph at the beginning and interleaving it with the code to create a narrative.

@sergeyf
Contributor Author

sergeyf commented Jan 4, 2022

Thanks! I can make those changes. What regression dataset do you recommend to replace Boston? Smaller is best because this example is hefty, but I can always subsample any dataset.
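
As a rough illustration of the subsampling idea, here is a sketch that draws a fixed-size subset of a bundled regression dataset. The choice of `load_diabetes` and of 200 samples is an assumption for illustration, not a recommendation made in this thread.

```python
from sklearn.datasets import load_diabetes
from sklearn.utils import resample

# Assumption: load_diabetes stands in for whichever dataset replaces Boston.
X, y = load_diabetes(return_X_y=True)

# Draw 200 rows without replacement; random_state keeps the subset reproducible.
X_small, y_small = resample(X, y, replace=False, n_samples=200, random_state=0)
```

Passing `X` and `y` together keeps the rows of both arrays aligned after subsampling.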

@nxorable
Contributor

nxorable commented Jan 4, 2022 via email

@glemaitre
Member

Any idea if we can get this merged once all the tests pass?

I would like to see how this example integrates within the proposal in #21967.

@sergeyf
Contributor Author

sergeyf commented Jan 4, 2022

@glemaitre MICE is in the family of multiple imputation - perform imputation multiple times, then apply your subsequent pipeline multiple times also, and then have multiple solutions. For sklearn users the subsequent pipeline will often be "train/val/test a ML alg". I read through #21967 and multiple imputation isn't mentioned at all, but it is common in the stats world as far as I understand. This example would be useful because people coming from stats might want to do what the mice R package does, but don't know how with IterativeImputer as it does not do it out of the box.
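
The workflow described above (impute several times, run the downstream pipeline on each completed dataset, then combine the results) could look roughly like the following sketch. The dataset, the 10% missing rate, `m = 5`, the `Ridge` model, and averaging the predictions as the combination step are all assumptions for illustration; this is not the code in the PR.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
X, y = load_diabetes(return_X_y=True)
X = X.copy()
X[rng.rand(*X.shape) < 0.1] = np.nan  # assumed: 10% of entries missing at random

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

m = 5  # number of imputations (assumption)
predictions = []
for seed in range(m):
    # One pipeline per imputation: a different seed gives a different
    # posterior draw, hence a different completed dataset and fitted model.
    pipe = make_pipeline(
        IterativeImputer(sample_posterior=True, max_iter=5, random_state=seed),
        Ridge(),
    )
    pipe.fit(X_train, y_train)
    predictions.append(pipe.predict(X_test))

predictions = np.array(predictions)  # shape (m, n_test): the "multiple solutions"
y_pred = predictions.mean(axis=0)    # one simple way to combine them
```

The spread of `predictions` across the `m` rows gives a rough sense of how much the missing values contribute to prediction uncertainty.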

To summarize:

@glemaitre
Member

I agree that having an example that defines what "multiple imputations" means is important, to remove the confusion with the iterative procedure of the IterativeImputer. In this regard, I find the example too complex.

I would prefer to have a single pipeline that performs single imputation, and then create a specific estimator to show how to perform multiple imputations. We would not even need to use an IterativeImputer in this case. This would speed up the example, which currently takes up to 3 minutes, while we usually try to keep examples running under 30 seconds.
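
One possible reading of this suggestion is sketched below. Both `RandomDrawImputer` and `MultipleImputationRegressor` are hypothetical classes written for this illustration; neither exists in scikit-learn, and the random-draw imputation strategy is an assumption chosen only because it is cheap and stochastic.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, TransformerMixin, clone
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline


class RandomDrawImputer(TransformerMixin, BaseEstimator):
    """Hypothetical imputer: fill each NaN with a random observed value of its column."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        # Remember the observed (non-missing) values of every column.
        self.observed_ = [col[~np.isnan(col)] for col in X.T]
        return self

    def transform(self, X):
        X = np.array(X, dtype=float)  # work on a copy
        rng = np.random.RandomState(self.random_state)
        for j, observed in enumerate(self.observed_):
            missing = np.isnan(X[:, j])
            if missing.any():
                X[missing, j] = rng.choice(observed, size=missing.sum())
        return X


class MultipleImputationRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical meta-estimator: one imputer/model pair per seed, averaged predictions."""

    def __init__(self, imputer, estimator, n_imputations=5):
        self.imputer = imputer
        self.estimator = estimator
        self.n_imputations = n_imputations

    def fit(self, X, y):
        self.pipelines_ = []
        for seed in range(self.n_imputations):
            imputer = clone(self.imputer).set_params(random_state=seed)
            pipe = make_pipeline(imputer, clone(self.estimator))
            self.pipelines_.append(pipe.fit(X, y))
        return self

    def predict(self, X):
        return np.mean([pipe.predict(X) for pipe in self.pipelines_], axis=0)


# Synthetic data with 10% missing entries (assumed setup).
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
X_missing = X.copy()
X_missing[rng.rand(*X.shape) < 0.1] = np.nan

model = MultipleImputationRegressor(RandomDrawImputer(), LinearRegression())
model.fit(X_missing, y)
y_pred = model.predict(X_missing)
```

A single-imputation baseline is then just `make_pipeline(RandomDrawImputer(random_state=0), LinearRegression())`, which makes the difference between the two approaches visible in a few lines.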

I think it is very important to point out in the discussion that the example aims at providing a definition of "multiple imputations with code" rather than showing that multiple imputations work better. I am not sure that, in an ML setting, there is currently any evidence that multiple imputations work better than using a strong learner (@GaelVaroquaux and @marineLM have better insights than me on this).

@GaelVaroquaux
Member

Cc @A-pl (we need to put your paper on HAL)

@sergeyf
Contributor Author

sergeyf commented Jan 5, 2022

I'm a bit confused. We do have a single pipeline in example 2: https://github.com/sergeyf/scikit-learn/blob/iterativeimputer_mice_example/examples/impute/plot_multiple_imputation.py#L303

And it's used multiple times to do MICE: https://github.com/sergeyf/scikit-learn/blob/iterativeimputer_mice_example/examples/impute/plot_multiple_imputation.py#L315

Can you please clarify what you'd like changed?
