[WIP] Example of multiple imputation with IterativeImputer #13025
Paging @jnothman and @RianneSchouten. |
It might be good to amend that first commit with |
ecfdfc5 to 4f59d37
4f59d37 to 999bfa0
OK, I think that worked. |
I suspect that without |
I've created examples/impute/README.txt in the iterativeimputer branch. |
Oh, no. Did I break the doctest again? |
@jnothman I wandered back to this PR, and now it passes tests! Any interest in picking up work on this? I feel like it was in a pretty good place already and we were just unsure about the extremely long runtimes. |
@jnothman @glemaitre Any thoughts on my last comment? Repeated here: "Any interest in picking up work on this? I feel like it was in a pretty good place already and we were just unsure about the extremely long runtimes." |
I found the current docs ambiguous and believe the community would value this work. |
I'm not familiar with these build trigger checks. Can anyone please suggest how to fix it? |
Syncing with the |
Thank you @thomasjpfan! Any idea if we can get this merged once all the tests pass? |
This PR still needs two approvals to get merged, and I do not have a time estimate for that to happen. At a glance, I see two big tasks for this PR:
|
Thanks! I can make those changes. What regression dataset do you recommend to replace Boston? Smaller is best because this example is hefty, but I can always subsample any dataset. |
fetch_california_housing is similar to load_boston, albeit larger
|
I would like to see how this example integrates within the proposal in #21967. |
@glemaitre MICE belongs to the multiple imputation family: perform imputation multiple times, apply your subsequent pipeline to each imputed dataset, and end up with multiple solutions. For To summarize:
|
I agree that an example defining what "multiple imputations" means is important to remove the confusion with the iterative procedure of the In this regard, I would prefer to have a single pipeline that performs single imputation and then create a specific estimator to show how to perform multiple imputations. We would not even need to use an I think it is super important to point out in the discussion that the example aims to provide a definition of "multiple imputations with code" rather than to show that multiple imputations work better. I am not sure there is currently any evidence, in an ML setting, that multiple imputations work better than a strong learner (@GaelVaroquaux and @marineLM have better insights than me on this). |
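For readers following the thread, the distinction discussed above (one pipeline that imputes once, reused several times to obtain multiple imputations) can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the synthetic data, the 20% missingness rate, the number of imputations, and the simple averaging of coefficients are all assumptions made for the sketch. It relies only on `IterativeImputer` with `sample_posterior=True` and a different `random_state` per run, so that each fit draws a different completed dataset.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Synthetic regression data (an assumption for this sketch, not the PR's dataset).
rng = np.random.RandomState(0)
n_samples, n_features = 200, 4
X_full = rng.normal(size=(n_samples, n_features))
X_full[:, 1] += 0.8 * X_full[:, 0]  # correlate two features so imputation has signal
true_coef = np.array([1.0, -2.0, 0.5, 3.0])
y = X_full @ true_coef + rng.normal(scale=0.1, size=n_samples)

# Remove roughly 20% of the entries completely at random.
X_missing = X_full.copy()
X_missing[rng.uniform(size=X_full.shape) < 0.2] = np.nan

# Multiple imputation: the same single-imputation pipeline is fit several times,
# each time with posterior sampling and a different seed.
n_imputations = 5
coefs = []
for seed in range(n_imputations):
    pipe = make_pipeline(
        IterativeImputer(sample_posterior=True, max_iter=10, random_state=seed),
        LinearRegression(),
    )
    pipe.fit(X_missing, y)
    coefs.append(pipe[-1].coef_)
coefs = np.asarray(coefs)

# Pool the per-imputation estimates: the mean is the point estimate, and the
# between-imputation variance reflects uncertainty due to the missing values.
pooled_coef = coefs.mean(axis=0)
between_var = coefs.var(axis=0, ddof=1)
print(pooled_coef, between_var)
```

With `sample_posterior=False` (the default), every run would produce the same completed dataset up to the seed-dependent feature ordering, which is single imputation repeated rather than multiple imputation.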
Cc @A-pl (we need to put your paper on HAL) |
I'm a bit confused. We do have a single pipeline in example 2: https://github.com/sergeyf/scikit-learn/blob/iterativeimputer_mice_example/examples/impute/plot_multiple_imputation.py#L303 And it's used multiple times to do MICE: https://github.com/sergeyf/scikit-learn/blob/iterativeimputer_mice_example/examples/impute/plot_multiple_imputation.py#L315 Can you please clarify what you'd like changed? |
Adding to #11977. This PR is a restart of #11370, which got messy.
Here is a quote from #11370 that explains what this PR does: