Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.
Sign up[MRG] Adds KNNImputer #12852
[MRG] Adds KNNImputer #12852
Conversation
|
This is another comment I had pending. Not sure when I'll give it another full pass |
| return X[:, valid_idx] | ||
|
|
||
| def _more_tags(self): | ||
| return {'allow_nan': True} |
This comment has been minimized.
This comment has been minimized.
jnothman
Aug 18, 2019
Member
| return {'allow_nan': True} | |
| return {'allow_nan': is_scalar_nan(self.missing_values)} |
| Parameters | ||
| ---------- | ||
| dist_pot_donors : array-like, shape=(n_receivers, n_train_samples) |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
|
@jnothman scikit-learn/sklearn/utils/validation.py Line 927 in 35c0ca0 |
This comment has been minimized.
This comment has been minimized.
|
tests are failing... |
This comment has been minimized.
This comment has been minimized.
|
Test failure was unrelated, now that #14689 is merged, merging this with master should fix it. |
| least one neighbor with a defined distance, the weighted or unweighted average | ||
| of the remaining neighbors will be used during imputation. If a feature is | ||
| always missing in training, it is removed during `transform`. For more | ||
| information on the methodology, see ref. [OL2001]_. |
This comment has been minimized.
This comment has been minimized.
jnothman
Aug 28, 2019
Member
Do they consider distance weighting? It might be worth noting the differences...
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
thomasjpfan
Sep 3, 2019
•
Author
Member
Gerhard Tutz, Shahla Ramzan,
Improved Methods for the Imputation of Missing Data by Nearest Neighbor Methods shows in their experiments that weighted kNN performs a little better than unweighted.
This comment has been minimized.
This comment has been minimized.
thomasjpfan
Sep 3, 2019
Author
Member
Lorenzo Beretta* and Alessandro Santaniello, Nearest neighbor imputation algorithms: a critical evaluation, shows that weighted is slightly better than unweighted.
| ]) | ||
| knn = KNNImputer(missing_values=na, n_neighbors=2).fit(X) | ||
|
|
||
| X_transform = knn.transform(X) |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
| dist = nan_euclidean_distances(X) | ||
| r1c1_nbor_dists = dist[1, [0, 2, 3, 4, 5]] | ||
| r1c3_nbor_dists = dist[1, [0, 3, 4, 5, 6]] | ||
| r1c1_nbor_wt = (1 / r1c1_nbor_dists) |
This comment has been minimized.
This comment has been minimized.
jnothman
Aug 28, 2019
Member
| r1c1_nbor_wt = (1 / r1c1_nbor_dists) | |
| r1c1_nbor_wt = 1 / r1c1_nbor_dists |
| r1c1_nbor_dists = dist[1, [0, 2, 3, 4, 5]] | ||
| r1c3_nbor_dists = dist[1, [0, 3, 4, 5, 6]] | ||
| r1c1_nbor_wt = (1 / r1c1_nbor_dists) | ||
| r1c3_nbor_wt = (1 / r1c3_nbor_dists) |
This comment has been minimized.
This comment has been minimized.
jnothman
Aug 28, 2019
Member
| r1c3_nbor_wt = (1 / r1c3_nbor_dists) | |
| r1c3_nbor_wt = 1 / r1c3_nbor_dists |
|
|
||
|
|
||
| @pytest.mark.parametrize("na", [-1, np.nan]) | ||
| def test_knn_imputer_with_weighted_features(na): |
This comment has been minimized.
This comment has been minimized.
jnothman
Aug 28, 2019
Member
"weighted features" seems to be a strange name. This also appears to overlap in purpose with test_knn_imputer_weight_distance. What's the distinction?
This comment has been minimized.
This comment has been minimized.
thomasjpfan
Sep 3, 2019
Author
Member
There was not a distinction. test_knn_imputer_with_weighted_features was removed and the tests from there was moved into test_knn_imputer_weight_distance
|
Otherwise LGTM!!! |
This comment has been minimized.
This comment has been minimized.
|
The mask of |
|
I think this is good to hit the road. Congrats @ashimb9 and thanks @thomasjpfan for finishing it off! |
2d1b9e3
into
scikit-learn:master
This comment has been minimized.
This comment has been minimized.
|
yayyyy finally ;) |
This comment has been minimized.
This comment has been minimized.
|
Glad to see this one in, too!
Danilo
… Am 04.09.2019 um 6:43 PM schrieb Andreas Mueller ***@***.***>:
yayyyy finally ;)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub <#12852?email_source=notifications&email_token=AA5Z6RABX22LODMU7SHEIDDQH7QSZA5CNFSM4GL4N2AKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD54GS7A#issuecomment-527985020>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AA5Z6RB4ZL7CGTRMJDWPFODQH7QSZANCNFSM4GL4N2AA>.
|
This comment has been minimized.
This comment has been minimized.
chitcode
commented
Sep 15, 2019
|
Very happy to see this. |


thomasjpfan commentedDec 21, 2018
•
edited by jnothman
Reference Issues/PRs
Continues #9348, and a part of #9212
Closes #2989
Closes #9348
Closes #9212
What does this implement/fix? Explain your changes.
This PR cleans up the work done on #9348 and #9212:
Edit:
KNNImputerhas been added this to PR with the following updates: