FIX Fixes unknown handling for str dtypes in OrdinalEncoder.transform #19888
doc/whats_new/v1.0.rst
@@ -347,6 +347,9 @@ Changelog
supporting sparse matrix and raise the appropriate error message.
:pr:`19879` by :user:`Guillaume Lemaitre <glemaitre>`.

- |Fix| :meth:`preprocessing.OrdinalEncoder.transform` now correctly handles
Could you flag it for 0.24.2 instead?
LGTM apart from moving the what's new entry.
sklearn/preprocessing/_encoders.py
@@ -150,6 +150,10 @@ def _transform(self, X, handle_unknown='error', force_all_finite=True,
if (self.categories_[i].dtype.kind in ('U', 'S')
        and self.categories_[i].itemsize > Xi.itemsize):
    Xi = Xi.astype(self.categories_[i].dtype)
elif (self.categories_[i].dtype.kind == 'O' and
      Xi.dtype.kind == 'U'):
    # categories are objects and Xi are numpy strings
You could maybe add that otherwise the string will be truncated
…scikit-learn#19888) * FIX Fixes unknown handling for str X in OrdinalEncoder.transform * DOC Adds whats new * DOC Move to 0.24.2 * DOC Adds reasoning in comment
Reference Issues/PRs
Fixes #19872
What does this implement/fix? Explain your changes.
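When `Xi` is of dtype `'<U1'` and `self.categories_[i][0]` is `'AA'`, writing that category into `Xi` would collapse `'AA'` into `'A'`, because a fixed-width NumPy string array truncates anything longer than its item size. This PR converts `Xi` into an object dtype when it sees that `self.categories_[i]` is an object dtype.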
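
The truncation described above can be reproduced with NumPy alone. The sketch below is illustrative and is not the code from the PR: `categories`, `valid_mask`, `truncated`, and `preserved` are hypothetical stand-ins, and the masked assignment is an assumption about how an unknown entry gets overwritten with the first category.

```python
import numpy as np

# Hypothetical stand-ins for the names used in the PR description.
categories = np.array(["AA", "BB"], dtype=object)  # fitted categories, object dtype
Xi = np.array(["A", "Z"])                          # input column, dtype '<U1'

# Mark which entries of Xi are known categories (none are, in this example).
known = set(categories.tolist())
valid_mask = np.array([str(x) in known for x in Xi])

# Assigning into the fixed-width '<U1' array silently truncates 'AA' to 'A'.
truncated = Xi.copy()
truncated[~valid_mask] = categories[0]

# Casting to object dtype first keeps the full string.
preserved = Xi.astype(object)
preserved[~valid_mask] = categories[0]

print(truncated.tolist())  # ['A', 'A']
print(preserved.tolist())  # ['AA', 'AA']
```

After truncation the placeholder `'A'` is no longer a member of the fitted categories, which is why the cast to object dtype has to happen before the assignment.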