Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

FIX Fixes unknown handling for str dtypes in OrdinalEncoder.transform #19888

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 18, 2021

Conversation

thomasjpfan
Copy link
Member

Reference Issues/PRs

Fixes #19872

What does this implement/fix? Explain your changes.

When Xi is of dtype '<U1andself.categories_[i][0]` is 'AA', the following would collapse 'AA' into 'A':

Xi[~valid_mask] = self.categories_[i][0]

This PR converts Xi into an object dtype when it sees that self.categories_[i] is an object dtype.

@thomasjpfan thomasjpfan changed the title FIX Fixes unknown handling for str X in OrdinalEncoder.transform FIX Fixes unknown handling for str dtypes in OrdinalEncoder.transform Apr 13, 2021
@@ -347,6 +347,9 @@ Changelog
supporting sparse matrix and raise the appropriate error message.
:pr:`19879` by :user:`Guillaume Lemaitre <glemaitre>`.

- |Fix| :meth:`preprocessing.OrdinalEncoder.transfrom` now correctly handles
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you flag it for 0.24.2 instead?

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM apart of moving the whats new

@@ -150,6 +150,10 @@ def _transform(self, X, handle_unknown='error', force_all_finite=True,
if (self.categories_[i].dtype.kind in ('U', 'S')
and self.categories_[i].itemsize > Xi.itemsize):
Xi = Xi.astype(self.categories_[i].dtype)
elif (self.categories_[i].dtype.kind == 'O' and
Xi.dtype.kind == 'U'):
# categories are objects and Xi are numpy strings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could maybe add that otherwise the string will be truncated

@thomasjpfan thomasjpfan added the To backport PR merged in master that need a backport to a release branch defined based on the milestone. label Apr 14, 2021
@thomasjpfan thomasjpfan added this to the 0.24.2 milestone Apr 14, 2021
@adrinjalali adrinjalali merged commit 5d8796b into scikit-learn:main Apr 18, 2021
thomasjpfan added a commit to thomasjpfan/scikit-learn that referenced this pull request Apr 19, 2021
…scikit-learn#19888)

* FIX Fixes unknown handling for str X in OrdinalEncoder.transform

* DOC Adds whats new

* DOC Move to 0.24.2

* DOC Adds reasoning in comment
@glemaitre glemaitre mentioned this pull request Apr 22, 2021
12 tasks
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Apr 22, 2021
…scikit-learn#19888)

* FIX Fixes unknown handling for str X in OrdinalEncoder.transform

* DOC Adds whats new

* DOC Move to 0.24.2

* DOC Adds reasoning in comment
glemaitre pushed a commit that referenced this pull request Apr 28, 2021
…#19888)

* FIX Fixes unknown handling for str X in OrdinalEncoder.transform

* DOC Adds whats new

* DOC Move to 0.24.2

* DOC Adds reasoning in comment
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:preprocessing To backport PR merged in master that need a backport to a release branch defined based on the milestone.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ordinal encoder is not able to raising error inspite of the handle_unknown = 'use_encoded_value
3 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.