Commit ede436f

FIX Fixes unknown handling for str dtypes in OrdinalEncoder.transform (scikit-learn#19888)
* FIX Fixes unknown handling for str X in OrdinalEncoder.transform
* DOC Adds whats new
* DOC Move to 0.24.2
* DOC Adds reasoning in comment
1 parent 2916ec8 commit ede436f
4 files changed: +30 -1 lines changed

doc/whats_new/v0.24.rst

3 additions & 0 deletions

@@ -53,6 +53,9 @@ Changelog
   `'use_encoded_value'` strategies.
   :pr:`19234` by `Guillaume Lemaitre <glemaitre>`.
 
+- |Fix| :meth:`preprocessing.OrdinalEncoder.transform` correctly handles
+  unknown values for string dtypes. :pr:`19888` by `Thomas Fan`_.
+
 :mod:`sklearn.multioutput`
 ..........................

doc/whats_new/v1.0.rst

1 addition & 1 deletion

@@ -311,7 +311,7 @@ Changelog
   :pr:`18649` by `Leandro Hermida <hermidalc>` and
   `Rodion Martynov <marrodion>`.
 
-- |Fix| The `fit` method of the successive halving parameter search
+- |Fix| The `fit` method of the successive halving parameter search
   (:class:`model_selection.HalvingGridSearchCV`, and
   :class:`model_selection.HalvingRandomSearchCV`) now correctly handles the
   `groups` parameter. :pr:`19847` by :user:`Xiaoyu Chai <xiaoyuchai>`.

sklearn/preprocessing/_encoders.py

6 additions & 0 deletions

@@ -167,6 +167,12 @@ def _transform(self, X, handle_unknown='error', force_all_finite=True,
                     if (self.categories_[i].dtype.kind in ('U', 'S')
                             and self.categories_[i].itemsize > Xi.itemsize):
                         Xi = Xi.astype(self.categories_[i].dtype)
+                    elif (self.categories_[i].dtype.kind == 'O' and
+                          Xi.dtype.kind == 'U'):
+                        # categories are objects and Xi are numpy strings.
+                        # Cast Xi to an object dtype to prevent truncation
+                        # when setting invalid values.
+                        Xi = Xi.astype('O')
                     else:
                         Xi = Xi.copy()

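The comment in the hunk above refers to truncation when invalid values are set on a NumPy string array. A minimal NumPy-only sketch of that behavior follows. The names `Xi`, `valid_mask` and the use of the first category as the placeholder follow the surrounding code, but the sample arrays and the simplified mask computation are illustrative assumptions, not the library's actual code path.

```python
import numpy as np

# Illustrative data (assumed): categories learned at fit time are held in an
# object array, while the column seen at transform time is a fixed-width
# NumPy string array whose items hold at most one character ('<U1').
categories = np.array(['AA', 'B'], dtype=object)
Xi = np.array(['A', 'B'], dtype='U')

# Simplified stand-in for the encoder's unknown check: True where known.
known = set(categories.tolist())
valid_mask = np.array([x in known for x in Xi.tolist()])

# Without the cast: writing 'AA' into a '<U1' array silently truncates it.
truncated = Xi.copy()
truncated[~valid_mask] = categories[0]

# With the cast to object dtype: the full string is preserved.
widened = Xi.astype('O')
widened[~valid_mask] = categories[0]

print(truncated.tolist())  # ['A', 'B']
print(widened.tolist())    # ['AA', 'B']
```

In the truncated case the placeholder written into the unknown row is no longer one of the categories, which is presumably why the later encoding step could not handle it.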
sklearn/preprocessing/tests/test_encoders.py

20 additions & 0 deletions

@@ -1580,3 +1580,23 @@ def test_ordinal_encoder_sparse():
     X_trans_sparse = sparse.csr_matrix(X_trans)
     with pytest.raises(TypeError, match=err_msg):
         encoder.inverse_transform(X_trans_sparse)
+
+
+@pytest.mark.parametrize("X_train", [
+    [['AA', 'B']],
+    np.array([['AA', 'B']], dtype='O'),
+    np.array([['AA', 'B']], dtype='U'),
+])
+@pytest.mark.parametrize("X_test", [
+    [['A', 'B']],
+    np.array([['A', 'B']], dtype='O'),
+    np.array([['A', 'B']], dtype='U'),
+])
+def test_ordinal_encoder_handle_unknown_string_dtypes(X_train, X_test):
+    """Checks that ordinal encoder transforms string dtypes. Non-regression
+    test for #19872."""
+    enc = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-9)
+    enc.fit(X_train)
+
+    X_trans = enc.transform(X_test)
+    assert_allclose(X_trans, [[-9, 0]])

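One combination from the parametrized test above can be run as a standalone script, outside pytest. This is a sketch assuming a scikit-learn release that includes this fix (0.24.2 or later, per the changelog entry). It picks the list-for-fit, `dtype='U'`-for-transform pair, which appears to be the case the new `elif` branch targets.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Unknown categories are mapped to -9 instead of raising an error.
enc = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-9)

# Fit on a plain list of strings: one sample, two features.
enc.fit([['AA', 'B']])

# Transform a fixed-width NumPy string array. 'A' was never seen in the
# first feature, so it is unknown; 'B' is the only category of the second.
X_test = np.array([['A', 'B']], dtype='U')
X_trans = enc.transform(X_test)

print(X_trans)  # expected per the test above: [[-9.  0.]]
```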