BUG?: using None as replacement value in replace() typically upcasts to object dtype #60284

Open
@jorisvandenbossche

Description


I noticed that in certain cases, replacing a value with None always casts to object dtype, regardless of whether the dtype of the calling Series can actually hold None (at least when None is treated as a generic "missing value" indicator).

For example, a float Series can hold None in the sense of holding missing values, which is how None is treated in setitem:

>>> ser = pd.Series([1, 2, 3], dtype="float")
>>> ser[1] = None
>>> ser
0    1.0
1    NaN
2    3.0
dtype: float64

However, when using replace() to replace the value 2.0 with None, the result depends on exactly how the to_replace/value combination is specified, but typically it upcasts to object:

# with list
>>> ser.replace([1, 2], [10, None])
0    10.0
1    None
2     3.0
dtype: object

# with Series -> here it gives NaN but that is because the Series constructor already coerces the None
>>> ser.replace(pd.Series({1: 10, 2: None}))
0    10.0
1     NaN
2     3.0
dtype: float64

# with scalar replacements
>>> ser.replace(1, 10).replace(2, None)
0    10.0
1    None
2     3.0
dtype: object

In all the above cases, using np.nan instead of None as the replacement value of course just results in a float Series with NaN.
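As a minimal sketch of that comparison (the variable names are only illustrative, and only the list and scalar forms are shown), the same calls with np.nan as the replacement value keep the float64 dtype:

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3], dtype="float")

# Same calls as in the examples above, but with np.nan instead of None.
with_list = ser.replace([1, 2], [10, np.nan])
with_scalar = ser.replace(1, 10).replace(2, np.nan)

# Print the resulting dtype and values for each form.
for name, result in [("list", with_list), ("scalar", with_scalar)]:
    print(name, result.dtype, result.tolist())
```

Both results stay float64 with NaN in position 1, which is the behaviour one might also expect when None is passed as a missing-value indicator.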

The reason for this is twofold. First, Block._replace_coerce has a check specifically for value is None, and in that case we always cast to object dtype:

if value is None:
    # gh-45601, gh-45836, gh-46634
    if mask.any():
        has_ref = self.refs.has_reference()
        nb = self.astype(np.dtype(object))

The above is used when replacing with a list of values. The scalar case also casts to object dtype, because there we first check self._can_hold_element(value) to decide whether the replacement can be done with a simple setitem (and if not, cast to object dtype first before trying again). But it seems that can_hold_element(np.array([], dtype=float), None) gives False.
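The check can be tried directly. This is only a sketch: can_hold_element is a private pandas helper, and the import path below (pandas.core.dtypes.cast) is an assumption based on the current source layout that may change between versions.

```python
import numpy as np

# Private pandas helper; this import path is an assumption and may change.
from pandas.core.dtypes.cast import can_hold_element

arr = np.array([], dtype=float)

# Ask whether a float64 array is considered able to hold each candidate value.
holds_nan = can_hold_element(arr, np.nan)
holds_none = can_hold_element(arr, None)

print("np.nan:", holds_nan)
print("None:  ", holds_none)
```

If the two answers differ, that would account for the scalar replace() path taking the object-dtype fallback for None but not for np.nan.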


Everything was tested with current main (3.0.0.dev), but I see the same behaviour on older releases (2.0 and 1.5).


Somewhat related issue:

Labels: API - Consistency, Bug, Missing-data, replace
