Similar to #61099, but concerning lhs + rhs. Alignment in general is heavily involved here as well. One thing to note is that, unlike in comparison operations, in arithmetic operations the lhs.index dtype is favored, assuming no coercion is necessary.
import numpy as np
import pandas as pd
import pyarrow as pa

dtypes = [
    np.dtype(object),
    pd.StringDtype("pyarrow", na_value=np.nan),
    pd.StringDtype("python", na_value=np.nan),
    pd.StringDtype("pyarrow", na_value=pd.NA),
    pd.StringDtype("python", na_value=pd.NA),
    pd.ArrowDtype(pa.string()),
]
# Same labels on both sides; only the index dtype differs (here NaN-backed vs NA-backed PyArrow strings).
idx1 = pd.Series(["a", np.nan, "b"], dtype=dtypes[1])
idx2 = pd.Series(["a", np.nan, "b"], dtype=dtypes[3])
df1 = pd.DataFrame({"idx": idx1, "value": [1, 2, 3]}).set_index("idx")
df2 = pd.DataFrame({"idx": idx2, "value": [1, 2, 3]}).set_index("idx")
# Swap the operands to see which side's index dtype and missing value end up in the result.
print(df1["value"] + df2["value"])
print(df2["value"] + df1["value"])
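To cover every ordered pair of dtypes rather than one pair at a time, a helper along these lines may be useful. It is only a sketch: `summarize_add` and `sweep` are hypothetical names, and it builds each index directly with `pd.Index` instead of going through `set_index`, which I am assuming hits the same alignment path.

```python
import itertools

import numpy as np
import pandas as pd


def summarize_add(left_dtype, right_dtype):
    """Add two Series whose indexes hold the same labels but differ in dtype."""
    labels = ["a", np.nan, "b"]
    lhs = pd.Series([1, 2, 3], index=pd.Index(labels, dtype=left_dtype))
    rhs = pd.Series([1, 2, 3], index=pd.Index(labels, dtype=right_dtype))
    result = lhs + rhs
    # Labels in the result index that pandas considers missing.
    missing = result.index[result.index.isna()]
    return {
        "lhs": str(left_dtype),
        "rhs": str(right_dtype),
        "index_dtype": str(result.index.dtype),
        "n_rows": len(result),
        "missing_labels": [repr(label) for label in missing],
    }


def sweep(dtypes):
    """Summarize lhs + rhs for every ordered pair of index dtypes."""
    return [
        summarize_add(left, right)
        for left, right in itertools.product(dtypes, repeat=2)
    ]
```

With the `dtypes` list from the reproducer, `for row in sweep(dtypes): print(row)` prints one summary per ordered pair. A row count above three suggests the missing labels did not align, and `missing_labels` shows which value was propagated.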
For string dtypes, I've observed the following:
- NaN vs NA generally aligns, the value propagated is always NA
- NaN vs NA does not align when the NA arises from ArrowExtensionArray
- NaN vs None (object) aligns, the value propagated is from lhs
- NA vs None does not align
- PyArrow-NA + ArrowExtensionArray results in object dtype (NAs do align)
- Python-NA + PyArrow-NA results in PyArrow-NA; contrary to the left being preferred
- Python-NA + PyArrow-NA results in object type (NAs do align)
- When lhs and rhs have indices that are both object dtype:
  - NaN vs None aligns and propagates the lhs value.
  - NA vs None does not align
  - NA vs NaN does not align
I think the main two things we need to decide are:
- How should NA vs NaN vs None align.
- When they do align, which value should be propagated.
A few properties I think are crucial:
- Alignment should only depend on value and left-vs-right operand, not storage.
- Alignment should be transitive.
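The transitivity property can be checked mechanically. The sketch below is pure Python and does not call pandas. It encodes a simplified reading of the observations above as sets of unordered pairs, ignoring storage and operand order and treating alignment as symmetric. Those simplifications, and all the names used, are assumptions made for illustration.

```python
from itertools import product

KINDS = ("None", "NaN", "NA")

# Unordered pairs of distinct missing-value kinds that align.
# Simplified from the observations: for string dtypes NaN aligns with both
# NA and None, while NA vs None does not align.
OBSERVED_STRING = {frozenset({"NaN", "NA"}), frozenset({"NaN", "None"})}
# For object-dtype indexes on both sides, only NaN vs None aligns.
OBSERVED_OBJECT = {frozenset({"NaN", "None"})}


def make_rule(aligned_pairs):
    """Build an aligns(left, right) predicate from a set of aligned pairs."""
    def aligns(left, right):
        return left == right or frozenset({left, right}) in aligned_pairs
    return aligns


def transitivity_violations(aligns, kinds=KINDS):
    """Return triples (a, b, c) where a~b and b~c hold but a~c does not."""
    return [
        (a, b, c)
        for a, b, c in product(kinds, repeat=3)
        if aligns(a, b) and aligns(b, c) and not aligns(a, c)
    ]


print(transitivity_violations(make_rule(OBSERVED_STRING)))
print(transitivity_violations(make_rule(OBSERVED_OBJECT)))
```

Under this simplified encoding, the string-dtype observations fail transitivity because NaN aligns with both None and NA while None and NA do not align with each other. The object-dtype observations on their own are consistent.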
If we do decide on aligning between different values, a natural order is None < NaN < NA. However, the most backwards-compatible option would be to have None vs NaN be operand dependent, with NA always propagating when present.
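To make the two options concrete, here is a small pure-Python sketch of each propagation rule. It assumes the order means "the greater value wins" and that "operand dependent" means "take the lhs value", which is my reading of the observed behavior. The function names are hypothetical and nothing here is pandas API.

```python
# Rank of each missing-value kind under the proposed order None < NaN < NA.
RANK = {"None": 0, "NaN": 1, "NA": 2}


def propagate_ordered(left, right):
    """Option 1: the greater kind under None < NaN < NA wins, on either side."""
    return max(left, right, key=RANK.__getitem__)


def propagate_compat(left, right):
    """Option 2: NA wins whenever present; otherwise the lhs kind is kept."""
    if "NA" in (left, right):
        return "NA"
    return left


for left, right in [("None", "NaN"), ("NaN", "None"), ("NaN", "NA")]:
    print(left, right, propagate_ordered(left, right), propagate_compat(left, right))
```

The two rules differ only for None vs NaN: the ordered rule always yields NaN, while the backwards-compatible rule yields whichever kind is on the left.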