Description
Consider this example of a small dataframe with a nullable integer column:
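import numpy as np
import pandas as pd

def recreate_df():
    # Build a fresh frame each time so the examples below do not affect each other.
    return pd.DataFrame({'int': [1, 2, 3], 'int2': [3, 4, 5],
                         'float': [.1, .2, .3],
                         'EA': pd.array([1, 2, None], dtype="Int64")
                         })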
Assigning a new column with __setitem__ (df[col] = ...) normally does not even preserve the dtype:
In [2]: df = recreate_df()
...: df['EA'] = np.array([1., 2., 3.])
...: df['EA'].dtype
Out[2]: dtype('float64')
In [3]: df = recreate_df()
...: df['EA'] = np.array([1, 2, 3])
...: df['EA'].dtype
Out[3]: dtype('int64')
When assigning a new nullable integer array, it of course keeps the dtype of the assigned values:
In [4]: df = recreate_df()
...: df['EA'] = pd.array([1, 2, 3], dtype="Int64")
...: df['EA'].dtype
Out[4]: Int64Dtype()
However, in this case there is also a tricky side effect: the assignment is done in place, so the column's existing array is overwritten rather than replaced:
In [5]: df = recreate_df()
...: original_arr = df.EA.array
...: df['EA'] = pd.array([1, 2, 3], dtype="Int64")
...: original_arr is df.EA.array
Out[5]: True
I don't think this behaviour should depend on the values being set; setitem should always replace the array of the ExtensionBlock.
With the current behaviour, you can unexpectedly alter the data from which you created the dataframe. For a different example of this using Categorical, see the original PR that introduced this behaviour: #32831 (comment)
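To make the concern concrete, here is a minimal sketch of a check that reports whether a setitem replaced the column's array or wrote into it. The helper name `check_setitem` is hypothetical (it is not part of pandas), and the result depends on the pandas version, so no particular output is claimed here.

```python
import pandas as pd


def check_setitem(df, col, new_values):
    """Report what ``df[col] = new_values`` did to the column's original array.

    Hypothetical helper for illustration only, not a pandas API.
    Returns a ``(replaced, mutated)`` tuple of bools.
    """
    original = df[col].array      # array backing the column before assignment
    snapshot = original.copy()    # independent copy of its values
    df[col] = new_values
    # True if the column is now backed by a different array object.
    replaced = df[col].array is not original
    # True if the values of the original array were changed by the assignment.
    mutated = not pd.Series(original).equals(pd.Series(snapshot))
    return replaced, mutated


df = pd.DataFrame({"EA": pd.array([1, 2, None], dtype="Int64")})
replaced, mutated = check_setitem(df, "EA", pd.array([1, 2, 3], dtype="Int64"))
print(f"replaced={replaced}, mutated={mutated}")
```

With the behaviour proposed above, one would expect `replaced` to be true and `mutated` to be false regardless of the values being assigned; the in-place behaviour shown in `In [5]` would correspond to the opposite result.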
Labels
May be closeable, needs more eyeballs
Related to indexing on series/frames, not to indexes themselves
Unit test(s) needed to prevent regressions
Functionality that used to work in a prior pandas version