REGR: setting column with setitem should not modify existing array inplace #33457

Closed
@jorisvandenbossche

Description


So consider this example of a small dataframe with a nullable integer column:

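import numpy as np
import pandas as pd

def recreate_df():
    # Small frame mixing NumPy-backed columns with a nullable integer (extension array) column.
    return pd.DataFrame({'int': [1, 2, 3], 'int2': [3, 4, 5],
                         'float': [.1, .2, .3],
                         'EA': pd.array([1, 2, None], dtype="Int64")
                        })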

Assigning a new column with __setitem__ (df[col] = ...) normally does not even preserve the dtype of the existing column; the column simply takes the dtype of the assigned values:

In [2]: df = recreate_df() 
   ...: df['EA'] = np.array([1., 2., 3.]) 
   ...: df['EA'].dtype 
Out[2]: dtype('float64')

In [3]: df = recreate_df() 
   ...: df['EA'] = np.array([1, 2, 3]) 
   ...: df['EA'].dtype
Out[3]: dtype('int64')

When assigning a new nullable integer array, it of course keeps the dtype of the assigned values:

In [4]: df = recreate_df() 
   ...: df['EA'] = pd.array([1, 2, 3], dtype="Int64") 
   ...: df['EA'].dtype 
Out[4]: Int64Dtype()

However, in this case the assignment now also has the tricky side effect of happening in place: the existing array object is modified rather than replaced:

In [5]: df = recreate_df() 
   ...: original_arr = df.EA.array 
   ...: df['EA'] = pd.array([1, 2, 3], dtype="Int64") 
   ...: original_arr is df.EA.array  
Out[5]: True

I don't think this behaviour should depend on the values being set, and setitem should always replace the array of the ExtensionBlock.

With the current in-place behaviour, you can unexpectedly alter the data from which you created the dataframe. For a different example of this, using Categorical, see the original PR that introduced it: #32831 (comment)
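To make the expected behaviour concrete, here is a minimal sketch of a check that could serve as a starting point for a regression test. The helper name `check_setitem_replaces_array` is hypothetical and not part of pandas. It assumes that `df["EA"].array` returns the array currently held by the column, as in the example above.

```python
import pandas as pd


def check_setitem_replaces_array(new_values):
    """Assign new_values to the 'EA' column and report what happened to the old array.

    Hypothetical helper for illustration only. Returns a tuple of
    (same_object, original_arr, source), where same_object says whether the
    column still holds the very same array object after the assignment.
    """
    # Data used to build the frame; it should never change behind our back.
    source = pd.array([1, 2, None], dtype="Int64")
    df = pd.DataFrame({"int": [1, 2, 3], "EA": source})

    # Array held by the column before the assignment.
    original_arr = df["EA"].array

    # Full-column assignment through __setitem__.
    df["EA"] = new_values

    same_object = original_arr is df["EA"].array
    return same_object, original_arr, source


same_object, original_arr, source = check_setitem_replaces_array(
    pd.array([10, 20, 30], dtype="Int64")
)
print("column still holds the old array object:", same_object)
print("old array after setitem:", original_arr)
print("construction data after setitem:", source)
```

With the behaviour proposed in this issue, `same_object` should be False, and both `original_arr` and `source` should still contain their original values, regardless of the dtype of the assigned values.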

cc @jbrockmendel

Metadata

Assignees

No one assigned

    Labels

    Closing Candidate: May be closeable, needs more eyeballs
    Copy / view semantics
    Indexing: Related to indexing on series/frames, not to indexes themselves
    Needs Tests: Unit test(s) needed to prevent regressions
    Regression: Functionality that used to work in a prior pandas version

