DEPR/API: tuples in labels #24688

Closed
@h-vetinari

Description

I haven't found a previous issue for this despite a bunch of searching; apologies if I overlooked something.

Yes, tuples are hashable and can therefore be used for indexing in principle, but allowing them as labels is a big complexity burden, not least because df.loc[(lvl1, lvl2), col] already uses tuples to address the levels of a MultiIndex.
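For context, a minimal sketch of the ordinary df.loc[(lvl1, lvl2), col] pattern, assuming a recent pandas version. The frame and its labels are made up for illustration; none of them is a tuple, so the tuple in the key can only mean "one label per level":

```python
import pandas as pd

# Hypothetical two-level index with plain (non-tuple) labels.
mi = pd.MultiIndex.from_arrays([[1, 2, 3], ["x", "y", "z"]], names=["a", "b"])
df = pd.DataFrame({"c": [10, 20, 30]}, index=mi)

# The tuple is read as (label on level 'a', label on level 'b').
value = df.loc[(1, "x"), "c"]
print(value)
```

Once a level is allowed to contain tuples itself, the same syntax has to decide whether a tuple names one label or several levels.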

There are also a couple of places where an input can be either labels or arrays (e.g. DataFrame.set_index). Getting all the corner cases right there is a hassle: one has to check whether a tuple is a valid key (both for the frame and for MI.names) and otherwise interpret it as a list-like (or whatever the precedence within the method might be).
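A small sketch of the two input kinds that DataFrame.set_index accepts, assuming a recent pandas version; the frame and the array values are invented for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "c": [10, 20, 30]})

# A label: the existing column 'a' is moved into the index.
by_label = df.set_index("a")

# An array: the values become the new index, and all columns stay in place.
by_array = df.set_index(np.array([7, 8, 9]))

print(by_label)
print(by_array)
```

A string and an ndarray are easy to tell apart. A tuple is both hashable (so it could be a label) and a sequence (so it could be data), which is why it needs the extra checks described above.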

Tuples are also notoriously hard to get into an array, because numpy will often change the semantics of anything that is not is_scalar, for example by treating a tuple as a nested sequence rather than as a single element.

>>> import pandas as pd
>>> pd.MultiIndex.from_arrays([[1, 2, 3], [1, (1, 2), 1]])
ValueError: setting an array element with a sequence.
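To illustrate the change in semantics, here is a hedged sketch, assuming reasonably recent NumPy and pandas versions. The variable names are illustrative only, and the element-wise fill is just one way to keep tuples intact, not necessarily what pandas does internally:

```python
import numpy as np
from pandas.api.types import is_scalar

tuples = [(1, 1), (1, 2), (2, 2)]

# A tuple is hashable, but pandas does not consider it a scalar.
tuple_is_scalar = is_scalar((1, 1))
int_is_scalar = is_scalar(1)

# NumPy reads equal-length tuples as rows of a 2-D array,
# so the three labels turn into a 3x2 block of integers.
as_rows = np.array(tuples)

# To keep each tuple as a single element, preallocate an object
# array and fill it one element at a time.
as_labels = np.empty(len(tuples), dtype=object)
for i, t in enumerate(tuples):
    as_labels[i] = t

print(as_rows.shape, as_labels.shape)
```

Every code path that builds an array from user input would need this kind of special handling to treat tuples as labels.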

Furthermore, there are lots of bugs hiding here, because many of these corner cases are not tested:

>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [(1, 1), (1, 2), (2, 2)], 'c': [10, 20, 30]})
>>> df
   a       b   c
0  1  (1, 1)  10
1  2  (1, 2)  20
2  3  (2, 2)  30
>>> df2 = df.set_index(['a', 'b'])
>>> df2
           c
a b
1 (1, 1)  10
2 (1, 2)  20
3 (2, 2)  30
>>>
>>> # should ALL be 10
>>> df2.iloc[0, 0]
10
>>> df2.loc[(1, (1, 1)), 'c']
Series([], Name: c, dtype: int64)
>>> df2.loc[1].c.loc[(1, 1)]
TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>
>>> df2.loc[1].loc[(1, 1), 'c']
KeyError: "None of [Int64Index([1, 1], dtype='int64', name='b')] are in the [index]"
>>> df2.loc[1].at[(1, 1)]
ValueError: At based indexing on an non-integer index can only have non-integer indexers
>>> df2.swaplevel().loc[((1, 1), 1)]
TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>
>>> df2.swaplevel().loc[(1, 1)].loc[1, 'c']  # Eureka! One that works!
10

I'd bet that there are hundreds of similar cases that stay hidden because the vast majority of tests do not check tuples as index entries.

@WillAyd just commented on an issue:

"Using a tuple as a label is generally not supported, but if you want to take a look PRs are always welcome"

Why allow it at all in this case? Why not just deprecate tuples in Index / column names / MultiIndex.names?

Labels

Closing Candidate: May be closeable, needs more eyeballs
Deprecate: Functionality to remove in pandas
Needs Discussion: Requires discussion from core team before further action
Nested Data: Data where the values are collections (lists, sets, dicts, objects, etc.).
