Description
I haven't found a previous issue for this despite a bunch of searching; apologies if I overlooked something.
Yes, tuples are hashable and can therefore be used for indexing in principle. But allowing tuples as labels becomes a big complexity burden, not least because df.loc[(lvl1, lvl2), col] already uses tuples to address the levels of a MultiIndex.
There are also a couple of places where some inputs take either labels or arrays (e.g. DataFrame.set_index), and it's a hassle to get all the corner cases right, because one needs to check if a tuple is a valid key (both for the frame and the MI.names), and interpret it as a list-like otherwise (or whatever the precedence might be within the method).
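To illustrate the kind of check described above, here is a minimal sketch in plain Python. The helper `resolve_key` and the label sets are hypothetical and only stand in for the idea; this is not how pandas implements it, and the real precedence inside any given method may differ.

```python
def resolve_key(key, labels):
    """Classify `key` as a single label or as a list-like of labels.

    Hypothetical helper for illustration only; not a pandas function.
    """
    if isinstance(key, tuple):
        # A tuple is hashable, so it may itself be a valid label ...
        if key in labels:
            return "label", key
        # ... otherwise fall back to treating it as a list-like of labels.
        return "list-like", list(key)
    return "label", key


plain = {"a", "b", "c"}
with_tuple_label = {"a", "b", ("a", "b")}

# The same input means different things depending on the existing labels.
print(resolve_key(("a", "b"), plain))             # ('list-like', ['a', 'b'])
print(resolve_key(("a", "b"), with_tuple_label))  # ('label', ('a', 'b'))
```

The point of the sketch is that the interpretation of one and the same argument depends on the data it is applied to, which is what makes these code paths hard to get right and hard to test exhaustively.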
Tuples are also notoriously hard to get into an array, because numpy will often change the semantics of anything that is not is_scalar.
>>> pd.MultiIndex.from_arrays([[1, 2, 3], [1, (1, 2), 1]])
ValueError: setting an array element with a sequence.
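As a rough illustration of that semantic change, the sketch below uses only numpy. The variable names are made up for the example, and the element-wise fill is just one common workaround, not something pandas is claimed to do internally.

```python
import numpy as np

tuples = [(1, 1), (1, 2), (2, 2)]

# Passing tuples straight to np.array treats each tuple as a row,
# producing a 2-D integer array instead of a 1-D array of tuples.
as_rows = np.array(tuples)
print(as_rows.shape)  # (3, 2)

# Pre-allocating an object array and filling it element by element
# keeps each tuple intact as a single element.
as_objects = np.empty(len(tuples), dtype=object)
for i, t in enumerate(tuples):
    as_objects[i] = t
print(as_objects.shape)  # (3,)
print(as_objects[1])     # (1, 2)
```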
Furthermore, there are lots of bugs hiding because many of these corner cases are not tested:
>>> df = pd.DataFrame({'a': [1, 2, 3], 'b': [(1, 1), (1, 2), (2, 2)], 'c': [10, 20, 30]})
>>> df
   a       b   c
0  1  (1, 1)  10
1  2  (1, 2)  20
2  3  (2, 2)  30
>>> df2 = df.set_index(['a', 'b'])
>>> df2
           c
a b
1 (1, 1)  10
2 (1, 2)  20
3 (2, 2)  30
>>>
>>> # should ALL be 10
>>> df2.iloc[0, 0]
10
>>> df2.loc[(1, (1, 1)), 'c']
Series([], Name: c, dtype: int64)
>>> df2.loc[1].c.loc[(1, 1)]
TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>
>>> df2.loc[1].loc[(1, 1), 'c']
KeyError: "None of [Int64Index([1, 1], dtype='int64', name='b')] are in the [index]"
>>> df2.loc[1].at[(1, 1)]
ValueError: At based indexing on an non-integer index can only have non-integer indexers
>>> df2.swaplevel().loc[((1, 1), 1)]
TypeError: cannot do label indexing on <class 'pandas.core.indexes.base.Index'> with these indexers [1] of <class 'int'>
>>> df2.swaplevel().loc[(1, 1)].loc[1, 'c'] # Eureka! One that works!
10
I'd bet that there are hundreds of similar cases that stay hidden because the vast majority of tests do not check tuples as index entries.
@WillAyd just commented on an issue:
Using a tuple as a label is generally not supported, but if you want to take a look PRs are always welcome
Why allow it at all in this case? Why not just deprecate tuples in Index / column names / MultiIndex.names?