Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
## example A
import pandas as pd # 2.2.3
df = pd.DataFrame([[1, 2, 3]], columns=['a', ['b', 'c'], ['b', 'c']])
print(df.columns.drop_duplicates())
# Traceback (most recent call last):
#   File "/home/cameron/.vim-excerpt", line 5, in <module>
#     print(df.columns.drop_duplicates())
#           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
#   File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 3117, in drop_duplicates
#     if self.is_unique:
#        ^^^^^^^^^^^^^^
#   File "properties.pyx", line 36, in pandas._libs.properties.CachedProperty.__get__
#   File "/home/cameron/repos/opensource/narwhals-dev/.venv/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 2346, in is_unique
#     return self._engine.is_unique
#            ^^^^^^^^^^^^^^^^^^^^^^
#   File "index.pyx", line 266, in pandas._libs.index.IndexEngine.is_unique.__get__
#   File "index.pyx", line 271, in pandas._libs.index.IndexEngine._do_unique_check
#   File "index.pyx", line 333, in pandas._libs.index.IndexEngine._ensure_mapping_populated
#   File "pandas/_libs/hashtable_class_helper.pxi", line 7115, in pandas._libs.hashtable.PyObjectHashTable.map_locations
# TypeError: unhashable type: 'list'
## --------
## example B
import pandas as pd # 2.2.3
df = pd.DataFrame([[1, 2, 3]], columns=['a', ['b', 'c'], ['b', 'c']])
# hasattr triggers a side effect where the `df.columns.drop_duplicates()` now works.
hasattr(df, 'hello_world')
print(df.columns.drop_duplicates())
# Index(['a', ['b', 'c']], dtype='object')
Issue Description
pandas.Index.drop_duplicates() inconsistently raises TypeError: unhashable type: 'list' when its values include a list. The error does not seem to prevent the underlying uniqueness computation from happening. Beyond the reproducible example above, the behavior can be reproduced directly on an Index object.
If we call .drop_duplicates() when the Index contains unhashable types, we observe a TypeError.
import pandas as pd
idx = pd.Index(['a', ['b', 'c'], ['b', 'c']])
idx.drop_duplicates() # TypeError: unhashable type: 'list'
However, if we ignore the error the first time and call .drop_duplicates() again, it works and removes the duplicated entries, including the unhashable ones:
import pandas as pd
idx = pd.Index(['a', ['b', 'c'], ['b', 'c']])
try:
    idx.drop_duplicates() # TypeError: unhashable type: 'list'
except TypeError:
    pass
print(idx.drop_duplicates()) # Index(['a', ['b', 'c']], dtype='object')
The underlying Index implementation appears to populate its hashtable mapping even though the original call to drop_duplicates() fails. We can see this because the engine's mapping is set after the failed call, and the second attempt at .drop_duplicates() works:
import pandas as pd
idx = pd.Index(['a', ['b', 'c'], ['b', 'c']])
print(idx._engine.mapping) # None
try:
    idx.drop_duplicates() # TypeError: unhashable type: 'list'
except TypeError:
    pass
print(idx._engine.mapping) # <pandas._libs.hashtable.PyObjectHashTable>
print(idx.drop_duplicates()) # Index(['a', ['b', 'c']], dtype='object')
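One way the None-to-populated transition above could arise is sketched below in plain Python. This is a simplified model, not pandas' actual code: ToyEngine and its methods are hypothetical names, and the sketch assumes the lookup table is attached to the engine before it is filled, so a failure partway through leaves a table behind.

```python
class ToyEngine:
    """Hypothetical stand-in for an index engine; not pandas' actual code."""

    def __init__(self, values):
        self.values = list(values)
        self.mapping = None  # lazily built lookup table

    def _ensure_mapping_populated(self):
        if self.mapping is not None:
            return  # treated as "already populated", even if only partly filled
        self.mapping = {}  # assumption: attached BEFORE population can fail
        for position, value in enumerate(self.values):
            self.mapping[value] = position  # raises TypeError for a list

    def is_unique(self):
        self._ensure_mapping_populated()
        return len(self.mapping) == len(self.values)


engine = ToyEngine(["a", ["b", "c"], ["b", "c"]])
print(engine.mapping)  # None

try:
    engine.is_unique()
except TypeError as exc:
    print(f"first call raised: {exc}")

print(engine.mapping)      # {'a': 0}: a partly filled table survived the failure
print(engine.is_unique())  # False: population is skipped, so nothing raises
```

In this model the second call succeeds only because the hashability check lives in the population step, which is skipped once any table exists. Whether pandas behaves this way for the same reason is an assumption that would need to be confirmed against the engine source.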
Finally, it appears that attribute checking on a pandas.DataFrame causes the PyObjectHashTable to be constructed for the column index. This is likely due to the shared code path between __getattr__ and __getitem__.
import pandas as pd
df = pd.DataFrame([[1, 2, 3]], columns=['a', ['b', 'c'], ['b', 'c']])
print(df.columns._engine.mapping) # None
hasattr(df, 'hello_world')
print(df.columns._engine.mapping) # <pandas._libs.hashtable.PyObjectHashTable>
print(df.columns.drop_duplicates()) # Index(['a', ['b', 'c']], dtype='object')
Expected Behavior
I expect Index.drop_duplicates() to behave the same regardless of whether an attribute has been checked. The following two snippets should produce equivalent results (whether that is raising an error or producing a result):
## snippet A
import pandas as pd # 2.2.3
df = pd.DataFrame([[1, 2, 3]], columns=['a', ['b', 'c'], ['b', 'c']])
print(df.columns.drop_duplicates()) # Currently produces → TypeError
## --------
## snippet B
import pandas as pd # 2.2.3
df = pd.DataFrame([[1, 2, 3]], columns=['a', ['b', 'c'], ['b', 'c']])
hasattr(df, 'hello_world')
print(df.columns.drop_duplicates()) # Currently produces → Index(['a', ['b', 'c']], dtype='object')
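The expectation that both paths end the same way, whether by raising or by returning, could be checked with a small helper along these lines. This is only a sketch: outcome, is_repeatable, and fails_once are hypothetical names, and the demo uses a plain-Python callable that mimics the raise-then-succeed pattern rather than pandas itself.

```python
from typing import Any, Callable, Tuple


def outcome(fn: Callable[[], Any]) -> Tuple[str, str]:
    """Run fn and reduce what happened to a comparable pair."""
    try:
        return ("ok", repr(fn()))
    except Exception as exc:  # broad on purpose: any error counts as an outcome
        return ("error", type(exc).__name__)


def is_repeatable(fn: Callable[[], Any], attempts: int = 2) -> bool:
    """True if every attempt ends the same way (same result or same error type)."""
    return len({outcome(fn) for _ in range(attempts)}) == 1


# Hypothetical callable that fails only on its first call,
# mimicking the raise-then-succeed pattern described in this report.
calls = {"count": 0}


def fails_once():
    calls["count"] += 1
    if calls["count"] == 1:
        raise TypeError("unhashable type: 'list'")
    return ["a", ["b", "c"]]


print(is_repeatable(fails_once))              # False
print(is_repeatable(lambda: sorted([2, 1])))  # True
```

Given the behavior reported here, passing idx.drop_duplicates for a freshly built idx would be expected to give False on pandas 2.2.3, and True once both paths agree.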
Installed Versions
INSTALLED VERSIONS
commit : 0691c5c
python : 3.12.7
python-bits : 64
OS : Linux
OS-release : 6.6.52-1-lts
Version : #1 SMP PREEMPT_DYNAMIC Wed, 18 Sep 2024 19:02:04 +0000
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.3
numpy : 2.2.2
pytz : 2025.1
dateutil : 2.9.0.post0
pip : 25.0.1
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2025.2.0
html5lib : None
hypothesis : 6.125.3
gcsfs : None
jinja2 : 3.1.5
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : 19.0.0
pyreadstat : None
pytest : 8.3.4
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.1
qtpy : None
pyqt5 : None