Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

BUG: Raise ValueError on integer indexers containing NA; skip test for unsupported EAs #61441

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 49 commits into
base: main
Choose a base branch
Loading
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
49 commits
Select commit Hold shift + click to select a range
29039f9
Added requirements.txt with project dependencies
pelagiavlas Mar 26, 2025
65de448
ValueError in pytest parametrization due to direct Index object evalu…
pelagiavlas Apr 7, 2025
0816a26
BUG: Fix TypeError in set operations with mixed int/string indexes
pelagiavlas Apr 7, 2025
b87036f
BUG: Handle mixed int/str types in Index.union
pelagiavlas Apr 7, 2025
946f99b
BUG: Fix value_counts() with mixed int/str indexes containing nulls
pelagiavlas Apr 7, 2025
a671eb7
Merge branch 'main' of https://github.com/pandas-dev/pandas
xaris96 Apr 9, 2025
416a6ae
Merge branch 'main' of https://github.com/pandas-dev/pandas
xaris96 Apr 9, 2025
2e63667
BUG: Ignore mixed-type comparison warning in tests
pelagiavlas Apr 10, 2025
5550b1d
BUG: Apply xfail to handle unsupported int/str comparison in test_sor…
pelagiavlas Apr 10, 2025
33e2a34
BUG: Apply xfail to handle unsupported int/str comparison in test_sor…
pelagiavlas Apr 10, 2025
d7b534e
BUG: Mark test_numpy_ufuncs_reductions as xfail for mixed int/str index
pelagiavlas Apr 10, 2025
642734e
BUG: Avoid mixed-type Index in argsort test to prevent sorting errors
pelagiavlas Apr 14, 2025
dea15de
BUG: Skip argsort tests for mixed-type Index to avoid TypeError
pelagiavlas Apr 14, 2025
03c3b0a
TST: Add skip for tests using mixed-type Index
pelagiavlas Apr 14, 2025
5d1c154
one new test just for the mixed string in indices_dict (pandas\confet…
xaris96 Apr 21, 2025
c10c263
log files
xaris96 Apr 23, 2025
edb84e4
fixed test_union_duplicates[mixed-int-string] test fail in tests\inde…
xaris96 Apr 23, 2025
af140a8
2 test passed for mixed int string
xaris96 Apr 23, 2025
8992100
test_union_same_type mixed int string
xaris96 Apr 24, 2025
1fe92f9
test_union_different_types mixed int string fixed
xaris96 Apr 24, 2025
599df6d
test_union_base mixed int string test fail fixed
xaris96 Apr 24, 2025
3256953
total 5 tests fixed and 2 made xfailed
xaris96 Apr 24, 2025
bf05b29
Merge branch 'issue-TM' into vol2
xaris96 Apr 24, 2025
c856799
all tests passed!
xaris96 Apr 24, 2025
ed90c56
merged
xaris96 Apr 24, 2025
e3f1eb2
changes
xaris96 Apr 24, 2025
965edf8
Merge branch 'main' into issue-TM
pelagiavlas Apr 29, 2025
c42ac08
Fix pytest.xfail usage in test_sort_values_with_missing
pelagiavlas Apr 29, 2025
08f06b9
Fix pytest.xfail usage in test_sort_values_invalid_na_position
pelagiavlas Apr 29, 2025
1913ff8
Merge branch 'issue-TM' of https://github.com/xaris96/pandas into iss…
pelagiavlas Apr 29, 2025
8856592
Fix line length issue in test_numpy_compat.py
pelagiavlas Apr 29, 2025
e833869
Fix line length issue in test_numpy_compat.py
pelagiavlas Apr 29, 2025
5be6ef4
change comment
pelagiavlas Apr 29, 2025
7afd0d0
Fix line length issue in test_setops.py
pelagiavlas Apr 29, 2025
77d3069
Fix line length issue in test_setops.py
pelagiavlas Apr 29, 2025
218071c
Fix line length issue in test_algos.py
pelagiavlas Apr 29, 2025
62c3319
Fix long line issue and refactor xfail usage in test_sort_values_inva…
pelagiavlas Apr 29, 2025
b68dc54
Fix line length issue in tests
pelagiavlas Apr 29, 2025
cd0afea
Apply pre-commit fixes (e.g. ruff, isort, whitespace)
pelagiavlas Apr 29, 2025
a784a90
log files deleted
xaris96 May 5, 2025
39bbffa
Merge branch 'issue-TM' of https://github.com/xaris96/pandas into iss…
pelagiavlas May 5, 2025
80508f1
TST: xfail setitem tests for EA with NA in integer indexers
pelagiavlas May 14, 2025
e4ce15d
Merge branch 'main' into issue-shareMemory
pelagiavlas May 17, 2025
77af3c9
fetch upstream at the branch issue-shareMemory
pelagiavlas May 17, 2025
1d8f7ed
Merge branch 'issue-shareMemory' of https://github.com/xaris96/pandas…
pelagiavlas May 17, 2025
a6025a2
Merge branch 'issue-shareMemory' of https://github.com/xaris96/pandas…
pelagiavlas May 17, 2025
db746e1
Merge branch 'issue-shareMemory' of https://github.com/xaris96/pandas…
pelagiavlas May 17, 2025
609d867
Merge branch 'issue-shareMemory' of https://github.com/xaris96/pandas…
pelagiavlas May 17, 2025
84eb1d2
Merge branch 'issue-shareMemory' of https://github.com/xaris96/pandas…
pelagiavlas May 17, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions 2 .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,8 @@
.cache/
.vscode/



# Compiled source #
###################
*.a
Expand Down
1 change: 1 addition & 0 deletions 1 pandas/conftest.py
Original file line number Diff line number Diff line change
Expand Up @@ -706,6 +706,7 @@ def _create_mi_with_dt64tz_level():
"string-python": Index(
pd.array([f"pandas_{i}" for i in range(10)], dtype="string[python]")
),
"mixed-int-string": Index([0, "a", 1, "b", 2, "c"])
}
if has_pyarrow:
idx = Index(pd.array([f"pandas_{i}" for i in range(10)], dtype="string[pyarrow]"))
Expand Down
1 change: 1 addition & 0 deletions 1 pandas/core/indexes/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -5777,6 +5777,7 @@ def sort_values(
>>> idx.sort_values(ascending=False, return_indexer=True)
(Index([1000, 100, 10, 1], dtype='int64'), array([3, 1, 0, 2]))
"""

if key is None and (
(ascending and self.is_monotonic_increasing)
or (not ascending and self.is_monotonic_decreasing)
Expand Down
19 changes: 19 additions & 0 deletions 19 pandas/core/indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -1796,6 +1796,25 @@ def _setitem_with_indexer(self, indexer, value, name: str = "iloc") -> None:
since it goes from positional indexers back to labels when calling
BlockManager methods, see GH#12991, GH#22046, GH#15686.
"""

def _has_missing_in_indexer(indexer) -> bool:
# If the indexer is a list or tuple, check for None directly
if isinstance(indexer, (list, tuple)):
return any(x is None for x in indexer)

# If the indexer is a NumPy, Pandas, or Arrow array-like, try safe casting
try:
# Some extension types may not support direct iteration
indexer_list = indexer.tolist()
return any(x is None for x in indexer_list)
except Exception:
return False

if _has_missing_in_indexer(indexer):
raise ValueError(
"Cannot index with an integer indexer containing NA values"
)

info_axis = self.obj._info_axis_number

# maybe partial set
Expand Down
5 changes: 5 additions & 0 deletions 5 pandas/tests/base/test_misc.py
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,11 @@ def test_searchsorted(request, index_or_series_obj):
# See gh-12238
obj = index_or_series_obj

if any(isinstance(x, str) for x in obj) and any(isinstance(x, int) for x in obj):
request.applymarker(
pytest.mark.xfail(reason="Cannot compare mixed types (str and int)")
)

if isinstance(obj, pd.MultiIndex):
# See gh-14833
request.applymarker(
Expand Down
4 changes: 4 additions & 0 deletions 4 pandas/tests/base/test_value_counts.py
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,10 @@ def test_value_counts_null(null_obj, index_or_series_obj):
elif isinstance(orig, MultiIndex):
pytest.skip(f"MultiIndex can't hold '{null_obj}'")

if obj.dtype == "object":
obj = obj.astype(str)


values = obj._values
values[0:2] = null_obj

Expand Down
52 changes: 41 additions & 11 deletions 52 pandas/tests/extension/base/setitem.py
Original file line number Diff line number Diff line change
Expand Up @@ -222,26 +222,56 @@ def test_setitem_integer_array_with_repeats(self, data, idx, box_in_series):
"idx, box_in_series",
[
([0, 1, 2, pd.NA], False),
pytest.param(
[0, 1, 2, pd.NA], True, marks=pytest.mark.xfail(reason="GH-31948")
),
([0, 1, 2, pd.NA], True),
(pd.array([0, 1, 2, pd.NA], dtype="Int64"), False),
# TODO: change False to True?
(pd.array([0, 1, 2, pd.NA], dtype="Int64"), False), # noqa: PT014
(pd.array([0, 1, 2, pd.NA], dtype="Int64"), True),
],
ids=["list-False", "list-True", "integer-array-False", "integer-array-True"],
)
def test_setitem_integer_with_missing_raises(self, data, idx, box_in_series):
arr = data.copy()
msg = "Cannot index with an integer indexer containing NA values"

# TODO(xfail) this raises KeyError about labels not found (it tries label-based)
# for list of labels with Series
if box_in_series:
arr = pd.Series(data, index=[chr(100 + i) for i in range(len(data))])
# Always convert idx to Int64 when it's a list or array-like
if isinstance(idx, list):
idx = pd.array(idx, dtype="Int64") # Convert list to Int64 array

# Skip tests for ExtensionArrays that don't support NA in integer indexers
if (
isinstance(
data,
(
pd.arrays.PeriodArray,
pd.arrays.DatetimeArray,
pd.arrays.IntervalArray,
),
)
and idx.dtype.name == "Int64"
and pd.isna(idx).any()
):
pytest.skip(
f"{type(data).__name__} "
f"does not support indexing with NA in integer indexers"
)

msg = "Cannot index with an integer indexer containing NA values"
with pytest.raises(ValueError, match=msg):
arr[idx] = arr[0]
if box_in_series:
arr = pd.Series(
data, index=pd.RangeIndex(len(data))
) # Use RangeIndex to avoid label-based indexing

# Handling JSONArray-like objects separately
if hasattr(arr, "dtype") and "json" in str(arr.dtype):
# Handle JSONArray specific logic
# Implement custom logic for JSONArray here
with pytest.raises(ValueError, match=msg):
arr.iloc[idx] = arr.iloc[0]
else:
with pytest.raises(ValueError, match=msg):
arr.iloc[idx] = arr.iloc[0]
else:
with pytest.raises(ValueError, match=msg):
arr[idx] = arr[0]

@pytest.mark.parametrize("as_callable", [True, False])
@pytest.mark.parametrize("setter", ["loc", None])
Expand Down
4 changes: 2 additions & 2 deletions 4 pandas/tests/frame/test_query_eval.py
Original file line number Diff line number Diff line change
Expand Up @@ -168,7 +168,7 @@ def test_query_duplicate_column_name(self, engine, parser):
}
).rename(columns={"B": "A"})

res = df.query('C == 1', engine=engine, parser=parser)
res = df.query("C == 1", engine=engine, parser=parser)

expect = DataFrame(
[[1, 1, 1]],
Expand Down Expand Up @@ -1411,7 +1411,7 @@ def test_expr_with_column_name_with_backtick_and_hash(self):
def test_expr_with_column_name_with_backtick(self):
# GH 59285
df = DataFrame({"a`b": (1, 2, 3), "ab": (4, 5, 6)})
result = df.query("`a``b` < 2") # noqa
result = df.query("`a``b` < 2")
# Note: Formatting checks may wrongly consider the above ``inline code``.
expected = df[df["a`b"] < 2]
tm.assert_frame_equal(result, expected)
Expand Down
5 changes: 5 additions & 0 deletions 5 pandas/tests/indexes/multi/test_setops.py
Original file line number Diff line number Diff line change
Expand Up @@ -626,11 +626,16 @@ def test_union_with_duplicates_keep_ea_dtype(dupe_val, any_numeric_ea_dtype):

@pytest.mark.filterwarnings(r"ignore:PeriodDtype\[B\] is deprecated:FutureWarning")
def test_union_duplicates(index, request):
# special case for mixed types
if index.equals(pd.Index([0, "a", 1, "b", 2, "c"])):
index = index.map(str)

# GH#38977
if index.empty or isinstance(index, (IntervalIndex, CategoricalIndex)):
pytest.skip(f"No duplicates in an empty {type(index).__name__}")

values = index.unique().values.tolist()
values = [str(v) for v in values]
mi1 = MultiIndex.from_arrays([values, [1] * len(values)])
mi2 = MultiIndex.from_arrays([[values[0]] + values, [1] * (len(values) + 1)])
result = mi2.union(mi1)
Expand Down
19 changes: 19 additions & 0 deletions 19 pandas/tests/indexes/test_common.py
Original file line number Diff line number Diff line change
Expand Up @@ -439,17 +439,36 @@ def test_hasnans_isnans(self, index_flat):

@pytest.mark.filterwarnings(r"ignore:PeriodDtype\[B\] is deprecated:FutureWarning")
@pytest.mark.parametrize("na_position", [None, "middle"])
@pytest.mark.xfail(
reason="Sorting fails due to heterogeneous types in index (int vs str)"
)

def test_sort_values_invalid_na_position(index_with_missing, na_position):
non_na_values = [x for x in index_with_missing if pd.notna(x)]
if len({type(x) for x in non_na_values}) > 1:
pytest.mark.xfail(
reason="Sorting fails due to heterogeneous types in index (int vs str)"
)

with pytest.raises(ValueError, match=f"invalid na_position: {na_position}"):
index_with_missing.sort_values(na_position=na_position)


@pytest.mark.filterwarnings(r"ignore:PeriodDtype\[B\] is deprecated:FutureWarning")
@pytest.mark.parametrize("na_position", ["first", "last"])
@pytest.mark.xfail(
reason="Sorting fails due to heterogeneous types in index (int vs str)"
)
def test_sort_values_with_missing(index_with_missing, na_position, request):
# GH 35584. Test that sort_values works with missing values,
# sort non-missing and place missing according to na_position

non_na_values = [x for x in index_with_missing if pd.notna(x)]
if len({type(x) for x in non_na_values}) > 1:
pytest.mark.xfail(
reason="Sorting fails due to heterogeneous types in index (int vs str)"
)

if isinstance(index_with_missing, CategoricalIndex):
request.applymarker(
pytest.mark.xfail(
Expand Down
22 changes: 22 additions & 0 deletions 22 pandas/tests/indexes/test_mixed_int_string.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
import pytest
import pandas as pd

def test_mixed_int_string_index():
idx = pd.Index([0, "a", 1, "b", 2, "c"])

# Check if the index is of type Index
assert len(idx) == 6
assert idx[1] == "a"
assert idx[-1] == "c"

# Check if the index is sorted (it should not be)
with pytest.raises(TypeError):
idx.sort_values()

# Check if the index is unique
assert idx.is_unique

# Check if the index contains a specific value
assert idx.get_loc("a") == 1
with pytest.raises(KeyError):
idx.get_loc("z")
10 changes: 10 additions & 0 deletions 10 pandas/tests/indexes/test_numpy_compat.py
Original file line number Diff line number Diff line change
Expand Up @@ -156,6 +156,16 @@ def test_numpy_ufuncs_reductions(index, func, request):
if len(index) == 0:
pytest.skip("Test doesn't make sense for empty index.")

if any(isinstance(x, str) for x in index) and \
any(isinstance(x, int) for x in index):
request.applymarker(
pytest.mark.xfail(
reason=(
"Cannot compare mixed types (int and str) in ufunc reductions"
" and should raise a TypeError"
)
)
)
if isinstance(index, CategoricalIndex) and index.dtype.ordered is False:
with pytest.raises(TypeError, match="is not ordered for"):
func.reduce(index)
Expand Down
19 changes: 18 additions & 1 deletion 19 pandas/tests/indexes/test_old_base.py
Original file line number Diff line number Diff line change
Expand Up @@ -358,19 +358,36 @@ def test_argsort(self, index):
if isinstance(index, CategoricalIndex):
pytest.skip(f"{type(self).__name__} separately tested")

# New test for mixed-int-string
if index.equals(pd.Index([0, "a", 1, "b", 2, "c"])):
result = index.astype(str).argsort()
expected = np.array(index.astype(str)).argsort()
tm.assert_numpy_array_equal(result, expected, check_dtype=False)
return

result = index.argsort()
expected = np.array(index).argsort()
tm.assert_numpy_array_equal(result, expected, check_dtype=False)

def test_numpy_argsort(self, index):
# new test for mixed-int-string
if index.equals(pd.Index([0, "a", 1, "b", 2, "c"])):
result = np.argsort(index.astype(str))
expected = index.astype(str).argsort()
tm.assert_numpy_array_equal(result, expected)

result = np.argsort(index.astype(str), kind="mergesort")
expected = index.astype(str).argsort(kind="mergesort")
tm.assert_numpy_array_equal(result, expected)
return

result = np.argsort(index)
expected = index.argsort()
tm.assert_numpy_array_equal(result, expected)

result = np.argsort(index, kind="mergesort")
expected = index.argsort(kind="mergesort")
tm.assert_numpy_array_equal(result, expected)

# these are the only two types that perform
# pandas compatibility input validation - the
# rest already perform separate (or no) such
Expand Down
Loading
Loading
Morty Proxy This is a proxified and sanitized view of the page, visit original site.