Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Commit b519aa7

Browse filesBrowse files
authored
BUG: groupby.groups with NA categories fails (#61364)
* BUG: groupby.groups with NA categories fails * cleanup * whatsnew
1 parent 2121180 commit b519aa7
Copy full SHA for b519aa7

File tree

Expand file treeCollapse file tree

3 files changed

+35
-3
lines changed
Filter options
Expand file treeCollapse file tree

3 files changed

+35
-3
lines changed

‎doc/source/whatsnew/v3.0.0.rst

Copy file name to clipboardExpand all lines: doc/source/whatsnew/v3.0.0.rst
+1Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -806,6 +806,7 @@ Groupby/resample/rolling
806806
^^^^^^^^^^^^^^^^^^^^^^^^
807807
- Bug in :meth:`.DataFrameGroupBy.__len__` and :meth:`.SeriesGroupBy.__len__` would raise when the grouping contained NA values and ``dropna=False`` (:issue:`58644`)
808808
- Bug in :meth:`.DataFrameGroupBy.any` that returned True for groups where all Timedelta values are NaT. (:issue:`59712`)
809+
- Bug in :meth:`.DataFrameGroupBy.groups` and :meth:`.SeriesGroupBy.groups` would fail when the groups were :class:`Categorical` with an NA value (:issue:`61356`)
809810
- Bug in :meth:`.DataFrameGroupBy.groups` and :meth:`.SeriesGroupby.groups` that would not respect groupby argument ``dropna`` (:issue:`55919`)
810811
- Bug in :meth:`.DataFrameGroupBy.median` where nat values gave an incorrect result. (:issue:`57926`)
811812
- Bug in :meth:`.DataFrameGroupBy.quantile` when ``interpolation="nearest"`` is inconsistent with :meth:`DataFrame.quantile` (:issue:`47942`)

‎pandas/core/groupby/grouper.py

Copy file name to clipboardExpand all lines: pandas/core/groupby/grouper.py
+17-3Lines changed: 17 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -12,11 +12,16 @@
1212

1313
import numpy as np
1414

15+
from pandas._libs import (
16+
algos as libalgos,
17+
)
1518
from pandas._libs.tslibs import OutOfBoundsDatetime
1619
from pandas.errors import InvalidIndexError
1720
from pandas.util._decorators import cache_readonly
1821

1922
from pandas.core.dtypes.common import (
23+
ensure_int64,
24+
ensure_platform_int,
2025
is_list_like,
2126
is_scalar,
2227
)
@@ -38,7 +43,10 @@
3843
)
3944
from pandas.core.series import Series
4045

41-
from pandas.io.formats.printing import pprint_thing
46+
from pandas.io.formats.printing import (
47+
PrettyDict,
48+
pprint_thing,
49+
)
4250

4351
if TYPE_CHECKING:
4452
from collections.abc import (
@@ -668,8 +676,14 @@ def _codes_and_uniques(self) -> tuple[npt.NDArray[np.signedinteger], ArrayLike]:
668676
def groups(self) -> dict[Hashable, Index]:
669677
codes, uniques = self._codes_and_uniques
670678
uniques = Index._with_infer(uniques, name=self.name)
671-
cats = Categorical.from_codes(codes, uniques, validate=False)
672-
return self._index.groupby(cats)
679+
680+
r, counts = libalgos.groupsort_indexer(ensure_platform_int(codes), len(uniques))
681+
counts = ensure_int64(counts).cumsum()
682+
_result = (r[start:end] for start, end in zip(counts, counts[1:]))
683+
# map to the label
684+
result = {k: self._index.take(v) for k, v in zip(uniques, _result)}
685+
686+
return PrettyDict(result)
673687

674688
@property
675689
def observed_grouping(self) -> Grouping:

‎pandas/tests/groupby/test_categorical.py

Copy file name to clipboardExpand all lines: pandas/tests/groupby/test_categorical.py
+17Lines changed: 17 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -506,6 +506,23 @@ def test_observed_groups(observed):
506506
tm.assert_dict_equal(result, expected)
507507

508508

509+
def test_groups_na_category(dropna, observed):
510+
# https://github.com/pandas-dev/pandas/issues/61356
511+
df = DataFrame(
512+
{"cat": Categorical(["a", np.nan, "a"], categories=list("adb"))},
513+
index=list("xyz"),
514+
)
515+
g = df.groupby("cat", observed=observed, dropna=dropna)
516+
517+
result = g.groups
518+
expected = {"a": Index(["x", "z"])}
519+
if not dropna:
520+
expected |= {np.nan: Index(["y"])}
521+
if not observed:
522+
expected |= {"b": Index([]), "d": Index([])}
523+
tm.assert_dict_equal(result, expected)
524+
525+
509526
@pytest.mark.parametrize(
510527
"keys, expected_values, expected_index_levels",
511528
[

0 commit comments

Comments
0 (0)
Morty Proxy This is a proxified and sanitized view of the page, visit original site.