groupby().first() docs should explain distinction between nth and first #27578

Open
@kyleabeauchamp


Problem description

The existing doc for groupby().first() (https://pandas-docs.github.io/pandas-docs-travis/reference/api/pandas.core.groupby.GroupBy.first.html?highlight=first#pandas.core.groupby.GroupBy.first) does not describe how missing data is handled. In particular, it does not mention that nulls are skipped column by column, so each column's result can come from a different row of the group.

The docs read: "Compute first of group values...Computed first of values within each group." I think the correct description is "For each column, compute the first non-null entry, possibly aggregating values from across multiple rows." We might also want a simple example to explain the behavior.

Code Sample, a copy-pastable example if possible

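import pandas as pd

x = pd.DataFrame(dict(A=[1, 1, 3], B=[None, 5, 6], C=[1, 2, 3]))
# first(): first non-null value of each column within each group
print(x.groupby("A", as_index=False).first())
# nth(0): first row of each group, NaN included
print(x.groupby("A", as_index=False).nth(0))
# head(1): first row of each group, NaN included
print(x.groupby("A", as_index=False).head(1))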
[...]
   A    B  C
0  1  5.0  1
1  3  6.0  3
   A    B  C
0  1  NaN  1
2  3  6.0  3
   A    B  C
0  1  NaN  1
2  3  6.0  3
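
To make the column-by-column behavior in the output above concrete, here is a rough sketch that reproduces the first() result by hand. The helper first_non_null is a hypothetical name introduced only for illustration, not part of pandas, and this is not a claim about how pandas implements first() internally.

```python
import pandas as pd

x = pd.DataFrame(dict(A=[1, 1, 3], B=[None, 5, 6], C=[1, 2, 3]))

# Built-in: for each column, the first non-null value within each group.
first = x.groupby("A").first()


def first_non_null(series):
    """Hypothetical helper: first non-null entry of a column, NaN if none."""
    non_null = series.dropna()
    return non_null.iloc[0] if len(non_null) else float("nan")


# Hand-rolled version: apply the helper to every column of every group.
manual = x.groupby("A").agg(first_non_null)

# By contrast, head(1) keeps the first row of each group as-is, NaN included.
first_row = x.groupby("A").head(1)

print(first)
print(manual)
print(first_row)
```

For group A=1, column B comes from the second row (5.0) while column C comes from the first row (1), so the row returned by first() does not correspond to any single row of the input.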
