Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
Currently, Series.nunique has a default parameter dropna=True. However, Series.unique does not accept a dropna parameter. This can cause unexpected behaviour, because s.nunique() is not necessarily equal to len(s.unique()).
See example below:
>>> import pandas as pd
>>> s = pd.Series([pd.NA, 1, pd.NA])
>>> s.unique()
array([<NA>, 1], dtype=object)
>>> len(s.unique())
2
>>> s.nunique()
1
I believe this discrepancy should be addressed to avoid implicit behaviour.
Feature Description
The simplest way to address it would be to change the default parameter of Series.nunique to dropna=False, and analogously the same default parameter for DataFrame.nunique.
This would be consistent with the current summary of the method:
Count number of distinct elements in specified axis.
Return Series with number of distinct elements. Can ignore NaN values.
The phrase "Can ignore NaN values." hints that dropping NaN values should be an optional behaviour, not one enabled by default.
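To illustrate what the proposed default would look like, here is a small sketch that passes dropna=False explicitly, which is already supported today. The DataFrame and its column names are made up for the example.

```python
import pandas as pd

s = pd.Series([pd.NA, 1, pd.NA])

# With dropna=False the missing value is counted as a distinct element,
# so the count matches the length of s.unique().
n_with_na = s.nunique(dropna=False)
n_unique = len(s.unique())
print(n_with_na == n_unique)

# The same keyword exists on DataFrame.nunique (illustrative data).
df = pd.DataFrame({"a": [pd.NA, 1, pd.NA], "b": [1, 2, 2]})
per_column = df.nunique(dropna=False)
print(per_column)
```

Under the proposal, the plain calls s.nunique() and df.nunique() would behave like the explicit dropna=False calls above.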
Alternative Solutions
Another approach to force consistent NaN handling by default would be to adapt Series.unique to accept dropna and set it to True by default.
Although possible, this is a more laborious and more impactful change to the pandas API.
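As a rough sketch of how this alternative might behave, the helper below emulates a dropna keyword on top of the existing API. The function unique_with_dropna is hypothetical and not part of pandas; it only shows the intended semantics, not how it would be implemented internally.

```python
import pandas as pd


def unique_with_dropna(s: pd.Series, dropna: bool = True):
    """Hypothetical helper (not a pandas API): unique values, optionally without NA."""
    # Drop missing values first when requested, then defer to Series.unique.
    return s.dropna().unique() if dropna else s.unique()


s = pd.Series([pd.NA, 1, pd.NA])

# With dropna=True the length agrees with the current s.nunique() default.
print(len(unique_with_dropna(s)) == s.nunique())

# With dropna=False the result is the same as today's s.unique().
print(len(unique_with_dropna(s, dropna=False)) == len(s.unique()))
```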
Additional Context
No response
EDIT: Typos