Commit ef61b25

GH-77265: Document NaN handling in statistics functions that sort or count (#94676)
* Document NaN handling in functions that sort or count
* Update Doc/library/statistics.rst
  Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>
* Update Doc/library/statistics.rst
  Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>
* Fix trailing whitespace and rewrap text
  Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>
1 parent 264b3dd commit ef61b25
1 file changed: 29 additions, 0 deletions
‎Doc/library/statistics.rst

29 additions & 0 deletions
@@ -35,6 +35,35 @@ and implementation-dependent. If your input data consists of mixed types,
 you may be able to use :func:`map` to ensure a consistent result, for
 example: ``map(float, input_data)``.
 
+Some datasets use ``NaN`` (not a number) values to represent missing data.
+Since NaNs have unusual comparison semantics, they cause surprising or
+undefined behaviors in the statistics functions that sort data or that count
+occurrences.  The functions affected are ``median()``, ``median_low()``,
+``median_high()``, ``median_grouped()``, ``mode()``, ``multimode()``, and
+``quantiles()``.  The ``NaN`` values should be stripped before calling these
+functions::
+
+    >>> from statistics import median
+    >>> from math import isnan
+    >>> from itertools import filterfalse
+
+    >>> data = [20.7, float('NaN'),19.2, 18.3, float('NaN'), 14.4]
+    >>> sorted(data)  # This has surprising behavior
+    [20.7, nan, 14.4, 18.3, 19.2, nan]
+    >>> median(data)  # This result is unexpected
+    16.35
+
+    >>> sum(map(isnan, data))    # Number of missing values
+    2
+    >>> clean = list(filterfalse(isnan, data))  # Strip NaN values
+    >>> clean
+    [20.7, 19.2, 18.3, 14.4]
+    >>> sorted(clean)  # Sorting now works as expected
+    [14.4, 18.3, 19.2, 20.7]
+    >>> median(clean)       # This result is now well defined
+    18.75
+
+
 Averages and measures of central location
 -----------------------------------------
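
The documented pattern can be wrapped in a small reusable helper. The sketch below is illustrative only and is not part of this commit: `strip_nans` is a hypothetical name, and the second sample list is made up to show that the same cleanup applies to a counting function such as `multimode()`.

```python
from itertools import filterfalse
from math import isnan
from statistics import median, multimode


def strip_nans(values):
    """Return a new list with every NaN removed (hypothetical helper)."""
    return list(filterfalse(isnan, values))


# NaN never compares equal to anything, including itself, which is why
# sorting and counting misbehave when NaNs are left in the data.
nan = float('NaN')
nan_equals_itself = (nan == nan)

# Same sample data as in the documentation change above.
data = [20.7, float('NaN'), 19.2, 18.3, float('NaN'), 14.4]
clean = strip_nans(data)

missing = len(data) - len(clean)   # number of NaN values dropped
middle = median(clean)             # median of the cleaned data

# Made-up sample for a function that counts occurrences.
modes = multimode(strip_nans([1.0, float('NaN'), 1.0, 2.0]))
```

Returning a new list rather than filtering in place leaves the original data untouched, so the number of missing values can still be reported alongside the statistic.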
