There isn't necessarily anything to fix here, but I thought it would be useful to open this for documentation, at least.
`_weighted_percentile` added support for NaN in #29034 and support for array APIs in #29431.
Our implementation relies on `sort` putting NaN values at the end:
scikit-learn/sklearn/utils/stats.py, lines 70 to 74 at commit 8cfc72b
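To make the dependency concrete, here is a minimal sketch of a 1-D weighted percentile that relies on the same property: after `np.argsort`, any NaN values form a trailing block, so the valid entries are a prefix that can be sliced off before computing cumulative weights. This is not scikit-learn's `_weighted_percentile`. The name `weighted_percentile_sketch` is hypothetical, and the sketch assumes a simple "lower" definition: the smallest value whose cumulative weight reaches `percentile / 100` of the total non-NaN weight.

```python
import numpy as np


def weighted_percentile_sketch(values, weights, percentile=50.0) -> float:
    """Hypothetical 1-D weighted percentile that ignores NaN values.

    Assumes np.argsort places NaN at the end of the sorted order
    (numpy documents this behaviour).
    """
    values = np.asarray(values, dtype=np.float64)
    weights = np.asarray(weights, dtype=np.float64)

    order = np.argsort(values)  # NaNs end up last
    sorted_values = values[order]
    sorted_weights = weights[order]

    # Because NaNs are at the end, the valid entries form a prefix.
    n_valid = int(np.count_nonzero(~np.isnan(sorted_values)))
    if n_valid == 0:
        return np.nan
    sorted_values = sorted_values[:n_valid]
    cdf = np.cumsum(sorted_weights[:n_valid])

    # First index where the cumulative weight reaches the target.
    target = percentile / 100.0 * cdf[-1]
    idx = int(np.searchsorted(cdf, target, side="left"))
    idx = min(idx, n_valid - 1)  # guard against floating-point overshoot
    return float(sorted_values[idx])


# The NaN carries weight 10 but is dropped before the cumulative sum.
print(weighted_percentile_sketch([3.0, float("nan"), 1.0, 2.0], [1.0, 10.0, 1.0, 5.0], 50))  # 2.0
```

If a backend sorted NaN to the front (or interleaved them), the prefix slice would silently include NaN values and drop valid ones. This is why the sort order matters.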
AFAICT (confirmed by @ev-br), the array API specification does not say how `sort` should handle NaN, so the behaviour is left to each individual library.
- torch seems to follow numpy and sort NaN to the end (tested manually with `float('nan')` and `torch.nan`), but this is not mentioned in the docs. There is some discussion of ordering NaN as the largest value in pytorch/pytorch#46544 (comment), "PyTorch NaN behavior and API design", and a related issue about negative NaN in pytorch/pytorch#116567, "[MPS] sort incorrectly handles 'negative' NaNs".
- CuPy seems to follow numpy behaviour as well (relevant issue: cupy/cupy#3324, "different result between cupy.sort and numpy.sort with NaN"), and they seem to have tests checking that their NaN sorting matches numpy.
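Since the ordering is library-specific, one option is a quick probe that checks whether a given namespace sorts NaN last. The sketch below assumes `xp` is an array-API-compatible namespace providing `asarray`, `sort`, `isnan`, `any` and `all`. numpy works directly. For torch, the plain `torch.sort` returns a `(values, indices)` pair, so one would presumably pass a wrapped namespace such as `array_api_compat.torch` instead. The helper name `sorts_nan_last` is hypothetical, and the probe only covers ordinary (positive) NaN, not the "negative" NaN case above.

```python
import numpy as np


def sorts_nan_last(xp) -> bool:
    """Hypothetical probe: True if xp.sort places NaN after every non-NaN value.

    `xp` is assumed to be an array-API-compatible namespace.
    """
    # Two NaNs mixed in with three finite values, deliberately out of order.
    x = xp.asarray([float("nan"), 1.0, float("nan"), 0.5, 3.0], dtype=xp.float64)
    s = xp.sort(x)
    n_finite = 3
    finite_prefix_ok = not bool(xp.any(xp.isnan(s[:n_finite])))
    nan_suffix_ok = bool(xp.all(xp.isnan(s[n_finite:])))
    return finite_prefix_ok and nan_suffix_ok


print(sorts_nan_last(np))  # True (numpy documents NaN-last sorting)
```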
Since everything currently works, I don't think we need to do anything here (especially as we ultimately want to stop maintaining our own quantile function), but I thought it would be useful to document.