Sampling uncertainty on precision-recall and ROC curves #25856

Open
@stephanecollot

Description

Describe the workflow you want to enable

We would like to add the possibility to plot sampling uncertainty on precision-recall and ROC curves.

Describe your proposed solution

We (@mbaak, @RUrlus, @ilanfri and I) published a paper at AISTATS 2023, Pointwise sampling uncertainties on the Precision-Recall curve, in which we compare multiple methods to compute and plot these uncertainties.

We found that an effective way to compute them is to use profile likelihoods based on Wilks’ theorem.
The method consists of the following steps (a code sketch of steps 4 and 5 follows the list):

  1. Compute the curve.
  2. Compute the confusion matrix at each point of the curve.
  3. For each observed point of the curve, build a surrounding uncertainty grid rectangle of 6 sigmas (i.e. more than the desired number), based on a first-order approximation of the covariance matrix under a bivariate normal assumption.
  4. For each of these hypothesis points in the grid, compute the test statistic against the observed point, the profile log-likelihood ratio (using the fact that the confusion matrix follows a multinomial distribution).
  5. Plot the 3-sigma contour (i.e. isoline) for the observed points (using Wilks’ theorem, which states that the profile log-likelihood ratio asymptotically follows a chi2 distribution).
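For concreteness, here is a simplified sketch of steps 4 and 5 for a single hypothesis point, not the implementation linked below: it assumes a multinomial confusion matrix (tp, fp, fn, tn), profiles the likelihood over one nuisance parameter numerically with SciPy, and converts the 3-sigma level into a chi2 quantile with 2 degrees of freedom. The grid construction of step 3 is left out.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2, norm


def profile_lr_stat(conf_mat, precision_h, recall_h):
    """-2 * profile log-likelihood ratio of a hypothesised (precision, recall)
    point against an observed confusion matrix (tp, fp, fn, tn)."""
    x = np.asarray(conf_mat, dtype=float)          # observed counts (tp, fp, fn, tn)
    n = x.sum()
    p_hat = x / n                                  # unconstrained multinomial MLE

    def loglik(p):
        return np.sum(x * np.log(np.clip(p, 1e-300, None)))

    # Under the hypothesis, all four cell probabilities are fixed by one nuisance
    # parameter t = P(TP): p_fp = t(1-prec)/prec, p_fn = t(1-rec)/rec, p_tn = rest.
    a = (1.0 - precision_h) / precision_h
    b = (1.0 - recall_h) / recall_h

    def neg_profile(t):
        p = np.array([t, t * a, t * b, 1.0 - t * (1.0 + a + b)])
        return np.inf if p[-1] < 0 else -loglik(p)

    t_max = 1.0 / (1.0 + a + b)
    res = minimize_scalar(neg_profile, bounds=(1e-9, t_max - 1e-9), method="bounded")
    return 2.0 * (loglik(p_hat) + res.fun)         # res.fun = -max log-likelihood under H0


# Wilks: with 2 parameters of interest the statistic is asymptotically chi2(2);
# the 3-sigma contour is the chi2(2) quantile at the 1D 3-sigma coverage (~0.9973).
threshold = chi2.ppf(2 * norm.cdf(3) - 1, df=2)    # ~11.8
stat = profile_lr_stat(conf_mat=(80, 10, 15, 895), precision_h=0.85, recall_h=0.80)
print(stat, stat <= threshold)                     # is this grid point inside the band?
```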

We have a minimal pure Python implementation:
https://github.com/RUrlus/ModelMetricUncertaintyResearch/blob/sklearn_pull_request/notebooks/pr_ellipse_validation/demo_ROC_PR_curves_sklearn_pull_request.ipynb

And a C++ implementation: the paper is supported by our package ModelMetricUncertainty, which has a C++ core with optional OpenMP support and Pybind11 bindings. Note that this package contains much more functionality than the above notebook. The core is binding-agnostic, allowing a switch to Cython if needed. The upside is that it is multiple orders of magnitude faster than the above Python implementation, at the cost of added complexity.

The pure Python implementation would look like this:
[Figure: example curve with uncertainty band produced by the pure Python implementation]

I’m also suggesting other visual improvements (illustrated in the matplotlib sketch after this list):

  1. Add x and y axis limits of [0, 1]; in sklearn the axes currently start at about -0.1.
  2. Modify the plotting frame: either remove the top and right spines so the curve stays visible when values are close to 1, or draw the frame with a dotted line.
  3. Fix the aspect ratio to square, since the two axes use the same scale.
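A minimal matplotlib sketch of these three cosmetic changes, applied to an existing Axes (for example the one a Display object draws on):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# ... plot the curve on `ax`, e.g. via RocCurveDisplay.from_estimator(clf, X, y, ax=ax)

# 1. Clamp both axes to [0, 1] instead of the default margins.
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)

# 2. Either drop the top/right spines so a curve near 1 stays visible ...
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
# ... or keep them and draw them dotted instead:
# ax.spines["top"].set_linestyle(":")
# ax.spines["right"].set_linestyle(":")

# 3. Square aspect ratio, since both axes share the same [0, 1] scale.
ax.set_aspect("equal", adjustable="box")

plt.show()
```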

With those it can look like this:
[Figures: two example curves with the uncertainty band and the proposed visual changes applied]

Remark: I set the contour color to lightblue; let me know if that is fine.

We need to align on the API integration. I suggest adding parameters to PrecisionRecallDisplay and RocCurveDisplay (see the usage sketch after this list), such as:

  • uncertainty=True to enable plotting the uncertainty band (or plot_uncertainty_style= ?)
  • uncertainty_n_std=3 to set how many +/- standard deviations the band should cover
  • uncertainty_n_bins=100 to set how fine-grained the band should be (see the remark about running time)
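For concreteness, a hypothetical usage sketch; the three uncertainty_* keyword arguments are only the proposal above and do not exist in scikit-learn today, everything else is the current API:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import PrecisionRecallDisplay, RocCurveDisplay
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

PrecisionRecallDisplay.from_estimator(
    clf, X_test, y_test,
    uncertainty=True,        # proposed: draw the sampling-uncertainty band
    uncertainty_n_std=3,     # proposed: band width in +/- standard deviations
    uncertainty_n_bins=100,  # proposed: grid resolution per curve point
)

RocCurveDisplay.from_estimator(clf, X_test, y_test, uncertainty=True)
```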

Describe alternatives you've considered, if relevant

Other ways to compute uncertainties are evaluated in our paper.

We have noticed that there is an open pull request on a related topic: #21211.
That is great; however, cross-validation covers different sources of uncertainty and has some limitations: a bias is introduced by the overlapping training folds, which correlates the trained models, and this uncertainty depends on the size of a fold and is likely larger than the uncertainty on the test set (see ref.).

Additional context

Running time discussion

Here is an analysis of the running time of this pure Python method:

[Figure: running-time measurements of the pure Python method]

The execution time depends on the number of points (i.e. thresholds) plotted and on uncertainty_n_bins.
With a surrounding grid of uncertainty_n_bins=100 per point, it is both fast enough and fine-grained enough.
There is barely any noticeable visual difference between 50 and 100 (or more) points, at least in this example; see the curves above.
For, say, a 100k set it is too slow for ROC, because there are many more thresholds, but this is going to be addressed soon in #24668. In any case, the uncertainties are then really small, so plotting them doesn’t really make sense.
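As a rough illustration of that scaling, a back-of-envelope cost model; it assumes uncertainty_n_bins counts bins per axis (an n_bins x n_bins grid per curve point), which is an assumption for illustration, not a measurement:

```python
# Assumed cost model: one profile-likelihood evaluation per grid cell,
# with an n_bins x n_bins grid around every plotted curve point.
def n_profile_evaluations(n_curve_points, uncertainty_n_bins=100):
    return n_curve_points * uncertainty_n_bins ** 2

print(n_profile_evaluations(50))        # 500,000 evaluations for 50 plotted points
print(n_profile_evaluations(100_000))   # 1e9 evaluations: too slow in pure Python
```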
