### Describe the workflow you want to enable
We would like to add the possibility to plot sampling uncertainty on precision-recall and ROC curves.
### Describe your proposed solution
We (@mbaak, @RUrlus, @ilanfri and I) published a paper at AISTATS 2023 called *Pointwise sampling uncertainties on the Precision-Recall curve*, where we compared multiple methods to compute and plot them.
We found that an effective way to compute them is to use profile likelihoods based on Wilks’ theorem.
It consists of the following steps:
- Get the curve
- Get the confusion matrix for each point of the curve
- For each observed point of the curve, estimate a surrounding uncertainty grid rectangle spanning 6 sigmas (i.e. more than the desired coverage), based on a first-order approximation of the covariance matrix under a bivariate normal assumption
- For each hypothesis point in the grid, compute the test statistic against the observed point, called the profile log-likelihood ratio (using the fact that the confusion matrix follows a multinomial distribution)
- Plot the 3-sigma contour (i.e. isoline) for the observed points, using Wilks’ theorem, which states that the profile log-likelihood ratio asymptotically follows a chi2 distribution; see the formulas below
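In formulas, this is roughly the standard profile-likelihood construction (my paraphrase; the exact notation and the two-degrees-of-freedom choice are my reading, not quoted from the paper):

```latex
% Multinomial log-likelihood of the observed confusion matrix
% n = (n_TP, n_FP, n_FN, n_TN) with cell probabilities p:
\ell(p) = \sum_i n_i \log p_i

% Profile log-likelihood ratio at a hypothesis point (x, y) in the
% (recall, precision) plane, maximizing out the remaining free parameter:
\lambda(x, y) = 2 \Big[ \max_{p} \ell(p) \;-\; \max_{p \,:\, \mathrm{curve}(p) = (x, y)} \ell(p) \Big]

% By Wilks' theorem, \lambda is asymptotically \chi^2_k distributed
% (k = 2 parameters of interest here), so the n_sigma contour is the
% isoline at the matching quantile:
\lambda(x, y) = F^{-1}_{\chi^2_2}\!\big( 2\,\Phi(n_\sigma) - 1 \big)
```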
We have a minimal pure Python implementation:
https://github.com/RUrlus/ModelMetricUncertaintyResearch/blob/sklearn_pull_request/notebooks/pr_ellipse_validation/demo_ROC_PR_curves_sklearn_pull_request.ipynb
And a C++ implementation: the paper is supported by our package ModelMetricUncertainty, which has a C++ core with optional OpenMP support and Pybind11 bindings. Note that this package contains much more functionality than the above notebook. The core is binding-agnostic, allowing a switch to Cython if needed. The upside is that it is multiple orders of magnitude faster than the Python implementation, at the cost of complexity.
A minimal pure Python sketch of the core computation for one curve point could look like this (helper names and the grid construction are illustrative simplifications of the linked notebook, not the exact implementation):
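```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import chi2, norm


def multinomial_loglik(counts, probs):
    """Multinomial log-likelihood (up to a constant) of the confusion-matrix counts."""
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return float(np.sum(np.where(counts > 0, counts * np.log(probs), 0.0)))


def profile_loglik(counts, precision, recall):
    """Profile log-likelihood: maximize over the single remaining free
    parameter (the TP cell probability) with (precision, recall) held fixed."""

    def neg_loglik(p_tp):
        p_fp = p_tp * (1.0 - precision) / precision
        p_fn = p_tp * (1.0 - recall) / recall
        p_tn = 1.0 - p_tp - p_fp - p_fn
        return -multinomial_loglik(counts, [p_tp, p_fp, p_fn, p_tn])

    # p_tp is bounded so that all four cell probabilities stay positive.
    upper = 1.0 / (1.0 + (1.0 - precision) / precision + (1.0 - recall) / recall)
    res = minimize_scalar(neg_loglik, bounds=(1e-12, upper - 1e-12), method="bounded")
    return -res.fun


def profile_lr_grid(counts, n_std=3, n_bins=100):
    """Profile log-likelihood ratio lambda on a (recall, precision) grid around
    one observed curve point, plus the chi2 threshold for the n_std contour."""
    tp, fp, fn, tn = counts
    n = float(tp + fp + fn + tn)
    prec_hat, rec_hat = tp / (tp + fp), tp / (tp + fn)

    # Global maximum of the multinomial log-likelihood: the observed fractions.
    loglik_hat = multinomial_loglik(counts, np.asarray(counts, dtype=float) / n)

    # Crude per-axis (binomial) standard errors to size the scan rectangle;
    # the paper uses a first-order covariance matrix, this is a simplification.
    prec_sd = np.sqrt(prec_hat * (1.0 - prec_hat) / (tp + fp))
    rec_sd = np.sqrt(rec_hat * (1.0 - rec_hat) / (tp + fn))
    half_width = 2 * n_std  # scan 6 sigmas out for a 3-sigma contour
    precisions = np.clip(
        np.linspace(prec_hat - half_width * prec_sd, prec_hat + half_width * prec_sd, n_bins),
        1e-6, 1.0 - 1e-6)
    recalls = np.clip(
        np.linspace(rec_hat - half_width * rec_sd, rec_hat + half_width * rec_sd, n_bins),
        1e-6, 1.0 - 1e-6)

    lam = np.empty((n_bins, n_bins))
    for i, p in enumerate(precisions):
        for j, r in enumerate(recalls):
            lam[i, j] = 2.0 * (loglik_hat - profile_loglik(counts, p, r))

    # Contour level: chi2 quantile (2 parameters of interest, per Wilks)
    # at the coverage probability of +/- n_std of a 1-D normal.
    level = chi2.ppf(2.0 * norm.cdf(n_std) - 1.0, df=2)
    return recalls, precisions, lam, level
```

Plotting is then one call per curve point, e.g. `plt.contour(recalls, precisions, lam, levels=[level], colors="lightblue")`; the linked notebook does this for all points of the PR and ROC curves.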
I’m also suggesting other visual improvements:
- Add x and y axis limits of [0, 1]; in sklearn, axes currently start at ~-0.1
- Modify the plotting frame: either remove the top and right lines to make the curve easier to see when values are close to 1, or draw the frame with a dotted line
- Fix the aspect ratio to square, since the two axes share the same scale.
With those changes it can look like the example figure attached to the issue.
Remark: I set the contour color to lightblue; let me know if that is fine.
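For reference, these cosmetic tweaks map onto standard matplotlib calls (a sketch; `ax` stands for the display's axes object):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# ... draw the curve and its uncertainty contours on ax ...

ax.set_xlim(0.0, 1.0)            # axes limited to [0, 1]
ax.set_ylim(0.0, 1.0)
ax.set_aspect("equal")           # square aspect ratio: both axes share one scale
for side in ("top", "right"):    # dotted frame so curves near 1 stay visible
    ax.spines[side].set_linestyle(":")
    # alternative: ax.spines[side].set_visible(False)
plt.show()
```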
We need to align on the API integration. I suggest adding some parameters to `PrecisionRecallDisplay` and `RocCurveDisplay`:

- `uncertainty=True` to enable plotting the uncertainty band (or `plot_uncertainty_style=`?)
- `uncertainty_n_std=3` to decide how many +/- standard deviations the band should cover
- `uncertainty_n_bins=100` to decide how fine-grained the band should be (see the remark about running time, and the usage sketch below)
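To make this concrete, the proposed parameters might be used like this (the parameters are hypothetical until merged; the displays and `from_estimator` already exist in scikit-learn):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import PrecisionRecallDisplay
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5_000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)

# Proposed parameters (not part of scikit-learn today):
PrecisionRecallDisplay.from_estimator(
    clf, X_test, y_test,
    uncertainty=True,        # draw the profile-likelihood uncertainty bands
    uncertainty_n_std=3,     # 3-sigma contours
    uncertainty_n_bins=100,  # grid resolution per curve point
)
```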
### Describe alternatives you've considered, if relevant
Other ways to compute uncertainties are evaluated in our paper.
We have noticed that there is an open pull request on a related topic: #21211
That is great; however, cross-validation covers different sources of uncertainty and has some limitations: a bias is introduced by overlapping training folds, which correlates the trained models. In addition, this uncertainty depends on the size of a fold and is likely larger than on the test set (see ref.).
### Additional context
**Running time discussion**
Here is an analysis of the running time of this pure Python method:
The execution time depends on the number of points (i.e. thresholds) plotted and on `uncertainty_n_bins`. With a surrounding grid of `uncertainty_n_bins=100` per point, it is both fast enough and fine-grained enough: there is barely any noticeable visual difference between 50 and 100 (or more) points (at least in this example), see the curves.
For, say, a 100k test set it is too slow for ROC, because there are many more thresholds, but this is going to be fixed soon in #24668. In any case, at that size the uncertainties are really small, so plotting them doesn't really make sense.
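A quick way to gauge the cost on your own hardware (using the hypothetical `profile_lr_grid` helper sketched earlier; total time scales with the number of plotted thresholds):

```python
import time

t0 = time.perf_counter()
profile_lr_grid((800, 120, 90, 3_990), n_std=3, n_bins=100)  # one curve point
elapsed = time.perf_counter() - t0
print(f"one point: {elapsed:.2f}s -> ~{100 * elapsed:.0f}s for 100 thresholds")
```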