Commit 41fd298

DOC note on calibration impact on ranking (#25900)

1 parent 9679dcf
2 files changed, +57 −38 lines
doc/developers/contributing.rst (+1 −1)

@@ -714,7 +714,7 @@ Building the documentation requires installing some additional packages:
 
     pip install sphinx sphinx-gallery numpydoc matplotlib Pillow pandas \
         scikit-image packaging seaborn sphinx-prompt \
-        sphinxext-opengraph plotly
+        sphinxext-opengraph plotly pooch
 
 To build the documentation, you need to be in the ``doc`` folder:
doc/modules/calibration.rst (+56 −37)
@@ -20,10 +20,24 @@ prediction.
 Well calibrated classifiers are probabilistic classifiers for which the output
 of the :term:`predict_proba` method can be directly interpreted as a confidence
 level.
-For instance, a well calibrated (binary) classifier should classify the samples
-such that among the samples to which it gave a :term:`predict_proba` value
-close to 0.8,
-approximately 80% actually belong to the positive class.
+For instance, a well calibrated (binary) classifier should classify the samples such
+that among the samples to which it gave a :term:`predict_proba` value close to, say,
+0.8, approximately 80% actually belong to the positive class.
+
+Before we show how to re-calibrate a classifier, we first need a way to detect how
+well a classifier is calibrated.
+
+.. note::
+    Strictly proper scoring rules for probabilistic predictions like
+    :func:`sklearn.metrics.brier_score_loss` and
+    :func:`sklearn.metrics.log_loss` assess calibration (reliability) and
+    discriminative power (resolution) of a model, as well as the randomness of the
+    data (uncertainty), all at the same time. This follows from the well-known Brier
+    score decomposition of Murphy [1]_. As it is not clear which term dominates, the
+    score is of limited use for assessing calibration alone (unless one computes
+    each term of the decomposition). A lower Brier loss, for instance, does not
+    necessarily mean a better calibrated model; it could also mean a worse
+    calibrated model with much more discriminatory power, e.g. one using many more
+    features.
 
 .. _calibration_curve:
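
A minimal sketch of the point in the note above, computing both scoring rules on
held-out predictions; the synthetic dataset and the choice of model are placeholders::

    # Both proper scoring rules mix calibration (reliability), discrimination
    # (resolution) and data randomness (uncertainty) into a single number.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import brier_score_loss, log_loss
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    clf = LogisticRegression().fit(X_train, y_train)
    proba = clf.predict_proba(X_test)[:, 1]

    # Neither number on its own says whether the model is well calibrated.
    print(brier_score_loss(y_test, proba))
    print(log_loss(y_test, proba))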

@@ -33,7 +47,7 @@ Calibration curves
 Calibration curves, also referred to as *reliability diagrams* (Wilks 1995 [2]_),
 compare how well the probabilistic predictions of a binary classifier are calibrated.
 It plots the frequency of the positive label (to be more precise, an estimation of the
-*conditional event probability* :math:`P(Y=1|\text{predict\_proba})`) on the y-axis
+*conditional event probability* :math:`P(Y=1|\text{predict_proba})`) on the y-axis
 against the predicted probability :term:`predict_proba` of a model on the x-axis.
 The tricky part is to get values for the y-axis.
 In scikit-learn, this is accomplished by binning the predictions such that the x-axis
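
The binning described here is what :func:`sklearn.calibration.calibration_curve`
computes; a short sketch, reusing ``y_test`` and ``proba`` from the snippet above::

    from sklearn.calibration import calibration_curve

    # prob_true: per-bin frequency of the positive label (y-axis);
    # prob_pred: per-bin mean predicted probability (x-axis).
    prob_true, prob_pred = calibration_curve(y_test, proba, n_bins=10)
    # A perfectly calibrated model would have prob_true ~= prob_pred in each bin.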
@@ -62,7 +76,7 @@ by showing the number of samples in each predicted probability bin.
 
 :class:`LogisticRegression` returns well calibrated predictions by default as it has a
 canonical link function for its loss, i.e. the logit-link for the :ref:`log_loss`.
-This leads to the so-called **balance property**, see [7]_ and
+This leads to the so-called **balance property**, see [8]_ and
 :ref:`Logistic_regression`.
 In contrast to that, the other shown models return biased probabilities, with
 different biases per model.
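
The balance property can be checked numerically; a sketch, assuming an unpenalized
fit with an intercept (the property holds only approximately under the default L2
penalty)::

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=1000, random_state=0)
    # penalty=None requires scikit-learn >= 1.2 (penalty="none" before that).
    clf = LogisticRegression(penalty=None).fit(X, y)

    # Balance property: the mean predicted probability equals the observed
    # frequency of the positive class on the training data.
    print(np.mean(clf.predict_proba(X)[:, 1]))
    print(np.mean(y))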
@@ -79,7 +93,7 @@ case in this dataset which contains 2 redundant features.
 :class:`RandomForestClassifier` shows the opposite behavior: the histograms
 show peaks at probabilities approximately 0.2 and 0.9, while probabilities
 close to 0 or 1 are very rare. An explanation for this is given by
-Niculescu-Mizil and Caruana [1]_: "Methods such as bagging and random
+Niculescu-Mizil and Caruana [3]_: "Methods such as bagging and random
 forests that average predictions from a base set of models can have
 difficulty making predictions near 0 and 1 because variance in the
 underlying base models will bias predictions that should be near zero or one
@@ -99,7 +113,7 @@ to 0 or 1 typically.
 .. currentmodule:: sklearn.svm
 
 :class:`LinearSVC` (SVC) shows an even more sigmoid curve than the random forest, which
-is typical for maximum-margin methods (compare Niculescu-Mizil and Caruana [1]_), which
+is typical for maximum-margin methods (compare Niculescu-Mizil and Caruana [3]_), which
 focus on difficult to classify samples that are close to the decision boundary (the
 support vectors).
 
@@ -167,29 +181,18 @@ fit the regressor. It is up to the user to
 make sure that the data used for fitting the classifier is disjoint from the
 data used for fitting the regressor.
 
-:func:`sklearn.metrics.brier_score_loss` may be used to assess how
-well a classifier is calibrated. However, this metric should be used with care
-because a lower Brier score does not always mean a better calibrated model.
-This is because the Brier score metric is a combination of calibration loss
-and refinement loss. Calibration loss is defined as the mean squared deviation
-from empirical probabilities derived from the slope of ROC segments.
-Refinement loss can be defined as the expected optimal loss as measured by the
-area under the optimal cost curve. As refinement loss can change
-independently from calibration loss, a lower Brier score does not necessarily
-mean a better calibrated model.
-
-:class:`CalibratedClassifierCV` supports the use of two 'calibration'
-regressors: 'sigmoid' and 'isotonic'.
+:class:`CalibratedClassifierCV` supports the use of two regression techniques
+for calibration via the `method` parameter: `"sigmoid"` and `"isotonic"`.
 
 .. _sigmoid_regressor:
 
 Sigmoid
 ^^^^^^^
 
-The sigmoid regressor is based on Platt's logistic model [3]_:
+The sigmoid regressor, `method="sigmoid"`, is based on Platt's logistic model [4]_:
 
 .. math::
-    p(y_i = 1 | f_i) = \frac{1}{1 + \exp(A f_i + B)}
+    p(y_i = 1 | f_i) = \frac{1}{1 + \exp(A f_i + B)} \,,
 
 where :math:`y_i` is the true label of sample :math:`i` and :math:`f_i`
 is the output of the un-calibrated classifier for sample :math:`i`. :math:`A`
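
A minimal sketch of fitting this sigmoid (Platt) calibrator, with a placeholder
dataset and base estimator::

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=1000, random_state=0)

    # A and B of the sigmoid are fit internally on held-out folds;
    # LinearSVC itself exposes no predict_proba, the calibrator adds one.
    calibrated = CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=5)
    calibrated.fit(X, y)
    proba = calibrated.predict_proba(X)[:, 1]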
@@ -200,10 +203,10 @@ The sigmoid method assumes the :ref:`calibration curve <calibration_curve>`
 can be corrected by applying a sigmoid function to the raw predictions. This
 assumption has been empirically justified in the case of :ref:`svm` with
 common kernel functions on various benchmark datasets in section 2.1 of Platt
-1999 [3]_ but does not necessarily hold in general. Additionally, the
+1999 [4]_ but does not necessarily hold in general. Additionally, the
 logistic model works best if the calibration error is symmetrical, meaning
 the classifier output for each binary class is normally distributed with
-the same variance [6]_. This can be a problem for highly imbalanced
+the same variance [7]_. This can be a problem for highly imbalanced
 classification problems, where outputs do not have equal variance.
 
 In general this method is most effective for small sample sizes or when the
@@ -213,7 +216,7 @@ high and low outputs.
 Isotonic
 ^^^^^^^^
 
-The 'isotonic' method fits a non-parametric isotonic regressor, which outputs
+The isotonic method (`method="isotonic"`) fits a non-parametric isotonic regressor, which outputs
 a step-wise non-decreasing function, see :mod:`sklearn.isotonic`. It minimizes:
 
 .. math::
@@ -226,10 +229,20 @@ calibrated classifier for sample :math:`i` (i.e., the calibrated probability).
 This method is more general when compared to 'sigmoid' as the only restriction
 is that the mapping function is monotonically increasing. It is thus more
 powerful as it can correct any monotonic distortion of the un-calibrated model.
-However, it is more prone to overfitting, especially on small datasets [5]_.
+However, it is more prone to overfitting, especially on small datasets [6]_.
 
 Overall, 'isotonic' will perform as well as or better than 'sigmoid' when
-there is enough data (greater than ~ 1000 samples) to avoid overfitting [1]_.
+there is enough data (greater than ~ 1000 samples) to avoid overfitting [3]_.
+
+.. note:: Impact on ranking metrics like AUC
+
+    It is generally expected that calibration does not affect ranking metrics such
+    as ROC-AUC. However, these metrics might differ after calibration when using
+    `method="isotonic"` since isotonic regression introduces ties in the predicted
+    probabilities. This can be seen as within the uncertainty of the model
+    predictions. In case you strictly want to keep the ranking and thus the AUC
+    scores, use `method="sigmoid"`, which is a strictly monotonic transformation
+    and thus keeps the ranking.
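
The effect described in the note can be observed directly; a sketch with placeholder
data, comparing ROC-AUC before and after isotonic calibration::

    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    base = LogisticRegression().fit(X_train, y_train)
    iso = CalibratedClassifierCV(LogisticRegression(), method="isotonic", cv=5)
    iso.fit(X_train, y_train)

    # Isotonic regression outputs a step function, so distinct scores can be
    # mapped to the same probability (ties), which may shift ROC-AUC slightly.
    print(roc_auc_score(y_test, base.predict_proba(X_test)[:, 1]))
    print(roc_auc_score(y_test, iso.predict_proba(X_test)[:, 1]))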
 
 Multiclass support
 ^^^^^^^^^^^^^^^^^^
@@ -239,7 +252,7 @@ support 1-dimensional data (e.g., binary classification output) but are
 extended for multiclass classification if the `base_estimator` supports
 multiclass predictions. For multiclass predictions,
 :class:`CalibratedClassifierCV` calibrates for
-each class separately in a :ref:`ovr_classification` fashion [4]_. When
+each class separately in a :ref:`ovr_classification` fashion [5]_. When
 predicting
 probabilities, the calibrated probabilities for each class
 are predicted separately. As those probabilities do not necessarily sum to
@@ -254,36 +267,42 @@ one, a postprocessing is performed to normalize them.
 
 .. topic:: References:
 
-    .. [1] `Predicting Good Probabilities with Supervised Learning
-           <https://www.cs.cornell.edu/~alexn/papers/calibration.icml05.crc.rev3.pdf>`_,
-           A. Niculescu-Mizil & R. Caruana, ICML 2005
+    .. [1] Allan H. Murphy (1973).
+           :doi:`"A New Vector Partition of the Probability Score"
+           <10.1175/1520-0450(1973)012%3C0595:ANVPOT%3E2.0.CO;2>`
+           Journal of Applied Meteorology and Climatology
 
     .. [2] `On the combination of forecast probabilities for
            consecutive precipitation periods.
           <https://journals.ametsoc.org/waf/article/5/4/640/40179>`_
           Wea. Forecasting, 5, 640–650., Wilks, D. S., 1990a
 
-    .. [3] `Probabilistic Outputs for Support Vector Machines and Comparisons
+    .. [3] `Predicting Good Probabilities with Supervised Learning
+           <https://www.cs.cornell.edu/~alexn/papers/calibration.icml05.crc.rev3.pdf>`_,
+           A. Niculescu-Mizil & R. Caruana, ICML 2005
+
+    .. [4] `Probabilistic Outputs for Support Vector Machines and Comparisons
           to Regularized Likelihood Methods.
           <https://www.cs.colorado.edu/~mozer/Teaching/syllabi/6622/papers/Platt1999.pdf>`_
           J. Platt, (1999)
 
-    .. [4] `Transforming Classifier Scores into Accurate Multiclass
+    .. [5] `Transforming Classifier Scores into Accurate Multiclass
           Probability Estimates.
           <https://dl.acm.org/doi/pdf/10.1145/775047.775151>`_
           B. Zadrozny & C. Elkan, (KDD 2002)
 
-    .. [5] `Predicting accurate probabilities with a ranking loss.
+    .. [6] `Predicting accurate probabilities with a ranking loss.
           <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4180410/>`_
           Menon AK, Jiang XJ, Vembu S, Elkan C, Ohno-Machado L.
           Proc Int Conf Mach Learn. 2012;2012:703-710
 
-    .. [6] `Beyond sigmoids: How to obtain well-calibrated probabilities from
+    .. [7] `Beyond sigmoids: How to obtain well-calibrated probabilities from
           binary classifiers with beta calibration
           <https://projecteuclid.org/euclid.ejs/1513306867>`_
           Kull, M., Silva Filho, T. M., & Flach, P. (2017).
 
-    .. [7] Mario V. Wüthrich, Michael Merz (2023).
+    .. [8] Mario V. Wüthrich, Michael Merz (2023).
           :doi:`"Statistical Foundations of Actuarial Learning and its Applications"
           <10.1007/978-3-031-12409-9>`
           Springer Actuarial
