ENH add CAP curve #28972
Conversation
@JosephBARBIERDARNAL Could you please address all the comments from #28752?
Yes, I'm ready to do it, but I won't be able to get back to it quickly (2 to 3 months), unfortunately. Hope that's okay.
Working on this PR again. Here are the current updates:
Some (likely non-exhaustive) issues:
import matplotlib.pyplot as plt

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import CapCurveDisplay
from sklearn.model_selection import train_test_split

# Fit a binary classifier and compute continuous scores on a held-out test set.
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
y_scores = clf.decision_function(X_test)

# Plot the CAP curve with and without scale normalization, side by side.
fig, ax = plt.subplots(ncols=2, dpi=300, figsize=(12, 12))
display = CapCurveDisplay.from_predictions(
    ax=ax[0],
    y_true=y_test,
    y_pred=y_scores,
    name="normalize_scale=False",
    normalize_scale=False,
    plot_chance_level=True,
)
display = CapCurveDisplay.from_predictions(
    ax=ax[1],
    y_true=y_test,
    y_pred=y_scores,
    name="normalize_scale=True",
    normalize_scale=True,
    plot_chance_level=True,
)
plt.show()
I'll come back to this PR after the release. This will be one of my priorities to get merged for 1.7.
Ok cool. There are still a few things I need to do anyway before a review (coverage tests and adding the #30023 check).
sklearn/metrics/_plot/cap_curve.py
# compute cumulative sums for true positives and all cases
y_true_cumulative = np.cumsum(y_true_sorted * sample_weight_sorted)
cumulative_total = np.cumsum(sample_weight_sorted)
For information, there was a concurrent PR to fix the lack of cumsum of the sample_weight to define the x-axis of the Lorenz curves in one of our regression examples. To check that this was the correct fix, we ran a quick check on synthetic data with integer-valued sample weights to verify that there is an exact equivalence between repeating data points and reweighting them by exposure.

Maybe this PR could be expanded to test that this property also holds for CapCurveDisplay.from_predictions with non-constant, integer-valued sample_weight.

EDIT: I am not entirely sure how to write such a test, but possibly we could np.interp the CAP curve computed on the repeated data at the x-axis locations of the points of the weighted CAP curve and check that the two curves match up to a small eps at those locations.
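A minimal sketch of such a test, assuming integer-valued weights and that the display object exposes the computed curve as cumulative_total_ (x-axis) and y_true_cumulative_ (y-axis) attributes; these names are guesses based on the internal variables quoted above and would need to be adapted to the actual implementation:

import numpy as np
from numpy.testing import assert_allclose

from sklearn.metrics import CapCurveDisplay  # the class added in this PR

def test_cap_curve_integer_sample_weight_equivalence():
    rng = np.random.RandomState(0)
    y_true = rng.randint(0, 2, size=100)
    y_pred = rng.rand(100)
    sample_weight = rng.randint(1, 5, size=100)

    # Weighting by an integer exposure should be equivalent to repeating
    # each data point that many times.
    disp_weighted = CapCurveDisplay.from_predictions(
        y_true=y_true, y_pred=y_pred, sample_weight=sample_weight
    )
    disp_repeated = CapCurveDisplay.from_predictions(
        y_true=np.repeat(y_true, sample_weight),
        y_pred=np.repeat(y_pred, sample_weight),
    )

    # Interpolate the repeated-data curve at the x-axis locations of the
    # weighted curve and check that the two curves match up to a small eps.
    # NOTE: the curve attribute names below are assumptions for this sketch.
    y_interp = np.interp(
        disp_weighted.cumulative_total_,
        disp_repeated.cumulative_total_,
        disp_repeated.y_true_cumulative_,
    )
    assert_allclose(disp_weighted.y_true_cumulative_, y_interp, atol=1e-7)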
There are several review comments in #28752 that have not yet been addressed.
I would like to make sure that we do not forget about them when iterating on this PR.
@JosephBARBIERDARNAL to sort things out, please reply in the threads of the review of #28752 and make it explicit which comments have been addressed in #28972 and how, and then mark those threads as resolved.
Also, we need some tests and to update an existing example where we compare ROC, DET and CAP curves on the same classifier. I suppose this example is the best candidate:
https://scikit-learn.org/stable/auto_examples/model_selection/plot_det.html
@ogrisel That sounds good to me. I won't be able to work on it immediately, but I'll definitely be able to get to it within the next few weeks. I'll ping you and/or @glemaitre for review.
2 things are important for me:
I resolved most of the conversations there. I didn't touch some of them if I wasn't sure. Feel free to ping me if any changes are needed. I'm just not sure what
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Just a really quick review. I'll continue tomorrow.
if sample_weight is None:
    sample_weight = np.ones_like(y_true, dtype=np.float64)

sorted_indices = np.argsort(y_pred)
I could not see this comment in the GitHub frontend, but it seems important to address.
Thanks for the PR @JosephBARBIERDARNAL. This new feature is definitely very valuable. Here's a batch of comments regarding the modified example.
examples/model_selection/plot_det.py
I don't think this name for the example would be pertinent anymore. Maybe we can go with something like "Compare ROC, DET and CAP curves". Then we would have to rename the section a bit further below.
Sounds good to me! Let me know when this is approved by others.
I agree the title needs to be updated.
I am not so sure about using the infinitive form of a verb in a title, though. Maybe "Comparing ROC, DET and CAP curves". Or "Visual assessment of the performance of binary classifiers with ROC, DET and CAP curves".
I also think we should add the Precision-Recall curve to that example in a follow-up PR.
"Comparing ROC, DET and CAP curves" seems good to me. That what I've put, but no strong opinion.
ax=ax_roc,
name=name,
pos_label=pos_label,
plot_chance_level=is_last,
As we already have the DummyClassifier, this line is overlapping. I think back in the day we decided that we'd better avoid mentioning "chance level" if possible.
plot_chance_level=is_last,
see my comment
I would like to keep plot_chance_level=is_last for CAP and ROC.

It's interesting to see that the theoretical/expected "chance level" curve only approximately matches the empirical constant predictor curve for CAP, while the two match exactly for ROC (and DET, although we do not have the option for DET yet).
> we decided that we'd better avoid mentioning "chance level" if possible

Only in the text of headers and paragraphs: I prefer discussing the concept with more precise phrasing first, and then defining what the thing we name "chance level" in the API and in the plot legend actually is.
# The diagonal black-dotted lines named "chance level" in the plots above
# correspond to the expected value of a non-informative classifier on an
# infinite evaluation set.
If we go with avoiding the term "chance level", I would rather remove this whole paragraph and only mention, in the paragraph about the DummyClassifier below, that non-informative classifiers can be used as a baseline.
This paragraph was added intentionally. The two curves (expected curve vs finite sample curve) do not match exactly for the CAP case (contrary to ROC and DET), hence the decision to make this explicit in the discussion.
The important word above is "expected" (the empirical curve converges to the expected curve in the large sample limit).
Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
A few things that I think need to be addressed:
I think we should rename to:

class ROCCurveDisplay(...):
    ...

# Backward compat alias to keep code implemented for scikit-learn 1.7 and earlier working.
RocCurveDisplay = ROCCurveDisplay

We could introduce a deprecation warning, but I am worried that this would break educational resources (e.g. code in blog posts, tutorials, books) for little benefit, so I would rather go with a "soft" deprecation: rename and keep a long-term backward compat alias without warnings. But let's not do that as part of this PR and rather do the renaming/deprecation in a follow-up instead.
It's already part of the API, so we have to stick with it for the API. My earlier comment is about not putting it in section headers or documentation paragraphs without explanations: I prefer introducing the concept with more precise terms such as "the expected curve of a non-informative or constant classifier whose predictions do not depend on the input features".

Note that for ROC and DET, the expected curve of the non-informative classifier in the infinite sample limit (what we name "chance level" in the API) matches exactly the curve of the constant predictor obtained on a finite-size test set. For CAP it's more subtle: the "chance level" line only approximately matches the curve of the constant predictor obtained on a finite-size test set; the two would only match in the large sample size limit. I think this is an interesting point and I would like to highlight it in this section of the example.
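To make this point concrete, here is a hedged sketch (assuming the from_estimator API mirrors RocCurveDisplay.from_estimator; this is not necessarily the final API) that plots the empirical CAP curve of a constant predictor next to the "chance level" line:

import matplotlib.pyplot as plt

from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.metrics import CapCurveDisplay  # the class added in this PR

# A small test set on purpose: the finite-sample mismatch is easier to see.
X, y = make_classification(n_samples=50, random_state=0)
dummy = DummyClassifier(strategy="prior").fit(X, y)

# The empirical curve of the constant predictor should only approximately
# follow the "chance level" line; the two coincide in the large sample limit.
CapCurveDisplay.from_estimator(
    dummy, X, y, name="constant predictor", plot_chance_level=True
)
plt.show()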
I'm a bit short of time at the moment, but to sum up the TODO:
Feel free to add any other missing elements.
This PR is a duplicate of #28752. I'm recreating a PR because I accidentally deleted my fork with the old one.
Reference Issue
fix for #10003
What does this implement/fix?
Creation of a CumulativeAccuracyDisplay class for plots.
"The CAP of a model represents the cumulative number of positive outcomes along the y-axis versus the corresponding cumulative number of a classifying parameter along the x-axis. The output is called a CAP curve.[1] The CAP is distinct from the receiver operating characteristic (ROC) curve, which plots the true-positive rate against the false-positive rate." (wikipedia definition)
It's mainly inspired by the RocCurveDisplay class.
Other
It's currently a work in progress.
TODO

Binary classification
- raise a ValueError in from_estimator if the estimator is not fitted or is a classifier that was fitted with more than 3 classes;
- pos_label handling when setting the positive class;
- test both response_method="decision_function" and response_method="predict_proba" for a LogisticRegression classifier fit with string labels and for all 3 possible values of pos_label;
- test_display_from_estimator_and_from_prediction;
- check that y_true_cumulative and cumulative_total have the same dtype as y_pred in the test about from_predictions. We can test for y_pred passed either as np.float32 or np.float64;
- check that CAPCurveDisplay.from_estimator(LinearSVC().fit(X, y), ...) works (even if it does not have a predict_proba method). This should cover one of the lines reported as uncovered by codecov;
- update test_common_curve_display.py to reuse some generic tests on CAPCurveDisplay and maybe remove redundant tests on invalid inputs from test_cap_curve_display.py, if any;
- add a despine argument? This could apply to other *Display classes in scikit-learn. Feel free to open an issue to discuss this with screenshots, e.g. on ROC or PR curves, and your analysis of pros and cons (see the despine keyword for ROC and PR curves, #26367). I'm not sure it makes much sense for ConfusionMatrixDisplay (?). I'll open an issue (once this PR is merged) for CAPCurveDisplay, PredictionErrorDisplay and DetCurveDisplay because I think they're the only ones that don't have this option.

Regression
- raise a ValueError with an informative error message if y_true has negative values (see the test sketch after this TODO list);
- raise a ValueError if all y_true are zeros (the plot would be degenerate and would raise a low-level divide by zero warning with normalize_scale=True);
- … y_true are zeros, it will be considered a case of classification;
- test with a regressor (e.g. PoissonRegressor) and check that the regressor curve lies between the "chance level" and "perfect" curves;
- update the examples (examples/linear_model/plot_tweedie_regression_insurance_claims.py and examples/linear_model/plot_poisson_regression_non_normal_loss.py) to use the CAPCurveDisplay class instead of manually plotting the Lorenz curves.

Other
- add a changelog entry under doc/whats_new/upcoming_changes/;
- update the visualization docs (doc/visualization) to reference this new tool;
- the naming: we have DetCurveDisplay, RocCurveDisplay and then CAPCurveDisplay, which seems inconsistent.

Nice to have