Closed
Description
Describe the bug
I upgraded scikit-learn from version 0.24.2 to the latest release (1.4.2). After the upgrade, my integration tests fail with an error claiming that at least one class has too few samples for cross-validation. This only occurs with string targets; the same data with integer targets works.
As far as I can tell, the changelog does not document a corresponding API change. From my testing, the breaking change appears to have been introduced in version 1.0.
Steps/Code to Reproduce
import sklearn
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

sklearn.show_versions()

# Case 1: integer targets (trains without error)
pipeline = Pipeline([
    ('vect', TfidfVectorizer()),
    ('clf', CalibratedClassifierCV(LinearSVC(), cv=3)),
])
pipeline.fit(
    ['word0 word1 word3 word4'] + ['word0 word1 word2 word3 word4'] * 10 + ['word5 word6 word7 word8 word9'] * 10,
    [1] + [1] * 10 + [2] * 10,
)

# Case 2: same documents, string targets (raises ValueError on 1.4.2)
pipeline = Pipeline([
    ('vect', TfidfVectorizer()),
    ('clf', CalibratedClassifierCV(LinearSVC(), cv=3)),
])
pipeline.fit(
    ['word0 word1 word3 word4'] + ['word0 word1 word2 word3 word4'] * 10 + ['word5 word6 word7 word8 word9'] * 10,
    ['1'] + ['1'] * 10 + ['2'] * 10,
)
Expected Results
Both pipelines (one with integer targets, one with string targets) train without errors.
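For reference, a quick count using only the standard library shows that both target lists contain 11 samples of one class and 10 of the other, which is well above the 3 samples per class that cv=3 asks for. The names `n_folds`, `int_targets`, and `str_targets` are only illustrative and do not come from scikit-learn.

```python
from collections import Counter

n_folds = 3  # same value as cv=3 in the pipelines above

# Targets exactly as passed to pipeline.fit() in the reproduction.
int_targets = [1] + [1] * 10 + [2] * 10
str_targets = ['1'] + ['1'] * 10 + ['2'] * 10

for name, targets in [('int', int_targets), ('str', str_targets)]:
    counts = Counter(targets)          # samples per class
    smallest = min(counts.values())    # size of the rarest class
    print(name, dict(counts), 'enough samples:', smallest >= n_folds)
```

Since the class sizes are identical in both cases, the data itself should not be the reason for the error.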
Actual Results
Traceback (most recent call last):
  File "/home/stefan/aaa/run.py", line 25, in <module>
    pipeline.fit(
  File "/home/stefan/aaa/venv/lib64/python3.9/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/stefan/aaa/venv/lib64/python3.9/site-packages/sklearn/pipeline.py", line 475, in fit
    self._final_estimator.fit(Xt, y, **last_step_params["fit"])
  File "/home/stefan/aaa/venv/lib64/python3.9/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/stefan/aaa/venv/lib64/python3.9/site-packages/sklearn/calibration.py", line 394, in fit
    raise ValueError(
ValueError: Requesting 3-fold cross-validation but provided less than 3 examples for at least one class.
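A possible workaround is sketched below. It assumes that the failure is tied to the targets being passed as a plain Python list of strings, which is a guess and not something confirmed against the scikit-learn source. The idea is to convert the targets to a NumPy array before fitting. The variable names `docs` and `targets` are only illustrative.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Same documents as in the reproduction above.
docs = (
    ['word0 word1 word3 word4']
    + ['word0 word1 word2 word3 word4'] * 10
    + ['word5 word6 word7 word8 word9'] * 10
)

# Assumption: an ndarray of strings is handled differently from a list
# of strings, so the targets are converted explicitly.
targets = np.asarray(['1'] + ['1'] * 10 + ['2'] * 10)

pipeline = Pipeline([
    ('vect', TfidfVectorizer()),
    ('clf', CalibratedClassifierCV(LinearSVC(), cv=3)),
])
pipeline.fit(docs, targets)

# One probability column per class.
proba = pipeline.predict_proba(docs)
print(pipeline.named_steps['clf'].classes_, proba.shape)
```

If this fits on the failing version, it would suggest that the per-class sample check mishandles list inputs rather than string labels as such.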
Versions
Failing:

System:
    python: 3.9.18 (main, Sep 06 2023, 07:49:32) [GCC]
    executable: /home/stefan/aaa/venv/bin/python
    machine: Linux-5.14.21-150400.24.100-default-x86_64-with-glibc2.31

Python dependencies:
    sklearn: 1.4.2
    pip: 22.2.2
    setuptools: 68.2.2
    numpy: 1.26.4
    scipy: 1.13.0
    Cython: None
    pandas: None
    matplotlib: None
    joblib: 1.4.0
    threadpoolctl: 3.4.0

Built with OpenMP: True

threadpoolctl info:
    user_api: openmp
    internal_api: openmp
    num_threads: 8
    prefix: libgomp
    filepath: /home/stefan/aaa/venv/lib64/python3.9/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
    version: None

    user_api: blas
    internal_api: openblas
    num_threads: 8
    prefix: libopenblas
    filepath: /home/stefan/aaa/venv/lib64/python3.9/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so
    version: 0.3.23.dev
    threading_layer: pthreads
    architecture: Haswell

    user_api: blas
    internal_api: openblas
    num_threads: 8
    prefix: libopenblas
    filepath: /home/stefan/aaa/venv/lib64/python3.9/site-packages/scipy.libs/libopenblasp-r0-24bff013.3.26.dev.so
    version: 0.3.26.dev
    threading_layer: pthreads
    architecture: Haswell
Working:

System:
    python: 3.9.18 (main, Sep 06 2023, 07:49:32) [GCC]
    executable: /home/stefan/aaa/venv/bin/python
    machine: Linux-5.14.21-150400.24.100-default-x86_64-with-glibc2.31

Python dependencies:
    pip: 22.2.2
    setuptools: 68.2.2
    sklearn: 0.24.2
    numpy: 1.26.4
    scipy: 1.13.0
    Cython: None
    pandas: None
    matplotlib: None
    joblib: 1.4.0
    threadpoolctl: 3.4.0

Built with OpenMP: True