Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Version 1.0 breaks cross-validation with string targets #28841

Copy link
Copy link
Closed
@stefan6419846

Description

@stefan6419846
Issue body actions

Describe the bug

I just tried to upgrade the package from version 0.24.2 to the latest release. Doing so, my integration tests would start to fail, claiming that there would not be enough samples for at least one class. This only occurs if I use string-based targets instead of integers.

As far as I have seen, there is no API change documented inside the changelog. Doing some testing, it seems like version 1.0 introduced the breaking change.

Steps/Code to Reproduce

import sklearn; sklearn.show_versions()

from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


pipeline = Pipeline([
    ('vect', TfidfVectorizer()),
    ('clf', CalibratedClassifierCV(LinearSVC(), cv=3)),
])

pipeline.fit(
    ['word0 word1 word3 word4'] + ['word0 word1 word2 word3 word4'] * 10 + ['word5 word6 word7 word8 word9'] * 10,
    [1] + [1] * 10 + [2] * 10,
)


pipeline = Pipeline([
    ('vect', TfidfVectorizer()),
    ('clf', CalibratedClassifierCV(LinearSVC(), cv=3)),
])

pipeline.fit(
    ['word0 word1 word3 word4'] + ['word0 word1 word2 word3 word4'] * 10 + ['word5 word6 word7 word8 word9'] * 10,
    ['1'] + ['1'] * 10 + ['2'] * 10,
)

Expected Results

Both pipelines (once with integer targets, once with string targets) can be trained without issues.

Actual Results

Traceback (most recent call last):
  File "/home/stefan/aaa/run.py", line 25, in <module>
    pipeline.fit(
  File "/home/stefan/aaa/venv/lib64/python3.9/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/stefan/aaa/venv/lib64/python3.9/site-packages/sklearn/pipeline.py", line 475, in fit
    self._final_estimator.fit(Xt, y, **last_step_params["fit"])
  File "/home/stefan/aaa/venv/lib64/python3.9/site-packages/sklearn/base.py", line 1474, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/stefan/aaa/venv/lib64/python3.9/site-packages/sklearn/calibration.py", line 394, in fit
    raise ValueError(
ValueError: Requesting 3-fold cross-validation but provided less than 3 examples for at least one class.

Versions

Failing:

System:
    python: 3.9.18 (main, Sep 06 2023, 07:49:32) [GCC]
executable: /home/stefan/aaa/venv/bin/python
   machine: Linux-5.14.21-150400.24.100-default-x86_64-with-glibc2.31

Python dependencies:
      sklearn: 1.4.2
          pip: 22.2.2
   setuptools: 68.2.2
        numpy: 1.26.4
        scipy: 1.13.0
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.4.0
threadpoolctl: 3.4.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libgomp
       filepath: /home/stefan/aaa/venv/lib64/python3.9/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /home/stefan/aaa/venv/lib64/python3.9/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /home/stefan/aaa/venv/lib64/python3.9/site-packages/scipy.libs/libopenblasp-r0-24bff013.3.26.dev.so
        version: 0.3.26.dev
threading_layer: pthreads
   architecture: Haswell

Working:

System:
    python: 3.9.18 (main, Sep 06 2023, 07:49:32) [GCC]
executable: /home/stefan/aaa/venv/bin/python
   machine: Linux-5.14.21-150400.24.100-default-x86_64-with-glibc2.31

Python dependencies:
          pip: 22.2.2
   setuptools: 68.2.2
      sklearn: 0.24.2
        numpy: 1.26.4
        scipy: 1.13.0
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.4.0
threadpoolctl: 3.4.0

Built with OpenMP: True

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      Morty Proxy This is a proxified and sanitized view of the page, visit original site.