[BUG] SVMSMOTE sample_strategy not equalizing samples of different classes #1130

Open
@coughlin-devin

Description


Describe the bug

When using SVMSMOTE, the resampled classes do not end up with equal sample counts, either when sampling_strategy is set to 'not majority' or when it is a dict mapping each class to the desired count.

Steps/Code to Reproduce

import numpy as np
from sklearn.svm import SVC
from imblearn.over_sampling import SVMSMOTE
from collections import Counter

# create data
x = np.random.normal(0, 0.5, 1000)
y = np.random.normal(0, 0.5, 1000)
clss = np.minimum(np.random.geometric(0.5, 1000), 7)

# check original class distribution
Counter(clss)
num_majority = Counter(clss).get(1)

arr = np.array((x,y)).T
svc = SVC(C=10, kernel='rbf', gamma='scale', class_weight='balanced', random_state=2024)
svc.fit(arr, clss)
not_majority = SVMSMOTE(sampling_strategy='not majority', k_neighbors=7, m_neighbors=14, svm_estimator=svc, out_step=0.25, random_state=2024)

# check resampled class distribution with sampling_strategy='not majority'
a, b = not_majority.fit_resample(arr, clss)
Counter(b)

# request the majority count for every class (labels 1 through 7)
sampling_strategy = {label: num_majority for label in range(1, 8)}
dict_strat = SVMSMOTE(sampling_strategy=sampling_strategy, k_neighbors=7, m_neighbors=14, svm_estimator=svc, out_step=0.25, random_state=2024)

# check resampled class distribution with a dict sampling_strategy
c, d = dict_strat.fit_resample(arr, clss)
Counter(d)

Expected Results

I would expect every resampled class to have the same number of samples, like below:

Counter(b)
Counter({1: 497, 2: 497, 4: 497, 7: 497, 6: 497, 3: 497, 5: 497})

Counter(d)
Counter({1: 497, 2: 497, 4: 497, 7: 497, 6: 497, 3: 497, 5: 497})
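
To spell out the expectation, here is a minimal sketch of what I understand 'not majority' to mean: every class except the largest one is topped up to the majority count. The helper `samples_to_generate` is hypothetical and written only for illustration; it is not part of imblearn, and the toy labels are made up.

```python
from collections import Counter

def samples_to_generate(y):
    """Hypothetical helper (not an imblearn API): synthetic samples each
    non-majority class would need in order to reach the majority count."""
    counts = Counter(y)
    n_majority = max(counts.values())
    return {label: n_majority - n for label, n in counts.items() if n < n_majority}

# Toy labels, made up for illustration only
y_toy = [1] * 5 + [2] * 3 + [3] * 1
print(samples_to_generate(y_toy))  # {2: 2, 3: 4}
```

Under this reading, the resampled data would hold 5 samples of each class in the toy case, which is the behaviour I expected from SVMSMOTE on my data.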

Actual Results

After resampling, however, the minority classes still have fewer samples than the majority class, and both strategies give the same counts:

Counter(b)
Counter({1: 497, 2: 391, 4: 228, 7: 249, 6: 209, 3: 305, 5: 275})

Counter(d)
Counter({1: 497, 2: 391, 4: 228, 7: 249, 6: 209, 3: 305, 5: 275})
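
To quantify the gap, the short snippet below computes how many samples each class is missing relative to the majority count. It only uses the counts reported above; the variable names are my own.

```python
from collections import Counter

# Counts copied from the actual results above
actual = Counter({1: 497, 2: 391, 4: 228, 7: 249, 6: 209, 3: 305, 5: 275})
target = max(actual.values())

# Samples still missing per class, sorted by class label
shortfall = {label: target - n for label, n in sorted(actual.items()) if n < target}
print(shortfall)  # {2: 106, 3: 192, 4: 269, 5: 222, 6: 288, 7: 248}
```

Every minority class falls short, by between 106 and 288 samples.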

Versions

System:
    python: 3.9.5 (tags/v3.9.5:0a7dcbd, May  3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)]
executable: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\python.exe
   machine: Windows-10-10.0.26100-SP0

Python dependencies:
      sklearn: 1.5.1
          pip: 24.0
   setuptools: 65.6.0
        numpy: 1.26.4
        scipy: 1.13.1
       Cython: None
       pandas: 2.2.3
   matplotlib: 3.8.0
       joblib: 1.3.1
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
         prefix: vcomp
       filepath: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\Lib\site-packages\sklearn\.libs\vcomp140.dll
        version: None
    num_threads: 12

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\Lib\site-packages\numpy.libs\libopenblas64__v0.3.23-293-gc2f4bdbb-gcc_10_3_0-2bde3a66a51006b2b53eb373ff767a3f.dll
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: Haswell
    num_threads: 12

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\Lib\site-packages\scipy.libs\libopenblas_v0.3.27--3aa239bc726cfb0bd8e5330d8d4c15c6.dll
        version: 0.3.27
threading_layer: pthreads
   architecture: Haswell
    num_threads: 12

       user_api: openmp
   internal_api: openmp
         prefix: libiomp
       filepath: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\Lib\site-packages\torch\lib\libiomp5md.dll
        version: None
    num_threads: 6

       user_api: openmp
   internal_api: openmp
         prefix: libiomp
       filepath: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\Lib\site-packages\torch\lib\libiompstubs5md.dll
        version: None
    num_threads: 1
