Describe the bug
When using SVMSMOTE, the resampled classes do not end up with equal numbers of samples when sampling_strategy is set to 'not majority' or to a dict of per-class target counts.
Steps/Code to Reproduce
```python
import numpy as np
from collections import Counter
from sklearn.svm import SVC
from imblearn.over_sampling import SVMSMOTE

# create unseeded random 2-D data (exact counts differ from run to run)
x = np.random.normal(0, 0.5, 1000)
y = np.random.normal(0, 0.5, 1000)
clss = np.minimum(np.random.geometric(0.5, 1000), 7)

# check original class distribution; class 1 is the majority
print(Counter(clss))
num_majority = Counter(clss).get(1)
arr = np.array((x, y)).T

svc = SVC(C=10, kernel='rbf', gamma='scale', class_weight='balanced', random_state=2024)
svc.fit(arr, clss)

not_majority = SVMSMOTE(sampling_strategy='not majority', k_neighbors=7, m_neighbors=14, svm_estimator=svc, out_step=0.25, random_state=2024)

# check resampled class distribution with 'not majority' sampling_strategy
a, b = not_majority.fit_resample(arr, clss)
print(Counter(b))

# ask for the majority count in every class (1..7)
sampling_strategy = {cls: num_majority for cls in range(1, 8)}
dict_strat = SVMSMOTE(sampling_strategy=sampling_strategy, k_neighbors=7, m_neighbors=14, svm_estimator=svc, out_step=0.25, random_state=2024)

# check resampled class distribution with dict sampling_strategy
c, d = dict_strat.fit_resample(arr, clss)
print(Counter(d))
```
Expected Results
I would expect every resampled class to end up with the same number of samples as the majority class, like below:
```
Counter(b)
Counter({1: 497, 2: 497, 4: 497, 7: 497, 6: 497, 3: 497, 5: 497})
Counter(d)
Counter({1: 497, 2: 497, 4: 497, 7: 497, 6: 497, 3: 497, 5: 497})
```
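This expectation follows from how `sampling_strategy` is documented for imbalanced-learn over-samplers: with 'not majority', every class except the majority is topped up to the majority count, and a dict gives the desired final count per class. As a minimal sketch under that assumption, the hypothetical helper `expected_new_samples` below computes how many synthetic samples each class should receive; it is plain Python, does not call imbalanced-learn, and is not the library's own code.

```python
from collections import Counter


def expected_new_samples(counts, strategy):
    """Return {class: synthetic samples to add} for an over-sampler.

    Hypothetical helper mirroring the documented meaning of sampling_strategy
    (an assumption, not imbalanced-learn's implementation).
    """
    if strategy == "not majority":
        majority_class, majority_count = max(counts.items(), key=lambda kv: kv[1])
        # every class except the majority is topped up to the majority count
        return {cls: majority_count - n for cls, n in counts.items() if cls != majority_class}
    if isinstance(strategy, dict):
        # a dict gives the desired final number of samples per class
        targets = {cls: strategy[cls] - counts[cls] for cls in strategy}
        if any(n < 0 for n in targets.values()):
            raise ValueError("over-sampling cannot reduce a class below its current count")
        return targets
    raise ValueError(f"unsupported strategy: {strategy!r}")


# illustrative counts only, not the data from this report
counts = Counter({1: 6, 2: 3, 3: 1})
print(expected_new_samples(counts, "not majority"))           # {2: 3, 3: 5}
print(expected_new_samples(counts, {1: 6, 2: 6, 3: 6}))       # {1: 0, 2: 3, 3: 5}
```

Under this reading both strategies in the reproduction describe the same target (num_majority samples for every class), which is why identical balanced output is expected from both.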
Actual Results
Instead, after resampling, every minority class still has fewer samples than the majority class, and both strategies produce identical counts:
```
Counter(b)
Counter({1: 497, 2: 391, 4: 228, 7: 249, 6: 209, 3: 305, 5: 275})
Counter(d)
Counter({1: 497, 2: 391, 4: 228, 7: 249, 6: 209, 3: 305, 5: 275})
```
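The size of the gap can be read directly off the reported counts. The sketch below (the helper name `shortfall` is hypothetical) compares the Actual Results with the 497-sample target; the numbers are copied from the output above, so the result only describes this one unseeded run.

```python
from collections import Counter

# counts copied from the Actual Results above (identical for both strategies)
actual = Counter({1: 497, 2: 391, 4: 228, 7: 249, 6: 209, 3: 305, 5: 275})
target = 497  # majority-class count, i.e. num_majority in the reproduction


def shortfall(counts, target):
    """Return {class: samples missing to reach target}, skipping balanced classes."""
    return {cls: target - n for cls, n in sorted(counts.items()) if n < target}


missing = shortfall(actual, target)
print(missing)                # {2: 106, 3: 192, 4: 269, 5: 222, 6: 288, 7: 248}
print(sum(missing.values()))  # 1325 synthetic samples requested but not generated
```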
Versions
```
System:
  python: 3.9.5 (tags/v3.9.5:0a7dcbd, May 3 2021, 17:27:52) [MSC v.1928 64 bit (AMD64)]
  executable: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\python.exe
  machine: Windows-10-10.0.26100-SP0

Python dependencies:
  sklearn: 1.5.1
  pip: 24.0
  setuptools: 65.6.0
  numpy: 1.26.4
  scipy: 1.13.1
  Cython: None
  pandas: 2.2.3
  matplotlib: 3.8.0
  joblib: 1.3.1
  threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
  user_api: openmp
  internal_api: openmp
  prefix: vcomp
  filepath: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\Lib\site-packages\sklearn\.libs\vcomp140.dll
  version: None
  num_threads: 12

  user_api: blas
  internal_api: openblas
  prefix: libopenblas
  filepath: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\Lib\site-packages\numpy.libs\libopenblas64__v0.3.23-293-gc2f4bdbb-gcc_10_3_0-2bde3a66a51006b2b53eb373ff767a3f.dll
  version: 0.3.23.dev
  threading_layer: pthreads
  architecture: Haswell
  num_threads: 12

  user_api: blas
  internal_api: openblas
  prefix: libopenblas
  filepath: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\Lib\site-packages\scipy.libs\libopenblas_v0.3.27--3aa239bc726cfb0bd8e5330d8d4c15c6.dll
  version: 0.3.27
  threading_layer: pthreads
  architecture: Haswell
  num_threads: 12

  user_api: openmp
  internal_api: openmp
  prefix: libiomp
  filepath: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\Lib\site-packages\torch\lib\libiomp5md.dll
  version: None
  num_threads: 6

  user_api: openmp
  internal_api: openmp
  prefix: libiomp
  filepath: C:\Users\Devin Coughlin\AppData\Local\Programs\Python\Python39\Lib\site-packages\torch\lib\libiompstubs5md.dll
  version: None
  num_threads: 1
```