Describe the bug
Running a clustering algorithm with the n_jobs parameter set to a value greater than 1 leaks memory each time the algorithm is run.
The simple script below leaks additional memory on every loop iteration. The issue does not occur if I replace the manifold reduction step (TSNE) with precomputed features.
Steps/Code to Reproduce
import gc
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import OPTICS
import psutil

process = psutil.Process()


def main():
    data = np.random.random((100, 100))
    for _i in range(1, 50):
        points = TSNE().fit_transform(data)
        prediction = OPTICS(n_jobs=2).fit_predict(points)  # n_jobs != 1
        points = None
        prediction = None
        del prediction
        del points
        gc.collect()
        print(f"{process.memory_info().rss / 1e6:.1f} MB")


main()
Expected Results
The program's memory usage stays nearly constant between loop iterations.
Actual Results
The program's memory usage keeps growing with every loop iteration and never levels off.
Versions
System:
    python: 3.10.6 (tags/v3.10.6:9c7b4bd, Aug 1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
    executable: .venv\Scripts\python.exe
    machine: Windows-10-10.0.26100-SP0

Python dependencies:
    sklearn: 1.6.0
    pip: 24.3.1
    setuptools: 63.2.0
    numpy: 1.25.2
    scipy: 1.14.1
    Cython: None
    pandas: 2.2.3
    matplotlib: 3.10.0
    joblib: 1.4.2
    threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
    user_api: openmp
    internal_api: openmp
    num_threads: 16
    prefix: vcomp
    filepath: .venv\Lib\site-packages\sklearn\.libs\vcomp140.dll
    version: None

    user_api: blas
    internal_api: openblas
    num_threads: 16
    prefix: libopenblas
    filepath: .venv\Lib\site-packages\numpy\.libs\libopenblas64__v0.3.23-246-g3d31191b-gcc_10_3_0.dll
    version: 0.3.23.dev
    threading_layer: pthreads
    architecture: Cooperlake

    user_api: blas
    internal_api: openblas
    num_threads: 16
    prefix: libscipy_openblas
    filepath: .venv\Lib\site-packages\scipy.libs\libscipy_openblas-5b1ec8b915dfb81d11cebc0788069d2d.dll
    version: 0.3.27.dev
    threading_layer: pthreads
    architecture: Cooperlake