Describe the bug
Issue Description
When fitting GaussianMixture (or KMeans), scikit-learn raises a UserWarning about a known memory leak on Windows with MKL, even after applying the suggested workaround (setting OMP_NUM_THREADS=1 or 2). The warning persists across multiple environments and configurations, which suggests the issue needs further investigation.
Warning Message:
C:\ProgramData\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1429: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=2.
warnings.warn(
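The number in the message does not look arbitrary. My reading of scikit-learn 1.3's `_kmeans.py` (an assumption to verify against the installed file) is that KMeans processes the data in chunks of 256 samples and warns only when vcomp (Microsoft's OpenMP runtime) and MKL are both loaded and there are fewer chunks than OpenMP threads; the suggested OMP_NUM_THREADS value is the chunk count. The sketch below paraphrases that condition with hypothetical helper names (`n_chunks`, `would_warn`); it is not scikit-learn's code.

```python
import math

# Assumption: samples per chunk in scikit-learn's KMeans (Lloyd) iterations.
CHUNK_SIZE = 256


def n_chunks(n_samples, chunk_size=CHUNK_SIZE):
    """Hypothetical helper: number of data chunks for n_samples rows."""
    return math.ceil(n_samples / chunk_size)


def would_warn(n_samples, n_openmp_threads, vcomp_and_mkl_loaded=True):
    """Hypothetical paraphrase of the warning condition, not scikit-learn's code."""
    return vcomp_and_mkl_loaded and n_chunks(n_samples) < n_openmp_threads


print(n_chunks(300))                         # 2, the value suggested in the warning
print(would_warn(300, n_openmp_threads=8))   # True, e.g. on an 8-thread CPU
print(would_warn(300, n_openmp_threads=2))   # False once OpenMP uses 2 threads
print(would_warn(300, n_openmp_threads=1))   # False with a single thread
```

If this reading is right, 300 samples give 2 chunks, so an OpenMP runtime that is actually limited to 1 or 2 threads should not trigger the warning. That would point to OMP_NUM_THREADS not reaching the runtime in the affected process, rather than to the check itself.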
Steps to Reproduce
Code example:
import os
os.environ["OMP_NUM_THREADS"] = "1" # Also tested with "2"
os.environ["MKL_NUM_THREADS"] = "1"
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
# Generate synthetic 3D data
X, _ = make_blobs(n_samples=300, n_features=3, centers=3, random_state=42)
# Train GMM model
gmm = GaussianMixture(n_components=3, random_state=42)
gmm.fit(X) # Warning triggered here
Environment:
OS: Windows 11
Python: 3.10.12
scikit-learn: 1.3.2
numpy: 1.26.0 (linked to MKL via Anaconda)
Installation Method: Anaconda (conda install scikit-learn).
Expected vs. Actual Behavior
Expected: Setting OMP_NUM_THREADS should suppress the warning and resolve the memory leak.
Actual: The warning persists despite environment variable configurations, reinstalls, and thread-limiting methods.
Attempted Fixes
Set OMP_NUM_THREADS=1 or 2, both in code (via os.environ) and as a system environment variable.
Limited threads via threadpoolctl:
Code:
from threadpoolctl import threadpool_limits
# gmm and X as defined in the code example above
with threadpool_limits(limits=1, user_api='blas'):
    gmm.fit(X)
Reinstalled numpy and scipy with OpenBLAS instead of MKL.
Tested in fresh conda environments.
Updated all packages to latest versions.
None of these resolved the warning.
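One thing that may be worth ruling out: OMP_NUM_THREADS is generally read once, when the OpenMP runtime is loaded, so setting it through os.environ only helps if that happens before numpy or scikit-learn are first imported in the process (it has no effect in, say, a Jupyter kernel that already imported them). Also, threadpool_limits(..., user_api='blas') caps only the BLAS thread pool, not the OpenMP pool that KMeans uses. The sketch below uses threadpoolctl's `threadpool_info` and `threadpool_limits` to print which BLAS and OpenMP libraries are loaded and how many threads each reports; `summarize_threadpools` is a hypothetical helper. Whether limiting the OpenMP pool this way also silences the warning is an assumption to test, not a confirmed fix.

```python
import numpy  # noqa: F401  (loads the BLAS library numpy was built against)
import sklearn.cluster  # noqa: F401  (loads scikit-learn's OpenMP-enabled extensions)
from threadpoolctl import threadpool_info, threadpool_limits


def summarize_threadpools():
    """Hypothetical helper: one short dict per native thread pool in this process."""
    return [
        {
            "user_api": lib["user_api"],          # "blas" or "openmp"
            "internal_api": lib["internal_api"],  # e.g. "mkl", "openblas", "openmp"
            "prefix": lib["prefix"],              # e.g. "vcomp", "libgomp", "mkl_rt"
            "num_threads": lib["num_threads"],
        }
        for lib in threadpool_info()
    ]


print("As loaded (reflects OMP_NUM_THREADS / MKL_NUM_THREADS only if they took effect):")
for pool in summarize_threadpools():
    print(pool)

# Temporarily cap every detected pool, BLAS and OpenMP alike, at one thread.
with threadpool_limits(limits=1):
    limited = summarize_threadpools()

print("Inside threadpool_limits(limits=1):")
for pool in limited:
    print(pool)
```

On an affected machine, a "vcomp" entry next to an MKL entry, with num_threads above 2 in the first listing, would suggest the environment variable was not applied.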
Additional Context:
The warning appears even when using GaussianMixture, because GaussianMixture initializes its parameters with KMeans by default (init_params='kmeans').
The issue appears to be specific to Windows with MKL; no warning is raised on Linux or macOS.
Questions for Maintainers:
Is there a deeper configuration or bug causing this warning to persist?
Are there alternative workarounds for Windows users?
Is this issue being tracked in ongoing development?
Thank you for your time and support!
Let me know if further details are needed.
Expected Results
No warning is raised: with OMP_NUM_THREADS set to 1 or 2, the KMeans memory-leak UserWarning should not appear.
Actual Results
C:\ProgramData\anaconda3\Lib\site-packages\sklearn\cluster\_kmeans.py:1429: UserWarning: KMeans is known to have a memory leak on Windows with MKL, when there are less chunks than available threads. You can avoid it by setting the environment variable OMP_NUM_THREADS=2.
warnings.warn(
Versions
scikit-learn: 1.3.2
numpy: 1.26.0 (linked to MKL via Anaconda)
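scikit-learn's bug template asks for the output of `sklearn.show_versions()`, which reports the platform, the versions of the main dependencies and, in recent releases, the BLAS and OpenMP libraries detected by threadpoolctl. That last part would show directly whether MKL and vcomp are loaded together in the affected environment. A minimal way to produce it:

```python
import sklearn

# Prints system info, Python dependency versions and (in recent releases)
# the native thread pools detected by threadpoolctl.
sklearn.show_versions()
```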