Describe the bug
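Calling dbscan with a precomputed sparse distance matrix (metric='precomputed') always triggers an EfficiencyWarning, even after sorting the matrix with sort_graph_by_row_values as the warning itself suggests. There is no apparent way to either call it correctly or disable the warning.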
This was originally reported as an issue in SemiBin, which uses DBSCAN under the hood: BigDataBiology/SemiBin#175
Steps/Code to Reproduce
import numpy as np
from sklearn.cluster import dbscan
from sklearn.neighbors import kneighbors_graph, sort_graph_by_row_values

f = np.random.randn(10_000, 240)
dist_matrix = kneighbors_graph(
    f,
    n_neighbors=200,
    mode='distance',
    p=2,
    n_jobs=3)

# First call: graph exactly as returned by kneighbors_graph
_, labels = dbscan(dist_matrix,
                   eps=0.1, min_samples=5, n_jobs=4, metric='precomputed')

# Second call: graph sorted explicitly, as the warning recommends
dist_matrix = sort_graph_by_row_values(dist_matrix)
_, labels = dbscan(dist_matrix,
                   eps=0.1, min_samples=5, n_jobs=4, metric='precomputed')
Expected Results
No warning, at least on the second call, where the matrix has already been passed through sort_graph_by_row_values.
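To rule out the possibility that the matrix handed to the second call is somehow still unsorted, a small check like the one below can be run on it first. This is only a sketch: the helper name rows_sorted_by_value is made up for illustration, and it assumes that "sorted by row values" means the stored distances are non-decreasing within each CSR row, which is how I read the warning text. It takes the data and indptr arrays of a CSR matrix and uses no scikit-learn internals.

```python
def rows_sorted_by_value(data, indptr):
    """Return True if the stored values are non-decreasing within every CSR row.

    `data` and `indptr` are the arrays of a CSR matrix (for example
    dist_matrix.data and dist_matrix.indptr). Hypothetical helper, not part
    of scikit-learn.
    """
    for row in range(len(indptr) - 1):
        start, end = indptr[row], indptr[row + 1]
        # Compare neighbouring entries inside this row only; a drop in value
        # across a row boundary does not count as unsorted.
        for i in range(start + 1, end):
            if data[i - 1] > data[i]:
                return False
    return True


# Intended use with the reproducer (needs the objects defined there):
#   print(rows_sorted_by_value(dist_matrix.data, dist_matrix.indptr))
```

The loop is plain Python, so on the 10,000 x 200 graph from the reproducer it is slow but still finishes in a reasonable time. If it returns True and the warning is still emitted, the input itself is not the problem.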
Actual Results
/home/luispedro/.mambaforge/envs/py3.11/lib/python3.11/site-packages/sklearn/neighbors/_base.py:248: EfficiencyWarning: Precomputed sparse input was not sorted by row values. Use the function sklearn.neighbors.sort_graph_by_row_values to sort the input by row values, with warn_when_not_sorted=False to remove this warning.
warnings.warn(
/home/luispedro/.mambaforge/envs/py3.11/lib/python3.11/site-packages/sklearn/neighbors/_base.py:248: EfficiencyWarning: Precomputed sparse input was not sorted by row values. Use the function sklearn.neighbors.sort_graph_by_row_values to sort the input by row values, with warn_when_not_sorted=False to remove this warning.
warnings.warn(
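As a stopgap for downstream users (for example in SemiBin), the warning can be filtered by its message using only the standard library warnings module. The sketch below is a workaround, not a fix: the helper name ignore_unsorted_graph_warning is made up for illustration, the message prefix is copied from the output above, and it assumes the warning is emitted in the calling process rather than in a worker process.

```python
import warnings
from contextlib import contextmanager

# Prefix of the message shown under "Actual Results". warnings matches it as
# a regular expression against the start of the warning text.
UNSORTED_MSG = r"Precomputed sparse input was not sorted by row values"


@contextmanager
def ignore_unsorted_graph_warning():
    """Temporarily silence only the 'not sorted by row values' warning.

    Hypothetical helper; other warnings still pass through, and the previous
    filter state is restored on exit.
    """
    with warnings.catch_warnings():
        warnings.filterwarnings("ignore", message=UNSORTED_MSG)
        yield


# Intended use with the reproducer (needs the objects defined there):
#   with ignore_unsorted_graph_warning():
#       _, labels = dbscan(dist_matrix,
#                          eps=0.1, min_samples=5, n_jobs=4, metric='precomputed')
```

Filtering by message rather than by category keeps other efficiency warnings visible, at the cost of breaking silently if the wording of the message ever changes.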
Versions
I tested on the current main branch, 5cdbbf15e3fade7cc2462ef66dc4ea0f37f390e3, but the warning has been appearing for a while (see original SemiBin report from September 2024):
System:
    python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
    executable: /home/luispedro/.mambaforge/envs/py3.11/bin/python3.11
    machine: Linux-6.8.0-55-generic-x86_64-with-glibc2.39

Python dependencies:
    sklearn: 1.7.dev0
    pip: 24.0
    setuptools: 70.0.0
    numpy: 1.26.4
    scipy: 1.13.1
    Cython: None
    pandas: 2.2.2
    matplotlib: 3.8.4
    joblib: 1.4.2
    threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
    user_api: blas
    internal_api: openblas
    num_threads: 16
    prefix: libopenblas
    filepath: /home/luispedro/.mambaforge/envs/py3.11/lib/libopenblasp-r0.3.27.so
    version: 0.3.27
    threading_layer: pthreads
    architecture: Haswell

    user_api: openmp
    internal_api: openmp
    num_threads: 16
    prefix: libgomp
    filepath: /home/luispedro/.mambaforge/envs/py3.11/lib/libgomp.so.1.0.0
    version: None