Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

DBSCAN always triggers and EfficiencyWarning #31030

Copy link
Copy link
Open
@luispedro

Description

@luispedro
Issue body actions

Describe the bug

Calling dbscan always triggers an efficiency warning. There is no apparent way to either call it correctly or disable the warning.

This was originally reported as an issue in SemiBin, which uses DBSCAN under the hood: BigDataBiology/SemiBin#175

Steps/Code to Reproduce

import numpy as np
from sklearn.cluster import dbscan
from sklearn.neighbors import kneighbors_graph, sort_graph_by_row_values

f = np.random.randn(10_000, 240)
dist_matrix = kneighbors_graph(
    f,
    n_neighbors=200,
    mode='distance',
    p=2,
    n_jobs=3)

_, labels = dbscan(dist_matrix,
        eps=0.1, min_samples=5, n_jobs=4, metric='precomputed')


dist_matrix = sort_graph_by_row_values(dist_matrix)
_, labels = dbscan(dist_matrix,
        eps=0.1, min_samples=5, n_jobs=4, metric='precomputed')

Expected Results

No warning, at least in second call

Actual Results

/home/luispedro/.mambaforge/envs/py3.11/lib/python3.11/site-packages/sklearn/neighbors/_base.py:248: EfficiencyWarning: Precomputed sparse input was not sorted by row values. Use the function sklearn.neighbors.sort_graph_by_row_values to sort the input by row values, with warn_when_not_sorted=False to remove this warning.
  warnings.warn(
/home/luispedro/.mambaforge/envs/py3.11/lib/python3.11/site-packages/sklearn/neighbors/_base.py:248: EfficiencyWarning: Precomputed sparse input was not sorted by row values. Use the function sklearn.neighbors.sort_graph_by_row_values to sort the input by row values, with warn_when_not_sorted=False to remove this warning.
  warnings.warn(

Versions

I tested on the current main branch, 5cdbbf15e3fade7cc2462ef66dc4ea0f37f390e3, but it has been going on for a while (see original SemiBin report from September 2024):


System:
    python: 3.11.9 | packaged by conda-forge | (main, Apr 19 2024, 18:36:13) [GCC 12.3.0]
executable: /home/luispedro/.mambaforge/envs/py3.11/bin/python3.11
   machine: Linux-6.8.0-55-generic-x86_64-with-glibc2.39

Python dependencies:
      sklearn: 1.7.dev0
          pip: 24.0
   setuptools: 70.0.0
        numpy: 1.26.4
        scipy: 1.13.1
       Cython: None
       pandas: 2.2.2
   matplotlib: 3.8.4
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 16
         prefix: libopenblas
       filepath: /home/luispedro/.mambaforge/envs/py3.11/lib/libopenblasp-r0.3.27.so
        version: 0.3.27
threading_layer: pthreads
   architecture: Haswell

       user_api: openmp
   internal_api: openmp
    num_threads: 16
         prefix: libgomp
       filepath: /home/luispedro/.mambaforge/envs/py3.11/lib/libgomp.so.1.0.0
        version: None

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions

    Morty Proxy This is a proxified and sanitized view of the page, visit original site.