Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

Commit ffd2c7c

Browse filesBrowse files
committed
Stop EfficiencyWarnings in DBSCAN
As per #31030, DBSCAN consistently triggers efficiency warnings due to explicitly setting the diagonal of the X matrix in fitting neighborhoods. This stems from not sorting the precomputed sparse matrix by row values. Here, we instead update the neighborhoods variable after the initial fitting to avoid this. It may also be possible to simply add X = sort_graph_by_row_values(X, warn_when_not_sorted=False) after the original code's X.setdiag(X.diagonal()), but (1) this way seems more efficient and (2) I am not sure if this reordering would potentially affect the data in an undesired manner.
1 parent a69849a commit ffd2c7c
Copy full SHA for ffd2c7c

File tree

1 file changed

+7
-8
lines changed
Filter options

1 file changed

+7
-8
lines changed

‎sklearn/cluster/_dbscan.py

Copy file name to clipboardExpand all lines: sklearn/cluster/_dbscan.py
+7-8Lines changed: 7 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -399,14 +399,6 @@ def fit(self, X, y=None, sample_weight=None):
399399
# Calculate neighborhood for all samples. This leaves the original
400400
# point in, which needs to be considered later (i.e. point i is in the
401401
# neighborhood of point i. While True, its useless information)
402-
if self.metric == "precomputed" and sparse.issparse(X):
403-
# set the diagonal to explicit values, as a point is its own
404-
# neighbor
405-
X = X.copy() # copy to avoid in-place modification
406-
with warnings.catch_warnings():
407-
warnings.simplefilter("ignore", sparse.SparseEfficiencyWarning)
408-
X.setdiag(X.diagonal())
409-
410402
neighbors_model = NearestNeighbors(
411403
radius=self.eps,
412404
algorithm=self.algorithm,
@@ -420,6 +412,13 @@ def fit(self, X, y=None, sample_weight=None):
420412
# This has worst case O(n^2) memory complexity
421413
neighborhoods = neighbors_model.radius_neighbors(X, return_distance=False)
422414

415+
# Each point is its own neighbor, so update the neighborhoods
416+
# accordingly after the initial fitting
417+
if self.metric == "precomputed" and sparse.issparse(X):
418+
for i, neighborhood in enumerate(neighborhoods):
419+
if i not in neighborhoods[i]:
420+
neighborhoods[i] = np.append(neighborhood, i)
421+
423422
if sample_weight is None:
424423
n_neighbors = np.array([len(neighbors) for neighbors in neighborhoods])
425424
else:

0 commit comments

Comments
0 (0)
Morty Proxy This is a proxified and sanitized view of the page, visit original site.