Commit 731370a

DOC Motivate the signature for DistanceMetric.{dist_csr, rdist_csr}
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
1 parent a83887c commit 731370a
1 file changed: 37 additions, 6 deletions
‎sklearn/metrics/_dist_metrics.pyx.tp

@@ -415,9 +415,41 @@ cdef class DistanceMetric{{name_suffix}}:
         The implementation of this method in subclasses must be robust to the
         presence of explicit zeros in the CSR representation.
 
-        All the parameters are passed as to not use memoryview slicing
-        because it is currently known to slow down execution as it
-        takes the GIL. See: https://github.com/scikit-learn/scikit-learn/issues/17299
+        An alternative signature would be:
+
+            cdef DTYPE_t dist_csr(
+                self,
+                const {{INPUT_DTYPE_t}}[:] x1_data,
+                const SPARSE_INDEX_TYPE_t[:] x1_indices,
+                const {{INPUT_DTYPE_t}}[:] x2_data,
+                const SPARSE_INDEX_TYPE_t[:] x2_indices,
+            ) nogil except -1:
+
+        Where callers would use slicing on the original CSR data and indices
+        memoryviews:
+
+            x1_start = X1_csr.indices_ptr[i]
+            x1_end = X1_csr.indices_ptr[i+1]
+            x2_start = X2_csr.indices_ptr[j]
+            x2_end = X2_csr.indices_ptr[j+1]
+
+            self.dist_csr(
+                x1_data[x1_start:x1_end],
+                x1_indices[x1_start:x1_end],
+                x2_data[x2_start:x2_end],
+                x2_indices[x2_start:x2_end],
+            )
+
+        Yet, slicing on memoryviews slows down execution as it takes the GIL.
+        See: https://github.com/scikit-learn/scikit-learn/issues/17299
+
+        Hence, to avoid slicing, the data and indices arrays of the sparse
+        matrices containing respectively x1 and x2 (namely x{1,2}_{data,indices})
+        are passed as well as their index pointers (namely x{1,2}_{start,end}).
+
+        For reference about the CSR format, see section 3.4 of
+        Saad, Y. (2003), Iterative Methods for Sparse Linear Systems, SIAM.
+        https://www-users.cse.umn.edu/~saad/IterMethBook_2ndEd.pdf
         """
         return -999
 
@@ -447,9 +479,8 @@ cdef class DistanceMetric{{name_suffix}}:
         The implementation of this method in subclasses must be robust to the
         presence of explicit zeros in the CSR representation.
 
-        All the parameters are passed as to not use memoryview slicing
-        because it is currently known to slow down execution as it
-        takes the GIL. See: https://github.com/scikit-learn/scikit-learn/issues/17299
+        More information about the motives for this method signature is given
+        in the docstring of dist_csr.
         """
         return self.dist_csr(
             x1_data,