@@ -415,9 +415,41 @@ cdef class DistanceMetric{{name_suffix}}:
  The implementation of this method in subclasses must be robust to the
  presence of explicit zeros in the CSR representation.

- All the parameters are passed as to not use memoryview slicing
- because it is currently known to slow down execution as it
- takes the GIL. See: https://github.com/scikit-learn/scikit-learn/issues/17299
+ An alternative signature would be:
+
+ cdef DTYPE_t dist_csr(
+     self,
+     const {{INPUT_DTYPE_t}}[:] x1_data,
+     const SPARSE_INDEX_TYPE_t[:] x1_indices,
+     const {{INPUT_DTYPE_t}}[:] x2_data,
+     const SPARSE_INDEX_TYPE_t[:] x2_indices,
+ ) nogil except -1:
+
+ where callers would slice the original CSR data and indices
+ memoryviews:
+
+ x1_start = X1_csr.indices_ptr[i]
+ x1_end = X1_csr.indices_ptr[i+1]
+ x2_start = X2_csr.indices_ptr[j]
+ x2_end = X2_csr.indices_ptr[j+1]
+
+ self.dist_csr(
+     x1_data[x1_start:x1_end],
+     x1_indices[x1_start:x1_end],
+     x2_data[x2_start:x2_end],
+     x2_indices[x2_start:x2_end],
+ )
+
+ However, slicing memoryviews slows down execution because it takes the GIL.
+ See: https://github.com/scikit-learn/scikit-learn/issues/17299
+
+ Hence, to avoid slicing, the data and indices arrays of the sparse
+ matrices containing respectively x1 and x2 (namely x{1,2}_{data,indices})
+ are passed along with their index pointers (namely x{1,2}_{start,end}).
+
+ For reference about the CSR format, see section 3.4 of
+ Saad, Y. (2003), Iterative Methods for Sparse Linear Systems, SIAM.
+ https://www-users.cse.umn.edu/~saad/IterMethBook_2ndEd.pdf
  """
  return -999

@@ -447,9 +479,8 @@ cdef class DistanceMetric{{name_suffix}}:
  The implementation of this method in subclasses must be robust to the
  presence of explicit zeros in the CSR representation.

- All the parameters are passed as to not use memoryview slicing
- because it is currently known to slow down execution as it
- takes the GIL. See: https://github.com/scikit-learn/scikit-learn/issues/17299
+ More information about the motives for this method signature is given
+ in the docstring of dist_csr.
  """
  return self.dist_csr(
      x1_data,
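For readers less familiar with CSR, the calling convention described in the dist_csr docstring can be illustrated with a small pure-Python sketch. It is not the Cython code from this change: the function name `euclidean_dist_csr`, the helper `row_bounds`, the choice of the Euclidean metric, and the `indptr` name (borrowed from SciPy's convention) are all assumptions made for illustration. It only shows how the full data and indices arrays are passed together with per-row start/end offsets, so the callee never needs a sliced view, and how the merge over sorted column indices remains correct when explicit zeros are stored.

```python
from math import sqrt


def row_bounds(indptr, i):
    """Return the [start, end) offsets of row i in the CSR data/indices arrays."""
    return indptr[i], indptr[i + 1]


def euclidean_dist_csr(x1_data, x1_indices, x2_data, x2_indices,
                       x1_start, x1_end, x2_start, x2_end):
    """Euclidean distance between two CSR rows given as offsets, not slices.

    Hypothetical illustration: the whole arrays are passed and only the
    ranges [x1_start, x1_end) and [x2_start, x2_end) are read.
    Column indices within each row are assumed to be sorted.
    """
    i1, i2 = x1_start, x2_start
    acc = 0.0
    # Merge the two sorted index ranges.
    while i1 < x1_end and i2 < x2_end:
        c1, c2 = x1_indices[i1], x2_indices[i2]
        if c1 == c2:
            diff = x1_data[i1] - x2_data[i2]
            acc += diff * diff
            i1 += 1
            i2 += 1
        elif c1 < c2:
            # Column stored only in x1; an explicit zero simply adds 0.
            acc += x1_data[i1] * x1_data[i1]
            i1 += 1
        else:
            acc += x2_data[i2] * x2_data[i2]
            i2 += 1
    # Remaining entries of whichever row is longer.
    while i1 < x1_end:
        acc += x1_data[i1] * x1_data[i1]
        i1 += 1
    while i2 < x2_end:
        acc += x2_data[i2] * x2_data[i2]
        i2 += 1
    return sqrt(acc)


# Dense matrix [[1, 0, 2], [0, 0, 3]] stored with an explicit zero in row 0.
data = [1.0, 0.0, 2.0, 3.0]
indices = [0, 1, 2, 2]
indptr = [0, 3, 4]

x1_start, x1_end = row_bounds(indptr, 0)
x2_start, x2_end = row_bounds(indptr, 1)
dist_explicit_zero = euclidean_dist_csr(
    data, indices, data, indices, x1_start, x1_end, x2_start, x2_end
)

# Same matrix without the explicit zero.
data_nz = [1.0, 2.0, 3.0]
indices_nz = [0, 2, 2]
indptr_nz = [0, 2, 3]
s1, e1 = row_bounds(indptr_nz, 0)
s2, e2 = row_bounds(indptr_nz, 1)
dist_no_zero = euclidean_dist_csr(
    data_nz, indices_nz, data_nz, indices_nz, s1, e1, s2, e2
)
```

Both calls return the same value, which is the robustness to explicit zeros that the docstring asks of subclasses. Passing offsets rather than slices keeps the sketch close in spirit to the signature retained in the diff, where slicing is avoided for performance reasons.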