-
Notifications
You must be signed in to change notification settings - Fork 1
MAINT Division-less alternative for fused sparse-dense support #15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT Division-less alternative for fused sparse-dense support #15
Conversation
The modulo trick is costly because it uses integer division under the hood which are up to 80 times slower. An alternative is to pass the address of the `indices` array and shift this address at the callers level with the information that we have there so that the dereferencing works properly. This commit replaces the modulo trick by this alternative and adapt documentation and comments in this regard. See: scikit-learn#23585 (comment)
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
6633d2f
to
a3cf4d8
Compare
This alternative performs better than the one used in the base branch. 🎉 Here are the results using a10dbcb.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, thanks for the benchmark results.
Just a detail in the comment to fix below:
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
I think that I would wait for @thomasjpfan's opinion before merge this PR as it's easier to review this diff independently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the ping!
sklearn/metrics/_pairwise_distances_reduction/_datasets_pair.pyx
Outdated
Show resolved
Hide resolved
BTW I think we can merge this directly into |
MNT Pushing data up instead of indices
I merged a few commit in scikit-learn#23585, IIRC and we still need to perform benchmark against So let's merge this first in this scikit-learn#23585 first. |
To use the a more efficient implementation for future callers implementations. See: #15
To use the a more efficient implementation for future callers implementations. See: #15
Reference Issues/PRs
Test trick for scikit-learn#23585.
What does this implement/fix? Explain your changes.
The modulo trick is costly because it uses integer division
under the hood which uses
idiv
instruction that takes up to80 CPU cycles each.
An alternative is to pass the address of the
indices
array and shiftthis address at the callers level with the information that we have
there so that the dereferencing works properly.
This commit replaces the modulo trick by this alternative and adapt
documentation and comments in this regard.
See: scikit-learn#23585 (comment)