Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

MAINT Division-less alternative for fused sparse-dense support #15

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Aug 11, 2022

Conversation

jjerphan
Copy link
Owner

@jjerphan jjerphan commented Jul 21, 2022

Reference Issues/PRs

Test trick for scikit-learn#23585.

What does this implement/fix? Explain your changes.

The modulo trick is costly because it uses integer division
under the hood which uses idiv instruction that takes up to
80 CPU cycles each.

An alternative is to pass the address of the indices array and shift
this address at the callers level with the information that we have
there so that the dereferencing works properly.

This commit replaces the modulo trick by this alternative and adapt
documentation and comments in this regard.

See: scikit-learn#23585 (comment)

The modulo trick is costly because it uses integer division
under the hood which are up to 80 times slower.

An alternative is to pass the address of the `indices` array and shift
this address at the callers level with the information that we have
there so that the dereferencing works properly.

This commit replaces the modulo trick by this alternative and adapt
documentation and comments in this regard.

See:
scikit-learn#23585 (comment)
@jjerphan jjerphan changed the base branch from feat/pdr-sparse-support to maint/pdr-sparse-support July 21, 2022 15:05
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@jjerphan jjerphan force-pushed the alt/feat/pdr-sparse-support branch from 6633d2f to a3cf4d8 Compare August 9, 2022 07:18
@jjerphan
Copy link
Owner Author

jjerphan commented Aug 9, 2022

This alternative performs better than the one used in the base branch. 🎉

Here are the results using a10dbcb.

· Creating environments
· Discovering benchmarks..
·· Uninstalling from conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl...
·· Building a10dbcbd <benchmarks/alt/feat/pdr-sparse-support> for conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl................................
·· Installing a10dbcbd <benchmarks/alt/feat/pdr-sparse-support> into conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl.
· Running 2 total benchmarks (2 commits * 1 environments * 1 benchmarks)
[  0.00%] · For scikit-learn commit a10dbcbd <benchmarks/alt/feat/pdr-sparse-support> (round 1/1):
[  0.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl
[ 50.00%] ··· pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin                                                                                                                                              
ok
[ 50.00%] ··· ========= ======== ============ ================== =========================== =========================== ================== =========================== ===========================
              --                                                                                                metric / strategy
              ------------------------------- -----------------------------------------------------------------------------------------------------------------------------------------------------
               n_train   n_test   n_features   euclidean / auto   euclidean / parallel_on_X   euclidean / parallel_on_Y   manhattan / auto   manhattan / parallel_on_X   manhattan / parallel_on_Y
              ========= ======== ============ ================== =========================== =========================== ================== =========================== ===========================
                 1000     1000       100           106±20ms                58.4±2ms                    144±20ms               103±20ms                55.8±3ms                    104±20ms
                 1000    10000       100          1.21±0.04s               63.0±4ms                   1.19±0.03s             1.19±0.02s               58.0±6ms                   1.19±0.02s
                 1000    100000      100          11.7±0.06s               274±1ms                    11.8±0.02s             11.6±0.01s              231±0.8ms                   11.6±0.05s
                10000     1000       100           115±6ms                 473±4ms                     121±5ms                120±7ms                 445±5ms                     122±10ms
                10000    10000       100          1.00±0.04s               474±3ms                    1.08±0.03s             1.08±0.06s               441±5ms                    1.08±0.03s
                10000    100000      100          10.00±0.1s              2.11±0.04s                  10.0±0.1s              10.7±0.4s               2.07±0.04s                  9.92±0.1s
                100000    1000       100           246±7ms                 4.51±0s                     256±7ms                279±30ms                4.22±0s                     256±30ms
                100000   10000       100          2.41±0.1s               4.59±0.03s                  2.12±0.08s             2.08±0.1s               4.22±0.01s                  2.35±0.2s
                100000   100000      100          21.7±0.7s               20.2±0.1s                   23.2±0.6s              22.5±0.9s               19.3±0.02s                  21.5±0.7s
              ========= ======== ============ ================== =========================== =========================== ================== =========================== ===========================

[ 50.00%] · For scikit-learn commit 5570f71d <feat/pdr-sparse-support> (round 1/1):
[ 50.00%] ·· Building for conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl..................................
[ 50.00%] ·· Benchmarking conda-py3.9-cython-joblib-numpy-scipy-threadpoolctl
[100.00%] ··· pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin                                                                                                                                              
ok
[100.00%] ··· ========= ======== ============ ================== =========================== =========================== ================== =========================== ===========================
              --                                                                                                metric / strategy
              ------------------------------- -----------------------------------------------------------------------------------------------------------------------------------------------------
               n_train   n_test   n_features   euclidean / auto   euclidean / parallel_on_X   euclidean / parallel_on_Y   manhattan / auto   manhattan / parallel_on_X   manhattan / parallel_on_Y
              ========= ======== ============ ================== =========================== =========================== ================== =========================== ===========================
                 1000     1000       100           316±5ms                 241±4ms                     316±4ms                327±5ms                 254±3ms                     325±4ms
                 1000    10000       100          2.95±0.01s               234±5ms                     2.94±0s               3.08±0.2s                245±5ms                    3.03±0.01s
                 1000    100000      100          28.9±0.07s               889±50ms                   28.8±0.4s              29.9±0.02s              1.03±0.09s                  29.8±0.4s
                10000     1000       100           294±8ms                 2.23±0s                     301±10ms               313±10ms                2.32±0s                     313±8ms
                10000    10000       100          2.69±0.03s               2.22±0s                    2.92±0.04s             2.98±0.03s               2.32±0s                    3.01±0.03s
                10000    100000      100          27.3±0.2s               8.83±0.2s                   26.5±0.2s              28.1±0.01s              9.44±0.1s                   28.5±0.09s
                100000    1000       100          1.04±0.03s              22.1±0.01s                  1.05±0.04s              933±70ms               23.0±0.01s                  1.08±0.01s
                100000   10000       100          10.8±0.2s               22.1±0.01s                  10.3±0.3s              9.24±0.5s               23.1±0.04s                  10.2±0.4s
                100000   100000      100          1.51±0.01m              1.51±0.03m                   1.51±0m               1.58±0.01m              1.57±0.06m                  1.59±0.04m
              ========= ======== ============ ================== =========================== =========================== ================== =========================== ===========================

.       before           after         ratio
     [5570f71d]       [a10dbcbd]
     <feat/pdr-sparse-support>       <benchmarks/alt/feat/pdr-sparse-support>
-         316±4ms         144±20ms     0.45  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 1000, 100, 'euclidean', 'parallel_on_Y')
-       28.8±0.4s       11.8±0.02s     0.41  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 100000, 100, 'euclidean', 'parallel_on_Y')
-      2.95±0.01s       1.21±0.04s     0.41  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 10000, 100, 'euclidean', 'auto')
-      28.9±0.07s       11.7±0.05s     0.41  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 100000, 100, 'euclidean', 'auto')
-         2.94±0s       1.19±0.03s     0.41  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 10000, 100, 'euclidean', 'parallel_on_Y')
-        301±10ms          121±5ms     0.40  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 1000, 100, 'euclidean', 'parallel_on_Y')
-      3.03±0.01s       1.19±0.02s     0.39  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 10000, 100, 'manhattan', 'parallel_on_Y')
-         294±8ms          115±6ms     0.39  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 1000, 100, 'euclidean', 'auto')
-       29.8±0.4s       11.6±0.05s     0.39  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 100000, 100, 'manhattan', 'parallel_on_Y')
-      29.9±0.02s       11.6±0.01s     0.39  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 100000, 100, 'manhattan', 'auto')
-         313±8ms         122±10ms     0.39  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 1000, 100, 'manhattan', 'parallel_on_Y')
-       3.08±0.2s       1.19±0.02s     0.39  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 10000, 100, 'manhattan', 'auto')
-        313±10ms          120±7ms     0.38  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 1000, 100, 'manhattan', 'auto')
-      28.1±0.01s        10.7±0.4s     0.38  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 100000, 100, 'manhattan', 'auto')
-       26.5±0.2s        10.0±0.1s     0.38  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 100000, 100, 'euclidean', 'parallel_on_Y')
-      2.69±0.03s       1.00±0.04s     0.37  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 10000, 100, 'euclidean', 'auto')
-      2.92±0.04s       1.08±0.03s     0.37  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 10000, 100, 'euclidean', 'parallel_on_Y')
-       27.3±0.2s       10.00±0.1s     0.37  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 100000, 100, 'euclidean', 'auto')
-      2.98±0.03s       1.08±0.06s     0.36  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 10000, 100, 'manhattan', 'auto')
-      3.01±0.03s       1.08±0.03s     0.36  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 10000, 100, 'manhattan', 'parallel_on_Y')
-      28.5±0.09s        9.92±0.1s     0.35  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 100000, 100, 'manhattan', 'parallel_on_Y')
-         316±5ms         106±20ms     0.34  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 1000, 100, 'euclidean', 'auto')
-         325±4ms         104±20ms     0.32  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 1000, 100, 'manhattan', 'parallel_on_Y')
-         327±5ms         103±20ms     0.32  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 1000, 100, 'manhattan', 'auto')
-        889±50ms          274±1ms     0.31  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 100000, 100, 'euclidean', 'parallel_on_X')
-        933±70ms         279±30ms     0.30  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 1000, 100, 'manhattan', 'auto')
-         234±5ms         63.0±4ms     0.27  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 10000, 100, 'euclidean', 'parallel_on_X')
-         1.51±0m        23.2±0.6s     0.26  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 100000, 100, 'euclidean', 'parallel_on_Y')
-      1.05±0.04s          256±7ms     0.24  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 1000, 100, 'euclidean', 'parallel_on_Y')
-         241±4ms         58.4±2ms     0.24  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 1000, 100, 'euclidean', 'parallel_on_X')
-      1.51±0.01m        21.7±0.7s     0.24  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 100000, 100, 'euclidean', 'auto')
-       8.83±0.2s       2.11±0.04s     0.24  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 100000, 100, 'euclidean', 'parallel_on_X')
-      1.04±0.03s          246±7ms     0.24  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 1000, 100, 'euclidean', 'auto')
-         245±5ms         58.0±6ms     0.24  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 10000, 100, 'manhattan', 'parallel_on_X')
-      1.08±0.01s         256±30ms     0.24  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 1000, 100, 'manhattan', 'parallel_on_Y')
-      1.58±0.01m        22.5±0.9s     0.24  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 100000, 100, 'manhattan', 'auto')
-       10.2±0.4s        2.35±0.2s     0.23  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 10000, 100, 'manhattan', 'parallel_on_Y')
-       9.24±0.5s        2.08±0.1s     0.23  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 10000, 100, 'manhattan', 'auto')
-      1.03±0.09s        231±0.8ms     0.23  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 100000, 100, 'manhattan', 'parallel_on_X')
-      1.59±0.04m        21.5±0.7s     0.22  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 100000, 100, 'manhattan', 'parallel_on_Y')
-       10.8±0.2s        2.41±0.1s     0.22  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 10000, 100, 'euclidean', 'auto')
-      1.51±0.03m        20.2±0.1s     0.22  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 100000, 100, 'euclidean', 'parallel_on_X')
-         254±3ms         55.8±3ms     0.22  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(1000, 1000, 100, 'manhattan', 'parallel_on_X')
-       9.44±0.1s       2.07±0.04s     0.22  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 100000, 100, 'manhattan', 'parallel_on_X')
-         2.22±0s          474±3ms     0.21  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 10000, 100, 'euclidean', 'parallel_on_X')
-         2.23±0s          473±4ms     0.21  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 1000, 100, 'euclidean', 'parallel_on_X')
-      22.1±0.01s       4.59±0.03s     0.21  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 10000, 100, 'euclidean', 'parallel_on_X')
-       10.3±0.3s       2.12±0.08s     0.21  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 10000, 100, 'euclidean', 'parallel_on_Y')
-      1.57±0.06m       19.3±0.02s     0.21  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 100000, 100, 'manhattan', 'parallel_on_X')
-      22.1±0.01s          4.51±0s     0.20  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 1000, 100, 'euclidean', 'parallel_on_X')
-         2.32±0s          445±5ms     0.19  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 1000, 100, 'manhattan', 'parallel_on_X')
-         2.32±0s          441±5ms     0.19  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(10000, 10000, 100, 'manhattan', 'parallel_on_X')
-      23.0±0.01s          4.22±0s     0.18  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 1000, 100, 'manhattan', 'parallel_on_X')
-      23.1±0.04s       4.22±0.01s     0.18  pairwise_distances_reductions.PairwiseDistancesReductionsBenchmark.time_PairwiseDistancesArgKmin(100000, 10000, 100, 'manhattan', 'parallel_on_X')

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

@jjerphan jjerphan marked this pull request as ready for review August 9, 2022 09:39
@jjerphan jjerphan changed the title MAINT Test second alternative for sparse-dense support MAINT Division-less alternative for sparse-dense support Aug 9, 2022
@jjerphan jjerphan changed the title MAINT Division-less alternative for sparse-dense support MAINT Division-less alternative for fused sparse-dense support Aug 9, 2022
Copy link

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thanks for the benchmark results.

Just a detail in the comment to fix below:

sklearn/metrics/_dist_metrics.pyx.tp Outdated Show resolved Hide resolved
sklearn/metrics/_dist_metrics.pyx.tp Outdated Show resolved Hide resolved
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@jjerphan
Copy link
Owner Author

jjerphan commented Aug 10, 2022

I think that I would wait for @thomasjpfan's opinion before merge this PR as it's easier to review this diff independently.

Copy link

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the ping!

@thomasjpfan
Copy link

thomasjpfan commented Aug 11, 2022

BTW I think we can merge this directly into main as a performance improvement for sparse-dense.

@jjerphan
Copy link
Owner Author

jjerphan commented Aug 11, 2022

I merged a few commit in scikit-learn#23585, IIRC and we still need to perform benchmark against main (the previous ones were made between this alternative and the previous one).

So let's merge this first in this scikit-learn#23585 first.

@jjerphan jjerphan merged commit bd48ef0 into maint/pdr-sparse-support Aug 11, 2022
@jjerphan jjerphan deleted the alt/feat/pdr-sparse-support branch August 11, 2022 20:42
jjerphan added a commit that referenced this pull request Aug 22, 2022
To use the a more efficient implementation for
future callers implementations.

See: #15
jjerphan added a commit that referenced this pull request Aug 22, 2022
To use the a more efficient implementation for
future callers implementations.

See: #15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.