Description
Describe the issue:
The implementation of matmul does some basic checks to verify whether the arrays can be passed to gemm, and if it deems this is not possible it falls back to a noblas routine that has no regard for the memory cache, resulting in ~100x slowdowns. Here is the offending bit of source code.
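For illustration, the effect of the check is easy to see from Python: a strided view is neither C- nor F-contiguous, which is what pushes matmul off the gemm path (this mirrors the spirit of the eligibility test, not the actual C source):

import numpy as np

x = np.random.randn(100, 100)[::2, ::2]  # strided view into a larger array

# Neither flag is set, so the array cannot be handed to gemm as-is.
print(x.flags['C_CONTIGUOUS'], x.flags['F_CONTIGUOUS'])  # False False
print(x.strides)  # (1600, 16): both strides skip every other element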
dot, on the other hand, is a lot more flexible, attempting to make a copy before passing the array to the BLAS routine. For small arrays the difference is not significant, but for large arrays this results in much better performance. This behavior is explicitly visible in the source code (lines 234 and 243 are the bad-stride copies) and can be confirmed by profiling, as shown by hpaulj.
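The practical workaround today is to do that copy by hand. Something like the following (my own sketch, not library code) restores the fast gemm path for matmul:

import numpy as np

x = np.random.randn(2600, 2600)[::2, ::2]
y = np.random.randn(2600, 2600)[::2, ::2]

# Copying the badly-strided views into contiguous buffers first, which is
# roughly what dot does internally, lets matmul use BLAS again.
result = np.matmul(np.ascontiguousarray(x), np.ascontiguousarray(y))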
Neither of the docs pages for dot or matmul mentions this behavior. On the contrary, the dot page states: "If both a and b are 2-D arrays, it is matrix multiplication, but using matmul or a @ b is preferred."
There are a handful of similar issues open:
- BUG: Multiplication by real part of complex matrix very slow #23123
- complextype
- views
- BUG: Matrix multiplication with integers not using the cache friendly loop order #23260
I am not sure what the best fix is. Updating the docs would be a good start. Perhaps matmul should try to copy too, at least for large arrays?
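As a rough sketch of what that could look like, expressed with the public API rather than the C internals (hypothetical, the names are mine):

import numpy as np

def matmul_copying(a, b):
    # Hypothetical fix: mimic dot by copying badly-strided operands
    # into contiguous memory before multiplying.
    if not (a.flags['C_CONTIGUOUS'] or a.flags['F_CONTIGUOUS']):
        a = np.ascontiguousarray(a)
    if not (b.flags['C_CONTIGUOUS'] or b.flags['F_CONTIGUOUS']):
        b = np.ascontiguousarray(b)
    return np.matmul(a, b)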
Reproduce the code example:
import numpy as np
from timeit import timeit

N = 2600
xx = np.random.randn(N, N)
yy = np.random.randn(N, N)

# Take strided views: same memory, but no longer contiguous.
x = xx[::2, ::2]
y = yy[::2, ::2]
assert np.shares_memory(x, xx)
assert np.shares_memory(y, yy)

dot = timeit('np.dot(x, y)', number=10, globals=globals())
matmul = timeit('np.matmul(x, y)', number=10, globals=globals())
print('time for np.matmul: ', matmul)
print('time for np.dot: ', dot)
Error message:
time for np.matmul: 29.04214559996035
time for np.dot: 0.2680714999441989
Runtime information:
> import sys, numpy; print(numpy.__version__); print(sys.version)
1.24.2
3.11.2 (tags/v3.11.2:878ead1, Feb 7 2023, 16:38:35) [MSC v.1934 64 bit (AMD64)]
> print(numpy.show_runtime())
[{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2'],
'not_found': ['AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL']}},
{'architecture': 'Prescott',
'filepath': 'C:\\Users\\______\\AppData\\Local\\Programs\\Python\\Python311\\Lib\\site-packages\\numpy\\.libs\\libopenblas64__v0.3.21-gcc_10_3_0.dll',
'internal_api': 'openblas',
'num_threads': 24,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.21'}]
None
Context for the issue:
The scientific computing community regularly uses the @ shorthand on large arrays. Developers expect that with NumPy they do not need to think about the CPU cache, and that operations run close to compute limits rather than memory-bandwidth limits.
At the very least, a warning should be provided to users so that they know why their code is running orders of magnitude slower than expected.
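For concreteness, the kind of warning I have in mind (a hypothetical sketch, not an existing NumPy message):

import warnings

# Hypothetical: emitted when matmul rejects the BLAS path on
# badly-strided inputs and falls back to the slow inner loop.
warnings.warn(
    "matmul: operands are not BLAS-compatible; falling back to a slow "
    "non-BLAS loop. Consider np.ascontiguousarray() on the inputs.",
    RuntimeWarning,
)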