Description
Describe the issue:
I was comparing the performance of numpy and numba for matrix multiplication. I used integer inputs so that numpy does not dispatch to BLAS and the comparison is fairer (both numpy and numba then rely only on their own vectorized loops).
Based on a discussion with another user, it appears that the loop order used by @TYPE@_matmul_inner_noblas is not the most cache-friendly one.
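As a quick sanity check (not part of the original report), one way to see that the integer path avoids BLAS is to compare float64 and integer matmul timings: when numpy is linked against a BLAS, the float64 path is typically much faster than the generic integer loop. The array sizes below are illustrative only.
import timeit
import numpy as np

n = 1000
A_f = np.random.rand(n, n)                    # float64: typically handled by BLAS if numpy is linked against one
B_f = np.random.rand(n, n)
A_i = np.random.randint(1, 100, size=(n, n))  # integer: handled by the generic (noblas) inner loop
B_i = np.random.randint(1, 100, size=(n, n))

print("float64 @:", timeit.timeit(lambda: A_f @ B_f, number=3))
print("int     @:", timeit.timeit(lambda: A_i @ B_i, number=3))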
Reproduce the code example:
import numpy as np
from numba import njit, prange

@njit
def matrix_multiplication(A, B):  # i, j, k order: innermost loop runs along contiguous rows of C and B
    m, n = A.shape
    _, p = B.shape
    C = np.zeros((m, p))
    for i in range(m):
        for j in range(n):
            for k in range(p):
                C[i, k] += A[i, j] * B[j, k]
    return C

@njit
def matrix_multiplication2(A, B):  # i, k, j order, equivalent to numpy: innermost loop strides down columns of B
    m, n = A.shape
    _, p = B.shape
    C = np.zeros((m, p))
    for i in prange(m):
        for k in range(p):
            for j in range(n):
                C[i, k] += A[i, j] * B[j, k]
    return C
m = 1000
n = 1000
p = 1000
A = np.random.randint(1, 100, size=(m, n))
B = np.random.randint(1, 100, size=(n, p))
# compile function
matrix_multiplication(A, B)
matrix_multiplication2(A, B)
%timeit matrix_multiplication(A, B)
%timeit matrix_multiplication2(A, B)
%timeit A @ B
# numpy is a little slower than matrix_multiplication but faster than matrix_multiplication2
# matrix_multiplication:  1.62 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# matrix_multiplication2: 2.48 s ± 157 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
# A @ B:                  2 s ± 37.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
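For reference, if the snippet above is run as a plain script rather than in IPython (where %timeit is unavailable), an equivalent measurement with timeit might look like the sketch below; it reuses the arrays and functions defined above, and its output would not be the numbers reported here.
import timeit

for label, fn in [("matrix_multiplication", lambda: matrix_multiplication(A, B)),
                  ("matrix_multiplication2", lambda: matrix_multiplication2(A, B)),
                  ("A @ B", lambda: A @ B)]:
    t = timeit.timeit(fn, number=3) / 3  # average seconds per call over 3 runs
    print(f"{label}: {t:.2f} s per call")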
Error message:
No response
Runtime information:
NumPy version: 1.23.5
Python version: 3.10.0 (tags/v3.10.0:b494f59, Oct 4 2021, 19:00:18) [MSC v.1929 64 bit (AMD64)]
Context for the issue:
No response