Description
As discussed in arrayfire-python #145, af::convolve currently falls back to fft+ifft for larger array sizes, even if the kernel is very small (e.g. 3x3x3). This results in a performance loss for small convolution kernels.
As a benchmark, I timed a convolution with a 3x3x3 kernel against a single fft and plotted the quotient over the number of elements per dimension n:
For the code producing this plot please refer to the thread linked above.
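
To illustrate why a small kernel should favor direct convolution, here is a rough operation-count sketch. It is not the benchmark code from the linked thread and does not call ArrayFire. The function names are hypothetical, and the `5 * N * log2(N)` figure is the usual textbook estimate for a complex FFT, used here only as an assumption. Memory bandwidth and hardware effects are ignored, so this should be read as an order-of-magnitude argument only.

```python
import math


def direct_conv_ops(n, k=3):
    """Approximate multiply-adds for a direct 3D convolution
    of an n*n*n array with a k*k*k kernel."""
    return n**3 * k**3


def fft_ops(n):
    """Approximate operation count for a single 3D FFT of n*n*n elements,
    assuming the common 5 * N * log2(N) estimate (an assumption, not a
    measured ArrayFire cost)."""
    total = n**3
    return 5 * total * math.log2(total)


def ops_quotient(n, k=3):
    """Estimated cost of a direct convolution divided by the cost of a
    single FFT, mirroring the quotient described in the benchmark above."""
    return direct_conv_ops(n, k) / fft_ops(n)


if __name__ == "__main__":
    # Print the estimated quotient for a few sizes per dimension.
    for n in (32, 64, 128, 256):
        print(n, ops_quotient(n))
```

Under these assumptions the quotient stays below 1 and shrinks slowly as n grows, and an FFT-based convolution needs more than one transform, so the direct path would be expected to win for a 3x3x3 kernel at any array size.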