Description
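Hi,
Ragged max no longer performs as well relative to plain max as it did two years ago. The benchmarks in #2786 showed both taking about the same time, but ragged max now takes about 3.6 times as long as max for M ≤ 128.
Results as of today: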
ArrayFire v3.8.2 (CUDA, 64-bit Linux, build a9b6b0e)
Platform: CUDA Runtime 11.4, Driver: 495.29.05
[0] Quadro RTX 3000, 5935 MB, CUDA Compute 7.5
Max vs ragged max (M x M input, times in ms)
M     max          ragmax
8     0.00241453   0.00872827
16    0.0023666    0.00861896
32    0.00237074   0.00865682
64    0.00245299   0.0087447
128   0.00240028   0.00879951
256   0.00414862   0.00866826
512   0.00701668   0.0096099
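The slowdown in each row is the ragged-max time divided by the max time. A small C++ helper that recomputes it from the measurements in the table above (the names `BenchRow`, `kRows`, `slowdown`, and `slowdowns` are illustrative only, not part of ArrayFire):

```cpp
#include <vector>

// One row of the "Max vs ragged max" table: matrix side M and both timings in ms.
struct BenchRow {
    int m;
    double maxMs;
    double ragmaxMs;
};

// Measurements copied from the table above.
const std::vector<BenchRow> kRows = {
    {8, 0.00241453, 0.00872827},   {16, 0.0023666, 0.00861896},
    {32, 0.00237074, 0.00865682},  {64, 0.00245299, 0.0087447},
    {128, 0.00240028, 0.00879951}, {256, 0.00414862, 0.00866826},
    {512, 0.00701668, 0.0096099},
};

// How many times as long ragged max takes compared to plain max (1.0 = equal).
double slowdown(double maxMs, double ragmaxMs) { return ragmaxMs / maxMs; }

// Slowdown for every row, in table order.
std::vector<double> slowdowns(const std::vector<BenchRow> &rows) {
    std::vector<double> out;
    out.reserve(rows.size());
    for (const BenchRow &r : rows) {
        out.push_back(slowdown(r.maxMs, r.ragmaxMs));
    }
    return out;
}
```

For M = 256 and 512 the ratio falls to about 2.1 and 1.4, mainly because plain max itself gets slower at those sizes while ragged max stays near 0.009 ms.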
Description
- Did you build ArrayFire yourself or did you use the official installers: myself
- Which backend is experiencing this issue? (CPU, CUDA, OpenCL): CUDA
- Do you have a workaround? no
- Can the bug be reproduced reliably on your system? yes
Reproducible Code
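#include <arrayfire.h>
#include <iostream>

// af::timeit() only accepts a plain function pointer, so inputs are passed through globals.
static af::array *a = nullptr;  // M x M input
static af::array *b = nullptr;  // 1 x M ragged lengths (u32)

void doMax() {
    af::max(*a, 0).eval();
}

void doRMax() {
    af::array val;
    af::array idx;
    // Ragged max along dim 0: column j only considers its first (*b)(j) rows.
    af::max(val, idx, *a, *b, 0);
    val.eval();
    idx.eval();
}

void raggedMax() {
    std::cout << "Max vs ragged max (times in ms)\n";
    std::cout << "M\tmax\tragmax\n";
    for (int s = 8; s <= 512; s *= 2) {
        af::array in = af::randu(s, s);  // initialized input data
        af::array seqlen = af::constant((unsigned)s / 2, 1, s, u32);
        a = &in;
        b = &seqlen;
        // af::timeit() returns seconds; multiply by 1000 for milliseconds.
        std::cout << s << '\t' << 1000 * af::timeit(doMax) << '\t'
                  << 1000 * af::timeit(doRMax) << '\n';
    }
}

int main() {
    af::info();
    raggedMax();
    return 0;
}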
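To check the results of doRMax on small inputs, a CPU reference can help. As used above, a ragged max along dim 0 reduces column j only over its first len(j) rows and returns both the maximum and its row index. The sketch below is plain C++ under stated assumptions: column-major storage (the layout ArrayFire uses), every length in [1, rows], and ties resolved to the first occurrence. How ArrayFire itself handles ties or zero lengths is not checked here, and `raggedMaxRef` and `RaggedMaxResult` are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Result of a ragged max along dim 0: one value and one row index per column.
struct RaggedMaxResult {
    std::vector<float> val;          // max of column c over its first len[c] rows
    std::vector<std::uint32_t> idx;  // row index of that max
};

// CPU reference for a ragged max along dim 0 of a rows x cols matrix stored
// column-major (element (r, c) at data[c * rows + r]).
// Assumption: every len[c] is in [1, rows]; other lengths are rejected.
RaggedMaxResult raggedMaxRef(const std::vector<float> &data, std::size_t rows,
                             std::size_t cols,
                             const std::vector<std::uint32_t> &len) {
    if (data.size() != rows * cols || len.size() != cols) {
        throw std::invalid_argument("shape mismatch");
    }
    RaggedMaxResult out;
    out.val.resize(cols);
    out.idx.resize(cols);
    for (std::size_t c = 0; c < cols; ++c) {
        if (len[c] == 0 || len[c] > rows) {
            throw std::invalid_argument("length out of range");
        }
        const float *col = data.data() + c * rows;
        float best = col[0];
        std::uint32_t bestRow = 0;
        for (std::uint32_t r = 1; r < len[c]; ++r) {
            if (col[r] > best) {  // strict '>' keeps the first occurrence in this sketch
                best = col[r];
                bestRow = r;
            }
        }
        out.val[c] = best;
        out.idx[c] = bestRow;
    }
    return out;
}
```

Copying `val`, `idx`, and the input to the host (for example with `host<float>()`) and comparing them with this reference for a few small M would confirm that the slower path still computes the expected result.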
System Information
ArrayFire Version: 3.8.2
Device: Quadro RTX 3000
Operating System: Ubuntu
Driver version: 495.29.05
Checklist
- [x] I have read Timing ArrayFire