Open
Description
I will try to reproduce this with a simple main.cpp that calls AF directly, but these few lines make AFCUDA leak memory, at least with AF 3.6.2 (AF_MEM_DEBUG=1, no garbage collection):
// left and right are just 3d arrays
const unsigned ns = left.dims(2); // usually 20 slices
for (unsigned z = 0; z < ns; ++z) {
    o += af::matmul(left.slice(z), right.slice(z),
                    transposeLeft ? AF_MAT_TRANS : AF_MAT_NONE,
                    transposeRight ? AF_MAT_TRANS : AF_MAT_NONE);
}
Calling getMemInfo() before and after the loop gives:
BEFORE : AllocBytes=5083372752 AllocBuffers=1391 LockBytes=5083372752 LockBuffers=1391
...
AFTER : AllocBytes=5104344272 AllocBuffers=1411 LockBytes=5104344272 LockBuffers=1411
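AF has clearly created 20 buffers for the 20 slices (AllocBuffers goes from 1391 to 1411 and AllocBytes grows by 20,971,520 bytes, i.e. exactly 1 MiB per buffer). These buffers are never freed, which eventually leads to OOM.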
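To make the arithmetic behind that reading explicit, here is a small standalone sketch that diffs the two snapshots above. The `MemInfo` struct and the `growth` helper are hypothetical names introduced only for this illustration; they are not part of ArrayFire, and the numbers are copied from the BEFORE/AFTER lines of this report.

```cpp
#include <cstdint>

// Hypothetical container for the four counters printed in the report
// (not an ArrayFire type).
struct MemInfo {
    std::uint64_t allocBytes;
    std::uint64_t allocBuffers;
    std::uint64_t lockBytes;
    std::uint64_t lockBuffers;
};

// Growth of each counter between two snapshots.
// Assumes every counter in `after` is >= the one in `before`.
constexpr MemInfo growth(const MemInfo& before, const MemInfo& after) {
    return MemInfo{after.allocBytes - before.allocBytes,
                   after.allocBuffers - before.allocBuffers,
                   after.lockBytes - before.lockBytes,
                   after.lockBuffers - before.lockBuffers};
}

// Values copied from the BEFORE / AFTER lines above.
constexpr MemInfo kBefore{5083372752ULL, 1391ULL, 5083372752ULL, 1391ULL};
constexpr MemInfo kAfter{5104344272ULL, 1411ULL, 5104344272ULL, 1411ULL};

// One new buffer per slice, and every new buffer is still locked.
static_assert(growth(kBefore, kAfter).allocBuffers == 20, "20 new buffers");
static_assert(growth(kBefore, kAfter).lockBuffers == 20, "20 new locked buffers");
// 20,971,520 bytes in total, i.e. 1 MiB (1,048,576 bytes) per buffer.
static_assert(growth(kBefore, kAfter).allocBytes == 20ULL * 1048576ULL,
              "1 MiB per new buffer");
```

The locked counters grow by the same amounts as the allocated ones, so the new buffers are still marked as in use rather than sitting in the memory manager's free cache.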