Simple C code optimisation examples: vectorisation, loop unrolling, intrinsics ...
In these examples we compute the element-wise sum of two 2D arrays (matrices) and store the result in a preallocated matrix. We test different ways to compute this sum and benchmark the number of CPU cycles per element.
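As a rough illustration of the kind of variants being compared, here is a minimal sketch of a baseline loop and a manually unrolled loop. The function names `sum_baseline` and `sum_unrolled4`, the `float` element type, and the contiguous row-major storage are assumptions made for this sketch; they are not taken from the repository's sources.

```c
#include <stddef.h>

/* Assumption: each rows x cols matrix is stored contiguously in row-major
   order, so it can be treated as a flat array of n = rows * cols floats. */

/* Baseline: one addition per loop iteration. */
void sum_baseline(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

/* Manually unrolled by a factor of 4 (hypothetical variant).
   The second loop handles the 0 to 3 elements left over when n is not
   a multiple of 4. */
void sum_unrolled4(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        out[i]     = a[i]     + b[i];
        out[i + 1] = a[i + 1] + b[i + 1];
        out[i + 2] = a[i + 2] + b[i + 2];
        out[i + 3] = a[i + 3] + b[i + 3];
    }
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Both functions write the same values to `out`; only the loop structure differs, which is what a cycles-per-element benchmark would compare. With optimisation enabled, a compiler may unroll or vectorise the baseline on its own, so any measured difference depends on the compiler and its flags.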
$prompt> ./buils.sh
$prompt> ./run.sh