Dear author, I have a question about setting the thread count and threadgroup count in Metal. I changed threadsPerGroup from (1,1,1) to (32,1,1), and changed threadGroups from
(1,1,1) to (number,1,1), where number is (vectorcount+31)/32. However, I did not see any change or improvement in processing time. Did I set this up correctly? Thanks.
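
To make the setting concrete, the threadgroup count described above is a ceiling division of the element count by the group width. Below is a minimal C++ sketch of that arithmetic only. The helper names `threadgroupCount` and `launchedThreads` are hypothetical and are not part of the Metal API, and the sketch assumes a one-dimensional dispatch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Metal API call): number of threadgroups along x
// needed to cover `vectorCount` elements with `threadsPerGroup` threads each.
// For a width of 32 this is the same as (vectorcount + 31) / 32.
// Assumes vectorCount + threadsPerGroup - 1 does not overflow 32 bits.
std::uint32_t threadgroupCount(std::uint32_t vectorCount,
                               std::uint32_t threadsPerGroup) {
    assert(threadsPerGroup > 0);
    return (vectorCount + threadsPerGroup - 1) / threadsPerGroup;
}

// Hypothetical helper: total threads launched by that dispatch. This can
// exceed vectorCount by up to threadsPerGroup - 1 threads.
std::uint32_t launchedThreads(std::uint32_t vectorCount,
                              std::uint32_t threadsPerGroup) {
    return threadgroupCount(vectorCount, threadsPerGroup) * threadsPerGroup;
}
```

For example, with vectorcount = 1000 and a width of 32 this gives 32 threadgroups and 1024 launched threads, so the last 24 threads have no element to process. With this kind of dispatch the kernel would presumably need a bounds check on the thread index against vectorcount.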