(perf) generate_mask functions optimizations #3203
Merged
Pull Request Template
Checklist
The `cargo run-checks` command has been executed. There were 8 failing tests on my system (Asus PN50-E1, Ryzen 4700U), which seem unrelated to these changes.
run-checks errors
Related Issues/PRs
No particular issue, as this is a simple performance-related PR.
Changes
There are 2 perf changes:
- `generate_autoregressive_mask`: as per the TODO, use triangular tensors and `expand` instead of manually populating the mask.
- `generate_padding_mask`: avoid `split_off` calls and instead only take up to `max_size` elements when iterating.

As explained in the commit messages, I am no expert in code generation (for the autoregressive change), so I don't really know whether it ends up being better or not.
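To illustrate the autoregressive change, here is a minimal std-only Rust sketch. It is not Burn's implementation. A flat `Vec<bool>` stands in for a `[batch_size, seq_length, seq_length]` Bool tensor, and the function names and the convention that `true` marks a masked future position (column `j > i`) are assumptions made for illustration. It contrasts filling every cell by hand with building the strictly upper-triangular pattern once and repeating it across the batch.

```rust
// Hypothetical sketch: a flat, row-major `Vec<bool>` stands in for a
// [batch, seq, seq] Bool tensor. Assumed convention: `true` = masked
// (a future position j > i that attention must not see).

/// Manual approach: visit every batch entry and set each masked cell by hand.
fn autoregressive_mask_manual(batch_size: usize, seq_length: usize) -> Vec<bool> {
    let mut mask = vec![false; batch_size * seq_length * seq_length];
    for b in 0..batch_size {
        let offset = b * seq_length * seq_length;
        for i in 0..seq_length {
            for j in (i + 1)..seq_length {
                mask[offset + i * seq_length + j] = true;
            }
        }
    }
    mask
}

/// Triangular approach: build the [seq, seq] strictly upper-triangular
/// pattern once, then repeat it along the batch dimension
/// (a stand-in for a tensor `expand`).
fn autoregressive_mask_triangular(batch_size: usize, seq_length: usize) -> Vec<bool> {
    let triangle: Vec<bool> = (0..seq_length)
        .flat_map(|i| (0..seq_length).map(move |j| j > i))
        .collect();
    // `[T]::repeat` copies the slice `batch_size` times.
    triangle.repeat(batch_size)
}
```

Both functions should produce identical masks, e.g. `autoregressive_mask_triangular(1, 3)` gives the rows `[F, T, T]`, `[F, F, T]`, `[F, F, F]`. Note that `repeat` physically copies the triangle, whereas a tensor `expand` may be able to broadcast without materializing the batch dimension, which is where any real savings would likely come from.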
Testing
I mostly generated a simple script to confirm that the tensors stay the same.
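An equivalence check for the padding change could look roughly like the following std-only Rust sketch. It is a simplified approximation, not Burn's `generate_padding_mask`: `Vec<Vec<usize>>` stands in for the padded tensor, and the helpers `max_size`, `pad_with_split_off`, `pad_with_take` and `padding_mask` are hypothetical. The first padder truncates with `split_off`; the second leaves the input untouched and copies at most `max_size` tokens via `take`.

```rust
// Hypothetical helpers; `Vec<Vec<usize>>` stands in for a padded tensor.

/// Longest sequence length, optionally capped by `max_seq_length`.
fn max_size(tokens_list: &[Vec<usize>], max_seq_length: Option<usize>) -> usize {
    let longest = tokens_list.iter().map(Vec::len).max().unwrap_or(0);
    match max_seq_length {
        Some(limit) => longest.min(limit),
        None => longest,
    }
}

/// `split_off` style: needs owned, mutable input, and moves each discarded
/// tail into a newly allocated Vec before padding.
fn pad_with_split_off(pad: usize, mut tokens_list: Vec<Vec<usize>>, max_seq_length: Option<usize>) -> Vec<Vec<usize>> {
    let size = max_size(&tokens_list, max_seq_length);
    tokens_list
        .iter_mut()
        .map(|tokens| {
            if tokens.len() > size {
                let _discarded = tokens.split_off(size);
            }
            let mut row = vec![pad; size];
            row[..tokens.len()].copy_from_slice(&tokens[..]);
            row
        })
        .collect()
}

/// `take` style: borrows the input and copies at most `size` tokens per row.
fn pad_with_take(pad: usize, tokens_list: &[Vec<usize>], max_seq_length: Option<usize>) -> Vec<Vec<usize>> {
    let size = max_size(tokens_list, max_seq_length);
    tokens_list
        .iter()
        .map(|tokens| {
            let mut row = vec![pad; size];
            for (k, &token) in tokens.iter().take(size).enumerate() {
                row[k] = token;
            }
            row
        })
        .collect()
}

/// Mask is `true` wherever the padded rows hold the pad token.
fn padding_mask(pad: usize, padded: &[Vec<usize>]) -> Vec<Vec<bool>> {
    padded.iter().map(|row| row.iter().map(|&t| t == pad).collect()).collect()
}
```

With `pad = 0`, input `[[1, 2, 3, 4, 5], [6, 7], [8]]` and `Some(3)`, both padders should yield `[[1, 2, 3], [6, 7, 0], [8, 0, 0]]`. A side benefit of the `take` version is that it only needs a borrowed slice rather than owned, mutable sequences.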