Inject custom attention op into MultiHeadAttention #408
Merged
JRosenkranz merged 4 commits into foundation-model-stack/foundation-model-stack:main from foundation-model-stack/foundation-model-stack:paged_attn_mock_api_change2 on May 16, 2025
Conversation
…on SDPA Co-authored-by: Joshua Rosenkranz <jmrosenk@us.ibm.com> Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
143bb1b to a9f34a3
Signed-off-by: Antoni Viros i Martin <aviros@ibm.com>
ani300 reviewed May 16, 2025
mask = torch.where(mask.logical_not(), -torch.inf, 0.0)

padding_kwargs["mask"] = mask
# FIXME: this method should be per attn type (for now default it)
Collaborator
seems like it is already dependent on attn type?
Collaborator
Author
yes, my thinking here was that we could eventually add an attn_name param to pad_input_ids, which would provide the right prefill padding metadata per attention type. I did not include that as part of this PR.
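For reference, the conversion in the diff above can be reproduced in isolation. This is a minimal standalone sketch with made-up tensor values, not the fms code path itself:

```python
import torch

# Boolean attention mask: True means "attend", False means "masked out".
bool_mask = torch.tensor([[True, True, False, False]])

# SDPA expects an additive float mask: 0.0 where attention is allowed and
# -inf where it is not, so masked positions vanish after the softmax.
additive_mask = torch.where(bool_mask.logical_not(), -torch.inf, 0.0)
print(additive_mask)  # tensor([[0., 0., -inf, -inf]])
```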
ani300 approved these changes May 16, 2025
ani300 (Collaborator) left a comment:
mostly lgtm, a few more comments documenting the API better and some variable renaming will bring it to a good place
Signed-off-by: Joshua Rosenkranz <jmrosenk@us.ibm.com>
This PR removes the assumption that fms uses SDPA as the backend for MultiHeadAttention by introducing an AttentionKwargs TypedDict that is passed to forward. It is fully backwards compatible with the prior API, as mask and attn_algorithm are part of SDPAAttentionKwargs (which extends AttentionKwargs) and can still be passed by name. Within attention, the attention implementation is chosen based on a get_attention_type call (which is given the AttentionKwargs input). Attention types can be registered through the register_attention_op method, giving other repos (including fms) the ability to introduce their own attention type.
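As a rough illustration of the flow described above, the sketch below shows what a registry-based attention dispatch could look like. The class fields, function signatures, and the "sdpa" default name are assumptions for illustration, not the actual fms interface:

```python
import torch
from typing import TypedDict


# Hypothetical shapes of the constructs named in the description; the real
# fms definitions may differ in fields, defaults, and module location.
class AttentionKwargs(TypedDict, total=False):
    attn_name: str


class SDPAAttentionKwargs(AttentionKwargs, total=False):
    mask: torch.Tensor
    attn_algorithm: str


# Registry mapping an attention-type name to its implementation.
_ATTN_OPS = {}


def register_attention_op(name, op):
    # Other repos (including fms itself) can introduce their own attention type.
    _ATTN_OPS[name] = op


def get_attention_type(attn_kwargs):
    # Fall back to SDPA when no attention type is requested, which keeps the
    # old mask/attn_algorithm call pattern working unchanged.
    return _ATTN_OPS[attn_kwargs.get("attn_name", "sdpa")]


def sdpa_op(q, k, v, **attn_kwargs):
    return torch.nn.functional.scaled_dot_product_attention(
        q, k, v, attn_mask=attn_kwargs.get("mask")
    )


register_attention_op("sdpa", sdpa_op)

# Inside MultiHeadAttention.forward, the op would then be looked up and called:
# attn_op = get_attention_type(attn_kwargs)
# out = attn_op(q, k, v, **attn_kwargs)
```

In a shape like this, existing callers that only pass mask and attn_algorithm keep working, since those keys simply travel through the SDPAAttentionKwargs dict to the default SDPA op.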