
Ring Attention First Pass Implementation #407

Open

pulitha13 wants to merge 1 commit into foundation-model-stack:main from coms-6998-context-parallelism-project:fms-pr-squashed

Conversation

@pulitha13

This is a first pass at using context parallelism for FMS, based on the ring attention paper. We are treating this pull request as a work in progress and would be happy to make any changes or fixes as IBM sees fit.
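For reference, here is a minimal single-process sketch of the ring attention pattern this PR targets, with the device ring simulated by a plain Python list: the sequence is sharded into blocks, each rank keeps its query shard, and the K/V shards rotate around the ring until every rank has attended over the full sequence. The names (`ring_attention`, `q_blocks`, etc.) are illustrative, not taken from this PR's diff, and the final softmax over concatenated scores mirrors the no-online-softmax state described below.

```python
import torch
import torch.nn.functional as F

def ring_attention(q_blocks, k_blocks, v_blocks, scale):
    """Exact attention over a sequence sharded into len(q_blocks) blocks.

    Each "rank" keeps its query shard; K/V shards rotate around the ring
    (simulated here by list indexing) until every rank has seen them all.
    """
    n = len(q_blocks)
    outputs = []
    for rank in range(n):
        q = q_blocks[rank]
        scores, values = [], []
        src = rank
        for _ in range(n):
            k, v = k_blocks[src], v_blocks[src]
            scores.append(q @ k.transpose(-2, -1) * scale)
            values.append(v)
            src = (src - 1) % n  # "receive" the previous rank's K/V shard
        # Without online softmax, normalize over all concatenated scores.
        attn = F.softmax(torch.cat(scores, dim=-1), dim=-1)
        outputs.append(attn @ torch.cat(values, dim=-2))
    return torch.cat(outputs, dim=-2)

if __name__ == "__main__":
    torch.manual_seed(0)
    d = 64
    qkv = [torch.randn(128, d) for _ in range(4)]  # 4 "ranks", self-attention
    out = ring_attention(qkv, qkv, qkv, scale=d ** -0.5)
    full = torch.cat(qkv, dim=-2)
    ref = F.softmax(full @ full.transpose(-2, -1) * d ** -0.5, dim=-1) @ full
    assert torch.allclose(out, ref, atol=1e-5)  # matches vanilla attention
```

In a real deployment the list indexing would be replaced by point-to-point sends/receives (e.g. via `torch.distributed`), which is what lets each rank overlap its block matmul with the communication of the next K/V shard.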

Known issues:

  • The output logits do not match the default (single-device) LLaMA 7B configuration 1:1. The results deviate slightly while still producing "close enough" output, and we aim to figure out why; one way to quantify the mismatch is sketched below.
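As a hypothetical way to quantify that deviation (this is not code from the PR), the two logit tensors can be compared directly; `logits_ring` and `logits_base` stand in for outputs of the ring-attention and default configurations:

```python
import torch

def report_logit_drift(logits_ring: torch.Tensor, logits_base: torch.Tensor) -> None:
    """Print simple drift statistics between two logit tensors."""
    diff = (logits_ring - logits_base).abs()
    print(f"max abs diff:    {diff.max().item():.3e}")
    print(f"mean abs diff:   {diff.mean().item():.3e}")
    # For greedy decoding, top-1 agreement matters more than raw values.
    agree = (logits_ring.argmax(dim=-1) == logits_base.argmax(dim=-1)).float().mean()
    print(f"top-1 agreement: {agree.item():.1%}")
```

Small max-abs differences with full top-1 agreement would point at reduction-order/numerics effects; disagreeing argmaxes would suggest something structural (e.g. masking or block ordering).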

Future improvements:

  • Online softmax during ring attention, so each rank can fold incoming K/V blocks into a running result instead of holding every score block until the end (see the sketch after this list).
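For concreteness, here is a minimal sketch of the online-softmax update (in the style of FlashAttention's blockwise accumulation, not taken from this PR) that the inner ring step could adopt: each incoming K/V block is folded into a running output, row-max, and normalizer, so the full score row never has to be materialized.

```python
import torch

def online_softmax_step(o, m, l, q, k, v, scale):
    """Fold one K/V block into running output `o`, row-max `m`, normalizer `l`."""
    s = q @ k.transpose(-2, -1) * scale                # scores for this block
    m_new = torch.maximum(m, s.max(dim=-1, keepdim=True).values)
    alpha = torch.exp(m - m_new)                       # rescale old statistics
    p = torch.exp(s - m_new)
    return alpha * o + p @ v, m_new, alpha * l + p.sum(dim=-1, keepdim=True)

if __name__ == "__main__":
    torch.manual_seed(0)
    d = 32
    q = torch.randn(16, d)
    blocks = [torch.randn(16, d) for _ in range(4)]    # K/V shards arriving in turn
    o = torch.zeros_like(q)
    m = torch.full((16, 1), -float("inf"))
    l = torch.zeros(16, 1)
    for kv in blocks:
        o, m, l = online_softmax_step(o, m, l, q, kv, kv, d ** -0.5)
    full = torch.cat(blocks)
    ref = torch.softmax(q @ full.T * d ** -0.5, dim=-1) @ full
    assert torch.allclose(o / l, ref, atol=1e-5)       # final output is o / l
```

This would replace the concatenate-then-softmax step in the sketch above, bounding per-rank memory to a single score block regardless of how many ranks the sequence is split across.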
