Add Granite Speech multimodal speech-to-text implementation #499
Open
gsmoon97 wants to merge 11 commits into foundation-model-stack:main from columbia-hpml-granite:granite-speech-dev
Co-authored-by: In Keun Kim <ik2619@columbia.edu> Signed-off-by: Geonsik Moon <gsmoon97@gmail.com>
Co-authored-by: In Keun Kim <ik2619@columbia.edu> Signed-off-by: Geonsik Moon <gsmoon97@gmail.com>
Co-authored-by: Aneesh Durai <126147060+aneeshdurai@users.noreply.github.com> Signed-off-by: Geonsik Moon <gsmoon97@gmail.com>
Co-authored-by: In Keun Kim <ik2619@columbia.edu> Signed-off-by: Geonsik Moon <gsmoon97@gmail.com>
Co-authored-by: In Keun Kim <ik2619@columbia.edu> Co-authored-by: Zachary Zusin <zacharyzusin@gmail.com> Signed-off-by: Geonsik Moon <gsmoon97@gmail.com>
Co-authored-by: In Keun Kim <ik2619@columbia.edu> Signed-off-by: Geonsik Moon <gsmoon97@gmail.com>
Co-authored-by: In Keun Kim <ik2619@columbia.edu> Signed-off-by: Geonsik Moon <gsmoon97@gmail.com>
Co-authored-by: In Keun Kim <ik2619@columbia.edu> Signed-off-by: Geonsik Moon <gsmoon97@gmail.com>
Signed-off-by: Geonsik Moon <gsmoon97@gmail.com>
Co-authored-by: In Keun Kim <ik2619@columbia.edu> Signed-off-by: Geonsik Moon <gsmoon97@gmail.com>
0cc74e0 to 50080ac
Signed-off-by: In Keun Kim <ik2619@columbia.edu>
Add Granite Speech multimodal speech-to-text implementation
This PR integrates IBM's Granite Speech 3.3 model into FMS, developed as part of a Columbia University course project (COMSE6998 High Performance Machine Learning) in collaboration with IBM Research.
Components
Integration
utils.py
Features
Testing & Validation
notebooks/granite_speech_inference.ipynb) demonstrating real audio transcription with the LibriSpeech dataset
Review Notes
We appreciate your time reviewing this contribution. All tests pass on CPU, with GPU-dependent equivalence tests marked for CUDA environments. The implementation follows FMS conventions and includes inline comments for maintainability.
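For reviewers unfamiliar with gating tests on hardware, the following is a minimal sketch of how a GPU-dependent equivalence test can sit alongside CPU tests in one suite. It is not the PR's test code: it uses the standard-library `unittest` module rather than whatever mechanism the PR uses, and the names `cuda_available`, `max_abs_diff`, and `EquivalenceSketch` are hypothetical, introduced only for illustration.

```python
import importlib.util
import unittest


def cuda_available() -> bool:
    """Return True only if PyTorch is installed and reports a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


class EquivalenceSketch(unittest.TestCase):
    def test_cpu_reference(self):
        # Runs everywhere: two reference outputs agree within tolerance.
        self.assertLess(max_abs_diff([0.1, 0.2, 0.3], [0.1, 0.2, 0.3001]), 1e-3)

    @unittest.skipUnless(cuda_available(), "requires a CUDA device")
    def test_gpu_matches_cpu(self):
        # Runs only on CUDA machines: the same matmul on CPU and GPU
        # should agree within a small tolerance.
        import torch

        a = torch.arange(16, dtype=torch.float32).reshape(4, 4)
        cpu_out = (a @ a).flatten().tolist()
        gpu_out = (a.cuda() @ a.cuda()).cpu().flatten().tolist()
        self.assertLess(max_abs_diff(cpu_out, gpu_out), 1e-4)


if __name__ == "__main__":
    unittest.main()
```

On a CPU-only machine the GPU test is reported as skipped rather than failed, so the suite still passes.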
Please let us know if any changes are needed or if you have questions about the implementation approach.
For more details about the project, please refer here.
Acknowledgements
Special thanks to our mentors @rzbhatti and @kaoutar55 from IBM Research for their invaluable guidance, technical insights, and support throughout this project. Their expertise in FMS architecture and multimodal models was instrumental in achieving a production-ready implementation.
Team:
Course: COMSE6998 High Performance Machine Learning (Fall 2025)
Supervisors: